AI services behind a clean HTTP API.
If you don't want to operate a GPU yourself, use the hosted services. We run the models on our hardware; you call an endpoint and pay for what comes back.
What you can use today
Qwen-3 inference
OpenAI-compatible chat-completions API in front of Qwen-3, hosted on H200s with B200 capacity for the larger variants. Use it as a drop-in for general-purpose chat, retrieval-augmented generation, agentic tool use, or code assistance. We keep up with the current open-weights frontier; new models roll in behind the same endpoint.
Use cases: code assistants, document Q&A over a private corpus, customer-service automation, internal chat tools.
Fine-tuning service
Upload a JSONL of training pairs; we run a LoRA fine-tune on a base model of your choice and hand back a hosted endpoint. Suitable for domain-specific text generation, classification, or task-specific reasoning where the base model is close but not quite right.
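A minimal sketch of what a training-pairs JSONL might look like. The chat-message schema shown here is an assumption based on common fine-tuning formats, not the service's confirmed upload spec; check the upload docs for the exact fields your chosen base model expects.

```python
import json

# Hypothetical chat-format training pairs -- the exact schema is an
# assumption; confirm it against the upload documentation.
pairs = [
    {"messages": [
        {"role": "user", "content": "Classify: 'invoice overdue 30 days'"},
        {"role": "assistant", "content": "accounts_receivable"},
    ]},
    {"messages": [
        {"role": "user", "content": "Classify: 'password reset request'"},
        {"role": "assistant", "content": "it_support"},
    ]},
]

# JSONL = one JSON object per line, newline-delimited
with open("train.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")

# Sanity-check: every line must parse back as valid JSON
with open("train.jsonl") as f:
    rows = [json.loads(line) for line in f]
```

The one-object-per-line shape matters: a single malformed line is the usual reason an upload is rejected, so validating the file locally before uploading saves a round trip.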
Hosted endpoint: from AUD $1.95/hour on an RTX PRO 6000 Blackwell up to AUD $4.20/hour on an H200.
Coming next
Embeddings API (Qwen-based, multilingual), batch-inference for large jobs at a discount, image generation behind the same key. Tell us what you'd actually use — we prioritise on demand, not on guesses.
How people are using it
A Newcastle dev shop adds an AI code helper
The plugin sends requests to Qwen-3 inference from inside the IDE. Cost is on the order of cents per developer per day, and no data is sent to a US cloud.
A research group fine-tunes on private clinical notes
Training data stays inside our facility; the resulting model is reachable only via the group's API key. Inference billed against a reserved RTX PRO 6000 Blackwell endpoint.
A government desk runs a citizen-services chatbot
Qwen-3 inference behind a thin router. Data residency requirement is satisfied because all traffic stays at Mayfield West. Token volume capped per session.
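One way a thin router can enforce the per-session token cap. This is a hypothetical sketch of logic the desk would run on its own side, not a feature of the hosted API; class and parameter names are illustrative.

```python
# Hypothetical per-session token budget enforced in the router,
# upstream of the inference API. Cap value is illustrative.
class SessionBudget:
    def __init__(self, cap_tokens: int = 8000):
        self.cap = cap_tokens
        self.used: dict[str, int] = {}

    def allow(self, session_id: str, requested: int) -> bool:
        """Record usage and return True only if the session stays under its cap."""
        spent = self.used.get(session_id, 0)
        if spent + requested > self.cap:
            return False
        self.used[session_id] = spent + requested
        return True

budget = SessionBudget(cap_tokens=100)
first = budget.allow("citizen-42", 60)   # True: 60 <= 100
second = budget.allow("citizen-42", 60)  # False: 120 would exceed the cap
```

Capping in the router rather than per request keeps a single chat session from consuming an unbounded share of the reserved capacity.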
A consultancy embeds a Q&A interface for clients
RAG pipeline backed by the inference API. Documents indexed once, retrieved per query, reasoned over by Qwen-3. Cost reported back to clients line-by-line.
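The retrieve-then-reason loop can be sketched in a few lines. Retrieval here is naive keyword overlap purely for illustration; a production pipeline would use a vector index (and the embeddings API, once it ships). Document contents and function names are hypothetical.

```python
# Minimal RAG sketch: index once, retrieve per query, hand the context
# to Qwen-3. Keyword-overlap scoring stands in for a real vector index.
docs = {
    "leave-policy": "Employees accrue four weeks of annual leave per year.",
    "expenses": "Claims over $500 need written manager approval.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    words = set(query.lower().split())
    scored = sorted(docs.values(),
                    key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str) -> str:
    """Assemble retrieved context plus the question into one user message."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How much annual leave do employees accrue?")
# `prompt` is then sent as the user message to the chat-completions endpoint.
```

Because documents are indexed once and only the retrieved snippets travel with each query, per-query token cost stays proportional to the answer, which is what makes line-by-line cost reporting to clients practical.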
Call the inference API
OpenAI-compatible at https://api.compute.newcastlerising.com.au/v1. Swap the base URL and the API key and most existing client libraries work unchanged.
curl https://api.compute.newcastlerising.com.au/v1/chat/completions \
  -H "Authorization: Bearer $NC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-3-coder",
    "messages": [
      {"role": "user", "content": "Write a Python function that returns prime numbers up to n."}
    ],
    "stream": false
  }'
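The same call from Python using only the standard library. The actual send is left commented out so the sketch runs without a token; the response-parsing line assumes the standard OpenAI-compatible `choices[0].message.content` shape.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.compute.newcastlerising.com.au/v1"

def chat_request(model: str, content: str) -> urllib.request.Request:
    """Build the chat-completions request; the caller sends it with urlopen()."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": False,
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('NC_TOKEN', '')}",
            "Content-Type": "application/json",
        },
    )

req = chat_request("qwen-3-coder",
                   "Write a Python function that returns prime numbers up to n.")
# resp = urllib.request.urlopen(req)  # uncomment with a valid NC_TOKEN set
# print(json.load(resp)["choices"][0]["message"]["content"])
```

With the official `openai` client library, the equivalent is setting `base_url` to the address above and `api_key` to your token; no other code changes should be needed.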
Available models:
qwen-3-instruct — general purpose, long context
qwen-3-coder — tuned for code generation and tool use
qwen-3-small — cheaper, faster, smaller
your-org/your-fine-tune — if you've trained one with us