Applications

AI services behind a clean HTTP API.

If you don't want to operate a GPU yourself, use the hosted services. We run the models on our hardware; you call an endpoint and pay for what comes back.

Available services

What you can use today

Inference

Qwen-3 inference

OpenAI-compatible chat-completions API in front of Qwen-3, hosted on H200s with B200 capacity for the larger variants. Use it as a drop-in for general-purpose chat, retrieval-augmented generation, agentic tool use, or code assistance. We keep up with the current open-weights frontier; new models roll in behind the same endpoint. A minimal client sketch follows the pricing below.

128k context · Tool use / JSON mode · Streaming · OpenAI-compatible
Pricing: AUD $0.50 / 1M input tokens · AUD $1.50 / 1M output tokens.
Use cases: code assistants, document Q&A over a private corpus, customer-service automation, internal chat tools.
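
To make the drop-in claim concrete, here is a minimal streaming sketch with the OpenAI Python client. The base URL, the qwen-3-instruct model name, and the NC_TOKEN key come from the quickstart further down; the prompt and everything else are illustrative.

import os
from openai import OpenAI

# Standard OpenAI client, pointed at our endpoint instead of api.openai.com.
client = OpenAI(
    base_url="https://api.compute.newcastlerising.com.au/v1",
    api_key=os.environ["NC_TOKEN"],
)

# Stream tokens back as they are generated.
stream = client.chat.completions.create(
    model="qwen-3-instruct",
    messages=[{"role": "user", "content": "Draft a short status update on the pilot rollout."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)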
Training

Fine-tuning service

Upload a JSONL of training pairs; we run a LoRA fine-tune on a base model of your choice and hand back a hosted endpoint. Suitable for domain-specific text generation, classification, or task-specific reasoning where the base model is close but not quite right. A sketch of the training-pair format follows the pricing below.

LoRA · QLoRA · Qwen / Llama / Mistral bases · Private data stays in NSW
Pricing: from AUD $180 per job (7–14B base, single-epoch LoRA on ≤ 100MB of training data). Larger jobs scoped per request.
Hosted endpoint: from AUD $1.95/hour on RTX PRO 6000 Blackwell up to AUD $4.20/hour on H200.
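
The exact schema for the training-pairs JSONL isn't spelled out here, so treat this as a sketch under one common convention (chat-style messages, one example per line) and confirm the expected format with us before uploading. The helper name and file name are illustrative only.

import json

# One training example per line: prompt plus desired completion in a
# chat-style record. This schema is an assumption; confirm before uploading.
def make_example(prompt: str, completion: str) -> dict:
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
    }

examples = [
    make_example("Classify the urgency of this ticket: 'Substation alarm tripped twice overnight.'", "high"),
    make_example("Classify the urgency of this ticket: 'Please update my email signature.'", "low"),
]

# Write one JSON object per line (JSONL).
with open("training_pairs.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")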
Roadmap

Coming next

Embeddings API (Qwen-based, multilingual), batch-inference for large jobs at a discount, image generation behind the same key. Tell us what you'd actually use — we prioritise on demand, not on guesses.

Use cases

How people are using it

A Newcastle dev shop adds an AI code helper

Sample request to Qwen-3 inference from inside the IDE plugin. Cost on the order of cents per developer per day; no data sent to a US cloud.

→ Code sample
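
A hedged sketch of the kind of request the plugin makes: the highlighted code goes to qwen-3-coder with a short instruction, and max_tokens keeps the per-call cost down. The function name and prompt wording are illustrative, not the shop's actual plugin code.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.compute.newcastlerising.com.au/v1",
    api_key=os.environ["NC_TOKEN"],
)

def explain_selection(selected_code: str) -> str:
    """Ask qwen-3-coder to explain the code the developer has highlighted."""
    resp = client.chat.completions.create(
        model="qwen-3-coder",
        messages=[
            {"role": "system", "content": "You are a concise code assistant."},
            {"role": "user", "content": f"Explain what this code does:\n\n{selected_code}"},
        ],
        max_tokens=400,  # keep each call to a fraction of a cent
    )
    return resp.choices[0].message.content

print(explain_selection("def evens_squared(xs): return [x * x for x in xs if x % 2 == 0]"))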

A research group fine-tunes on private clinical notes

Training data stays inside our facility; the resulting model is reachable only via the group's API key. Inference billed against a reserved RTX PRO 6000 Blackwell endpoint.

→ Discuss the engagement

A government desk runs a citizen-services chatbot

Qwen-3 inference behind a thin router. The data-residency requirement is satisfied because all traffic stays at Mayfield West; token volume is capped per session, along the lines sketched below.

→ Talk to us
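
One way a thin router can enforce the per-session cap, sketched as an in-memory budget keyed by session ID and debited from the usage field each response returns. The cap value and session handling are assumptions about a detail the desk would own, not a description of their deployment.

import os
from collections import defaultdict
from openai import OpenAI

client = OpenAI(
    base_url="https://api.compute.newcastlerising.com.au/v1",
    api_key=os.environ["NC_TOKEN"],
)

SESSION_TOKEN_CAP = 20_000          # illustrative per-session budget
tokens_used = defaultdict(int)      # session_id -> tokens consumed so far

def answer(session_id: str, question: str) -> str:
    # Refuse once the session has spent its budget.
    if tokens_used[session_id] >= SESSION_TOKEN_CAP:
        return "Session limit reached. Please start a new session."

    resp = client.chat.completions.create(
        model="qwen-3-instruct",
        messages=[{"role": "user", "content": question}],
        max_tokens=500,
    )
    # Debit what the API reports as actually consumed.
    tokens_used[session_id] += resp.usage.total_tokens
    return resp.choices[0].message.content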

A consultancy embeds a Q&A interface for clients

RAG pipeline backed by the inference API. Documents indexed once, retrieved per query, reasoned over by Qwen-3. Cost reported back to clients line-by-line.

→ Sample API call
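
A rough sketch of the per-query step, assuming the consultancy's own retrieval layer supplies the relevant passages (the embeddings API is still on the roadmap, so retrieval itself is out of scope here). The per-token cost line uses the published inference prices and the usage field returned with each response.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.compute.newcastlerising.com.au/v1",
    api_key=os.environ["NC_TOKEN"],
)

# Published inference prices, AUD per token.
INPUT_PRICE = 0.50 / 1_000_000
OUTPUT_PRICE = 1.50 / 1_000_000

def answer_with_context(question: str, passages: list[str]) -> tuple[str, float]:
    # Stuff the retrieved passages into the prompt and ask qwen-3-instruct.
    context = "\n\n".join(passages)
    resp = client.chat.completions.create(
        model="qwen-3-instruct",
        messages=[
            {"role": "system", "content": "Answer using only the supplied documents."},
            {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
        ],
    )
    # Line-item cost for this query, reportable back to the client.
    cost_aud = (resp.usage.prompt_tokens * INPUT_PRICE
                + resp.usage.completion_tokens * OUTPUT_PRICE)
    return resp.choices[0].message.content, cost_aud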

Quickstart

Call the inference API

OpenAI-compatible at https://api.compute.newcastlerising.com.au/v1. Swap in the base URL and your API key, and most existing client libraries work unchanged; a Python equivalent follows the model list below.

curl https://api.compute.newcastlerising.com.au/v1/chat/completions \
  -H "Authorization: Bearer $NC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-3-coder",
    "messages": [
      {"role": "user", "content": "Write a Python function that returns prime numbers up to n."}
    ],
    "stream": false
  }'

Available models:

  • qwen-3-instruct — general purpose, long context
  • qwen-3-coder — tuned for code generation and tool use
  • qwen-3-small — cheaper, faster, smaller
  • your-org/your-fine-tune — if you've trained one with us
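
The same request as the curl above, sketched with the OpenAI Python client; only the base URL and the API key change from a stock OpenAI setup.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.compute.newcastlerising.com.au/v1",
    api_key=os.environ["NC_TOKEN"],
)

resp = client.chat.completions.create(
    model="qwen-3-coder",
    messages=[
        {"role": "user", "content": "Write a Python function that returns prime numbers up to n."}
    ],
)

print(resp.choices[0].message.content)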