The latest open AI, running today.
Every model below is live on our hardware right now. New open-weights releases are typically online within 48 hours of public availability. All inference runs in Mayfield West — your data stays in Australia.
Chat & reasoning
OpenAI-compatible chat completions endpoints. Streaming, tool use, structured output. Swap the base URL and your existing OpenAI client works.
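A request body looks exactly like it does for OpenAI. The sketch below builds one with the standard library only; `BASE_URL` is a placeholder, not our real endpoint, and the model slug comes from the catalogue below:

```python
import json

# Placeholder endpoint -- substitute the real base URL and your API key.
BASE_URL = "https://api.example.com/v1"

def build_chat_request(model: str, user_message: str, stream: bool = False) -> dict:
    """Assemble the JSON body for POST {BASE_URL}/chat/completions."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "stream": stream,
    }

body = build_chat_request("qwen-3-72b-instruct", "Summarise this contract clause.")
print(json.dumps(body, indent=2))
```

With the official `openai` Python SDK, the same request is `OpenAI(base_url=BASE_URL, api_key=...).chat.completions.create(...)` — no other client changes.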
Qwen-3 235B Instruct
Flagship Qwen-3 mixture-of-experts. 235B parameters, 22B active per token. 128k context. Strong at reasoning, multilingual, instruction following. Hosted on H200.
qwen-3-235b-instruct · $0.60/M in · $2.20/M out
DeepSeek-V3.1
671B parameter MoE with 37B active. Very strong at code, mathematics, and long-form reasoning. 128k context. Hosted across multi-node H200.
deepseek-v3.1 · $0.55/M in · $2.00/M out
Qwen-3 72B Instruct
Dense 72B model. Lower latency, lower cost than the MoE variants. Good default for most chat and RAG workloads. Hosted on H100.
qwen-3-72b-instruct · $0.40/M in · $1.20/M out
Llama 3.3 70B Instruct
Meta's mainstream open-weights flagship. Familiar interface for teams already on Llama. Excellent at general tasks. Hosted on H100.
llama-3.3-70b-instruct · $0.40/M in · $1.20/M out
Mistral Large 2
Mistral's open-weights frontier model. 123B parameters. Strong multilingual and function-calling performance. Hosted on H200.
mistral-large-2 · $0.50/M in · $1.80/M out
Qwen-3 14B Instruct
Smaller variant. Faster, cheaper, still capable for many workloads. Hosted on RTX PRO 6000 Blackwell. Sub-100ms first-token latency typical.
qwen-3-14b-instruct · $0.18/M in · $0.55/M out
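Streaming follows the standard OpenAI server-sent-events format: each line carries a JSON chunk after `data: `, content arrives in `choices[0].delta`, and the stream ends with `data: [DONE]`. A minimal parser, run here against illustrative sample lines rather than a live response:

```python
import json

def iter_stream_tokens(sse_lines):
    """Yield content deltas from OpenAI-style server-sent-event lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            yield delta["content"]

# Illustrative sample of the wire format (not captured from our API):
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"G\'"}}]}',
    'data: {"choices":[{"delta":{"content":"day"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_tokens(sample)))  # G'day
```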
Code
Specialised for code generation, completion, and tool use. Used inside IDE plugins, code review automation, and agentic coding workflows.
Qwen-3 Coder 480B
Frontier open-weights coding model. MoE with 35B active. Excels at agentic coding, repo-scale reasoning, and long-context refactors. Hosted on H200.
qwen-3-coder-480b · $0.70/M in · $2.40/M out
Qwen-3 Coder 32B
Smaller coder, dense 32B. Lower latency, cheaper, well-suited to IDE inline completion. Hosted on RTX PRO 6000 Blackwell.
qwen-3-coder-32b · $0.25/M in · $0.80/M out
Vision & multimodal
Qwen-3 VL
Vision-language model. Document understanding, chart and table reading, screenshot QA, video frame analysis. Accepts images via URL or base64. Hosted on H100.
qwen-3-vl · $0.45/M in · $1.40/M out · $0.001 per image
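Images go in as content parts in the standard OpenAI vision format — either a public URL or inline base64 wrapped in a data URI. A sketch of the message shape (the image bytes and URL are stand-ins):

```python
import base64
import json

# Stand-in for real image bytes read from disk.
png_bytes = b"\x89PNG..."
data_uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this chart show?"},
        # Either a public URL...
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        # ...or inline base64 via a data URI:
        {"type": "image_url", "image_url": {"url": data_uri}},
    ],
}
print(json.dumps(message)[:80])
```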
Embeddings
Qwen-3 Embed (large)
8192-dimension embeddings. Strong on retrieval, semantic search, clustering. Multilingual with explicit Australian English handling.
qwen-3-embed-large · $0.04/M tokens
Qwen-3 Embed (base)
1024-dimension embeddings. Cheaper, faster, good default for most RAG workloads.
qwen-3-embed-base · $0.015/M tokens
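Once you have vectors back from either embed endpoint, retrieval is usually cosine similarity against your document store. A toy sketch with 4-dimensional vectors standing in for the real 1024/8192-dimension embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors; in practice these come from the embeddings endpoint.
query = [0.1, 0.3, 0.5, 0.1]
docs = {
    "doc_a": [0.1, 0.29, 0.52, 0.1],  # near-duplicate of the query
    "doc_b": [0.9, 0.05, 0.0, 0.05],  # unrelated
}
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```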
What's next on the rack
We add models on customer demand and on technical merit. If you need one that isn't here, ask — most adds take days, not weeks.
Qwen-4
Expected next major Qwen release. We'll deploy the instruct and coder variants within 48 hours of the weights being published.
Llama 4
Meta's next open-weights flagship. Same-day deployment when it lands.
DeepSeek Coder V3
Currently being benchmarked on H200. Expected live within the next 7 days.
Why "deployed within 48 hours" is the actual product
Most clouds take weeks to add a new open-weights model. AWS Bedrock waits for partnership negotiations. Azure runs at the speed of enterprise contracts. Even other neoclouds are infrastructure-first — they leave the model layer to you.
We do it the other way around. The infrastructure exists to make the software fast. When a new model ships, the only friction is benchmarking it, getting an inference engine to serve it well, and turning the endpoint on. That's hours, not months. By the time AWS Bedrock picks up whatever shipped this week, you'll have been running it on us for two months.
Combined with the AU jurisdiction story — your data stays in Mayfield West — that's a position no hyperscaler can copy.
Your own fine-tune, same API
Upload a JSONL of training pairs; we run a LoRA fine-tune on any of the base models above and return a hosted endpoint behind your API key. Same OpenAI-compatible interface, same per-token billing, same residency commitments.
Training data stays in Mayfield West. We do not retain it after the run completes unless you ask us to. The resulting model is reachable only by your organisation's key.
From request to endpoint
- Submit JSONL via API
- Job runs (typically 30 min – 6 hrs)
- Hosted endpoint at your-org/your-model
- From AUD $180/job + endpoint hours
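The upload is one JSON object per line. The page doesn't pin down the exact schema, so the sketch below assumes the common chat-messages training-pair layout — treat the field names as an assumption:

```python
import json

# Assumed chat-style training-pair schema (hypothetical field layout).
pairs = [
    {"messages": [
        {"role": "user", "content": "What's the GST rate?"},
        {"role": "assistant", "content": "10% on most goods and services in Australia."},
    ]},
]

# JSONL: one JSON object per line, newline-terminated.
lines = [json.dumps(p, ensure_ascii=False) for p in pairs]
with open("train.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
print(len(lines))
```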