The latest open AI, running today.
Every model below is live on our hardware right now. New open-weights releases are typically online within 48 hours of public availability. All inference runs in Mayfield West — your data stays in Australia.
Chat & reasoning
OpenAI-compatible chat completions endpoints. Streaming, tool use, structured output. Swap the base URL and your existing OpenAI client works.
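A request body looks exactly like it does for OpenAI. The sketch below builds one with the standard library only; `BASE_URL` is a placeholder, not our real endpoint, and the model slug comes from the catalogue below:

```python
import json

# Placeholder endpoint -- substitute the real base URL and your API key.
BASE_URL = "https://api.example.com/v1"

def build_chat_request(model: str, user_message: str, stream: bool = False) -> dict:
    """Assemble the JSON body for POST {BASE_URL}/chat/completions."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "stream": stream,
    }

body = build_chat_request("qwen-3-72b-instruct", "Summarise this contract clause.")
print(json.dumps(body, indent=2))
```

With the official `openai` Python SDK, the same request is `OpenAI(base_url=BASE_URL, api_key=...).chat.completions.create(...)` — no other client changes.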
Qwen-3 235B Instruct
Flagship Qwen-3 mixture-of-experts. 235B parameters, 22B active per token. 128k context. Strong at reasoning, multilingual, instruction following. Hosted on H200.
qwen-3-235b-instruct · $0.60/M in · $2.20/M out
DeepSeek-V3.1
671B parameter MoE with 37B active. Very strong at code, mathematics, and long-form reasoning. 128k context. Hosted across multi-node H200.
deepseek-v3.1 · $0.55/M in · $2.00/M out
Qwen-3 72B Instruct
Dense 72B model. Lower latency, lower cost than the MoE variants. Good default for most chat and RAG workloads. Hosted on H100.
qwen-3-72b-instruct · $0.40/M in · $1.20/M out
Llama 3.3 70B Instruct
Meta's mainstream open-weights flagship. Familiar interface for teams already on Llama. Excellent at general tasks. Hosted on H100.
llama-3.3-70b-instruct · $0.40/M in · $1.20/M out
Mistral Large 2
Mistral's open-weights frontier model. 123B parameters. Strong multilingual and function-calling performance. Hosted on H200.
mistral-large-2 · $0.50/M in · $1.80/M out
Qwen-3 14B Instruct
Smaller variant. Faster, cheaper, still capable for many workloads. Hosted on RTX PRO 6000 Blackwell. Sub-100ms first-token latency typical.
qwen-3-14b-instruct · $0.18/M in · $0.55/M out
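Streaming follows the standard OpenAI server-sent-events format: each line carries a JSON chunk after `data: `, content arrives in `choices[0].delta`, and the stream ends with `data: [DONE]`. A minimal parser, run here against illustrative sample lines rather than a live response:

```python
import json

def iter_stream_tokens(sse_lines):
    """Yield content deltas from OpenAI-style server-sent-event lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            yield delta["content"]

# Illustrative sample of the wire format (not captured from our API):
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"G\'"}}]}',
    'data: {"choices":[{"delta":{"content":"day"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_tokens(sample)))  # G'day
```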
Code
Specialised for code generation, completion, and tool use. Used inside IDE plugins, code review automation, and agentic coding workflows.
Qwen-3 Coder 480B
Frontier open-weights coding model. MoE with 35B active. Excels at agentic coding, repo-scale reasoning, and long-context refactors. Hosted on H200.
qwen-3-coder-480b · $0.70/M in · $2.40/M out
Qwen-3 Coder 32B
Smaller coder, dense 32B. Lower latency, cheaper, well-suited to IDE inline completion. Hosted on RTX PRO 6000 Blackwell.
qwen-3-coder-32b · $0.25/M in · $0.80/M out
Vision & multimodal
Qwen-3 VL
Vision-language model. Document understanding, chart and table reading, screenshot QA, video frame analysis. Accepts images via URL or base64. Hosted on H100.
qwen-3-vl · $0.45/M in · $1.40/M out · $0.001 per image
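Images go in as content parts in the standard OpenAI vision format — either a public URL or inline base64 wrapped in a data URI. A sketch of the message shape (the image bytes and URL are stand-ins):

```python
import base64
import json

# Stand-in for real image bytes read from disk.
png_bytes = b"\x89PNG..."
data_uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this chart show?"},
        # Either a public URL...
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        # ...or inline base64 via a data URI:
        {"type": "image_url", "image_url": {"url": data_uri}},
    ],
}
print(json.dumps(message)[:80])
```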
Embeddings
Qwen-3 Embed (large)
8192-dimension embeddings. Strong on retrieval, semantic search, clustering. Multilingual with explicit Australian English handling.
qwen-3-embed-large · $0.04/M tokens
Qwen-3 Embed (base)
1024-dimension embeddings. Cheaper, faster, good default for most RAG workloads.
qwen-3-embed-base · $0.015/M tokens
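Once you have vectors back from either embed endpoint, retrieval is usually cosine similarity against your document store. A toy sketch with 4-dimensional vectors standing in for the real 1024/8192-dimension embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors; in practice these come from the embeddings endpoint.
query = [0.1, 0.3, 0.5, 0.1]
docs = {
    "doc_a": [0.1, 0.29, 0.52, 0.1],  # near-duplicate of the query
    "doc_b": [0.9, 0.05, 0.0, 0.05],  # unrelated
}
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```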
What's next on the rack
We add models on customer demand and on technical merit. If you need one that isn't here, ask — most adds take days, not weeks.
Qwen-4
Expected next major Qwen release. We'll deploy the instruct and coder variants within 48 hours of the weights being published.
Llama 4
Meta's next open-weights flagship. Same-day deployment when it lands.
DeepSeek Coder V3
Currently being benchmarked on H200. Expected live within the next 7 days.
Why "deployed within 48 hours" is the actual product
Most clouds take weeks to add a new open-weights model. AWS Bedrock waits for partnership negotiations. Azure runs at the speed of enterprise contracts. Even other neoclouds are infrastructure-first — they leave the model layer to you.
We do it the other way around. The infrastructure exists to make the software fast. When a new model ships, the only friction is benchmarking it, getting an inference engine to serve it well, and turning the endpoint on. That's hours, not months. By the time AWS Bedrock picks up whatever shipped this week, you'll have been running it on us for two months.
Combined with the AU jurisdiction story — your data stays in Mayfield West — that's a position no hyperscaler can copy.
Your own fine-tune, same API
Upload a JSONL of training pairs; we run a LoRA fine-tune on any of the base models above and return a hosted endpoint behind your API key. Same OpenAI-compatible interface, same per-token billing, same residency commitments.
Training data stays in Mayfield West. We do not retain it after the run completes unless you ask us to. The resulting model is reachable only by your organisation's key.
From request to endpoint
- Submit JSONL via API
- Job runs (typically 30 min – 6 hrs)
- Hosted endpoint at your-org/your-model
- From AUD $180/job + endpoint hours
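The upload is one JSON object per line. The page doesn't pin down the exact schema, so the sketch below assumes the common chat-messages training-pair layout — treat the field names as an assumption:

```python
import json

# Assumed chat-style training-pair schema (hypothetical field layout).
pairs = [
    {"messages": [
        {"role": "user", "content": "What's the GST rate?"},
        {"role": "assistant", "content": "10% on most goods and services in Australia."},
    ]},
]

# JSONL: one JSON object per line, newline-terminated.
lines = [json.dumps(p, ensure_ascii=False) for p in pairs]
with open("train.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
print(len(lines))
```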