Wizard · Free
Pick the right LLM tier for your workload — frontier API, mid-tier, or self-hosted — based on data sensitivity, task complexity, volume, latency, fine-tuning needs, and operational appetite. Output: a recommendation with model candidates and explicit tradeoffs.
How it works
We don’t gate the tool behind a form. Take the assessment; share your email at the end if you want a written report.
Six questions cover data sensitivity, task complexity, workload volume, latency requirements, fine-tuning needs, and operational appetite. Each answer option weights the three tiers.
The result is a three-tier breakdown: frontier API / mid-tier / self-hosted. The headline recommendation is the highest-scoring tier; the breakdown shows how close the alternatives are.
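A minimal sketch of that scoring shape in Python, assuming additive per-answer weights (the question keys, answer options, and weight values here are illustrative, not the wizard's real calibration):

```python
from collections import Counter

TIERS = ("frontier_api", "mid_tier", "self_hosted")

# Each (question, answer) pair carries a weight per tier; e.g. an answer of
# "data must never leave our infrastructure" pushes hard toward self-hosted.
WEIGHTS = {
    ("data_sensitivity", "sovereign"):    {"self_hosted": 3},
    ("data_sensitivity", "standard"):     {"frontier_api": 2, "mid_tier": 2},
    ("task_complexity", "frontier_hard"): {"frontier_api": 3, "self_hosted": 1},
    ("task_complexity", "routine"):       {"mid_tier": 3, "self_hosted": 1},
    # ...the remaining four questions follow the same shape
}

def score(answers: dict[str, str]) -> tuple[str, Counter]:
    totals = Counter({tier: 0 for tier in TIERS})
    for question, option in answers.items():
        totals.update(WEIGHTS[(question, option)])
    headline = max(totals, key=totals.get)  # highest-scoring tier wins
    return headline, totals                 # totals show how close the others are

headline, breakdown = score({"data_sensitivity": "sovereign",
                             "task_complexity": "routine"})
print(headline, dict(breakdown))
# -> self_hosted {'frontier_api': 0, 'mid_tier': 3, 'self_hosted': 4}
```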
Each tier has specific model candidates we'd evaluate in real engagements (Claude / GPT / Gemini frontier, Haiku / mini / Flash mid-tier, Llama / Mistral self-hosted). The tradeoffs flag what each tier doesn't solve.
Real model selection lands after running candidates against your actual workload on a curated eval dataset. The wizard gives you the directional starting point; we'll do the calibrated evaluation.
Common questions
What happens when the models change?
Models change every quarter. Tier-level recommendations survive model upgrades; specific model recommendations don't. Within each tier we name candidates we'd evaluate — but the choice between Claude Haiku and GPT mini lands after running them against your real workload, not from a wizard.
What if more than one tier fits?
Common — and usually the right answer is model tiering: use mid-tier for routine work and frontier for the hard cases, with explicit routing logic. Most production AI deployments we ship use two or three tiers in tandem; the wizard's strongest signal is the tier you should anchor on.
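A stripped-down version of that routing pattern, assuming a cheap-first, escalate-on-doubt policy (the function names, stub bodies, and confidence threshold are placeholders, not a real SDK):

```python
CONFIDENCE_FLOOR = 0.7  # tune against your eval set, not by gut feel

def call_mid_tier(prompt: str) -> tuple[str, float]:
    """Placeholder for the mid-tier model: returns (answer, confidence 0..1)."""
    return "mid-tier answer", 0.9  # swap in a real API call

def call_frontier(prompt: str) -> str:
    """Placeholder for the frontier model."""
    return "frontier answer"  # swap in a real API call

def route(prompt: str, known_hard: bool = False) -> str:
    """Send routine work to the cheap tier; escalate hard or uncertain cases."""
    if known_hard:  # e.g. flagged upstream by a classifier or business rules
        return call_frontier(prompt)
    answer, confidence = call_mid_tier(prompt)
    if confidence < CONFIDENCE_FLOOR:  # escalate uncertain answers
        return call_frontier(prompt)
    return answer
```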
Is self-hosting always cheaper at scale?
No. Self-hosted economics start beating frontier-API pricing around 100M tokens/month and mid-tier API pricing around 1B tokens/month — assuming you have an existing GPU platform team. If you don't, the operational cost (engineering, monitoring, on-call) typically erases the per-token savings until volume gets very high.
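The arithmetic behind those thresholds is a fixed-vs-marginal comparison. A sketch with placeholder prices (the thresholds in the answer come from engagement data, not from these numbers):

```python
def break_even_tokens(fixed_usd_per_month: float,
                      api_usd_per_m_tokens: float,
                      self_hosted_usd_per_m_tokens: float) -> float:
    """Monthly token volume at which self-hosting (fixed cost plus a small
    marginal per-token cost) matches a purely usage-priced API tier."""
    saving_per_m = api_usd_per_m_tokens - self_hosted_usd_per_m_tokens
    return fixed_usd_per_month / saving_per_m * 1e6

# Placeholder inputs: a small GPU reservation at $1,500/mo (the platform team
# already exists, per the answer above), frontier at $15 per 1M tokens,
# self-hosted marginal cost at $0.50 per 1M tokens.
tokens = break_even_tokens(1_500, 15.0, 0.50)
print(f"break-even ~ {tokens / 1e6:,.0f}M tokens/month")  # ~ 103M
```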
Do we need fine-tuning?
Most enterprise AI doesn't benefit from fine-tuning. Frontier API + strong RAG covers ~85% of enterprise workloads. Fine-tuning is the right answer for narrow tasks with a consistent input/output shape, latency-sensitive deployments where a small fine-tuned model beats frontier latency, or sovereignty cases where self-hosted is mandatory and fine-tuning compensates for smaller base capability.
What about HIPAA and data sovereignty?
The BAA/DPA chain matters. AWS Bedrock under HIPAA-eligible accounts and Azure OpenAI under a BAA are production options for ePHI workloads. For data-sovereignty mandates (regulated jurisdictions, classified workloads, air-gapped environments), self-hosted is typically the only path. The wizard accounts for these constraints; bring your specific requirements to a discovery call for verification.
What does the calibrated evaluation involve?
Curated eval datasets (built with your domain experts), faithfulness and refusal scoring, latency and cost-per-task tracking, and equity-aware subgroup evaluation where applicable. We run candidates head-to-head on your real workload before recommending a final choice — not against generic benchmarks. Most engagements end with a model decision that surprised at least one stakeholder.
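In code terms, the head-to-head has roughly this shape (a sketch, not our production harness; `candidates` maps each model name to a callable and `faithfulness` stands in for the real judge):

```python
import time

def run_eval(candidates: dict, dataset: list[dict], faithfulness) -> dict:
    """Compare candidate models head-to-head on a curated dataset.
    Each dataset item is {"input": ..., "reference": ...}; each candidate
    is a callable returning (answer_text, cost_in_usd)."""
    results = {}
    for name, model in candidates.items():
        scores, latencies, costs = [], [], []
        for item in dataset:
            start = time.perf_counter()
            answer, usd = model(item["input"])
            latencies.append(time.perf_counter() - start)
            costs.append(usd)
            scores.append(faithfulness(answer, item["reference"]))
        n = len(dataset)
        results[name] = {
            "mean_faithfulness": sum(scores) / n,
            "p50_latency_s": sorted(latencies)[n // 2],
            "cost_per_task_usd": sum(costs) / n,
        }
    return results
```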
More tools
15-question scored report across data, infrastructure, talent, governance, and use cases.
Project the financial return of a proposed AI use case — labor savings, error reduction, payback period.
Score a legacy system for replatform vs rearchitect vs replace, with phased rollout recommendations.
Talk to us
Bring your scored report to a 30-minute call. Senior engineer plus department lead. No discovery gauntlet, no junior reps.