Enterprise Model Market Map — Hosted Platforms vs Open-Weight

A pragmatic lens on procurement, security, latency, cost, and governance tradeoffs. What changes matter right now, which platforms are production-ready, and when to self-host vs use hosted APIs.

Updated
January 2025 — reflects latest model releases and pricing
Coverage
OpenAI, Anthropic, Google, Meta, Mistral, Cohere + deployment tradeoffs
Use this for
Vendor selection, cost modeling, compliance planning, PoC scoping

Platform Overview

Major platforms as of January 2025, with models, strengths, and procurement readiness.

OpenAI
Hosted API
Enterprise-Ready
Key Models
GPT-4o, GPT-4o-mini, o1, o1-mini
Strengths
  • Market-leading reasoning and coding (o1)
  • Mature API, extensive tooling ecosystem
  • Azure OpenAI for enterprise compliance (BAA, FedRAMP)
Considerations
  • Data retention policies differ (check Azure OpenAI vs direct API terms)
  • Rate limits and quota management for production
  • Cost optimization via prompt caching and batch API
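The cost levers above (prompt caching, batch API) can be folded into a simple spend estimate. A minimal sketch follows; the per-million-token prices and discount multipliers are placeholder assumptions, not current OpenAI rates, so substitute figures from the vendor's pricing page.

```python
# Illustrative hosted-API cost estimator. All prices and discounts below
# are placeholder assumptions -- check the vendor's current pricing page.
def monthly_cost(requests_per_day, in_tokens, out_tokens,
                 price_in_per_m=2.50, price_out_per_m=10.00,
                 batch_discount=1.0, cached_input_fraction=0.0,
                 cache_discount=0.5):
    """Estimate monthly spend for a hosted-API workload.

    batch_discount: multiplier when using a batch endpoint (e.g. 0.5 = 50% off).
    cached_input_fraction: share of input tokens served from the prompt cache.
    cache_discount: multiplier applied to cached input tokens.
    """
    days = 30
    total_in = requests_per_day * in_tokens * days
    total_out = requests_per_day * out_tokens * days
    # Cached input tokens are billed at the discounted rate.
    in_cost = (total_in * (1 - cached_input_fraction) * price_in_per_m
               + total_in * cached_input_fraction * price_in_per_m * cache_discount
               ) / 1_000_000
    out_cost = total_out * price_out_per_m / 1_000_000
    return (in_cost + out_cost) * batch_discount

# Example: 10k requests/day, 2k input + 500 output tokens each, no discounts
print(monthly_cost(10_000, 2_000, 500))  # 3000.0 (USD/month at these rates)
```

Rerunning with `cached_input_fraction=0.8` or `batch_discount=0.5` shows how quickly those two levers dominate at volume, which is why they belong in any procurement cost model.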
Anthropic
Hosted API
Enterprise-Ready
Key Models
Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
Strengths
  • 200K context window for long-document analysis
  • Strong refusal training, excellent for safety-critical apps
  • AWS Bedrock integration for compliance
Considerations
  • Token limits on output (4K-8K max per response)
  • Pricing: Opus expensive for high-volume workloads
  • AWS Bedrock required for FedRAMP/HIPAA
Google (Gemini)
Hosted API
Enterprise-Ready
Key Models
Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 2.0
Strengths
  • Multimodal (text, image, video, audio) in single API
  • 1M+ token context window for massive documents
  • Vertex AI for GCP-native enterprise deployments
Considerations
  • Gemini 2.0 is experimental—verify production readiness before committing
  • GCP Vertex AI pricing differs from public API
  • Data residency and compliance via Vertex AI
Meta (Llama)
Open-Weight
Self-Serve
Key Models
Llama 3.3 (70B), Llama 3.1 (8B, 70B, 405B), Llama Guard
Strengths
  • Run on-prem or in your VPC (full data control)
  • No per-token API costs—fixed infrastructure cost
  • Llama Guard for content moderation, safety filtering
Considerations
  • Requires ML infrastructure (GPU clusters, inference servers)
  • Model fine-tuning expertise for domain-specific tasks
  • Latency and throughput tuning vs hosted APIs
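The infrastructure requirement above is worth quantifying before a build-vs-buy decision. A back-of-envelope capacity sketch follows; the per-GPU throughput and utilization figures are illustrative assumptions, not benchmarks—measure your own tokens/sec with your serving stack before committing hardware.

```python
import math

# Back-of-envelope GPU capacity check for self-hosted inference.
# tokens_per_sec_per_gpu and utilization are assumptions -- actual
# throughput varies widely by model size, hardware, and serving stack.
def gpus_needed(requests_per_day, avg_output_tokens,
                tokens_per_sec_per_gpu=1_500, utilization=0.6):
    """GPUs required to sustain a daily request volume.

    tokens_per_sec_per_gpu: aggregate decode throughput of one GPU under
    continuous batching (assumed; benchmark your own deployment).
    utilization: headroom reserved for traffic spikes and latency targets.
    """
    tokens_per_day = requests_per_day * avg_output_tokens
    effective_tps = tokens_per_sec_per_gpu * utilization
    seconds_per_day = 86_400
    return math.ceil(tokens_per_day / (effective_tps * seconds_per_day))

# Example: 1M requests/day averaging 500 output tokens each
print(gpus_needed(1_000_000, 500))  # 7 GPUs under these assumptions
```

This kind of estimate also feeds the cost-structure comparison later in this document: multiply the GPU count by your cloud or capex rate to get the fixed-cost side of the hosted-vs-self-hosted tradeoff.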
Mistral AI
Hybrid
Mixed
Key Models
Mistral Large, Mistral 7B, Mixtral 8x7B
Strengths
  • Hosted API + open-weight models (flexibility)
  • Strong multilingual support (French, Spanish, etc.)
  • Azure integration for enterprise compliance
Considerations
  • Smaller context windows (32K-128K)
  • Less mature ecosystem vs OpenAI/Anthropic
  • Open-weight models require self-hosting expertise
Cohere
Hosted API
Enterprise-Ready
Key Models
Command R+, Command R, Embed v3
Strengths
  • Enterprise RAG focus (retrieval, reranking, embeddings)
  • Multi-turn conversational agents with citations
  • Deployment flexibility (API, AWS, GCP, Azure, on-prem)
Considerations
  • Best for RAG use cases—less general-purpose
  • Pricing model differs from token-based APIs
  • Check model availability in your preferred cloud
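The retrieve-then-rerank pattern behind enterprise RAG stacks reduces to scoring candidate documents against a query embedding. A toy sketch follows; in production the vectors come from an embedding model (e.g. Embed v3) and reranking uses a dedicated endpoint, whereas the hand-written two-dimensional vectors here are purely illustrative.

```python
import math

# Toy sketch of the embed-and-rerank pattern used in RAG pipelines.
# Vectors are hand-written stand-ins for real embedding-model output.
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rerank(query_vec, docs):
    """Return docs sorted by similarity to the query, best match first."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)

query = [1.0, 0.0]
docs = [
    {"id": "a", "vec": [0.0, 1.0]},   # orthogonal to query -> similarity 0
    {"id": "b", "vec": [1.0, 0.1]},   # near-parallel -> similarity ~0.995
]
print([d["id"] for d in rerank(query, docs)])  # ['b', 'a']
```

Citation-grounded answers then come from feeding only the top-ranked passages into the generation step, which is the workflow the platform's conversational-agent features wrap.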

Tradeoff Matrix: Hosted API vs Open-Weight

Key decision dimensions for selecting deployment approach. No universal answer—tradeoffs depend on your constraints.

Data Control
Hosted API
Data sent to vendor (check DPA, BAA, retention policies)
Open-Weight
Full control—run in your VPC or on-prem
Recommendation
Use open-weight for PII/PHI; hosted APIs for non-sensitive use cases
Cost Structure
Hosted API
Per-token pricing (scales with usage, unpredictable at high volume)
Open-Weight
Fixed infra cost (GPU clusters, ML engineers, but predictable)
Recommendation
Hosted APIs for MVP/low-volume; open-weight at scale (1M+ requests/day)
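The volume threshold in the recommendation above falls out of a simple break-even calculation. The sketch below uses illustrative dollar figures (a hypothetical $0.01/request hosted cost against $20k/month of GPUs plus staffing); plug in your own numbers from vendor pricing and infrastructure quotes.

```python
# Break-even sketch: per-request hosted pricing vs fixed self-hosted infra.
# All dollar figures are illustrative assumptions, not quotes.
def breakeven_requests_per_day(hosted_cost_per_request, monthly_infra_cost,
                               days_per_month=30):
    """Daily volume above which fixed infra is cheaper than per-request pricing."""
    return monthly_infra_cost / (hosted_cost_per_request * days_per_month)

# Assume $0.01/request hosted vs $20k/month for GPUs + on-call staffing.
print(round(breakeven_requests_per_day(0.01, 20_000)))  # ~66,667 requests/day
```

Note that the fixed-cost side should include ML engineering headcount and on-call burden, not just hardware; underestimating that is the most common way self-hosting "savings" evaporate.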
Latency
Hosted API
Network RTT + queue time (100-500ms typical)
Open-Weight
Controlled by your infra (can optimize to <50ms p95)
Recommendation
Hosted APIs for async workflows; open-weight for real-time UX
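Latency SLOs like the p95 targets above are verified from per-request timing samples. A minimal nearest-rank percentile sketch follows; the sample values are synthetic milliseconds, and real measurements should come from production request logs.

```python
import math

# Nearest-rank p95 over latency samples (synthetic milliseconds here;
# use real per-request timings from your serving logs in practice).
def p95(samples):
    """95th percentile by the nearest-rank method: ceil(0.95*n)-th value."""
    ordered = sorted(samples)
    idx = math.ceil(0.95 * len(ordered)) - 1  # convert 1-indexed rank to 0-indexed
    return ordered[idx]

samples = [120, 140, 95, 300, 110, 105, 98, 450, 130, 125,
           101, 99, 118, 122, 133, 145, 160, 108, 111, 102]
print(p95(samples))  # 300 -- the 19th of 20 ordered values
```

The long tail (here the 300ms and 450ms outliers) is exactly what distinguishes hosted-API latency profiles, where queueing adds variance, from tightly controlled self-hosted serving.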
Compliance
Hosted API
Vendor compliance (SOC 2, FedRAMP, HIPAA via cloud integrations)
Open-Weight
Your compliance posture—vendor only provides model weights
Recommendation
Use cloud-integrated APIs (Azure OpenAI, AWS Bedrock) for regulated industries
Model Updates
Hosted API
Automatic updates (breaking changes possible, version pinning recommended)
Open-Weight
You control versions (manual updates, but stable)
Recommendation
Pin versions in production; test updates in staging first
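Version pinning in practice means referencing dated model snapshots rather than floating aliases. A minimal configuration sketch follows; the snapshot IDs shown are examples current as of early 2025, so verify the exact IDs in each vendor's documentation before use.

```python
# Pin dated model snapshots in production so vendor-side updates cannot
# silently change behavior; let staging track the floating alias to
# surface upcoming changes early. IDs are early-2025 examples -- verify
# current snapshot names in the vendor's docs.
MODELS = {
    "prod":    "gpt-4o-2024-08-06",  # pinned, dated snapshot
    "staging": "gpt-4o",             # floating alias, picks up vendor updates
}

def model_for(env):
    """Resolve the model ID for a deployment environment."""
    return MODELS[env]

print(model_for("prod"))  # gpt-4o-2024-08-06
```

Promoting a new snapshot then becomes an explicit config change that can be diffed, reviewed, and rolled back, rather than an invisible vendor-side event.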
Want a vendor selection tailored to your use case?

We deliver model selection guidance customized to your requirements: cost envelope, latency targets, data residency, compliance needs, and existing cloud contracts. Output includes vendor shortlist, cost modeling, PoC plan, and procurement-ready RFP criteria.