AWS Bedrock vs OpenAI: A Practical Comparison

"Which LLM API should I use?" is the most common question in AI projects. The answer is rarely straightforward, because AWS Bedrock and OpenAI solve different problems. Bedrock is not an OpenAI competitor but rather a managed service that bundles foundation models from various providers under one roof. OpenAI offers proprietary top-tier models with one of the best developer experiences on the market.
This article compares both platforms based on concrete criteria: API integration with code examples, cost per token, data privacy and GDPR, RAG capabilities, and latency. At the end, you will find a decision framework to help you make the right choice for your project. If you are already running RAG systems in production, this comparison will help with platform selection for the next iteration.
What Is AWS Bedrock?
Amazon Bedrock is a fully managed service that provides foundation models through a unified API. Unlike OpenAI, AWS does not develop its own LLMs (with the exception of the Amazon Nova and Titan families) but hosts models from third-party providers.
Available Models (As of March 2026)
| Provider | Models | Strength |
|---|---|---|
| Anthropic | Claude Sonnet 4.6, Opus 4.6, Haiku 4.5 | Reasoning, coding, longer contexts |
| Meta | Llama 4 Scout, Llama 4 Maverick, Llama 3.3 70B | Open source, cost-efficient |
| Mistral | Mistral Large 3, Pixtral Large, Magistral Small | European provider, multilingual |
| Amazon | Nova Pro, Nova Lite, Nova Micro | Extremely affordable, AWS-native |
| DeepSeek | DeepSeek-R1, DeepSeek V3.2 | Reasoning, open source |
| Cohere | Command R+, Embed v4, Rerank 3.5 | RAG-optimized, embeddings |
Core Features
Beyond pure model invocation, Bedrock offers four key features:
- Knowledge Bases: Managed RAG without custom infrastructure (S3, OpenSearch, Aurora, Neptune)
- Agents: Orchestration of multi-step workflows with tool use
- Guardrails: Content filters, PII detection, topic restrictions
- Model Evaluation: Automated quality assessment of different models
Comparison Matrix: The Key Dimensions
Before diving into the details, here is an overview:
| Criterion | AWS Bedrock | OpenAI API |
|---|---|---|
| Model Selection | 100+ models from 10+ providers | Proprietary models (GPT, o-Series, DALL-E, Whisper) |
| Top Model | Claude Opus 4.6, Claude Sonnet 4.6 (via Anthropic) | GPT-4o, o1, o3-mini |
| Cheapest Model | Nova Micro ($0.035/1M Input) | GPT-4o mini ($0.15/1M Input) |
| Data Residency | eu-central-1 Frankfurt, 7+ EU regions | EU (EEA), no specific country selectable |
| Authentication | IAM Roles, no API keys needed | API key per project |
| VPC Isolation | PrivateLink, traffic never leaves AWS | Public internet |
| RAG (Managed) | Knowledge Bases (S3, OpenSearch, Neptune GraphRAG) | Assistants API with File Search |
| Fine-Tuning | Supported (model-dependent) | Supported (GPT-4o, GPT-4o mini) |
| Compliance | SOC 1/2/3, ISO 27001, HIPAA, FedRAMP | SOC 2, ISO 27001, HIPAA (BAA) |
| Developer Experience | AWS SDK (boto3), steeper learning curve | OpenAI SDK, excellent DX |
Practical Comparison: API Integration
The best comparison is code. Here is the same task with both APIs: a simple chat completion with a system prompt.
Bedrock with boto3 (Converse API)
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="eu-central-1")

response = client.converse(
    modelId="eu.anthropic.claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": [{"text": "What is Retrieval-Augmented Generation?"}]
        }
    ],
    system=[{"text": "You are a helpful AI assistant."}],
    inferenceConfig={
        "temperature": 0.5,
        "maxTokens": 512
    }
)

text = response["output"]["message"]["content"][0]["text"]
usage = response["usage"]
print(f"Input: {usage['inputTokens']}, Output: {usage['outputTokens']}")
print(text)
```

OpenAI SDK
```python
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "What is Retrieval-Augmented Generation?"}
    ],
    temperature=0.5,
    max_tokens=512
)

text = response.choices[0].message.content
usage = response.usage
print(f"Input: {usage.prompt_tokens}, Output: {usage.completion_tokens}")
print(text)
```

What Stands Out?
Authentication: Bedrock uses IAM. If your code runs on EC2, ECS, or Lambda, you do not need an API key. The instance role handles authentication automatically. OpenAI always requires an API key that must be securely stored and rotated.
Message Format: Bedrock's Converse API wraps text in a content array with type objects ({"text": "..."}). This is more verbose but natively supports multimodal inputs (text + image in the same request). OpenAI's format is more compact for pure text requests.
Model Switching: With Bedrock, you only change the modelId to switch from Claude to Llama or Mistral. The Converse API abstracts model-specific differences. With OpenAI, you are bound to OpenAI models.
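In practice, the switch can be reduced to a single parameter. Here is a minimal sketch (the helper name is my own, and the Llama model ID is illustrative) showing that the same request structure works for any model behind the Converse API:

```python
def build_converse_request(model_id: str, system_prompt: str, user_text: str,
                           temperature: float = 0.5, max_tokens: int = 512) -> dict:
    """Build keyword arguments for bedrock-runtime's converse().

    The structure is identical for every model behind the Converse API;
    only model_id changes when you switch providers.
    """
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": user_text}]}
        ],
        "system": [{"text": system_prompt}],
        "inferenceConfig": {"temperature": temperature, "maxTokens": max_tokens},
    }

# Switching from Claude to Llama is a one-line change:
claude_req = build_converse_request("eu.anthropic.claude-sonnet-4-6",
                                    "You are a helpful AI assistant.",
                                    "What is RAG?")
llama_req = build_converse_request("meta.llama3-3-70b-instruct-v1:0",  # illustrative ID
                                   "You are a helpful AI assistant.",
                                   "What is RAG?")
```

Either dict can then be passed as `client.converse(**request)`; nothing else in the call site changes.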
Streaming Comparison
Both APIs support streaming for real-time output:
Bedrock:
```python
response = client.converse_stream(
    modelId="eu.anthropic.claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": [{"text": "Explain RAG in 3 sentences."}]}
    ],
    inferenceConfig={"temperature": 0.5, "maxTokens": 256}
)

for event in response.get("stream", []):
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="")
```

OpenAI:
```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain RAG in 3 sentences."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

OpenAI's streaming API is slightly more elegant. Bedrock's event-based format requires more parsing logic but provides structured metadata (token usage) within the stream.
Costs in Detail
The price differences are significant and depend heavily on the chosen model and usage pattern.
On-Demand Prices per 1M Tokens (March 2026)
| Model | Input / 1M Tokens | Output / 1M Tokens | Platform |
|---|---|---|---|
| Amazon Nova Micro | $0.035 | $0.14 | Bedrock |
| Amazon Nova Lite | $0.06 | $0.24 | Bedrock |
| GPT-4o mini | $0.15 | $0.60 | OpenAI |
| Amazon Nova Pro | $0.80 | $3.20 | Bedrock |
| Claude 3.5 Haiku | $0.80 | $4.00 | Bedrock |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Bedrock |
| GPT-4o | $2.50 | $10.00 | OpenAI |
| o3-mini | $1.10 | $4.40 | OpenAI |
| o1 | $15.00 | $60.00 | OpenAI |
| Claude Opus 4.6 | $15.00 | $75.00 | Bedrock |
Cost Example: 1 Million Requests per Month
Assumptions: an average of 500 input tokens and 300 output tokens per request.
| Model | Monthly Cost |
|---|---|
| Amazon Nova Micro | $59.50 |
| GPT-4o mini | $255.00 |
| Amazon Nova Pro | $1,360.00 |
| Claude Sonnet 4.6 | $6,000.00 |
| GPT-4o | $4,250.00 |
Nova Micro is 70x cheaper than GPT-4o. The quality is not comparable, of course, but for simple classification, summarization, or routing decisions, a small model is often sufficient.
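These figures follow from simple arithmetic. A small sketch you can adapt to your own traffic assumptions:

```python
def monthly_cost(requests: int, in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """On-demand cost: input and output tokens are priced separately, per million."""
    total_in_m = requests * in_tokens / 1_000_000
    total_out_m = requests * out_tokens / 1_000_000
    return total_in_m * in_price_per_m + total_out_m * out_price_per_m

# 1M requests/month, 500 input + 300 output tokens each:
print(monthly_cost(1_000_000, 500, 300, 0.035, 0.14))  # Nova Micro: 59.5
print(monthly_cost(1_000_000, 500, 300, 2.50, 10.00))  # GPT-4o: 4250.0
```

Plug in the on-demand prices from the table above to compare any pair of models for your expected load.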
Provisioned Throughput vs. Rate Limits
Bedrock offers Reserved Capacity and Provisioned Throughput for predictable workloads. OpenAI uses tier-based rate limits that automatically increase with growing revenue.
For workloads with steady load (e.g., an internal RAG API with 50 requests per minute), Bedrock Provisioned Throughput can be 30 to 50% cheaper than on-demand. For sporadic usage (e.g., a chatbot with peaks), on-demand pricing is more sensible on both platforms.
Data Privacy and Compliance
For European companies, data privacy is often the deciding criterion.
Data Residency
AWS Bedrock: You can choose eu-central-1 (Frankfurt) as your region. With Cross-Region Inference (CRIS), all data stays within the EU. CloudWatch logs, CloudTrail entries, and model invocation logs are stored only in the source region. You know exactly which data center processes your data.
OpenAI: Since February 2025, OpenAI offers data residency in Europe. Data is stored in the European Economic Area (EEA). However: you cannot choose a specific country (e.g., Germany only), the option is only available for enterprise customers, and it must be activated at project creation. Existing projects cannot be migrated.
VPC Isolation
Bedrock supports AWS PrivateLink. This means traffic between your application and Bedrock never leaves the AWS network. No DNS lookup over the public internet, no exposed endpoint URL. For applications in regulated environments (banking, insurance, healthcare), this is often a mandatory requirement.
OpenAI requests go over the public internet. TLS 1.2+ encrypts the transport, but the traffic is fundamentally publicly routable.
Compliance Comparison
| Certification | AWS Bedrock | OpenAI API |
|---|---|---|
| SOC 2 Type II | Yes | Yes |
| ISO 27001 | Yes | Yes |
| ISO 27701 (Privacy) | Yes | Yes |
| HIPAA | Eligible | BAA available |
| FedRAMP | Moderate + High (GovCloud) | No (Azure OpenAI only) |
| CSA STAR Level 2 | Yes | No |
For the public sector or US government projects, Bedrock with FedRAMP certification is the only direct option. OpenAI's FedRAMP certification runs through Azure OpenAI Service, not through the direct API.
RAG Integration
Both platforms offer managed RAG but with different philosophies.
Bedrock Knowledge Bases
Bedrock Knowledge Bases is a fully managed RAG solution:
```python
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="eu-central-1")

response = client.retrieve_and_generate(
    input={"text": "How does our returns process work?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345ABCDE",
            "modelArn": "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-sonnet-4-6"
        }
    }
)

print(response["output"]["text"])

# Retrieve source citations
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(f"Source: {ref['location']['s3Location']['uri']}")
```

You upload documents to S3, and Bedrock automatically chunks, embeds, and indexes them. Queries return answers with source citations. Supported vector stores:
| Vector Store | Type | Highlight |
|---|---|---|
| Amazon S3 Vectors | Object Storage | Up to 90% cheaper than dedicated vector DBs |
| OpenSearch Serverless | Managed | Standard option, hybrid search |
| Aurora PostgreSQL | Relational | Hybrid search (semantic + keyword) |
| Neptune Analytics | Graph | GraphRAG for entities and relationships |
| Pinecone | Third-Party | High performance |
| MongoDB Atlas | Third-Party | Hybrid search |
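Knowledge Bases also exposes a retrieval-only `retrieve` call for cases where you want to run generation yourself. Here is a sketch with the response parsing factored into a pure helper; the helper name is my own, the knowledge base ID is a placeholder, and the sample response is a hand-written fixture mirroring the documented shape:

```python
def top_sources(retrieve_response: dict, limit: int = 3) -> list[str]:
    """Extract the highest-scoring S3 source URIs from a retrieve() response."""
    results = retrieve_response.get("retrievalResults", [])
    ranked = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)
    return [r["location"]["s3Location"]["uri"] for r in ranked[:limit]]

# Retrieval-only call (commented out; requires AWS credentials and a real KB):
# client = boto3.client("bedrock-agent-runtime", region_name="eu-central-1")
# response = client.retrieve(
#     knowledgeBaseId="KB12345ABCDE",
#     retrievalQuery={"text": "How does our returns process work?"},
#     retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
# )

# Illustrative fixture shaped like a retrieve() response:
sample = {"retrievalResults": [
    {"location": {"s3Location": {"uri": "s3://docs/a.pdf"}}, "score": 0.71},
    {"location": {"s3Location": {"uri": "s3://docs/b.pdf"}}, "score": 0.93},
]}
print(top_sources(sample))  # ['s3://docs/b.pdf', 's3://docs/a.pdf']
```

This split lets you feed the retrieved chunks into any model, including one outside Bedrock.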
OpenAI Assistants with File Search
```python
from openai import OpenAI

client = OpenAI()

# Create a vector store and upload files
vector_store = client.vector_stores.create(name="Company documentation")
file = client.files.create(
    file=open("handbuch.pdf", "rb"),
    purpose="assistants"
)
client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file.id
)

# Create an assistant with File Search
assistant = client.beta.assistants.create(
    name="Company Assistant",
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)

# Query via a thread and run
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How does our returns process work?"
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```

Comparison
| Aspect | Bedrock Knowledge Bases | OpenAI Assistants |
|---|---|---|
| Setup | S3 bucket + configuration | Upload files |
| Vector Store | 6+ options (S3, OpenSearch, Aurora, Neptune, Pinecone, MongoDB) | OpenAI's own store |
| GraphRAG | Yes (Neptune Analytics) | No |
| Hybrid Search | Yes (semantic + keyword) | Semantic only |
| Chunking | Configurable (fixed, semantic, hierarchical) | Automatic (800 token chunks) |
| Cost Control | Full control over vector store | Opaque pricing ($0.10/GB/day) |
| Flexibility | High (custom embeddings, custom vector store) | Low (fully managed) |
For teams already building on AWS that need control over their RAG pipeline, Knowledge Bases are the better choice. For quick prototypes without AWS infrastructure, OpenAI's Assistants API is simpler. If you want to dive deeper into RAG architectures, you will find the theoretical foundations in our article on RAG and CRAG.
Latency and Performance
Latency is critical for interactive applications (chatbots, real-time search).
Time-to-First-Token (TTFT)
| Model | TTFT (Median) | Notes |
|---|---|---|
| GPT-4o | 200 to 400ms | Consistent, tier-dependent |
| GPT-4o mini | 150 to 300ms | Fastest OpenAI model |
| Claude Sonnet 4.6 (Bedrock) | 300 to 600ms | CRIS can add 50 to 100ms |
| Nova Pro (Bedrock) | 200 to 400ms | AWS-native, low latency |
| Nova Micro (Bedrock) | 100 to 200ms | Fastest Bedrock model |
Cold Starts on Bedrock: After extended inactivity (10+ minutes without requests), the first request can take 1 to 3 seconds longer. This primarily affects rarely used models. Provisioned Throughput completely eliminates cold starts.
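If Provisioned Throughput is not worth it for your traffic, a client-side retry with backoff softens the cold-start and throttling hit. A generic sketch (the backoff parameters are arbitrary starting points, not tuned values, and the flaky callable merely simulates a failing invocation):

```python
import time

def with_retries(call, attempts: int = 3, base_delay: float = 0.5):
    """Invoke call(); on failure, retry with exponential backoff.

    Useful for absorbing cold-start latency spikes or transient throttling.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Usage with a flaky callable standing in for a Bedrock invocation:
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated cold start")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
print(result)  # ok (after two simulated failures)
```

In real code, `call` would be a closure around `client.converse(...)`, and you would catch the SDK's throttling exception instead of bare `Exception`.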
OpenAI Rate Limits: OpenAI throttles by tier. Free tier: 3 RPM (requests per minute). Tier 5: 10,000 RPM. Production workloads require at least Tier 3.
Streaming Behavior
Both APIs support streaming but with different granularity. OpenAI streams individual tokens. Bedrock streams in small chunks (typically 2 to 5 tokens), resulting in minimally higher throughput with marginally higher latency between chunks.
Terraform: Setting Up Bedrock Access
For teams integrating Bedrock into their existing AWS infrastructure, here is a Terraform example:
```hcl
# IAM role for the application
resource "aws_iam_role" "bedrock_app" {
  name = "bedrock-app-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

# Bedrock permissions (least privilege)
resource "aws_iam_role_policy" "bedrock_invoke" {
  name = "bedrock-invoke"
  role = aws_iam_role.bedrock_app.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "BedrockInvoke"
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream"
        ]
        Resource = [
          "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-sonnet-4-6",
          "arn:aws:bedrock:eu-central-1::foundation-model/amazon.nova-pro-v1:0"
        ]
      }
    ]
  })
}

# VPC endpoint for private connectivity (optional)
resource "aws_vpc_endpoint" "bedrock_runtime" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.eu-central-1.bedrock-runtime"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.bedrock_endpoint.id]
  private_dns_enabled = true
}
```

With OpenAI, there is no infrastructure setup. You create an API key in the dashboard and store it as an environment variable or in a secrets manager. This is simpler but offers less control over network and access.
Decision Framework
Instead of a blanket recommendation, here are three typical scenarios:
Scenario 1: Startup Without AWS Infrastructure
Recommendation: OpenAI
You have no AWS account, no VPC, no IAM. You want to quickly build and validate a prototype. OpenAI's SDK is set up in 5 minutes, the documentation is excellent, and the community is large. GPT-4o mini offers good quality at low cost.
Scenario 2: Enterprise with AWS Stack and Compliance Requirements
Recommendation: Bedrock
Your company runs on AWS. Data must stay in the EU. You need VPC isolation, IAM integration, and audit logs. Bedrock integrates seamlessly into the existing infrastructure. No additional API keys, no external dependencies. Claude Sonnet 4.6 via Bedrock delivers comparable quality to GPT-4o.
Scenario 3: Hybrid Approach
Recommendation: OpenAI for Prototype, Bedrock for Production
Start with OpenAI for rapid iteration and validation. Once the use case is proven and production is on the horizon, migrate to Bedrock. The Converse API makes the switch easier since you only need to adjust client initialization and message format. The business logic stays the same.
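One way to keep that switch cheap from day one is to hide the provider behind a minimal interface, so only the adapter changes at migration time. A sketch under that assumption (the class and method names are my own, not from either SDK):

```python
from typing import Protocol

class ChatBackend(Protocol):
    def complete(self, system: str, user: str) -> str: ...

class OpenAIBackend:
    """Prototype phase: thin wrapper over the OpenAI SDK."""
    def __init__(self):
        from openai import OpenAI  # deferred import: only needed for this backend
        self.client = OpenAI()

    def complete(self, system: str, user: str) -> str:
        resp = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content

class BedrockBackend:
    """Production phase: same interface, Converse API underneath."""
    def __init__(self, model_id: str = "eu.anthropic.claude-sonnet-4-6"):
        import boto3  # deferred import: only needed for this backend
        self.client = boto3.client("bedrock-runtime", region_name="eu-central-1")
        self.model_id = model_id

    def complete(self, system: str, user: str) -> str:
        resp = self.client.converse(
            modelId=self.model_id,
            messages=[{"role": "user", "content": [{"text": user}]}],
            system=[{"text": system}],
        )
        return resp["output"]["message"]["content"][0]["text"]

def answer_question(backend: ChatBackend, question: str) -> str:
    """Business logic depends only on the interface, never on the provider."""
    return backend.complete("You are a helpful AI assistant.", question)
```

Swapping `OpenAIBackend()` for `BedrockBackend()` at the call site is then the entire migration; `answer_question` and everything above it stay untouched.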
Decision Criteria at a Glance
| Question | If yes | If no |
|---|---|---|
| Must data stay in a specific EU region? | Bedrock | Either |
| Do you need VPC isolation? | Bedrock | Either |
| Do you already have AWS infrastructure? | Bedrock (easier integration) | OpenAI (faster start) |
| Do you need multiple model providers? | Bedrock | Either |
| Is developer experience the top priority? | OpenAI | Either |
| Do you need FedRAMP? | Bedrock | Either |
| Budget under $50/month? | Bedrock (Nova Micro) or OpenAI (GPT-4o mini) | Either |
Conclusion
AWS Bedrock and OpenAI are not direct competitors. Bedrock is a multi-model service with deep AWS integration, strong data privacy, and model diversity. OpenAI offers proprietary top-tier models with the best developer experience on the market.
The three most important decision criteria are:
- Data Privacy and Compliance: If you need specific EU regions, VPC isolation, or FedRAMP, there is no way around Bedrock. OpenAI's EU data residency is a good start but offers less granularity.
- Existing Infrastructure: If your stack runs on AWS, Bedrock integrates without additional credentials or network configuration. If you do not use AWS, OpenAI is the faster entry point.
- Model Flexibility: Bedrock gives you access to Claude, Llama, Mistral, Nova, and more. If a model is not optimal for your use case, you switch with one line of code. With OpenAI, you are bound to the OpenAI ecosystem.
If you are already working on the CI/CD pipeline for AI systems, the platform choice forms the foundation for your deployment architecture. In the next article, we go one step further: RAG infrastructure on AWS with GPU clusters, ECS, and Terraform. To learn how to systematically measure the quality of your RAG pipeline, check out the article on RAG Evaluation and Testing.
Are you evaluating LLM platforms for your company and need support with the architecture decision? Contact me for a no-obligation consultation.