AWS Bedrock vs OpenAI: A Practical Comparison



"Which LLM API should I use?" is the most common question in AI projects. The answer is rarely straightforward, because AWS Bedrock and OpenAI solve different problems. Bedrock is not an OpenAI competitor but rather a managed service that bundles foundation models from various providers under one roof. OpenAI offers proprietary top-tier models with one of the best developer experiences on the market.

This article compares both platforms based on concrete criteria: API integration with code examples, cost per token, data privacy and GDPR, RAG capabilities, and latency. At the end, you will find a decision framework to help you make the right choice for your project. If you are already running RAG systems in production, this comparison will help with platform selection for the next iteration.

What Is AWS Bedrock?

Amazon Bedrock is a fully managed service that provides foundation models through a unified API. Unlike OpenAI, AWS does not develop its own LLMs (with the exception of the Amazon Nova and Titan families) but hosts models from third-party providers.

Available Models (As of March 2026)

| Provider | Models | Strength |
| --- | --- | --- |
| Anthropic | Claude Sonnet 4.6, Opus 4.6, Haiku 4.5 | Reasoning, coding, longer contexts |
| Meta | Llama 4 Scout, Llama 4 Maverick, Llama 3.3 70B | Open source, cost-efficient |
| Mistral | Mistral Large 3, Pixtral Large, Magistral Small | European provider, multilingual |
| Amazon | Nova Pro, Nova Lite, Nova Micro | Extremely affordable, AWS-native |
| DeepSeek | DeepSeek-R1, DeepSeek V3.2 | Reasoning, open source |
| Cohere | Command R+, Embed v4, Rerank 3.5 | RAG-optimized, embeddings |

Core Features

Beyond pure model invocation, Bedrock offers four key features:

  • Knowledge Bases: Managed RAG without custom infrastructure (S3, OpenSearch, Aurora, Neptune)
  • Agents: Orchestration of multi-step workflows with tool use
  • Guardrails: Content filters, PII detection, topic restrictions
  • Model Evaluation: Automated quality assessment of different models
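Guardrails are attached per request rather than per model. Here is a minimal sketch of how a guardrail might be referenced in a Converse call; the guardrail ID and version are placeholders, and the request is built as a plain dict so the shape is easy to inspect:

```python
def converse_request(model_id: str, prompt: str,
                     guardrail_id: str, guardrail_version: str) -> dict:
    """Build Converse API kwargs with a guardrail attached."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        # guardrailConfig references a guardrail created beforehand
        # in the Bedrock console or via the control-plane API
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,  # placeholder ID
            "guardrailVersion": guardrail_version,
        },
    }

request = converse_request(
    "eu.anthropic.claude-sonnet-4-6",
    "What is our refund policy?",
    guardrail_id="gr-example123",  # hypothetical identifier
    guardrail_version="1",
)

# client = boto3.client("bedrock-runtime", region_name="eu-central-1")
# response = client.converse(**request)
```

When a guardrail blocks content, the Converse response signals the intervention via its stop reason, so the application can react explicitly.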

Comparison Matrix: The Key Dimensions

Before diving into the details, here is an overview:

| Criterion | AWS Bedrock | OpenAI API |
| --- | --- | --- |
| Model Selection | 100+ models from 10+ providers | Proprietary models (GPT, o-Series, DALL-E, Whisper) |
| Top Model | Claude Opus 4.6, Claude Sonnet 4.6 (via Anthropic) | GPT-4o, o1, o3-mini |
| Cheapest Model | Nova Micro ($0.035/1M input) | GPT-4o mini ($0.15/1M input) |
| Data Residency | eu-central-1 Frankfurt, 7+ EU regions | EU (EEA), no specific country selectable |
| Authentication | IAM roles, no API keys needed | API key per project |
| VPC Isolation | PrivateLink, traffic never leaves AWS | Public internet |
| RAG (Managed) | Knowledge Bases (S3, OpenSearch, Neptune GraphRAG) | Assistants API with File Search |
| Fine-Tuning | Supported (model-dependent) | Supported (GPT-4o, GPT-4o mini) |
| Compliance | SOC 1/2/3, ISO 27001, HIPAA, FedRAMP | SOC 2, ISO 27001, HIPAA (BAA) |
| Developer Experience | AWS SDK (boto3), steeper learning curve | OpenAI SDK, excellent DX |

Practical Comparison: API Integration

The best comparison is code. Here is the same task with both APIs: a simple chat completion with a system prompt.

Bedrock with boto3 (Converse API)

import boto3
 
client = boto3.client("bedrock-runtime", region_name="eu-central-1")
 
response = client.converse(
    modelId="eu.anthropic.claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": [{"text": "What is Retrieval-Augmented Generation?"}]
        }
    ],
    system=[{"text": "You are a helpful AI assistant."}],
    inferenceConfig={
        "temperature": 0.5,
        "maxTokens": 512
    }
)
 
text = response["output"]["message"]["content"][0]["text"]
usage = response["usage"]
print(f"Input: {usage['inputTokens']}, Output: {usage['outputTokens']}")
print(text)

OpenAI SDK

from openai import OpenAI
 
client = OpenAI()  # uses OPENAI_API_KEY from the environment
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "What is Retrieval-Augmented Generation?"}
    ],
    temperature=0.5,  # matched to the Bedrock example for a like-for-like comparison
    max_tokens=512
)
 
text = response.choices[0].message.content
usage = response.usage
print(f"Input: {usage.prompt_tokens}, Output: {usage.completion_tokens}")
print(text)

What Stands Out?

Authentication: Bedrock uses IAM. If your code runs on EC2, ECS, or Lambda, you do not need an API key. The instance role handles authentication automatically. OpenAI always requires an API key that must be securely stored and rotated.

Message Format: Bedrock's Converse API wraps text in a content array with type objects ({"text": "..."}). This is more verbose but natively supports multimodal inputs (text + image in the same request). OpenAI's format is more compact for pure text requests.

Model Switching: With Bedrock, you only change the modelId to switch from Claude to Llama or Mistral. The Converse API abstracts model-specific differences. With OpenAI, you are bound to OpenAI models.
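To make this concrete, the sketch below builds the same Converse payload for two providers; everything except modelId is identical (the Llama model ID is an illustrative placeholder):

```python
def build_request(model_id: str, question: str) -> dict:
    """Identical Converse payload for any Bedrock model; only modelId varies."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": question}]}],
        "inferenceConfig": {"temperature": 0.5, "maxTokens": 512},
    }

claude = build_request("eu.anthropic.claude-sonnet-4-6", "What is RAG?")
llama = build_request("meta.llama4-maverick", "What is RAG?")  # placeholder ID

# Everything except modelId matches:
claude_body = {k: v for k, v in claude.items() if k != "modelId"}
llama_body = {k: v for k, v in llama.items() if k != "modelId"}
print(claude_body == llama_body)  # True
```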

Streaming Comparison

Both APIs support streaming for real-time output:

Bedrock:

response = client.converse_stream(
    modelId="eu.anthropic.claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": [{"text": "Explain RAG in 3 sentences."}]}
    ],
    inferenceConfig={"temperature": 0.5, "maxTokens": 256}
)
 
for event in response.get("stream", []):
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="")

OpenAI:

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain RAG in 3 sentences."}
    ],
    stream=True
)
 
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

OpenAI's streaming API is slightly more elegant. Bedrock's event-based format requires more parsing logic but provides structured metadata (token usage) within the stream.

Costs in Detail

The price differences are significant and depend heavily on the chosen model and usage pattern.

On-Demand Prices per 1M Tokens (March 2026)

| Model | Input / 1M Tokens | Output / 1M Tokens | Platform |
| --- | --- | --- | --- |
| Amazon Nova Micro | $0.035 | $0.14 | Bedrock |
| Amazon Nova Lite | $0.06 | $0.24 | Bedrock |
| GPT-4o mini | $0.15 | $0.60 | OpenAI |
| Amazon Nova Pro | $0.80 | $3.20 | Bedrock |
| Claude 3.5 Haiku | $0.80 | $4.00 | Bedrock |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Bedrock |
| GPT-4o | $2.50 | $10.00 | OpenAI |
| o3-mini | $1.10 | $4.40 | OpenAI |
| o1 | $15.00 | $60.00 | OpenAI |
| Claude Opus 4.6 | $15.00 | $75.00 | Bedrock |

Cost Example: 1 Million Requests per Month

Assumptions: an average of 500 input tokens and 300 output tokens per request.

| Model | Monthly Cost |
| --- | --- |
| Amazon Nova Micro | $59.50 |
| GPT-4o mini | $255.00 |
| Amazon Nova Pro | $1,360.00 |
| Claude Sonnet 4.6 | $6,000.00 |
| GPT-4o | $4,250.00 |

Nova Micro is 70x cheaper than GPT-4o. The quality is not comparable, of course, but for simple classification, summarization, or routing decisions, a small model is often sufficient.
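The arithmetic behind such estimates is easy to script. A small helper using the on-demand prices from the table above (prices are per 1M tokens):

```python
def monthly_cost(requests: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Monthly cost in USD; in_price/out_price are per 1M tokens."""
    total_in = requests * in_tokens / 1_000_000    # input tokens, in millions
    total_out = requests * out_tokens / 1_000_000  # output tokens, in millions
    return total_in * in_price + total_out * out_price

# 1M requests/month at 500 input and 300 output tokens each
print(round(monthly_cost(1_000_000, 500, 300, 2.50, 10.00), 2))  # GPT-4o: 4250.0
print(round(monthly_cost(1_000_000, 500, 300, 0.035, 0.14), 2))  # Nova Micro: 59.5
```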

Provisioned Throughput vs. Rate Limits

Bedrock offers Reserved Capacity and Provisioned Throughput for predictable workloads. OpenAI uses tier-based rate limits that automatically increase with growing revenue.

For workloads with steady load (e.g., an internal RAG API with 50 requests per minute), Bedrock Provisioned Throughput can be 30 to 50% cheaper than on-demand. For sporadic usage (e.g., a chatbot with peaks), on-demand pricing is more sensible on both platforms.
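The break-even point reduces to a utilization threshold: a flat hourly commitment pays off once steady request volume exceeds it. A back-of-the-envelope sketch with illustrative, not real, prices:

```python
def break_even_rpm(hourly_commit: float, cost_per_request: float) -> float:
    """Requests per minute above which a flat hourly commitment
    beats on-demand pricing (illustrative arithmetic only)."""
    return hourly_commit / (60 * cost_per_request)

# Hypothetical numbers: $20/hour commitment vs. $0.006 per on-demand request
print(round(break_even_rpm(20.0, 0.006), 1))  # 55.6 requests per minute
```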

Data Privacy and Compliance

For European companies, data privacy is often the deciding criterion.

Data Residency

AWS Bedrock: You can choose eu-central-1 (Frankfurt) as your region. With Cross-Region Inference (CRIS), all data stays within the EU. CloudWatch logs, CloudTrail entries, and model invocation logs are stored only in the source region. You know exactly which data center processes your data.

OpenAI: Since February 2025, OpenAI offers data residency in Europe. Data is stored in the European Economic Area (EEA). However: you cannot choose a specific country (e.g., Germany only), the option is only available for enterprise customers, and it must be activated at project creation. Existing projects cannot be migrated.

VPC Isolation

Bedrock supports AWS PrivateLink. This means traffic between your application and Bedrock never leaves the AWS network. No DNS lookup over the public internet, no exposed endpoint URL. For applications in regulated environments (banking, insurance, healthcare), this is often a mandatory requirement.

OpenAI requests go over the public internet. TLS 1.2+ encrypts the transport, but the traffic is fundamentally publicly routable.

Compliance Comparison

CertificationAWS BedrockOpenAI API
SOC 2 Type IIYesYes
ISO 27001YesYes
ISO 27701 (Privacy)YesYes
HIPAAEligibleBAA available
FedRAMPModerate + High (GovCloud)No (Azure OpenAI only)
CSA STAR Level 2YesNo

For the public sector or US government projects, Bedrock with FedRAMP certification is the only direct option. OpenAI's FedRAMP certification runs through Azure OpenAI Service, not through the direct API.

RAG Integration

Both platforms offer managed RAG but with different philosophies.

Bedrock Knowledge Bases

Bedrock Knowledge Bases is a fully managed RAG solution:

import boto3
 
client = boto3.client("bedrock-agent-runtime", region_name="eu-central-1")
 
response = client.retrieve_and_generate(
    input={"text": "How does our return process work?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345ABCDE",
            "modelArn": "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-sonnet-4-6"
        }
    }
)
 
print(response["output"]["text"])
 
# Retrieve source citations
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(f"Source: {ref['location']['s3Location']['uri']}")

You upload documents to S3, and Bedrock automatically chunks, embeds, and indexes them. Queries return answers with source citations. Supported vector stores:

| Vector Store | Type | Highlight |
| --- | --- | --- |
| Amazon S3 Vectors | Object storage | Up to 90% cheaper than dedicated vector DBs |
| OpenSearch Serverless | Managed | Standard option, hybrid search |
| Aurora PostgreSQL | Relational | Hybrid search (semantic + keyword) |
| Neptune Analytics | Graph | GraphRAG for entities and relationships |
| Pinecone | Third-party | High performance |
| MongoDB Atlas | Third-party | Hybrid search |

OpenAI Assistants with File Search

from openai import OpenAI
 
client = OpenAI()
 
# Create a vector store and upload files
vector_store = client.vector_stores.create(name="Company documentation")

file = client.files.create(
    file=open("handbuch.pdf", "rb"),
    purpose="assistants"
)

client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file.id
)

# Assistant with File Search (the Assistants API lives under the beta namespace)
assistant = client.beta.assistants.create(
    name="Company Assistant",
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)

# Query
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How does our return process work?"
)

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)

Comparison

| Aspect | Bedrock Knowledge Bases | OpenAI Assistants |
| --- | --- | --- |
| Setup | S3 bucket + configuration | Upload files |
| Vector Store | 6+ options (S3, OpenSearch, Aurora, Neptune, Pinecone, MongoDB) | OpenAI's own store |
| GraphRAG | Yes (Neptune Analytics) | No |
| Hybrid Search | Yes (semantic + keyword) | Semantic only |
| Chunking | Configurable (fixed, semantic, hierarchical) | Automatic (800-token chunks) |
| Cost Control | Full control over vector store | Opaque pricing ($0.10/GB/day) |
| Flexibility | High (custom embeddings, custom vector store) | Low (fully managed) |

For teams already building on AWS that need control over their RAG pipeline, Knowledge Bases are the better choice. For quick prototypes without AWS infrastructure, OpenAI's Assistants API is simpler. If you want to dive deeper into RAG architectures, you will find the theoretical foundations in our article on RAG and CRAG.

Latency and Performance

Latency is critical for interactive applications (chatbots, real-time search).

Time-to-First-Token (TTFT)

| Model | TTFT (Median) | Notes |
| --- | --- | --- |
| GPT-4o | 200 to 400ms | Consistent, tier-dependent |
| GPT-4o mini | 150 to 300ms | Fastest OpenAI model |
| Claude Sonnet 4.6 (Bedrock) | 300 to 600ms | CRIS can add 50 to 100ms |
| Nova Pro (Bedrock) | 200 to 400ms | AWS-native, low latency |
| Nova Micro (Bedrock) | 100 to 200ms | Fastest Bedrock model |

Cold Starts on Bedrock: After extended inactivity (10+ minutes without requests), the first request can take 1 to 3 seconds longer. This primarily affects rarely used models. Provisioned Throughput completely eliminates cold starts.

OpenAI Rate Limits: OpenAI throttles by tier. Free tier: 3 RPM (requests per minute). Tier 5: 10,000 RPM. Production workloads require at least Tier 3.
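Regardless of tier, production clients should retry 429 responses with exponential backoff. The OpenAI SDK has built-in retries (the max_retries setting on the client); the sketch below just makes the schedule explicit:

```python
def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff delays in seconds, capped (jitter left out)."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]

print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]

# In practice: sleep between retries on HTTP 429 and add random jitter, e.g.
# for delay in backoff_delays(5):
#     try:
#         return client.chat.completions.create(...)
#     except openai.RateLimitError:
#         time.sleep(delay + random.uniform(0, 1))
```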

Streaming Behavior

Both APIs support streaming but with different granularity. OpenAI streams individual tokens. Bedrock streams in small chunks (typically 2 to 5 tokens), which yields slightly higher throughput at the cost of marginally longer gaps between chunks.

Terraform: Setting Up Bedrock Access

For teams integrating Bedrock into their existing AWS infrastructure, here is a Terraform example:

# IAM role for the application
resource "aws_iam_role" "bedrock_app" {
  name = "bedrock-app-role"
 
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}
 
# Bedrock permissions (least privilege)
resource "aws_iam_role_policy" "bedrock_invoke" {
  name = "bedrock-invoke"
  role = aws_iam_role.bedrock_app.id
 
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "BedrockInvoke"
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream"
        ]
        Resource = [
          "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-sonnet-4-6",
          "arn:aws:bedrock:eu-central-1::foundation-model/amazon.nova-pro-v1:0"
        ]
      }
    ]
  })
}
 
# VPC endpoint for private connectivity (optional)
resource "aws_vpc_endpoint" "bedrock_runtime" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.eu-central-1.bedrock-runtime"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.bedrock_endpoint.id]
  private_dns_enabled = true
}

With OpenAI, there is no infrastructure setup. You create an API key in the dashboard and store it as an environment variable or in a secrets manager. This is simpler but offers less control over network and access.

Decision Framework

Instead of a blanket recommendation, here are three typical scenarios:

Scenario 1: Startup Without AWS Infrastructure

Recommendation: OpenAI

You have no AWS account, no VPC, no IAM. You want to quickly build and validate a prototype. OpenAI's SDK is set up in 5 minutes, the documentation is excellent, and the community is large. GPT-4o mini offers good quality at low cost.

Scenario 2: Enterprise with AWS Stack and Compliance Requirements

Recommendation: Bedrock

Your company runs on AWS. Data must stay in the EU. You need VPC isolation, IAM integration, and audit logs. Bedrock integrates seamlessly into the existing infrastructure. No additional API keys, no external dependencies. Claude Sonnet 4.6 via Bedrock delivers comparable quality to GPT-4o.

Scenario 3: Hybrid Approach

Recommendation: OpenAI for Prototype, Bedrock for Production

Start with OpenAI for rapid iteration and validation. Once the use case is proven and production is on the horizon, migrate to Bedrock. The Converse API makes the switch easier since you only need to adjust client initialization and message format. The business logic stays the same.
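The format change itself is mechanical. Here is a sketch of a converter from OpenAI-style chat messages to the Converse shape, covering plain-text content only (multimodal parts would need extra mapping):

```python
def to_converse(messages: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split OpenAI-style chat messages into Converse (system, messages)."""
    system, converse_msgs = [], []
    for msg in messages:
        if msg["role"] == "system":
            # Converse takes system prompts as a separate top-level list
            system.append({"text": msg["content"]})
        else:
            converse_msgs.append({
                "role": msg["role"],
                "content": [{"text": msg["content"]}],
            })
    return system, converse_msgs

openai_style = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is Retrieval-Augmented Generation?"},
]
system, messages = to_converse(openai_style)
# client.converse(modelId=..., system=system, messages=messages, ...)
```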

Decision Criteria at a Glance

| Question | If yes | If no |
| --- | --- | --- |
| Must data stay in a specific EU region? | Bedrock | Either |
| Do you need VPC isolation? | Bedrock | Either |
| Do you already have AWS infrastructure? | Bedrock (easier integration) | OpenAI (faster start) |
| Do you need multiple model providers? | Bedrock | Either |
| Is developer experience the top priority? | OpenAI | Either |
| Do you need FedRAMP? | Bedrock | Either |
| Budget under $50/month? | Bedrock (Nova Micro) or OpenAI (GPT-4o mini) | Either |

Conclusion

AWS Bedrock and OpenAI are not direct competitors. Bedrock is a multi-model service with deep AWS integration, strong data privacy, and model diversity. OpenAI offers proprietary top-tier models with the best developer experience on the market.

The three most important decision criteria are:

  1. Data Privacy and Compliance: If you need specific EU regions, VPC isolation, or FedRAMP, there is no way around Bedrock. OpenAI's EU data residency is a good start but offers less granularity.

  2. Existing Infrastructure: If your stack runs on AWS, Bedrock integrates without additional credentials or network configuration. If you do not use AWS, OpenAI is the faster entry point.

  3. Model Flexibility: Bedrock gives you access to Claude, Llama, Mistral, Nova, and more. If a model is not optimal for your use case, you switch with one line of code. With OpenAI, you are bound to the OpenAI ecosystem.

If you are already working on the CI/CD pipeline for AI systems, the platform choice forms the foundation for your deployment architecture. In the next article, we go one step further: RAG infrastructure on AWS with GPU clusters, ECS, and Terraform. To learn how to systematically measure the quality of your RAG pipeline, check out the article on RAG Evaluation and Testing.


Are you evaluating LLM platforms for your company and need support with the architecture decision? Contact me for a no-obligation consultation.