AWS Bedrock vs OpenAI: A Practical Comparison



"Which LLM API should I use?" is the most common question in AI projects. The answer is rarely straightforward, because AWS Bedrock and OpenAI solve different problems. Bedrock is not an OpenAI competitor but rather a managed service that bundles foundation models from various providers under one roof. OpenAI offers proprietary top-tier models with one of the best developer experiences on the market.

This article compares both platforms based on concrete criteria: API integration with code examples, cost per token, data privacy and GDPR, RAG capabilities, and latency. At the end, you will find a decision framework to help you make the right choice for your project. If you are already running RAG systems in production, this comparison will help with platform selection for the next iteration.

What Is AWS Bedrock?

Amazon Bedrock is a fully managed service that provides foundation models through a unified API. Unlike OpenAI, AWS does not develop its own LLMs (with the exception of the Amazon Nova and Titan families) but hosts models from third-party providers.

Available Models (As of March 2026)

| Provider | Models | Strength |
| --- | --- | --- |
| Anthropic | Claude Sonnet 4.6, Opus 4.6, Haiku 4.5 | Reasoning, coding, longer contexts |
| Meta | Llama 4 Scout, Llama 4 Maverick, Llama 3.3 70B | Open source, cost-efficient |
| Mistral | Mistral Large 3, Pixtral Large, Magistral Small | European provider, multilingual |
| Amazon | Nova Pro, Nova Lite, Nova Micro | Extremely affordable, AWS-native |
| DeepSeek | DeepSeek-R1, DeepSeek V3.2 | Reasoning, open source |
| Cohere | Command R+, Embed v4, Rerank 3.5 | RAG-optimized, embeddings |

Core Features

Beyond pure model invocation, Bedrock offers four key features:

  • Knowledge Bases: Managed RAG without custom infrastructure (S3, OpenSearch, Aurora, Neptune)
  • Agents: Orchestration of multi-step workflows with tool use
  • Guardrails: Content filters, PII detection, topic restrictions
  • Model Evaluation: Automated quality assessment of different models
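Guardrails are attached per request rather than per model. Here is a minimal sketch of how a guardrail might be referenced in a Converse call; the guardrail ID and version are placeholders, and the request is built as a plain dict so the shape is easy to inspect:

```python
def converse_request(model_id: str, prompt: str,
                     guardrail_id: str, guardrail_version: str) -> dict:
    """Build Converse API kwargs with a guardrail attached."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        # guardrailConfig references a guardrail created beforehand
        # in the Bedrock console or via the control-plane API
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,  # placeholder ID
            "guardrailVersion": guardrail_version,
        },
    }

request = converse_request(
    "eu.anthropic.claude-sonnet-4-6",
    "What is our refund policy?",
    guardrail_id="gr-example123",  # hypothetical identifier
    guardrail_version="1",
)

# client = boto3.client("bedrock-runtime", region_name="eu-central-1")
# response = client.converse(**request)
```

When a guardrail blocks content, the Converse response signals the intervention via its stop reason, so the application can react explicitly.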

Comparison Matrix: The Key Dimensions

Before diving into the details, here is an overview:

| Criterion | AWS Bedrock | OpenAI API |
| --- | --- | --- |
| Model Selection | 100+ models from 10+ providers | Proprietary models (GPT, o-Series, DALL-E, Whisper) |
| Top Model | Claude Opus 4.6, Claude Sonnet 4.6 (via Anthropic) | GPT-4o, o1, o3-mini |
| Cheapest Model | Nova Micro ($0.035/1M input) | GPT-4o mini ($0.15/1M input) |
| Data Residency | eu-central-1 Frankfurt, 7+ EU regions | EU (EEA), no specific country selectable |
| Authentication | IAM roles, no API keys needed | API key per project |
| VPC Isolation | PrivateLink, traffic never leaves AWS | Public internet |
| RAG (Managed) | Knowledge Bases (S3, OpenSearch, Neptune GraphRAG) | Assistants API with File Search |
| Fine-Tuning | Supported (model-dependent) | Supported (GPT-4o, GPT-4o mini) |
| Compliance | SOC 1/2/3, ISO 27001, HIPAA, FedRAMP | SOC 2, ISO 27001, HIPAA (BAA) |
| Developer Experience | AWS SDK (boto3), steeper learning curve | OpenAI SDK, excellent DX |

Practical Comparison: API Integration

The best comparison is code. Here is the same task with both APIs: a simple chat completion with a system prompt.

Bedrock with boto3 (Converse API)

import boto3
 
client = boto3.client("bedrock-runtime", region_name="eu-central-1")
 
response = client.converse(
    modelId="eu.anthropic.claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": [{"text": "What is Retrieval-Augmented Generation?"}]
        }
    ],
    system=[{"text": "You are a helpful AI assistant."}],
    inferenceConfig={
        "temperature": 0.5,
        "maxTokens": 512
    }
)
 
text = response["output"]["message"]["content"][0]["text"]
usage = response["usage"]
print(f"Input: {usage['inputTokens']}, Output: {usage['outputTokens']}")
print(text)

OpenAI SDK

from openai import OpenAI
 
client = OpenAI()  # uses OPENAI_API_KEY from the environment
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "What is Retrieval-Augmented Generation?"}
    ],
    temperature=0.5,  # matched to the Bedrock example for a like-for-like comparison
    max_tokens=512
)
 
text = response.choices[0].message.content
usage = response.usage
print(f"Input: {usage.prompt_tokens}, Output: {usage.completion_tokens}")
print(text)

What Stands Out?

Authentication: Bedrock uses IAM. If your code runs on EC2, ECS, or Lambda, you do not need an API key. The instance role handles authentication automatically. OpenAI always requires an API key that must be securely stored and rotated.

Message Format: Bedrock's Converse API wraps text in a content array with type objects ({"text": "..."}). This is more verbose but natively supports multimodal inputs (text + image in the same request). OpenAI's format is more compact for pure text requests.

Model Switching: With Bedrock, you only change the modelId to switch from Claude to Llama or Mistral. The Converse API abstracts model-specific differences. With OpenAI, you are bound to OpenAI models.
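To make this concrete, the sketch below builds the same Converse payload for two providers; everything except modelId is identical (the Llama model ID is an illustrative placeholder):

```python
def build_request(model_id: str, question: str) -> dict:
    """Identical Converse payload for any Bedrock model; only modelId varies."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": question}]}],
        "inferenceConfig": {"temperature": 0.5, "maxTokens": 512},
    }

claude = build_request("eu.anthropic.claude-sonnet-4-6", "What is RAG?")
llama = build_request("meta.llama4-maverick", "What is RAG?")  # placeholder ID

# Everything except modelId matches:
claude_body = {k: v for k, v in claude.items() if k != "modelId"}
llama_body = {k: v for k, v in llama.items() if k != "modelId"}
print(claude_body == llama_body)  # True
```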

Streaming Comparison

Both APIs support streaming for real-time output:

Bedrock:

response = client.converse_stream(
    modelId="eu.anthropic.claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": [{"text": "Explain RAG in 3 sentences."}]}
    ],
    inferenceConfig={"temperature": 0.5, "maxTokens": 256}
)
 
for event in response.get("stream", []):
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="")

OpenAI:

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain RAG in 3 sentences."}
    ],
    stream=True
)
 
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

OpenAI's streaming API is slightly more elegant. Bedrock's event-based format requires more parsing logic but provides structured metadata (token usage) within the stream.

Costs in Detail

The price differences are significant and depend heavily on the chosen model and usage pattern.

On-Demand Prices per 1M Tokens (March 2026)

| Model | Input / 1M Tokens | Output / 1M Tokens | Platform |
| --- | --- | --- | --- |
| Amazon Nova Micro | $0.035 | $0.14 | Bedrock |
| Amazon Nova Lite | $0.06 | $0.24 | Bedrock |
| GPT-4o mini | $0.15 | $0.60 | OpenAI |
| Amazon Nova Pro | $0.80 | $3.20 | Bedrock |
| Claude 3.5 Haiku | $0.80 | $4.00 | Bedrock |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Bedrock |
| GPT-4o | $2.50 | $10.00 | OpenAI |
| o3-mini | $1.10 | $4.40 | OpenAI |
| o1 | $15.00 | $60.00 | OpenAI |
| Claude Opus 4.6 | $15.00 | $75.00 | Bedrock |

Cost Example: 1 Million Requests per Month

Assumptions: an average of 500 input tokens and 300 output tokens per request.

| Model | Monthly Cost |
| --- | --- |
| Amazon Nova Micro | $59.50 |
| GPT-4o mini | $255.00 |
| Amazon Nova Pro | $1,360.00 |
| Claude Sonnet 4.6 | $6,000.00 |
| GPT-4o | $4,250.00 |

Nova Micro is 70x cheaper than GPT-4o. The quality is not comparable, of course, but for simple classification, summarization, or routing decisions, a small model is often sufficient.
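The arithmetic behind such estimates is easy to script. A small helper using the on-demand prices from the table above (prices are per 1M tokens):

```python
def monthly_cost(requests: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Monthly cost in USD; in_price/out_price are per 1M tokens."""
    total_in = requests * in_tokens / 1_000_000    # input tokens, in millions
    total_out = requests * out_tokens / 1_000_000  # output tokens, in millions
    return total_in * in_price + total_out * out_price

# 1M requests/month at 500 input and 300 output tokens each
print(round(monthly_cost(1_000_000, 500, 300, 2.50, 10.00), 2))  # GPT-4o: 4250.0
print(round(monthly_cost(1_000_000, 500, 300, 0.035, 0.14), 2))  # Nova Micro: 59.5
```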

Provisioned Throughput vs. Rate Limits

Bedrock offers Reserved Capacity and Provisioned Throughput for predictable workloads. OpenAI uses tier-based rate limits that automatically increase with growing revenue.

For workloads with steady load (e.g., an internal RAG API with 50 requests per minute), Bedrock Provisioned Throughput can be 30 to 50% cheaper than on-demand. For sporadic usage (e.g., a chatbot with peaks), on-demand pricing is more sensible on both platforms.
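The break-even point reduces to a utilization threshold: a flat hourly commitment pays off once steady request volume exceeds it. A back-of-the-envelope sketch with illustrative, not real, prices:

```python
def break_even_rpm(hourly_commit: float, cost_per_request: float) -> float:
    """Requests per minute above which a flat hourly commitment
    beats on-demand pricing (illustrative arithmetic only)."""
    return hourly_commit / (60 * cost_per_request)

# Hypothetical numbers: $20/hour commitment vs. $0.006 per on-demand request
print(round(break_even_rpm(20.0, 0.006), 1))  # 55.6 requests per minute
```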

Data Privacy and Compliance

For European companies, data privacy is often the deciding criterion.

Data Residency

AWS Bedrock: You can choose eu-central-1 (Frankfurt) as your region. With Cross-Region Inference (CRIS), all data stays within the EU. CloudWatch logs, CloudTrail entries, and model invocation logs are stored only in the source region. You know exactly which data center processes your data.

OpenAI: Since February 2025, OpenAI offers data residency in Europe. Data is stored in the European Economic Area (EEA). However: you cannot choose a specific country (e.g., Germany only), the option is only available for enterprise customers, and it must be activated at project creation. Existing projects cannot be migrated.

VPC Isolation

Bedrock supports AWS PrivateLink. This means traffic between your application and Bedrock never leaves the AWS network. No DNS lookup over the public internet, no exposed endpoint URL. For applications in regulated environments (banking, insurance, healthcare), this is often a mandatory requirement.

OpenAI requests go over the public internet. TLS 1.2+ encrypts the transport, but the traffic is fundamentally publicly routable.

Compliance Comparison

CertificationAWS BedrockOpenAI API
SOC 2 Type IIYesYes
ISO 27001YesYes
ISO 27701 (Privacy)YesYes
HIPAAEligibleBAA available
FedRAMPModerate + High (GovCloud)No (Azure OpenAI only)
CSA STAR Level 2YesNo

For the public sector or US government projects, Bedrock with FedRAMP certification is the only direct option. OpenAI's FedRAMP certification runs through Azure OpenAI Service, not through the direct API.

RAG Integration

Both platforms offer managed RAG but with different philosophies.

Bedrock Knowledge Bases

Bedrock Knowledge Bases is a fully managed RAG solution:

import boto3
 
client = boto3.client("bedrock-agent-runtime", region_name="eu-central-1")
 
response = client.retrieve_and_generate(
    input={"text": "How does our return process work?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345ABCDE",
            "modelArn": "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-sonnet-4-6"
        }
    }
)
 
print(response["output"]["text"])
 
# Retrieve source citations
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(f"Source: {ref['location']['s3Location']['uri']}")

You upload documents to S3, and Bedrock automatically chunks, embeds, and indexes them. Queries return answers with source citations. Supported vector stores:

| Vector Store | Type | Highlight |
| --- | --- | --- |
| Amazon S3 Vectors | Object storage | Up to 90% cheaper than dedicated vector DBs |
| OpenSearch Serverless | Managed | Standard option, hybrid search |
| Aurora PostgreSQL | Relational | Hybrid search (semantic + keyword) |
| Neptune Analytics | Graph | GraphRAG for entities and relationships |
| Pinecone | Third-party | High performance |
| MongoDB Atlas | Third-party | Hybrid search |

OpenAI Assistants with File Search

from openai import OpenAI
 
client = OpenAI()
 
# Create a vector store and upload files
vector_store = client.vector_stores.create(name="Company documentation")

file = client.files.create(
    file=open("handbuch.pdf", "rb"),
    purpose="assistants"
)

client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file.id
)

# Assistant with File Search (the Assistants API lives under the beta namespace)
assistant = client.beta.assistants.create(
    name="Company Assistant",
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)

# Query
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How does our return process work?"
)

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)

Comparison

| Aspect | Bedrock Knowledge Bases | OpenAI Assistants |
| --- | --- | --- |
| Setup | S3 bucket + configuration | Upload files |
| Vector Store | 6+ options (S3, OpenSearch, Aurora, Neptune, Pinecone, MongoDB) | OpenAI's own store |
| GraphRAG | Yes (Neptune Analytics) | No |
| Hybrid Search | Yes (semantic + keyword) | Semantic only |
| Chunking | Configurable (fixed, semantic, hierarchical) | Automatic (800-token chunks) |
| Cost Control | Full control over vector store | Opaque pricing ($0.10/GB/day) |
| Flexibility | High (custom embeddings, custom vector store) | Low (fully managed) |

For teams already building on AWS that need control over their RAG pipeline, Knowledge Bases are the better choice. For quick prototypes without AWS infrastructure, OpenAI's Assistants API is simpler. If you want to dive deeper into RAG architectures, you will find the theoretical foundations in our article on RAG and CRAG.

Latency and Performance

Latency is critical for interactive applications (chatbots, real-time search).

Time-to-First-Token (TTFT)

| Model | TTFT (Median) | Notes |
| --- | --- | --- |
| GPT-4o | 200 to 400ms | Consistent, tier-dependent |
| GPT-4o mini | 150 to 300ms | Fastest OpenAI model |
| Claude Sonnet 4.6 (Bedrock) | 300 to 600ms | CRIS can add 50 to 100ms |
| Nova Pro (Bedrock) | 200 to 400ms | AWS-native, low latency |
| Nova Micro (Bedrock) | 100 to 200ms | Fastest Bedrock model |

Cold Starts on Bedrock: After extended inactivity (10+ minutes without requests), the first request can take 1 to 3 seconds longer. This primarily affects rarely used models. Provisioned Throughput completely eliminates cold starts.

OpenAI Rate Limits: OpenAI throttles by tier. Free tier: 3 RPM (requests per minute). Tier 5: 10,000 RPM. Production workloads require at least Tier 3.
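Regardless of tier, production clients should retry 429 responses with exponential backoff. The OpenAI SDK has built-in retries (the max_retries setting on the client); the sketch below just makes the schedule explicit:

```python
def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff delays in seconds, capped (jitter left out)."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]

print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]

# In practice: sleep between retries on HTTP 429 and add random jitter, e.g.
# for delay in backoff_delays(5):
#     try:
#         return client.chat.completions.create(...)
#     except openai.RateLimitError:
#         time.sleep(delay + random.uniform(0, 1))
```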

Streaming Behavior

Both APIs support streaming but with different granularity. OpenAI streams individual tokens. Bedrock streams in small chunks (typically 2 to 5 tokens), which yields slightly higher throughput at the cost of marginally longer gaps between chunks.

Terraform: Setting Up Bedrock Access

For teams integrating Bedrock into their existing AWS infrastructure, here is a Terraform example:

# IAM role for the application
resource "aws_iam_role" "bedrock_app" {
  name = "bedrock-app-role"
 
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}
 
# Bedrock permissions (least privilege)
resource "aws_iam_role_policy" "bedrock_invoke" {
  name = "bedrock-invoke"
  role = aws_iam_role.bedrock_app.id
 
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "BedrockInvoke"
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream"
        ]
        Resource = [
          "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-sonnet-4-6",
          "arn:aws:bedrock:eu-central-1::foundation-model/amazon.nova-pro-v1:0"
        ]
      }
    ]
  })
}
 
# VPC endpoint for private connectivity (optional)
resource "aws_vpc_endpoint" "bedrock_runtime" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.eu-central-1.bedrock-runtime"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.bedrock_endpoint.id]
  private_dns_enabled = true
}

With OpenAI, there is no infrastructure setup. You create an API key in the dashboard and store it as an environment variable or in a secrets manager. This is simpler but offers less control over network and access.

Decision Framework

Instead of a blanket recommendation, here are three typical scenarios:

Scenario 1: Startup Without AWS Infrastructure

Recommendation: OpenAI

You have no AWS account, no VPC, no IAM. You want to quickly build and validate a prototype. OpenAI's SDK is set up in 5 minutes, the documentation is excellent, and the community is large. GPT-4o mini offers good quality at low cost.

Scenario 2: Enterprise with AWS Stack and Compliance Requirements

Recommendation: Bedrock

Your company runs on AWS. Data must stay in the EU. You need VPC isolation, IAM integration, and audit logs. Bedrock integrates seamlessly into the existing infrastructure. No additional API keys, no external dependencies. Claude Sonnet 4.6 via Bedrock delivers comparable quality to GPT-4o.

Scenario 3: Hybrid Approach

Recommendation: OpenAI for Prototype, Bedrock for Production

Start with OpenAI for rapid iteration and validation. Once the use case is proven and production is on the horizon, migrate to Bedrock. The Converse API makes the switch easier since you only need to adjust client initialization and message format. The business logic stays the same.
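The format change itself is mechanical. Here is a sketch of a converter from OpenAI-style chat messages to the Converse shape, covering plain-text content only (multimodal parts would need extra mapping):

```python
def to_converse(messages: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split OpenAI-style chat messages into Converse (system, messages)."""
    system, converse_msgs = [], []
    for msg in messages:
        if msg["role"] == "system":
            # Converse takes system prompts as a separate top-level list
            system.append({"text": msg["content"]})
        else:
            converse_msgs.append({
                "role": msg["role"],
                "content": [{"text": msg["content"]}],
            })
    return system, converse_msgs

openai_style = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is Retrieval-Augmented Generation?"},
]
system, messages = to_converse(openai_style)
# client.converse(modelId=..., system=system, messages=messages, ...)
```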

Decision Criteria at a Glance

| Question | If yes | If no |
| --- | --- | --- |
| Must data stay in a specific EU region? | Bedrock | Either |
| Do you need VPC isolation? | Bedrock | Either |
| Do you already have AWS infrastructure? | Bedrock (easier integration) | OpenAI (faster start) |
| Do you need multiple model providers? | Bedrock | Either |
| Is developer experience the top priority? | OpenAI | Either |
| Do you need FedRAMP? | Bedrock | Either |
| Budget under $50/month? | Bedrock (Nova Micro) or OpenAI (GPT-4o mini) | Either |

Conclusion

AWS Bedrock and OpenAI are not direct competitors. Bedrock is a multi-model service with deep AWS integration, strong data privacy, and model diversity. OpenAI offers proprietary top-tier models with the best developer experience on the market.

The three most important decision criteria are:

  1. Data Privacy and Compliance: If you need specific EU regions, VPC isolation, or FedRAMP, there is no way around Bedrock. OpenAI's EU data residency is a good start but offers less granularity.

  2. Existing Infrastructure: If your stack runs on AWS, Bedrock integrates without additional credentials or network configuration. If you do not use AWS, OpenAI is the faster entry point.

  3. Model Flexibility: Bedrock gives you access to Claude, Llama, Mistral, Nova, and more. If a model is not optimal for your use case, you switch with one line of code. With OpenAI, you are bound to the OpenAI ecosystem.

If you are already working on the CI/CD pipeline for AI systems, the platform choice forms the foundation for your deployment architecture. In the next article, we go one step further: RAG infrastructure on AWS with GPU clusters, ECS, and Terraform. To learn how to systematically measure the quality of your RAG pipeline, check out the article on RAG Evaluation and Testing.


Are you evaluating LLM platforms for your company and need support with the architecture decision? Contact me for a no-obligation consultation.