Laravel 13 AI SDK in Practice: A Real RAG Pipeline, Not a Hello World

In February 2026, the Laravel team released the first-party AI SDK as a public beta. With Laravel 13 reaching stable on 17 March 2026, the SDK shipped alongside the framework. Eight weeks later, I deployed it in a real SaaS project, a RAG-as-a-service platform for mid-sized knowledge bases. This article is the field report.

Until that release, Python with LangChain or LlamaIndex was the de facto standard for RAG systems. Laravel teams that needed AI either stood up a parallel Python stack or wrote custom HTTP clients against OpenAI. Both meant duplicated infrastructure, duplicated deployments, duplicated monitoring. The new SDK changes that math fundamentally.

The core claim: the SDK solves 60 to 70 percent of RAG infrastructure. The remaining 30 to 40 percent is business logic, and that is exactly where it should live.

Disclosure: The SaaS project referenced in this article is my own pre-MVP venture. All code snippets come from this codebase, with tenant identifiers and company specifics removed. Version pin at build time: PHP 8.3, Laravel 13.x, laravel/ai 0.4.4. The SDK has continued to evolve, and several of the gaps discussed below are already closed in 0.6.x.

Why Laravel and AI Even Became a Question

Through 2025, the answer to "we are building a RAG system" was almost always Python. LangChain, LlamaIndex, llama-index-server, Haystack: the ML ecosystem had clearly settled in Python. For Laravel teams this meant running a Python service alongside the main application, with its own deployment pipeline, monitoring, and scaling story. The friction was high enough that many teams avoided the work and argued "we do not need RAG".

What Laravel teams already have: Horizon for queues, Eloquent as ORM, Sanctum for auth, Livewire for dashboards, Cashier for billing. Everything a RAG application needs as infrastructure, except the AI part itself. Splitting into two stacks (PHP app plus Python AI service) meant standing up the same infrastructure twice, only because the AI lived in a different language.

The new SDK integrates with this infrastructure instead of running next to it. That is the strategic shift, not the individual features.

The thesis: if a team knows Laravel, Laravel is the faster choice for 80 percent of RAG use cases. Not because Python is worse, but because infrastructure consistency is worth more than framework nuance. The build vs. buy logic applies at stack level just as at feature level.

The RAG Pipeline at a Glance

The application is a RAG-as-a-service platform. Tenants upload documents, define projects, ask questions, and get answers with source citations. Multi-tenant from day one, because that is the typical B2B SaaS requirement.

The pipeline has six stages. The first three run at ingestion time as an asynchronous job chain, the last three at query time:

Text Extraction: convert PDF, DOCX, HTML to unified text
Chunking: split text into semantically meaningful pieces
Embedding Generation: compute vector representations per chunk, batched
Similarity Search: pgvector cosine distance against the query embedding
Answer Generation: agent with streaming via Server-Sent Events
Verification: optional second agent that validates the answer against the sources

Architectural decisions made early that have paid off:

Decision	Rationale
Actions/Agents/Services pattern instead of fat controllers	Testability, reuse, clear ownership
Job chain for asynchronous document processing	Extract, chunk, embed are individually retryable, no 30-minute jobs
Single-database multi-tenancy via stancl/tenancy	One DB server, isolation via global scopes, simpler operations
Dedicated action for vector search	Encapsulation, easy migration to newer SDK features
RagAgent as PHP class with Promptable trait	Type-safe, mockable, usable in Pest tests

What the SDK Does Well

Three examples from production code that show what the SDK solves at the infrastructure layer.

Embeddings as a One-Liner

Before the SDK, generating embeddings meant: HTTP client, error handling, retry logic for rate limits, batching strategy, caching. With the SDK, it is one call:

// app/Services/Embedding/EmbeddingService.php
namespace App\Services\Embedding;
 
use Laravel\Ai\Embeddings;
 
class EmbeddingService
{
    private string $model;
    private int $dimensions;
    private int $batchSize;
 
    public function __construct()
    {
        $this->model = (string) config('embedding.model', 'text-embedding-3-small');
        $this->dimensions = (int) config('embedding.dimensions', 1536);
        $this->batchSize = (int) config('embedding.batch_size', 100);
    }
 
    /**
     * @param  string[]  $texts
     * @return array<int, array<int, float>>
     */
    public function embed(array $texts): array
    {
        if (empty($texts)) {
            return [];
        }
 
        $allEmbeddings = [];
 
        foreach (array_chunk($texts, $this->batchSize) as $batch) {
            $response = Embeddings::for(array_values($batch))
                ->dimensions($this->dimensions)
                ->generate(model: $this->model);
 
            foreach ($response->embeddings as $embedding) {
                $allEmbeddings[] = $embedding;
            }
        }
 
        return $allEmbeddings;
    }
}

What the SDK does internally: provider abstraction (OpenAI, Anthropic, Bedrock, Ollama, more), automatic caching, retry logic for rate limits, unified response format. What stays domain logic by design: application-level batching and the choice of embedding model. The split is clean.
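Because batching stays in application code, it can be tested without touching a provider. The following is a minimal sketch, not code from the project: embedInBatches() and the injected $embedBatch callable are hypothetical names, and the callable merely stands in for the SDK call shown above.

```php
<?php

declare(strict_types=1);

/**
 * Embed texts in fixed-size batches while keeping output order aligned with input order.
 *
 * @param  string[]  $texts
 * @param  callable(string[]): array<int, array<int, float>>  $embedBatch  Hypothetical stand-in for the provider call.
 * @return array<int, array<int, float>>
 */
function embedInBatches(array $texts, int $batchSize, callable $embedBatch): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $all = [];

    foreach (array_chunk(array_values($texts), $batchSize) as $batch) {
        $vectors = array_values($embedBatch($batch));

        // Guard against misaligned results: one vector per input text is expected.
        if (count($vectors) !== count($batch)) {
            throw new RuntimeException('Provider returned a different number of vectors than texts.');
        }

        foreach ($vectors as $vector) {
            $all[] = $vector;
        }
    }

    return $all;
}
```

With a fake callable that counts its invocations, a test can assert both the number of provider calls (five texts at batch size two means three calls) and that the vectors come back in input order.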

Agent Interface with the Promptable Trait

An agent is a PHP class with a system prompt and provider options. No runtime composition of dictionary configs as in LangChain. No implicit bindings. A normal class:

// app/Agents/RagAgent.php
namespace App\Agents;
 
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasProviderOptions;
use Laravel\Ai\Enums\Lab;
use Laravel\Ai\Promptable;
 
class RagAgent implements Agent, HasProviderOptions
{
    use Promptable;
 
    public function __construct(
        private string $systemPrompt,
        private float $temperature = 0.1,
    ) {}
 
    public function instructions(): string
    {
        return $this->systemPrompt;
    }
 
    public function providerOptions(Lab|string $provider): array
    {
        return [
            'temperature' => $this->temperature,
        ];
    }
}

Why this is better than a LangChain chain:

Dependency Injection: The agent is a normal class. Pest tests instantiate it with mock data, no container setup required.
Type-safe: PHP 8.3 types are statically checkable with PHPStan or Larastan. Wrong provider constants surface during static analysis, not at runtime.
Explicit: The HasProviderOptions interface makes provider-specific behavior visible, not buried in a configuration hash.

The Promptable trait provides the prompt() method, which internally selects the provider, calls the model, and returns a structured response. From the application's perspective, it is a method call.

Streaming via Server-Sent Events

Streaming is UX-critical in RAG applications. No one waits ten seconds for a fully generated answer, but everyone accepts ten seconds of token-by-token output. Before the SDK, this meant writing a custom event parser, buffer logic, and cancel handling. With the SDK, the StreamableAgentResponse yields structured TextDelta events that pass straight to SSE.

// Simplified excerpt from the streaming endpoint
return response()->stream(function () use ($agent, $userMessage, $model) {
    foreach ($agent->stream($userMessage, model: $model) as $event) {
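        if ($event instanceof TextDelta) {
            echo "data: " . json_encode(['delta' => $event->text]) . "\n\n";
            // ob_flush() raises a notice when no output buffer is active, so guard it
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush();
        }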
    }
 
    echo "event: done\ndata: {}\n\n";
}, 200, [
    'Content-Type' => 'text/event-stream',
    'Cache-Control' => 'no-cache',
    'X-Accel-Buffering' => 'no',
]);

About 30 lines for full streaming, including header setup for Nginx buffering. Previously this was several hundred lines, rewritten in nearly every other application.
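The wire format behind those echo statements is small enough to isolate and test. Here is a minimal sketch of a frame formatter, assuming JSON payloads as in the excerpt above; sseFrame() is a hypothetical helper name, not part of the SDK or the project.

```php
<?php

declare(strict_types=1);

/**
 * Format one Server-Sent Events frame: optional "event:" line, one "data:" line, blank line.
 *
 * @param  array<string, mixed>  $payload
 */
function sseFrame(array $payload, ?string $event = null): string
{
    if ($event !== null && preg_match('/[\r\n]/', $event) === 1) {
        throw new InvalidArgumentException('Event names must not contain line breaks.');
    }

    // JSON escapes raw newlines inside strings, so the payload always fits on a single data: line.
    // JSON_THROW_ON_ERROR fails loudly (e.g. on invalid UTF-8) instead of emitting an empty frame.
    $json = json_encode($payload, JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE);

    $frame = $event !== null ? 'event: ' . $event . "\n" : '';

    return $frame . 'data: ' . $json . "\n\n";
}
```

Keeping the formatting in one pure function means the streaming closure only decides what to send, and the framing rules (single data line, terminating blank line) can be covered by a unit test.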

Where the SDK Has Gaps, or Had Them

No SDK is perfect, and first-party SDKs in their early maturity phase even less so. Three places where I wrote custom implementations during the build, with the honest note that the SDK is evolving fast.

Vector Similarity Search in the Beta Phase

In version 0.4.x, which the codebase was built against, there was no integrated query builder for vector operations. The solution was raw SQL against pgvector:

// app/Actions/SearchSimilarChunks.php
namespace App\Actions;
 
use App\Models\Project;
use Illuminate\Support\Facades\DB;
 
class SearchSimilarChunks
{
    /**
     * Cosine similarity search against project-scoped chunks.
     *
     * @param  array<int, float>  $queryEmbedding
     * @return array<int, object>
     */
    public function execute(Project $project, array $queryEmbedding, int $topK, float $similarityThreshold): array
    {
        if (empty($queryEmbedding)) {
            return [];
        }
 
        $embeddingVector = '['.implode(',', $queryEmbedding).']';
 
        return DB::select('
            SELECT c.id, c.content, c.metadata, c.position, c.document_id,
                   d.original_filename,
                   1 - (c.embedding <=> ?::vector) AS similarity
            FROM chunks c
            JOIN documents d ON d.id = c.document_id
            WHERE c.project_id = ?
              AND c.embedding IS NOT NULL
              AND 1 - (c.embedding <=> ?::vector) >= ?
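            -- Order by the distance operator itself (ascending): pgvector only uses
            -- the HNSW index for this form, not for ORDER BY on a derived expression.
            ORDER BY c.embedding <=> ?::vector
            LIMIT ?
        ', [$embeddingVector, $project->id, $embeddingVector, $similarityThreshold, $embeddingVector, $topK]);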
    }
}

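The <=> operator is pgvector's cosine distance, which ranges from 0 to 2. 1 - distance yields the cosine similarity, which ranges from -1 to 1 and typically falls between zero and one for text embeddings. The migration for the table: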

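// Assumes the pgvector extension may not be enabled yet; harmless if it already is
DB::statement('CREATE EXTENSION IF NOT EXISTS vector');
DB::statement('ALTER TABLE chunks ADD COLUMN embedding vector(1536)');
DB::statement('CREATE INDEX chunks_embedding_hnsw_idx ON chunks USING hnsw (embedding vector_cosine_ops)');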

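HNSW is the state-of-the-art index for approximate nearest-neighbor search in pgvector. With default parameters, it can return results in under ten milliseconds at single-digit-million vectors, though actual latency depends on hardware, vector dimensions, and how selective the accompanying filters are.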
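The relationship between pgvector's cosine distance and the similarity value used as a threshold in the query can be checked in plain PHP. This is an illustrative sketch with hypothetical function names, useful mainly for unit tests and for sanity-checking thresholds, not a replacement for the database operator.

```php
<?php

declare(strict_types=1);

/**
 * Cosine similarity of two equal-length vectors, in the range [-1, 1].
 *
 * @param  array<int, float>  $a
 * @param  array<int, float>  $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA === 0.0 || $normB === 0.0) {
        throw new InvalidArgumentException('Cosine similarity is undefined for zero vectors.');
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Cosine distance, the quantity pgvector's <=> operator computes: 1 - similarity, in [0, 2].
 *
 * @param  array<int, float>  $a
 * @param  array<int, float>  $b
 */
function cosineDistance(array $a, array $b): float
{
    return 1.0 - cosineSimilarity($a, $b);
}
```

Identical directions give a distance of 0, orthogonal vectors a distance of 1, and opposite directions a distance of 2, which is why a similarity threshold such as 0.7 corresponds to a distance of at most 0.3.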

What has changed since: with laravel/ai 0.5 and especially 0.6, the whereVectorSimilarTo() method is part of the SDK. A direct equivalent of the SQL above now looks like this:

$chunks = Chunk::query()
    ->where('project_id', $project->id)
    ->whereVectorSimilarTo('embedding', $queryEmbedding, minSimilarity: $similarityThreshold)
    ->limit($topK)
    ->get();

The original action class stays in the backlog for migration. That migration is exactly the point: because vector search was encapsulated in a dedicated action, not in a model scope, the switch to the native SDK method is an isolated refactoring task, not a sweeping rework. Lesson learned: with first-party SDKs in their early phase, expect features to be added later. Cleanly encapsulated abstractions turn migration into routine work, not drama.

Chunking Strategies Are Deliberately Missing

Chunking is the single biggest quality factor in a RAG pipeline. A contract needs to be split differently than a technical wiki article, and differently again than a sales PDF. The SDK stays domain-agnostic here, which is the right call. Forcing chunking into the SDK would tie it to an assumption that rarely fits.

The codebase uses a custom strategy pattern:

// app/Services/Chunking/ChunkingService.php
class ChunkingService
{
    private array $strategies = [];
 
    public function __construct()
    {
        $this->registerStrategy(new SemanticChunkingStrategy);
        $this->registerStrategy(new ParagraphChunkingStrategy);
        $this->registerStrategy(new FixedSizeChunkingStrategy);
    }
 
    // registerStrategy() and getStrategy() are omitted here for brevity.
 
    public function chunkText(
        string $text,
        string $strategy = 'semantic',
        ?int $chunkSize = null,
        ?int $overlap = null,
    ): array {
        $chunkSize ??= (int) config('chunking.default_chunk_size', 512);
        $overlap ??= (int) config('chunking.default_overlap', 50);
 
        return $this->getStrategy($strategy)->chunk($text, $chunkSize, $overlap);
    }
}

Three strategies cover most use cases. SemanticChunkingStrategy splits on sentence and paragraph boundaries while honoring a token limit. ParagraphChunkingStrategy uses double newlines as natural separators. FixedSizeChunkingStrategy enforces a fixed token count with configurable overlap, useful for homogeneous text without natural structure.

Choosing a strategy per project is a tenant decision, not a framework opinion. That is exactly the boundary the SDK draws correctly.
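To make the fixed-size-with-overlap idea concrete, here is a minimal sketch of one strategy. It is not the project's implementation: the ChunkingStrategy interface and the WordWindowStrategy class are hypothetical, and whitespace-separated words stand in for tokens, which a real tokenizer would count differently.

```php
<?php

declare(strict_types=1);

// Hypothetical contract; the real interface is not shown in this article.
interface ChunkingStrategy
{
    public function name(): string;

    /** @return string[] */
    public function chunk(string $text, int $chunkSize, int $overlap): array;
}

// Fixed-size windows with overlap. Assumption: one whitespace-separated word counts as one token.
final class WordWindowStrategy implements ChunkingStrategy
{
    public function name(): string
    {
        return 'word-window';
    }

    public function chunk(string $text, int $chunkSize, int $overlap): array
    {
        if ($chunkSize < 1 || $overlap < 0 || $overlap >= $chunkSize) {
            throw new InvalidArgumentException('Require chunkSize >= 1 and 0 <= overlap < chunkSize.');
        }

        $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
        if ($words === false) {
            throw new InvalidArgumentException('Text is not valid UTF-8.');
        }

        $total = count($words);
        $step = $chunkSize - $overlap; // each window starts this many words after the previous one
        $chunks = [];

        for ($start = 0; $start < $total; $start += $step) {
            $chunks[] = implode(' ', array_slice($words, $start, $chunkSize));

            // Stop once a window reaches the end, so no trailing chunk consists only of overlap.
            if ($start + $chunkSize >= $total) {
                break;
            }
        }

        return $chunks;
    }
}
```

With a chunk size of 3 and an overlap of 1, the text "a b c d e f g" becomes "a b c", "c d e", "e f g": each chunk repeats the last word of its predecessor, which is the property that keeps sentences spanning a boundary retrievable.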

Hallucination Check as a Dedicated Pipeline Step

Verification is a quality requirement, not a framework feature. Every project sets its own threshold. The codebase uses a separate VerificationAgent that checks, after answer generation, whether the answer is supported by the sources:

// app/Agents/VerificationAgent.php
namespace App\Agents;
 
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasProviderOptions;
use Laravel\Ai\Enums\Lab;
use Laravel\Ai\Promptable;
 
class VerificationAgent implements Agent, HasProviderOptions
{
    use Promptable;
 
    public function __construct(
        private float $verificationTemperature = 0.0,
    ) {}
 
    public function instructions(): string
    {
        return 'You are a verification agent. Your job is to check whether a given answer is accurately supported by the provided source excerpts. '
            .'Analyze the answer against the sources and respond with EXACTLY one of these verdicts on the first line: VERIFIED, PARTIALLY_VERIFIED, or COULD_NOT_VERIFY. '
            .'On the second line, provide a brief explanation (1-2 sentences) of your reasoning.';
    }
 
    public function providerOptions(Lab|string $provider): array
    {
        return ['temperature' => $this->verificationTemperature];
    }
}

Three verdicts: VERIFIED, PARTIALLY_VERIFIED, COULD_NOT_VERIFY. Temperature 0.0, because this is a classification, not a creative task. The agent runs as a second call after the actual answer, can be enabled per project, and its verdict is stored in the database for downstream analysis. How this quality check is organized systematically is a topic of its own.
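Since the agent answers in free text, the calling code still has to turn the first line into something typed before storing it. A minimal sketch of that step, assuming the two-line response format from the prompt above; the Verdict enum and parseVerification() are hypothetical names, and treating unexpected output as COULD_NOT_VERIFY is my assumption of a sensible fail-closed default.

```php
<?php

declare(strict_types=1);

enum Verdict: string
{
    case Verified = 'VERIFIED';
    case PartiallyVerified = 'PARTIALLY_VERIFIED';
    case CouldNotVerify = 'COULD_NOT_VERIFY';
}

/**
 * Parse the agent's raw text into a typed verdict plus explanation.
 *
 * @return array{verdict: Verdict, explanation: string}
 */
function parseVerification(string $response): array
{
    // Split into at most two parts: the verdict line and everything after it.
    $lines = preg_split('/\R/', trim($response), 2) ?: [''];
    $firstLine = strtoupper(trim($lines[0]));

    // Fail closed: anything that is not exactly one of the three labels counts as unverified.
    $verdict = Verdict::tryFrom($firstLine) ?? Verdict::CouldNotVerify;

    return [
        'verdict' => $verdict,
        'explanation' => trim($lines[1] ?? ''),
    ];
}
```

Models occasionally ignore format instructions, so the fallback matters more than the happy path: a chatty reply such as "Sure! The answer looks fine." is stored as COULD_NOT_VERIFY rather than silently passing.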

The Section's Core Takeaway

The SDK solves 60 to 70 percent of the infrastructure. The remaining 30 to 40 percent is business logic, and that is exactly where it should live. An SDK that does everything is in fact a framework. As soon as requirements get specific, that framework becomes the wrong abstraction layer.

Multi-Tenancy and RAG, the Underestimated Complexity

Multi-tenant RAG is significantly harder than multi-tenant CRUD, for three reasons. Embeddings are expensive assets (OpenAI cost per document), query caches must be invalidated per tenant, and vector search can break tenant boundaries when the WHERE clauses are missing.

The architectural decision in the codebase: single-database multi-tenancy via stancl/tenancy with empty bootstrappers. No per-tenant database, but isolation through global scopes on a shared database. This significantly simplifies backups, monitoring, and schema migrations.

The BelongsToTenant trait registers a global scope that automatically appends WHERE tenant_id = ? to every query. Models using this trait can never accidentally return data from another tenant, as long as the scope is not explicitly disabled.

The chunks table is an exception. Chunks have no direct tenant_id column because isolation comes from the parent chain Tenant → Project → Document → Chunk. Every chunk query goes via project_id, which is the first WHERE condition in the SQL example above. That is only safe as long as the project itself was loaded through the tenant scope. The asymmetry is intentional, because a duplicate tenant_id index would cost extra storage and maintenance without any security gain.

Per-Project Cache Invalidation

Query results are cached because RAG calls are expensive. But every document update potentially makes all cached answers for that project wrong. The solution: a version counter per project that flows into cache keys.

A DocumentObserver increments the counter on every document change. Cache keys follow the format query:{project_id}:{version}:{hash(question)}. A change bumps the version, so new lookups never hit the old keys, which simply expire via their TTL. No explicit invalidation, no risk of forgetting a key.
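The key format can be sketched as a small pure function. This is an illustration under assumptions, not the project's code: queryCacheKey() is a hypothetical name, SHA-256 is my choice of hash, and collapsing whitespace before hashing is an optional normalization the original does not mention.

```php
<?php

declare(strict_types=1);

/**
 * Build a versioned cache key of the form query:{project_id}:{version}:{hash(question)}.
 */
function queryCacheKey(int|string $projectId, int $version, string $question): string
{
    // Assumption: collapse runs of whitespace so trivially different inputs share one key.
    $normalized = trim(preg_replace('/\s+/', ' ', $question) ?? $question);

    return sprintf('query:%s:%d:%s', $projectId, $version, hash('sha256', $normalized));
}
```

Bumping the version from 7 to 8 yields a different key for the same question, so a lookup after a document change is a cache miss by construction, without deleting anything.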

Queue Jobs Lose Tenant Context

Serializing a job loses the current tenant context. Anyone missing this gets subtle bugs: with single-database tenancy and empty bootstrappers, the job runs without an initialized tenant, so tenant scopes are not applied as expected and queries can come back empty or touch rows that belong to other tenants.

The fix sits in every handle():

public function handle(): void
{
    tenancy()->initialize($this->tenant);
 
    // Actual job logic, which now has tenant context
}

In tests, this is verified explicitly with Model::withoutGlobalScope(TenantScope::class) and an assertEmpty() assertion. Cross-tenant leakage is one of the most dangerous bug classes in multi-tenant systems, because it rarely surfaces, but in a GDPR context the consequences are severe.

Numbers and an Honest Balance

The stack in numbers, as built:

Component	Version
PHP	8.3
Laravel Framework	13.x
laravel/ai	0.4.4 (build state, current is 0.6.x)
PostgreSQL	17 with pgvector 0.8.x
HNSW index	m=16, ef_construction=64
Embedding model	text-embedding-3-small, 1536 dimensions, batch 100
Completion model	gpt-4.1-mini, temperature 0.1
Queue	Horizon with Redis, 3 retries, 60s exponential backoff
Cache	Redis, 24h TTL, version-counter invalidation
Multi-tenancy	stancl/tenancy 3.10, single-database

What the SDK Saved in Code

Estimated line counts compared to a custom implementation with the OpenAI PHP client and homegrown infrastructure:

Area	Estimated LOC without the SDK
Provider abstraction (OpenAI, Anthropic, Bedrock)	200
Streaming implementation (SSE, buffering, cancel)	150
Agent tool calling and structured output	300
Retry and error handling	100
Total	~750 LOC

That is infrastructure code that, without the SDK, would be rewritten in every AI project, with the well-known costs: maintenance, tests, bug fixes for years.

What Still Had to Be Written

Area	LOC including tests
Chunking strategies (3 strategies plus service)	~400
Vector search with raw SQL (at build time)	~100
Verification agent	~150
Multi-tenancy scoping and test helpers	~250
Cache invalidation via observer	~80
Total	~980 LOC

These numbers are the reality: the SDK saves infrastructure, not domain logic. Anyone planning to ship should expect roughly 1000 LOC of own code on top of the SDK.

Honest Caveat

A working pre-MVP is not the same as "production-ready". The referenced codebase is pre-MVP: it works, has tests, and runs in a staging environment, but it is not proof that the SDK runs unchanged in a high-load B2B platform with a hundred tenants. Anyone adopting now should plan for pre-1.0 maturity: own abstractions that can be swapped for native SDK features later, clear version pinning, and a willingness to refactor.

Conclusion and Recommendation

The Laravel AI SDK is the strategically most important shift in the PHP ecosystem for AI applications in years. It does not solve everything, but it solves enough to establish Laravel as a serious stack for RAG systems. Who it fits, who it does not:

Fits when: the team already has a productive Laravel stack, AI features are added to an existing application, and a parallel Python stack should be avoided. That describes the majority of mid-sized Laravel projects with AI needs.

Python and LangChain remain better when: the project has a research phase, experimentation with bleeding-edge models is required, the team already has Python ML experience, or specialized ML libraries (Transformers, LangGraph with complex agent hierarchies) are core. That is also a valid choice, not an ideological question.

The bigger lesson: "Buy the infrastructure, build the domain." The SDK is good infrastructure. The domain knowledge, meaning chunking, verification, prompting, multi-tenancy, remains the developer's work. And that is right. An SDK that forces domain knowledge becomes the wrong abstraction layer the moment requirements get real.

RAG systems are no longer a Python monopoly. For Laravel teams, there is now a viable path, with gaps, but with a clear direction. Earlier steps in this ecosystem showed the demand; this SDK delivers the answer.

Unsure whether Laravel or Python is the right stack for your AI project? Contact me for an IT strategy check that answers the architectural question in two to three days.