AI RAG Architecture
Solution Components
Cloud Cost Estimator

| Resource | Est. Monthly Cost |
|---|---|
| Compute Resources | $15 |
| Database Storage | $25 |
| Load Balancer | $10 |
| CDN / Bandwidth | $5 |
| Total | $55 |

\* Estimates vary by provider & region
```mermaid
%% Autogenerated ai-rag-llm
graph TD
    classDef standard fill:#1e293b,stroke:#38bdf8,stroke-width:1px,color:#e5e7eb;
    classDef c-actor fill:#1e293b,stroke:#e5e7eb,stroke-width:1px,stroke-dasharray: 5 5,color:#e5e7eb;
    classDef c-compute fill:#422006,stroke:#fb923c,stroke-width:1px,color:#fed7aa;
    classDef c-database fill:#064e3b,stroke:#34d399,stroke-width:1px,color:#d1fae5;
    classDef c-network fill:#2e1065,stroke:#a855f7,stroke-width:1px,color:#f3e8ff;
    classDef c-storage fill:#450a0a,stroke:#f87171,stroke-width:1px,color:#fee2e2;
    classDef c-security fill:#450a0a,stroke:#f87171,stroke-width:1px,color:#fee2e2;
    classDef c-gateway fill:#2e1065,stroke:#a855f7,stroke-width:1px,color:#f3e8ff;
    subgraph inference ["Inference Pipeline"]
        direction TB
        query_api["Query API<br/>REST/GraphQL endpoint"]
        class query_api c-network
        retriever["Retrieval Service<br/>Semantic search"]
        class retriever c-compute
        llm_service["LLM Service<br/>OpenAI / Anthropic API"]
        class llm_service c-compute
    end
    subgraph data_pipeline ["Data Pipeline"]
        direction TB
        ingestion_pipeline["Document Ingestion<br/>Parse, chunk, embed"]
        class ingestion_pipeline c-compute
        embedding_service["Embedding Service<br/>Text → Vectors"]
        class embedding_service c-compute
        doc_storage["Document Storage<br/>S3 / Blob Storage"]
        class doc_storage c-database
        vector_db["Vector Database<br/>Pinecone / Weaviate"]
        class vector_db c-database
    end
    %% Orphans
    users["Users<br/>End users querying AI"]
    class users c-actor
    %% Edges
    users -.-> query_api
    query_api -.-> retriever
    query_api -.-> llm_service
    retriever -.-> vector_db
    ingestion_pipeline -.-> embedding_service
    ingestion_pipeline -.-> vector_db
```
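The "Parse, chunk, embed" step in the Document Ingestion component can be sketched as fixed-size chunking with overlap, so that context is not lost at chunk boundaries. This is a minimal illustration, not the pipeline's actual implementation: the chunk size and overlap values are arbitrary, and a real system would pass each chunk to an embedding model rather than stop at the split.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks ready for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # last window already covered the end of the text
    return chunks

doc = "RAG grounds LLM answers in your own documents. " * 20
chunks = chunk_text(doc)
```

Each chunk would then be embedded and upserted into the vector database alongside a reference back to its source document.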
AI RAG with LLM
RAG (Retrieval-Augmented Generation) combines a vector database with a Large Language Model (LLM) to produce accurate, context-aware AI responses grounded in your own data.
Documents are embedded and stored in the vector database; at query time, chunks are retrieved by semantic similarity and used to augment the LLM prompt with relevant context, reducing hallucinations and improving accuracy.
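The retrieve-then-augment step can be sketched in a few lines. This is a toy, in-memory version only: the bag-of-words "embedding" stands in for a real embedding model, the three documents stand in for the vector database, and the prompt template is illustrative, not the one this architecture prescribes.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system calls an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# In-memory stand-in for the vector database.
documents = [
    "Invoices are due within 30 days of receipt.",
    "The API rate limit is 100 requests per minute.",
    "Employees accrue 1.5 vacation days per month.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """Augment the LLM prompt with the retrieved context."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the API rate limit?")
```

The resulting `prompt` is what the LLM Service would send to the model; grounding the answer in retrieved context is what keeps responses tied to your own data.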
Tech Stack
| Component | Technology |
|---|---|
| LLM | OpenAI / Anthropic |
| Vector DB | Pinecone / Weaviate |
| Embeddings | OpenAI text-embedding-ada-002 |
| Orchestration | LangChain |