LangChain and LlamaIndex


🧠 LangChain: Simple Overview

LangChain is a powerful open-source framework that helps you build applications using Large Language Models (LLMs) like GPT-4 by connecting them with external data, tools, and memory. Think of it as the orchestrator that turns a language model into a full-fledged intelligent agent.


🔧 What Does LangChain Do?

  • 🔍 Retrieval: connects LLMs to vector databases (e.g., Pinecone, FAISS) for RAG pipelines
  • 🧠 Memory: adds short-term or long-term memory to chatbots
  • 🛠️ Tools: lets LLMs use tools like search engines, calculators, or APIs
  • 📚 Chains: combines multiple steps (e.g., retrieval → generation → summarization) into a workflow
  • 🧩 Agents: enables LLMs to reason and decide which tools to use dynamically

🧪 Simple LangChain Example (RAG)

from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI

# Load vector store
embedding = OpenAIEmbeddings()
vectorstore = Chroma(persist_directory="db", embedding_function=embedding)

# Create RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vectorstore.as_retriever()
)

# Ask a question
query = "What is a vector database?"
result = qa_chain.run(query)
print(result)

🧱 Core Building Blocks

  • LLMs: GPT-4, Claude, LLaMA, etc.
  • Embeddings: convert text into vectors
  • Vector Stores: store and retrieve documents semantically
  • Chains: combine multiple steps (e.g., input → retrieval → generation)
  • Agents: let LLMs choose tools dynamically
  • Memory: maintain context across conversations

🚀 What Can You Build with LangChain?

  • RAG-powered chatbots
  • AI research assistants
  • Document Q&A systems
  • Code interpreters
  • Workflow automation tools


Benefits of Using LangChain
There are several benefits of using LangChain to build applications powered by LLMs. These benefits include:

  • Ease of use: LangChain makes it easier to use LLMs to build a variety of applications, even if the developer does not have any experience with artificial intelligence (AI) or machine learning
  • Flexibility: LangChain is a flexible framework that can be used to build a wide variety of applications. Developers are not limited to any specific use case
  • Scalability: LangChain is scalable to support applications of all sizes. Developers can use LangChain to build applications that serve millions of users
  • Robustness: LangChain provides several features that make it easier to build robust and reliable applications. For example, LangChain supports caching and error handling

🧱 LangChain Core Components: A Modular Breakdown

LangChain is built around a set of modular components that you can mix and match to build powerful LLM-based applications. Here's a breakdown of the key components and what each one does:


🔹 1. LLMs (Large Language Models)

Purpose: Interface with models like GPT-4, Claude, LLaMA, etc.
Examples:

  • OpenAI(), ChatOpenAI()
  • HuggingFaceHub(), Anthropic()

🔹 2. Prompt Templates

Purpose: Standardize and structure prompts for LLMs.
Types:

  • PromptTemplate → for single-turn prompts
  • ChatPromptTemplate → for multi-turn chat-style prompts

📌 Example:

from langchain.prompts import PromptTemplate
prompt = PromptTemplate.from_template("Translate '{text}' to French.")

🔹 3. Chains

Purpose: Combine multiple components into a pipeline.
Types:

  • LLMChain → prompt + LLM
  • RetrievalQA → retriever + LLM
  • SequentialChain, SimpleSequentialChain → multi-step workflows

📌 Example:

from langchain.chains import LLMChain
chain = LLMChain(llm=OpenAI(), prompt=prompt)

🔹 4. Memory

Purpose: Store and recall previous interactions (for chatbots).
Types:

  • ConversationBufferMemory
  • ConversationSummaryMemory
  • VectorStoreRetrieverMemory

📌 Example:

from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()

🔹 5. Agents

Purpose: Let LLMs choose tools dynamically to solve tasks.
Includes:

  • Tool use (e.g., calculator, search)
  • Planning and decision-making
  • ReAct-style prompting

📌 Example:

from langchain.agents import initialize_agent
# `tools` is a list of Tool objects (see the Tools section below)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description")

🔹 6. Tools

Purpose: External functions the agent can call.
Examples:

  • Web search
  • Calculator
  • Python REPL
  • Custom APIs

🔹 7. Retrievers

Purpose: Fetch relevant documents from a vector store.
Examples:

  • Backed by vector stores such as Chroma, FAISS, Pinecone, or Weaviate
  • Used in RAG pipelines

📌 Example:

retriever = vectorstore.as_retriever()

🔹 8. Document Loaders & Text Splitters

Purpose: Load and chunk documents for embedding and retrieval.
Examples:

  • TextLoader, PDFLoader, UnstructuredLoader
  • CharacterTextSplitter, RecursiveCharacterTextSplitter

🔹 9. Embeddings

Purpose: Convert text into dense vectors for semantic search.
Examples:

  • OpenAIEmbeddings, HuggingFaceEmbeddings, CohereEmbeddings

🔹 10. Vector Stores

Purpose: Store and retrieve embeddings.
Examples:

  • Chroma, FAISS, Pinecone, Qdrant, Weaviate


🔄 LangChain Model I/O: Inputs, Outputs, and Interfaces

LangChain’s Model I/O module is all about how you interact with language models—how you structure inputs, manage outputs, and control the flow of information between components like prompts, LLMs, and chains.

Let’s break it down:


🔹 1. Prompt Templates

Purpose: Structure and format inputs to LLMs.

Types:

  • PromptTemplate: for single-turn prompts.
  • ChatPromptTemplate: for multi-turn chat-style prompts.

📌 Example:

from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Translate '{text}' to French.")
formatted = prompt.format(text="Hello")
# Output: "Translate 'Hello' to French."

🔹 2. LLMs and Chat Models

Purpose: Interface with language models.

Types:

  • LLM: for text completion models (e.g., GPT-3).
  • ChatModel: for chat-based models (e.g., GPT-4, Claude).

📌 Example:

from langchain.llms import OpenAI
llm = OpenAI()
response = llm("What is LangChain?")

🔹 3. Output Parsers

Purpose: Convert raw LLM output into structured formats (JSON, lists, etc.).

Types:

  • StrOutputParser: returns plain strings.
  • CommaSeparatedListOutputParser: parses comma-separated values.
  • PydanticOutputParser: parses into Pydantic models.

📌 Example:

from langchain.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()
parsed = parser.parse("apples, bananas, oranges")
# Output: ['apples', 'bananas', 'oranges']

🔹 4. Output Fixing Parsers

Purpose: Automatically fix malformed outputs using LLMs.

📌 Example:

from langchain.output_parsers import OutputFixingParser

parser = OutputFixingParser.from_llm(parser=parser, llm=llm)

🔹 5. PromptValue and LLMResult

Purpose: Internal representations of inputs and outputs.

  • PromptValue: Encapsulates a formatted prompt.
  • LLMResult: Encapsulates raw output from an LLM call.

These are mostly used under the hood but are important for advanced customization.


🔹 6. Runnable Interfaces

LangChain uses a unified interface called Runnable for all components (prompts, LLMs, chains).

📌 Example:

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

prompt = PromptTemplate.from_template("Tell me a joke about {topic}")
llm = OpenAI()

chain = prompt | llm  # Runnable composition
print(chain.invoke({"topic": "AI"}))

🧠 Summary Table

  • PromptTemplate: formats input text
  • LLM / ChatModel: generates output
  • OutputParser: parses or validates output
  • Runnable: composable interface for chaining steps


🧠 LangChain PromptTemplate: Structure Your Prompts Like a Pro

In LangChain, a PromptTemplate is a reusable, parameterized prompt that helps you format inputs for LLMs in a clean and consistent way. It’s one of the most fundamental building blocks in any LangChain application.


🔹 Why Use PromptTemplate?

✅ Avoid hardcoding prompts
✅ Reuse and customize prompts dynamically
✅ Maintain clean separation between logic and prompt text
✅ Combine with chains, agents, and tools


🔧 Basic Usage

from langchain.prompts import PromptTemplate

# Define a template with placeholders
template = "Translate the following sentence to French: {sentence}"

# Create a PromptTemplate object
prompt = PromptTemplate.from_template(template)

# Format the prompt with actual input
formatted_prompt = prompt.format(sentence="I love machine learning.")
print(formatted_prompt)

📤 Output:

Translate the following sentence to French: I love machine learning.

🔹 Advanced Usage with Multiple Variables

template = """
You are a helpful assistant.
Summarize the following article in {num_words} words:

{article}
"""

prompt = PromptTemplate(
    input_variables=["article", "num_words"],
    template=template
)

formatted = prompt.format(
    article="LangChain is a framework for building applications with LLMs...",
    num_words="50"
)

🔹 Integration with Chains

PromptTemplates are often used with LLMChain:

from langchain.chains import LLMChain
from langchain.llms import OpenAI

llm = OpenAI()
chain = LLMChain(llm=llm, prompt=prompt)

response = chain.run(article="LangChain is...", num_words="30")
print(response)

🔹 ChatPromptTemplate (for Chat Models)

For chat-based models like GPT-4:

from langchain.prompts import ChatPromptTemplate

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "Translate '{text}' to French.")
])

formatted = chat_prompt.format_messages(text="Good morning!")

🧠 Summary

  • PromptTemplate: for single-turn text prompts
  • ChatPromptTemplate: for multi-turn chat-style prompts
  • format(): injects variables into the template
  • from_template(): quick creation from a string



🧾 1. Document Loaders

Purpose: Load raw data from various sources (text, PDFs, web pages, etc.) into LangChain-compatible Document objects.

🔹 Common Loaders:

  • TextLoader: plain .txt files
  • PyPDFLoader: PDF documents
  • UnstructuredLoader: HTML, Word, PowerPoint, etc.
  • WebBaseLoader: web pages via URL
  • DirectoryLoader: bulk load files from a folder

📌 Example:

from langchain.document_loaders import TextLoader
loader = TextLoader("data/notes.txt")
documents = loader.load()

✂️ 2. Text Splitters

Purpose: Break large documents into smaller, overlapping chunks for better embedding and retrieval.

🔹 Common Splitters:

  • CharacterTextSplitter: splits by character count
  • RecursiveCharacterTextSplitter: smart splitting by paragraph → sentence → word
  • TokenTextSplitter: splits by token count (useful for LLM token limits)

📌 Example:

from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

🧠 3. Text Embeddings

Purpose: Convert text chunks into dense vector representations for semantic similarity search.

🔹 Common Embedding Models:

  • OpenAIEmbeddings: OpenAI (e.g., text-embedding-ada-002)
  • HuggingFaceEmbeddings: local or hosted transformer models
  • CohereEmbeddings: Cohere API
  • GooglePalmEmbeddings: Google Vertex AI

📌 Example:

from langchain.embeddings import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings()

📦 4. Vector Stores

Purpose: Store and index embeddings for fast similarity search.

🔹 Popular Vector Stores:

  • Chroma: lightweight, local, LangChain-native
  • FAISS: open-source, fast, supports GPU
  • Pinecone: fully managed, scalable cloud DB
  • Weaviate, Qdrant, Milvus: open-source, production-grade vector DBs

📌 Example:

from langchain.vectorstores import Chroma
vectorstore = Chroma.from_documents(chunks, embedding_model)

๐Ÿ” 5. Retrievers

Purpose: Interface that abstracts how documents are retrieved from a vector store.

๐Ÿ”น Retriever Types:

Retriever Description
vectorstore.as_retriever() Basic semantic search
MultiQueryRetriever Uses multiple reformulated queries
ContextualCompressionRetriever Compresses retrieved docs using an LLM
SelfQueryRetriever Uses LLM to generate structured queries with filters

๐Ÿ“Œ Example:

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})

🧠 Putting It All Together (Mini RAG Pipeline)

from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=retriever,
    return_source_documents=True
)

query = "What are vector databases used for?"
result = qa_chain(query)

print("Answer:", result["result"])


🔗 LangChain Chains: Orchestrating LLM Workflows

In LangChain, a Chain is a modular pipeline that connects multiple components—like prompts, LLMs, retrievers, and tools—into a sequential or branching workflow. Chains are the backbone of LangChain applications, enabling you to build everything from simple Q&A bots to complex multi-step reasoning agents.


🧱 Types of Chains in LangChain

🔹 1. LLMChain

The most basic chain: combines a prompt and an LLM.

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

prompt = PromptTemplate.from_template("Translate '{text}' to French.")
llm = OpenAI()
chain = LLMChain(llm=llm, prompt=prompt)

response = chain.run(text="Hello, how are you?")

🔹 2. RetrievalQA

Combines a retriever (e.g., a vector store) with an LLM to build a RAG pipeline.

from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
response = qa_chain.run("What is a vector database?")

🔹 3. Stuff, Map-Reduce, and Refine Chains

Used for summarization or document synthesis.

  • Stuff: concatenates all docs into one prompt
  • Map-Reduce: summarizes chunks individually, then combines the partial summaries
  • Refine: iteratively builds a summary by refining the previous output
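
The map-reduce pattern is easy to see in plain Python. Here `summarize` is a toy stand-in for an LLM summarization call (illustrative only, not a LangChain API):

```python
def summarize(text: str) -> str:
    # Stand-in for an LLM call: keep only the first sentence
    return text.split(".")[0].strip() + "."

def map_reduce_summary(docs: list[str]) -> str:
    # Map step: summarize each chunk independently (parallelizable)
    partial = [summarize(d) for d in docs]
    # Reduce step: combine the partial summaries and summarize once more
    return summarize(" ".join(partial))

docs = [
    "LangChain orchestrates LLM calls. It has many modules.",
    "LlamaIndex focuses on data ingestion. It builds indexes.",
]
print(map_reduce_summary(docs))
```

The stuff strategy corresponds to calling `summarize(" ".join(docs))` directly, which only works while all documents fit in the model's context window; map-reduce trades extra LLM calls for unbounded input size.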

🔹 4. SequentialChain

Executes multiple chains in a fixed order, passing outputs as inputs. Each chain's output_key must match the input variable of the next chain.

from langchain.chains import SequentialChain

chain1 = LLMChain(llm=llm, prompt=PromptTemplate.from_template("Summarize: {text}"), output_key="summary")
chain2 = LLMChain(llm=llm, prompt=PromptTemplate.from_template("Translate to French: {summary}"), output_key="translation")

seq_chain = SequentialChain(chains=[chain1, chain2], input_variables=["text"], output_variables=["translation"])

🔹 5. SimpleSequentialChain

A simplified version of SequentialChain for linear flows where each step has a single input and a single output.

from langchain.chains import SimpleSequentialChain

chain = SimpleSequentialChain(chains=[chain1, chain2])

🔹 6. MultiPromptChain

Routes input to different prompts based on topic or intent.


🔹 7. RouterChain

Dynamically selects a sub-chain based on input characteristics.


🧠 When to Use Chains

  • Basic prompt + LLM → LLMChain
  • RAG pipeline → RetrievalQA
  • Summarization → map-reduce or refine chains
  • Multi-step workflows → SequentialChain, RouterChain
  • Topic-based routing → MultiPromptChain


LCEL vs. Legacy Chains

  • 🧱 Style: LCEL is declarative and composable; legacy chain classes are object-oriented
  • 🔗 Composition: LCEL uses the | (pipe) operator; legacy chains use class-based composition
  • ⚡ Performance: LCEL is async-, streaming-, and batch-ready; legacy chains are less optimized
  • 🧠 Flexibility: LCEL easily mixes LLMs, tools, and retrievers; legacy chains are harder to customize
  • 🧪 Debugging: LCEL is transparent and introspectable; legacy chains are more opaque
  • 🛠️ Status: LCEL is modern and recommended; legacy chains are still supported but not preferred

🔹 1. LCEL (LangChain Expression Language)

LCEL is a new, functional-style API introduced in LangChain to make chains more:

  • Composable
  • Transparent
  • Async-friendly
  • Easier to debug and test

✅ Key Features:

  • Uses the | (pipe) operator to chain components
  • All components implement the Runnable interface
  • Supports streaming, batch processing, and tracing

🧪 Example:

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.schema.output_parser import StrOutputParser

prompt = PromptTemplate.from_template("Translate '{text}' to French.")
llm = OpenAI()
parser = StrOutputParser()

chain = prompt | llm | parser
result = chain.invoke({"text": "Good morning!"})

🔹 2. Chain Class (Legacy)

The legacy Chain classes (like LLMChain, SequentialChain, RetrievalQA) are object-oriented wrappers that encapsulate logic in a more rigid structure.

✅ Key Features:

  • Easy to use for simple pipelines
  • Good for beginners
  • Still supported, but less flexible

🧪 Example:

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

prompt = PromptTemplate.from_template("Translate '{text}' to French.")
llm = OpenAI()
chain = LLMChain(prompt=prompt, llm=llm)

result = chain.run(text="Good morning!")

🧠 When to Use What?

  • Prototyping or legacy code → Chain classes
  • Production-ready pipelines → LCEL
  • Async or streaming apps → LCEL
  • Complex workflows (tools, retrievers, memory) → LCEL
  • Multi-modal or multi-step chains → LCEL



🧠 LangChain Agents: LLMs That Can Think and Act

LangChain Agents are one of the most powerful features in the framework. They allow a language model to dynamically decide what actions to take, such as calling tools, querying APIs, or performing calculations—based on the user’s input and the current context.


๐Ÿ” What Is an Agent?

An agent is an LLM-powered decision-maker that:

  1. Interprets the user’s query.
  2. Chooses the appropriate tool(s) to use.
  3. Executes actions in sequence.
  4. Synthesizes the final answer.

📌 Think of it as an LLM with a brain and a toolbox.


🧰 Common Tools Agents Can Use

  • LLM Math: solve math problems using Python
  • SerpAPI: perform web searches
  • Python REPL: run Python code
  • VectorStoreRetriever: retrieve documents from a vector DB
  • Custom APIs: call your own endpoints

🧠 Agent Types in LangChain

  • zero-shot-react-description: uses ReAct-style prompting to choose tools based on their descriptions
  • chat-zero-shot-react-description: same as above, but optimized for chat models
  • structured-chat-zero-shot-react-description: returns structured outputs
  • openai-functions: uses OpenAI's function-calling API (if supported)

🧪 Example: Agent with Math and Search Tools

from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description",
    verbose=True
)

agent.run("What is the square root of the year Einstein published the theory of relativity?")

🔄 How Agents Work (ReAct Loop)

  1. Thought: "I need to look up the year Einstein published his theory."
  2. Action: Search["Einstein theory of relativity year"]
  3. Observation: "1905"
  4. Thought: "Now I can calculate the square root."
  5. Action: Calculator["sqrt(1905)"]
  6. Observation: "43.65"
  7. Final Answer: "The square root is approximately 43.65."
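
The loop above can be sketched in plain Python with toy stand-ins for the Search and Calculator tools (illustrative only; in a real agent the LLM chooses each step, rather than the steps being hard-coded):

```python
import math

# Toy tools the "agent" can call; in LangChain these would be Tool objects
TOOLS = {
    "Search": lambda q: "1905" if "relativity" in q else "unknown",
    "Calculator": lambda expr: str(round(math.sqrt(float(expr)), 2)),
}

def react_loop() -> str:
    # Thought: find the year, then take its square root
    year = TOOLS["Search"]("Einstein theory of relativity year")  # Observation: "1905"
    result = TOOLS["Calculator"](year)                            # Observation: "43.65"
    # Final Answer
    return f"The square root is approximately {result}."

print(react_loop())
```

What makes a real agent interesting is that the Thought → Action → Observation sequence is generated by the LLM at runtime, so the same agent can solve questions requiring different tools and different numbers of steps.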

🧠 When to Use Agents

  • Static prompt + LLM → ❌ use LLMChain
  • Dynamic tool use → ✅ use an agent
  • Multi-step reasoning → ✅ use an agent
  • API orchestration → ✅ use an agent
  • Simple RAG → ❌ use RetrievalQA


One of the key features of LangChain is its support for chaining prompts. This means that developers can combine multiple prompts together to create more complex and nuanced requests. Another key feature of LangChain is its support for modular components. This means that developers can reuse components from different chains to create new chains. This can save developers a lot of time and effort, and it also makes it easier to share and collaborate on chains.


LangChain offers a suite of tools, components, and interfaces that simplify the construction of LLM-centric applications. It provides an LLM class for interfacing with various language model providers, such as OpenAI, Cohere, and Hugging Face. This makes it easier to build LLM-agnostic applications: developers can swap language models and focus on application logic without dealing with vendor-specific complexities. The versatility and flexibility of LangChain enable seamless integration with various data sources, making it a comprehensive solution for creating advanced language model-powered applications.


The open-source framework of LangChain is available for building applications in Python or JavaScript/TypeScript. Its core design principle is composition and modularity: by combining modules and components, one can quickly build complex LLM-based applications relevant to the interests and needs of the user. LangChain connects to external systems to access the information required to solve complex problems. It provides abstractions for most of the functionalities needed for building an LLM application and also has integrations that can readily read and write data, reducing the development time of the application. LangChain's framework allows for building applications that are agnostic to the underlying language model. With its ever-expanding support for various LLMs, LangChain offers a unique value proposition for building applications and iterating continuously.


🧪 LangSmith: Observability & Evaluation for LLM Apps

LangSmith is a developer platform built by the creators of LangChain to help you debug, test, evaluate, and monitor your LLM-powered applications. Think of it as the “LangChain DevTools”—a powerful companion for building reliable, production-grade AI systems.


๐Ÿ” Why Use LangSmith?

Feature Benefit
๐Ÿž Debugging Visualize every step in your chain or agent
๐Ÿ“Š Evaluation Run automated or human-in-the-loop evaluations
๐Ÿ” Tracing Inspect inputs, outputs, intermediate steps, and tool calls
๐Ÿงช Testing Create test suites for prompts, chains, and agents
๐Ÿ“ˆ Monitoring Track performance, latency, and failure rates in production

🧱 Key Concepts

🔹 1. Traces

A trace is a full record of a chain or agent run, including:

  • Inputs and outputs
  • Intermediate steps (e.g., tool calls, LLM generations)
  • Errors and retries

🔹 2. Datasets

Collections of test inputs and expected outputs used for:

  • Regression testing
  • Prompt tuning
  • Model comparisons

🔹 3. Evaluators

Automated or manual scoring functions to assess:

  • Accuracy
  • Relevance
  • Helpfulness
  • Toxicity

🧪 Example: Logging a Chain Run to LangSmith

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Enable LangSmith tracing via environment variables
import os
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_TRACING_V2"] = "true"

# Define a simple chain
prompt = PromptTemplate.from_template("Translate '{text}' to French.")
llm = OpenAI()
chain = LLMChain(llm=llm, prompt=prompt)

# Run with tracing
result = chain.run("Good morning!")
print(result)

You’ll see the full trace in your LangSmith dashboard.


🧠 When to Use LangSmith

  • Building complex chains or agents: visualize and debug each step
  • Evaluating prompt changes: run A/B tests and regression checks
  • Deploying to production: monitor performance and failures
  • Collaborating with teams: share traces and test results

🚀 Bonus: LangSmith + LangChain Expression Language (LCEL)

LangSmith works seamlessly with LCEL pipelines. Just set the environment variable and all Runnable components will be traced automatically.


🚀 LangServe: Serve LangChain Apps as APIs

LangServe is a lightweight framework built on top of FastAPI that allows you to deploy LangChain chains and agents as RESTful APIs—quickly and with minimal boilerplate.

It’s perfect for turning your LangChain workflows into production-ready services that can be called from web apps, mobile apps, or other backend systems.


🔧 Why Use LangServe?

  • ⚡ FastAPI-based: high-performance, async-ready API server
  • 🔁 Reusable: serve any LangChain Runnable (LLMChain, RAG, agent, etc.)
  • 🧪 LangSmith-compatible: automatically logs traces for observability
  • 🔐 Secure: add auth, rate limiting, and CORS easily
  • 📦 Deployable: works with Docker, serverless, or cloud platforms

🧱 Core Concept: Serve a Runnable Chain

LangServe exposes any LangChain Runnable (like a chain or agent) as a REST API with endpoints like:

  • POST /invoke → Run the chain
  • POST /batch → Run multiple inputs
  • GET /config → View chain metadata

🧪 Example: Serve a Simple LLMChain

1. 📁 Project Structure

my_langserve_app/
├── app.py
├── chain.py
└── requirements.txt

2. 🧠 chain.py

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

prompt = PromptTemplate.from_template("Translate '{text}' to French.")
llm = OpenAI()
chain = LLMChain(prompt=prompt, llm=llm)

# Expose as a Runnable
from langchain.schema.runnable import Runnable
app_chain: Runnable = chain

3. 🚀 app.py

from langserve import add_routes
from fastapi import FastAPI
from chain import app_chain

app = FastAPI()
add_routes(app, app_chain, path="/translate")

4. ▶️ Run the Server

uvicorn app:app --reload --port 8000

5. 🧪 Test It

curl -X POST http://localhost:8000/translate/invoke \
  -H "Content-Type: application/json" \
  -d '{"input": {"text": "Good morning"}}'

🧠 Bonus Features

  • ✅ Works with LCEL (| operator pipelines)
  • ✅ Supports streaming responses
  • ✅ Integrates with LangSmith for tracing
  • ✅ Easily deployable with Docker or serverless platforms




🦙 LlamaIndex: The Data Framework for LLMs

LlamaIndex (formerly known as GPT Index) is a powerful open-source framework designed to help you connect large language models (LLMs) to your external data—like PDFs, databases, Notion docs, APIs, and more.

It’s purpose-built for building RAG (Retrieval-Augmented Generation) systems and semantic search applications with minimal friction.


🧠 Why Use LlamaIndex?

  • 🔌 Data Connectors: load data from files, APIs, SQL, Notion, etc.
  • 🧾 Indexing: organize and chunk data into searchable structures
  • 🔍 Retrieval: perform semantic search over your data
  • 💬 Query Engines: ask questions and get grounded answers
  • 🧪 Evaluation: built-in tools for testing and refining pipelines

🧱 Core Components of LlamaIndex

1. Data Connectors

Load data from:

  • Files (PDF, Markdown, CSV, etc.)
  • Web pages
  • APIs
  • SQL databases
  • Notion, Google Docs, Airtable, etc.

from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()

2. Node Parsers & Text Splitters

Break documents into manageable chunks (nodes) with metadata.

from llama_index.node_parser import SimpleNodeParser
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

3. Indexing

Build an index to organize and retrieve nodes efficiently.

  • VectorStoreIndex: semantic search (most common)
  • ListIndex: ordered document traversal
  • TreeIndex: hierarchical summarization
  • KeywordTableIndex: keyword-based retrieval

from llama_index import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)

4. Retrievers

Query the index to retrieve relevant chunks.

retriever = index.as_retriever(similarity_top_k=3)

5. Query Engines

Combine retrievers with LLMs to generate answers.

query_engine = index.as_query_engine()
response = query_engine.query("What is LlamaIndex?")
print(response)

6. Storage & Persistence

Save and reload indexes for production use.

index.storage_context.persist("index_storage/")

🔄 LlamaIndex vs. LangChain

  • Focus: LlamaIndex centers on data ingestion, indexing, and retrieval; LangChain on workflow orchestration, agents, and tools
  • Strength: LlamaIndex excels at RAG pipelines and document QA; LangChain at agents, tool use, and multi-step chains
  • Integration: LlamaIndex works with LangChain, OpenAI, and Hugging Face; LangChain can use LlamaIndex as a retriever

✅ Best of both worlds: Use LlamaIndex for retrieval and LangChain for orchestration.


🚀 Use Cases

  • RAG-powered chatbots
  • Enterprise document search
  • Academic research assistants
  • Legal/medical Q&A systems
  • Personal knowledge bases


🔌 LlamaIndex Data Connectors: Bringing External Data to LLMs

Data connectors in LlamaIndex are modules that allow you to ingest data from a wide variety of sources—files, APIs, databases, cloud platforms, and more—so that you can build powerful RAG (Retrieval-Augmented Generation) systems grounded in your own knowledge base.


🧾 Categories of Data Connectors

🔹 1. File-Based Connectors

  • SimpleDirectoryReader: loads all files from a local directory
  • PDFReader: parses PDFs using PyMuPDF or pdfplumber
  • CSVReader: loads structured data from CSV files
  • MarkdownReader: parses .md files
  • DocxReader: reads Microsoft Word .docx files
  • HTMLReader: parses HTML content
  • ImageReader: extracts text from images using OCR (e.g., Tesseract)

📌 Example:

from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader("data/").load_data()

🔹 2. Web & Cloud Connectors

  • SimpleWebPageReader: scrapes and parses content from URLs
  • NotionPageReader: loads content from Notion pages
  • GoogleDocsReader: connects to Google Docs via API
  • SlackReader: ingests messages from Slack channels
  • GithubRepositoryReader: loads code and docs from GitHub repos
  • ConfluenceReader: connects to Atlassian Confluence pages

📌 Example:

from llama_index.readers.web import SimpleWebPageReader
documents = SimpleWebPageReader().load_data(urls=["https://example.com"])

🔹 3. Database & Structured Data Connectors

  • DatabaseReader: connects to SQL databases (PostgreSQL, MySQL, SQLite)
  • MongoDBReader: loads documents from MongoDB collections
  • AirtableReader: connects to Airtable bases
  • GoogleSheetsReader: reads from Google Sheets

📌 Example:

from llama_index.readers.database import DatabaseReader
reader = DatabaseReader(uri="sqlite:///mydb.sqlite")
documents = reader.load_data(query="SELECT * FROM customers")

🔹 4. API & Custom Connectors

  • OpenAPIReader: connects to OpenAPI-compatible APIs
  • RSSReader: loads content from RSS feeds
  • Custom readers: build your own connector by subclassing BaseReader

📌 Example:

from llama_index.readers.base import BaseReader
from llama_index.schema import Document

class MyCustomReader(BaseReader):
    def load_data(self, **kwargs):
        # Fetch and return Document objects
        return [Document(text="Custom data here")]

🧠 Best Practices

  • Use metadata (e.g., source, timestamp) to enhance retrieval quality
  • Combine multiple connectors for hybrid knowledge bases
  • Persist documents using LlamaIndex’s StorageContext for reuse
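
Why metadata helps, as a plain-Python sketch (the `chunks` structure and the scoring function are illustrative, not LlamaIndex APIs):

```python
# Each chunk keeps its text plus metadata about where it came from
chunks = [
    {"text": "Q3 revenue grew 12%.", "metadata": {"source": "finance.pdf", "year": 2024}},
    {"text": "Onboarding checklist for new hires.", "metadata": {"source": "hr.docx", "year": 2021}},
]

def retrieve(query_terms, source=None):
    # Filter on metadata first, then rank by a naive term-overlap score
    pool = [c for c in chunks if source is None or c["metadata"]["source"] == source]
    return sorted(pool, key=lambda c: -sum(t in c["text"].lower() for t in query_terms))

print(retrieve(["revenue"], source="finance.pdf")[0]["text"])
```

Real vector stores do the same thing at scale: metadata filters narrow the candidate pool before (or alongside) semantic similarity ranking, which both improves precision and lets answers cite their source.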

🧪 Bonus: Combine with Node Parsers

After loading data, use a NodeParser to chunk and structure it:

from llama_index.node_parser import SimpleNodeParser
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)


🦙 Core Components of LlamaIndex

LlamaIndex is designed to help you build powerful Retrieval-Augmented Generation (RAG) systems by connecting LLMs to your external data. Its architecture is modular and consists of several core components that work together to ingest, index, retrieve, and query data.


🧱 1. Data Connectors (Loaders)

Purpose: Ingest data from various sources.

  • Files: PDFs, CSVs, Markdown, DOCX
  • Web: URLs, Notion, Confluence, GitHub
  • Databases: SQL, MongoDB, Airtable
  • APIs: OpenAPI, RSS, custom endpoints

📌 Example:

from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader("data/").load_data()

✂️ 2. Node Parsers (Text Splitters)

Purpose: Break documents into smaller, manageable chunks called "nodes", each carrying metadata.

  • SimpleNodeParser: basic chunking by character count
  • SentenceWindowNodeParser: sentence-aware chunking
  • HierarchicalNodeParser: multi-level chunking for tree-based indexes

📌 Example:

from llama_index.node_parser import SimpleNodeParser
nodes = SimpleNodeParser().get_nodes_from_documents(documents)
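Conceptually, basic chunking is just a sliding window over the text. A dependency-free sketch of fixed-size character chunking with overlap (a simplified stand-in for what a basic node parser does, not LlamaIndex's actual implementation):

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list:
    """Split text into fixed-size character chunks, overlapping neighbours by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

print([len(c) for c in chunk_text("a" * 250, chunk_size=100, overlap=20)])  # [100, 100, 90]
```

The overlap ensures that a sentence cut at a chunk boundary still appears whole in at least one chunk, which is why real parsers default to a non-zero overlap.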

๐Ÿง  3. Embedding Models

Purpose: Convert text chunks (nodes) into dense vector representations for semantic search.

Provider Examples
OpenAI text-embedding-ada-002
Hugging Face SentenceTransformers
Cohere embed-english-v3.0

๐Ÿ“Œ Example:

from llama_index.embeddings import OpenAIEmbedding
embed_model = OpenAIEmbedding()

๐Ÿ“ฆ 4. Indexes

Purpose: Organize and store nodes for efficient retrieval.

Index Type Use Case
VectorStoreIndex Semantic search (most common)
ListIndex Ordered traversal (e.g., summarization)
TreeIndex Hierarchical summarization
KeywordTableIndex Keyword-based search

๐Ÿ“Œ Example:

from llama_index import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)

๐Ÿ” 5. Retrievers

Purpose: Retrieve relevant nodes from an index based on a query.

Retriever Description
VectorIndexRetriever Default top-k similarity search
BM25Retriever Keyword-based retrieval
QueryFusionRetriever Fuses vector + keyword results for hybrid search
AutoMergingRetriever Merges overlapping chunks for better context

๐Ÿ“Œ Example:

retriever = index.as_retriever(similarity_top_k=3)
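At its core, top-k vector retrieval is cosine similarity plus a sort. A dependency-free sketch of the idea (toy 2-dimensional vectors and hypothetical helper names, not LlamaIndex's internals):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_top_k(query_vec, indexed, k=2):
    """Rank (vector, text) pairs by similarity to the query vector; return top-k texts."""
    scored = sorted(indexed, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in scored[:k]]

indexed = [
    ([1.0, 0.0], "vector databases store embeddings"),
    ([0.0, 1.0], "llamas are domesticated animals"),
    ([0.9, 0.1], "semantic search uses vector similarity"),
]
print(retrieve_top_k([1.0, 0.05], indexed, k=2))
```

Real retrievers replace the toy vectors with embedding-model outputs and the linear scan with an approximate nearest-neighbour index, but `similarity_top_k` maps directly onto the `k` parameter here.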

๐Ÿ’ฌ 6. Query Engines

Purpose: Combine retrievers with LLMs to generate answers.

Engine Description
index.as_query_engine() Basic RAG with sensible defaults
RetrieverQueryEngine Custom retriever + LLM
SubQuestionQueryEngine Breaks complex queries into sub-questions
NLSQLTableQueryEngine Queries SQL databases using natural language

๐Ÿ“Œ Example:

query_engine = index.as_query_engine()
response = query_engine.query("What is LlamaIndex?")

๐Ÿ’พ 7. Storage Context

Purpose: Persist and reload indexes, documents, and embeddings.

๐Ÿ“Œ Example:

index.storage_context.persist("storage/")

๐Ÿงช 8. Evaluation & Observability

Purpose: Test and debug your RAG pipeline.

Tool Use
Tracing integrations (e.g., Arize Phoenix) Trace and debug runs
Built-in Evaluators Accuracy, relevance, faithfulness
Dataset Generator Create test sets from your data

๐Ÿง  Summary Table

Component Role
Data Connectors Load data from files, APIs, DBs
Node Parsers Chunk and structure documents
Embeddings Convert text to vectors
Indexes Organize and store nodes
Retrievers Fetch relevant chunks
Query Engines Generate answers using LLMs
Storage Save and reload pipelines
Evaluation Test and debug performance


๐Ÿงฑ Types of Indexes in LlamaIndex

In LlamaIndex, an index is a data structure that organizes your documents (or nodes) to enable efficient retrieval and interaction with LLMs. Each index type is optimized for a different use case—whether it's semantic search, summarization, or keyword lookup.


๐Ÿ”น 1. VectorStoreIndex (Most Common)

Purpose: Semantic search using vector similarity.

  • Stores embeddings of document chunks (nodes)
  • Supports top-k retrieval based on cosine similarity or other distance metrics
  • Works with vector stores like FAISS, Pinecone, Chroma, Weaviate

๐Ÿ“Œ Use Case: Retrieval-Augmented Generation (RAG), semantic Q&A

from llama_index import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)

๐Ÿ”น 2. ListIndex

Purpose: Ordered traversal of documents.

  • Stores documents in a linear list
  • Useful for summarization or sequential reading
  • No semantic search—retrieves all documents in order

๐Ÿ“Œ Use Case: Document summarization, storytelling, walkthroughs

from llama_index import ListIndex
index = ListIndex.from_documents(documents)

๐Ÿ”น 3. TreeIndex

Purpose: Hierarchical summarization and reasoning.

  • Builds a tree of summaries from document chunks
  • Each parent node summarizes its children
  • Enables recursive summarization and multi-level reasoning

๐Ÿ“Œ Use Case: Long document summarization, nested Q&A, outline generation

from llama_index import TreeIndex
index = TreeIndex.from_documents(documents)
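The tree-building idea can be sketched without an LLM by using a trivial stand-in summarizer (here, just the opening words of each child; a real TreeIndex calls the LLM at every parent node). All names below are hypothetical:

```python
def fake_summarize(texts):
    """Stand-in for an LLM summary: join the first sentence fragment of each child."""
    return " / ".join(t.split(".")[0] for t in texts)

def build_tree(chunks, fanout=2):
    """Group `fanout` nodes at a time and summarize each group until one root remains."""
    level = list(chunks)
    levels = [level]
    while len(level) > 1:
        parents = [
            fake_summarize(level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
        levels.append(parents)
        level = parents
    return levels  # levels[0] = leaf chunks, levels[-1] = [root summary]

leaves = ["Chapter one. Details...", "Chapter two. Details...",
          "Chapter three. Details...", "Chapter four. Details..."]
tree = build_tree(leaves)
print(tree[-1][0])
```

Querying such a tree starts at the root summary and descends only into the children that look relevant, which is what makes hierarchical indexes efficient on very long documents.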

๐Ÿ”น 4. KeywordTableIndex

Purpose: Keyword-based retrieval (non-semantic).

  • Extracts keywords from documents and builds an inverted index
  • Fast keyword lookup without embeddings
  • Lightweight and interpretable

๐Ÿ“Œ Use Case: Simple keyword search, fallback when embeddings are unavailable

from llama_index import KeywordTableIndex
index = KeywordTableIndex.from_documents(documents)
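A keyword table is essentially an inverted index: a map from each word to the documents containing it. A tiny dependency-free version of the idea (helper names are illustrative, not LlamaIndex's API):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of documents containing every query word (AND semantics)."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

docs = ["LlamaIndex builds RAG pipelines",
        "Vector stores hold embeddings",
        "RAG pipelines use vector stores"]
index = build_inverted_index(docs)
print(keyword_search(index, "rag pipelines"))  # [0, 2]
```

Because lookups are simple set intersections, there is no embedding cost at all, which is exactly the lightweight, interpretable trade-off described above.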

๐Ÿง  Summary Table

Index Type | Retrieval Style | Best For
VectorStoreIndex | Semantic similarity | RAG, semantic search, Q&A
ListIndex | Sequential | Summarization, walkthroughs
TreeIndex | Hierarchical | Recursive summarization, long docs
KeywordTableIndex | Keyword match | Lightweight search, no embeddings


๐Ÿ”Œ Connecting LlamaIndex with Different LLMs

LlamaIndex is model-agnostic—it supports a wide range of LLMs from different providers, allowing you to plug in the model that best fits your use case, whether it's hosted (like OpenAI) or local (like LLaMA or Mistral).


๐Ÿง  How LLMs Are Used in LlamaIndex

LLMs in LlamaIndex are used for:

  • Generating answers (via Query Engines)
  • Summarizing documents
  • Refining responses
  • Re-ranking retrieved results
  • Evaluating outputs

๐Ÿ”น 1. OpenAI (GPT-3.5, GPT-4)

from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-4", temperature=0.3)

✅ Requires OPENAI_API_KEY
✅ Great for RAG, summarization, and reasoning tasks


๐Ÿ”น 2. Anthropic (Claude)

from llama_index.llms import Anthropic

llm = Anthropic(model="claude-2", temperature=0.5)

✅ Requires ANTHROPIC_API_KEY
✅ Known for long context windows and safe outputs


๐Ÿ”น 3. Hugging Face (Hosted or Local)

from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="tiiuae/falcon-7b-instruct",
    tokenizer_name="tiiuae/falcon-7b-instruct",
    context_window=2048,
    max_new_tokens=256
)

✅ Works with Hugging Face Hub or local models
✅ Ideal for open-source deployments


๐Ÿ”น 4. Google Vertex AI (PaLM, Gemini)

from llama_index.llms import Vertex

llm = Vertex(model="text-bison", temperature=0.2)

✅ Requires Google Cloud setup
✅ Good for enterprise and multilingual use cases


๐Ÿ”น 5. Cohere

from llama_index.llms import Cohere

llm = Cohere(model="command-xlarge-nightly", temperature=0.4)

✅ Requires COHERE_API_KEY
✅ Strong performance on command-following tasks


๐Ÿ”น 6. Local Models (LLaMA, Mistral, etc.)

Use with backends like:

  • ๐Ÿ”ง Ollama
  • ๐Ÿ”ง LM Studio
  • ๐Ÿ”ง Hugging Face Transformers
  • ๐Ÿ”ง vLLM or Text Generation Inference (TGI)

๐Ÿ“Œ Example with Ollama:

from llama_index.llms import Ollama

llm = Ollama(model="llama2")

๐Ÿ“Œ Example with Transformers: use the same HuggingFaceLLM class shown in section 3 above, pointing model_name at a locally downloaded model.

๐Ÿงช Using the LLM in a Query Engine

from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=llm)

response = query_engine.query("What is LlamaIndex?")
print(response)

๐Ÿง  Summary Table

Provider | Class | Notes
OpenAI | OpenAI | GPT-3.5, GPT-4
Anthropic | Anthropic | Claude 1/2
Hugging Face | HuggingFaceLLM | Local or hosted models
Google | Vertex | PaLM, Gemini
Cohere | Cohere | Command models
Ollama | Ollama | Local LLaMA, Mistral, etc.


Finally, let's build a simple RAG (Retrieval-Augmented Generation) pipeline using ๐Ÿฆ™ LlamaIndex with:

  • PDF/Text file ingestion
  • Node parsing and vector indexing
  • OpenAI for embeddings and LLM
  • Semantic search and query answering

๐Ÿงช Full LlamaIndex RAG Pipeline (Step-by-Step)

✅ Prerequisites

Install the required packages:

pip install llama-index openai pypdf

(Note: these examples use the pre-0.10 llama-index import paths; from llama-index 0.10 onward the same classes are imported from llama_index.core.)

Set your OpenAI API key:

export OPENAI_API_KEY=your-api-key

๐Ÿ“ Folder Structure

llamaindex_rag/
├── data/
│   └── example.pdf  # or .txt
├── app.py

๐Ÿง  app.py

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.node_parser import SimpleNodeParser
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding
from llama_index.query_engine import RetrieverQueryEngine

# Step 1: Load documents from a folder
documents = SimpleDirectoryReader("data").load_data()

# Step 2: Parse documents into nodes (chunks)
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

# Step 3: Set up embedding model and LLM
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    embed_model=OpenAIEmbedding(),
)

# Step 4: Create a vector index from nodes
index = VectorStoreIndex(nodes, service_context=service_context)

# Step 5: Create a retriever and query engine
retriever = index.as_retriever(similarity_top_k=3)
query_engine = RetrieverQueryEngine.from_args(retriever, service_context=service_context)

# Step 6: Ask a question
query = "What is this document about?"
response = query_engine.query(query)

# Step 7: Print the answer
print("\n๐Ÿง  Answer:")
print(response)

๐Ÿ“„ Example Output

๐Ÿง  Answer:
This document discusses the fundamentals of vector databases and their role in semantic search...

๐Ÿง  What This Pipeline Does

Step Purpose
๐Ÿ“‚ Load Ingests files from the data/ folder
✂️ Parse Splits documents into chunks (nodes)
๐Ÿง  Embed Converts chunks into vectors using OpenAI
๐Ÿ“ฆ Index Stores vectors in memory for semantic search
๐Ÿ” Retrieve Finds top-k relevant chunks
๐Ÿ’ฌ Generate Uses GPT to synthesize an answer from context
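The six steps above can be chained end to end without any external services using crude stand-ins: word-overlap scoring in place of embeddings and a template in place of the LLM. Everything below is a hypothetical sketch, useful only for seeing the data flow:

```python
def score(query, chunk):
    """Word-overlap (Jaccard) score: a crude stand-in for embedding similarity."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0

def toy_rag(question, chunks, top_k=1):
    """Retrieve the best-matching chunks, then 'generate' with a template.
    A real pipeline embeds with OpenAIEmbedding and generates with an LLM."""
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    context = " ".join(ranked[:top_k])
    return f"Based on the retrieved context: {context}"

chunks = ["LlamaIndex connects LLMs to external data",
          "Bananas are rich in potassium"]
print(toy_rag("what does LlamaIndex connect", chunks))
```

Swapping `score` for real embeddings and the template for an LLM call is exactly what the app.py above does; the retrieve-then-generate shape stays the same.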








