LangChain: LLM Application Development Framework
Complete setup and integration guide for LangChain - the powerful framework for developing applications powered by large language models. Covers installation, core concepts, chains, agents, retrieval-augmented generation (RAG), memory systems, and production deployment patterns.
- Step 1
What is LangChain?
LangChain is an open-source framework for building applications powered by large language models (LLMs). It is one of the most widely used LLM application frameworks, with over 87,000 GitHub stars at the time of writing. LangChain provides abstractions and tools for common LLM patterns, including chains (sequential operations), agents (LLM-driven decision-making), memory (context persistence), and retrieval-augmented generation (RAG). Official implementations exist for Python and JavaScript/TypeScript.
- Step 2
Core Architecture and Concepts
LangChain is built around several key abstractions that work together to create powerful LLM applications:
Models - Wrappers for various LLM providers (OpenAI, Anthropic, Cohere, HuggingFace, local models)
Prompts - Template management and optimization for LLM inputs
Chains - Sequences of calls to models, utilities, or other chains
Agents - Autonomous systems that use LLMs to decide which actions to take
Memory - Mechanisms to persist state between chain/agent calls
Retrievers - Interfaces for fetching relevant documents from vector stores or databases
Callbacks - Hooks for logging, monitoring, and streaming intermediate steps
LangChain Architecture:
├── Models (LLM wrappers)
├── Prompts (template management)
├── Chains (sequential operations)
├── Agents (autonomous decision-making)
├── Memory (context persistence)
├── Retrievers (document fetching)
└── Callbacks (logging & monitoring)
- Step 3
Python Installation
Current LangChain releases require Python 3.9 or higher. The core package is lightweight, and integrations are installed as separate packages when you need them. Most production applications also need langchain-community (community-maintained integrations) plus the package for each model provider they use.
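Because integrations ship as separate distributions, it helps to check which ones are actually installed in your environment. Below is a minimal stdlib-only sketch using `importlib.metadata`. The package list is an example to adjust, and `installed_versions` is a hypothetical helper name.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI; adjust to the packages you installed.
PACKAGES = ["langchain", "langchain-core", "langchain-community", "langchain-openai"]


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: dict[str, str | None] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:22} {ver or 'not installed'}")
```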
# Install core LangChain (lightweight, minimal dependencies)
pip install langchain

# Install with common integrations
pip install langchain langchain-community langchain-openai

# Install specific provider packages
pip install langchain-anthropic     # For Claude
pip install langchain-google-genai  # For Gemini
pip install langchain-cohere        # For Cohere

# Install vector store integrations
pip install langchain-chroma    # ChromaDB
pip install langchain-pinecone  # Pinecone

# Verify installation
python -c "import langchain; print(langchain.__version__)"
⚠ Heads up: Since the 0.1/0.2 releases, third-party integrations no longer live in the core langchain package. They are in langchain-community and in dedicated provider packages. Install the provider packages you need (langchain-openai, langchain-anthropic, etc.) explicitly, because the core langchain package does not pull them in.
- Step 4
JavaScript/TypeScript Installation
LangChain.js is the official JavaScript/TypeScript implementation, designed for Node.js, browsers, and edge runtime environments. It follows the same architectural patterns as the Python version but is optimized for JavaScript ecosystems.
# Install core LangChain.js (@langchain/core is a required peer dependency)
npm install langchain @langchain/core
# Or with yarn
yarn add langchain @langchain/core

# Install specific integrations
npm install @langchain/openai
npm install @langchain/anthropic
npm install @langchain/community

# For vector stores
npm install @langchain/pinecone
npm install chromadb  # ChromaDB client

# Verify installation (lists the installed versions)
npm ls langchain @langchain/core
- Step 5
Setting Up API Keys
LangChain connects to various LLM providers, each requiring API credentials. Store these in environment variables or a .env file (never commit API keys to version control). Most applications will start with OpenAI or Anthropic.
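python-dotenv (installed in the commands that follow) handles loading for you, but the mechanics are simple enough to show. This is a rough stdlib-only approximation, not python-dotenv's implementation. It reads `KEY=VALUE` lines, skips comments, and strips surrounding quotes. Like `load_dotenv()`'s default, it leaves variables that are already set untouched. It does not handle inline comments, `export` prefixes, or multi-line values. `load_env_file` and `require_key` are hypothetical helper names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Minimal .env reader: KEY=VALUE lines, '#' comment lines, optional quotes.

    Variables already present in os.environ are not overwritten.
    Returns only the variables this call actually set.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        if key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded


def require_key(name: str) -> str:
    """Fail fast with a readable message instead of a provider auth error later."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or your shell environment")
    return value
```

A typical pattern is to call `load_env_file()` once at startup, then `require_key("OPENAI_API_KEY")` before constructing a model, so a missing key surfaces immediately.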
# Create a .env file in your project root
touch .env

# .env contents: add keys for the providers you use
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
COHERE_API_KEY=your-cohere-key
HUGGINGFACEHUB_API_TOKEN=your-huggingface-token

# For vector databases (current Pinecone clients need only the API key)
PINECONE_API_KEY=your-pinecone-key

# Install python-dotenv to load .env files
pip install python-dotenv
⚠ Heads up: Add .env to your .gitignore file to prevent accidentally committing API keys. Get OpenAI keys at https://platform.openai.com/api-keys and Anthropic keys at https://console.anthropic.com/
- Step 6
First LangChain Application (Python)
Let's create a simple chain that uses an LLM to answer questions. This example demonstrates the basic pattern: create a model, create a prompt template, and chain them together.
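The `|` in the example below is ordinary Python operator overloading. Each LangChain component is a Runnable, and `a | b` builds a new runnable (a `RunnableSequence`) that feeds `a`'s output into `b`. The toy `Step` class here is a hypothetical, heavily simplified stand-in that shows only the composition idea. Its fake "model" just echoes the last line of the prompt.

```python
from __future__ import annotations

from typing import Any, Callable


class Step:
    """Toy stand-in for a Runnable: wraps one function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: Step) -> Step:
        # a | b -> a new Step that passes a's output to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Hypothetical stand-ins for the prompt, model, and output parser.
prompt = Step(lambda d: f"System: explain clearly.\nUser: {d['question']}")
fake_llm = Step(lambda text: {"content": f"[reply to: {text.splitlines()[-1]}]"})
parser = Step(lambda message: message["content"])

chain = prompt | fake_llm | parser
print(chain.invoke({"question": "What is LangChain?"}))
```

The order matters: the dict goes into the prompt, the prompt text goes into the model, and the model's message goes into the parser. This is why the real example places `StrOutputParser()` last.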
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from dotenv import load_dotenv

# Load environment variables (OPENAI_API_KEY) from .env
load_dotenv()

# Initialize the model
llm = ChatOpenAI(model="gpt-4", temperature=0.7)

# Create a prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains technical concepts clearly."),
    ("user", "{question}")
])

# Compose the chain with the LCEL pipe syntax: prompt -> model -> plain string
chain = prompt | llm | StrOutputParser()

# Invoke the chain
response = chain.invoke({"question": "What is LangChain?"})
print(response)
- Step 7
First LangChain Application (JavaScript)
The JavaScript/TypeScript version follows the same pattern with async/await syntax. This example shows the modern LangChain Expression Language (LCEL) approach.
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import * as dotenv from "dotenv";

// Load environment variables
dotenv.config();

// Initialize the model (`model` replaces the older `modelName` option)
const llm = new ChatOpenAI({
  model: "gpt-4",
  temperature: 0.7,
});

// Create a prompt template
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful assistant that explains technical concepts clearly."],
  ["user", "{question}"],
]);

// Create the chain
const chain = prompt.pipe(llm).pipe(new StringOutputParser());

// Invoke the chain (top-level await requires an ES module, e.g. a .mjs file)
const response = await chain.invoke({
  question: "What is LangChain?",
});
console.log(response);
- Step 8
Building Retrieval-Augmented Generation (RAG) Systems
RAG is one of the most powerful patterns in LangChain - it combines document retrieval with LLM generation to answer questions based on your own data. This involves loading documents, splitting them into chunks, creating embeddings, storing in a vector database, and querying with semantic search.
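To see the data flow end to end without API keys, here is a dependency-free toy version of those five steps. It makes several simplifications. Fixed-size character windows stand in for `RecursiveCharacterTextSplitter`, which prefers to split on paragraph and sentence boundaries. Word counts stand in for an embedding model, and a Python list stands in for the vector database. All function names are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size character windows; neighbouring chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop before a start position whose chunk would lie entirely inside the previous overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter[str]:
    """Toy 'embedding': lowercase word counts instead of a dense model vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter[str]]]:
    """'Vector store': each chunk stored next to its embedding."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query: str, index: list[tuple[str, Counter[str]]], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (the retriever step)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """'Stuff' the retrieved chunks into a single prompt for the LLM."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = ["Cats are small furry pets.", "Python is a programming language.",
        "Chroma stores embeddings for retrieval."]
index = build_index(docs)
print(build_prompt("Which language is Python?", retrieve("Which language is Python?", index, k=1)))
```

A real embedding model matches on meaning rather than shared words, but the structure is the same: chunks and vectors are stored together, the query is embedded the same way, and the top-k chunks become the prompt's context.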
from langchain_community.document_loaders import TextLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma  # pip install langchain-chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA  # legacy chain; Step 9 shows the LCEL version

# 1. Load documents
loader = TextLoader("path/to/your/document.txt")
documents = loader.load()

# 2. Split into ~1000-character chunks that overlap by 200 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# 3. Create embeddings and store them in a persistent vector database
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# 4. Create retrieval chain ("stuff" = put all retrieved chunks into one prompt)
llm = ChatOpenAI(model="gpt-4", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)

# 5. Query your documents
response = qa_chain.invoke({"query": "What is the main topic of this document?"})
print(response["result"])
⚠ Heads up: Embedding API calls cost money, and OpenAI charges per token for embeddings. For large document collections, consider free local embedding models. Also persist your vector store so each document is embedded only once.
- Step 9
Advanced RAG with LCEL (LangChain Expression Language)
The modern approach to RAG uses LangChain Expression Language (LCEL) for more control and better streaming support. This pattern is now the recommended way to build RAG applications.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Set up vector store (assumes ./chroma_db is already populated)
embeddings = OpenAIEmbeddings()
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

def format_docs(docs):
    # Join the retrieved Documents into one context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)

# Create prompt template
template = """Answer the question based only on the following context:

Context: {context}

Question: {question}

Answer:"""
prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model="gpt-4", temperature=0)

# Build the chain with LCEL: the dict retrieves context and passes the question through
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query
response = rag_chain.invoke("What are the key findings?")
print(response)
- Step 10
Creating Autonomous Agents
Agents use LLMs to decide which actions to take and in what order. LangChain provides agent executors that can use tools (functions) to accomplish complex tasks. This is ideal for workflows where the steps aren't predetermined.
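Under the hood, an agent executor runs a loop. It asks the model for the next action, runs the chosen tool, and appends the result to the scratchpad. It repeats until the model returns a final answer or a step limit is hit (`AgentExecutor`'s `max_iterations`). The sketch below shows that loop with a scripted policy standing in for the LLM. The tools, the policy, and `run_agent` are all hypothetical.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical tools: plain functions the agent can call by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(part) for part in expr.split("+"))),
    "echo": lambda text: text,
}

Scratchpad = list[tuple[str, str, str]]  # (tool name, tool input, observation)


def scripted_policy(question: str, scratchpad: Scratchpad) -> tuple[str, str]:
    """Stand-in for the LLM: returns (action, argument); action 'final' ends the run."""
    if not scratchpad:
        return "calculator", question.removeprefix("add ").strip()
    _, _, observation = scratchpad[-1]
    return "final", f"The answer is {observation}"


def run_agent(question: str, policy: Callable[[str, Scratchpad], tuple[str, str]],
              max_steps: int = 5) -> str:
    scratchpad: Scratchpad = []
    for _ in range(max_steps):
        action, argument = policy(question, scratchpad)
        if action == "final":
            return argument
        tool = TOOLS.get(action)
        # Unknown tool names become an observation the model can react to
        observation = tool(argument) if tool else f"Unknown tool: {action}"
        scratchpad.append((action, argument, observation))
    return "Stopped: step limit reached"


print(run_agent("add 2+3", scripted_policy))
```

The step limit is the important safety valve. Without it, a model that keeps choosing tools would loop (and bill) indefinitely.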
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import Tool
from langchain_community.tools import DuckDuckGoSearchRun  # needs: pip install duckduckgo-search

# Define tools the agent can use
search = DuckDuckGoSearchRun()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="Search the web for current information"
    )
]

# Create the prompt; agent_scratchpad holds the agent's intermediate tool calls and results
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the search tool when you need current information."),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Initialize LLM and agent (create_tool_calling_agent is the provider-agnostic
# successor to create_openai_functions_agent)
llm = ChatOpenAI(model="gpt-4", temperature=0)
agent = create_tool_calling_agent(llm, tools, prompt)

# Create agent executor, which runs the decide -> call tool -> observe loop
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True
)

# Run the agent
result = agent_executor.invoke({
    "input": "What are the latest developments in AI as of 2024?"
})
print(result["output"])
- Step 11
Memory and Conversation Management
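Memory allows chains and agents to remember previous interactions. LangChain's classic memory classes include ConversationBufferMemory (stores all messages), ConversationSummaryMemory (summarizes older messages), and ConversationBufferWindowMemory (keeps the last k exchanges). These classes, and ConversationChain below, are deprecated in recent releases. They still illustrate the concepts, but new code should use the message-history approach in Step 12.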
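The window strategy is easy to picture as a `deque` whose `maxlen` is the number of exchanges to keep. This is a hypothetical, stdlib-only sketch of the idea, not LangChain's class. It also shows the trade-off: with a window of two, a name given three turns earlier is forgotten.

```python
from __future__ import annotations

from collections import deque


class WindowMemory:
    """Keeps only the last `k` (human, ai) exchanges; older ones drop off."""

    def __init__(self, k: int = 2) -> None:
        self.turns: deque[tuple[str, str]] = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))  # deque discards the oldest pair when full

    def as_prompt_prefix(self) -> str:
        """Render the retained history the way it would be prepended to a prompt."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory = WindowMemory(k=2)
memory.save("My name is Alice", "Nice to meet you, Alice!")
memory.save("What's 2+2?", "4")
memory.save("What's my name?", "I don't know.")
print(memory.as_prompt_prefix())
```

A summary memory avoids this loss by condensing older turns instead of dropping them, at the cost of an extra LLM call per summarization.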
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Initialize LLM and memory
llm = ChatOpenAI(model="gpt-4", temperature=0.7)
memory = ConversationBufferMemory()

# Create conversation chain with memory
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

# Have a conversation
print(conversation.predict(input="My name is Alice"))
print(conversation.predict(input="What's 2+2?"))
print(conversation.predict(input="What's my name?"))  # Should remember "Alice"

# View conversation history
print(memory.buffer)
- Step 12
Advanced Memory with Message History
For production applications, use the modern message history approach with LCEL. This provides better control over memory management and integrates cleanly with streaming.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# In-memory store (use Redis or a database in production)
store = {}

def get_session_history(session_id: str):
    # One history object per session, created on first use
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# Create prompt with history placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("user", "{input}")
])

# Create chain
llm = ChatOpenAI(model="gpt-4")
chain = prompt | llm

# Wrap with message history
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

# Use with session management
config = {"configurable": {"session_id": "user123"}}
response = with_message_history.invoke(
    {"input": "My name is Bob"},
    config=config
)
print(response.content)

# Later messages remember context
response = with_message_history.invoke(
    {"input": "What's my name?"},
    config=config
)
print(response.content)  # Should remember "Bob"
- Step 13
Streaming Responses
For better user experience in chat applications, stream LLM responses token-by-token rather than waiting for the complete response. LangChain provides streaming support for both simple chains and complex agents.
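Streaming is a plain iterator pattern. The model yields chunks as they are generated, and the consumer prints each one immediately while also assembling the full text. In the sketch below, `fake_token_stream` is a hypothetical stand-in for `chain.stream(...)` that yields word-sized chunks. The consumer loop is the part that carries over to real code.

```python
from __future__ import annotations

import sys
import time
from typing import Iterator


def fake_token_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for chain.stream(): yields word-sized chunks."""
    for word in text.split(" "):
        time.sleep(delay)  # simulate generation latency between chunks
        yield word + " "


def print_stream(chunks: Iterator[str]) -> str:
    """Show each chunk as soon as it arrives, and return the assembled text."""
    parts: list[str] = []
    for chunk in chunks:
        sys.stdout.write(chunk)
        sys.stdout.flush()  # don't wait for a full line before displaying
        parts.append(chunk)
    sys.stdout.write("\n")
    return "".join(parts)


full_text = print_stream(fake_token_stream("Once upon a time, an agent", delay=0.05))
```

Total generation time is unchanged. What improves is time to first visible output, which is what users perceive as responsiveness.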
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Create chain
llm = ChatOpenAI(model="gpt-4", streaming=True)
prompt = ChatPromptTemplate.from_template("Tell me a story about {topic}")
chain = prompt | llm | StrOutputParser()

# Stream the response
for chunk in chain.stream({"topic": "AI agents"}):
    print(chunk, end="", flush=True)
print()  # New line after streaming
- Step 14
Working with Different Document Loaders
LangChain supports loading documents from various sources: PDFs, web pages, Google Drive, Notion, GitHub, and more. Each loader is optimized for its source format.
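Whatever the source, every loader produces the same shape: a `Document` with `page_content` text and a `metadata` dict. The stdlib sketch below imitates a CSV loader to show that normalization. The `Document` dataclass and `load_csv` function are hypothetical stand-ins; LangChain's own `Document` class lives in `langchain_core.documents`. The `column: value` rendering is only similar in spirit to what `CSVLoader` produces.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in with the same two fields a LangChain Document has."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv(text: str, source: str = "data.csv") -> list[Document]:
    """One Document per CSV row, rendered as 'column: value' lines."""
    docs = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(text))):
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        docs.append(Document(content, {"source": source, "row": row_number}))
    return docs


for doc in load_csv("name,role\nAda,engineer\nAlan,scientist\n"):
    print(doc.metadata, repr(doc.page_content))
```

Keeping the source and position in `metadata` is what later lets a RAG answer cite where its context came from.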
from langchain_community.document_loaders import (
    PyPDFLoader,
    WebBaseLoader,
    NotionDirectoryLoader,
    GitLoader,
    CSVLoader
)

# Load PDF
pdf_loader = PyPDFLoader("document.pdf")
pdf_docs = pdf_loader.load()

# Load from web page
web_loader = WebBaseLoader("https://example.com/article")
web_docs = web_loader.load()

# Load a Notion workspace export (a directory of exported Markdown files; path is an example)
notion_loader = NotionDirectoryLoader("notion_export/")
notion_docs = notion_loader.load()

# Load from GitHub repository (clones into repo_path)
git_loader = GitLoader(
    clone_url="https://github.com/user/repo",
    repo_path="./temp/",
    branch="main"
)
git_docs = git_loader.load()

# Load CSV (one Document per row)
csv_loader = CSVLoader(file_path="data.csv")
csv_docs = csv_loader.load()

# Each loader returns a list of Document objects
for doc in pdf_docs:
    print(doc.page_content[:200])  # First 200 chars
    print(doc.metadata)  # Source, page number, etc.
⚠ Heads up: Some loaders require additional dependencies: pip install pypdf (PDFs), pip install beautifulsoup4 (web pages), and pip install GitPython (Git repositories). Check the LangChain documentation for other loaders' requirements.
- Step 15
Vector Store Options and Selection
LangChain integrates with many vector databases. The choice depends on your scale, budget, and deployment requirements:
Chroma - Best for local development and small-scale applications. Free, runs in-memory or persists to disk.
Pinecone - Fully managed cloud vector database. A common production choice when you don't want to run vector search infrastructure yourself.
Weaviate - Open-source with cloud option. Good balance of features and control.
FAISS - Meta's similarity search library. Fast, runs locally, no network calls. Good for offline applications.
Qdrant - Open-source vector database with rich metadata (payload) filtering; also available as a managed cloud service.
pgvector - PostgreSQL extension. Best if you already use PostgreSQL and want to avoid adding another database.
# Assumes `docs` (a list of Documents) and `embeddings` from the earlier steps

# Chroma (local, free)
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    persist_directory="./db"
)

# Pinecone (cloud, managed): reads PINECONE_API_KEY from the environment;
# the index must already exist in your Pinecone project
from langchain_pinecone import PineconeVectorStore
vectorstore = PineconeVectorStore.from_documents(
    documents=docs,
    embedding=embeddings,
    index_name="my-index"
)

# FAISS (local, fast; pip install faiss-cpu)
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(
    documents=docs,
    embedding=embeddings
)
# Save to disk
vectorstore.save_local("faiss_index")
# Load later; loading unpickles stored data, so only load indexes you created yourself
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)
- Step 16
LangSmith for Observability and Debugging
LangSmith is the official LangChain platform for debugging, testing, evaluating, and monitoring LLM applications. It provides tracing, evaluation datasets, and production monitoring. Sign up at smith.langchain.com for a free account.
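Conceptually, a trace is one record per step (name, inputs, output, error, duration) that a backend such as LangSmith collects and displays as a tree. The decorator below is a hypothetical, in-memory sketch of that idea only and does not talk to LangSmith. For your own non-LangChain functions, the `langsmith` SDK provides a `@traceable` decorator that fills this role.

```python
from __future__ import annotations

import functools
import time
from typing import Any, Callable

TRACE: list[dict[str, Any]] = []  # in-memory stand-in for a tracing backend


def traced(name: str) -> Callable:
    """Decorator recording each call's name, inputs, output, error, and duration."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: dict[str, Any] = {"name": name, "inputs": (args, kwargs), "error": None}
            start = time.perf_counter()
            try:
                record["output"] = fn(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                record["error"] = repr(exc)
                raise  # tracing must never swallow the original error
            finally:
                record["duration_s"] = time.perf_counter() - start
                TRACE.append(record)
        return wrapper
    return decorator


@traced("retriever")
def fake_retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]


fake_retrieve("pricing")
print(TRACE[-1])
```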
# Set environment variables to enable LangSmith tracing
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_API_KEY="your-langsmith-api-key"
export LANGCHAIN_PROJECT="your-project-name"
# (Newer SDK versions also accept LANGSMITH_TRACING, LANGSMITH_API_KEY, and LANGSMITH_PROJECT)

# Now run your LangChain code normally;
# all chains will automatically send traces to LangSmith
- Step 17
Production Deployment Patterns
When deploying LangChain applications to production, consider these best practices:
1. Use async operations - LangChain provides async versions of most operations for better performance under load.
2. Implement proper error handling - LLM APIs can fail or timeout. Use retries and fallbacks.
3. Cache embeddings - Don't recompute embeddings for the same content. Store them in your vector database.
4. Rate limiting - Implement rate limiting to avoid hitting API quotas.
5. Monitoring - Use LangSmith or similar tools to track performance, costs, and errors.
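6. Version control prompts - Keep prompt templates in dedicated, versioned files (or a prompt registry such as LangChain Hub) instead of string literals scattered through application code.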
7. Secure API keys - Use environment variables or secret managers, never commit keys.
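In practice, item 2 usually means retries with exponential backoff, followed by a fallback. LangChain runnables have built-in helpers for this: `.with_retry(stop_after_attempt=...)` and `.with_fallbacks([...])`. The stdlib sketch below only shows the kind of logic involved. `call_with_retry` is a hypothetical name, and catching every `Exception` is a simplification; real code should retry only on timeouts and rate-limit errors.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    fallback: Callable[[], T] | None = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `fn` with exponential backoff plus jitter; use `fallback` if all attempts fail."""
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # narrow to timeout / rate-limit errors in real code
            last_error = exc
            if attempt < attempts - 1:
                # 0.5s, 1s, 2s, ... plus up to 10% random jitter to avoid synchronized retries
                delay = base_delay * (2 ** attempt)
                sleep(delay + random.uniform(0, delay * 0.1))
    if fallback is not None:
        return fallback()
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

The injectable `sleep` parameter exists so tests can run instantly. Typical fallbacks are a cheaper model, a cached answer, or a fixed apology message.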
# Example async implementation
import asyncio

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Build the chain once and reuse it across requests
llm = ChatOpenAI(model="gpt-4")
prompt = ChatPromptTemplate.from_template("{query}")
chain = prompt | llm

async def process_query(query: str) -> str:
    try:
        # ainvoke is the async counterpart of invoke
        response = await chain.ainvoke({"query": query})
        return response.content
    except Exception as e:
        # Log the error and return a fallback message
        print(f"Error: {e}")
        return "Sorry, I encountered an error. Please try again."

# Process multiple queries concurrently
async def main():
    queries = ["Query 1", "Query 2", "Query 3"]
    results = await asyncio.gather(*[process_query(q) for q in queries])
    return results

# Run
results = asyncio.run(main())
- Step 18
Testing LangChain Applications
Proper testing is crucial for LLM applications. Use LangSmith datasets for evaluation, mock LLMs for unit tests, and implement quality checks for production.
# Unit testing with FakeListLLM (deterministic, no API calls)
from langchain_core.language_models import FakeListLLM
from langchain_core.prompts import PromptTemplate

def test_chain_structure():
    # Fake LLM that returns these responses in order
    fake_llm = FakeListLLM(responses=["Response 1", "Response 2"])
    prompt = PromptTemplate.from_template("Question: {question}")
    chain = prompt | fake_llm

    result = chain.invoke({"question": "test"})
    assert result == "Response 1"

# Integration testing with a real LLM (register the "integration" marker in your pytest config)
import os
import pytest

@pytest.mark.integration
@pytest.mark.skipif(not os.getenv("OPENAI_API_KEY"), reason="needs OPENAI_API_KEY")
def test_rag_chain_with_real_llm():
    # Test the actual RAG pipeline
    from langchain_openai import ChatOpenAI
    # ... rest of real test
    pass
- Step 19
Cost Optimization Strategies
LLM APIs can be expensive at scale. Here are strategies to reduce costs:
1. Use cheaper models where possible - Use GPT-3.5-turbo for simple tasks, GPT-4 only when needed.
2. Implement caching - Use GPTCache or similar to cache identical queries.
3. Reduce prompt size - Cost scales with token count, so include only the context the model actually needs.
4. Batch operations - Process multiple items in one request when possible.
5. Monitor token usage - Track input and output tokens to identify expensive operations.
6. Consider local models - For sensitive data or high-volume use cases, local LLMs (via Ollama or llama.cpp) have no per-request API fees; you pay for the hardware instead.
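Exact-match caching (item 2) keys each response on the model name plus the exact prompt, so a repeated request costs nothing. The class below is a hypothetical, stdlib-only sketch. LangChain also offers a global LLM cache (`set_llm_cache`), and GPTCache adds semantic matching for near-duplicate queries, which this sketch does not attempt.

```python
from __future__ import annotations

import hashlib
from typing import Callable


class ResponseCache:
    """Exact-match cache keyed on (model, prompt); only identical requests hit."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self.call_model = call_model  # the real (paid) model call
        self.store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, prompt: str) -> str:
        # Include the model name: the same prompt on a different model is a different answer
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def invoke(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = self.call_model(model, prompt)
        return self.store[key]
```

Tracking `hits` and `misses` makes the saving measurable: each hit is one model call you did not pay for.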
# Example: a cheaper model for classification, GPT-4 for complex tasks
from langchain_openai import ChatOpenAI
from langchain_community.callbacks import get_openai_callback

# Cheap classifier
classifier = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Expensive model, reserved for complex tasks
complex_model = ChatOpenAI(model="gpt-4", temperature=0.7)

# Track token usage and cost for every OpenAI call made inside the block
with get_openai_callback() as cb:
    response = classifier.invoke("Classify: This is urgent")

# Print token usage and cost
print(f"Total Tokens: {cb.total_tokens}")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")
- Step 20
Using Local Models with Ollama
For privacy, cost savings, or offline use, run models locally with Ollama. LangChain has first-class Ollama integration. This is ideal for development, sensitive data, or high-volume applications.
# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Or download an installer (macOS, Windows) from https://ollama.com

# Pull a model
ollama pull llama2
ollama pull mistral
ollama pull codellama

# List available models
ollama list
- Step 21
LangChain with Ollama (Python)
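Ollama models plug into the same LangChain interfaces as cloud providers: swap the model class and the rest of the chain stays the same. Chains and RAG work unchanged with local models; tool-calling agents additionally need a model that supports tool calling.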
# pip install langchain-ollama  (replaces the older langchain_community Ollama classes)
from langchain_ollama import OllamaLLM, OllamaEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Initialize local model (no API key needed; the Ollama server must be running)
llm = OllamaLLM(model="llama2")

# Use exactly like OpenAI
prompt = ChatPromptTemplate.from_template(
    "Explain {topic} in simple terms."
)
chain = prompt | llm | StrOutputParser()

response = chain.invoke({"topic": "quantum computing"})
print(response)

# For RAG with local embeddings
embeddings = OllamaEmbeddings(model="llama2")
⚠ Heads up: Local models are slower than API-based models, especially without GPU acceleration. A 7B model such as Llama 2 7B needs roughly 8GB of RAM, and larger models need more. Quality varies: large hosted models such as GPT-4 generally still outperform small local models on complex tasks.
- Step 22
Common Debugging Tips
LangChain applications can be complex. Here are common issues and solutions:
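1. API key errors - Confirm the key is actually present in the process environment (for example, that load_dotenv() runs before the model is created) and has no stray whitespace or extra quote characters.
2. Rate limiting - Providers cap requests and tokens per minute, and new or low-tier accounts have tight limits. Add retry logic with backoff, or move to a higher usage tier.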
3. Import errors - Integration packages are separate. Install langchain-openai, not just langchain.
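4. Prompts growing in long conversations - A full buffer memory resends the entire history on every call, which raises cost and eventually overflows the context window. Summarize older messages or keep only a recent window (ConversationSummaryMemory or ConversationBufferWindowMemory in the classic API) instead of ConversationBufferMemory.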
5. Slow performance - Enable streaming, use async, cache embeddings.
6. Inconsistent agent behavior - Set temperature=0 for more consistent outputs; this reduces run-to-run variation but does not guarantee identical results.
7. Empty retrieval results - Check embedding model matches between indexing and querying.
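Tip 7 is easier to prevent than to debug. Store the embedding model's name and vector dimension with the index, and reject queries that don't match. `TinyIndex` and the model names in this sketch are hypothetical; a real vector store would keep this information in its collection or index metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TinyIndex:
    """Toy vector index that remembers which embedding model built it."""
    model_name: str
    dim: int
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, vector: list[float]) -> None:
        self._check_dim(len(vector))
        self.vectors.append(vector)

    def check_query(self, model_name: str, vector: list[float]) -> None:
        """Raise before searching if the query was embedded differently from the index."""
        if model_name != self.model_name:
            raise ValueError(
                f"index built with {self.model_name!r}, query embedded with {model_name!r}"
            )
        self._check_dim(len(vector))

    def _check_dim(self, dim: int) -> None:
        if dim != self.dim:
            raise ValueError(f"expected {self.dim}-dim vectors, got {dim}")
```

Two different models can produce vectors of the same dimension that are still incompatible. This is why the sketch checks the model name and not just the vector length.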
# Enable verbose/debug output globally
from langchain_core.globals import set_verbose, set_debug
set_verbose(True)
set_debug(True)

# Or per call: attach a callback handler for detailed logging
# (assumes `chain` is an LCEL chain from the earlier steps whose prompt takes {input})
from langchain_core.callbacks import StdOutCallbackHandler
chain.invoke(
    {"input": "test"},
    config={"callbacks": [StdOutCallbackHandler()]}
)

# Or bind the handler to the chain once
verbose_chain = chain.with_config(callbacks=[StdOutCallbackHandler()])
- Step 23
LangChain Ecosystem and Extensions
The LangChain ecosystem includes several companion projects:
LangSmith - Debugging, testing, and monitoring platform (smith.langchain.com)
LangServe - FastAPI server for deploying chains as REST APIs
LangGraph - Library for building stateful multi-actor applications with LLMs
LangChain Templates - Pre-built application templates (RAG, agents, chatbots)
LangChain Hub - Community prompt repository
LangFlow - Visual builder for LangChain applications (low-code)
Flowise - Another visual LangChain builder (low-code)
# Install LangServe for API deployment (quotes stop shells like zsh from expanding the brackets)
pip install "langserve[all]"

# Install LangGraph for advanced agents
pip install langgraph

# Use LangChain CLI to scaffold from templates
pip install langchain-cli
langchain app new my-app
cd my-app
langchain app add <template-name>
langchain app serve
- Step 24
Example: Production RAG Application Structure
Here's a recommended structure for a production RAG application with proper separation of concerns, error handling, and monitoring.
# app/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Read from environment variables or .env; ignore unrelated keys in .env
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    openai_api_key: str
    pinecone_api_key: str
    langsmith_api_key: str

settings = Settings()

# app/embeddings.py
from functools import lru_cache
from langchain_openai import OpenAIEmbeddings
from app.config import settings

@lru_cache()
def get_embeddings():
    # Cached so every request reuses one client; pass the key explicitly because
    # pydantic-settings does not export .env values into os.environ
    return OpenAIEmbeddings(api_key=settings.openai_api_key)

# app/vectorstore.py
from langchain_pinecone import PineconeVectorStore
from app.config import settings
from app.embeddings import get_embeddings

def get_vectorstore(index_name: str):
    # Connects to an existing Pinecone index (current clients need no init/environment call)
    return PineconeVectorStore(
        index_name=index_name,
        embedding=get_embeddings(),
        pinecone_api_key=settings.pinecone_api_key
    )

# app/chain.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from app.config import settings

def format_docs(docs):
    # Join retrieved Documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

def create_rag_chain(vectorstore):
    template = """Answer based on context:
{context}

Question: {question}

Answer:"""
    prompt = ChatPromptTemplate.from_template(template)
    llm = ChatOpenAI(model="gpt-4", temperature=0, api_key=settings.openai_api_key)
    retriever = vectorstore.as_retriever()
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

# app/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from app.vectorstore import get_vectorstore
from app.chain import create_rag_chain

app = FastAPI()
vectorstore = get_vectorstore("docs")
chain = create_rag_chain(vectorstore)

class Query(BaseModel):
    question: str

@app.post("/query")
async def query(q: Query):
    try:
        response = await chain.ainvoke(q.question)
        return {"answer": response}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
- Step 25
Resources and Next Steps
Official Documentation:
- Python: https://python.langchain.com/
- JavaScript: https://js.langchain.com/
GitHub Repositories:
- LangChain Python: https://github.com/langchain-ai/langchain (87K+ stars)
- LangChain.js: https://github.com/langchain-ai/langchainjs
LangSmith Platform:
- https://smith.langchain.com/ (debugging, monitoring, evaluation)
Community Resources:
- Discord: https://discord.gg/langchain
- Twitter: @LangChainAI
- LangChain Blog: https://blog.langchain.dev/
Tutorials and Learning:
- LangChain Academy: https://academy.langchain.com/
- LangChain Cookbook: https://github.com/langchain-ai/langchain/tree/master/cookbook
- DeepLearning.AI Courses: Short courses on LangChain for LLM app development
Next Steps:
- Complete the LangChain quickstart guide
- Build a simple RAG application with your own documents
- Experiment with agents and tools
- Deploy a production application with LangServe
- Join the Discord community for support