Intermediate | ai, llm, ollama, fastapi, sveltekit, self-hosted, rag

Open WebUI - Self-hosted AI interface platform

Deploy Open WebUI, the extensible self-hosted AI platform supporting Ollama, OpenAI API, RAG, voice/video calls, and multi-tenancy.

Step 1
Overview
Open WebUI is a powerful, extensible AI interface that supports multiple LLM providers (Ollama, OpenAI API, Anthropic, Google Gemini, etc.) with built-in RAG, voice/video calling, Python function execution, and multi-tenant user management.

Key capabilities:
- Connect to Ollama or any OpenAI-compatible API endpoint
- Run AI inference locally with CUDA support (:cuda image)
- Bundled Ollama support in a single container (:ollama image)
- RAG with 9 vector database options and multiple content extraction engines
- Voice input (Whisper) and TTS (Azure, ElevenLabs, OpenAI, local)
- Image generation (DALL-E, ComfyUI, AUTOMATIC1111)
- Python code execution sandbox (Pyodide)
- Collaborative editing (YJS + Prosemirror)
- Enterprise authentication (LDAP/AD, OAuth, SCIM 2.0)
- Horizontal scaling with Redis-backed sessions
- OpenTelemetry observability
Step 2
Tech Stack Reference
Understanding the technology stack helps with troubleshooting and customization:

Frontend:
- SvelteKit (framework), Vite 5 (tooling)
- Tailwind CSS 4, Shiki (syntax highlighting)
- Socket.IO-client (real-time), i18next (internationalization)
- Pyodide (browser Python), YJS + Prosemirror (collaboration)
- Tiptap (rich text editor), Mermaid (diagrams), Katex (LaTeX)
- Chart.js, Leaflet, PDF.js, CodeMirror
Backend:
- FastAPI + uvicorn (async Python server)
- Pydantic 2, SQLAlchemy (async ORM), Alembic (migrations)
- Starsessions + Redis (session management)
- LangChain (RAG orchestration)
- sentence-transformers (embeddings)
- faster-whisper (STT), transformers, ONNX Runtime
- ChromaDB/PGVector/Qdrant/Milvus/Elasticsearch/OpenSearch/Pinecone (vector stores)
- boto3, azure-* SDKs, google-* SDKs (cloud integrations)
- OpenTelemetry (traces/metrics/logs)
Deployment:
- Docker/Docker Compose, Kubernetes (Helm, kustomize)
- PostgreSQL or SQLite (main database)
- Redis (session/caching for scale)
```
Relevant packages:

Frontend (npm): @sveltejs/kit, @sveltejs/adapter-static, tailwindcss@4,
tiptap, pyodide, yjs, @azure/msal-browser, socket.io-client

Backend (pip): fastapi, uvicorn, pydantic, sqlalchemy[asyncio],
langchain, chromadb, sentence-transformers, faster-whisper,
transformers, onnxruntime, tiktoken, openai, anthropic, opentelemetry-sdk
```
Step 3
Quick Start with Docker (Ollama on same machine)
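The simplest setup runs Open WebUI in a container and connects it to an Ollama instance running on the host machine. The UI listens on port 8080 inside the container; -p 3000:8080 publishes it on host port 3000.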

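The -v open-webui:/app/backend/data mount is critical: it stores your database, cached models, and documents in a named Docker volume. Without it, that data lives only in the container's writable layer and is lost when the container is removed or recreated, for example during an image upgrade.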
```
# Start Open WebUI connecting to local Ollama
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

# Access at http://localhost:3000
# Ensure Ollama is running locally: ollama serve
# Verify: ollama list  # shows available models
```
⚠ Heads up: The `-v open-webui:/app/backend/data` volume mount is required. Without it, your database and document uploads are lost when the container is removed or recreated.
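After the container starts, it can help to confirm that the UI answers before opening a browser. The sketch below assumes the backend exposes a `/health` endpoint that returns JSON such as `{"status": true}`; treat both the path and the payload shape as assumptions to verify against your version. The function names are illustrative only.

```python
import json
import urllib.error
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True only if the body is a JSON object with status == true."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") is True


def is_healthy(base_url: str = "http://localhost:3000", timeout: float = 5.0) -> bool:
    """Probe <base_url>/health; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return parse_health(resp.read())
    except (urllib.error.URLError, OSError):
        return False
```

Calling `is_healthy()` a few seconds after `docker run` should return True once the server has finished starting. A False result usually means the container is still booting or the port mapping differs from 3000:8080.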

Step 4

Quick Start (Ollama on different server)

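If Ollama runs on a different machine, set OLLAMA_BASE_URL to its address, replacing the example URL with the host and port that actually serve Ollama. Ollama binds to 127.0.0.1 by default, so the remote instance must listen on a reachable interface (for example by starting it with OLLAMA_HOST=0.0.0.0).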

```
# Connect to remote Ollama server
docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.100:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```
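Before pointing Open WebUI at a remote server, you may want to confirm that the address actually serves Ollama. The sketch below queries Ollama's `/api/tags` endpoint and extracts the model names from the `models` array in the response. The function names are illustrative, and the base URL is the example address from this step.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", []) if "name" in m]


def list_remote_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Fetch <base_url>/api/tags and return the model names it reports."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_names(json.load(resp))
```

For example, `list_remote_models("http://192.168.1.100:11434")` raises an exception if the server is unreachable and returns an empty list if it is reachable but has no models pulled yet.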

Step 5

Quick Start with CUDA GPU acceleration

The :cuda tag includes PyTorch with CUDA for running embedding models and Whisper on a GPU. Requires NVIDIA Container Toolkit installed on the host.

Prerequisites:
- Install NVIDIA Container Toolkit
- Restart Docker service

See NVIDIA docs for installation.

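```
# First, ensure NVIDIA Container Toolkit is installed
# (assumes NVIDIA's apt repository is already configured; see the NVIDIA docs):
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Then run with GPU:
docker run -d \
  -p 3000:8080 \
  --gpus all \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:cuda
```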
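A quick way to confirm that the host sees a GPU before passing --gpus all is to count the entries printed by `nvidia-smi -L`. The sketch below assumes each GPU appears on its own line beginning with `GPU ` (for example `GPU 0: <name> (UUID: ...)`); the function names are illustrative.

```python
import shutil
import subprocess


def count_gpus(listing: str) -> int:
    """Count GPU entries in `nvidia-smi -L` output (lines starting with 'GPU ')."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def visible_gpus() -> int:
    """Return the number of GPUs nvidia-smi reports, or 0 if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    return count_gpus(result.stdout) if result.returncode == 0 else 0
```

If `visible_gpus()` returns 0 on the host, the container will not see a GPU either, so fix the driver installation before troubleshooting Docker.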

Step 6

One-Container setup (Open WebUI + Ollama bundled)

The :ollama image bundles Ollama inside the container, eliminating external dependencies. Choose the command based on your hardware.

Benefits:
- Single container, no external Ollama setup needed
- Ollama models persist in the ollama volume
- GPU passthrough works with --gpus=all

```
# Run only one of the two commands below; both use the container name open-webui.

# With GPU support (recommended if you have an NVIDIA GPU):
docker run -d \
  -p 3000:8080 \
  --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

# CPU-only mode:
docker run -d \
  -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama
```

Step 7

Docker Compose setup (recommended for production)

Docker Compose is ideal for multi-container setups with persistent data and complex networking. Create docker-compose.yml in your project directory.

```
version: '3.8'

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    restart: always

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    restart: always

volumes:
  open-webui:
  ollama:
```
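In its short form, depends_on only controls start order; it does not wait for a service to be ready to accept requests. A script that runs right after `docker compose up -d` can therefore poll until the stack responds. The helper below is a generic sketch of that polling loop: `probe` is any caller-supplied function that returns True once the service is usable, and all names are illustrative.

```python
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A typical probe issues an HTTP request to http://localhost:3000 and reports whether it succeeded. The `sleep` parameter exists so the loop can be exercised in tests without real delays.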

Step 8

Python pip installation

Install Open WebUI as a Python package. Use Python 3.11 for compatibility.

```
# Install
pip install open-webui

# Start the server
open-webui serve

# Access at http://localhost:8080
# Default Ollama endpoint: http://127.0.0.1:11434

# Optional: expose on network
open-webui serve --host 0.0.0.0 --port 8080
```
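Because the pip route depends on the interpreter version, a small guard can catch a mismatch before installation. This sketch simply encodes the Python 3.11 recommendation from this step; the constant and function name are illustrative, and the recommended version may change between releases.

```python
import sys

# Recommended version taken from the guidance in this step (an assumption to re-check per release).
RECOMMENDED = (3, 11)


def is_recommended_python(version_info=sys.version_info) -> bool:
    """True when the interpreter's major.minor matches the recommended version."""
    return tuple(version_info[:2]) == RECOMMENDED
```

Running `is_recommended_python()` inside the virtual environment you plan to install into tells you whether that environment uses the recommended interpreter.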

Step 9
Configuration options
Open WebUI is highly configurable via environment variables.

Model providers:
- OLLAMA_BASE_URL: Ollama server endpoint
- OPENAI_API_KEY: OpenAI or compatible API key
- OPENAI_API_BASE_URL: Base URL of an OpenAI-compatible endpoint, for providers other than OpenAI
Database:
- DATABASE_URL: PostgreSQL connection string (default: SQLite)
- DB_ENCRYPT_KEY: Encryption key for SQLite (optional)
Vector database (RAG):
- RAG_EMBEDDING_MODEL: Sentence transformer model (default: sentence-transformers/all-MiniLM-L6-v2)
- VECTOR_DB: Vector DB backend (chroma, qdrant, pgvector, etc.)
Security:
- WEBUI_SECRET_KEY: Session secret (auto-generated if not set)
- ENABLE_SIGNUP: Allow user registration (default: true)
```
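# Example with custom configuration
# (the "postgres" hostname must be resolvable from the container, e.g. via a shared Docker network):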
docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
  -e OPENAI_API_KEY=sk-*** \
  -e DATABASE_URL=postgresql://user:pass@postgres:5432/openwebui \
  -e RAG_EMBEDDING_MODEL=intfloat/multilingual-e5-large \
  -e ENABLE_SIGNUP=false \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```
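When the settings live in a script or a secrets store, the `docker run` command can be assembled programmatically, with one -e flag per variable. The sketch below builds the argument list for the options used in this guide; the function name and its defaults are illustrative, not part of Open WebUI.

```python
import shlex


def docker_run_command(
    env: dict[str, str],
    image: str = "ghcr.io/open-webui/open-webui:main",
    host_port: int = 3000,
    name: str = "open-webui",
) -> list[str]:
    """Build the argv for `docker run`, with one -e flag per setting."""
    argv = ["docker", "run", "-d", "-p", f"{host_port}:8080"]
    for key, value in sorted(env.items()):
        argv += ["-e", f"{key}={value}"]
    argv += ["-v", "open-webui:/app/backend/data", "--name", name, image]
    return argv


settings = {
    "OLLAMA_BASE_URL": "http://192.168.1.50:11434",
    "ENABLE_SIGNUP": "false",
}
# shlex.join renders the list as a shell-safe string for display or logging.
print(shlex.join(docker_run_command(settings)))
```

Passing the list directly to `subprocess.run` avoids shell quoting problems when a value, such as an API key or password, contains spaces or special characters.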

Step 10

PostgreSQL database setup

For production deployments, use PostgreSQL instead of SQLite for better performance and multi-instance support.

```
version: '3.8'

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - DATABASE_URL=postgresql://openwebui:openwebui@postgres:5432/openwebui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
      - postgres
    restart: always

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama:/root/.ollama
    restart: always

  postgres:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=openwebui
      - POSTGRES_PASSWORD=openwebui
      - POSTGRES_DB=openwebui
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: always

volumes:
  open-webui:
  ollama:
  postgres-data:
```
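A common failure with this setup is a DATABASE_URL that no longer matches the POSTGRES_* values after one of them is changed. The sketch below parses the URL with the standard library and reports which fields disagree; the function name is illustrative.

```python
from urllib.parse import urlsplit


def database_url_mismatches(url: str, user: str, password: str, db: str) -> list[str]:
    """Compare a DATABASE_URL with the POSTGRES_* values; return the fields that differ."""
    parts = urlsplit(url)
    found = {
        "scheme": (parts.scheme, "postgresql"),
        "user": (parts.username, user),
        "password": (parts.password, password),
        "database": (parts.path.lstrip("/"), db),
    }
    return [field for field, (actual, expected) in found.items() if actual != expected]
```

With the values from the compose file above, the function returns an empty list. After changing only POSTGRES_PASSWORD, it returns `["password"]`, which points at the line that needs updating.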

Step 11

Troubleshooting

Connection errors: If Open WebUI can't reach an Ollama instance running on the Docker host, add --add-host=host.docker.internal:host-gateway to docker run so that host.docker.internal resolves to the host.

GPU not detected: For CUDA images, ensure --gpus all is passed to docker run.

Data persistence: Always mount -v open-webui:/app/backend/data to prevent data loss.

```
# Check container logs
docker logs open-webui

# Test Ollama connection from inside the container
docker exec open-webui curl -s http://host.docker.internal:11434/api/tags

# Monitor resources
docker stats open-webui

# Fallback: use --network=host flag (port becomes 8080):
docker run -d --network=host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```
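When it is unclear whether a connection error comes from networking or from the application, a plain TCP check against the configured OLLAMA_BASE_URL separates the two. The sketch below is a minimal reachability test using only the standard library. The function names are illustrative, and the result reflects the network view of the machine or container where the script runs.

```python
import socket
from urllib.parse import urlsplit


def endpoint_of(base_url: str) -> tuple[str, int]:
    """Split a base URL such as http://host:11434 into (host, port)."""
    parts = urlsplit(base_url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {base_url!r}")
    # Fall back to the scheme's conventional port when none is given.
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `can_connect(*endpoint_of("http://127.0.0.1:11434"))` returns False, Ollama is not listening on that address and the problem lies outside Open WebUI.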

Step 12
Key Features
Pipelines Plugin Framework: Extend Open WebUI with custom Python code for rate limiting, logging, translation, content filtering.

Function Calling (Tools): Integrate external APIs and functions with LLMs.

RAG: Upload documents (PDF, DOCX, images) for contextual chat. Supports 15+ web search providers.

Web Browsing: Inject webpages into chat using #<url> command.

Image Generation: DALL-E, ComfyUI, AUTOMATIC1111.

Voice/Video Calls: STT (Whisper) and TTS (Azure, ElevenLabs, OpenAI).

Collaborative Editing: Real-time multi-user editing with YJS.

Python Sandbox: Execute Python in browser with Pyodide.
