Open WebUI - Self-hosted AI interface platform

Deploy Open WebUI, the extensible self-hosted AI platform supporting Ollama, OpenAI API, RAG, voice/video calls, and multi-tenancy.

  1. Step 1

    Overview

    Open WebUI is a powerful, extensible AI interface that supports multiple LLM providers (Ollama, OpenAI API, Anthropic, Google Gemini, etc.) with built-in RAG, voice/video calling, Python function execution, and multi-tenant user management.

    Key capabilities:

    • Connect to Ollama or any OpenAI-compatible API endpoint
    • Run AI inference locally with CUDA support (:cuda image)
    • Bundled Ollama support in a single container (:ollama image)
    • RAG with 9 vector database options and multiple content extraction engines
    • Voice input (Whisper) and TTS (Azure, ElevenLabs, OpenAI, local)
    • Image generation (DALL-E, ComfyUI, AUTOMATIC1111)
    • Python code execution sandbox (Pyodide)
    • Collaborative editing (Yjs + ProseMirror)
    • Enterprise authentication (LDAP/AD, OAuth, SCIM 2.0)
    • Horizontal scaling with Redis-backed sessions
    • OpenTelemetry observability
  2. Step 2

    Tech Stack Reference

    Understanding the technology stack helps with troubleshooting and customization:

    Frontend:

    • SvelteKit (framework), Vite 5 (tooling)
    • Tailwind CSS 4, Shiki (syntax highlighting)
    • Socket.IO-client (real-time), i18next (internationalization)
    • Pyodide (browser Python), Yjs + ProseMirror (collaboration)
    • Tiptap (rich text editor), Mermaid (diagrams), KaTeX (LaTeX)
    • Chart.js, Leaflet, PDF.js, CodeMirror

    Backend:

    • FastAPI + uvicorn (async Python server)
    • Pydantic 2, SQLAlchemy (async ORM), Alembic (migrations)
    • Starsessions + Redis (session management)
    • LangChain (RAG orchestration)
    • sentence-transformers (embeddings)
    • faster-whisper (STT), transformers, ONNX Runtime
    • ChromaDB/PGVector/Qdrant/Milvus/Elasticsearch/OpenSearch/Pinecone (vector stores)
    • boto3, azure-* SDKs, google-* SDKs (cloud integrations)
    • OpenTelemetry (traces/metrics/logs)

    Deployment:

    • Docker/Docker Compose, Kubernetes (Helm, kustomize)
    • PostgreSQL or SQLite (main database)
    • Redis (session/caching for scale)
    Relevant packages:
    
    Frontend (npm): @sveltejs/kit, @sveltejs/adapter-static, tailwindcss@4,
    tiptap, pyodide, yjs, @azure/msal-browser, socket.io-client
    
    Backend (pip): fastapi, uvicorn, pydantic, sqlalchemy[asyncio],
    langchain, chromadb, sentence-transformers, faster-whisper,
    transformers, onnxruntime, tiktoken, openai, anthropic, opentelemetry-sdk
  3. Step 3

    Quick Start with Docker (Ollama on same machine)

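    The simplest setup runs Open WebUI in a container and connects it to an Ollama instance running on the host. The -p 3000:8080 flag publishes the UI, which listens on port 8080 inside the container, on host port 3000.

    The -v open-webui:/app/backend/data mount is critical: it persists your database, models, and documents in a named volume. Without it, that data lives only in the container's writable layer and is lost when the container is removed or recreated, for example during an image upgrade.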

    # Start Open WebUI connecting to local Ollama
    docker run -d \
      -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -v open-webui:/app/backend/data \
      --name open-webui \
      --restart always \
      ghcr.io/open-webui/open-webui:main
    
    # Access at http://localhost:3000
    # Ensure Ollama is running locally: ollama serve
    # Verify: ollama list  # shows available models
    ⚠ Heads up: The `-v open-webui:/app/backend/data` volume mount is required. Without it, your database and document uploads are lost when the container is removed or recreated (a plain restart keeps them, but an upgrade does not).
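
    After `docker run` returns, the UI may need some time before it answers requests. The following is a minimal sketch of a readiness check, assuming the `/health` endpoint that the image's own health check polls and the 3000:8080 port mapping used above. `wait_for_webui` is a hypothetical helper name, not something Open WebUI ships.

```shell
#!/usr/bin/env bash
# wait_for_webui is a hypothetical helper, not part of Open WebUI.
# It polls URL/health once per second until it answers or the timeout expires.
wait_for_webui() {
  local url="${1:-http://localhost:3000}"   # host port from -p 3000:8080
  local timeout="${2:-60}"                  # seconds to keep trying
  local waited=0
  until curl --silent --fail --max-time 2 "$url/health" > /dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Open WebUI did not answer at $url/health within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Open WebUI is up at $url"
}
```

    A call such as `wait_for_webui http://localhost:3000 120` returns 0 once the UI responds and 1 if it never does, which makes it usable in a provisioning script.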
  4. Step 4

    Quick Start (Ollama on different server)

    If Ollama runs on a different machine, set OLLAMA_BASE_URL to its address. Replace the URL with the exact host:port serving Ollama.

    # Connect to remote Ollama server
    docker run -d \
      -p 3000:8080 \
      -e OLLAMA_BASE_URL=http://192.168.1.100:11434 \
      -v open-webui:/app/backend/data \
      --name open-webui \
      --restart always \
      ghcr.io/open-webui/open-webui:main
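
    Before starting the container, it can help to confirm that the remote Ollama API is reachable from the Docker host. This is a sketch under the assumption that `curl` is installed and that Ollama answers on `/api/tags`, the model-list endpoint also used in the troubleshooting step. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; neither is part of Open WebUI or Ollama.

# Build the value for OLLAMA_BASE_URL from a host and an optional port
# (11434 is the port used in the example above).
ollama_base_url() {
  local host="$1"
  local port="${2:-11434}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Succeed only if the Ollama API answers on /api/tags within 5 seconds.
ollama_reachable() {
  curl --silent --fail --max-time 5 "$1/api/tags" > /dev/null
}
```

    For example, `ollama_reachable "$(ollama_base_url 192.168.1.100)"` exits non-zero when the server is down, firewalled, or listening only on its loopback interface.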
  5. Step 5

    Quick Start with CUDA GPU acceleration

    The :cuda tag includes PyTorch with CUDA for running embedding models and Whisper on a GPU. Requires NVIDIA Container Toolkit installed on the host.

    Prerequisites:

    1. Install NVIDIA Container Toolkit
    2. Restart Docker service

    See NVIDIA docs for installation.

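    # First, install NVIDIA Container Toolkit
    # (assumes NVIDIA's apt repository is already configured; see the NVIDIA docs):
    sudo apt-get install -y nvidia-container-toolkit
    # Register the NVIDIA runtime with Docker, then restart the daemon:
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker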
    
    # Then run with GPU:
    docker run -d \
      -p 3000:8080 \
      --gpus all \
      --add-host=host.docker.internal:host-gateway \
      -v open-webui:/app/backend/data \
      --name open-webui \
      --restart always \
      ghcr.io/open-webui/open-webui:cuda
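
    To confirm that the GPU was actually passed through, one option is to run `nvidia-smi` inside the running container. This sketch assumes the container name `open-webui` from the command above and relies on the NVIDIA runtime making `nvidia-smi` available in the container. `gpu_visible` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# gpu_visible is a hypothetical helper name.
# It lists GPUs from inside the container; a failure usually means the GPU
# was not exposed to the container.
gpu_visible() {
  local container="${1:-open-webui}"
  if docker exec "$container" nvidia-smi -L; then
    echo "GPU is visible inside $container"
  else
    echo "No GPU visible inside $container; check --gpus all and the toolkit install" >&2
    return 1
  fi
}
```

    Running `gpu_visible open-webui` prints one line per detected GPU followed by the confirmation message, or returns 1 with a hint on stderr.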
  6. Step 6

    One-Container setup (Open WebUI + Ollama bundled)

    The :ollama image bundles Ollama inside the container, eliminating external dependencies. Choose the command based on your hardware.

    Benefits:

    • Single container, no external Ollama setup needed
    • Ollama models persist in the ollama volume
    • GPU passthrough works with --gpus=all
    # With GPU support (recommended if you have an NVIDIA GPU):
    docker run -d \
      -p 3000:8080 \
      --gpus=all \
      -v ollama:/root/.ollama \
      -v open-webui:/app/backend/data \
      --name open-webui \
      --restart always \
      ghcr.io/open-webui/open-webui:ollama
    
    # CPU-only mode:
    docker run -d \
      -p 3000:8080 \
      -v ollama:/root/.ollama \
      -v open-webui:/app/backend/data \
      --name open-webui \
      --restart always \
      ghcr.io/open-webui/open-webui:ollama
  7. Step 7

    Docker Compose setup (recommended for production)

    Docker Compose is ideal for multi-container setups with persistent data and complex networking. Create docker-compose.yml in your project directory.

    version: '3.8'
    
    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        ports:
          - "3000:8080"
        environment:
          - OLLAMA_BASE_URL=http://ollama:11434
        volumes:
          - open-webui:/app/backend/data
        depends_on:
          - ollama
        restart: always
    
      ollama:
        image: ollama/ollama:latest
        ports:
          - "11434:11434"
        volumes:
          - ollama:/root/.ollama
        restart: always
    
    volumes:
      open-webui:
      ollama:
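
    With the file in place, the stack still needs to be started and the bundled Ollama service needs at least one model. The sketch below wraps those steps. `start_stack` is a hypothetical name, and the model passed to it is whatever you choose to pull (`llama3.2` in the usage note is only an example).

```shell
#!/usr/bin/env bash
# start_stack is a hypothetical wrapper around standard Docker Compose commands.
# Run it from the directory that contains docker-compose.yml.
start_stack() {
  local model="$1"
  if [ -z "$model" ]; then
    echo "usage: start_stack MODEL" >&2
    return 2
  fi
  # Start both services in the background.
  docker compose up -d || return 1
  # Pull the model inside the ollama service so it lands in the ollama volume.
  docker compose exec ollama ollama pull "$model" || return 1
  # Show the resulting service status.
  docker compose ps
}
```

    For example, `start_stack llama3.2` brings the services up, downloads that model into the `ollama` volume, and prints the service list.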
  8. Step 8

    Python pip installation

    Install Open WebUI as a Python package. Use Python 3.11 for compatibility.

    # Install
    pip install open-webui
    
    # Start the server
    open-webui serve
    
    # Access at http://localhost:8080
    # Default Ollama endpoint: http://127.0.0.1:11434
    
    # Optional: expose on network
    open-webui serve --host 0.0.0.0 --port 8080
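
    Installing into a virtual environment keeps Open WebUI's dependencies separate from system packages. The following sketch checks the interpreter version first, since the step above recommends Python 3.11. The function names, the default interpreter `python3.11`, and the `.venv` directory are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the open-webui package.

# Succeed only for a 3.11 or 3.11.x version string.
is_python_311() {
  case "$1" in
    3.11|3.11.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Create a virtual environment with the given interpreter and install open-webui into it.
setup_venv() {
  local py="${1:-python3.11}"
  local dir="${2:-.venv}"
  local ver
  ver="$("$py" -c 'import platform; print(platform.python_version())')" || return 1
  if ! is_python_311 "$ver"; then
    echo "expected Python 3.11, found $ver" >&2
    return 1
  fi
  "$py" -m venv "$dir" && "$dir/bin/pip" install open-webui
}
```

    After `setup_venv python3.11 .venv` succeeds, the server can be started with `.venv/bin/open-webui serve`.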
  9. Step 9

    Configuration options

    Open WebUI is highly configurable via environment variables.

    Model providers:

    • OLLAMA_BASE_URL: Ollama server endpoint
    • OPENAI_API_KEY: OpenAI or compatible API key
    • OPENAI_API_BASE_URL: Custom endpoint for OpenAI-compatible providers

    Database:

    • DATABASE_URL: PostgreSQL connection string (default: SQLite)
    • DB_ENCRYPT_KEY: Encryption key for SQLite (optional)

    Vector database (RAG):

    • RAG_EMBEDDING_MODEL: Sentence transformer model (default: sentence-transformers/all-MiniLM-L6-v2)
    • VECTOR_DB: Vector DB backend (chroma, qdrant, pgvector, etc.; default: chroma)

    Security:

    • WEBUI_SECRET_KEY: Session secret (auto-generated if not set)
    • ENABLE_SIGNUP: Allow user registration (default: true)
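    # Example with custom configuration. The "postgres" hostname in DATABASE_URL must
    # resolve from inside the container (for example, via a shared user-defined Docker network):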
    docker run -d \
      -p 3000:8080 \
      -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
      -e OPENAI_API_KEY=sk-*** \
      -e DATABASE_URL=postgresql://user:pass@postgres:5432/openwebui \
      -e RAG_EMBEDDING_MODEL=intfloat/multilingual-e5-large \
      -e ENABLE_SIGNUP=false \
      -v open-webui:/app/backend/data \
      --name open-webui \
      ghcr.io/open-webui/open-webui:main
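
    Passing secrets with `-e` leaves them in shell history, so an alternative is to keep settings in a file and hand it to `docker run --env-file`. This is a sketch only: the helper names and the file name `open-webui.env` are arbitrary choices, and the values written are the ones from the example above plus a randomly generated `WEBUI_SECRET_KEY`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the default file name open-webui.env is an arbitrary choice.

# Print a 64-character hex string suitable for WEBUI_SECRET_KEY.
generate_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Write a few settings into an env file that only the current user can read.
write_env_file() {
  local file="${1:-open-webui.env}"
  (
    umask 077
    {
      echo "OLLAMA_BASE_URL=http://192.168.1.50:11434"
      echo "ENABLE_SIGNUP=false"
      echo "WEBUI_SECRET_KEY=$(generate_secret)"
    } > "$file"
  )
}
```

    After `write_env_file open-webui.env`, the `-e` flags in the command above can be replaced with a single `--env-file open-webui.env`.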
  10. Step 10

    PostgreSQL database setup

    For production deployments, use PostgreSQL instead of SQLite for better performance and multi-instance support.

    version: '3.8'
    
    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        ports:
          - "3000:8080"
        environment:
          - OLLAMA_BASE_URL=http://ollama:11434
          - DATABASE_URL=postgresql://openwebui:openwebui@postgres:5432/openwebui
        volumes:
          - open-webui:/app/backend/data
        depends_on:
          - ollama
          - postgres
        restart: always
    
      ollama:
        image: ollama/ollama:latest
        volumes:
          - ollama:/root/.ollama
        restart: always
    
      postgres:
        image: postgres:16-alpine
        environment:
          - POSTGRES_USER=openwebui
          - POSTGRES_PASSWORD=openwebui
          - POSTGRES_DB=openwebui
        volumes:
          - postgres-data:/var/lib/postgresql/data
        restart: always
    
    volumes:
      open-webui:
      ollama:
      postgres-data:
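
    Note that `depends_on` in this form only orders container start-up; it does not wait for PostgreSQL to accept connections. One way to check readiness from the host is `pg_isready`, which ships in the postgres image. The sketch below assumes the service name, user, and database from the file above. `wait_for_postgres` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# wait_for_postgres is a hypothetical helper name.
# It retries pg_isready inside the postgres service, once per second.
wait_for_postgres() {
  local tries="${1:-30}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    # -T disables TTY allocation so the check also works in scripts and CI.
    if docker compose exec -T postgres pg_isready -U openwebui -d openwebui > /dev/null 2>&1; then
      echo "postgres is accepting connections"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "postgres not ready after $tries attempts" >&2
  return 1
}
```

    If the function returns 1, `docker compose logs postgres` is a reasonable next place to look.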
  11. Step 11

    Troubleshooting

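    Connection errors: Inside the container, localhost refers to the container itself, not the host. If Open WebUI can't reach an Ollama instance running on the host, add --add-host=host.docker.internal:host-gateway to docker run so that host.docker.internal resolves to the Docker host. On Linux, Ollama must also listen on an address the container can reach (for example, by setting OLLAMA_HOST=0.0.0.0), because it binds to 127.0.0.1 by default.

    GPU not detected: For CUDA images, ensure --gpus all is passed to docker run and that NVIDIA Container Toolkit is installed on the host.

    Data persistence: Always mount -v open-webui:/app/backend/data to prevent data loss when the container is removed or recreated.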

    # Check container logs
    docker logs open-webui
    
    # Test Ollama connection from inside the container (no interactive shell needed)
    docker exec open-webui curl -s http://host.docker.internal:11434/api/tags
    
    # Monitor resources
    docker stats open-webui
    
    # Fallback on Linux hosts: use the --network=host flag (the UI is then served on port 8080):
    docker run -d --network=host \
      -v open-webui:/app/backend/data \
      -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
      --name open-webui \
      ghcr.io/open-webui/open-webui:main
  12. Step 12

    Key Features

    Pipelines Plugin Framework: Extend Open WebUI with custom Python code for tasks such as rate limiting, logging, translation, and content filtering.

    Function Calling (Tools): Integrate external APIs and functions with LLMs.

    RAG: Upload documents (PDF, DOCX, images) for contextual chat. Supports 15+ web search providers.

    Web Browsing: Inject a webpage into a chat by typing # followed by its URL.

    Image Generation: DALL-E, ComfyUI, AUTOMATIC1111.

    Voice/Video Calls: STT (Whisper) and TTS (Azure, ElevenLabs, OpenAI).

    Collaborative Editing: Real-time multi-user editing with YJS.

    Python Sandbox: Execute Python in browser with Pyodide.
