CVAT: Computer Vision Annotation Tool Setup
Complete setup guide for CVAT (Computer Vision Annotation Tool) - a leading open-source platform for building high-quality visual datasets. Includes Docker deployment, PostgreSQL/Redis configuration, AI-assisted annotation with OpenVINO, and production HTTPS setup. Supports image, video, and 3D annotation with team collaboration features.
- Step 1
Understanding CVAT Architecture
CVAT (Computer Vision Annotation Tool) is an industry-leading open-source platform for creating high-quality visual datasets for AI/ML applications. It features a three-tier microservices architecture with a React frontend, Django REST API backend, and multiple specialized databases. The platform supports image, video, and 3D annotation with AI-assisted labeling, team collaboration, quality assurance, and comprehensive developer APIs.
Architecture Overview:

Frontend Layer:
- React 18.2.0 with Redux state management
- Single-page application (SPA)
- TypeScript for type safety

API Gateway:
- Traefik (routes requests to backend services)
- Nginx (serves static frontend assets)

Backend Layer:
- Django REST Framework (Python)
- Redis Queue for asynchronous jobs
- Specialized workers for processing

Data Layer:
- PostgreSQL 15 (primary database)
- Redis 7.2.11 (jobs/cache)
- Kvrocks 2.12.1 (chunk storage)
- ClickHouse 23.11 (analytics)

- Step 2
System Prerequisites
Before installing CVAT, ensure your system meets the minimum requirements. CVAT is designed to run on Linux servers and supports various distributions. The Docker-based deployment makes it portable, but adequate resources are essential for smooth operation, especially when handling large video datasets or running AI models.
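The minimums for this step (2 CPU cores, 4GB RAM, 50GB disk) can also be checked in one pass. The following is a minimal sketch, assuming a Linux host with /proc/meminfo and a POSIX df; the function names check_min and preflight are hypothetical helpers, not part of CVAT.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a measured value against a minimum.
# Prints one status line and returns 1 when the value is too low.
check_min() {
  local label=$1 actual=$2 minimum=$3
  if [ "$actual" -ge "$minimum" ]; then
    echo "OK   $label: $actual (minimum $minimum)"
  else
    echo "LOW  $label: $actual (minimum $minimum)"
    return 1
  fi
}

# Hypothetical helper: gather CPU, RAM, and free disk figures for the
# current directory and compare them with the minimums from this guide.
preflight() {
  local cores mem_gb disk_gb status=0
  cores=$(nproc)
  # MemTotal is reported in kB; round to the nearest GB.
  mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo)
  # Column 4 of "df -Pk" is the available space in kB.
  disk_gb=$(df -Pk . | awk 'NR==2 {print int($4 / 1024 / 1024)}')
  check_min "CPU cores" "$cores" 2 || status=1
  check_min "RAM (GB)" "$mem_gb" 4 || status=1
  check_min "Free disk (GB)" "$disk_gb" 50 || status=1
  return "$status"
}
```

Running preflight from the directory where CVAT will be cloned prints one line per resource and returns a non-zero status if any value is below the minimum.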
# Verify system resources
uname -a
free -h   # Check RAM (minimum 4GB, recommended 8GB+)
df -h     # Check disk space (minimum 50GB)
nproc     # Check CPU cores (minimum 2, recommended 4+)

# Recommended specifications:
# - OS: Ubuntu 20.04+, Debian 11+, CentOS 8+, or similar
# - CPU: 4+ cores for production use
# - RAM: 8GB+ (16GB+ for AI-assisted annotation)
# - Storage: 100GB+ for production datasets
# - Network: Stable connection for Docker image downloads

⚠ Heads up: CVAT requires significant resources when processing videos or running AI models. For production use with multiple users, consider 16GB+ RAM and a dedicated GPU for serverless AI functions.

- Step 3
Install Docker Engine
CVAT uses Docker and Docker Compose for deployment, which simplifies installation and ensures consistency across environments. Docker Engine 20.10.0 or higher and Docker Compose 2.0.0 or higher are required. The official Docker installation script works on most Linux distributions.
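The version requirement can be checked in a script rather than by eye. The sketch below assumes GNU sort (for the -V version ordering) is available; version_ge and check_docker_versions are hypothetical helper names.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on "sort -V" (version sort), which is an assumption about the host.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Compare the installed Docker Engine and Compose versions with the
# minimums stated in this step (20.10.0 and 2.0.0).
check_docker_versions() {
  local engine compose
  engine=$(docker version --format '{{.Server.Version}}')
  compose=$(docker compose version --short)
  compose=${compose#v}   # tolerate a leading "v"
  version_ge "$engine" 20.10.0 || { echo "Docker Engine $engine is too old"; return 1; }
  version_ge "$compose" 2.0.0 || { echo "Docker Compose $compose is too old"; return 1; }
  echo "Docker $engine and Compose $compose meet the requirements"
}
```

Call check_docker_versions after installation; it needs a running Docker daemon because it queries the server version.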
# Download and run Docker installation script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Add your user to the docker group (avoid using sudo for docker commands)
sudo usermod -aG docker $USER

# Apply group changes (or log out and back in)
newgrp docker

# Verify Docker installation
docker --version
# Expected: Docker version 20.10.0 or higher
docker compose version
# Expected: Docker Compose version 2.0.0 or higher

⚠ Heads up: After adding your user to the docker group, you may need to log out and log back in for the changes to take effect. The newgrp command applies the change immediately, but only in the current shell.

- Step 4
Clone CVAT Repository
Clone the official CVAT repository from GitHub. The repository includes Docker Compose configurations, deployment scripts, and all necessary files for running CVAT. Always use the latest stable release for production deployments.
# Clone the repository
git clone https://github.com/cvat-ai/cvat
cd cvat

# Optional: checkout a specific release (recommended for production)
git tag -l            # List available releases
git checkout v2.19.0  # Replace with latest stable version

# Verify repository contents
ls -la
# You should see:
# - docker-compose.yml (main deployment config)
# - docker-compose.dev.yml (development config)
# - docker-compose.https.yml (HTTPS config)
# - docker-compose.serverless.yml (AI functions config)

- Step 5
Basic Docker Compose Deployment
Deploy CVAT using Docker Compose, which will download all required images (PostgreSQL, Redis, Traefik, and CVAT services) and start the containers. The first deployment takes longer as Docker downloads multiple images totaling several gigabytes. Subsequent starts are much faster.
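Because the first start can take several minutes, a script that continues with later steps may want to wait until the server responds. The following is a small sketch of a generic retry loop; wait_for is a hypothetical helper, and the /api/server/about endpoint in the usage comment is the one this guide uses later to verify an update.

```shell
#!/usr/bin/env bash
# wait_for TRIES DELAY CMD...: run CMD until it succeeds or TRIES attempts
# have been made, sleeping DELAY seconds between attempts.
wait_for() {
  local tries=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $tries attempt(s)" >&2
  return 1
}

# Example (run after "docker compose up -d"): poll for up to 10 minutes.
# wait_for 60 10 curl -fsS http://localhost:8080/api/server/about
```

The loop only reports whether the command eventually succeeded; docker compose logs remains the place to look when it does not.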
# Start CVAT with default configuration
docker compose up -d

# This will start 17 interconnected services:
# - cvat_server (Django API backend)
# - cvat_ui (Nginx serving React frontend)
# - cvat_db (PostgreSQL 15)
# - cvat_redis_inmem (Redis cache)
# - cvat_redis_ondisk (Redis persistent storage)
# - cvat_opa (Open Policy Agent)
# - cvat_clickhouse (Analytics database)
# - cvat_vector (Log aggregation)
# - cvat_worker_* (Various background workers)
# - traefik (Reverse proxy)

# Monitor the startup process
docker compose logs -f

# Check that all services are running
docker compose ps
# All services should show "Up" status

⚠ Heads up: First-time deployment downloads 3-5GB of Docker images, so ensure a stable internet connection. The complete startup process may take 5-10 minutes on slower systems.

- Step 6
Create Superuser Account
After the containers are running, create an administrator account to access CVAT. This superuser has full permissions to manage users, organizations, projects, and system settings. Store these credentials securely as they provide complete system access.
# Create superuser interactively
docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'

# You'll be prompted for:
# - Username (e.g., admin)
# - Email address (optional)
# - Password (minimum 8 characters)
# - Password confirmation

# Example session:
# Username: admin
# Email address: admin@example.com
# Password: ********
# Password (again): ********
# Superuser created successfully.

# Verify the account was created
docker exec -it cvat_server bash -ic 'python3 ~/manage.py shell'
# >>> from django.contrib.auth.models import User
# >>> User.objects.all()
# >>> exit()

- Step 7
Access CVAT Web Interface
Once deployment is complete and the superuser is created, access CVAT through your web browser. By default, CVAT runs on port 8080. The interface provides a modern, intuitive UI for creating projects, managing tasks, annotating data, and reviewing results.
# Access CVAT in your browser:
# http://localhost:8080

# For remote server access:
# http://<server-ip>:8080

# Login with your superuser credentials
# Username: admin (or what you created)
# Password: <your-password>

# After login, you can:
# - Create organizations and invite team members
# - Set up projects and tasks
# - Upload images/videos for annotation
# - Configure annotation labels and attributes
# - Start annotating or assign tasks to team members

⚠ Heads up: Default deployment uses HTTP without encryption. For production use with sensitive data or remote access, configure HTTPS as shown in later steps.

- Step 8
Configure Environment Variables
Customize CVAT behavior through environment variables defined in a .env file. This includes database connections, cache settings, email configuration, and security settings. Environment variables allow deployment-specific configuration without modifying the source code.
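Before restarting the stack it can help to confirm that the .env file actually defines the values you depend on. The sketch below is one way to do that; env_missing_keys is a hypothetical helper, and treating the last occurrence of a key as the effective one is an assumption about how duplicate keys are resolved.

```shell
#!/usr/bin/env bash
# env_missing_keys FILE KEY...: print each KEY that has no non-empty value in FILE.
# Quotes are stripped, so KEY="" counts as empty. When a key appears more than
# once, only its last occurrence is considered (an assumption, see above).
env_missing_keys() {
  local file=$1 key value
  shift
  for key in "$@"; do
    value=$(sed -n "s/^${key}=//p" "$file" | tail -n1 | tr -d "\"'")
    [ -n "$value" ] || echo "$key"
  done
}

# Example: report which database settings are still unset.
# env_missing_keys .env CVAT_POSTGRES_HOST CVAT_POSTGRES_USER CVAT_POSTGRES_PASSWORD
```

An empty result means every listed key has a value; any printed key is either absent or blank.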
# Create environment configuration file
cat > .env << 'EOF'
# Django settings
DJANGO_MODWSGI_EXTRA_ARGS=""
ALLOWED_HOSTS='*'
CSRF_TRUSTED_ORIGINS='http://localhost:8080'

# PostgreSQL configuration
CVAT_POSTGRES_HOST=cvat_db
CVAT_POSTGRES_PORT=5432
CVAT_POSTGRES_DBNAME=cvat
CVAT_POSTGRES_USER=root
CVAT_POSTGRES_PASSWORD=cvat_postgresql_password

# Redis configuration
CVAT_REDIS_HOST=cvat_redis_inmem
CVAT_REDIS_PORT=6379
CVAT_REDIS_PASSWORD=""

# Organization settings
CVAT_ORGANIZATION=YourOrgName

# Email configuration (optional)
EMAIL_BACKEND=django.core.mail.backends.smtp.EmailBackend
EMAIL_HOST=smtp.gmail.com
EMAIL_PORT=587
EMAIL_USE_TLS=True
EMAIL_HOST_USER=your-email@gmail.com
EMAIL_HOST_PASSWORD=your-app-password
EOF

# Restart services to apply changes
docker compose down
docker compose up -d

⚠ Heads up: Never commit .env files containing passwords to version control. Use strong passwords for PostgreSQL in production. For Gmail, use app-specific passwords, not your main account password.

- Step 9
External PostgreSQL Database Configuration
For production deployments, you may want to use an external PostgreSQL database for better reliability, backups, and scaling. CVAT supports connecting to external PostgreSQL instances, but requires PostgreSQL 15 (the same major version as used in docker-compose.yml).
# Configure external PostgreSQL in .env file
# (if these keys are already defined in .env, edit them in place rather than appending duplicates)
cat >> .env << 'EOF'
# External PostgreSQL configuration
CVAT_POSTGRES_HOST=db.example.com
CVAT_POSTGRES_PORT=5432
CVAT_POSTGRES_DBNAME=cvat_production
CVAT_POSTGRES_USER=cvat_app
CVAT_POSTGRES_PASSWORD=strong_database_password
EOF

# Create the role and database on your PostgreSQL server
# (on PostgreSQL 15, GRANT ALL ON DATABASE alone does not let a role create
# tables in the public schema, so make the application role the database owner)
psql -h db.example.com -U postgres
# CREATE USER cvat_app WITH PASSWORD 'strong_database_password';
# CREATE DATABASE cvat_production OWNER cvat_app;
# \q

# Modify docker-compose.yml to exclude cvat_db service
# Comment out or remove the cvat_db service section

# Deploy with external database
docker compose up -d

# Run migrations on external database
docker exec -it cvat_server bash -ic 'python3 ~/manage.py migrate'

⚠ Heads up: Ensure your external PostgreSQL version matches CVAT requirements (PostgreSQL 15). Mismatched versions may cause migration failures or runtime errors. Always test database connectivity before deployment.

- Step 10
Enable HTTPS with Let's Encrypt
For production deployments accessible over the internet, enable HTTPS using Let's Encrypt for free SSL/TLS certificates. This encrypts traffic between clients and the server, essential for protecting user credentials and annotation data. The docker-compose.https.yml configuration includes automatic certificate provisioning via Traefik.
# Set required environment variables for HTTPS
export CVAT_HOST=cvat.yourdomain.com
export ACME_EMAIL=admin@yourdomain.com

# Add to .env file for persistence
cat >> .env << EOF
CVAT_HOST=cvat.yourdomain.com
ACME_EMAIL=admin@yourdomain.com
EOF

# Update CSRF_TRUSTED_ORIGINS for HTTPS
sed -i 's|http://localhost:8080|https://cvat.yourdomain.com|' .env

# Deploy with HTTPS configuration
docker compose -f docker-compose.yml -f docker-compose.https.yml up -d

# Traefik will automatically:
# - Request SSL certificate from Let's Encrypt
# - Handle HTTP to HTTPS redirects
# - Renew certificates before expiration

# Verify HTTPS is working
curl -I https://cvat.yourdomain.com
# Should return 200 OK with valid certificate

⚠ Heads up: Ensure DNS records for cvat.yourdomain.com point to your server before starting. Ports 80 and 443 must be reachable from the internet, both for Let's Encrypt validation and for the HTTP to HTTPS redirect. Certificate provisioning may take 1-2 minutes on first deployment.

- Step 11
Custom SSL Certificates
For enterprise deployments or internal networks, you may need to use custom SSL certificates instead of Let's Encrypt. This is common when using corporate CA certificates, wildcard certificates, or deploying behind corporate proxies. CVAT supports custom certificates through Traefik configuration.
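A common cause of TLS startup failures with custom certificates is a private key that does not belong to the certificate. The sketch below compares the public key embedded in each file; it assumes the openssl command-line tool is installed, and cert_matches_key is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cert_matches_key CERT KEY: succeeds when the certificate and the private key
# contain the same public key. Returns 2 if either file cannot be parsed.
cert_matches_key() {
  local cert=$1 key=$2 cert_pub key_pub
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 2
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 2
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Example, using the file names from this step:
# cert_matches_key certs/cert.crt certs/cert.key && echo "certificate and key match"
```

This checks only that the pair belongs together; it does not validate the certificate chain or the expiry date.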
# Create certificates directory
mkdir -p certs

# Copy your certificate files
cp /path/to/your/certificate.crt certs/cert.crt
cp /path/to/your/private.key certs/cert.key

# Ensure proper permissions
chmod 600 certs/cert.key
chmod 644 certs/cert.crt

# Create Traefik TLS configuration
cat > certs/tls.yml << 'EOF'
tls:
  certificates:
    - certFile: /certs/cert.crt
      keyFile: /certs/cert.key
EOF

# Update docker-compose.yml to mount certificates
# Add to traefik service volumes:
#   - ./certs:/certs:ro
#   - ./certs/tls.yml:/etc/traefik/tls.yml:ro
# Add to traefik command:
#   - "--providers.file.filename=/etc/traefik/tls.yml"

# Restart services
docker compose down
docker compose up -d

⚠ Heads up: Ensure certificate and key files match and are valid. Test certificate with: openssl x509 -in certs/cert.crt -text -noout. Private keys must never be committed to version control.

- Step 12
Setup AI-Assisted Annotation (Serverless Functions)
CVAT's AI-assisted annotation uses serverless functions powered by Nuclio to run deep learning models for automatic and semi-automatic annotation. These functions include pre-trained models like Mask R-CNN, Faster R-CNN, YOLO, Segment Anything, and more. Some of the bundled models are optimized with Intel OpenVINO for CPU inference, while others, such as SiamMask, run on PyTorch.
# Install Nuclio CLI tool (nuctl) version 1.13.0
wget https://github.com/nuclio/nuclio/releases/download/1.13.0/nuctl-1.13.0-linux-amd64
chmod +x nuctl-1.13.0-linux-amd64
sudo mv nuctl-1.13.0-linux-amd64 /usr/local/bin/nuctl

# Verify installation
nuctl version

# Deploy serverless services
docker compose -f docker-compose.yml -f docker-compose.serverless.yml up -d

# Wait for Nuclio dashboard to start (port 8070)
sleep 30

# Deploy pre-built AI models from serverless directory
cd serverless

# Option A: deploy example models for CPU (Mask R-CNN, YOLO, etc.)
./deploy_cpu.sh

# Option B: deploy for GPU-accelerated inference instead
# (requires NVIDIA GPU + nvidia-docker; run only one of the two scripts)
./deploy_gpu.sh

# List deployed functions
nuctl get functions

⚠ Heads up: Serverless functions require significant resources. Each model uses 1-4GB RAM, so deploy only the models you need. GPU deployment requires the nvidia-docker runtime and compatible CUDA drivers.

- Step 13
Configure Serverless Functions
Individual serverless functions can be customized, deployed, and managed through Nuclio. You can deploy functions for specific model architectures, adjust resource limits, or add custom models. The Nuclio dashboard at port 8070 provides a web interface for managing functions.
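# Access the Nuclio dashboard in your browser:
# http://localhost:8070

# Deploy specific function manually
cd serverless/pytorch/foolwood/siammask
nuctl deploy --project-name cvat \
  --path . \
  --platform local

# Inspect the function. Use the name shown by "nuctl get functions"; it comes
# from metadata.name in function.yaml and may not be plain "siammask".
nuctl get functions
nuctl get function <function-name>

# View function logs. On the local platform each function runs as a Docker
# container, so its logs are available through Docker.
docker ps --filter name=nuclio
docker logs <function-container-name>

# Check that CVAT can see the deployed functions
# (assumes the lambda API path of recent CVAT releases; curl prompts for the password)
curl -u admin http://localhost:8080/api/lambda/functions

# Update function configuration
# Edit function.yaml in the serverless function directory
# Then redeploy:
nuctl deploy --project-name cvat \
  --path . \
  --platform local \
  --force

# Remove a function
nuctl delete function <function-name>

- Step 14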
Using AI-Assisted Annotation in CVAT
Once serverless functions are deployed, they become available in the CVAT web interface for automatic and semi-automatic annotation. Users can apply AI models to automatically detect objects, segment instances, or track objects across video frames, significantly speeding up the annotation process.
Using AI Models in CVAT Web Interface:

1. Create a task with images or video
2. Open the task in annotation mode
3. Click "Actions" → "Run detector" or "Run tracker"
4. Select an AI model from the dropdown:
   - Object detectors (YOLO, Faster R-CNN, Mask R-CNN)
   - Instance segmentation (Mask R-CNN, Segment Anything)
   - Tracking (SiamMask, TransT)
   - Interactive segmentation (Inside-Outside Guidance)
5. Configure model parameters:
   - Confidence threshold
   - Labels to detect
   - Frame range (for video)
6. Click "Annotate" to run the model
7. AI predictions appear as annotations
8. Review and correct predictions manually
9. Save annotations

Best Practices:
- Start with lower confidence thresholds (0.3-0.5)
- Review all AI predictions before saving
- Use interactive tools to correct errors
- Combine multiple models for best results

- Step 15
Proxy and Network Configuration
If deploying CVAT behind a corporate proxy or in a restricted network environment, configure proxy settings for Docker builds and runtime operations. This ensures CVAT can download models and access external resources when needed.
# Set proxy settings for the Docker client
# This writes a new ~/.docker/config.json. If the file already exists, merge the
# "proxies" key into it by hand instead: appending a second JSON object to an
# existing file would make it invalid.
mkdir -p ~/.docker
[ -e ~/.docker/config.json ] && cp ~/.docker/config.json ~/.docker/config.json.bak
cat > ~/.docker/config.json << 'EOF'
{
  "proxies": {
    "default": {
      "httpProxy": "http://proxy.example.com:8080",
      "httpsProxy": "http://proxy.example.com:8080",
      "noProxy": "localhost,127.0.0.1,.example.com"
    }
  }
}
EOF

# Set proxy for CVAT runtime (add to .env)
cat >> .env << 'EOF'
HTTP_PROXY=http://proxy.example.com:8080
HTTPS_PROXY=http://proxy.example.com:8080
NO_PROXY=localhost,127.0.0.1,cvat_db,cvat_redis_inmem
EOF

# Restart services
docker compose down
docker compose up -d

# Test external connectivity from container
docker exec -it cvat_server curl -I https://www.google.com

- Step 16
Backup and Restore
Regular backups are essential for production deployments. CVAT stores data in PostgreSQL (metadata), Redis (cache/jobs), and the filesystem (uploaded media). A complete backup includes all three components plus configuration files.
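Backups taken on a schedule accumulate quickly, so a retention rule is usually paired with them. The following is a minimal sketch that keeps only the newest dated backup directories; it assumes backups live in subdirectories named YYYYMMDD (the naming used in this step), and prune_backups is a hypothetical helper.

```shell
#!/usr/bin/env bash
# prune_backups DIR KEEP: delete all but the KEEP newest subdirectories of DIR
# whose names are exactly eight digits (YYYYMMDD). Other entries are left alone.
prune_backups() {
  local dir=$1 keep=$2 name
  ls -1 "$dir" | grep -E '^[0-9]{8}$' | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r name; do
      rm -rf -- "${dir:?}/$name"
    done
}

# Example: keep the 14 most recent daily backups.
# prune_backups cvat-backups 14
```

Because only eight-digit names are matched, directories such as a pre-update backup with a different name are never removed by this function.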
# Create backup directory
mkdir -p cvat-backups/$(date +%Y%m%d)

# Backup PostgreSQL database
docker exec cvat_db pg_dump -U root cvat > \
  cvat-backups/$(date +%Y%m%d)/cvat_db.sql

# Backup uploaded media and data
docker run --rm \
  -v cvat_cvat_data:/data \
  -v $(pwd)/cvat-backups/$(date +%Y%m%d):/backup \
  alpine tar czf /backup/cvat_data.tar.gz /data

# Backup Redis data (if using persistent Redis)
docker exec cvat_redis_ondisk redis-cli SAVE
docker cp cvat_redis_ondisk:/data/dump.rdb \
  cvat-backups/$(date +%Y%m%d)/redis_dump.rdb

# Backup configuration files
cp .env cvat-backups/$(date +%Y%m%d)/
cp docker-compose.yml cvat-backups/$(date +%Y%m%d)/

# Restore from backup
# 1. Restore database:
cat cvat-backups/20240115/cvat_db.sql | \
  docker exec -i cvat_db psql -U root cvat

# 2. Restore media:
docker run --rm \
  -v cvat_cvat_data:/data \
  -v $(pwd)/cvat-backups/20240115:/backup \
  alpine tar xzf /backup/cvat_data.tar.gz -C /

# 3. Restore Redis (stop the container first so a shutdown save
#    cannot overwrite the restored dump):
docker stop cvat_redis_ondisk
docker cp cvat-backups/20240115/redis_dump.rdb \
  cvat_redis_ondisk:/data/dump.rdb
docker start cvat_redis_ondisk

⚠ Heads up: Test restore procedures regularly. Schedule automated backups using cron. Store backups on separate storage from the CVAT server. Encrypt backups containing sensitive data.

- Step 17
User Management and Organizations
CVAT supports multi-user environments with role-based access control. Organizations allow grouping users and projects with separate permissions. Administrators can create users, assign roles, and manage organization memberships through the web interface or Django management commands.
# Create additional users via Django shell
docker exec -it cvat_server bash -ic 'python3 ~/manage.py shell'

# In Django shell (create_user already saves the account; replace the example password):
from django.contrib.auth.models import User
user = User.objects.create_user('annotator1', 'annotator@example.com', 'password123')
exit()

# Create an additional administrator via management command
# (createsuperuser grants full admin rights, so do not use it for regular annotators)
docker exec -it cvat_server bash -ic \
  'python3 ~/manage.py createsuperuser --username admin2 --email admin2@example.com'

# List all users
docker exec -it cvat_server bash -ic \
  'python3 ~/manage.py shell -c "from django.contrib.auth.models import User; print([u.username for u in User.objects.all()])"'

# In the web interface:
# 1. Login as superuser
# 2. Go to Organization menu → Create Organization
# 3. Set organization name and description
# 4. Invite members by email or username
# 5. Assign roles: Owner, Maintainer, Supervisor, Worker
# 6. Create projects within the organization
# 7. Assign tasks to organization members

- Step 18
Performance Optimization
For production deployments with heavy usage, optimize CVAT performance through database tuning, worker scaling, caching configuration, and resource allocation. These optimizations significantly improve responsiveness and throughput.
# Scale worker processes in docker-compose.yml
# Edit docker-compose.yml and add the following. If a service sets
# container_name, remove that line first: Compose cannot start more than
# one replica of a service with a fixed container name.
services:
  cvat_worker_import:
    deploy:
      replicas: 3  # Scale import workers
  cvat_worker_export:
    deploy:
      replicas: 3  # Scale export workers
  cvat_worker_annotation:
    deploy:
      replicas: 2  # Scale annotation workers

# Increase PostgreSQL performance
# The official postgres image does not read tuning values from environment
# variables, so pass them as server options. Edit docker-compose.yml:
# cvat_db:
#   command: postgres -c shared_buffers=256MB -c effective_cache_size=1GB -c max_connections=200

# Increase Redis memory limit
# Edit docker-compose.yml:
# cvat_redis_inmem:
#   command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru

# Configure Nginx caching for static assets
# Add to nginx configuration in docker-compose.yml

# Monitor resource usage
docker stats

# Check worker queue status
docker exec -it cvat_server bash -ic 'python3 ~/manage.py rqstats'

# Restart services to apply changes
docker compose down
docker compose up -d

- Step 19
Monitoring and Logs
Proper monitoring and logging are essential for maintaining CVAT in production. CVAT includes built-in analytics with ClickHouse and centralized logging with Vector. Docker logs provide real-time troubleshooting capabilities.
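When many services log at once, a per-service error count shows where to look first. The sketch below reads Compose log output from standard input; it assumes each line starts with the service name followed by a "|" separator (the usual docker compose logs layout), and count_errors_by_service is a hypothetical helper.

```shell
#!/usr/bin/env bash
# count_errors_by_service: read log lines on stdin and print "COUNT SERVICE"
# for every service with lines mentioning "error" or "exception"
# (case-insensitive), highest count first.
count_errors_by_service() {
  grep -iE 'error|exception' |
    awk -F'|' '{
      gsub(/[ \t]+$/, "", $1)   # trim padding after the service name
      count[$1]++
    }
    END { for (s in count) print count[s], s }' |
    sort -rn
}

# Example: summarize the last 24 hours of logs.
# docker compose logs --no-color --since 24h | count_errors_by_service
```

The match is a plain text search, so informational lines that merely contain the word "error" are counted too; treat the numbers as a starting point rather than an exact tally.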
# View logs for all services
docker compose logs -f

# View logs for specific service
docker compose logs -f cvat_server
docker compose logs -f cvat_worker_import

# View last 100 lines
docker compose logs --tail=100 cvat_server

# Search logs for errors
docker compose logs cvat_server | grep -i error
docker compose logs cvat_server | grep -i exception

# Access ClickHouse analytics (if enabled)
docker exec -it cvat_clickhouse clickhouse-client
# SELECT * FROM events LIMIT 10;
# exit

# Check service health
docker compose ps
curl http://localhost:8080/api/server/health

# Monitor resource usage
docker stats --no-stream

# Export logs to file for analysis
docker compose logs --since 24h > cvat_logs_$(date +%Y%m%d).log

- Step 20
Updating CVAT
Keep CVAT up-to-date to receive bug fixes, security patches, and new features. The update process involves pulling new code, downloading updated Docker images, running database migrations, and restarting services. Always backup before updating.
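Choosing the release to update to can be scripted from the repository's tags. The sketch below picks the highest tag of the form vMAJOR.MINOR.PATCH and ignores pre-release suffixes; it assumes GNU sort with -V, and latest_stable_tag is a hypothetical helper.

```shell
#!/usr/bin/env bash
# latest_stable_tag: read tag names on stdin and print the highest one that
# looks like vX.Y.Z (tags with suffixes such as -rc1 are skipped).
latest_stable_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n1
}

# Example, run inside the cvat repository after "git fetch --all --tags":
# target=$(git tag -l | latest_stable_tag)
# echo "Latest stable release: $target"
```

Read the release notes for the selected tag before checking it out, since the tag pattern alone says nothing about required manual migration steps.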
# Backup before updating (see Backup and Restore step)
mkdir -p cvat-backups/pre-update-$(date +%Y%m%d)

# Stop CVAT services
docker compose down

# Pull latest code
git fetch --all
git checkout v2.20.0  # or latest version

# Pull updated Docker images
docker compose pull

# Start services
docker compose up -d

# Run database migrations
docker exec -it cvat_server bash -ic 'python3 ~/manage.py migrate'

# Clear cache (this also discards any queued background jobs held in Redis)
docker exec -it cvat_redis_inmem redis-cli FLUSHALL

# Verify update
curl http://localhost:8080/api/server/about

# Check logs for errors
docker compose logs -f --tail=50

# If issues occur, rollback:
# docker compose down
# git checkout v2.19.0  # previous version
# docker compose pull
# docker compose up -d
# # Restore database backup if needed

⚠ Heads up: Always read release notes before updating. Some updates require manual migration steps. Test updates in a staging environment before applying them to production. Database migrations cannot be easily reversed.

- Step 21
Common Usage Patterns
Practical workflows for common CVAT use cases, from creating simple annotation tasks to managing complex multi-user projects with AI-assisted annotation and quality control pipelines.
Common CVAT Workflows:

1. Simple Image Annotation Project:
   - Create project → Define labels (car, person, etc.)
   - Create task → Upload images
   - Assign annotators → Set job size (10 images/job)
   - Annotate → Draw bounding boxes
   - Review → Quality check by supervisor
   - Export → COCO, YOLO, Pascal VOC formats

2. Video Object Tracking:
   - Create task with video file
   - Define labels with attributes (color, size)
   - Use Track mode for temporal consistency
   - Apply AI tracking models (SiamMask, TransT)
   - Interpolate between keyframes
   - Export with frame timestamps

3. Team Collaboration Project:
   - Create organization → Invite team members
   - Assign roles (Maintainer, Supervisor, Worker)
   - Create project with quality settings
   - Split into tasks by annotator
   - Enable peer review workflow
   - Track progress in analytics dashboard
   - Export consensus annotations

4. AI-Assisted Workflow:
   - Upload images → Run detector (YOLO/Mask R-CNN)
   - Review AI predictions → Adjust thresholds
   - Manually correct errors
   - Export annotated data
   - Train improved model
   - Deploy updated model to CVAT
   - Iterate for continuous improvement

- Step 22
Troubleshooting Common Issues
Solutions to frequently encountered problems during CVAT deployment and operation. Most issues relate to Docker configuration, resource constraints, network connectivity, or database migrations.
# Issue: Services won't start
# Solution: Check Docker daemon and logs
sudo systemctl status docker
docker compose logs --tail=100

# Issue: Port 8080 already in use
# Solution: Change port in docker-compose.yml or stop conflicting service
sudo lsof -i :8080
# Edit docker-compose.yml traefik ports: "8080:8080" → "8081:8080"

# Issue: Cannot create superuser
# Solution: Ensure database is ready
docker compose ps  # Check cvat_db is "Up"
docker compose logs cvat_db | grep "ready to accept connections"

# Issue: Login fails with "Invalid credentials"
# Solution: Reset password via Django
docker exec -it cvat_server bash -ic 'python3 ~/manage.py changepassword admin'

# Issue: Slow performance with large videos
# Solution: Increase worker replicas and Redis memory
# See Performance Optimization step

# Issue: AI models not appearing
# Solution: Check serverless deployment (pass both compose files, as when deploying)
docker compose -f docker-compose.yml -f docker-compose.serverless.yml ps
nuctl get functions
# Redeploy if needed:
cd serverless && ./deploy_cpu.sh

# Issue: Database migration errors
# Solution: Check PostgreSQL version compatibility
docker exec cvat_db psql -U root -c 'SELECT version();'
# Must be PostgreSQL 15.x

# Issue: Out of disk space
# Solution: Clean Docker cache and old images
docker system df
docker system prune -a --volumes  # WARNING: Removes unused data

# Issue: Cannot access from remote host
# Solution: Update ALLOWED_HOSTS and firewall
# Add to .env: ALLOWED_HOSTS='*'
sudo ufw allow 8080/tcp

⚠ Heads up: Before running 'docker system prune', ensure you have backups. This command removes all unused containers, networks, and images, and with --volumes also unused volumes. If CVAT is stopped at the time, its data volumes count as unused.