
HivisionIDPhotos: AI-Powered ID Photo Generator

A lightweight, offline-capable AI tool for automatic ID photo generation with portrait matting, background replacement, and layout photo creation.

Step 1
Project Overview
HivisionIDPhotos is an intelligent algorithm system for producing ID photos. It uses AI models for face detection, portrait matting, and automatic photo adjustments to generate standard ID photos from user images.

Key capabilities:
- Lightweight matting (purely offline, CPU-only inference)
- Generates standard ID photos and layout photos in various sizes
- Supports pure offline or edge-cloud inference
- Multiple matting model options for different use cases
- FastAPI and Gradio interfaces for API and web usage
Step 2
Environment Requirements
Ensure your system meets the following requirements before installation:
- Python: >= 3.7 (primarily tested on Python 3.10)
- Operating System: Linux, Windows, or macOS
- Memory: At least 4GB RAM (16GB+ recommended for beast mode, which keeps models in memory)
- Disk Space: ~500MB for the project and model weights
```
python --version
# Should be 3.7 or higher
```
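The same check can be scripted, which is handy in setup scripts. This is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of the project.

```python
import sys

MINIMUM = (3, 7)  # lower bound stated in the requirements above


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True when the (major, minor, ...) tuple satisfies the minimum."""
    return tuple(version_info[:2]) >= minimum


status = "OK" if meets_minimum(sys.version_info) else "too old"
print("Python", sys.version.split()[0], status)
```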

Step 3

Clone the Repository

Clone the HivisionIDPhotos repository from GitHub:

```
git clone https://github.com/Zeyi-Lin/HivisionIDPhotos.git
cd HivisionIDPhotos
```

Step 4

Set Up Python Virtual Environment

Create and activate a Python virtual environment. Using conda or venv is recommended to isolate dependencies.

```
# Using conda (recommended)
conda create -n hivision python=3.10
conda activate hivision

# Or using venv
python -m venv hivision_env
source hivision_env/bin/activate  # On Linux/macOS
# hivision_env\Scripts\activate   # On Windows (run without "source")
```
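Before installing anything, it can help to confirm that the interpreter really is running inside an isolated environment. The sketch below relies on two standard signals: `sys.prefix` differing from `sys.base_prefix` (venv) and the `CONDA_DEFAULT_ENV` variable that conda sets on activation. The function name is illustrative.

```python
import os
import sys


def active_environment(prefix=None, base_prefix=None, environ=None):
    """Describe the isolated environment in use, or return None if there is none."""
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    environ = os.environ if environ is None else environ
    if prefix != base_prefix:
        return "venv"
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env and conda_env != "base":
        return "conda:" + conda_env
    return None


print("Active environment:", active_environment() or "none detected")
```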

Step 5
Install Dependencies
Install the core dependencies and application dependencies. The project is split into base requirements (core functionality) and app requirements (Gradio/FastAPI for web interface).
```
pip install -r requirements.txt
pip install -r requirements-app.txt
```
Step 6
Download Model Weights
The project requires pre-trained model weights for matting and face detection. You can either download them using a script or manually.
```
# Method 1: Use the download script
python scripts/download_model.py --models all

# Method 2: Download manually and place in hivision/creator/weights/
```
Step 7
Manual Model Download (Optional)
If the download script does not work, download these models manually and save them to hivision/creator/weights/:

| Model File | Size | Source | Description |
|---|---|---|---|
| modnet_photographic_portrait_matting.onnx | 24.7MB | MODNet | Official matting weights |
| hivision_modnet.onnx | 24.7MB | Release | Improved matting for color backgrounds |
| rmbg-1.4.onnx | 176.2MB | BRIA AI | High-quality matting |
| birefnet-v1-lite.onnx | 224MB | BiRefNet | Best-quality matting, GPU required |
⚠ Heads up: At least one matting model is required to run the project. HIVISION_MODNET is the default and recommended for CPU-only inference.
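To confirm that at least one matting model is in place, a short script can list which of the files named above exist in the weights directory. This is a sketch, not project code; run it from the repository root so the relative path resolves.

```python
from pathlib import Path

# File names copied from the table above.
MATTING_WEIGHTS = (
    "modnet_photographic_portrait_matting.onnx",
    "hivision_modnet.onnx",
    "rmbg-1.4.onnx",
    "birefnet-v1-lite.onnx",
)


def installed_matting_weights(weights_dir="hivision/creator/weights"):
    """Return the known matting weight files that exist in weights_dir."""
    root = Path(weights_dir)
    return [name for name in MATTING_WEIGHTS if (root / name).is_file()]


found = installed_matting_weights()
print("Matting weights found:", found if found else "none (download at least one)")
```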
Step 8
Face Detection Model Setup (Optional)
By default, the project uses MTCNN for face detection (offline, fast, works on CPU). You can also use RetinaFace (higher accuracy, but slower on CPU) or Face++ (online API, highest accuracy).

RetinaFace Setup (Offline, High Accuracy, Moderate CPU Speed):
```
# Download RetinaFace weights and place in hivision/creator/retinaface/weights/
curl -L https://github.com/Zeyi-Lin/HivisionIDPhotos/releases/download/pretrained-model/retinaface-resnet50.onnx \
  -o hivision/creator/retinaface/weights/retinaface-resnet50.onnx
```
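Note that `curl -o` does not create missing folders, so the command fails if the weights directory does not exist yet. A Python alternative that creates the directory first and skips the download when the file is already present might look like this. The helper name and the injectable `fetch` parameter are illustrative choices; the URL and path are the ones used above.

```python
import urllib.request
from pathlib import Path

RETINAFACE_URL = (
    "https://github.com/Zeyi-Lin/HivisionIDPhotos/releases/download/"
    "pretrained-model/retinaface-resnet50.onnx"
)
RETINAFACE_PATH = "hivision/creator/retinaface/weights/retinaface-resnet50.onnx"


def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest exists; return True if a download ran."""
    dest = Path(dest)
    if dest.is_file():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)  # make the weights folder
    fetch(url, str(dest))
    return True


# Example call (performs a real download, so it is left commented out):
# download_if_missing(RETINAFACE_URL, RETINAFACE_PATH)
```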
Step 9
GPU Acceleration Setup (Optional)
For NVIDIA GPU acceleration with the birefnet-v1-lite model (~16GB VRAM recommended), install CUDA-enabled libraries:

For CUDA 12.x and cuDNN 8:
```
pip install onnxruntime-gpu==1.18.0
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
⚠ Heads up: CUDA installations are backward compatible. If you have CUDA 12.6 but the available torch build is for 12.4, you can still install the 12.4 version.
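After installing `onnxruntime-gpu`, you can ask ONNX Runtime which execution providers it actually sees. The sketch below uses the real `onnxruntime.get_available_providers()` call; the `pick_providers` helper is hypothetical, and how the project itself selects providers is not shown in this guide.

```python
def pick_providers(available):
    """Prefer CUDA when ONNX Runtime reports it, keeping CPU as the fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [name for name in preferred if name in available]


try:
    import onnxruntime as ort
except ImportError:
    print("onnxruntime is not installed in this environment")
else:
    available = ort.get_available_providers()
    print("Available providers:", available)
    print("Would request:", pick_providers(available))
```

If `CUDAExecutionProvider` is missing from the list, the GPU build or its CUDA/cuDNN libraries were probably not picked up.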
Step 10
Run Gradio Demo
Launch the interactive web interface for generating ID photos. This is the simplest way to use the tool.
```
python app.py
```
Step 11
Using the Gradio Interface
After running app.py, open http://127.0.0.1:7860 in your browser. The interface allows you to:
- Upload a photo
- Choose output size (standard sizes for various countries)
- Select matting model
- Choose background color
- Apply beauty effects
- Generate layout photos (6-inch, A4, etc.)
- Enable face alignment and rotation correction
⚠ Heads up: If you see an error about missing models, ensure you downloaded at least one matting model and placed it in the `hivision/creator/weights/` directory.
Step 12
Python Inference CLI
Use the command-line interface for batch processing or scripting:

1. Create ID Photo:
```
python inference.py \
  -i demo/images/test0.jpg \
  -o ./output.png \
  --height 413 \
  --width 295
```
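For batch processing, the command above can be wrapped in a small loop. This is a sketch under stated assumptions: it uses only the `-i`, `-o`, `--height`, and `--width` flags shown above, expects to run from the repository root, and the function names are illustrative.

```python
import subprocess
import sys
from pathlib import Path


def build_idphoto_command(src, dst, height=413, width=295):
    """Build the argv list for one inference.py run, mirroring the command above."""
    return [
        sys.executable, "inference.py",
        "-i", str(src),
        "-o", str(dst),
        "--height", str(height),
        "--width", str(width),
    ]


def process_folder(input_dir, output_dir, run=subprocess.run):
    """Run inference.py once per .jpg in input_dir; return the output paths."""
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(Path(input_dir).glob("*.jpg")):
        dst = out_root / (src.stem + ".png")
        run(build_idphoto_command(src, dst), check=True)  # stop on first failure
        outputs.append(dst)
    return outputs
```

Calling `process_folder("demo/images", "./out")` would then write one PNG per input image. Each run reloads the models, so this favors simplicity over speed.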

Step 13

CLI Inference Options

2. Portrait Matting Only (extract person from background):

```
python inference.py -t human_matting \
  -i demo/images/test0.jpg \
  -o ./matting.png \
  --matting_model hivision_modnet
```

Step 14

Add Background to Transparent Image

3. Add Background Color to Transparent PNG:

```
python inference.py -t add_background \
  -i ./output.png \
  -o ./output_colored.jpg \
  -c 4f83ce \
  -k 30 \
  -r 1
```
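The `-c` value is a six-digit hex color. If you need to pick your own, the following sketch shows how such a string maps to red, green, and blue components. The helper is illustrative and not part of the project.

```python
def hex_to_rgb(value):
    """Convert a 6-digit hex color such as '4f83ce' to an (r, g, b) tuple."""
    digits = value.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected 6 hex digits, got %r" % value)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("4f83ce"))  # (79, 131, 206)
```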

Step 15

Generate Layout Photo

4. Create Six-Inch Layout Photo (for ID card printing):

```
python inference.py -t generate_layout_photos \
  -i ./output_colored.jpg \
  -o ./layout.jpg \
  --height 413 \
  --width 295 \
  -k 200
```
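To get a feel for how many photos a layout sheet can hold, the arithmetic can be sketched as a simple grid count. This is not the project's layout algorithm. The sheet size (a 6x4 inch print at 300 DPI, so 1800x1200 pixels) and the 20-pixel gap are assumptions chosen for illustration.

```python
def grid_capacity(sheet_w, sheet_h, photo_w, photo_h, gap=0):
    """Count photos that fit in a regular grid with gap pixels between them."""
    # n photos need n * size + (n - 1) * gap pixels along one axis.
    cols = (sheet_w + gap) // (photo_w + gap)
    rows = (sheet_h + gap) // (photo_h + gap)
    return cols * rows


def best_layout(sheet_w, sheet_h, photo_w, photo_h, gap=0):
    """Compare upright and rotated placement; return (count, rotated)."""
    upright = grid_capacity(sheet_w, sheet_h, photo_w, photo_h, gap)
    rotated = grid_capacity(sheet_w, sheet_h, photo_h, photo_w, gap)
    return (rotated, True) if rotated > upright else (upright, False)


# Assumed sheet: 1800x1200 px; photo: 295 wide by 413 high, as in the command.
print(best_layout(1800, 1200, photo_w=295, photo_h=413, gap=20))
```

Under these assumptions, rotating the photos by 90 degrees fits more copies on the sheet than placing them upright.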

Step 16
Deploy FastAPI Backend
Run the project as an API service for programmatic access:
```
python deploy_api.py
```
Step 17
API Service Features
The FastAPI backend provides RESTful endpoints for:
- ID photo generation
- Portrait matting
- Background color addition
- Layout photo creation
- Batch processing support
CURL Request Example:
```
curl -X POST "http://localhost:8080/api/v1/idphoto" \
  -F "image=@path/to/photo.jpg" \
  -F "height=413" \
  -F "width=295" \
  -F "matting_model=hivision_modnet" \
  -F "background_color=4f83ce"
```
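The same request can be sent from Python. The sketch below mirrors the curl example field for field using the third-party `requests` library. The route and form field names are taken from that example rather than verified against the service, so check them against the running API (FastAPI serves interactive documentation at `/docs` by default). The function names are illustrative.

```python
def build_form_fields(height=413, width=295,
                      matting_model="hivision_modnet",
                      background_color="4f83ce"):
    """Return the non-file form fields from the curl example, as strings."""
    return {
        "height": str(height),
        "width": str(width),
        "matting_model": matting_model,
        "background_color": background_color,
    }


def post_idphoto(image_path, url="http://localhost:8080/api/v1/idphoto", **fields):
    """POST one image as multipart/form-data and return the response object."""
    import requests  # third-party: pip install requests

    with open(image_path, "rb") as handle:
        return requests.post(
            url,
            files={"image": handle},  # field name copied from the curl example
            data=build_form_fields(**fields),
            timeout=60,
        )
```

A call such as `post_idphoto("path/to/photo.jpg", height=413, width=295)` returns the raw response; inspect `response.status_code` before using the body.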

Step 18

Docker Deployment

Deploy the application using Docker for consistent environments across systems.

```
# Pull the official image
docker pull linzeyi/hivision_idphotos

# Or build from local Dockerfile (after placing model weights)
docker build -t linzeyi/hivision_idphotos .
```

Step 19
Run Docker Containers
Run both services (the Gradio demo and the API backend) simultaneously with Docker Compose:
```
docker compose up -d
```
Step 20
Environment Variables
The project supports several configuration options via environment variables:

| Variable | Type | Description |
|---|---|---|
| FACE_PLUS_API_KEY | Optional | Face++ API key for online face detection |
| FACE_PLUS_API_SECRET | Optional | Face++ API secret |
| RUN_MODE | Optional | Set to beast for faster inference (models stay in memory) |
```
docker run -d -p 7860:7860 \
  -e FACE_PLUS_API_KEY=your_key \
  -e FACE_PLUS_API_SECRET=your_secret \
  -e RUN_MODE=beast \
  linzeyi/hivision_idphotos
```
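Inside the container, or in any local shell, these variables are ordinary environment variables. The following sketch shows one way to read them in Python. It illustrates the pattern only and is not the project's own configuration code; treating Face++ as enabled only when both values are set is an assumption.

```python
import os


def read_settings(environ=None):
    """Collect the three documented variables from the environment mapping."""
    environ = os.environ if environ is None else environ
    key = environ.get("FACE_PLUS_API_KEY")
    secret = environ.get("FACE_PLUS_API_SECRET")
    return {
        # Assumption: the online detector needs both the key and the secret.
        "face_plus_enabled": bool(key and secret),
        "beast_mode": environ.get("RUN_MODE") == "beast",
    }


print(read_settings())
```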
Step 21
Performance Reference
Benchmark results on a Mac M1 Max with 64GB RAM (CPU only, no GPU acceleration):

| Model Combination | Memory Usage | Inference Time (512x715) | Inference Time (764x1146) |
|---|---|---|---|
| MODNet + MTCNN | 410MB | 0.207s | 0.246s |
| MODNet + RetinaFace | 405MB | 0.571s | 0.971s |
| BiRefNet-lite + RetinaFace | 6.20GB | 7.063s | 7.128s |
⚠ Heads up: BiRefNet-lite requires significant memory (~6GB) and works best with GPU acceleration.
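These figures will differ on other hardware. To take comparable measurements yourself, a small timing harness such as the one below can wrap any callable. It is a generic sketch; the workload shown is a stand-in, not a call into the project.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Call func repeats times and return the median wall-clock seconds per call."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload; replace with a call into your own inference code.
median_s = time_call(lambda: sum(range(100000)))
print("median seconds per call: %.6f" % median_s)
```

Using the median rather than the mean keeps a slow first call (for example, model loading) from dominating the result.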
Step 22
Technology Stack
Key technologies and frameworks used:

| Category | Technology | Purpose |
|---|---|---|
| Language | Python 3.7+ | Core implementation |
| Deep Learning | ONNX Runtime | Model inference |
| Image Processing | OpenCV | Image manipulation |
| Matting Models | MODNet, RMBG-1.4, BiRefNet | Portrait extraction |
| Face Detection | MTCNN, RetinaFace, Face++ | Face localization |
| Web Framework | FastAPI | REST API backend |
| UI Framework | Gradio | Web interface |
| Containerization | Docker | Deployment |
| Machine Learning | NumPy, PIL | Numerical operations |
Step 23
Model Architecture Details
The matting models use deep neural networks:

MODNet (Hivision Variant):
- Lightweight architecture optimized for real-time performance
- Runs efficiently on CPU
- Good balance of speed and quality
RMBG-1.4 (BRIA AI):
- Vision Transformer (ViT)-based architecture
- Higher quality matting
- Slower inference (176.2MB model)
BiRefNet V1-lite:
- Bidirectional refinement network
- State-of-the-art matting quality
- Requires GPU for practical inference speed
```
# View available models in the codebase
from hivision.creator.choose_handler import HUMAN_MATTING_MODELS
print(HUMAN_MATTING_MODELS)
```
Step 24
Troubleshooting
Issue 3: CUDA/GPU not working
Solution: Verify that cuDNN is installed, or fall back to CPU-only mode by installing onnxruntime instead of onnxruntime-gpu.

The Gradio demo can also be started on an explicit host and port:
```
python app.py --port 8080 --host 0.0.0.0
```
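Before choosing a port, you can check whether something is already listening on it. The sketch below probes the local ports mentioned in this guide (7860 for Gradio, 8080 for the API example); the helper name is illustrative.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when a listener accepts the connection.
        return probe.connect_ex((host, port)) != 0


for candidate in (7860, 8080):
    print(candidate, "free" if port_is_free(candidate) else "in use")
```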
Step 25
Advanced Customization
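2. Modify preset colors: Edit demo/assets/color_list_EN.csv (columns: name, hex).

3. Add custom watermark fonts: Place font files in hivision/plugin/font/ and update hivision/plugin/watermark.py.

Example size entries (the values follow the name, height, width pattern of the CLI's --height 413 --width 295):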
```
Standard,413,295
One inch,567,413
Two inches,626,413
```
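Rows in this shape are easy to validate before you rely on them. The sketch below parses them with the standard `csv` module, assuming the column order is name, height, width (inferred from the CLI's `--height 413 --width 295`, not confirmed by this guide). The function name is illustrative.

```python
import csv
import io

SAMPLE = """Standard,413,295
One inch,567,413
Two inches,626,413
"""


def parse_sizes(text):
    """Parse name,height,width rows into a dict of name -> (height, width)."""
    sizes = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        name, height, width = (cell.strip() for cell in row)
        sizes[name] = (int(height), int(width))
    return sizes


print(parse_sizes(SAMPLE))
```

A row with a non-numeric height or width raises `ValueError`, which surfaces typos early.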
Step 26
Community Projects and Extensions
Several community-built extensions exist:
- HivisionIDPhotos-ComfyUI: ComfyUI workflow for ID photo processing
- HivisionIDPhotos-cpp: C++ version for better performance
- HivisionIDPhotos-windows-GUI: Windows desktop app
- HivisionIDPhotos-wechat-weapp: WeChat mini program
```
# Explore community projects at:
# https://github.com/Zeyi-Lin/HivisionIDPhotos
```
