Hugging Face Transformers: State-of-the-Art Machine Learning
Get started with Hugging Face Transformers, the leading library for state-of-the-art machine learning models in PyTorch, TensorFlow, and JAX. Learn installation, configuration, and practical use cases for NLP, vision, audio, and multimodal tasks.
- Step 1
Check Python version
Ensure you have Python 3.10 or higher installed. Transformers has been tested on Python 3.10+ and works best with recent versions.
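The same check can be scripted, for example at the top of a setup script. This is a minimal sketch using only the standard library; the name `meets_minimum` is illustrative and not part of Transformers, and the 3.10 floor is simply the one stated in this guide.

```python
import sys

MIN_VERSION = (3, 10)  # minimum stated in this guide, not read from the library


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if meets_minimum() else "too old for this guide"
print(f"Python {current}: {status}")
```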
    python --version
- Step 2
Set up a virtual environment (recommended)
Creating a virtual environment helps manage dependencies and avoid conflicts between packages. This is especially important for ML projects with many dependencies.
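The environment can also be created from Python itself with the standard-library `venv` module, which is what `python -m venv` runs. A minimal sketch; the helper name `create_env` is hypothetical, and activation still has to happen in your shell.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at `env_dir` and return the path of its config file."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"
```

Calling `create_env("transformers-env")` would produce the same directory layout as the command-line form.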
    # Create virtual environment
    python -m venv transformers-env

    # Activate on Linux/Mac
    source transformers-env/bin/activate

    # Activate on Windows
    transformers-env\Scripts\activate
- Step 3
Install Transformers with PyTorch
Install the core transformers library along with PyTorch as the backend. PyTorch is the recommended choice for research and experimentation with Transformers.
    pip install transformers torch
- Step 4
Install Transformers with TensorFlow (alternative)
If you prefer TensorFlow or need it for production deployment, install Transformers with TensorFlow backend instead. Note that PyTorch has more seamless integration with Transformers.
    pip install transformers tensorflow
- Step 5
Install additional dependencies
For production use and advanced features, install additional recommended packages including accelerate for distributed training, datasets for loading ML datasets, tokenizers for fast tokenization, and sentencepiece for certain model tokenizers.
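To see which of these packages are already present before installing, the standard library can report installed versions. A minimal sketch: the `PACKAGES` list mirrors the names mentioned in this guide, and `installed_versions` is an illustrative helper, not a Transformers function.

```python
from importlib import metadata

# Distribution names taken from this guide's install commands.
PACKAGES = ["transformers", "torch", "accelerate", "datasets", "tokenizers", "sentencepiece"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:15} {version or 'not installed'}")
```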
    pip install accelerate datasets tokenizers sentencepiece
- Step 6
Verify installation
Test that Transformers is installed correctly by running a simple sentiment analysis pipeline. This will download a pre-trained model and run inference.
    python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('Hugging Face is amazing!'))"
- Step 7
Understand the Pipeline API
The Pipeline API is the simplest way to use Transformers. It provides a high-level abstraction for common ML tasks including text generation, image segmentation, automatic speech recognition, and more. Here are several practical examples.
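Text-classification pipelines such as sentiment analysis return a list of dictionaries, each with a `label` and a `score`. The sketch below shows one way to post-process that structure. The `sample` list is a hand-written stand-in with invented scores, and `confident_labels` is a hypothetical helper, so this runs without downloading a model.

```python
def confident_labels(results, threshold=0.9):
    """Return the labels of predictions whose score is at least `threshold`."""
    return [r["label"] for r in results if r["score"] >= threshold]


# Hand-written stand-in for pipeline output; the scores are invented.
sample = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.61},
]
print(confident_labels(sample))
```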
    from transformers import pipeline

    # Sentiment analysis
    sentiment = pipeline('sentiment-analysis')
    result = sentiment('I love using Transformers!')
    print(result)

    # Text generation
    generator = pipeline('text-generation', model='gpt2')
    text = generator('Once upon a time', max_length=50)
    print(text)

    # Question answering
    qa = pipeline('question-answering')
    context = 'Transformers is a library by Hugging Face.'
    question = 'Who created Transformers?'
    answer = qa(question=question, context=context)
    print(answer)
- Step 8
Use the AutoModel API for custom workflows
For more control beyond pipelines, use the AutoModel API. This allows you to load any model from the Hugging Face Hub and use it with custom preprocessing and postprocessing.
    from transformers import AutoTokenizer, AutoModel
    import torch

    # Load model and tokenizer
    model_name = 'bert-base-uncased'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    # Tokenize input
    text = 'Hello, how are you?'
    inputs = tokenizer(text, return_tensors='pt')

    # Run inference
    with torch.no_grad():
        outputs = model(**inputs)
        last_hidden_states = outputs.last_hidden_state

    print(f'Output shape: {last_hidden_states.shape}')
- Step 9
Fine-tune a model with the Trainer API
The Trainer API provides a complete training and evaluation loop, abstracting away boilerplate code. It supports features like mixed precision, distributed training, and gradient accumulation.
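Before configuring a run, it helps to estimate how many optimizer updates it will perform, since settings such as `warmup_steps` are expressed in updates. The sketch below is a rough estimate under stated assumptions (single device by default, last partial batch kept); `optimizer_steps` is a hypothetical helper, and the Trainer's own count can differ slightly when gradient accumulation is used.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, num_epochs,
                    num_devices=1, grad_accum_steps=1):
    """Estimate (effective batch size, total optimizer updates) for a training run."""
    effective_batch = per_device_batch_size * num_devices * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * num_epochs


# Values from this step's example: 1000 training examples, batch size 8, 3 epochs.
effective_batch, total_steps = optimizer_steps(1000, 8, 3)
print(effective_batch, total_steps)
```

With these values the run performs roughly 375 updates, which is fewer than the `warmup_steps=500` used in this step's configuration, so the learning rate would still be ramping up when training ends. On a subset this small, a lower warmup value is likely a better fit.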
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )
    from datasets import load_dataset

    # Load dataset, tokenizer, and model
    dataset = load_dataset('imdb')
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

    # Tokenize dataset
    def tokenize_function(examples):
        return tokenizer(examples['text'], padding='max_length', truncation=True)

    tokenized_datasets = dataset.map(tokenize_function, batched=True)

    # Configure training
    training_args = TrainingArguments(
        output_dir='./results',
        num_train_epochs=3,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        warmup_steps=500,
        weight_decay=0.01,
        logging_dir='./logs',
    )

    # Create trainer and train on a small subset to keep the run short
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets['train'].select(range(1000)),
        eval_dataset=tokenized_datasets['test'].select(range(100))
    )
    trainer.train()
- Step 10
Work with vision models
Transformers supports vision tasks using Vision Transformer (ViT) and other vision models. You can perform image classification, object detection, and image segmentation.
    from transformers import pipeline
    from PIL import Image
    import requests

    # Load image classifier
    classifier = pipeline('image-classification', model='google/vit-base-patch16-224')

    # Download and classify an image
    url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg'
    image = Image.open(requests.get(url, stream=True).raw)
    results = classifier(image)

    for result in results:
        print(f"{result['label']}: {result['score']:.4f}")
- Step 11
Use translation models
Transformers includes powerful translation models for translating text between languages. The library supports hundreds of language pairs.
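The Helsinki-NLP checkpoints used in this step follow a `Helsinki-NLP/opus-mt-{source}-{target}` naming pattern, and the pipeline task name follows `translation_{source}_to_{target}`. A minimal sketch that builds both strings; `opus_mt_names` is a hypothetical helper, and it does not check that a checkpoint for a given pair actually exists on the Hub.

```python
def opus_mt_names(source_lang, target_lang):
    """Build the (pipeline task, model ID) pair for an OPUS-MT language direction."""
    source, target = source_lang.lower(), target_lang.lower()
    task = f"translation_{source}_to_{target}"
    model_id = f"Helsinki-NLP/opus-mt-{source}-{target}"
    return task, model_id


print(opus_mt_names("en", "de"))
```

The returned strings can be passed as the task and `model` arguments of `pipeline`, provided the model ID is first confirmed on the Hub.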
    from transformers import pipeline

    # English to French translation
    translator = pipeline('translation_en_to_fr', model='Helsinki-NLP/opus-mt-en-fr')
    text = 'Hello, how are you doing today?'
    translation = translator(text, max_length=100)
    print(translation[0]['translation_text'])

    # For other language pairs, use models like:
    # Helsinki-NLP/opus-mt-en-de (English to German)
    # Helsinki-NLP/opus-mt-en-es (English to Spanish)
    # Helsinki-NLP/opus-mt-fr-en (French to English)
- Step 12
Configure model cache directory
By default, Transformers caches downloaded models under ~/.cache/huggingface/hub in your home directory. To change the location, set the HF_HOME environment variable (the root directory for all Hugging Face data) or HF_HUB_CACHE (the Hub download cache only).
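The precedence between the two variables can be sketched as a small lookup: HF_HUB_CACHE wins if set, otherwise the cache is the `hub` folder under HF_HOME, otherwise the default under the home directory. This is a simplified approximation for illustration (it ignores XDG_CACHE_HOME, for instance), and `resolve_hub_cache` is a hypothetical helper rather than a library function.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Approximate the model cache directory for a given environment mapping."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


print(resolve_hub_cache())
```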
    # Set cache directory (Linux/Mac)
    export HF_HOME=/path/to/your/cache

    # Set cache directory (Windows)
    set HF_HOME=C:\path\to\your\cache

    # Or set in Python, before importing transformers (the variable is read at import time)
    import os
    os.environ['HF_HOME'] = '/path/to/your/cache'
- Step 13
Use models offline
After downloading models, you can use them offline by setting the HF_HUB_OFFLINE environment variable. Models are loaded from the local cache.
    # Enable offline mode (Linux/Mac)
    export HF_HUB_OFFLINE=1

    # Enable offline mode (Windows)
    set HF_HUB_OFFLINE=1
- Step 14
Explore the Hugging Face Hub
The Hugging Face Hub hosts over 1 million pre-trained model checkpoints across various tasks. Browse models, datasets, and spaces at huggingface.co.
    from huggingface_hub import list_models

    # List popular text classification models
    models = list_models(task='text-classification', sort='downloads', limit=5)
    for model in models:
        print(f'{model.id} - {model.downloads} downloads')

    # Search for specific models
    bert_models = list_models(search='bert', limit=10)
    for model in bert_models:
        print(model.id)
- Step 15
Use Flash Attention for faster inference
Flash Attention can significantly speed up inference for large language models. Install flash-attn and enable it in your model configuration.
    # Install Flash Attention (requires CUDA)
    pip install flash-attn --no-build-isolation
- Step 16
Enable Flash Attention in models
Load models with Flash Attention enabled for 2-4x faster inference on compatible hardware (NVIDIA GPUs with compute capability 8.0+).
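The hardware requirement can be checked before loading a model. The sketch below only compares a (major, minor) compute capability against the 8.0 threshold stated above; it says nothing about whether flash-attn is installed or whether the model's dtype is compatible. `meets_capability_floor` is a hypothetical helper, and the GPU list is a few examples chosen for illustration.

```python
MIN_CAPABILITY = (8, 0)  # threshold stated in this guide


def meets_capability_floor(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) compute capability is at least `minimum`."""
    return tuple(capability) >= tuple(minimum)


# With PyTorch and a CUDA GPU available, the tuple would come from
# torch.cuda.get_device_capability(); these entries are examples only.
examples = {"A100": (8, 0), "RTX 4090": (8, 9), "V100": (7, 0), "T4": (7, 5)}
for name, capability in examples.items():
    print(f"{name:9} {meets_capability_floor(capability)}")
```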
    from transformers import AutoModelForCausalLM

    # Load model with Flash Attention
    model = AutoModelForCausalLM.from_pretrained(
        'meta-llama/Llama-2-7b-hf',
        attn_implementation='flash_attention_2',
        torch_dtype='auto',
        device_map='auto'
    )
- Step 17
Use quantization for memory efficiency
Quantization reduces model memory usage by using lower precision weights. This allows you to run larger models on consumer hardware.
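The saving follows from simple arithmetic: weight memory is roughly the parameter count times the bytes stored per parameter. The sketch below works this out for a nominal 7-billion-parameter model. It counts weights only, so it ignores quantization metadata, layers kept in higher precision, activations, and the key-value cache; `weight_memory_gb` is an illustrative helper.

```python
# Bytes stored per parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params, precision):
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision:8} {weight_memory_gb(7e9, precision):5.1f} GB")
```

Under these assumptions, 8-bit loading halves the float16 footprint and 4-bit loading quarters it, which is why a model that does not fit on a consumer GPU in half precision may fit once quantized.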
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # 8-bit quantization
    quantization_config = BitsAndBytesConfig(load_in_8bit=True)
    model = AutoModelForCausalLM.from_pretrained(
        'meta-llama/Llama-2-7b-hf',
        quantization_config=quantization_config,
        device_map='auto'
    )

    # 4-bit quantization (even more memory efficient)
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype='float16',
        bnb_4bit_quant_type='nf4'
    )
    model = AutoModelForCausalLM.from_pretrained(
        'meta-llama/Llama-2-13b-hf',
        quantization_config=quantization_config,
        device_map='auto'
    )
⚠ Heads up: Quantization requires the bitsandbytes library: pip install bitsandbytes
- Step 18
Generate text with advanced decoding strategies
Control text generation quality using different decoding strategies including greedy search, beam search, top-k sampling, and top-p (nucleus) sampling.
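To make the two sampling filters concrete, here is a toy illustration in plain Python. Top-k keeps the k most likely tokens; top-p keeps the smallest set of most likely tokens whose combined probability reaches p; both then renormalize before sampling. The `toy` distribution is invented, and the library's own implementation operates on logit tensors and differs in detail, so treat this as a sketch of the idea only.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose mass reaches p, then renormalize."""
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Invented next-token distribution for illustration.
toy = {"the": 0.5, "a": 0.3, "robots": 0.15, "xylophone": 0.05}
print(top_k_filter(toy, 2))
print(top_p_filter(toy, 0.9))
```

Here top-k with k=2 always keeps exactly two candidates, while top-p with p=0.9 keeps three because the first two cover only 0.8 of the probability mass. The number of candidates under top-p therefore adapts to how peaked the distribution is.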
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = 'gpt2'
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    prompt = 'In the future, AI will'
    inputs = tokenizer(prompt, return_tensors='pt')

    # Greedy search (deterministic)
    output = model.generate(**inputs, max_length=50)
    print(tokenizer.decode(output[0]))

    # Beam search (better quality)
    output = model.generate(**inputs, max_length=50, num_beams=5)
    print(tokenizer.decode(output[0]))

    # Top-k sampling (creative)
    output = model.generate(**inputs, max_length=50, do_sample=True, top_k=50)
    print(tokenizer.decode(output[0]))

    # Top-p (nucleus) sampling (balanced)
    output = model.generate(**inputs, max_length=50, do_sample=True, top_p=0.95, temperature=0.9)
    print(tokenizer.decode(output[0]))
- Step 19
Work with audio models
Transformers supports audio tasks including automatic speech recognition (ASR), audio classification, and text-to-speech using models like Wav2Vec2 and Whisper.
    from transformers import pipeline

    # Automatic speech recognition with Whisper
    asr = pipeline('automatic-speech-recognition', model='openai/whisper-base')
    audio_file = 'path/to/audio.wav'
    transcription = asr(audio_file)
    print(transcription['text'])

    # Audio classification (use a checkpoint fine-tuned for classification;
    # facebook/wav2vec2-base-960h is a speech recognition model and has no trained classification head)
    classifier = pipeline('audio-classification', model='superb/wav2vec2-base-superb-ks')
    result = classifier(audio_file)
    print(result)
⚠ Heads up: Audio tasks require librosa or soundfile: pip install librosa soundfile
- Step 20
Install from source for latest features
Installing from the GitHub repository gives you access to the latest features and bug fixes before they are released on PyPI.
    # Clone the repository
    git clone https://github.com/huggingface/transformers.git
    cd transformers

    # Install in editable mode
    pip install -e .

    # Or install directly from GitHub
    pip install git+https://github.com/huggingface/transformers.git
- Step 21
Common use cases and applications
Transformers powers a wide range of applications across industries. Here are some common use cases:
NLP Tasks - Text classification (sentiment analysis, spam detection), named entity recognition, question answering, text summarization, and translation.
Vision Tasks - Image classification, object detection, image segmentation, and image captioning.
Audio Tasks - Speech recognition, audio classification, and text-to-speech.
Multimodal Tasks - Visual question answering, image-to-text generation, and document understanding.
Production Applications - Chatbots and virtual assistants, content generation and copywriting, code generation and completion, sentiment analysis for customer feedback, and language translation services.