Elasticsearch - Distributed Search and Analytics Engine
Install and configure Elasticsearch, the powerful open-source distributed RESTful search and analytics engine built on Apache Lucene - covering installation, indexing, search queries, aggregations, cluster management, and production best practices.
Overview
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. Originally developed by Shay Banon in 2010, it has become the leading solution for full-text search, log analytics, and real-time data analysis. With 70,000+ GitHub stars, Elasticsearch powers search features for companies like Netflix, Uber, GitHub, and Wikipedia.
Key capabilities:
- Distributed architecture: Horizontal scaling across multiple nodes with automatic sharding and replication
- Full-text search: Advanced text analysis with built-in analyzers for 30+ languages and custom tokenizers
- Real-time indexing: Near real-time search; newly indexed documents become searchable after the next refresh (every 1 second by default)
- RESTful API: Simple JSON-based HTTP interface for all operations
- Aggregations: Powerful analytics framework for metrics, bucketing, and pipeline aggregations
- Schema-free JSON: Dynamic mapping with automatic field type detection
- Multi-tenancy: Index-level isolation for multiple datasets in one cluster
Why Elasticsearch:
- Battle-tested: Powers some of the world's largest search deployments (billions of documents)
- ELK Stack: Native integration with Logstash and Kibana for complete observability
- Rich ecosystem: Official clients for Java, Python, JavaScript, Go, Ruby, PHP, .NET, Rust, and more
- Scalable: Scales from single-node development to multi-datacenter production clusters
- Versatile: Search engines, log analytics, metrics, security analytics, business intelligence
Official site: https://www.elastic.co/elasticsearch
GitHub: https://github.com/elastic/elasticsearch (70K+ stars)
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Download: https://www.elastic.co/downloads/elasticsearch
Technology Stack
Elasticsearch is built on a sophisticated stack of Java technologies optimized for distributed systems and high-performance search.
Core platform:
- Java 21+ (bundled with distribution)
- Apache Lucene (text search library and inverted index engine)
- Netty for async HTTP and transport layer
- Jackson for JSON serialization
- Log4j2 for structured logging
Distributed systems:
- Custom cluster coordination (Zen Discovery before 7.0; since 7.0 a purpose-built, Raft-like consensus algorithm)
- Segment-based storage with automatic merge policies
- Sequence numbers and primary terms for document versioning and optimistic concurrency control (if_seq_no / if_primary_term)
- Cross-cluster replication for disaster recovery
Data structures:
- Inverted index for text search (term → document IDs)
- Doc values for sorting and aggregations (column-oriented storage)
- BKD trees for numeric and geo-spatial indexing
- Finite state transducers (FST) for efficient term lookups
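The inverted index is easier to picture with a toy version. Below is a minimal Python sketch that stands in for Lucene. It uses a crude tokenizer (lowercase, then split on non-alphanumerics) in place of a real analyzer. It maps each term to a sorted list of document IDs and answers an AND query by intersecting those posting lists. Real Lucene segments add compression, FST-based term dictionaries, and skip lists on top of this idea. The function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to a sorted list of the document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sorted posting lists make intersections cheap.
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all_terms(index, query):
    """AND semantics: return IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Wireless noise-cancelling headphones",
    2: "Fast charging USB-C cable",
    3: "Wireless charging stand",
}
index = build_inverted_index(docs)
print(index["wireless"])                             # [1, 3]
print(search_all_terms(index, "wireless charging"))  # [3]
```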
Query execution:
- Two-phase distributed search (query then fetch)
- Relevance scoring with BM25 (the default since 5.0; the classic TF-IDF similarity was removed in 7.0)
- Vector search with k-NN for semantic/machine learning use cases
- Query cache and request cache for performance
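BM25 scoring can be sketched in a few lines. This version follows the textbook formula with Elasticsearch's default parameters (k1 = 1.2, b = 0.75). Lucene's implementation may differ by a constant factor and stores document lengths lossily, so absolute numbers will not match `_explain` output. The effects of term frequency, term rarity, and document length work the same way. Function names are hypothetical.

```python
import math


def bm25_idf(doc_count, doc_freq):
    # Inverse document frequency: rare terms (small doc_freq) weigh more.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document (textbook BM25)."""
    # Length normalisation: documents longer than average are penalised via b.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: extra occurrences help less and less.
    tf = freq * (k1 + 1) / (freq + norm)
    return bm25_idf(doc_count, doc_freq) * tf


if __name__ == "__main__":
    # Hypothetical corpus: 1,000 docs, average length 10 terms.
    for freq in (1, 2, 5, 20):
        print(freq, round(bm25_term_score(freq, 10, 10, 1000, 10), 3))
```

Raising `freq` increases the score but saturates toward `idf * (k1 + 1)`. This is why BM25 is less sensitive to keyword stuffing than raw TF-IDF.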
Architecture:
├── Core: Java 21, Apache Lucene
├── Network: Netty (HTTP + Transport)
├── Serialization: Jackson (JSON)
├── Coordination: Raft-like consensus
└── Storage: Segment-based with LSM-tree patterns

Data structures:
├── Inverted index (text search)
├── Doc values (aggregations)
├── BKD trees (numerics, geo)
└── FST (term lookups)

Query:
├── Distributed search (query → fetch)
├── Scoring: BM25 (default)
└── Vector: k-NN, ANN algorithms
Quick Installation Options
Multiple installation methods available depending on your environment and use case.
Installation options:
- Docker: Fastest for development and testing
- Binary archive: Direct download for any platform
- Package managers: APT, YUM, Homebrew for production
- Kubernetes: Elastic Cloud on Kubernetes (ECK) operator
- Elastic Cloud: Fully managed SaaS offering
System requirements:
- 2+ GB RAM (4+ GB recommended for production)
- 64-bit OS (Linux, macOS, Windows)
- Java bundled with distribution (no separate install needed)
- Sufficient disk space for indices (varies by use case)
# Option 1: Docker (quick start; security disabled, development only)
docker run -d \
  --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.13.0

# Verify installation (prints cluster info JSON with version and tagline)
curl http://localhost:9200

# Option 2: Binary archive (Linux x86_64; macOS archives are named darwin-x86_64 / darwin-aarch64)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.13.0-linux-x86_64.tar.gz
tar -xzf elasticsearch-8.13.0-linux-x86_64.tar.gz
cd elasticsearch-8.13.0/
./bin/elasticsearch
# On first start, 8.x enables security automatically and prints the elastic
# user's password and an enrollment token to the terminal.

# Option 3: Homebrew (macOS)
# Check which Elasticsearch version the tap provides before relying on it.
brew tap elastic/tap
brew install elastic/tap/elasticsearch-full
brew services start elastic/tap/elasticsearch-full

# Option 4: APT (Debian/Ubuntu)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update && sudo apt install elasticsearch
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

# Option 5: YUM (RHEL/CentOS)
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
# Writing under /etc needs root, so use sudo tee rather than a plain shell redirect
sudo tee /etc/yum.repos.d/elasticsearch.repo > /dev/null << 'EOF'
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
sudo yum install elasticsearch
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

# Verify (8.x packages enable TLS and authentication by default; the elastic
# password is printed during installation, or reset it with
# sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic)
sudo curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://localhost:9200
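To script the `curl http://localhost:9200` check (for example, in a CI job that waits for the container), a small standard-library helper is enough. This sketch assumes the Docker setup above, with plain HTTP and security disabled. A default 8.x package install would need HTTPS and credentials instead. `cluster_version` is a hypothetical name.

```python
import json
import urllib.error
import urllib.request


def cluster_version(url="http://localhost:9200", timeout=2.0):
    """Return the version number reported by GET /, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error (e.g. 401), or non-JSON body.
        return None
    return info.get("version", {}).get("number")


if __name__ == "__main__":
    print(cluster_version() or "Elasticsearch is not reachable")
```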
Basic Configuration
Elasticsearch reads its configuration from the config/ directory (/etc/elasticsearch for DEB/RPM packages). The key files are elasticsearch.yml (main YAML configuration) and jvm.options (JVM settings). Essential settings:
- cluster.name: Cluster identifier (nodes with same name join together)
- node.name: Human-readable node identifier
- path.data: Where indices are stored (critical for backups)
- path.logs: Log file location
- network.host: Network binding address
- http.port: HTTP API port (default 9200)
- discovery.seed_hosts: Addresses of other nodes to contact when discovering the cluster
- cluster.initial_master_nodes: Master-eligible node names, used only when bootstrapping a brand-new cluster
# config/elasticsearch.yml - Basic single-node configuration
cluster.name: my-application
node.name: node-1

# Data and logs paths
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# Network settings
network.host: 0.0.0.0
http.port: 9200

# Single-node cluster (development)
discovery.type: single-node

# Security (disable only for local development; enable for production)
xpack.security.enabled: false
xpack.security.enrollment.enabled: false

# --- Production multi-node configuration ---
# elasticsearch.yml on each node of a 3-node cluster
cluster.name: production-cluster
node.name: ${HOSTNAME}   # Resolved from the environment; must match the names listed below

# Node roles (a node can combine multiple roles)
node.roles: [ master, data, ingest ]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

# Cluster discovery
discovery.seed_hosts:
  - es-node1:9300
  - es-node2:9300
  - es-node3:9300

# Only used to bootstrap a brand-new cluster; remove it once the cluster has formed
cluster.initial_master_nodes:
  - es-node1
  - es-node2
  - es-node3

# Security
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
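Every node in the production example shares the same discovery lists, so generating the files helps avoid copy-paste drift between nodes. This minimal sketch assumes that node names equal hostnames and that transport uses port 9300, as above. `node_config` is hypothetical and emits only the cluster and discovery settings. Paths and security settings would be appended the same way.

```python
def node_config(node_name, cluster_name, master_nodes, roles):
    """Render the cluster/discovery part of elasticsearch.yml for one node."""
    lines = [
        f"cluster.name: {cluster_name}",
        f"node.name: {node_name}",
        f"node.roles: [ {', '.join(roles)} ]",
        "network.host: 0.0.0.0",
        "transport.port: 9300",
        "discovery.seed_hosts:",
        *[f"  - {host}:9300" for host in master_nodes],
        # Bootstrap-only setting: drop it after the cluster first forms.
        "cluster.initial_master_nodes:",
        *[f"  - {host}" for host in master_nodes],
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    masters = ["es-node1", "es-node2", "es-node3"]
    for name in masters:
        print(f"# ---- {name}: elasticsearch.yml ----")
        print(node_config(name, "production-cluster", masters, ["master", "data", "ingest"]))
```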
JVM and Memory Configuration
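Elasticsearch is a Java application, so JVM tuning matters for performance, and the heap size is the most important setting. Since 7.11, Elasticsearch sizes the heap automatically based on the node's roles and total memory; the rules below apply when you set it manually.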
Heap size rules:
- Set -Xms and -Xmx to the same value (prevents heap resizing)
- Never exceed 50% of physical RAM (leave space for OS file cache)
- Never exceed ~31 GB (compressed object pointers threshold)
- For a 64 GB server, set heap to 31 GB
- For a 16 GB server, set heap to 8 GB
# config/jvm.options.d/heap.options - custom JVM settings
# (add files under jvm.options.d/ rather than editing jvm.options itself)
# Set heap size (example: 8 GB for a 16 GB server)
-Xms8g
-Xmx8g

# G1GC settings (G1 is already the default collector for the bundled JDK)
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

# Heap dump on out-of-memory
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch

# GC logging (helpful for troubleshooting)
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m

# Environment variable approach (overrides jvm.options)
export ES_JAVA_OPTS="-Xms8g -Xmx8g"
./bin/elasticsearch

# Docker environment variable
docker run -d \
  -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" \
  docker.elastic.co/elasticsearch/elasticsearch:8.13.0
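The heap rules above reduce to simple arithmetic: half of RAM, capped at 31 GB, with -Xms equal to -Xmx. This sketch encodes that rule of thumb and nothing more. It does not reproduce Elasticsearch's automatic, role-aware heap sizing. The helper names are hypothetical.

```python
def recommended_heap_gb(ram_gb, max_heap_gb=31):
    """At most half of RAM, capped below the compressed-oops threshold."""
    return max(1, int(min(ram_gb / 2, max_heap_gb)))


def jvm_heap_options(ram_gb):
    """Render matching -Xms/-Xmx lines for a jvm.options.d file."""
    heap = recommended_heap_gb(ram_gb)
    return f"-Xms{heap}g\n-Xmx{heap}g\n"


if __name__ == "__main__":
    for ram in (4, 16, 64, 128):
        print(f"{ram} GB RAM -> {recommended_heap_gb(ram)} GB heap")
```

The same numbers as the list above fall out: 16 GB of RAM gives an 8 GB heap, and 64 GB gives 31 GB.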
First Steps: Creating an Index and Adding Documents
Elasticsearch stores data in indices (roughly analogous to tables in a relational database) containing documents (similar to rows). Documents are JSON objects. Let's create an index and add documents.
Key concepts:
- Index: Collection of documents with similar characteristics
- Document: Basic unit of information (JSON)
- Field: Key-value pair in a document
- Mapping: Schema definition (field types and settings)
- Shard: Index subdivision for horizontal scaling
- Replica: Shard copy for high availability
# Create an index with explicit mapping
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name":        { "type": "text" },
      "description": { "type": "text" },
      "price":       { "type": "float" },
      "category":    { "type": "keyword" },
      "tags":        { "type": "keyword" },
      "in_stock":    { "type": "boolean" },
      "created_at":  { "type": "date" }
    }
  }
}'

# Add a document (POST generates an ID automatically)
curl -X POST "localhost:9200/products/_doc" -H 'Content-Type: application/json' -d'
{
  "name": "Wireless Headphones",
  "description": "High-quality noise-cancelling headphones",
  "price": 299.99,
  "category": "electronics",
  "tags": ["audio", "wireless", "bluetooth"],
  "in_stock": true,
  "created_at": "2024-01-15T10:30:00Z"
}'

# Add a document with a specific ID
curl -X PUT "localhost:9200/products/_doc/1" -H 'Content-Type: application/json' -d'
{
  "name": "USB-C Cable",
  "description": "Fast charging cable 2m",
  "price": 19.99,
  "category": "accessories",
  "tags": ["cable", "usb-c"],
  "in_stock": true,
  "created_at": "2024-01-16T14:20:00Z"
}'

# Bulk indexing (faster for multiple documents): one action line plus one
# source line per document, and the body must end with a newline
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary @- << 'EOF'
{ "index": { "_index": "products" } }
{ "name": "Laptop Stand", "price": 49.99, "category": "accessories", "in_stock": true }
{ "index": { "_index": "products" } }
{ "name": "Mechanical Keyboard", "price": 149.99, "category": "electronics", "in_stock": false }
EOF

# Retrieve a document by ID
curl -X GET "localhost:9200/products/_doc/1"

# Get index mapping
curl -X GET "localhost:9200/products/_mapping"

# Get index stats
curl -X GET "localhost:9200/products/_stats"
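The bulk body format (action line, source line, trailing newline) is easy to get wrong by hand. This minimal Python sketch builds it with the standard library. `bulk_body` is a hypothetical helper. The official Python client also ships `elasticsearch.helpers.bulk`, which handles batching for you.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON body for POST /_bulk: one action + one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the final line to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = bulk_body("products", [
        {"name": "Laptop Stand", "price": 49.99, "category": "accessories", "in_stock": True},
        {"name": "Mechanical Keyboard", "price": 149.99, "category": "electronics", "in_stock": False},
    ])
    with open("bulk.ndjson", "w") as f:
        f.write(body)
```

The resulting file can be sent with `curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary @bulk.ndjson`. Note `--data-binary` rather than `-d`: `-d` strips the newlines.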
Search Queries: From Simple to Complex
Elasticsearch provides a rich Query DSL (Domain Specific Language) for searching documents. Queries range from simple text matches to complex boolean logic.
Query types:
- Match: Full-text search with analysis
- Term: Exact match (no analysis)
- Range: Numeric or date ranges
- Bool: Combine queries with AND/OR/NOT logic
- Wildcard: Pattern matching with * and ?
- Fuzzy: Approximate matching (typo tolerance)
- Nested: Query nested objects
- Geo: Geographic queries
# Simple match query (full-text search)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "description": "wireless headphones" }
  }
}'

# Match with size and pagination
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "from": 0,
  "size": 10,
  "query": {
    "match": { "name": "cable" }
  }
}'

# Multi-match (search across multiple fields)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "wireless",
      "fields": ["name", "description"]
    }
  }
}'

# Term query (exact match, no analysis; use on keyword fields)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": { "category": "electronics" }
  }
}'

# Range query
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": { "price": { "gte": 50, "lte": 200 } }
  }
}'

# Bool query (must/should affect the score; filter/must_not do not)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "wireless" } }
      ],
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "price": { "lte": 500 } } }
      ],
      "must_not": [
        { "term": { "category": "refurbished" } }
      ],
      "should": [
        { "match": { "tags": "bluetooth" } }
      ],
      "minimum_should_match": 1
    }
  }
}'

# Wildcard query (leading wildcards such as *phone* scan many terms and can be slow)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": { "name": "*phone*" }
  }
}'

# Fuzzy query (typo tolerance)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "fuzzy": {
      "name": { "value": "hedphones", "fuzziness": "AUTO" }
    }
  }
}'
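`"fuzziness": "AUTO"` picks the allowed edit distance from the term length. With the default `AUTO:3,6`, terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, and longer terms allow two. The sketch below models that rule with plain Levenshtein distance. It is a simplification: Elasticsearch's fuzzy query also counts a transposition of adjacent characters as a single edit by default, and it caps expansions via `max_expansions`. The helper names are hypothetical.

```python
def auto_fuzziness(term, low=3, high=6):
    """Max edits allowed by AUTO:low,high for a term of this length."""
    n = len(term)
    if n < low:
        return 0
    if n < high:
        return 1
    return 2


def levenshtein(a, b):
    """Edit distance (insert/delete/substitute), row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def fuzzy_matches(query, terms):
    """Terms from the index within the AUTO edit budget of the query."""
    max_edits = auto_fuzziness(query)
    return [t for t in terms if levenshtein(query, t) <= max_edits]


if __name__ == "__main__":
    print(fuzzy_matches("hedphones", ["headphones", "phones", "cable"]))
```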
Aggregations: Analytics and Metrics
Aggregations provide analytics over your data. Think of them as SQL GROUP BY on steroids. There are three types: metric (calculate metrics), bucket (group documents), and pipeline (aggregate aggregation results).
Common aggregations:
- Metrics: avg, sum, min, max, stats, cardinality, percentiles
- Bucket: terms (group by field), date_histogram, range, filters
- Pipeline: derivative, cumulative_sum, moving_fn (replaces moving_avg, which was removed in 8.0)
# Terms aggregation (group by category)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": { "field": "category" }
    }
  }
}'

# Average price per category
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}'

# Stats aggregation (min, max, avg, sum, count)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "price_stats": { "stats": { "field": "price" } }
  }
}'

# Date histogram (time-series data)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "products_over_time": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      },
      "aggs": {
        "total_revenue": { "sum": { "field": "price" } }
      }
    }
  }
}'

# Percentiles aggregation
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [50, 75, 90, 95, 99]
      }
    }
  }
}'

# Range aggregation
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 50 },
          { "from": 50, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}'
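To see what the "average price per category" request computes, here is an in-memory Python equivalent over the four sample products indexed earlier. It is only a model. The real terms aggregation runs per shard and merges the results, so its counts can be approximate on high-cardinality fields. Buckets are ordered by `doc_count` descending, and this sketch breaks ties alphabetically by key. `terms_with_avg` is a hypothetical name.

```python
from collections import defaultdict

PRODUCTS = [
    {"name": "Wireless Headphones", "price": 299.99, "category": "electronics"},
    {"name": "USB-C Cable", "price": 19.99, "category": "accessories"},
    {"name": "Laptop Stand", "price": 49.99, "category": "accessories"},
    {"name": "Mechanical Keyboard", "price": 149.99, "category": "electronics"},
]


def terms_with_avg(docs, group_field, metric_field):
    """Group docs by group_field (bucket agg) and average metric_field (metric sub-agg)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    # Largest buckets first; ties broken by key in this sketch.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


if __name__ == "__main__":
    for bucket in terms_with_avg(PRODUCTS, "category", "price"):
        print(f"{bucket['key']}: {bucket['doc_count']} items, avg {bucket['avg']:.2f}")
```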
Index Management: Mappings, Aliases, and Templates
Effective index management is crucial for performance and maintainability. This includes defining mappings, using aliases for zero-downtime reindexing, and templates for consistent settings.
Best practices:
- Define explicit mappings (don't rely on dynamic mapping for production)
- Use index aliases for production indices
- Create index templates for time-series data
- Set appropriate shard counts (over-sharding hurts performance)
- Use index lifecycle management (ILM) for data retention
# Update mapping (add new fields to an existing index)
curl -X PUT "localhost:9200/products/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "manufacturer": { "type": "keyword" },
    "rating":       { "type": "float" }
  }
}'

# Create index alias
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "add": { "index": "products", "alias": "products-latest" } }
  ]
}'

# Atomic alias switch (zero-downtime reindex)
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove": { "index": "products-v1", "alias": "products" } },
    { "add":    { "index": "products-v2", "alias": "products" } }
  ]
}'

# Create index template
# 8.x ships built-in templates for logs-*-* (priority 100, data streams only),
# so use a pattern that does not overlap with them.
curl -X PUT "localhost:9200/_index_template/app_logs_template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["app-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "refresh_interval": "5s"
    },
    "mappings": {
      "properties": {
        "timestamp": { "type": "date" },
        "level":     { "type": "keyword" },
        "message":   { "type": "text" },
        "service":   { "type": "keyword" }
      }
    }
  }
}'

# Now any new index matching app-logs-* gets these settings
curl -X PUT "localhost:9200/app-logs-2024-01-15"

# Reindex data from old index to new
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "products-old" },
  "dest":   { "index": "products-new" }
}'

# Delete index
curl -X DELETE "localhost:9200/products-old"

# Get all indices
curl -X GET "localhost:9200/_cat/indices?v"

# Get cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
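A zero-downtime reindex is three calls in a fixed order: create the new index, copy the data, then swap the alias atomically. This minimal sketch assembles those calls as (method, path, body) tuples. It assumes the products-v1 / products-v2 naming convention used in the alias example. `reindex_plan` and `next_version` are hypothetical helpers, and nothing is sent to a cluster.

```python
import json


def next_version(base, current):
    """'products-v1' -> 'products-v2' under a base-vN naming convention."""
    prefix = f"{base}-v"
    if not current.startswith(prefix):
        raise ValueError(f"{current!r} does not follow the {prefix}N convention")
    return f"{prefix}{int(current[len(prefix):]) + 1}"


def reindex_plan(alias, old_index, new_index, mappings):
    """Ordered REST calls for an alias-based reindex."""
    return [
        ("PUT", f"/{new_index}", {"mappings": mappings}),
        ("POST", "/_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]}),
    ]


if __name__ == "__main__":
    new = next_version("products", "products-v1")
    mappings = {"properties": {"name": {"type": "text"}}}
    for method, path, body in reindex_plan("products", "products-v1", new, mappings):
        print(method, path, json.dumps(body))
```

Because both alias actions go in a single `_aliases` request, clients querying `products` never see a moment with zero or two target indices.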
Production Cluster Setup
Production deployments require a multi-node cluster for high availability and scalability. A typical setup includes dedicated master, data, and ingest nodes.
Node roles:
- Master: Cluster state management (lightweight, 3+ nodes for quorum)
- Data: Store indices and execute queries (most resources)
- Ingest: Pre-process documents (optional)
- Coordinating: Route requests (no data, no master)
- ML: Machine learning (optional)
Minimum production cluster: 3 master-eligible nodes + 2+ data nodes
# Master node config (es-master-1)
cluster.name: production
node.name: master-1
node.roles: [ master ]
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
discovery.seed_hosts:
  - master-1:9300
  - master-2:9300
  - master-3:9300
cluster.initial_master_nodes:
  - master-1
  - master-2
  - master-3
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
# (transport TLS also needs the certificate settings shown in the Security section)

---
# Data node config (es-data-1)
cluster.name: production
node.name: data-1
node.roles: [ data, ingest ]
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
discovery.seed_hosts:
  - master-1:9300
  - master-2:9300
  - master-3:9300
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
# Hot/warm architecture (data tiers): replace the generic data role with a tier role,
# e.g. node.roles: [ data_hot, ingest ] or node.roles: [ data_warm ]

---
# Coordinating-only node (load balancer)
cluster.name: production
node.name: coordinator-1
node.roles: [ ]   # Empty list = coordinating only
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
discovery.seed_hosts:
  - master-1:9300
  - master-2:9300
  - master-3:9300
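The "3 master-eligible nodes" advice comes from majority arithmetic. An election needs more than half of the voting nodes, so 3 nodes tolerate 1 failure, while 4 nodes still tolerate only 1. The sketch below shows just that arithmetic. Elasticsearch manages its voting configuration automatically and may leave one node out when the count is even, so treat this as a model rather than its exact algorithm.

```python
def quorum(master_eligible):
    """Return (majority needed, node failures tolerated) for n voting nodes."""
    majority = master_eligible // 2 + 1
    return majority, master_eligible - majority


if __name__ == "__main__":
    for n in range(1, 8):
        majority, tolerated = quorum(n)
        print(f"{n} master-eligible: majority {majority}, tolerates {tolerated} failure(s)")
```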
Security: Authentication and TLS
Elasticsearch security features (X-Pack Security) provide authentication, authorization, and encryption. Essential for production deployments.
Security layers:
- TLS: Encrypt HTTP and transport communication
- Authentication: Built-in, LDAP, Active Directory, SAML, OpenID Connect
- Authorization: Role-based access control (RBAC)
- Audit logging: Track security events
- Field/document level security: Fine-grained access control
# Generate certificates for inter-node communication
# Run from the Elasticsearch home directory (/usr/share/elasticsearch for DEB/RPM
# installs, whose config directory is /etc/elasticsearch rather than config/)
cd /usr/share/elasticsearch
bin/elasticsearch-certutil ca --pem   # Creates elastic-stack-ca.zip
unzip elastic-stack-ca.zip            # -> ca/ca.crt, ca/ca.key
bin/elasticsearch-certutil cert \
  --ca-cert ca/ca.crt \
  --ca-key ca/ca.key \
  --pem \
  --name node1 \
  --dns node1.example.com \
  --ip 192.168.1.10 \
  --out node1.zip
unzip node1.zip                       # -> node1/node1.crt, node1/node1.key

# Copy certificates to config/certs/
mkdir -p config/certs
cp node1/node1.crt node1/node1.key ca/ca.crt config/certs/
chmod 640 config/certs/*   # Keep the private key unreadable by other users

# --- elasticsearch.yml ---
xpack.security.enabled: true

# TLS for transport (inter-node)
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.certificate: certs/node1.crt
xpack.security.transport.ssl.key: certs/node1.key
xpack.security.transport.ssl.certificate_authorities: [ "certs/ca.crt" ]

# TLS for HTTP (client connections)
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.certificate: certs/node1.crt
xpack.security.http.ssl.key: certs/node1.key
xpack.security.http.ssl.certificate_authorities: [ "certs/ca.crt" ]

# Set the elastic user's password
# (8.x replaces the deprecated elasticsearch-setup-passwords tool)
bin/elasticsearch-reset-password -u elastic
# Or choose the password interactively:
bin/elasticsearch-reset-password -u elastic -i

# Create custom role
curl -X POST "https://localhost:9200/_security/role/products_read" \
  -u elastic:password -k \
  -H 'Content-Type: application/json' -d'
{
  "indices": [
    { "names": [ "products*" ], "privileges": [ "read" ] }
  ]
}'

# Create custom user (-k skips TLS verification; in production use
# --cacert config/certs/ca.crt with a hostname present in the node certificate)
curl -X POST "https://localhost:9200/_security/user/john" \
  -u elastic:password -k \
  -H 'Content-Type: application/json' -d'
{
  "password" : "s3cr3t",
  "roles" : [ "products_read", "kibana_admin", "monitoring_user" ],
  "full_name" : "John Doe",
  "email" : "john@example.com"
}'

# Test authenticated request (returns the user's identity and roles)
curl -u john:s3cr3t -k https://localhost:9200/_security/_authenticate
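Besides `-u user:password`, Elasticsearch accepts API keys in an `Authorization: ApiKey <credentials>` header. The credentials are the base64 encoding of `id:api_key`, and the create-API-key response also returns this value pre-encoded as `encoded`. This standard-library sketch builds both header styles and attaches one to a request without sending it. The helper names are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def api_key_header(key_id, api_key):
    token = base64.b64encode(f"{key_id}:{api_key}".encode()).decode()
    return f"ApiKey {token}"


def health_request(base_url, auth_header):
    """Prepare (but do not send) an authenticated GET /_cluster/health."""
    req = urllib.request.Request(f"{base_url}/_cluster/health")
    req.add_header("Authorization", auth_header)
    return req
```

To actually send the request over HTTPS, pass a context built with `ssl.create_default_context(cafile="config/certs/ca.crt")` to `urllib.request.urlopen(req, context=...)`.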
Monitoring and Observability
Monitor Elasticsearch health and performance using built-in APIs and the Elastic Stack (formerly ELK Stack).
Key metrics to monitor:
- Cluster health (green/yellow/red)
- Node CPU, memory, disk usage
- JVM heap usage and GC times
- Query latency and throughput
- Indexing rate and latency
- Shard count and size
- Rejected threads (thread pool saturation)
# Cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
# Status: green (all shards allocated), yellow (some replicas unassigned), red (at least one primary unassigned)

# Node stats (detailed metrics)
curl -X GET "localhost:9200/_nodes/stats?pretty"

# Index stats
curl -X GET "localhost:9200/products/_stats?pretty"

# Thread pool stats (watch for rejections)
curl -X GET "localhost:9200/_cat/thread_pool?v&h=name,queue,active,rejected,completed"

# Pending tasks (should be near zero)
curl -X GET "localhost:9200/_cluster/pending_tasks"

# Hot threads (troubleshoot CPU spikes)
curl -X GET "localhost:9200/_nodes/hot_threads"

# Recovery status (ongoing shard movements)
curl -X GET "localhost:9200/_cat/recovery?v&active_only=true"

# Allocation explanation (why a shard isn't allocated)
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"

# Slow logs are index-level settings, so they cannot go in elasticsearch.yml.
# Set them per index (or in an index template):
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.query.debug": "2s",
  "index.search.slowlog.threshold.fetch.debug": "500ms",
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s"
}'

# Metricbeat for comprehensive monitoring
# Install and configure Metricbeat, then enable the Elasticsearch module
# (Stack Monitoring uses the elasticsearch-xpack variant)
metricbeat modules enable elasticsearch-xpack
metricbeat setup
metricbeat -e
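The `_cat/thread_pool` output is column-aligned text, so a few lines of Python can flag saturated pools in a cron job or alert script. This sketch parses the `?v` output (header row plus data rows). The sample text is a hypothetical example of that shape, not real cluster output. `_cat` APIs also accept `format=json`, which avoids text parsing entirely. The helper names are hypothetical.

```python
def parse_cat_table(text):
    """Parse the column-aligned output of a _cat API called with ?v."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def pools_with_rejections(text):
    """Names of thread pools whose rejected counter is non-zero."""
    return [row["name"] for row in parse_cat_table(text) if int(row["rejected"]) > 0]


# Hypothetical output of: _cat/thread_pool?v&h=name,queue,active,rejected,completed
SAMPLE = """
name   queue active rejected completed
search     0      1        0     12345
write     12      8       37     67890
"""

if __name__ == "__main__":
    print(pools_with_rejections(SAMPLE))
```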
Client Libraries and Language SDKs
Elasticsearch provides official clients for many programming languages. All clients support the full REST API with language-specific idioms.
Official clients:
- Java (Java API Client; the older High Level REST Client is deprecated since 7.15)
- Python (elasticsearch-py)
- JavaScript/Node.js (@elastic/elasticsearch)
- Go (go-elasticsearch)
- Ruby (elasticsearch-ruby)
- PHP (elasticsearch-php)
- .NET (Elastic.Clients.Elasticsearch for 8.x; NEST and Elasticsearch.Net for 7.x)
- Rust (elasticsearch-rs)
# Python client example (pip install elasticsearch)
from elasticsearch import Elasticsearch

# Create client
es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "password"),
)

# Check cluster health
health = es.cluster.health()
print(f"Cluster status: {health['status']}")

# Index a document
response = es.index(
    index="products",
    id=1,
    document={
        "name": "Laptop",
        "price": 999.99,
        "category": "electronics",
    },
)
print(f"Indexed: {response['result']}")

# Make the new document searchable immediately (otherwise wait for the ~1s refresh)
es.indices.refresh(index="products")

# Search (8.x clients take Query DSL as keyword arguments instead of body=)
response = es.search(
    index="products",
    query={"match": {"name": "laptop"}},
)
for hit in response["hits"]["hits"]:
    print(f"{hit['_source']['name']}: ${hit['_source']['price']}")

# Aggregation
response = es.search(
    index="products",
    size=0,
    aggs={"categories": {"terms": {"field": "category"}}},
)
for bucket in response["aggregations"]["categories"]["buckets"]:
    print(f"{bucket['key']}: {bucket['doc_count']} products")
JavaScript/Node.js Client Example
The official JavaScript client targets Node.js (using it in the browser is not supported) and ships with TypeScript type definitions.
// npm install @elastic/elasticsearch
const { Client } = require('@elastic/elasticsearch');

// Create client
const client = new Client({
  node: 'http://localhost:9200',
  auth: { username: 'elastic', password: 'password' }
});

// Index a document
async function indexDocument() {
  const result = await client.index({
    index: 'products',
    id: '1',
    document: {
      name: 'Smartphone',
      price: 699.99,
      category: 'electronics',
      in_stock: true
    },
    refresh: 'wait_for' // Return once the document is visible to search
  });
  console.log('Indexed:', result.result);
}

// Search with bool query
async function search() {
  const result = await client.search({
    index: 'products',
    query: {
      bool: {
        must: [{ match: { category: 'electronics' } }],
        filter: [{ range: { price: { lte: 1000 } } }]
      }
    },
    sort: [{ price: 'asc' }],
    size: 10
  });
  result.hits.hits.forEach(hit => {
    console.log(`${hit._source.name}: $${hit._source.price}`);
  });
}

// Aggregation with sub-aggregation
async function aggregate() {
  const result = await client.search({
    index: 'products',
    size: 0,
    aggs: {
      by_category: {
        terms: { field: 'category' },
        aggs: {
          avg_price: { avg: { field: 'price' } }
        }
      }
    }
  });
  result.aggregations.by_category.buckets.forEach(bucket => {
    console.log(`${bucket.key}: ${bucket.doc_count} items, avg price $${bucket.avg_price.value.toFixed(2)}`);
  });
}

// Run examples
async function main() {
  await indexDocument();
  await search();
  await aggregate();
}

main().catch(console.error);
Common Use Cases
Elasticsearch excels in several key domains:
1. Full-text search: Power website search, e-commerce product search, documentation search. Think Stack Overflow, Netflix search, or GitHub's earlier code search.
2. Log and event analytics: Centralize logs from applications and infrastructure. Elasticsearch is the "E" in the ELK/Elastic Stack (Elasticsearch + Logstash + Kibana).
3. Metrics and APM: Store and analyze application performance metrics, infrastructure metrics, business metrics.
4. Security analytics: SIEM (Security Information and Event Management), threat detection, audit logs. Elastic Security provides pre-built detections.
5. Business analytics: Real-time dashboards, customer behavior analytics, sales analytics. Kibana provides visualization layer.
6. Geospatial: Location-based search, geographic analytics, ride-sharing, delivery optimization.
7. Machine learning: Anomaly detection, forecasting, outlier detection via X-Pack ML.
Performance Tuning Best Practices
Optimize Elasticsearch for your specific workload:
Indexing performance:
- Increase refresh_interval during bulk indexing (default 1s → 30s or -1)
- Disable replicas during initial load, re-enable after
- Use bulk API instead of individual index requests
- Increase index.translog.flush_threshold_size for write-heavy loads
Query performance:
- Put non-scoring clauses in filter context (e.g. bool filter): they skip scoring and are eligible for caching
- Avoid deep pagination: from + size is capped at 10,000 hits by default (index.max_result_window); use search_after instead
- Use doc values for sorting and aggregations
- Limit the _source fields returned (_source_includes)
- Use index aliases for zero-downtime reindexing
Shard sizing:
- Target 20-40 GB per shard for search workloads
- Target 40-50 GB per shard for logging workloads
- Avoid over-sharding (1000s of tiny shards hurt performance)
- Use shrink API to reduce shard count
- Use rollover for time-series data
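The shard-size targets translate directly into a primary shard count: divide the expected primary data size by the target shard size and round up. In this sketch, the 30 GB and 50 GB targets in the usage lines are assumptions picked from the ranges above. `expected_primary_gb` means data size before replicas. The helper names are hypothetical.

```python
import math


def primary_shard_count(expected_primary_gb, target_shard_gb):
    """Smallest number of primaries that keeps each shard at or below the target."""
    if expected_primary_gb <= 0:
        return 1
    return max(1, math.ceil(expected_primary_gb / target_shard_gb))


def total_shards(primaries, replicas):
    """Primaries plus their replica copies across the cluster."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    search_primaries = primary_shard_count(300, 30)   # search index, ~30 GB/shard
    logs_primaries = primary_shard_count(500, 50)     # one rollover generation, ~50 GB/shard
    print(search_primaries, total_shards(search_primaries, 1))
    print(logs_primaries, total_shards(logs_primaries, 1))
```

For time-series data, rollover (with a `max_primary_shard_size` condition in ILM) lets you keep a small fixed shard count per index instead of guessing total volume up front.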
Memory:
- 50% heap, 50% OS file cache is the golden rule
- Monitor JVM heap usage (target <75%)
- Use G1GC for heaps >4GB
- Consider disabling swapping (bootstrap.memory_lock: true)
# Disable refresh and replicas during bulk indexing
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}'

# Bulk index (do your indexing here)

# Re-enable refresh and replicas
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}'

# Force merge after bulk indexing (only for indices that will no longer receive writes)
curl -X POST "localhost:9200/products/_forcemerge?max_num_segments=1"

# Deep pagination with search_after (more efficient than from/size)
# Open a point in time (PIT) so every page sees the same view of the index
curl -X POST "localhost:9200/products/_pit?keep_alive=1m"
# First page: pass the returned PIT id (no index in the path); PIT searches
# add an implicit _shard_doc tiebreaker to the sort
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "size": 10,
  "pit": { "id": "<pit_id>", "keep_alive": "1m" },
  "sort": [ { "created_at": "asc" } ]
}'
# Next pages: repeat the request with "search_after" set to the last hit's sort values
# (sorting on _id is disabled by default in 8.x, so don't use it as a tiebreaker)

# Disable swapping (add to elasticsearch.yml)
bootstrap.memory_lock: true
# Then, on Linux with systemd:
sudo systemctl edit elasticsearch
# Add:
# [Service]
# LimitMEMLOCK=infinity
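Why does search_after need a unique tiebreaker? Each page starts strictly after the previous page's last sort value, so documents that share that value would be skipped. This in-memory Python model demonstrates the effect. It is a simplification of what Elasticsearch does across shards with a PIT, and `paginate_search_after` is a hypothetical helper.

```python
def paginate_search_after(docs, sort_key, page_size):
    """Yield pages; each starts strictly after the previous page's last sort value."""
    ordered = sorted(docs, key=sort_key)
    last = None
    while True:
        page = [d for d in ordered if last is None or sort_key(d) > last][:page_size]
        if not page:
            return
        yield page
        last = sort_key(page[-1])


# Ten docs, but only three distinct created_at values.
DOCS = [{"id": i, "created_at": f"2024-01-{(i % 3) + 1:02d}"} for i in range(10)]

if __name__ == "__main__":
    with_tiebreaker = lambda d: (d["created_at"], d["id"])
    without_tiebreaker = lambda d: d["created_at"]
    print(sum(len(p) for p in paginate_search_after(DOCS, with_tiebreaker, 4)))     # 10
    print(sum(len(p) for p in paginate_search_after(DOCS, without_tiebreaker, 4)))  # 8
```

With the (created_at, id) key every document is returned exactly once. Sorting on created_at alone silently drops the documents that tie with a page boundary.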
Backup and Restore
Elasticsearch snapshots provide backup and disaster recovery. Snapshots are incremental and stored in a repository (filesystem, S3, GCS, Azure).
Best practices:
- Automate snapshots (daily or hourly)
- Store snapshots off-cluster (S3, GCS, Azure)
- Test restores regularly
- Use Snapshot Lifecycle Management (SLM) for automation
# Register snapshot repository (filesystem)
# First add the location to elasticsearch.yml on every master and data node, then restart:
#   path.repo: ["/mount/backups/elasticsearch"]
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/elasticsearch"
  }
}'

# Create snapshot
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

# Snapshot specific indices
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_2" -H 'Content-Type: application/json' -d'
{
  "indices": "products,logs-*",
  "ignore_unavailable": true,
  "include_global_state": false
}'

# List snapshots
curl -X GET "localhost:9200/_snapshot/my_backup/_all?pretty"

# Restore snapshot (renamed so it does not clash with the live index)
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "products",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "products",
  "rename_replacement": "restored-products"
}'

# S3 repository (AWS)
# In 8.x, S3 support ships as a built-in module, so no plugin install is needed.
# Store AWS credentials in the elasticsearch-keystore on each node, e.g.:
#   bin/elasticsearch-keystore add s3.client.default.access_key
#   bin/elasticsearch-keystore add s3.client.default.secret_key
curl -X PUT "localhost:9200/_snapshot/s3_backup" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-es-backups",
    "region": "us-east-1",
    "base_path": "elasticsearch/snapshots"
  }
}'

# Delete old snapshots
curl -X DELETE "localhost:9200/_snapshot/my_backup/snapshot_1"
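SLM retention is configured with `expire_after`, `min_count`, and `max_count`. An expired snapshot may be deleted, but the newest `min_count` snapshots are always kept and no more than `max_count` are retained. The Python sketch below is a simplified model of those semantics, useful for reasoning about a policy before applying it. It is not Elasticsearch's implementation, and `snapshots_to_delete` is a hypothetical name.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, now, expire_after, min_count, max_count):
    """Return names to delete, newest first. snapshots: (name, start_time) pairs."""
    if min_count > max_count:
        raise ValueError("min_count must not exceed max_count")
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    doomed = []
    for position, (name, started) in enumerate(newest_first):
        too_many = position >= max_count
        # The newest min_count snapshots are protected even if expired.
        expired = position >= min_count and now - started > expire_after
        if too_many or expired:
            doomed.append(name)
    return doomed


if __name__ == "__main__":
    base = datetime(2024, 1, 1)
    daily = [(f"snap-{i:02d}", base + timedelta(days=i)) for i in range(10)]
    print(snapshots_to_delete(daily, base + timedelta(days=10), timedelta(days=5), 3, 8))
```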
Kubernetes Deployment with ECK
Elastic Cloud on Kubernetes (ECK) is the official operator for deploying and managing Elasticsearch on Kubernetes. It automates deployment, upgrades, scaling, and monitoring.
# Install ECK operator
kubectl create -f https://download.elastic.co/downloads/eck/2.12.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.12.0/operator.yaml

# Verify operator is running
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator

# Deploy Elasticsearch cluster (the target namespace must exist first)
kubectl create namespace elastic
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: production
  namespace: elastic
spec:
  version: 8.13.0
  nodeSets:
  - name: master
    count: 3
    config:
      node.roles: ["master"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: fast-ssd   # Use a StorageClass that exists in your cluster
  - name: data
    count: 3
    config:
      node.roles: ["data", "ingest"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: fast-ssd
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 2
            limits:
              memory: 8Gi
              cpu: 4
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
EOF

# Get cluster password
PASSWORD=$(kubectl get secret production-es-elastic-user -n elastic -o go-template='{{.data.elastic | base64decode}}')
echo "Elasticsearch password: $PASSWORD"

# Access via port-forward (run in a separate terminal)
kubectl port-forward -n elastic service/production-es-http 9200
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"

# Scale data nodes to 5
# A JSON patch changes only the data nodeSet (index 1); a merge patch on
# "nodeSets" would replace the whole list and drop the master nodeSet.
kubectl patch elasticsearch production -n elastic --type='json' \
  -p '[{"op": "replace", "path": "/spec/nodeSets/1/count", "value": 5}]'
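Hard-coding `/spec/nodeSets/1` breaks if the nodeSets are ever reordered. This sketch looks up the position by name and prepends a JSON Patch `test` operation (RFC 6902), so the patch fails rather than resizing the wrong nodeSet. The input is assumed to be the `spec.nodeSets` list from `kubectl get elasticsearch production -n elastic -o json`. `scale_nodeset_patch` is a hypothetical helper.

```python
import json


def scale_nodeset_patch(nodesets, name, count):
    """JSON Patch that sets one nodeSet's count, guarded by a name check."""
    for position, nodeset in enumerate(nodesets):
        if nodeset["name"] == name:
            base = f"/spec/nodeSets/{position}"
            return [
                {"op": "test", "path": f"{base}/name", "value": name},
                {"op": "replace", "path": f"{base}/count", "value": count},
            ]
    raise KeyError(f"no nodeSet named {name!r}")


if __name__ == "__main__":
    # Shape of spec.nodeSets from the manifest above (trimmed).
    nodesets = [{"name": "master", "count": 3}, {"name": "data", "count": 3}]
    print(json.dumps(scale_nodeset_patch(nodesets, "data", 5)))
```

The printed JSON can be passed to `kubectl patch elasticsearch production -n elastic --type='json' -p '...'`.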
Resources & Next Steps
Related tools:
- Kibana - Visualization and dashboards
- Logstash - Data pipeline and ingestion
- Beats - Lightweight data shippers
- APM - Application performance monitoring
- Fleet - Centralized management for Elastic Agents
Learning:
- Elastic Training - Official courses
- Elasticsearch: The Definitive Guide
- Elasticsearch in Action (Manning)
Next guides:
- ELK Stack: Complete log analytics pipeline
- Kibana: Building dashboards and visualizations
- Logstash: Data ingestion and transformation
- Elasticsearch performance tuning deep dive
GitHub: https://github.com/elastic/elasticsearch
Official site: https://www.elastic.co/elasticsearch
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/
Downloads: https://www.elastic.co/downloads/elasticsearch
Community: https://discuss.elastic.co/
Training: https://www.elastic.co/training