Elasticsearch Tutorial: Search Millions of Records in Milliseconds


⚡ Real-World Impact: A well-configured Elasticsearch cluster can search millions of documents in milliseconds. Companies like Netflix, Uber, and LinkedIn use it to power search across massive datasets, and you can have basic search working in under an hour.

What Is Elasticsearch and Why Should You Care?

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. Think of it as a specialized database optimized for one thing: finding stuff blazingly fast.

Here's what makes it special:

  • Full-text search: Find documents by content, not just exact matches
  • Near real-time: New documents become searchable within about one second of indexing (the default refresh interval)
  • RESTful API: Everything happens via HTTP requests
  • Scalable: Start with one server, scale to hundreds
  • Flexible: Works with structured and unstructured data

💡 Use Cases: Product catalogs, log analysis, autocomplete suggestions, user search, document management, monitoring dashboards, geospatial queries, and real-time analytics.

Core Concepts You Need to Know

Document

The basic unit of data. Think of it as a JSON object that represents a single record (a product, user, log entry, etc.)

Index

A collection of documents with similar characteristics. Similar to a database table, but more flexible.

Mapping

Defines how documents and their fields are stored and indexed. Like a schema, but flexible: Elasticsearch can add new fields automatically (dynamic mapping).

Query DSL

A JSON-based domain-specific language for building complex search queries.

Shard

A slice of an index: each shard is a self-contained Lucene index holding a subset of the documents. Elasticsearch divides indexes into shards for distribution and parallelization.
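How does Elasticsearch decide which shard a document lives on? Roughly with `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document's `_id`. The sketch below is a simplification: it assumes a toy string hash in place of Elasticsearch's real Murmur3-based hash, and it leaves out the internal routing factor the real formula uses. It still shows why you can't simply change the primary shard count in place: change the divisor and most documents would belong to a different shard.

```javascript
// Sketch of document-to-shard routing. ASSUMPTION: this toy 32-bit string
// hash stands in for Elasticsearch's real Murmur3-based routing hash.
function toyHash(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    h = (Math.imul(h, 31) + str.charCodeAt(i)) | 0; // stay within 32 bits
  }
  return h >>> 0; // reinterpret as a non-negative integer
}

// shard = hash(_routing) % number_of_primary_shards (_routing defaults to _id)
function pickShard(routingValue, numberOfPrimaryShards) {
  return toyHash(routingValue) % numberOfPrimaryShards;
}

console.log(pickShard('product-42', 1)); // 0: a one-shard index puts everything on shard 0
console.log(pickShard('product-42', 3)); // same id + same shard count -> always the same shard
```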

Getting Started: Your First Index

Let's build a simple product search. First, create an index and add some documents:

Create an Index

PUT /products
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "category": { "type": "keyword" },
      "in_stock": { "type": "boolean" },
      "created_at": { "type": "date" }
    }
  }
}

Add Documents

POST /products/_doc
{
  "name": "Wireless Bluetooth Headphones",
  "description": "High-quality noise cancelling headphones with 30hr battery",
  "price": 79.99,
  "category": "Electronics",
  "in_stock": true,
  "created_at": "2024-01-15"
}

POST /products/_doc
{
  "name": "Gaming Mouse RGB",
  "description": "Ergonomic gaming mouse with customizable RGB lighting",
  "price": 49.99,
  "category": "Electronics",
  "in_stock": true,
  "created_at": "2024-01-20"
}

💡 Field Types Matter: text is analyzed for full-text search, while keyword is for exact matching and aggregations. Choose wisely!
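To see the difference, here's a rough approximation of what each field type stores for the same input. `approxStandardAnalyzer` is a hypothetical simplification (lowercase, then split on anything that isn't a letter or digit); the real `standard` analyzer follows Unicode word-boundary rules, so treat this as an illustration. To inspect the real tokens, send your text to the `_analyze` API.

```javascript
// Rough approximation of the `standard` analyzer used by text fields.
// ASSUMPTION: the real analyzer uses Unicode text segmentation; this just
// lowercases and splits on runs of non-letter, non-digit characters.
function approxStandardAnalyzer(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter(Boolean); // drop empty strings produced at the edges
}

// A keyword field (with no normalizer) indexes the whole value as one term.
function keywordTerms(text) {
  return [text];
}

const description = 'High-quality noise cancelling headphones with 30hr battery';
console.log(approxStandardAnalyzer(description));
// ['high', 'quality', 'noise', 'cancelling', 'headphones', 'with', '30hr', 'battery']
console.log(keywordTerms('Electronics'));
// ['Electronics']
```

That's why a match for "headphones" finds the description, while the category field only matches the exact string "Electronics".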

Basic Search Queries

1. Match All (Get Everything)

GET /products/_search
{
  "query": {
    "match_all": {}
  }
}

2. Full-Text Search

GET /products/_search
{
  "query": {
    "match": {
      "description": "gaming mouse"
    }
  }
}

This finds documents where "gaming" OR "mouse" appear in the description. Elasticsearch ranks results by relevance score.
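If you want every word to be required instead, the match query accepts an `operator` parameter (`or` by default, or `and`). Here's a minimal sketch of a helper that builds either form of the request body; `buildMatchQuery` and its `requireAll` option are hypothetical names, not part of any client.

```javascript
// Hypothetical helper: builds the JSON body for a match query.
// requireAll = true switches from the default OR semantics to AND.
function buildMatchQuery(field, text, { requireAll = false } = {}) {
  return {
    query: {
      match: {
        [field]: {
          query: text,
          operator: requireAll ? 'and' : 'or'
        }
      }
    }
  };
}

// Only documents whose description contains both "gaming" and "mouse".
const body = buildMatchQuery('description', 'gaming mouse', { requireAll: true });
console.log(JSON.stringify(body));
// {"query":{"match":{"description":{"query":"gaming mouse","operator":"and"}}}}
```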

3. Multi-Field Search

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "bluetooth headphones",
      "fields": ["name^2", "description"]
    }
  }
}

The ^2 multiplies the score contribution of the name field by 2, so matches in name rank higher than matches in description.

4. Exact Match (Term Query)

GET /products/_search
{
  "query": {
    "term": {
      "category": "Electronics"
    }
  }
}

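⚠️ Important: Use term queries for exact values in keyword, numeric, boolean, and date fields, not in text fields. A term query doesn't analyze its input, so "Electronics" would never match the lowercased token "electronics" stored for a text field. For text fields, use match instead.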
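When a field needs both behaviors (full-text search on a product name plus exact matching or sorting on it), the usual approach is a multi-field: map it as text and add a keyword sub-field. This is a sketch of that mapping fragment and a term query against the sub-field, written as plain JavaScript objects; the sub-field name `keyword` and `ignore_above: 256` mirror what dynamic mapping creates for strings, and you'd define this when creating the index.

```javascript
// Mapping fragment: `name` is analyzed for full-text search, while the
// `name.keyword` sub-field keeps the original string for exact matches.
const mappings = {
  properties: {
    name: {
      type: 'text',
      fields: {
        keyword: { type: 'keyword', ignore_above: 256 } // longer strings aren't indexed here
      }
    }
  }
};

// term query against the un-analyzed sub-field: case and spacing must match exactly.
const exactNameQuery = {
  query: {
    term: { 'name.keyword': 'Gaming Mouse RGB' }
  }
};

console.log(JSON.stringify({ mappings }));
console.log(JSON.stringify(exactNameQuery));
```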

Advanced Queries You'll Actually Use

Bool Query (Combining Conditions)

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "wireless" } }
      ],
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "price": { "lte": 100 } } }
      ],
      "should": [
        { "term": { "category": "Electronics" } }
      ],
      "must_not": [
        { "match": { "name": "refurbished" } }
      ]
    }
  }
}

What's happening:

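  • must: Documents MUST match (affects score)
  • filter: Documents MUST match (doesn't affect score, faster, and cacheable)
  • should: Documents SHOULD match (boosts score if they do). Here that's optional because the query also has must and filter clauses; in a bool query with only should clauses, at least one of them has to match
  • must_not: Documents must NOT match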
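In application code you'll usually assemble this bool query from user input. Here's a minimal sketch of one way to do it: scoring conditions go in must/should, yes/no conditions go in filter, and empty clause lists are dropped. `buildProductQuery` and its option names are hypothetical; the field names come from the products mapping above.

```javascript
// Hypothetical helper: turns search-form input into a bool query body.
function buildProductQuery({ text, maxPrice, inStockOnly = false, preferCategory, exclude = [] }) {
  const bool = { must: [], filter: [], should: [], must_not: [] };

  if (text) bool.must.push({ match: { description: text } });          // scored
  if (inStockOnly) bool.filter.push({ term: { in_stock: true } });      // yes/no, not scored
  if (maxPrice !== undefined) bool.filter.push({ range: { price: { lte: maxPrice } } });
  if (preferCategory) bool.should.push({ term: { category: preferCategory } }); // score boost
  for (const word of exclude) bool.must_not.push({ match: { name: word } });

  // Drop empty clause arrays to keep the request readable.
  for (const key of Object.keys(bool)) {
    if (bool[key].length === 0) delete bool[key];
  }
  return { query: { bool } };
}

const body = buildProductQuery({
  text: 'wireless',
  maxPrice: 100,
  inStockOnly: true,
  preferCategory: 'Electronics',
  exclude: ['refurbished']
});
console.log(JSON.stringify(body, null, 2));
```

With no input at all, the helper returns an empty bool query, which matches every document.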

Range Queries

GET /products/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 20,
        "lte": 80
      }
    }
  }
}

Range operators: gt (greater than), gte (greater than or equal), lt (less than), and lte (less than or equal). They work with numbers, dates, and keyword strings (compared lexicographically).
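Range bounds often come from optional UI inputs such as a min/max price slider. The hypothetical `buildRange` helper below adds only the bounds that were actually provided, checking for `undefined` so a minimum of 0 isn't silently dropped. The second example uses Elasticsearch date math on `created_at`: `now-30d/d` means 30 days ago, rounded to the start of the day.

```javascript
// Hypothetical helper: builds a range clause from optional inclusive bounds.
function buildRange(field, { min, max } = {}) {
  const bounds = {};
  if (min !== undefined) bounds.gte = min; // `if (min)` would wrongly skip 0
  if (max !== undefined) bounds.lte = max;
  return { range: { [field]: bounds } };
}

console.log(JSON.stringify(buildRange('price', { min: 0, max: 80 })));
// {"range":{"price":{"gte":0,"lte":80}}}

// Date math: products created in the last 30 days (rounded to whole days).
console.log(JSON.stringify(buildRange('created_at', { min: 'now-30d/d' })));
// {"range":{"created_at":{"gte":"now-30d/d"}}}
```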

Fuzzy Search (Typo Tolerance)

GET /products/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "headphonez",
        "fuzziness": "AUTO"
      }
    }
  }
}

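This still matches "headphones" despite the typo. With AUTO, terms of up to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two. Because fuzzy is a term-level query, its value isn't analyzed, so keep it lowercase to match the lowercased tokens in a text field.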
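Here's a sketch of how AUTO fuzziness decides whether a term qualifies: an edit budget based on the query term's length, plus an edit distance that, like the fuzzy query's default (`transpositions: true`), counts swapping two adjacent characters as one edit. It models which terms match, not how Lucene finds them (Lucene walks the term dictionary with a Levenshtein automaton and also applies `prefix_length` and `max_expansions`). The function names are hypothetical.

```javascript
// fuzziness: "AUTO" with the default thresholds (3 and 6):
// 0-2 chars must match exactly, 3-5 chars allow 1 edit, 6+ chars allow 2.
function autoFuzziness(term) {
  if (term.length < 3) return 0;
  if (term.length < 6) return 1;
  return 2;
}

// Edit distance counting insertions, deletions, substitutions, and swaps
// of adjacent characters (restricted Damerau-Levenshtein).
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 0; i <= a.length; i++) d[i][0] = i;
  for (let j = 0; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent swap
      }
    }
  }
  return d[a.length][b.length];
}

// Does an indexed term fall within the AUTO edit budget of the query value?
function fuzzyMatches(queryValue, indexedTerm) {
  return editDistance(queryValue, indexedTerm) <= autoFuzziness(queryValue);
}

console.log(fuzzyMatches('headphonez', 'headphones')); // true (1 edit, 2 allowed)
console.log(fuzzyMatches('mosue', 'mouse'));           // true (1 swap, 1 allowed)
console.log(fuzzyMatches('mo', 'mouse'));              // false (short terms must be exact)
```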

Aggregations: Analytics on Steroids

Aggregations let you calculate metrics and group data. Think SQL GROUP BY, but more powerful.

Count Products by Category

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      }
    }
  }
}

Calculate Average Price

GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}

The stats aggregation gives you count, min, max, avg, and sum in one query.

Nested Aggregations

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price_per_category": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}

This groups by category and calculates average price for each category. Perfect for building dashboards!
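On the client side, each category comes back as a bucket with its `key`, its `doc_count`, and the sub-aggregation's `value`. A minimal sketch that flattens those buckets into rows for a chart or table; `categoryRows` is a hypothetical name, and the sample below is hand-written to match the response shape, not real output.

```javascript
// Turns the nested aggregation response into flat rows.
// `response` is the parsed search response body (what the v8 client returns).
function categoryRows(response) {
  return response.aggregations.categories.buckets.map(bucket => ({
    category: bucket.key,
    products: bucket.doc_count,
    avgPrice: bucket.avg_price_per_category.value // null if no doc has a price
  }));
}

// Hand-written sample shaped like an Elasticsearch response (not real output).
const sample = {
  aggregations: {
    categories: {
      buckets: [
        { key: 'Electronics', doc_count: 2, avg_price_per_category: { value: 64.99 } }
      ]
    }
  }
};
console.log(categoryRows(sample));
```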

Real-World Implementation in Node.js

Installation

npm install @elastic/elasticsearch

Complete Search API

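const { Client } = require('@elastic/elasticsearch');

// Create one client per process and reuse it; it manages its own connection pool.
// Elasticsearch 8.x enables security by default, so a real cluster usually
// also needs an https URL and `auth` (for example { apiKey: '...' }).
const client = new Client({
  node: 'http://localhost:9200'
});

// Search function, written for the v8+ client: request fields go at the top
// level and the response body is returned directly (no `{ body }` wrapper).
async function searchProducts(query, options = {}) {
  const size = options.size ?? 10;
  const page = options.page ?? 0; // zero-based page number

  try {
    const result = await client.search({
      index: 'products',
      query: {
        bool: {
          must: [
            {
              multi_match: {
                query: query,
                fields: ['name^2', 'description'],
                fuzziness: 'AUTO'
              }
            }
          ],
          filter: options.filters || []
        }
      },
      from: page * size,
      size: size,
      sort: options.sort || [{ _score: 'desc' }]
    });

    return {
      total: result.hits.total.value,
      results: result.hits.hits.map(hit => ({
        id: hit._id,
        score: hit._score,
        ...hit._source
      }))
    };

  } catch (error) {
    console.error('Search error:', error);
    throw error;
  }
}

// Usage example
async function main() {
  const results = await searchProducts('wireless headphones', {
    filters: [
      { range: { price: { lte: 100 } } },
      { term: { in_stock: true } }
    ],
    page: 0,
    size: 20
  });

  console.log(`Found ${results.total} products`);
  results.results.forEach(product => {
    console.log(`${product.name} - $${product.price}`);
  });
}

// Report failures instead of leaving an unhandled promise rejection.
main().catch(error => {
  console.error(error);
  process.exitCode = 1;
});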

🎯 Production Tips:

  • Always use pagination (from/size) for large result sets
  • Cache frequently used queries with Redis
  • Create one client instance and reuse it; the client manages its own connection pool
  • Monitor query performance with the search slow log

Performance Optimization Tips

1. Use Filters Instead of Queries When Possible

Filter clauses skip relevance scoring, and Elasticsearch automatically caches frequently used filters, so they are usually faster for yes/no conditions such as in_stock or a price range.

2. Limit the Number of Shards

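Elastic's sizing guidance is to keep shards roughly between 10GB and 50GB, so an index under 50GB usually needs only one primary shard. Every shard carries memory and coordination overhead, so too many small shards hurt performance.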

3. Use _source Filtering

Only return the fields you need to reduce network overhead.
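Here's a minimal sketch with the v8 JavaScript client, which accepts `_source` as a search parameter. `searchSummaries` is a hypothetical name; it takes the client as an argument so you can pass the `client` from the Node.js section (or a stub while testing).

```javascript
// Fetches only the fields a results list needs. `client` is assumed to be an
// @elastic/elasticsearch v8 Client (or any object with a compatible search()).
async function searchSummaries(client, text) {
  const res = await client.search({
    index: 'products',
    _source: ['name', 'price'], // description, created_at, etc. stay on the server
    query: { match: { name: text } },
    size: 10
  });
  return res.hits.hits.map(hit => hit._source);
}
```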

4. Bulk Indexing

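When indexing many documents, use the bulk API. Sending documents in batches avoids one HTTP round trip per document and is typically many times faster than indexing them one by one.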
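The bulk endpoint takes newline-delimited JSON: an action line such as {"index":{"_index":"products"}} followed by the document itself, with every line (including the last) ending in a newline. The hypothetical `toBulkBody` below builds that payload by hand just to show the format; in real code, the v8 client's `client.bulk({ operations })` or its `client.helpers.bulk` helper handle the serialization for you.

```javascript
// Builds a bulk API payload (NDJSON): one action line plus one source line
// per document, each terminated by "\n" (the final newline is required).
function toBulkBody(index, docs) {
  return docs
    .map(doc => JSON.stringify({ index: { _index: index } }) + '\n' + JSON.stringify(doc) + '\n')
    .join('');
}

const payload = toBulkBody('products', [
  { name: 'Wireless Bluetooth Headphones', price: 79.99 },
  { name: 'Gaming Mouse RGB', price: 49.99 }
]);
process.stdout.write(payload);
// {"index":{"_index":"products"}}
// {"name":"Wireless Bluetooth Headphones","price":79.99}
// {"index":{"_index":"products"}}
// {"name":"Gaming Mouse RGB","price":49.99}
```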

Common Pitfalls to Avoid

Pitfall #1: Using Wildcards at the Beginning

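Queries like *search* have to check every term in the field's term dictionary, which gets slow on large indexes. Index the field with an n-gram analyzer instead, so a partial match becomes an exact lookup on a fragment.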
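An n-gram analyzer moves the work to index time by splitting each word into overlapping fragments. The sketch below generates the fragments a 3-to-4 character ngram tokenizer would produce for one word (ordered here by length, then position, which may not match Elasticsearch's token order), and it also shows the trade-off: many more terms in the index. The settings fragment uses hypothetical tokenizer and analyzer names; min_gram and max_gram differ by 1, which stays within the default index.max_ngram_diff of 1.

```javascript
// All substrings of length minGram..maxGram of a single word.
function ngrams(word, minGram, maxGram) {
  const grams = [];
  for (let n = minGram; n <= maxGram; n++) {
    for (let i = 0; i + n <= word.length; i++) {
      grams.push(word.slice(i, i + n));
    }
  }
  return grams;
}

console.log(ngrams('mouse', 3, 4)); // [ 'mou', 'ous', 'use', 'mous', 'ouse' ]

// Index settings fragment for such an analyzer (names are our own).
const settings = {
  analysis: {
    tokenizer: {
      product_ngram: { type: 'ngram', min_gram: 3, max_gram: 4, token_chars: ['letter', 'digit'] }
    },
    analyzer: {
      product_ngram_analyzer: { type: 'custom', tokenizer: 'product_ngram', filter: ['lowercase'] }
    }
  }
};
```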

Pitfall #2: Not Setting Refresh Interval

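The default 1s refresh interval makes new documents searchable almost immediately, but each refresh costs indexing throughput. If your app can tolerate a delay, raise it (for example to 30s), and disable it (-1) during large bulk loads, restoring it afterwards.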
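A small wrapper keeps the "disable, load, restore" steps together, so a failed import can't leave refresh switched off. This sketch assumes the v8 client's `indices.putSettings` and `indices.refresh` calls; `withRefreshDisabled` and `loadFn` are hypothetical names, and restoring to '1s' assumes the index was using the default.

```javascript
// Disables refresh during a bulk load, then restores it even if the load fails.
// `client` is assumed to be an @elastic/elasticsearch v8 Client; `loadFn` is
// your own (hypothetical) async function that does the bulk indexing.
async function withRefreshDisabled(client, index, loadFn, restoreTo = '1s') {
  await client.indices.putSettings({ index, settings: { refresh_interval: '-1' } });
  try {
    return await loadFn();
  } finally {
    await client.indices.putSettings({ index, settings: { refresh_interval: restoreTo } });
    await client.indices.refresh({ index }); // make the new documents searchable now
  }
}
```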

Pitfall #3: Deep Pagination

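By default, from + size can't go past 10,000 hits (the index.max_result_window setting), and deep from/size pages get expensive anyway. For deeper pagination, use search_after, ideally with a point in time (PIT); Elastic no longer recommends the scroll API for deep pagination.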
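Here's a minimal search_after loop written as an async generator. Each request asks for the hits that sort after the last hit of the previous page, using the `sort` values Elasticsearch returns on every hit. The sketch assumes the v8 client's `search()` and a `sort` that ends in a field unique to each document (choosing that tiebreaker is up to you); for results that stay consistent while the index changes, Elastic recommends adding a point in time, which this sketch leaves out. `scanAll` is a hypothetical name.

```javascript
// Walks every matching hit page by page with search_after instead of from/size.
// ASSUMPTION: `sort` ends with a unique field, so no two hits share sort values.
async function* scanAll(client, { index, query, sort, pageSize = 500 }) {
  let searchAfter; // undefined for the first page
  while (true) {
    const res = await client.search({
      index,
      query,
      sort,
      size: pageSize,
      ...(searchAfter ? { search_after: searchAfter } : {})
    });
    const hits = res.hits.hits;
    for (const hit of hits) yield hit;
    if (hits.length < pageSize) return;         // short page: nothing left
    searchAfter = hits[hits.length - 1].sort;  // resume after the last hit
  }
}

// Usage (inside an async function):
// for await (const hit of scanAll(client, { index: 'products', query: { match_all: {} }, sort: [{ created_at: 'asc' }, { sku: 'asc' }] })) { ... }
// `sku` is a hypothetical unique keyword field, not part of the mapping above.
```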

Start Searching Faster Today

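Elasticsearch transforms how users find information in your application. Full-text searches that would force a relational database to scan every row (think LIKE '%headphones%') are answered from Elasticsearch's inverted index, often in milliseconds.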

Don't let slow search frustrate your users. Implement Elasticsearch and watch engagement soar.

Essential Resources

  • Elasticsearch Official Docs - Comprehensive documentation and guides
  • Elastic Stack - Learn Kibana, Logstash, and Beats integration
  • Elasticsearch JavaScript Client - Official Node.js library docs
  • Elastic Blog - Best practices and case studies
  • Elasticsearch Community Forum - Get help from experts

Questions about Elasticsearch? Drop them in the comments below!
