Think of it as a super-fast, scalable database designed for full-text search and complex queries.
It's built on top of Apache Lucene, a high-performance, Java-based search library created by Doug Cutting (the same person behind Hadoop).
It’s not a standalone server, but a library that provides:
Text analysis (tokenization, stemming, stop-word filtering)
Inverted indexes
Scoring and ranking of search results
Think of Lucene as the core search brain — Elasticsearch is the distributed system wrapper around it.
Elasticsearch = Lucene + REST API + Clustering + Sharding + Replication
Searching for text within documents, rather than matching exact field values.
Instead of SQL’s WHERE column = 'value', you search analyzed text — and with the right analysis chain, even conceptually similar text.
For example:
Query: "fast car"
Matches: "quick cars" (via stemming), and even "speedy vehicle" if a synonym filter is configured.
How it works:
Documents are analyzed → broken into tokens.
Lucene builds an inverted index (word → list of docs containing that word).
At query time, Elasticsearch finds matching documents and ranks them with a relevance model: BM25 by default (since version 5.0), or classic TF-IDF (term frequency–inverse document frequency).
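The inverted-index idea above can be sketched in a few lines of Python. This is a toy illustration, not Lucene's actual implementation — the ranking here is a naive term-frequency count, whereas real scoring uses BM25 with document-length normalization:

```python
from collections import defaultdict

docs = {
    1: "the quick car drives fast",
    2: "a fast car is a quick car",
    3: "slow trucks carry cargo",
}

# Build an inverted index: term -> set of doc IDs containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        index[token].add(doc_id)

def search(query):
    """Intersect posting lists, then rank by a naive term-frequency score."""
    terms = query.lower().split()
    postings = [index[t] for t in terms if t in index]
    if len(postings) < len(terms):
        return []  # some query term appears in no document
    candidates = set.intersection(*postings)
    scores = {d: sum(docs[d].split().count(t) for t in terms) for d in candidates}
    return sorted(scores, key=scores.get, reverse=True)

print(search("fast car"))  # doc 2 outranks doc 1 ("car" appears twice in it)
```

Notice that the query never scans the documents themselves — it only walks the posting lists, which is what makes inverted-index lookups fast.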
Search Engine: Full-text search on websites or apps (e.g., e-commerce, blogs)
Logging & Monitoring: Centralized log analytics with Kibana visualization
Metrics & Observability: Time-series data from servers or containers
Data Analytics: Aggregations and dashboards
Autocomplete & Suggesters: Real-time suggestions and recommendations
Beats: Lightweight data shippers (send logs/metrics)
Logstash: Ingest, filter, and transform data before indexing
Elasticsearch: Store and index data
Kibana: Visualization and dashboard layer
Cluster: A collection of one or more nodes (servers) that hold all data and provide search capabilities across indices.
Node: A single instance of Elasticsearch running on a machine. Each node stores data and participates in indexing/searching.
Nodes can serve different roles:
Master node → manages cluster state, shard allocation, and index creation.
Data node → stores data and executes queries/aggregations.
Coordinating node → routes client requests to the right nodes and merges results.
Shard: An index is divided into pieces called shards for horizontal scalability.
Each shard is a Lucene instance — a self-contained search engine.
Replica: A copy of a shard for high availability and load balancing.
Each index is split into primary shards → Sharding
Each primary shard can have replica shards (copies) → Replication
Elasticsearch automatically distributes shards across nodes → Scalability/Horizontal Scaling
If one node fails, data can still be served from replicas → High Availability / Fault Tolerance
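Under the hood, Elasticsearch routes each document to a primary shard by hashing its routing value (the document ID by default) modulo the number of primary shards — which is also why the primary shard count cannot be changed after index creation. A toy sketch of that routing rule (Elasticsearch actually uses a murmur3 hash, not the simple hash below):

```python
NUM_PRIMARY_SHARDS = 3  # fixed at index creation time

def route(doc_id, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a primary shard for a document (toy stand-in for ES's murmur3 routing)."""
    # A small deterministic string hash so results are stable across runs.
    h = 0
    for ch in doc_id:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h % num_shards

for doc_id in ["1", "2", "42"]:
    print(doc_id, "-> shard", route(doc_id))
```

Because the mapping is purely a function of (doc ID, shard count), any coordinating node can compute the target shard without a lookup — but changing the shard count would reshuffle every document, hence the restriction.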
Scalable (horizontal scaling via shards)
Real-time search and analytics
Schemaless JSON storage (flexible)
High memory and storage usage
Complex cluster management at scale
Updates/deletes are costly (immutable segments)
Not an OLTP database — better for search/analytics
Index: A collection of documents that share similar characteristics (e.g., logs, users, products) — similar to a table in RDBMS.
Document: The basic unit of information, stored as JSON. Like a row in an RDBMS.
Field: An attribute (a key-value pair) in a document. Like a column in an RDBMS.
Mapping: Defines the schema for documents within an index:
Field names
Data types (text, keyword, integer, date, etc.)
How each field is indexed or analyzed
Analyzer: Used when indexing and searching text fields.
Breaks down text into tokens and normalizes them (lowercasing, stemming, stop-word removal).
Improves the quality of full-text search.
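A minimal analyzer can be imitated in Python. This is only a sketch: real Elasticsearch analyzers are configurable chains of character filters, a tokenizer, and token filters, and use proper stemmers (e.g. Porter) rather than the crude suffix-stripping below:

```python
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "or"}

def naive_stem(token):
    # Crude suffix stripping; real analyzers use e.g. the Porter stemmer.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text):
    tokens = text.lower().split()                        # tokenize + lowercase
    tokens = [t.strip(".,!?") for t in tokens]           # strip punctuation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [naive_stem(t) for t in tokens]               # stemming

print(analyze("The user searched indexed documents."))
# -> ['user', 'search', 'index', 'document']
```

The same analyzer must run at both index time and query time — otherwise the query's tokens would never line up with the terms stored in the inverted index.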
Client sends data to Elasticsearch via REST API.
The coordinating node receives the request and routes it to the appropriate primary shard.
The shard indexes the document using Lucene (with analyzers and mappings).
The data is asynchronously replicated to replica shards on other nodes.
Search queries are broadcast to all shards (scatter), results are merged (gather) and returned.
When you index (store) a document:
ES assigns it to a primary shard.
The shard indexes the data using Lucene.
Any replica shards get updated asynchronously.
Example (REST API):
PUT /books/_doc/1
{
  "title": "Elasticsearch Deep Dive",
  "author": "Bob A",
  "year": 2025
}
When you search:
The query is broadcast to all shards (scatter phase).
Each shard returns its results (gather phase).
ES merges and ranks the results by relevance score.
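The scatter/gather flow above can be sketched in Python: each shard returns its local top hits, and the coordinating node merges them into a global ranking. This is a simplified illustration — real Elasticsearch also handles pagination (from/size) and fetches full documents in a second phase:

```python
import heapq

# Hypothetical per-shard results: lists of (score, doc_id) pairs.
shard_hits = [
    [(2.3, "doc-1"), (1.1, "doc-4")],  # shard 0's local top hits
    [(3.0, "doc-7"), (0.9, "doc-2")],  # shard 1's local top hits
    [(1.8, "doc-5")],                  # shard 2's local top hits
]

def gather(shard_results, size=3):
    """Merge per-shard hits and keep the global top-`size` docs by score."""
    merged = heapq.merge(
        *[sorted(hits, reverse=True) for hits in shard_results],
        reverse=True,
    )
    return [doc for _score, doc in list(merged)[:size]]

print(gather(shard_hits))  # global top-3: doc-7, doc-1, doc-5
```

Because each shard only ships its local top-k (not every match), the coordinating node's merge stays cheap even when the index holds millions of documents.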
Example (search API):
GET /books/_search
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}
Under the hood (Lucene):
Text is analyzed using an analyzer → tokenized, normalized, filtered.
A term index (inverted index) is created mapping terms → document IDs.
Queries use these inverted indexes for super fast lookups.
Query Type | Purpose
---------- | ----------------------------------------------------------
`match` | Full-text search
`term` | Exact value search
`range` | Numeric or date range
`bool` | Combine multiple conditions (`must`, `should`, `must_not`)
`aggs` | Aggregations (e.g., counts, averages, histograms)
Example:
GET /books/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "author": "Bob" }},
        { "range": { "year": { "gte": 2020 }}}
      ]
    }
  },
  "aggs": {
    "by_year": { "terms": { "field": "year" }}
  }
}
PUT = Create a resource with a specified ID, or replace it if it exists
The general pattern is: PUT /index_name/_doc/document_id
Example: PUT /books/_doc/1
POST = Create without specifying ID (auto-generated) or trigger an operation
POST /books/_doc/
GET = Retrieve a resource or perform a search
GET /books/_doc/1 or GET /books/_search
DELETE = Remove a resource (document, index, etc.)
DELETE /books/_doc/1
HEAD = Check if a resource exists (returns 200/404)
HEAD /books/_doc/1
Endpoint | Purpose | Example
-----------| ----------------------------------------------- | -----------------------
/_search | Search API | GET /books/_search
/_doc | Document API (create/read/update/delete) | GET /books/_doc/1
/_mapping | View or define index mappings | GET /books/_mapping
/_settings | View or update index settings | GET /books/_settings
/_cat | Human-readable info APIs (for debugging) | GET /_cat/indices?v
/_cluster | Cluster-wide info | GET /_cluster/health
/_nodes | Node-level info | GET /_nodes/stats
/_count | Count matching documents | GET /books/_count
/_bulk | Perform multiple index/update/delete operations | POST /_bulk
/_update | Update part of a document | POST /books/_update/1
User-level resources (data endpoints): /books/, /books/_doc/
System-level resources (cluster management): /_cluster/health, /_cat/indices