How Elasticsearch Architecture Boosts Search Performance

Eugene Paitoo
Mar 8, 2022

4 min read

In this article, we look at the basic architecture of Elasticsearch and how it boosts search performance, giving searches both speed and relevance.

What is Elasticsearch, and what are its use cases?

Elasticsearch is part of the Elastic Stack, commonly known as the ELK Stack (Elasticsearch, Logstash, and Kibana). It is the “heart” of the Elastic Stack, which stores, searches, and analyzes data.

According to the Elastic website, Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.

It provides a fast and relevant search experience in your app regardless of the amount of data. Its use cases range from application and website search to logging and log analytics. Many companies use Elasticsearch, notably Uber, Shopify, and Slack.

See this link for more companies using Elasticsearch.

How is a Search done in Elasticsearch?

When a client sends a search request, the request goes to the application server, which forwards it to Elasticsearch. Elasticsearch processes the query and returns the results to the server, which passes them back to the client. Elasticsearch also scores each result for relevance, so the most relevant matches come first.
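The flow above can be sketched in Python. This is a minimal sketch, not a production client: `handle_client_search`, `build_search_request`, `canned_transport`, and the `ES_URL` address are hypothetical names and values chosen for illustration, while the `_search` endpoint and the `match` query belong to Elasticsearch's REST API. The transport is passed in as a function so the sketch runs without a live cluster.

```python
import json
from typing import Callable, List, Tuple

# Assumption: a local Elasticsearch node on the default port.
ES_URL = "http://localhost:9200"

# Hypothetical transport type: takes (method, url, body), returns the response text.
Transport = Callable[[str, str, str], str]


def build_search_request(index: str, field: str, text: str) -> Tuple[str, str, str]:
    """Build the method, URL, and JSON body for a full-text match query."""
    body = {"query": {"match": {field: text}}}
    return "POST", f"{ES_URL}/{index}/_search", json.dumps(body)


def handle_client_search(index: str, field: str, text: str,
                         transport: Transport) -> List[Tuple[float, dict]]:
    """Server-side step: forward the client's query and unpack the hits."""
    method, url, body = build_search_request(index, field, text)
    response = json.loads(transport(method, url, body))
    # Each hit carries a relevance score (_score) and the stored document (_source).
    return [(hit["_score"], hit["_source"]) for hit in response["hits"]["hits"]]


def canned_transport(method: str, url: str, body: str) -> str:
    """Stand-in for a real HTTP call; returns a response shaped like a search result."""
    return json.dumps({"hits": {"hits": [
        {"_score": 1.2, "_source": {"name": "Fifi", "owner": "Eugene"}},
    ]}})


results = handle_client_search("pets", "name", "Fifi", canned_transport)
```

In a real server, `canned_transport` would be replaced by an HTTP call or by an official Elasticsearch client library; the shape of the request and of the returned hits stays the same.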

Elasticsearch Basic Architecture

As we look at the basic architecture of Elasticsearch, we will introduce some terms and concepts and explain each one as we go, building up the big picture.

Nodes

Each running instance of Elasticsearch is a node. Every node runs as part of a cluster, and a single cluster can contain multiple nodes.

A cluster is simply a collection of nodes. In Elasticsearch, each node has a unique ID and name, and it belongs to exactly one cluster.

The nodes in a cluster are typically distributed across separate machines. Nodes hold the data, which is stored in Elasticsearch as documents.

NODES.png

Documents

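A document is simply a JSON object stored under a unique ID. For example, fetching a document by its ID returns a response like this, with the document itself in the `_source` field: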

{
  "_index": "pets",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "_seq_no": 1,
  "_primary_term": 1,
  "found": true,
  "_source": {
      "name": "Fifi",
      "owner": "Eugene"
  }
}

Documents are grouped into an index. An index in Elasticsearch is a collection of documents that share similar traits. For example, documents about holiday destinations would be grouped under a “holiday destinations” index. Why are documents grouped into indexes? Grouping tells us exactly where to look for the data we want.
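The index and the document ID together address a document, and this shows up directly in the REST paths. The sketch below only builds the requests, so it runs without a cluster; the helper names are hypothetical, while the `/<index>/_doc/<id>` path form is the one Elasticsearch's document APIs use.

```python
import json
from typing import Optional, Tuple

Request = Tuple[str, str, Optional[str]]


def index_document_request(index: str, doc_id: str, source: dict) -> Request:
    """Request that stores `source` in `index` under the ID `doc_id`."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(source)


def get_document_request(index: str, doc_id: str) -> Request:
    """Request that fetches the document stored under `doc_id` in `index`."""
    return "GET", f"/{index}/_doc/{doc_id}", None


# Store the pet from the example above, then look it up by index and ID.
store = index_document_request("pets", "1", {"name": "Fifi", "owner": "Eugene"})
fetch = get_document_request("pets", "1")
```

Because the index name is part of the path, a lookup never has to consider documents in any other index.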

INDEXES.png

Shard

A shard in Elasticsearch is where the actual data is stored and where it is read from when you run a search. In other words, each shard lives on a node and keeps its data on that node's disk.

In Elasticsearch, an index can have multiple shards distributed across the nodes in a cluster. Note that the number of documents a shard can hold depends on the capacity of its node. Sharding lets us scale horizontally as the data in our apps grows, reducing the time it takes to store data and retrieve search results.
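To make the idea of spreading documents over shards concrete, here is a simplified sketch of hash-based routing: hash the document ID and take the remainder modulo the number of primary shards. It is an illustration only. `pick_shard` and `distribute` are hypothetical helpers, and `zlib.crc32` is a stand-in hash; Elasticsearch's real routing formula and hash function differ in detail.

```python
import zlib
from typing import Dict, Iterable, List


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number in [0, num_primary_shards)."""
    # crc32 is a stand-in for this sketch, not the hash Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids: Iterable[str], num_primary_shards: int) -> Dict[int, List[str]]:
    """Group document IDs by the shard each one is routed to."""
    shards: Dict[int, List[str]] = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


doc_ids = [str(n) for n in range(1, 21)]
layout = distribute(doc_ids, num_primary_shards=3)
```

The same ID always maps to the same shard, so a lookup by ID can go straight to the shard that holds the document.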

When data is distributed across shards on different nodes, a search is processed in parallel, which makes it very fast. As an idealized example, suppose a single node holds 500k documents and a search takes 10 seconds. If the same documents are spread across ten (10) nodes, each holding 50k documents, the search could take roughly 1 second, because every node searches only its own tenth of the data at the same time as the others. Real clusters add some overhead for coordinating the nodes and merging their results, so the speedup is rarely this exact.
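The parallel search described here follows a scatter-gather pattern: send the query to every shard, let each shard search its own documents, then merge the partial results. The sketch below is a toy model under stated assumptions: shards are plain lists of strings, the score is just a count of matching words (not Elasticsearch's relevance scoring), and `search_shard` and `scatter_gather` are hypothetical names. Python threads show the structure here rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Hit = Tuple[int, str]


def search_shard(shard: List[str], term: str) -> List[Hit]:
    """Search one shard; the toy score is how many times `term` occurs."""
    hits = []
    for doc in shard:
        score = doc.lower().split().count(term.lower())
        if score:
            hits.append((score, doc))
    return hits


def scatter_gather(shards: List[List[str]], term: str, top_k: int = 3) -> List[Hit]:
    """Query all shards concurrently, then merge and rank the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), shards))
    merged = [hit for part in partials for hit in part]
    # Highest score first; ties broken alphabetically for a stable order.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return merged[:top_k]


shards = [["red fox", "blue whale"], ["fox fox den"], ["green frog"]]
top_hits = scatter_gather(shards, "fox")
```

Each shard only scans its own slice of the data, which is why adding shards on more nodes shortens the slowest single scan.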

Replica Shards

The actual data lives on disk in a primary shard, but what if that primary shard goes down? For this, Elasticsearch has replica shards.

Replica shards are simply copies of a primary shard, stored on a different node from the primary. For example, node N-1 can hold primary shard P0 while node N-2 holds replica shard R0, a copy of the data in P0 that remains available if P0 goes down.

In Elasticsearch, searches run on replica shards as well as on primary shards.

This spreads the search load across more copies of the data, which improves search performance, and it provides high availability if a primary shard such as P0 goes down.
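The placement rule and the failover behaviour can be sketched as follows. This is a toy round-robin layout, not Elasticsearch's actual shard allocator; `place_shards` and `copies_available` are hypothetical helpers, and the sketch assumes one replica per primary.

```python
from typing import Dict, List, Set


def place_shards(nodes: List[str], num_primaries: int) -> Dict[str, str]:
    """Put primary Pi on one node and its replica Ri on the next node."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to keep a replica off its primary's node")
    layout = {}
    for i in range(num_primaries):
        layout[f"P{i}"] = nodes[i % len(nodes)]
        layout[f"R{i}"] = nodes[(i + 1) % len(nodes)]
    return layout


def copies_available(layout: Dict[str, str], shard_num: int,
                     failed_nodes: Set[str]) -> List[str]:
    """Return the copies of a shard that can still serve searches."""
    copies = (f"P{shard_num}", f"R{shard_num}")
    return [name for name in copies if layout[name] not in failed_nodes]


layout = place_shards(["N-1", "N-2"], num_primaries=2)
healthy = copies_available(layout, 0, failed_nodes=set())
after_failure = copies_available(layout, 0, failed_nodes={"N-1"})
```

With both nodes up, either P0 or R0 can answer a search; after N-1 fails, R0 on N-2 still holds the data.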

FULL-DIAGRAM.png

Conclusion

Elasticsearch not only provides faster search but also analyzes data to improve search relevance. Hopefully, this gives you an idea of how its architecture improves the performance of your search.

More content is on the way. You can also follow me on LinkedIn and GitHub.

Happy Coding!!!