From a business perspective, an effective search engine is a powerful tool that can increase the conversion rate and bring more profit to website owners. If your website’s search doesn’t return relevant results, or it is too slow, people will leave for a competitor.

So, what is an effective search engine? Let’s find out by comparing the most popular technologies: Solr vs Elasticsearch vs Sphinx!

The primary aim of the search is to retrieve the most relevant matches to the user’s queries, excluding other general content from the website.

Among the features that you can get from modern search engines, the most popular are:

  • full-text search (by simple words and phrases, or by multiple forms of a word or phrase)
  • multifield search
  • highlighting (a visual indication of the words entered in the search box)
  • search by synonyms
  • autocomplete suggestions
    [Image: Suggestions and highlighting on Bloomberg]
  • faceted search (counts of matching items per attribute; for example, eCommerce sites use facets to tell customers how many items of a specific model, size, color, and other attributes are found)
    [Image: Faceted search on Boohoo]
  • fuzzy search (typos, misspellings)
  • spelling corrections
  • geospatial search (finding an object’s location by its latitude and longitude)
    [Image: Geospatial search on TripAdvisor]
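
To make the faceted search item from the list above more concrete, here is a minimal Python sketch of how facet counts could be computed over a set of matching products. The product records and the `facet_counts` helper are invented for illustration; real engines compute facets inside the index rather than by looping over documents like this.

```python
from collections import Counter

# Hypothetical search results for the query "sneakers" (illustrative data only).
results = [
    {"model": "Runner", "size": 42, "color": "black"},
    {"model": "Runner", "size": 43, "color": "white"},
    {"model": "Street", "size": 42, "color": "black"},
    {"model": "Street", "size": 44, "color": "red"},
]


def facet_counts(docs, fields):
    """Count how many matching documents carry each value of each facet field."""
    return {
        field: Counter(doc[field] for doc in docs if field in doc)
        for field in fields
    }


facets = facet_counts(results, ["color", "size"])
for field, counts in facets.items():
    print(field, dict(counts))
```

The counts per value are what a shop would display next to each filter option, such as the number of black items or the number of items in size 42.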

The system should be able to narrow down the search by using ranges (price, dates, sizes, etc.), sorting (by popularity, date, price), and filtering (including only desirable parameters).
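
As a rough illustration of narrowing results by range, filter, and sort order, the sketch below applies all three to a small in-memory list. The catalogue entries, field names, and the `narrow` function are assumptions made up for this example; a real search engine would run the equivalent logic against its index.

```python
from datetime import date

# Hypothetical catalogue entries (illustrative data only).
products = [
    {"name": "Laptop A", "price": 950.0, "popularity": 120, "added": date(2020, 3, 1), "in_stock": True},
    {"name": "Laptop B", "price": 1400.0, "popularity": 300, "added": date(2020, 5, 12), "in_stock": True},
    {"name": "Laptop C", "price": 700.0, "popularity": 80, "added": date(2019, 11, 20), "in_stock": False},
    {"name": "Laptop D", "price": 1100.0, "popularity": 210, "added": date(2020, 1, 8), "in_stock": True},
]


def narrow(docs, min_price, max_price, only_in_stock=True,
           sort_key="popularity", descending=True):
    """Apply a price range, an availability filter, and a sort order."""
    selected = [
        d for d in docs
        if min_price <= d["price"] <= max_price
        and (d["in_stock"] or not only_in_stock)
    ]
    return sorted(selected, key=lambda d: d[sort_key], reverse=descending)


# Laptops between 800 and 1200, in stock, most popular first.
hits = narrow(products, min_price=800, max_price=1200)
print([d["name"] for d in hits])
```

Changing `sort_key` to `"price"` or `"added"` gives the other sort orders mentioned above without touching the filtering step.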

In web apps where information changes dynamically (prices, description details, availability of goods), near real-time updates are extremely important. For example, eCommerce sites and booking engines must show only the goods and services that are actually available.

Apart from the general features listed above, engines can recommend the most interesting products or information as users search, which improves the user experience.

Recommendations by Amazon

Elasticsearch vs Solr vs Sphinx: Which Technology to Choose?

There are about 20 search engines to choose from, but if you are looking for a reliable and efficient solution for your web application, we recommend one of the following three: Elasticsearch, Solr, or Sphinx. They were at the top of the rankings in 2020, and still are.

All three are open-source search solutions, well-supported by their communities of contributors. They can all boast high performance, scalability, and flexibility, though they all still have their peculiarities.

DB-engines ranking

We will not run pairwise comparisons such as Elasticsearch vs Solr or Sphinx vs Elasticsearch, as all three are decent competitors with almost equal performance, scalability, and features. Each of them, however, has specific peculiarities that can matter for your project. Now, let’s take a look at which option can be better for your business.


Elasticsearch, the absolute leader of the 2020 search engine ratings, lives up to its name by being truly “elastic”: it can work in any environment. It is an open-source technology built on the Apache Lucene library.

Many world-famous companies use Elasticsearch in their applications, including TripAdvisor, Shopify, Mozilla, Foursquare, Etsy, GitHub, SoundCloud, eBay, Yelp, and Netflix, among others.

Elasticsearch’s strengths

1. Near real-time indexing

Elasticsearch can index rapidly changing data almost instantly (in less than 1 second). It suits projects where the database is constantly updated.

For example, at Uber, Elasticsearch aggregates business metrics on dynamic (surge) pricing and supply positioning in real time. It can handle more than 1,000 queries per second at peak time.
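
“Near real-time” means that a newly indexed document becomes searchable after a short refresh rather than at the very moment it is written. The toy Python class below models that idea only: writes go into a buffer, and a `refresh()` call makes them visible to search. The class and its method names are hypothetical and greatly simplified; in Elasticsearch the refresh runs periodically on its own (about once per second by default).

```python
class ToyNearRealTimeIndex:
    """Toy model: writes are buffered and only become searchable after refresh()."""

    def __init__(self):
        self._buffer = []       # indexed but not yet searchable
        self._searchable = {}   # term -> set of document ids

    def index(self, doc_id, text):
        """Accept a document; it is not visible to search yet."""
        self._buffer.append((doc_id, text))

    def refresh(self):
        """Make all buffered documents visible to search."""
        for doc_id, text in self._buffer:
            for term in text.lower().split():
                self._searchable.setdefault(term, set()).add(doc_id)
        self._buffer.clear()

    def search(self, term):
        """Return the ids of searchable documents containing the term."""
        return sorted(self._searchable.get(term.lower(), set()))


idx = ToyNearRealTimeIndex()
idx.index(1, "Surge pricing downtown")
before = idx.search("surge")   # still empty: the document sits in the buffer
idx.refresh()                  # a real engine would do this on a timer
after = idx.search("surge")
print(before, after)
```

The short gap between `index()` and `refresh()` is the “near” in near real-time: updates are cheap to accept, and they show up in results about a second later.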

2. High scalability

As a database grows, it becomes harder to search. Elasticsearch scales along with your DB, so the search speed does not slow down.

Expedia, one of the biggest hotel and airline ticket aggregators, searches through up to 1TB of data a day at 300K events per second. With the help of Elasticsearch, they managed to improve their customers’ booking experience.

3. Storage

ES can be used not only as an indexer but also as data storage. Nevertheless, we would not recommend using it as your primary storage, and we still keep data in the main DB for better security and reliability, using ES only to index data and store logs.

For example, Florida.com, one of our clients and an application that aggregates all information about Florida resorts, supports a huge database of hotels, restaurants, events, attractions, sports, deals, etc. With Elasticsearch, the data stored in our DB is quickly indexed and becomes searchable by users instantly.


4. Visualization of data

This is one of today’s trendy features, and it is well implemented in ES. The Elastic Stack (the combination of ES, Logstash, and Kibana) makes a great tool for analytics. It allows for real-time monitoring of traffic on your application (total number of visitors, number of unique visitors, IP addresses, most popular queries, most requested pages, devices and browsers used, traffic logs by the time of day, and much more).

This information is visualized in colorful charts, maps, and tables in the dashboard. It is very helpful for working with distributed teams, as everyone can see up-to-date information at once and then use this data to get a better understanding of your audience and improve the content and UX of your product.

With the help of ES, The Guardian got a powerful analytics system that is able to process 40 million documents per day to create a vision of how content is consumed.

At Netflix, with 8 million events and 24GB per second during peak hours, ES is used for real-time analytics of events like video viewing activities, UI activities, error logs, performance, diagnostic events, etc.


5. Security analytics

Elastic Stack is also a great security analytics tool. Near real-time log analytics and visualization allow you to identify security threats (problems with a web server, broken links, attempts at unauthorized access, attack locations, etc.). You can learn more from this official Elastic.co video. By migrating to ES, Dell increased its security by ensuring that only authorized people could access its cluster. Dell also reduced the number of its servers by 25-30%.

6. Machine Learning

Elasticsearch can benefit from Machine Learning features provided by the X-Pack commercial plugin. Machine Learning algorithms are focused on anomaly detection and outlier detection in time series data.

7. Amazon Elasticsearch Service

Amazon Elasticsearch Service makes setup quick and easy: it operates and scales Elasticsearch in the cloud, so you do not have to configure your own servers.

Elasticsearch’s weaknesses

Though Elastic is currently #1, it is still a young technology. Not all desired features come out of the box, and many have to be added through various extensions. For example, ES does not ship a ready-made “Did You Mean?” feature; it has to be assembled from its suggesters.


Solr is another search engine based on Apache Lucene, so it shares many features with Elasticsearch. The two differ in architecture, however.

Among the companies that use Solr are Cnet, CitySearch, Bloomberg, Magento, Zappos, AOL, eTrade, Disney, Apple, NASA, MTV, and others.

Solr’s strengths

1. Faceted search

Solr has awesome faceted search capabilities, which makes this solution perfect for eCommerce websites like Zappos that use Solr for search and navigation across 150,000 styles of shoes and other products.

Image by Zappos

2. A rich set of features

Solr boasts rich full-text search features out of the box, and they are highly configurable (even more so than in Elasticsearch). Solr supports various suggester implementations, highlighting (a visual indication of the words entered in the field), and spell checkers / “Did you mean?” (which ES does not offer as a ready-made feature).

At Greenice, we dealt with Solr while working on a project for an Australian client. Their website is meant for the exchange of experiences among small business entrepreneurs. The search features include highlighting, suggestions, and sorting.

[Image by SavvySME]

3. Rich content docs

Solr is one of the few search engines that can read rich content documents, including PDF, Word, XML, or plain text.

This is a perfect fit for projects that need to search through a large number of PDF or Word files within a website (contracts, resumes, learning materials, ebooks, etc.).

4. Data visualization

Banana is a visualization tool (a fork of Kibana) that works with Solr and allows admins to monitor events and logs in the dashboard on the fly.

For example, in banking, managers will be able to retrieve information about failed transactions and find the reason for each issue almost “on the fly”, greatly reducing manual work, including manual searches through logs.

5. Machine Learning

Solr, in cooperation with Bloomberg, implemented Machine Learning in the form of a Learning-to-Rank plug-in, which re-ranks documents according to the score from a more complex query. The aim is to give subscribers even better instant search results for the most relevant companies, people, and news.
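
The re-ranking idea can be sketched in a few lines of Python: a cheap score produces the initial order, and a costlier scoring function reorders only the top few hits. Everything below (the `rerank` function, the feature names, and the weights in `model_score`) is made up for illustration and is not the Solr plug-in’s API or Bloomberg’s model.

```python
def rerank(docs, cheap_score, model_score, top_n):
    """Rank by a cheap score, then reorder only the top_n hits with a costlier model."""
    first_pass = sorted(docs, key=cheap_score, reverse=True)
    head = sorted(first_pass[:top_n], key=model_score, reverse=True)
    return head + first_pass[top_n:]


# Hypothetical documents with made-up feature values.
docs = [
    {"id": "a", "text_match": 0.9, "clicks": 5},
    {"id": "b", "text_match": 0.8, "clicks": 90},
    {"id": "c", "text_match": 0.7, "clicks": 40},
    {"id": "d", "text_match": 0.2, "clicks": 99},
]


def cheap_score(doc):
    """First pass: plain text relevance."""
    return doc["text_match"]


def model_score(doc):
    """Second pass: a stand-in for a trained model that mixes several features."""
    return 0.5 * doc["text_match"] + 0.5 * doc["clicks"] / 100


ranked = rerank(docs, cheap_score, model_score, top_n=3)
print([d["id"] for d in ranked])
```

Document `d` stays last even though the second scorer would rate it highly, because only the top three first-pass hits are re-scored. Limiting the expensive model to a small window is what keeps this approach fast.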

Solr’s weaknesses

Solr is not as quick as Elasticsearch and works best for static data (data that does not change frequently). The reason lies in caching. In Solr, the caches are global, which means that even the slightest change to the index requires the whole cache to be refreshed. This is usually a time-consuming process. In Elasticsearch, on the other hand, caches are refreshed segment by segment.


Sphinx was ranked only 5th among search engines in 2018, having given way to Elasticsearch and Solr, though it is still a powerful and popular technology.

Sphinx is used in such famous systems as Joomla.org, CouchSurfing.org, Wikimapia.org, Tumblr.com, and hundreds of other apps.

Sphinx’s strengths

1. Powerful and fast

Sphinx has evolved over recent years and can now provide near real-time search. It handles over 500 queries/sec against 1,000,000 documents, and the largest reported index holds 25+ billion documents.

Craigslist, with the help of Sphinx, serves over 300 million queries per day. It has more than 50 billion page views per month. Infegy uses Sphinx to index 22+ billion Twitter, Facebook, and assorted blog posts to serve insightful social media monitoring and analytic queries.  

2. Faceted search

Sphinx has well-developed faceted search capabilities.

Youku Tudou, China's biggest video site, uses Sphinx for the faceted search for content delivered to over 400 million users per month, with peak volumes of 15,000 queries per second.

At Greenice, we used Sphinx for an eCommerce computer hardware store. We implemented the faceted search on attributes like brand, type, purpose, screen resolution, matrix, diagonal, HDD capacity, SSD capacity, etc.

3. Nothing useless

If you need general search functions and do not need any additional features like data visualization and analysis, use Sphinx. It is quite fast and powerful for indexing and querying huge volumes of documents using limited computing resources, unlike Elasticsearch which consumes a lot of memory.

One of the examples is Boardreader, where Sphinx indexes up to 16 billion documents across 37 machines.

Sphinx’s weaknesses

Sphinx is good for structured data (predefined text fields and non-text attributes), but it is not the best choice for projects that deal with unstructured data (DOCs, PDFs, MP3s, etc.), as configuring it for them takes developers a lot of time and effort. This, together with other configuration difficulties, makes Sphinx less comfortable to use than its competitors.

Open source search comparison

Here is a brief comparison of Elasticsearch vs. Solr vs. Sphinx:

Types of search features
  • Elasticsearch: full-text, autocomplete suggestions, faceted, multifield, synonyms, fuzzy, geospatial
  • Solr: full-text, autocomplete suggestions, faceted, multifield, synonyms, fuzzy, highlighting, geospatial, spell checker
  • Sphinx: full-text, autocomplete suggestions, faceted, multifield, synonyms (called wordforms), geospatial, highlighting (called snippets), spell checker (called qsuggest)

Real-time indexing
  • Elasticsearch: Yes
  • Solr: Yes
  • Sphinx: Yes

Performance
  • Elasticsearch: High
  • Solr: High
  • Sphinx: High

Scalability
  • Elasticsearch: High
  • Solr: High
  • Sphinx: High

Data schema
  • Elasticsearch: Schema-free (documents can be indexed without explicitly providing a schema)
  • Solr: Dynamic (to confirm appropriate indexing and type semantics, defining a schema is recommended)
  • Sphinx: Fixed schema (a set of predefined attribute columns)

Can be used as storage
  • Elasticsearch: Yes
  • Solr: Yes
  • Sphinx: No

Visualization of data
  • Elasticsearch: Allowed by the Elastic Stack (ES, Kibana, and Logstash)
  • Solr: Allowed by the Banana plugin
  • Sphinx: No

Machine Learning
  • Elasticsearch: Yes
  • Solr: Yes
  • Sphinx: No
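
The schema row of the comparison above can be illustrated with a small Python sketch that contrasts the two extremes: deriving field types from whatever document arrives, versus checking every document against predefined columns. The `infer_mapping` and `validate` helpers and the hotel record are assumptions invented for this example; they only mimic the general idea and are not how any of the three engines is implemented.

```python
def infer_mapping(doc):
    """Schema-free style: derive field types from the document itself."""
    return {field: type(value).__name__ for field, value in doc.items()}


def validate(doc, schema):
    """Fixed-schema style: reject documents that do not match predefined columns."""
    if set(doc) != set(schema):
        raise ValueError(f"unexpected fields: {sorted(set(doc) ^ set(schema))}")
    for field, expected in schema.items():
        if not isinstance(doc[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    return doc


# Hypothetical record (illustrative data only).
hotel = {"name": "Palm Resort", "stars": 4, "price": 129.5}

# Schema-free: no upfront definition, types are worked out on arrival.
mapping = infer_mapping(hotel)
print(mapping)

# Fixed schema: the columns must be declared before any document is accepted.
fixed_schema = {"name": str, "stars": int, "price": float}
validate(hotel, fixed_schema)
```

The trade-off is convenience versus control: inferred types let you start indexing immediately, while declared columns catch malformed documents before they reach the index.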

How to apply this to your business

When it takes a while to retrieve the results for a query on your website, the user experience suffers. Equipping your application with a powerful search engine can dramatically improve its search performance.

Contemporary search engines provide sophisticated features like suggestions, full-text, faceted, fuzzy search, etc. for even more accurate and relevant results.

As you can see, the differences between Elasticsearch, Solr, and Sphinx are minimal. They all fulfill their main purpose: providing an effective and fast search. Having tested many search engines on different projects, we now mostly use Elasticsearch, as it has proven to be the best fit for most of them. It is fast, flexible, and easy to work with. Besides speedy and relevant search, it can serve as a means of storage by itself. It is convenient for searching logs to quickly identify problems with applications, and it provides effective real-time visualization of everything that is going on in your web application.

If you already have a project on Solr or Sphinx, it may make no sense to transfer it to Elasticsearch. Regardless, it is better to rely on your developers who have the best experience and feel more comfortable with one of the search engines. As all projects are individual, we carefully analyze each request to come up with the most appropriate solution for your task.


If you have any questions on search engines, drop us a message for a FREE consultation!

Authors

Anna Klimenko

Anna Klimenko is a market researcher and author at Greenice with a deep technical writing background. She scrupulously investigates narrow business niches and creates insightful articles for entrepreneurs who are trying to start online businesses.

Max Lapko

Max is a CTO at Greenice. He supervises the most complex projects at our agency, ensuring top quality work and optimal productivity of his team. He is also a go-to person when anyone in the team faces a challenging technical problem.
