From a business perspective, you should regard an effective search engine as a powerful tool that can increase the conversion rate and bring more profit to website owners. If your website's search doesn’t return relevant results or is too slow, people will leave the website and go to a competitor.
So, what is an effective search engine?
The primary aim of the search is to retrieve the most relevant matches to the user’s queries, excluding other general content from the website.
Among the features that you can get from modern search engines, the most popular are:
- full-text search (by simple words and phrases or multiple forms of a word or phrase)
- multifield search
- highlighting (a visual indication of the words entered in the search box)
- search by synonyms
- autocomplete suggestions
[Suggestions and highlighting on Bloomberg]
- faceted search (counts of items per attribute value; for example, eCommerce sites use facets to tell customers how many items of a specific model, size, color, and other attributes are found)
[Faceted search on Boohoo]
- fuzzy search (typos, misspellings)
- spelling corrections
- geospatial search (finding objects by location, based on their latitude and longitude)
[Geospatial search on TripAdvisor]
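Geospatial search can be illustrated with a small example: “places within N km of a point” comes down to computing a distance from latitude and longitude. The minimal Python sketch below uses the haversine (great-circle) formula. The `Place` type and the approximate city coordinates are illustrative assumptions. Real search engines use dedicated geo indexes instead of scanning every record the way this sketch does.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

@dataclass
class Place:  # hypothetical record type for this sketch
    name: str
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(places, lat, lon, radius_km):
    """Return (place, distance) pairs inside the radius, nearest first."""
    hits = [(p, haversine_km(lat, lon, p.lat, p.lon)) for p in places]
    hits = [(p, d) for p, d in hits if d <= radius_km]
    return sorted(hits, key=lambda pd: pd[1])

# Approximate city-centre coordinates, used only as sample data.
places = [
    Place("Miami", 25.7617, -80.1918),
    Place("Fort Lauderdale", 26.1224, -80.1373),
    Place("Orlando", 28.5383, -81.3792),
]
for place, dist in within_radius(places, 25.7617, -80.1918, radius_km=50):
    print(f"{place.name}: {dist:.1f} km")
```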
The system should be able to narrow down the search by using ranges (price, dates, sizes, etc.), sorting (by popularity, date, price), and filtering (including only desirable parameters).
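Each of these three operations does a different job. A search engine runs them against its index as range queries, filters, and sort clauses. The in-memory sketch below only illustrates what each step does. The `Product` fields and the catalog are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Product:  # hypothetical catalog item
    name: str
    price: float
    color: str
    popularity: int  # e.g. number of purchases

def narrow_down(products, min_price=None, max_price=None, colors=None,
                sort_by="popularity", descending=True):
    result = []
    for p in products:
        # Range: keep only prices inside [min_price, max_price].
        if min_price is not None and p.price < min_price:
            continue
        if max_price is not None and p.price > max_price:
            continue
        # Filter: keep only the desired attribute values.
        if colors is not None and p.color not in colors:
            continue
        result.append(p)
    # Sort: order the remaining items by the chosen field.
    return sorted(result, key=lambda p: getattr(p, sort_by), reverse=descending)

catalog = [
    Product("Runner", 79.0, "black", 120),
    Product("Trail", 110.0, "blue", 300),
    Product("Classic", 55.0, "black", 450),
    Product("Court", 95.0, "white", 80),
]
for p in narrow_down(catalog, min_price=50, max_price=100, colors={"black", "white"}):
    print(p.name, p.price)
```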
For web apps where information changes dynamically (prices, description details, availability of goods), near real-time updates are extremely important. eCommerce sites and booking engines, for example, must show only the goods and services that are actually in stock.
Apart from the general features listed above, some engines can also recommend the most relevant products or information to improve the user experience.
[Recommendations by Amazon]
Which Technology to Choose?
There are about 20 search engines to choose from. If you are looking for a reliable and efficient solution for your web application, we recommend one of the three that are at the top for 2020: Elasticsearch, Solr, or Sphinx.
All three are open-source search solutions, well-supported by their communities of contributors. They can all boast high performance, scalability, and flexibility, though they all still have their peculiarities.
We will not declare a single winner between Elasticsearch, Solr, and Sphinx. They are all decent competitors with almost equal performance, scalability, and features. However, each has specific peculiarities that can matter for your project. Now, let’s take a look at which option may be better for your business.
Elasticsearch, the absolute leader in the 2020 search engine rankings, lives up to its name: it is truly “elastic” and can work in any environment. It is an open-source technology built on the Apache Lucene library.
Many well-known companies use Elasticsearch for their applications, including TripAdvisor, Shopify, Mozilla, Foursquare, Etsy, GitHub, SoundCloud, eBay, Yelp, and Netflix.
1. Near real-time indexing
Elasticsearch can index rapidly changing data almost instantly (in less than 1 second). This makes it a good fit for projects where the database is constantly being updated.
At Uber, for example, Elasticsearch aggregates business metrics on dynamic (surge) pricing and supply positioning in real time. It can handle more than 1,000 queries per second at peak time.
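“Near” real-time means there is a short gap between indexing a document and being able to find it. Elasticsearch makes newly indexed documents searchable on each periodic refresh, which by default runs about once per second (the `index.refresh_interval` setting). The toy class below models only that visibility gap. Its names are hypothetical, and it does not reflect how Lucene segments actually work.

```python
class NearRealTimeIndex:
    """Toy model: newly indexed documents become searchable only after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search

    def index(self, doc_id: str, text: str) -> None:
        self._buffer.append((doc_id, text))

    def refresh(self) -> None:
        # Make everything buffered so far visible to searches at once.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term: str) -> list:
        term = term.lower()
        return [doc_id for doc_id, text in self._searchable if term in text.lower().split()]

idx = NearRealTimeIndex()
idx.index("p1", "Surge pricing zone downtown")
print(idx.search("surge"))  # [] (not refreshed yet)
idx.refresh()               # a real engine does this on a timer
print(idx.search("surge"))  # ['p1']
```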
2. High scalability
As a database grows, it becomes harder to search. Elasticsearch scales out along with your DB by distributing data across multiple nodes, so search stays fast as your data grows.
Expedia, one of the biggest hotel and airline ticket aggregators, handles up to 1 TB of data a day, with 300K events per second. With the help of Elasticsearch, it managed to improve its customers’ booking experience.
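Elasticsearch scales out by splitting an index into shards spread across nodes. By default, a document is routed to a shard by hashing its ID and taking the result modulo the number of primary shards. A search is sent to every shard, and the results are merged. The single-process sketch below mimics that idea. `ShardedIndex` is hypothetical, and `zlib.crc32` stands in for the different hash function Elasticsearch actually uses.

```python
import zlib
from collections import defaultdict

class ShardedIndex:
    """Toy model of horizontal scaling: documents are spread over N shards."""

    def __init__(self, num_shards: int):
        # Each shard is its own small inverted index: term -> set of doc ids.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def _route(self, doc_id: str) -> int:
        # Stable hash of the document id, modulo the number of shards.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id: str, text: str) -> None:
        shard = self.shards[self._route(doc_id)]
        for term in text.lower().split():
            shard[term].add(doc_id)

    def search(self, term: str) -> set:
        # Scatter the query to every shard, then gather and merge the hits.
        hits = set()
        for shard in self.shards:
            hits |= shard.get(term.lower(), set())
        return hits

index = ShardedIndex(num_shards=3)
for i in range(10):
    index.index(f"hotel-{i}", "hotel with pool" if i % 2 else "hotel near beach")
print(sorted(index.search("pool")))
```

Adding shards (and nodes to hold them) spreads both the data and the per-query work. In this model, that property is the part that keeps search fast as the data grows.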
3. Data storage
ES can be used not only as an indexer but also as data storage. Nevertheless, we do not recommend using it as your primary storage. We keep data in the main DB for better security and reliability and use ES only to index data and store logs.
For example, one of our clients, Florida.com, is an application that aggregates all information about Florida resorts. It supports a huge database of hotels, restaurants, events, attractions, sports, deals, etc. With Elasticsearch, the data stored in our DB is quickly indexed and becomes searchable by users almost instantly.
4. Visualization of data
Data visualization is one of today’s most popular features, and ES implements it very well. The Elastic Stack (the combination of ES, Logstash, and Kibana) makes a great analytics tool. It enables real-time monitoring of traffic on your application: total number of visitors, number of unique visitors, IP addresses, most popular queries, most requested pages, devices and browsers used, traffic logs by time of day, and much more.
This information is visualized in colorful charts, maps, and tables on the dashboard. This is very helpful for distributed teams, as everyone can see up-to-date information at once. They can then use this data to better understand your audience and improve the content and UX of your product.
With the help of ES, The Guardian built a powerful analytics system that processes 40 million documents per day to show how its content is consumed.
At Netflix, with 8 million events and 24GB per second during peak hours, ES is used for real-time analytics of events like video viewing activities, UI activities, error logs, performance, diagnostic events, etc.
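Kibana charts are built on Elasticsearch aggregations computed over indexed log events. The plain-Python sketch below computes a few of the metrics listed above (total requests, unique visitors by IP, most requested pages, most popular queries). It uses a hypothetical `LogRecord` type, is meant only to show what such a dashboard aggregates, and does not use the Elasticsearch API.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class LogRecord:  # hypothetical, already-parsed access-log entry
    ip: str
    path: str
    query: str  # empty string when the request was not a search

def dashboard_metrics(records, top_n=3):
    return {
        "total": len(records),
        "unique_visitors": len({r.ip for r in records}),
        # most_common() lists ties in the order first encountered.
        "top_pages": Counter(r.path for r in records).most_common(top_n),
        "top_queries": Counter(r.query for r in records if r.query).most_common(top_n),
    }

logs = [
    LogRecord("10.0.0.1", "/search", "hotels miami"),
    LogRecord("10.0.0.2", "/search", "hotels miami"),
    LogRecord("10.0.0.1", "/deals", ""),
    LogRecord("10.0.0.3", "/search", "golf"),
]
print(dashboard_metrics(logs))
```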
5. Security analytics
The Elastic Stack is also a great security analytics tool. Near real-time log analytics and visualization allow you to identify security threats (problems with a web server, broken links, unauthorized access attempts, attack locations, etc.). You can learn more from this official Elastic.co video.
By migrating to ES, Dell improved its security by ensuring that only authorized people could access its cluster. Dell also reduced its number of servers by 25-30%.
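One common check of this kind is spotting brute-force login attempts in authentication logs. The minimal sketch below assumes log events have already been parsed into hypothetical `(timestamp, ip, success)` tuples. The threshold and window are arbitrary example values. In the Elastic Stack, such a rule would be expressed as a query or alert over indexed logs rather than as a Python loop.

```python
from collections import defaultdict, deque

def suspicious_ips(events, max_failures=5, window_seconds=60):
    """Flag IPs with more than max_failures failed logins inside a sliding window.

    events: iterable of (timestamp_seconds, ip, success) tuples, sorted by time.
    """
    recent = defaultdict(deque)  # ip -> timestamps of its recent failures
    flagged = set()
    for ts, ip, success in events:
        if success:
            continue
        window = recent[ip]
        window.append(ts)
        # Drop failures that fell out of the sliding window.
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        if len(window) > max_failures:
            flagged.add(ip)
    return flagged

events = [(t, "203.0.113.9", False) for t in range(0, 60, 10)]  # 6 fast failures
events += [(0, "198.51.100.7", False), (100, "198.51.100.7", False), (101, "198.51.100.7", True)]
events.sort()
print(suspicious_ips(events))
```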
6. Machine Learning
Elasticsearch can benefit from Machine Learning features provided by the X-Pack commercial plugin. Machine Learning algorithms are focused on anomaly detection and outlier detection in time series data.
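Elastic’s Machine Learning jobs build statistical models of the data and are far more sophisticated than anything that fits here. As a much simpler stand-in that only illustrates what “anomaly detection in time series” means, the sketch below flags points that deviate from the preceding window’s mean by more than a chosen number of standard deviations (a rolling z-score). The window size, the threshold, and the traffic numbers are arbitrary assumptions.

```python
import statistics

def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # A perfectly flat history: any change at all is unusual.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

requests_per_minute = [100, 102, 98, 101, 99, 100, 103, 97, 450, 101, 100]
print(zscore_anomalies(requests_per_minute))
```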
7. Amazon Elasticsearch Service
Amazon Elasticsearch Service makes it quick and easy to set up, operate, and scale Elasticsearch in the cloud without configuring your own servers.
Though Elasticsearch is currently #1, it is still a young technology. Not all desired features come out of the box, and many have to be added through various extensions. For example, ES does not offer a ready-made “Did You Mean?” feature; it has to be built on top of its suggester APIs.
Solr is another search engine built on Apache Lucene, so it shares many features with Elasticsearch, although the two differ in architecture.
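A “Did You Mean?” feature boils down to finding known terms within a small edit distance of the user’s possibly misspelled input. The sketch below shows that idea with the Levenshtein distance over a tiny hypothetical vocabulary. Production spellcheckers work on the terms in the index and typically also weigh term frequency, which this sketch skips.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]

def did_you_mean(word, vocabulary, max_distance=2):
    """Suggest the closest known term, or None if the word is known or nothing is close."""
    word = word.lower()
    if word in vocabulary:
        return None
    best = min(vocabulary, key=lambda term: levenshtein(word, term))
    return best if levenshtein(word, best) <= max_distance else None

vocabulary = ["sneakers", "sandals", "boots", "slippers"]  # hypothetical indexed terms
print(did_you_mean("Snikers", vocabulary))
```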
Among the companies that use Solr are Cnet, CitySearch, Bloomberg, Magento, Zappos, AOL, eTrade, Disney, Apple, NASA, MTV, and others.
1. Faceted search
Solr has awesome faceted search capabilities, which make it perfect for eCommerce websites. Zappos, for example, uses Solr for search and navigation across 150,000 styles of shoes and other products.
[Image by Zappos]
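Under the hood, a facet is a count of matching documents per attribute value, computed over the current result set. Solr computes these counts from the index (facet fields), and Elasticsearch does the same with terms aggregations. The Python sketch below uses hypothetical field names and shows only the counting itself.

```python
from collections import Counter, defaultdict

def facet_counts(items, facet_fields):
    """Count how many matching items have each value of each facet field."""
    counts = defaultdict(Counter)
    for item in items:
        for field in facet_fields:
            if field in item:
                counts[field][item[field]] += 1
    return {field: dict(counts[field]) for field in facet_fields}

results = [  # items that already matched the user's query
    {"style": "sneaker", "size": 42, "color": "black"},
    {"style": "sneaker", "size": 43, "color": "white"},
    {"style": "boot", "size": 42, "color": "black"},
]
print(facet_counts(results, ["size", "color"]))
# {'size': {42: 2, 43: 1}, 'color': {'black': 2, 'white': 1}}
```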
2. Rich set of features
Solr offers rich full-text search features out of the box, and they are highly configurable (even more so than in Elasticsearch). Solr supports various suggester implementations, highlighting (a visual indication of the words entered in the search field), and a spell checker / “Did you mean?” component that works out of the box.
At Greenice, we worked with Solr on a project for an Australian client whose website lets small business entrepreneurs exchange experiences. The search features include highlighting, suggestions, and sorting.
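Highlighting means returning the matched text with the query terms marked up so the UI can style them. Solr’s highlighter does this on the server. The self-contained sketch below shows the basic idea: whole-word, case-insensitive marking, with HTML escaping of the surrounding text. The `highlight` function and its `<em>` default are illustrative choices, not Solr’s API.

```python
import html
import re

def highlight(text: str, query: str, tag: str = "em") -> str:
    """Wrap whole-word, case-insensitive matches of the query terms in <tag>...</tag>."""
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    parts, last = [], 0
    for m in pattern.finditer(text):
        parts.append(html.escape(text[last:m.start()]))  # text before the match
        parts.append(f"<{tag}>{html.escape(m.group(0))}</{tag}>")
        last = m.end()
    parts.append(html.escape(text[last:]))  # text after the last match
    return "".join(parts)

print(highlight("Marketing tips for Small Business owners", "small business"))
```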
[Image by SavvySME]
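A basic autocomplete suggester returns known phrases that start with what the user has typed so far. Keeping the phrases sorted lets a binary search jump straight to the first candidate, which is the idea behind the minimal sketch below. Solr’s suggester implementations use more compact structures (such as finite state transducers) and can rank by weight. The hypothetical `PrefixSuggester` does neither.

```python
import bisect

class PrefixSuggester:
    """Suggest completions for a typed prefix from a sorted list of known phrases."""

    def __init__(self, phrases):
        self.phrases = sorted(p.lower() for p in phrases)

    def suggest(self, prefix: str, limit: int = 5) -> list:
        prefix = prefix.lower()
        out = []
        i = bisect.bisect_left(self.phrases, prefix)  # first phrase >= prefix
        # All phrases sharing the prefix sit next to each other in sorted order.
        while i < len(self.phrases) and len(out) < limit and self.phrases[i].startswith(prefix):
            out.append(self.phrases[i])
            i += 1
        return out

s = PrefixSuggester(["small business loans", "small business grants", "smart marketing", "social media tips"])
print(s.suggest("sm"))
```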
3. Rich content docs
Solr is one of the few search engines that can read rich content documents, including PDF, Word, XML, or plain text.
This makes Solr a perfect fit for projects that need to search through large numbers of PDF or Word files on a website (contracts, resumes, learning materials, ebooks, etc.).
4. Data visualization
Banana is a visualization tool (a fork of Kibana) that works with Solr and lets admins monitor events and logs on the dashboard on the fly.
For example, in banking, managers can retrieve information about failed transactions and find out the reason for each issue almost “on the fly”. This immensely reduces manual work, including manual searches through logs.
5. Machine Learning
Solr, in cooperation with Bloomberg, gained Machine Learning support through the Learning-to-Rank (LTR) plug-in. The plug-in re-ranks the top documents returned by a query according to the score from a more complex model. At Bloomberg, the goal is to give subscribers an even better instant search for the most relevant companies, people, and news.
Solr is not as quick as Elasticsearch with frequently updated data and works best for static data that rarely changes. The reason is caching. In Solr, caches are global (tied to the whole index), so even the slightest change to the index invalidates all of them, and they have to be rebuilt after every refresh. This is usually a time-consuming process. In Elasticsearch, on the other hand, caching and refreshing work per segment, so only the segments that changed are affected.
Sphinx ranked only 5th among search engines in 2018, behind Elasticsearch and Solr, but it is still a powerful and popular technology.
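Re-ranking works in two stages. First, a cheap query retrieves and scores candidates. Then a more expensive model re-scores only the top N of them. In Solr’s LTR plug-in, the model and its features are configured inside Solr. The sketch below only mimics the two-stage flow in Python, with a hypothetical pre-trained linear model and made-up feature names.

```python
def rerank(results, weights, top_n):
    """Re-score only the top_n results with a linear model; keep the rest in order.

    results: list of (doc_id, base_score, features) sorted by base_score, best first.
    weights: feature name -> weight of a (hypothetical) pre-trained linear model.
    """
    head, tail = results[:top_n], results[top_n:]

    def model_score(item):
        _, _, features = item
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    head = sorted(head, key=model_score, reverse=True)
    return [doc_id for doc_id, _, _ in head + tail]

results = [
    ("news-1", 9.1, {"text_match": 0.9, "recency": 0.1}),
    ("news-2", 8.7, {"text_match": 0.8, "recency": 0.9}),
    ("news-3", 8.2, {"text_match": 0.7, "recency": 0.5}),
    ("news-4", 7.5, {"text_match": 0.6, "recency": 1.0}),
]
weights = {"text_match": 1.0, "recency": 0.8}
print(rerank(results, weights, top_n=3))
```

Note that `news-4` stays last even though the model would score it highly. Only the top N candidates pay the cost of the expensive model, which is the trade-off re-ranking makes.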
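To see why this matters, count how much cached work has to be redone after a single segment changes. The toy model below uses a hypothetical `SegmentCache` class, not Solr’s or Lucene’s actual cache code. A cache tied to the whole index loses every entry, while a per-segment cache loses only the entries for the segment that changed.

```python
class SegmentCache:
    """Toy query cache holding one entry per (segment, query) pair."""

    def __init__(self, per_segment: bool):
        self.per_segment = per_segment
        self.entries = {}

    def fill(self, segments, queries):
        for seg in segments:
            for q in queries:
                self.entries[(seg, q)] = f"hits for {q!r} in {seg}"  # placeholder result

    def on_segment_changed(self, segment) -> int:
        """Invalidate entries after `segment` changes; return how many must be rebuilt."""
        if self.per_segment:
            stale = [key for key in self.entries if key[0] == segment]
        else:
            stale = list(self.entries)  # a global cache is tied to the whole index
        for key in stale:
            del self.entries[key]
        return len(stale)

segments = ["seg-0", "seg-1", "seg-2", "seg-3"]
queries = ["shoes", "boots", "sandals"]

global_cache = SegmentCache(per_segment=False)
global_cache.fill(segments, queries)
segment_cache = SegmentCache(per_segment=True)
segment_cache.fill(segments, queries)

print(global_cache.on_segment_changed("seg-3"))   # 12
print(segment_cache.on_segment_changed("seg-3"))  # 3
```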
Sphinx is used in such famous systems as Joomla.org, CouchSurfing.org, Wikimapia.org, Tumblr.com, and hundreds of other apps.
1. Powerful and fast
Sphinx has evolved in recent years and now supports near real-time search. It can serve over 500 queries per second against 1,000,000 documents, and its largest known installation indexes 25+ billion documents.
Craigslist, which has more than 50 billion page views per month, uses Sphinx to serve over 300 million queries per day.
Infegy uses Sphinx to index 22+ billion Twitter, Facebook, and assorted blog posts to serve insightful social media monitoring and analytic queries.
2. Faceted search
Sphinx has mature faceted search capabilities.
Youku Tudou, China’s biggest video site, uses Sphinx for the faceted search for content delivered to over 400 million users per month, with peak volumes of 15,000 queries per second.
At Greenice, we recently used Sphinx for an eCommerce computer hardware store. We implemented the faceted search on attributes like brand, type, purpose, screen resolution, matrix, diagonal, HDD capacity, SSD capacity, etc.
3. Nothing superfluous
If you need general search functions and do not need additional features like data visualization and analysis, use Sphinx. It is fast and powerful at indexing and querying huge volumes of documents with limited computing resources, unlike Elasticsearch, which consumes a lot of memory.
One of the examples is Boardreader, where Sphinx indexes up to 16 billion documents across 37 machines.
Sphinx is good for structured data (predefined text fields and non-text attributes). It is not the best choice for projects that deal with unstructured data (DOCs, PDFs, MP3s, etc.), because configuring it for such data takes developers a lot of time and effort. This, together with other configuration difficulties, makes Sphinx less convenient to use than its competitors.
Open source search comparison
Here is a brief comparison of Elasticsearch vs. Solr vs. Sphinx:
| | Elasticsearch | Solr | Sphinx |
|---|---|---|---|
| Types of search features (e.g., autocomplete suggestions, spell checker) | | | |
| Data schema | Schema-free∗ | Yes, but dynamic∗ | Yes∗ |
| Can be storage | Yes | Yes | No |
| Visualization of data | Allowed by the Elastic Stack (ES, Kibana, and Logstash) | Allowed by the Banana plugin | No |
How to apply this to your business
If it takes a while to retrieve the results for a query on your website, this may negatively impact the user experience.
Equipping your database with a powerful search engine can dramatically improve your application’s performance.
Contemporary search engines provide sophisticated features like suggestions and full-text, faceted, and fuzzy search for even more accurate and relevant results.
As you can see, the differences between Elasticsearch, Solr, and Sphinx are minimal. They all fulfill their main purpose – providing an effective and fast search.
Having tested many search engines while working on different projects, we now mostly use Elasticsearch, as it has proven to be the best fit for most projects. It is fast, flexible, and easy to work with. It provides speedy, relevant search and can also serve as a data store by itself. It makes searching logs convenient, which helps quickly identify problems with applications, and it provides effective real-time visualization of everything that is going on in your web application.
If you already have a project on Solr or Sphinx, it may not make sense to migrate it to Elasticsearch. In any case, it is better to rely on your developers and choose the search engine they have the most experience with and feel most comfortable using. As every project is unique, we carefully analyze each request to come up with the most appropriate solution for your task.
Co-author: Max Lapko, Team Leader at Greenice. Max supervises the most complex projects at our agency, ensuring top quality work and optimal productivity of his team. He is also a go-to person when anyone in the team faces a challenging technical problem.
If you have any questions on search engines, drop us a message for a FREE consultation!