What is Apache Solr

John Thuma
5 min read · Aug 8, 2018

LITTLE HISTORY: Developed by Yonik Seeley in 2004, Solr was an in-house project at CNET Networks to provide search capabilities for its company website. CNET Networks then donated it to the Apache Software Foundation in 2006. In 2009 Yonik Seeley joined Lucidworks, which provided commercial support, training, and consulting services. The Apache Lucene and Solr projects merged in 2010 and Solr became a Lucene sub project. It has since gone through many iterations, but it is only one option in a very competitive marketplace. Alternatives include commercial products and other open source projects such as Elasticsearch, Oracle Endeca Information Discovery, Swiftype, Google Search Appliance, Coveo, and IDOL. The Solr in Apache Solr is pronounced “Solar.”

WHO:

According to Lucidworks, the following organizations are using Apache Solr: AOL, Apple, Cisco, Cheaptickets, Buy.com, Xfinity, Disney, FCC, eBay, E*TRADE, and eHarmony, just to name a few. Lucidworks is based in San Francisco, California and provides commercial support for enterprise search, an application development platform, consulting, and training services. The company employs under 100 people and was founded in 2007 by Marc Krellenstein, Grant Ingersoll, Erik Hatcher, and Yonik Seeley.

Job descriptions for people interested in Apache Solr generally call for Java and Solr programming and administration. Indeed.com currently has under 2,000 job postings for Solr professionals.

BOTTOM LINE: Apache Solr has been around for more than 10 years now and is a mature technology offering. Yonik Seeley first developed the technology in 2004 and it has grown in popularity, with many high profile clients using it today. There are close to 2,000 job postings for development and administrative roles. Lucidworks provides commercial support for Apache Solr.

WHAT:

Apache Solr provides a scalable, enterprise-wide search capability across a diverse set of data sources and types, including NoSQL stores, rich documents (PDF/Binary/MS-Word), relational databases, and more. Features include faceted search, hit highlighting, full-text search, and real-time indexing. It was designed for scale and fault tolerance.
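To make those features a little more concrete, here is a minimal sketch of what a single search request can look like. It only builds the URL for Solr's standard `/select` handler and does not contact a server. The host, the `products` collection, and the `category` and `description` fields are assumptions made up for illustration; your own schema will differ.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, text, facet_field, highlight_field, rows=10):
    """Build a Solr /select URL asking for matches, facet counts, and highlights."""
    params = [
        ("q", text),                  # full-text query
        ("rows", rows),               # number of hits to return
        ("facet", "true"),            # turn on faceted search
        ("facet.field", facet_field), # count hits per value of this field
        ("hl", "true"),               # turn on hit highlighting
        ("hl.fl", highlight_field),   # field to pull highlighted snippets from
        ("wt", "json"),               # ask for a JSON response
    ]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical local instance, collection, and field names.
url = build_select_url(
    "http://localhost:8983", "products", "wireless router", "category", "description"
)
print(url)
```

One request like this returns the matching documents, a count of hits per category, and the snippets where the search terms appear, which is the combination most search pages need.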

Solr is a sub project under Apache Lucene and is written in Java. It provides a rich set of programming interfaces that software developers can use to build custom applications. Arcadia Data leverages Apache Solr to perform analytics and dashboarding. Take a look at the following video to learn more about Arcadia Data, Solr, and Kudu.

BOTTOM LINE: Enterprise search enables an organization to index and search content from a myriad of decentralized data sources and types. Apache Solr provides an open source enterprise search tool and a rich programming interface that enables organizations to customize solutions with these capabilities.

WHERE:

Apache Solr can be installed on premises, on a laptop, or in the cloud. There is a really cool 5 minute tutorial on how to install and start Solr, index some data, run a search, and finally shut it down here: http://www.solrtutorial.com/solr-in-5-minutes.html
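Once an instance is running, the indexing step boils down to posting documents to Solr's `/update` handler. The sketch below prepares such a request with Python's standard library and only sends it if you call `send` yourself. The local address, the `demo` collection, and the document fields are assumptions for illustration, not something taken from the tutorial.

```python
import json
import urllib.request


def build_index_request(base_url, collection, docs):
    """Prepare a POST that adds JSON documents to a collection and commits them."""
    url = f"{base_url}/solr/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request; this needs a running Solr instance."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical documents and collection name.
docs = [
    {"id": "1", "title": "Getting started with search"},
    {"id": "2", "title": "Indexing PDF files"},
]
index_request = build_index_request("http://localhost:8983", "demo", docs)
print(index_request.get_method(), index_request.full_url)
```

Calling `send(index_request)` against a live instance would add the two documents. The `commit=true` parameter makes them searchable right away, which is handy for a quick experiment but usually tuned differently for large loads.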

BOTTOM LINE: It is very easy to set up and install a simple Apache Solr instance. Like most things, as it gets bigger, meaning lots of data and infrastructure, it gets harder and more expensive. The great thing is that it is easy to establish an environment in the cloud, make mistakes, and start over if necessary.

WHEN:

Relational databases are great at transaction-based and some analytical use cases. They are particularly excellent for data that fits nicely into rows and columns. They are not very good with data that is unstructured, like documents, PDF, MS-Word/MS-Office files, and HTML. Apache Solr is excellent for breaking down these data sources into an index and then allowing for a natural search. Apache Solr is not the best tool for storing transactions that have to be reliable and trustworthy, such as a bank account or point of sale system.
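To see what "breaking data sources down into an index" means, here is a toy inverted index, the kind of structure Lucene (and therefore Solr) is built around. This is a deliberately simplified sketch, not how Solr is implemented: real analyzers also handle stemming, stop words, relevance ranking, and much more. The document names and text are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical content pulled from a ticket system, a PDF, and a web page.
docs = {
    "ticket-17": "Customer reports router outage after firmware upgrade",
    "contract.pdf": "Service contract for business router rental",
    "faq.html": "How to upgrade firmware on your modem",
}
index = build_index(docs)
print(sorted(search(index, "router")))            # ['contract.pdf', 'ticket-17']
print(sorted(search(index, "firmware upgrade")))  # ['faq.html', 'ticket-17']
```

The lookup never scans the documents themselves. It only intersects small sets of ids, which is why this approach stays fast on text that would be painful to query with SQL.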

When do you need to leverage an enterprise search solution like Apache Solr? Many years ago I worked with a mid-sized telecommunications company that grew through mergers and acquisitions. In roughly 12 months this organization went from 1 organization to about 8 different organizations. The one thing they wanted was a consolidated view of all the data about customers and products. They wanted a solution to this problem in 3 months. (OH {enter favorite curse word}!!!) I was presented with this challenge. First thing I did was take an inventory of major systems and network connectivity. Then I looked at Apache Solr to help consolidate and index the tons of data stored in relational databases, documents, file servers, CRM, and support ticket systems. Then I built a search interface around the index and in 6 months I was done. I was sort of bummed that it took me 3 months longer than the expected goals. The new CIO wasn’t upset at all and told me he never expected me to finish.

BOTTOM LINE: Data and organizations are complex, DUH! Think of the years of data evolution via systems, documents, email, CRM, web, and ERP. How do you build a card catalog that indexes and allows you to find what you need? You look for an enterprise search tool like Apache Solr!

WHY:

Organizations that can manage the data about their customers will grow and thrive. We are living in the society of ‘Right Now!’ Today, people are carrying a device with them that is more powerful than most PCs were 5 years ago. They are smart, and they want the organizations from which they buy things to be smart too. Knowing who your customers are is critical. How can you possibly know who your customers are if you don’t have a handle on their data?

People also use multiple channels to communicate with your business and create huge amounts of data. Being able to harness this data should be a priority, not only to capture the voice of the customer but to exploit it. Apache Solr enables this capability through multi-channel consolidation via indexing and search.

BOTTOM LINE: Every enterprise wants to consolidate systems to minimize technical debt and surface area. Doing this enables business agility. System consolidation projects often fail or take years to accomplish. Enterprise tools like Apache Solr help you take a shortcut.

Watch this killer demo on Connected Vehicles:

http://watch.arcadiadata.com/watch/Ra1HYFHwKumtRfQ6LuVZp9?

HOW:

Go find a partner that is great at enterprise search and has the experience necessary to get you started. One such partner is Lucidworks. They will help guide you through the pitfalls and challenges. They can also help you with roadmaps and other capabilities around this technology for your organization.

BOTTOM LINE: You can go it alone but you probably shouldn’t. If Apache Solr is implemented properly, your organization can reap the rewards. It will require some decent technical talent and expertise, so finding a good partner will benefit you greatly.

Check out Arcadia Data which has an Apache Solr connector. You can download a free version of our tool on which you can explore our visualization capabilities: Arcadia Instant. If you are curious to learn more, ping me and I will show you around!

John Thuma

Experienced Data and Analytics guru. 30 years of hands-on keyboard experience. Love hiking, writing, reading, and constant learning. All content is my opinion.