Using Pivotal GemFire/Apache Geode with Lucene Indexing for Application UI Typeahead

July 17, 2017 Kyle Dunn

Pivotal GemFire 9.1 is Here!

We’re pleased to announce that GemFire 9.1 has been released. Pivotal GemFire 9.1 is an in-memory data grid based on the recently released Apache Geode 1.2. More information is available on the GemFire page on Pivotal.io, and the release can be downloaded from Pivotal Network.

This release features the tight integration of Lucene with partitioned data in GemFire. The Lucene indexes are stored alongside the corresponding GemFire data partitions. Just like the data, the indexes are horizontally scalable - addition of servers (data partitions) is supported automatically. 

Use cases include searching through JSON documents and looking up records by partial name, social security number, or other attributes. One specific application of partial lookup is typeahead search, which progressively narrows the results while the user is entering the search term. In this blog post, we will do a deep dive into the typeahead use case.

Motivation

Whether you're parsing through Google Maps results for dinner or fantasizing about your next vacation with Airbnb, user-centric autocomplete (aka typeahead) is nearly as important as the search ability itself. These examples are just two of the familiar faces employing this feature; modern enterprise applications are likely to benefit from this capability as well.

Improving business productivity has always been the promise of IT, either with fully baked vendor offerings or home-grown software applications and automation infrastructure. While typeahead is a very small contributor to effective IT, having a boilerplate example proven out makes for an easier sell when trying to convince your Product Manager to accept such a nicety into the backlog.

Approach

In alignment with the open source philosophy of sharing more (rather than less) code, many of the tech unicorns (Pivotal included) publish code in the skunkworks-gone-production genre. For typeahead, Twitter created an exceptional JavaScript library, aptly named typeahead.js. While the details of using the library are beyond the scope of this writing, simply put, it requires three things for basic functionality:

  • Data to search against

  • A mechanism to perform the search/match

  • Some HTML and JavaScript to invoke the search and display the results

For toy examples or static datasets, a pure JavaScript and HTML implementation is enough. In practical applications, a database and fuzzy search capabilities are necessary; for SQL backends, the predicate pattern LIKE '%mySearchString%' against an indexed column can accomplish this. From a "boxes and lines" slide deck (aka marketecture) perspective, this will work. As pragmatic technologists, we know it's never quite this simple.
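For the static-dataset case, the matching logic can live entirely in the browser. The following is a minimal sketch, not code from the linked repository: the sample addresses and the names `sampleAddresses`, `substringMatcher`, and `staticAddressMatches` are made up for illustration. It relies only on the `(query, syncResults)` callback shape that typeahead.js expects from a synchronous `source` function.

```javascript
// Hypothetical static dataset; these addresses are made up for illustration.
const sampleAddresses = [
  '1600 Pennsylvania Avenue',
  '221B Baker Street',
  '742 Evergreen Terrace',
];

// Returns a function with the (query, syncResults) shape that typeahead.js
// expects for a synchronous `source`.
function substringMatcher(items) {
  return function findMatches(query, syncResults) {
    const needle = query.trim().toLowerCase();
    // Case-insensitive substring match, similar in spirit to LIKE '%needle%'.
    const matches = items.filter((item) => item.toLowerCase().includes(needle));
    syncResults(matches);
  };
}

// This could be passed as `source` in a typeahead.js dataset definition.
const staticAddressMatches = substringMatcher(sampleAddresses);
```

Every keystroke scans the whole array, which is fine for a few hundred entries and is exactly the part that stops scaling once the dataset grows or changes frequently.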

Gotchas

The nuances of implementing "query by LIKE" as the database grows, either in ingest rate or total volume, quickly become apparent. Our foremost concern: Which users will be using this and how many of them are there? An executive-only dashboard and a web-scale application have drastically different levels of forgiveness for deficiencies. Secondarily, how many records are in the dataset we're querying? How frequently does this dataset change? How fast does it grow in volume? Coincidentally, these are common questions our Data Engineering team asks customers when embarking on data architecture work. It's clear there is no one-size-fits-all way to provide a persistence layer for applications.

GemFire ♥ Lucene

The persistence layer chosen in this case is the in-memory data grid: Pivotal GemFire. This architectural decision lets us leave many of the aforementioned concerns to the tool's feature set rather than to implementation details. The requirement for a substring match capability opens the opportunity to highlight a new feature in GemFire: Lucene indexing.

Apache Lucene is a popular, Java-based indexing and search library. In GemFire 9.1 and Geode 1.2, this indexing and query capability has been integrated into GemFire/Geode regions (the equivalent of a database table). While its applicability for typeahead would be a more nuanced discussion, it's worth mentioning that Pivotal's Greenplum MPP database offers a similar indexing capability using the GPText add-on, but GPText is targeted at text analytics, not high-concurrency lookups.

Result

So after a long haul of context and whiteboarding, we've arrived at a decision for the minimum viable product (MVP in agile/startup lingo):

  • The data store is a Lucene-indexed GemFire region

  • The search and match mechanism uses the GemFire-Lucene query API

  • The frontend UI component is implemented with typeahead.js, with suggestions rendered by the Handlebars JavaScript templating library

These design decisions manifest themselves in different parts of the code base:

Data store setup in the application configuration:

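    @Bean
    public Region<Integer, Property> propertyRegion(Cache cache,
                                                    LuceneService luceneService) {
        // Create the Lucene index first, then the region it indexes.
        // Index the "Address" field with the default analyzer:
        luceneService.createIndexFactory()
                .setFields("Address")
                .create("propIndex", "/Property");

        // Typed, partitioned region; the index is stored alongside each partition.
        Region<Integer, Property> region = cache
                .<Integer, Property>createRegionFactory(RegionShortcut.PARTITION)
                .create("Property");

        return region;
    }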

Lucene Search/Query in the repository implementation:

    public Iterable<Property> getAFewByAddress(String address, Integer limit)
            throws Exception {
        // Query the "Address" field of propIndex, returning at most `limit` results.
        LuceneQuery<Integer, Property> query =
                luceneService.createLuceneQueryFactory()
                        .setLimit(limit)
                        .create("propIndex", "/Property", address, "Address");

        // findValues() returns only the matching region values.
        return query.findValues();
    }
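The repository method above passes the `address` string straight to Lucene as a query string. In Lucene's query syntax a bare word matches whole indexed terms, so a typeahead that sends half-typed words usually wants a trailing `*` wildcard on the last term, and any special characters in user input should be escaped. Whether the linked repository does this, and where, is not shown in this post. The sketch below is one possible client-side approach, written in JavaScript to match the frontend; `buildPrefixQuery` and `LUCENE_SPECIAL_CHARS` are hypothetical names.

```javascript
// Characters with special meaning in Lucene's query syntax.
const LUCENE_SPECIAL_CHARS = /[+\-!(){}\[\]^"~*?:\\\/&|]/g;

// Hypothetical helper: turns raw user input into a Lucene query string whose
// last term is a prefix (wildcard) match.
function buildPrefixQuery(userInput) {
  const trimmed = userInput.trim();
  if (trimmed === '') {
    return ''; // nothing to search for
  }
  // Backslash-escape special characters so user input cannot alter the query.
  const escaped = trimmed.replace(LUCENE_SPECIAL_CHARS, '\\$&');
  // The trailing wildcard applies to the last term only.
  return `${escaped}*`;
}
```

With this helper, an input of `123 Mai` becomes `123 Mai*`, which the caller would send as the `address` argument. The same transformation could just as well be done on the server before the query is created.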

typeahead.js Javascript: 
 
        $('#remote .typeahead').typeahead(
        {
            minLength: 3,
            highlight: true
        }, 
        {
            source: addressMatches, /* this function performs the REST call to the controller  */
            display: displayField,
            items: 8,
            templates: {
                empty: [
                    '<div class="empty-message">',
                    'unable to find any addresses that match the current query',
                    '</div>'
                ].join('\n'),
                suggestion: Handlebars.compile("<div><strong>{{address}}</strong> – elevation: {{elevationFeet}}'</div>")
             }
        }) 
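The `addressMatches` function referenced above is not shown in this post. As a rough sketch of what such a function could look like, the example below builds a typeahead.js `source` with the `(query, syncResults, asyncResults)` signature and delivers the REST response through `asyncResults`. The endpoint path, the `address` and `limit` query parameters, the use of `fetch` instead of jQuery, and the name `makeAddressMatches` are all assumptions; the response is assumed to be a JSON array of objects carrying the `address` and `elevationFeet` fields used by the suggestion template.

```javascript
// Hypothetical endpoint; the real controller path is not shown in this post.
const SEARCH_URL = '/properties/search';

// Builds a typeahead.js source function. `fetchImpl` is injectable so the
// function can be exercised without a browser or a running server.
function makeAddressMatches(fetchImpl, baseUrl = SEARCH_URL) {
  return function addressMatches(query, syncResults, asyncResults) {
    // Assumed query parameters: the search text and a result limit.
    const url = `${baseUrl}?address=${encodeURIComponent(query)}&limit=8`;
    return fetchImpl(url)
      .then((response) => response.json())
      .then((properties) => asyncResults(properties))
      // On any failure, hand back an empty list so the "empty" template shows.
      .catch(() => asyncResults([]));
  };
}
```

In a browser this would be wired up as `const addressMatches = makeAddressMatches(window.fetch.bind(window));`. Nothing is passed to `syncResults` because every suggestion comes from the server.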

When you put a bow on all of this, what is left is this kind of app cool-factor: 

All the code is available on GitHub here: https://github.com/kdunn-pivotal/gemfire-typeahead-spike/tree/lucene

Happy hacking!

About the Author

Kyle Dunn

Kyle Dunn works professionally as a Data Engineer (aka data "nerd") for Pivotal Software, based in Denver, Colorado, USA. His professional background spans electric utility engineering, distributed systems/HPC, and more recently, RDBMS and data-driven workflows for enterprises. He earned a Bachelor of Science in Electrical Engineering from the University of Colorado, Denver, with academic publications ranging from heterogeneous cloud computing to direct current bus protection schemes. Find him on Twitter (@kdunn926) and LinkedIn (https://www.linkedin.com/in/dunnkyle).
