Thursday, December 3, 2009

A Quick Taste of Endeca’s Secret Sauce

The semi-structured model helps overcome the drawbacks of traditional, rigid, overarching relational schemas, which are too limited to handle diverse (structured and unstructured) and ever-changing data. OLAP (online analytical processing) cubes, for their part, might overcome relational databases’ inability to provide near-instantaneous analysis and display of large amounts of data, but they still cannot accommodate an ever-growing number of new (perhaps esoteric) data attributes or dimensions (e.g., “find all basketball point guards who played in Europe during high school”).

The MDEX engine handles such requirements with relative ease via an extensible markup language (XML)-like data model of self-describing objects. Endeca ITL (information transformation layer) plays the extract, transform, load (ETL) role of importing data from disparate data sources. These data objects can come from multiple sources, such as structured content (e.g., a content management system [CMS], databases, etc.) and unstructured content (text documents and user-generated content [UGC] such as blogs, wikis, podcasts, multimedia files, etc.).
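To make the idea of self-describing objects more concrete, here is a minimal Python sketch. It is not Endeca's actual data format or API; the record fields and the `known_attributes` and `find` helpers are hypothetical names invented for illustration. The point is only that each record carries its own attribute names, so a new attribute can appear without changing a shared schema.

```python
# Hypothetical records: each object describes itself through its own
# attribute names, so no shared table schema is needed.
# (Illustration only; this is not Endeca's record format.)
records = [
    {"id": 1, "type": "story", "title": "Game recap", "team": "Patriots"},
    {"id": 2, "type": "video", "title": "Highlights", "duration_sec": 95},
    {"id": 3, "type": "player", "name": "Example Guard",
     "position": "point guard", "high_school_region": "Europe"},
]


def known_attributes(records):
    """Collect every attribute name seen across all records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return sorted(names)


def find(records, **criteria):
    """Return records whose attributes match every given name/value pair."""
    return [
        r for r in records
        if all(r.get(name) == value for name, value in criteria.items())
    ]


# An "esoteric" query works as soon as any record carries the attributes.
guards = find(records, position="point guard", high_school_region="Europe")
```

Adding a fourth record with a brand-new attribute would need no change to `find`, which is roughly the flexibility the paragraph above describes.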

In addition, the software must be able to access rapidly changing data such as ESPN sports scores, news feeds, or an online store’s new catalog items and the availability (stock situation) of all its products. One of Endeca’s landmark (and trademark) capabilities that MDEX enables is Guided Navigation™, or the ability of the search engine to return not only results but also options for further selecting subsets within those results. The user might not even be aware that these options and relations exist.

Endeca is not based on the “rocket science” of some overly complex optimization algorithms. Neither is the platform trying to invent data based on, say, predictive analytics. According to the “you can’t make cheese out of chalk” adage, if some combination of data attributes is not yet available, that is fine, and Endeca will not try to create results that do not exist just to impress and possibly mislead the user.

Conversely, if some relationships between data and related indexes exist, Endeca will return both the results and further suggestions (while breadcrumb trails are kept updated), and choices will either expand or narrow depending on the path that the user selects in a point-and-click manner. Simple as that, or, in other words: WYSIWYG (what you see is what you get). If you know how to order movies over Netflix or select channels on a JetBlue flight, you are ready to use Endeca.
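The narrowing behavior described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Endeca's implementation: the `catalog` data, the `navigate` function, and the representation of the breadcrumb trail as a list of (attribute, value) pairs are all hypothetical. Note that refinements are derived only from records that actually match, so the sketch never offers a choice that would lead to an empty result.

```python
# Hypothetical catalog used only for this illustration.
catalog = [
    {"title": "A", "type": "story", "team": "Patriots"},
    {"title": "B", "type": "video", "team": "Patriots"},
    {"title": "C", "type": "story", "team": "Jets"},
]


def navigate(records, breadcrumbs, facets=("type", "team")):
    """Return (results, refinements) for the current breadcrumb trail.

    breadcrumbs is a list of (attribute, value) selections made so far.
    refinements maps each not-yet-chosen facet to the values that still
    occur in the matching records.
    """
    results = [
        r for r in records
        if all(r.get(attr) == value for attr, value in breadcrumbs)
    ]
    chosen = {attr for attr, _ in breadcrumbs}
    refinements = {}
    for r in results:
        for facet in facets:
            if facet in chosen or facet not in r:
                continue
            refinements.setdefault(facet, set()).add(r[facet])
    return results, refinements


# Start with no selections, then click "team: Jets".
all_results, all_choices = navigate(catalog, [])
jets_results, jets_choices = navigate(catalog, [("team", "Jets")])
```

After the "Jets" selection, only the "story" type remains as a choice, because no Jets video exists in this sample data. Removing the last breadcrumb and calling `navigate` again widens the choices back out.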

For instance, NFL aficionados might search the ESPN portal for “Tom Brady” and get about 6,000 records as possible results. But on the left side, the site will offer search refinements, such as by type (e.g., stories, audio, photo, video), by date (e.g., last 7 days, last 30 days, last 365 days), by team, by columnist, etc. Each option will show in brackets the number of related records (further potential results that match the current search criteria).
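The bracketed counts next to each refinement can be illustrated with a short Python sketch. The sample results and the `facet_counts` helper are hypothetical and tiny compared with the roughly 6,000 records mentioned above; how the real site computes its counts is not described here, so treat this as one plausible way to tally them.

```python
from collections import Counter

# Hypothetical search results for one query (illustration only).
search_results = [
    {"type": "story", "team": "Patriots", "columnist": "Writer One"},
    {"type": "video", "team": "Patriots"},
    {"type": "video", "team": "Patriots"},
    {"type": "photo", "team": "Patriots", "columnist": "Writer Two"},
]


def facet_counts(results, facets):
    """Count how many results carry each value of each facet."""
    return {
        facet: Counter(r[facet] for r in results if facet in r)
        for facet in facets
    }


counts = facet_counts(search_results, ["type", "team", "columnist"])

# Render the refinements the way a sidebar might, e.g. "video (2)".
sidebar = [
    f"{value} ({n})"
    for value, n in sorted(counts["type"].items())
]
```

Records that lack a facet (here, the videos have no columnist) are simply left out of that facet's tally.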

For more informed sports fans (or even fanatics), ESPN administrators might use the page builder tool to create landing pages or topic pages. Namely, instead of seeing a list of possible search results, the user is directed to a page specially designed for the query, e.g., a page dedicated to Tom Brady (the future Hall of Fame quarterback) or to the New England Patriots.

Small wonder, then, that Endeca’s online media customers (e.g., newspapers and magazines, professional knowledge providers, cable and TV, libraries, bookstores, and publishers) rave about real results. I’ve repeatedly heard examples such as a fivefold increase in Web traffic, a 20 percent increase in page views (PVs), a 15 percent increase in subscription renewals, a 15 percent increase in search click-through rates (CTRs), and so on.
