CQE was designed by Cataphora engineers specifically to meet the exacting search requirements of our clients. It is a major improvement over most search engines in use today, which offer limited ways to express queries and are optimized to return a handful of documents most relevant to a few keywords. CQE's zero-time re-indexing and extensible query architecture significantly reduce execution time for very complex searches, and CQE can be used for accurate analysis in real-time streaming data environments.
Application examples
The following examples come from customers who were struggling with inadequate search engines and were helped by CQE.
Real time support
- Without CQE: Most search engines are optimized for fast, simple keyword searches through the use of a keyword index. Because the computing time is spent up front, building the index, users get a quick response as long as the query terms are in the index. In a real-time environment, where gigabytes of data must be searched as they pass through the network, this indexing time can cause unacceptable delays: compliance officers see search results after the fact, when they are less actionable.
- With CQE: CQE uses a zero-time re-indexing architecture that allows complex queries to be run over large volumes of streaming data in near real time. Compliance and governance users get rapid results over the data stream in time to take action.
Structural scoping
- Without CQE: When searching for documents containing the word "marketing", a customer was getting every email sent or received by the marketing department, because almost everyone in that organization had "marketing" in the job title in their email signature. After many trials, the client decided to collect every marketing title in the company and exclude them all from the search. He later realized that this also excluded emails from people outside the marketing organization who happened to mention a marketing person by title.
- With CQE: This problem does not occur with CQE, because its functions can be scoped to specific parts of a document. For example, CQE can be configured to ignore logical sections of a document, such as boilerplate disclaimers or the signature blocks at the end of email messages.
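The sketch below illustrates the general idea of scoping a search to one part of a document. It is written in Python and is not CQE's API: the functions `body_scope` and `scoped_contains` are hypothetical, and for simplicity it assumes that a line consisting of `--` (optionally followed by a space), the conventional email signature delimiter, marks the start of the signature block. Real signature detection is usually less tidy.

```python
import re

# Assumption for illustration: a line containing only "--" (optionally
# followed by one space), the conventional email signature delimiter,
# marks the start of the signature block.
SIG_DELIMITER = re.compile(r"^-- ?$", re.MULTILINE)


def body_scope(email_text: str) -> str:
    """Return the message text that precedes the signature block, if any."""
    match = SIG_DELIMITER.search(email_text)
    return email_text[: match.start()] if match else email_text


def scoped_contains(email_text: str, term: str) -> bool:
    """Case-insensitive whole-word search restricted to the body scope."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return pattern.search(body_scope(email_text)) is not None


signed = "Lunch at noon?\n-- \nJane Doe\nDirector of Marketing"
mention = "Please loop in the marketing director.\n-- \nBob"
print(scoped_contains(signed, "marketing"))   # False: only the signature matches
print(scoped_contains(mention, "marketing"))  # True: the body mentions it
```

Because the signature block is excluded rather than the word "marketing", the second message, which mentions a marketing person by title in its body, is still found.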
Case sensitivity
- Without CQE: A client could not set up his search to distinguish "visa" (as in immigration) from "Visa" (as in credit card). Not all search engines support case-sensitive search, and in those that do, it is typically not configurable: every search is case sensitive.
- With CQE: CQE handles case sensitivity without difficulty. Moreover, case sensitivity in CQE is fully scoped: it can be applied to selected parts of the text, or to certain queries, and not to others.
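As a rough illustration, case sensitivity can be made a property of each query term rather than a global engine setting. The `TermQuery` class and `matches` function in this Python sketch are hypothetical names, not CQE's interface.

```python
import re
from dataclasses import dataclass


@dataclass
class TermQuery:
    """Hypothetical query term that carries its own case-sensitivity flag."""
    term: str
    case_sensitive: bool = False


def matches(query: TermQuery, text: str) -> bool:
    """Whole-word match that honors the query's own case setting."""
    flags = 0 if query.case_sensitive else re.IGNORECASE
    pattern = re.compile(rf"\b{re.escape(query.term)}\b", flags)
    return pattern.search(text) is not None


card = TermQuery("Visa", case_sensitive=True)  # the credit card brand
any_visa = TermQuery("visa")                   # any capitalization

print(matches(card, "I paid with my Visa card."))           # True
print(matches(card, "Her visa application was approved."))  # False
print(matches(any_visa, "Her Visa was approved."))          # True
```

Case alone cannot tell a sentence-initial "Visa" (immigration) from the card brand, so a per-term flag is most useful when combined with other conditions, such as restricting the match to particular parts of the text.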
Logically complex queries
- Without CQE: When running a key-term list to cull data for review, a client noticed many false positives. To reduce them, the client added layers of logic to the query: rather than searching for "shut down", they searched for "shut down" within five words of "factory", and only in emails. Still getting many false hits, they added more conditions, such as the people involved and a specific time period. Even though the data set was the same size for each search, and the result sets were getting smaller, the searches kept taking longer.
- With CQE: Most search engines are built to return the most relevant results for a simple query quickly. As more logic and requirements are added to the query, the results must be ranked for relevance against many more criteria, and execution slows down. CQE is ideal when you want not just the most relevant results, but also the most accurate and complete ones, since its execution time depends on the size of the data and does not grow in proportion to query complexity.
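To make the client's query concrete, the following Python sketch evaluates a proximity condition ("shut down" within five words of "factory", here assumed to mean at most five intervening words) combined with a document-type restriction. It shows what such a query means, not how CQE executes it, and says nothing about CQE's performance; the document fields `type` and `text` and all function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_starts(tokens: list[str], phrase: list[str]) -> list[int]:
    """Token positions at which a multi-word phrase begins."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def near(text: str, phrase_a: str, phrase_b: str, max_gap: int) -> bool:
    """True if the two phrases occur with at most max_gap words between them."""
    tokens = tokenize(text)
    a, b = phrase_a.lower().split(), phrase_b.lower().split()
    for i in phrase_starts(tokens, a):
        for j in phrase_starts(tokens, b):
            # Count intervening words, whichever phrase comes first.
            gap = j - (i + len(a)) if j >= i else i - (j + len(b))
            if gap <= max_gap:
                return True
    return False


def shutdown_query(doc: dict) -> bool:
    """Hypothetical compound query: emails only, and "shut down" near "factory"."""
    return doc["type"] == "email" and near(doc["text"], "shut down", "factory", 5)


docs = [
    {"type": "email", "text": "They will shut down the old factory next week."},
    {"type": "memo", "text": "They will shut down the old factory next week."},
    {"type": "email",
     "text": "The factory manager said that after many long meetings we may shut down."},
]
print([shutdown_query(d) for d in docs])  # [True, False, False]
```

The nested loop over occurrence positions keeps the sketch short; a production engine would typically merge sorted position lists instead.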
Search for virtually any character
- Without CQE: A client could not identify messages from her company's researchers that contained patent references, because her search engine could not be made to recognize the dash character.
- With CQE: CQE not only handles punctuation characters, but also lets the user control the tokenizer on the fly, so virtually any token, however complex, can be recognized and parsed.
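The effect of tokenizer control can be sketched with a tokenizer whose token pattern is chosen at query time. In this hypothetical Python example, the default pattern splits on the dash, so a reference in the made-up format `US-1234567` is lost as a unit, while a hyphen-aware pattern keeps it whole. The pattern names and the reference format are assumptions, not CQE settings.

```python
import re

# Default pattern: runs of letters and digits, so "-" acts as a separator.
DEFAULT_TOKEN = r"[A-Za-z0-9]+"
# Alternative pattern: keep hyphen-joined runs together as a single token.
HYPHEN_AWARE_TOKEN = r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*"


def tokenize(text: str, token_pattern: str = DEFAULT_TOKEN) -> list[str]:
    """Split text into tokens using a pattern chosen at query time."""
    return re.findall(token_pattern, text)


text = "See ref US-1234567 for details."
print(tokenize(text))
# ['See', 'ref', 'US', '1234567', 'for', 'details']
print(tokenize(text, HYPHEN_AWARE_TOKEN))
# ['See', 'ref', 'US-1234567', 'for', 'details']
```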
Key features
Among the features that allow CQE to cope with these kinds of queries are:
- Massive parallelism: Supports extremely long and/or numerous queries
- Powerful query syntax: Can express complex queries
- Results on the match, not just the document: Hits can be recorded at the match level, not only at the document level. This enables highly expressive, context-sensitive queries, for example ones that treat occurrences differently depending on the document section in which they appear.
- Zero-time re-indexing: Allows for monitoring of continuous data streams or addition of any number of documents without incurring substantial up-front time penalties.
- Execution speed: In tests, 80 times faster than Lucene, an industry-standard search engine. The difference is even greater as query complexity increases.
- Technology control: CQE has been developed entirely by Cataphora engineers and is based on several key patented technologies. As a result, our engineers can always address unusual or specific requests in a timely manner.
- Powerful API: The API allows results to be customized so that any post-processing can be applied on the fly to any matching document. For example, this can be used to highlight matching instances in documents.
- Full access to query syntax: No matter how unusual or infrequent the tokens (meaningful sequences of one or more characters) in a document or a query, CQE can be adapted to handle them.
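Match-level results and API post-processing such as highlighting can be illustrated together. In this Python sketch, each hit is recorded with its document and character span, and a caller-supplied function is applied to every matched span of a document. The `Match`, `find_matches`, and `highlight` names are hypothetical and do not describe CQE's actual API.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Match:
    """One hit recorded at the match level: the document and character span."""
    doc_id: str
    start: int
    end: int


def find_matches(docs: dict[str, str], pattern: str) -> list[Match]:
    """Record every regex hit in every document, not just which documents hit."""
    regex = re.compile(pattern)
    return [
        Match(doc_id, m.start(), m.end())
        for doc_id, text in docs.items()
        for m in regex.finditer(text)
    ]


def highlight(doc_id: str, text: str, matches: list[Match],
              mark: Callable[[str], str]) -> str:
    """Post-process one document by wrapping each of its matched spans with `mark`.

    Assumes the spans do not overlap, which holds for a single finditer pass.
    """
    spans = sorted((m for m in matches if m.doc_id == doc_id), key=lambda m: m.start)
    out, pos = [], 0
    for m in spans:
        out.append(text[pos:m.start])
        out.append(mark(text[m.start:m.end]))
        pos = m.end
    out.append(text[pos:])
    return "".join(out)


docs = {
    "d1": "Visa fees rose; the Visa card did not.",
    "d2": "No match here.",
}
hits = find_matches(docs, r"\bVisa\b")
print(len(hits))  # 2
print(highlight("d1", docs["d1"], hits, lambda s: f"[{s}]"))
# [Visa] fees rose; the [Visa] card did not.
```

Keeping the character spans, rather than only a list of matching documents, is what lets the highlighting step run as a separate post-processing pass.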