Free Text Search

The CDM supports high-performance free-text ("Google-like") searching of the data that it stores. It uses the hibernate-search library to integrate the popular Apache Lucene search software into the CDM. The persistence layer includes hibernate-search integration by default, so objects are added to the Lucene index when applications save entities, and the indices are updated when applications update or delete objects. All fields are converted to lowercase during indexing, and queries are converted to lowercase during parsing, so term matching is case-insensitive. Several properties are indexed for each object type, and it is possible to search individual fields or combinations of fields. The basic syntax used for free-text queries is described on the Lucene website.
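
The effect of lowercasing at both index time and query time can be illustrated with a toy in-memory index. This is only a sketch of the idea: the class LowercaseIndexSketch and its methods are hypothetical, and it uses neither Lucene nor hibernate-search, nor does it reflect how the CDM implements indexing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy index for illustration only; NOT Lucene or hibernate-search.
final class LowercaseIndexSketch {
    // Maps a normalised term to the ids of the objects that contain it.
    private final Map<String, List<String>> postings = new HashMap<>();

    // Index time: split the text on whitespace and lowercase every term.
    void add(String id, String text) {
        for (String term : text.trim().split("\\s+")) {
            postings.computeIfAbsent(normalize(term), k -> new ArrayList<>()).add(id);
        }
    }

    // Query time: apply the same normalisation to the query term before lookup.
    List<String> search(String term) {
        return postings.getOrDefault(normalize(term), List.of());
    }

    private static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        LowercaseIndexSketch index = new LowercaseIndexSketch();
        index.add("name-1", "Acherontia atropos");
        // The query differs in case from the indexed text but still matches.
        System.out.println(index.search("ACHERONTIA"));
    }
}
```

Because both sides pass through the same normalisation step, a query written in any mix of upper and lower case finds the same entries.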

All classes have a default field that is searched when no field is specified. For classes that extend IdentifiableEntity, the titleCache field is used. By default, query strings are broken into individual terms, and objects that match any of the terms are returned (e.g. Acherontia atropos). To return objects that match all terms, in any order, use the AND operator (e.g. Acherontia AND atropos). By enclosing several terms in double quotes, you can specify that they must appear together as a phrase, in the given order (e.g. "Acherontia atropos").
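
The three query forms described above differ only in how the terms are joined, which a small helper can make explicit. The following is a minimal sketch: TermQuerySketch and its method names are hypothetical and not part of the CDM API, and the helper only assembles query strings. It does not escape special characters or run a search.

```java
// Hypothetical helper for illustration; it only builds query strings.
final class TermQuerySketch {
    // Terms separated by spaces: objects matching ANY of the terms are returned.
    static String anyOf(String... terms) {
        return String.join(" ", terms);
    }

    // Terms joined with AND: objects must match ALL terms, in any order.
    static String allOf(String... terms) {
        return String.join(" AND ", terms);
    }

    // Terms wrapped in double quotes: they must appear as a phrase, in this order.
    static String phrase(String... terms) {
        return "\"" + String.join(" ", terms) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(anyOf("Acherontia", "atropos"));
        System.out.println(allOf("Acherontia", "atropos"));
        System.out.println(phrase("Acherontia", "atropos"));
    }
}
```

The resulting strings correspond to the three examples in the paragraph above and would be passed to whichever search method the application uses.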

To search a specific property, prefix the query with the name of the property followed by a colon (e.g. nameCache:"Acherontia atropos"). Properties of related entities can be searched too, provided that they have been indexed, using JavaBeans-like dot notation. For example, to return all references written by Schott you could use authorTeam.titleCache:Schott, and to return all publications from the 1940s you could use either datePublished.start:194* (a wildcard) or datePublished.start:[1940* TO 1949*] (a range).
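
Field-qualified queries follow a simple pattern: property path, colon, then the expression. The sketch below assembles the examples from this section. FieldQuerySketch and its methods are hypothetical names introduced for illustration. They are not part of the CDM or Lucene APIs, and they perform no escaping or validation of their arguments.

```java
// Hypothetical helper for illustration; it only builds query strings.
final class FieldQuerySketch {
    // property:expression, e.g. authorTeam.titleCache:Schott
    static String field(String property, String expression) {
        return property + ":" + expression;
    }

    // property:"some phrase", e.g. nameCache:"Acherontia atropos"
    static String phraseField(String property, String phrase) {
        return property + ":\"" + phrase + "\"";
    }

    // property:[from TO to], an inclusive range on a single property
    static String range(String property, String from, String to) {
        return property + ":[" + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        System.out.println(phraseField("nameCache", "Acherontia atropos"));
        // Dot notation reaches into an indexed property of a related entity.
        System.out.println(field("authorTeam.titleCache", "Schott"));
        System.out.println(field("datePublished.start", "194*"));
        System.out.println(range("datePublished.start", "1940*", "1949*"));
    }
}
```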