M5.21a Operational specimen web portal

Since 2001, GBIF (the Global Biodiversity Information Facility) has been networking worldwide biodiversity data and making them freely available on the Internet. Currently, more than 150 million records (from simple occurrence data to highly structured specimen observations) are indexed by GBIF and accessible through several web portals. Almost all records available via these systems include the Latin name of the organism, which is used as the primary search term.

The scientific names provided by the data sources may include synonyms. For the GBIF Index, these are mapped to accepted names during indexing, using checklists such as the Catalogue of Life (Froese and Bisby 2002). Users can only access records by searching for the accepted name or for a synonym referenced by the checklists used.

Search results could be significantly improved with a thesaurus system that allows for user-controlled expansion of query terms to synonyms, related taxa in the taxonomic hierarchy and related taxonomic concepts from additional checklists.

In the context of the SYNTHESYS project (Networking Activity D), checklist-driven access to European collection and observation data was developed in the form of the “TOTO prototype” (see http://search.biocase.org/toto). TOTO takes Latin names and expands them with related query terms from a taxonomic thesaurus (checklist). EDIT has taken TOTO and the BioCASE portals (see http://search.biocase.org), developed them further and integrated them into a search portal that provides a new GBIF data explorer for taxonomists. This new EDIT portal gives users full control of the query expansion process.

Implementation

The Botanic Garden and Botanical Museum Berlin-Dahlem is responsible for the implementation of an access interface tailored to taxonomists' needs, building on the specimen access software (interfaces and generic thesaurus access) developed by the BioCASE and SYNTHESYS projects.

Available in 11 languages, this new tool provides users with fast and easy-to-use access to worldwide biodiversity data, and offers full control over taxonomic query expansion. Users can choose which thesauri to use; choose to include or exclude types of relationships (synonyms, related taxa in the taxonomic hierarchy, and related taxonomic concepts such as misapplied names); and individually mark or unmark discovered “related” names for inclusion in the search.
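
The three levels of user control described above can be illustrated with a minimal Python sketch. It is not the portal's actual implementation: the function expand_query, the Relation types, and the placeholder names ("Aus bus", "Xus bus", "checklist_a" and so on) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    SYNONYM = "synonym"
    HIERARCHY = "related taxon in the taxonomic hierarchy"
    CONCEPT = "related taxonomic concept"


@dataclass(frozen=True)
class RelatedName:
    name: str
    relation: Relation


# Placeholder data, not real checklist content: thesaurus -> name -> related names.
THESAURI = {
    "checklist_a": {
        "Aus bus": [
            RelatedName("Xus bus", Relation.SYNONYM),
            RelatedName("Aus", Relation.HIERARCHY),
        ],
    },
    "checklist_b": {
        "Aus bus": [
            RelatedName("Aus cus", Relation.CONCEPT),
            RelatedName("Xus bus", Relation.SYNONYM),
        ],
    },
}


def expand_query(name, thesauri, allowed_relations, excluded_names=frozenset()):
    """Return the search terms for `name` after user-controlled expansion.

    thesauri          -- only the thesauri the user selected
    allowed_relations -- relationship types the user chose to include
    excluded_names    -- related names the user individually unmarked
    """
    terms = [name]
    for entries in thesauri.values():
        for related in entries.get(name, []):
            if related.relation not in allowed_relations:
                continue
            if related.name in excluded_names or related.name in terms:
                continue
            terms.append(related.name)
    return terms


if __name__ == "__main__":
    print(expand_query("Aus bus", THESAURI, {Relation.SYNONYM, Relation.CONCEPT}))
```

In this sketch, selecting a subset of thesauri means passing a smaller dictionary, and unmarking a suggested name means adding it to excluded_names before the search is run.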

The portal accepts one or more Latin names, suggests related query terms for both zoological and botanical data, expands the query accordingly and offers complete BioCASE portal functionality for the resulting specimen and observation data. It can be accessed at http://search.biocase.org/edit.

Currently (July 2008), the EDIT data explorer for taxonomists uses the following sources for query expansion:

  • “Euro+Med Plantbase” - an information resource for Euro-Mediterranean plant diversity
  • “European Register of Marine Species” - an authoritative taxonomic list of species occurring in the European marine environment
  • “Fauna Europaea” - a database of scientific names and distribution of all living multicellular European land and fresh-water animals
  • “German standard checklist” - the standard list of the ferns and flowering plants of Germany
  • “Reference list for the German Bryophytes”

Since a standard interface for taxonomic thesauri is used, this list can be expanded with relatively little effort.
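
Why a shared interface keeps the effort low can be sketched as follows. The Thesaurus protocol, the DictThesaurus adapter and the suggest function are hypothetical names chosen for this illustration and do not reflect the actual BioCASE or SYNTHESYS thesaurus interface; the data are placeholders.

```python
from typing import Protocol


class Thesaurus(Protocol):
    """Hypothetical common interface that every checklist adapter implements."""

    title: str

    def related_names(self, name: str) -> list[str]: ...


class DictThesaurus:
    """Toy adapter backed by an in-memory table (placeholder data only)."""

    def __init__(self, title, table):
        self.title = title
        self._table = table

    def related_names(self, name):
        return list(self._table.get(name, []))


def suggest(name, sources):
    """Collect related names per source; any object with the interface works."""
    return {source.title: source.related_names(name) for source in sources}


if __name__ == "__main__":
    sources = [
        DictThesaurus("checklist_a", {"Aus bus": ["Xus bus"]}),
        DictThesaurus("checklist_b", {}),
    ]
    print(suggest("Aus bus", sources))
```

Under this assumption, adding a further checklist amounts to writing one more adapter and appending it to the list of sources; the suggestion code itself stays unchanged.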

As a next step in the development driven by EDIT, users will have the option to save the full data received from their queries in a CDM (EDIT Common Data Model) database and thus use them in the context of the EDIT Platform for Cybertaxonomy.
