Settings for Searches in Chemical Data

General Search Mode

In the drop-down box Mode, you can choose the type of query for chemical substances: exact search, substructure search or similarity search. An exact structure search returns all molecules that are isomorphic with the query structure. A substructure search finds all molecules that contain the query structure as part of the molecule.
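
As a rough illustration of the difference between the two modes, the following Python sketch treats a molecule as a graph of element-labelled atoms joined by bonds. It is a toy under stated assumptions (hydrogens, bond orders and charges are ignored; the representation and all function names are hypothetical), not the algorithm used by the server. A substructure match is a mapping of query atoms onto distinct target atoms that preserves elements and bonds. An exact match is such a mapping between molecules with the same numbers of atoms and bonds.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy representation: one element symbol per atom, plus a set of
# bonds, each bond being a frozenset of two atom indices. Hydrogens, bond
# orders and charges are ignored in this sketch.
Molecule = Tuple[List[str], Set[FrozenSet[int]]]


def _extend(query: Molecule, target: Molecule, mapping: Dict[int, int]) -> bool:
    """Backtracking search for an element- and bond-preserving atom mapping."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    if len(mapping) == len(q_atoms):
        return True  # every query atom has been placed
    qi = len(mapping)  # query atoms are mapped in index order
    used = set(mapping.values())
    for ti, element in enumerate(t_atoms):
        if ti in used or element != q_atoms[qi]:
            continue
        # Each bond from qi to an already-mapped query atom must exist in the target.
        if all(frozenset((ti, mapping[qj])) in t_bonds
               for qj in mapping if frozenset((qi, qj)) in q_bonds):
            mapping[qi] = ti
            if _extend(query, target, mapping):
                return True
            del mapping[qi]  # undo and try the next candidate atom
    return False


def is_substructure(query: Molecule, target: Molecule) -> bool:
    return _extend(query, target, {})


def is_exact_match(query: Molecule, target: Molecule) -> bool:
    # With equal atom and bond counts, a bond-preserving injective mapping
    # is a bijection on atoms and bonds, i.e. an isomorphism.
    return (len(query[0]) == len(target[0])
            and len(query[1]) == len(target[1])
            and is_substructure(query, target))


ethanol: Molecule = (["C", "C", "O"], {frozenset({0, 1}), frozenset({1, 2})})
dimethyl_ether: Molecule = (["C", "O", "C"], {frozenset({0, 1}), frozenset({1, 2})})
c_o_fragment: Molecule = (["C", "O"], {frozenset({0, 1})})

print(is_substructure(c_o_fragment, ethanol))   # True
print(is_exact_match(ethanol, dimethyl_ether))  # False: same atoms, different connectivity
```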

When you issue a similarity query, the server calculates a so-called fingerprint of the structure you entered in order to perform a fuzzy search. The fingerprint is an attempt to capture the most salient features of a chemical compound; it reduces the arrangement of its atoms to a string of bits. The fingerprint of your compound is then compared with the fingerprints of the compounds stored in the database: for each pair of fingerprints, the number of bits set in both is counted and interpreted as a measure of similarity.
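
The server's actual fingerprint algorithm is not described here. As a hedged illustration only, the sketch below builds a simple hashed path fingerprint, one common kind of structural fingerprint. Every chain of up to three bonded atoms is written as a string of element symbols, and each such string is hashed onto one bit of a fixed-width bit string held in a Python integer. The function name, the 64-bit width and the path length are assumptions chosen for the example.

```python
import zlib
from typing import FrozenSet, List, Set


def path_fingerprint(atoms: List[str], bonds: Set[FrozenSet[int]],
                     n_bits: int = 64, max_atoms: int = 3) -> int:
    """Hypothetical hashed path fingerprint, returned as an integer bit set."""
    neighbours = {i: set() for i in range(len(atoms))}
    for bond in bonds:
        a, b = tuple(bond)
        neighbours[a].add(b)
        neighbours[b].add(a)

    features: Set[str] = set()

    def walk(path: List[int]) -> None:
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        features.add(min(forward, backward))  # same feature in either direction
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])

    fingerprint = 0
    for feature in features:
        # crc32 is deterministic across runs, unlike Python's salted hash().
        fingerprint |= 1 << (zlib.crc32(feature.encode()) % n_bits)
    return fingerprint


ethanol_fp = path_fingerprint(["C", "C", "O"], {frozenset({0, 1}), frozenset({1, 2})})
# Bits set; at most 5 here (features C, O, C-C, C-O, C-C-O; collisions may merge some).
print(bin(ethanol_fp).count("1"))
```

Because every feature of a fragment also occurs in any molecule containing it, the fragment's bits are a subset of the molecule's bits, which is why such fingerprints are also useful as a quick pre-filter.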

The Tanimoto coefficient is one way to normalize these bit counts. It is defined as C/(A+B-C), where A and B are the numbers of bits set in the fingerprints of the first and the second molecule, respectively, and C is the number of bits set in both fingerprints, i.e. the features the two molecules share. The result, the similarity, is a fraction between zero and one; larger values indicate greater similarity, and a value of one means that the two fingerprints are identical.
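
A direct transcription of the formula, assuming the fingerprints are stored as Python integers used as bit sets (an assumption for this sketch; the function names are hypothetical):

```python
def popcount(bits: int) -> int:
    """Number of bits set in a non-negative integer."""
    return bin(bits).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient C / (A + B - C) of two bit-set fingerprints."""
    a = popcount(fp_a)           # A: bits set in the first fingerprint
    b = popcount(fp_b)           # B: bits set in the second fingerprint
    c = popcount(fp_a & fp_b)    # C: bits set in both
    if a + b - c == 0:
        return 0.0  # both fingerprints empty; this convention is an assumption
    return c / (a + b - c)


print(tanimoto(0b1011, 0b0011))  # A=3, B=2, C=2 -> 2 / 3 = 0.666...
```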

Similarity Threshold

In the drop-down box Min Similarity, you can set a threshold for the similarity value. Coefficients below 0.6 generally indicate a rather poor degree of similarity, and structures with coefficients below 0.4 should not be considered similar at all. Because the number of results changes ever more steeply as the threshold approaches one, a more finely grained scale of settings is offered for thresholds between 0.9 and one.
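
The effect of the Min Similarity setting can be sketched as a filter over per-compound Tanimoto coefficients, with the remaining hits ranked by similarity. The in-memory dictionary of fingerprints and the function names are hypothetical, and the naive scan over every entry is only for illustration.

```python
from typing import Dict, List, Tuple


def tanimoto(fp_a: int, fp_b: int) -> float:
    both = bin(fp_a & fp_b).count("1")
    union = bin(fp_a).count("1") + bin(fp_b).count("1") - both
    return both / union if union else 0.0


def similarity_hits(query_fp: int, database: Dict[str, int],
                    min_similarity: float) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= min_similarity]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


toy_db = {"first": 0b1111, "second": 0b0111, "third": 0b0001}  # hypothetical fingerprints
print(similarity_hits(0b1111, toy_db, 0.7))  # [('first', 1.0), ('second', 0.75)]
```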

Restricting Molecules by Weight

The settings in the two drop-down menus Min mol.weight and Max mol.weight allow you to restrict results to chemical compounds that are heavier than a minimum weight, lighter than a maximum weight, or whose weight lies in the range between the two limits.
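
A sketch of how the two limits combine, assuming that either limit may be left unset (None here) and that both limits are inclusive. The function name and data layout are illustrative, and the molecular weights are approximate values in g/mol.

```python
from typing import List, Optional, Tuple


def filter_by_weight(compounds: List[Tuple[str, float]],
                     min_weight: Optional[float] = None,
                     max_weight: Optional[float] = None) -> List[Tuple[str, float]]:
    """Keep compounds whose weight lies within the (inclusive) limits that are set."""
    return [
        (name, weight) for name, weight in compounds
        if (min_weight is None or weight >= min_weight)
        and (max_weight is None or weight <= max_weight)
    ]


compounds = [("ethanol", 46.07), ("caffeine", 194.19), ("quercetin", 302.24)]
print(filter_by_weight(compounds, min_weight=100))                  # caffeine, quercetin
print(filter_by_weight(compounds, max_weight=200))                  # ethanol, caffeine
print(filter_by_weight(compounds, min_weight=100, max_weight=200))  # caffeine only
```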

Restricting Maximum Search Time

The field Max search time optionally limits the time the core search algorithm spends examining database entries; values between 45 seconds and 30 minutes are possible.
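
The way a time limit cuts a search short can be sketched as follows. The function, its parameters and the injectable clock (used so the behaviour can be tested without waiting) are hypothetical; only the range of 45 seconds to 30 minutes is taken from the text. The second return value tells whether every entry was examined before the deadline.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def timed_search(entries: Iterable[T], matches: Callable[[T], bool],
                 max_seconds: float = 45.0,
                 clock: Callable[[], float] = time.monotonic) -> Tuple[List[T], bool]:
    """Scan entries until the time budget runs out; report whether the scan finished."""
    if not 45 <= max_seconds <= 30 * 60:
        raise ValueError("max_seconds must lie between 45 seconds and 30 minutes")
    deadline = clock() + max_seconds
    results: List[T] = []
    for entry in entries:
        if clock() > deadline:
            return results, False  # budget exhausted: results may be incomplete
        if matches(entry):
            results.append(entry)
    return results, True


print(timed_search(range(10), lambda n: n % 2 == 0))  # ([0, 2, 4, 6, 8], True)
```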

Restricting the Number of Results

Using the field Max results, you can set the maximum number of chemical compounds returned by a database search; values between 50 and 8000 are possible. The setting (all) guarantees that all entries in the database are examined.

Note: The value entered in this field also determines the minimum number of records the search algorithm will examine. Searching stops as soon as the result set contains as many substances as given in the field Max results, or when all records have been examined. This means that searches with Max results set to a value other than (all) are not guaranteed to deliver exhaustive or reproducible results. Also note that the maximum number of results shown per page is still determined by the settings explained in Determining Layout and Size of Results Page, below.
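
The note above can be illustrated with a sketch in which the scan stops as soon as enough hits have been found. Here None stands in for the (all) setting, and the function name and return values are hypothetical. Scanning the same records in a different order yields a different result set, which is why a limited search is neither exhaustive nor necessarily reproducible.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def limited_search(entries: Iterable[T], matches: Callable[[T], bool],
                   max_results: Optional[int] = None) -> Tuple[List[T], int]:
    """Return the hits and the number of records examined before stopping."""
    if max_results is not None and not 50 <= max_results <= 8000:
        raise ValueError("max_results must lie between 50 and 8000, or be None for (all)")
    results: List[T] = []
    examined = 0
    for entry in entries:
        examined += 1
        if matches(entry):
            results.append(entry)
            if max_results is not None and len(results) >= max_results:
                break  # stop early: the remaining records are never looked at
    return results, examined


def divisible_by_3(n: int) -> bool:
    return n % 3 == 0


forward, seen = limited_search(range(1000), divisible_by_3, 50)
backward, _ = limited_search(reversed(range(1000)), divisible_by_3, 50)
everything, _ = limited_search(range(1000), divisible_by_3)
print(len(forward), seen)              # 50 148
print(set(forward) == set(backward))   # False: scan order changes the result set
print(len(everything))                 # 334
```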

Settings for Searches in Botanical Data

Restricting the Search to Certain Tribes

By choosing one of the tribes listed in the drop-down box labelled Tribus, you can restrict searches in the botanical data to plants whose accepted taxonomic status places them in that tribe. Synonyms of plant names that would place a plant in a different tribe are not considered. We regret that searches based solely on the tribe are not possible at this time.
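
The rules in this paragraph can be sketched as follows: a name, whether accepted or a synonym, is kept only if the accepted taxon it resolves to belongs to the chosen tribe, and a name fragment is required because searches on the tribe alone are not possible. The record layout, the function name and the plant and tribe names are invented placeholders, not real taxa or the application's data model, and plain substring matching stands in for the wildcard matching described below.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NameRecord:
    name: str           # the stored name (accepted name or synonym)
    accepted_name: str  # the accepted name this record resolves to


def search_names(records: List[NameRecord], accepted_tribe: Dict[str, str],
                 substring: str, tribe: Optional[str] = None) -> List[str]:
    if not substring:
        raise ValueError("searches based solely on the tribe are not possible")
    needle = substring.lower()
    hits = []
    for record in records:
        if needle not in record.name.lower():
            continue
        # Only the tribe of the accepted taxon counts; any tribe a synonym's
        # genus might suggest is ignored.
        if tribe is not None and accepted_tribe.get(record.accepted_name) != tribe:
            continue
        hits.append(record.name)
    return hits


# Invented placeholder names: "Otheria alba" is a synonym of "Examplea alba".
records = [
    NameRecord("Examplea alba", "Examplea alba"),
    NameRecord("Otheria alba", "Examplea alba"),
    NameRecord("Otheria nigra", "Otheria nigra"),
]
tribes = {"Examplea alba": "Tribe A", "Otheria nigra": "Tribe B"}
print(search_names(records, tribes, "otheria", tribe="Tribe B"))  # ['Otheria nigra']
```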

Restricting the Search to Certain Plant Names

The textbox Plant Name Substring allows you to select plants whose names (accepted names or synonyms) contain the fragment entered.

All search strings are processed as case-insensitive, normalized substring searches with wildcards. The two wildcard characters are the asterisk, *, which stands for any number of characters, and the question mark, ?, which stands for any single character. Case insensitivity and normalization mean that accented and upper-case letters are converted to unaccented lower-case letters, so that slightly differing spelling variants still match.

For example, when you enter the search string L*é, the application first normalizes it to l*e and then returns all plant names that contain, somewhere, the letter l or L, followed by zero or more characters, followed by any of e, E, é, ê and so on (for the search to be truly useful, you will probably have to be a tad more specific).
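
One way to reproduce this behaviour is sketched below in Python, under the assumption that normalization means Unicode decomposition followed by removal of combining accents and lower-casing (the application's actual normalization rules may differ). Both the search string and each plant name are normalized, and the wildcards are translated into a regular expression that may match anywhere in the name. The function names are hypothetical.

```python
import re
import unicodedata
from typing import List


def normalize(text: str) -> str:
    """Strip accents (via NFKD decomposition) and lower-case the text."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate * and ? into regex syntax; escape every other character."""
    parts = []
    for ch in normalize(pattern):
        if ch == "*":
            parts.append(".*")   # any number of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matching_names(pattern: str, names: List[str]) -> List[str]:
    regex = wildcard_to_regex(pattern)
    # search() rather than fullmatch(): the pattern may occur anywhere in the name.
    return [name for name in names if regex.search(normalize(name))]


print(normalize("L*é"))  # l*e
print(matching_names("L*é", ["Lippia alba", "Achillea millefolium", "Senecio vulgaris"]))
# ['Achillea millefolium']
```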

At present, since all plant names start with the name of their genus, you can also search for a genus by entering it in the Plant Name Substring textbox.

Settings Concerning Result Display

Determining Layout and Size of Results Page

The settings Results per row and Rows per page determine the width and height of the grid that holds the chemical substances returned by the query. If your screen is 1024 pixels wide, you can probably choose four results per row. Note that a page never shows more than 200 chemical compounds, regardless of the preferences entered here.
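
The interplay of the two settings with the fixed cap of 200 compounds per page amounts to a small calculation, sketched here with a hypothetical helper that also counts how many pages a result set would fill (the paging rule itself is an assumption):

```python
import math
from typing import Tuple

MAX_PAGE_SIZE = 200  # per-page cap stated in the text


def page_layout(results_per_row: int, rows_per_page: int,
                total_results: int) -> Tuple[int, int]:
    """Return (compounds per page, number of pages) for a result set."""
    page_size = min(results_per_row * rows_per_page, MAX_PAGE_SIZE)
    pages = math.ceil(total_results / page_size) if total_results else 0
    return page_size, pages


print(page_layout(4, 10, 95))     # (40, 3)
print(page_layout(10, 50, 1000))  # (200, 5): 500 per page requested, capped at 200
```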

Determining the Looks of Chemical Substances

The setting Image Model gives you a choice between the more parsimonious wireframe and a fancy ball-and-stick rendering of chemical compounds.



© Freie Universität Berlin, Botanischer Garten und Botanisches Museum Berlin-Dahlem,
Page editor: W. Berendsohn, Contact: bohlmann@bgbm.org.

This page last updated on 13-08-2003