Dallwitz, M.J. 2000 onwards. A comparison of interactive identification programs. http://delta-intkey.com



A Comparison of Interactive Identification Programs

7 April 2007

M. J. Dallwitz

Contents

Introduction

Programs Compared

Comparative Table — General Features

Comparative Table — User Interface

Notes

     Advantages over conventional keys

     Guidance in character selection

     Recording and matching character values

     Subsets

     Character interpretation

     Images and sounds

     Linked keys

     Information retrieval

     Data sharing

     Usability

     Details of user interface

Acknowledgements

References and Citation

Introduction

This is an updated version of a paper presented at the Symposium on Interactive Identification Systems at the International Botanical Congress in August 1999. The programs compared are those that were presented at the Symposium. For contact information for other interactive identification programs, see Dallwitz (1996). The versions originally tested were the latest ones publicly available at the time of the Congress, or ones recommended by the authors. Later versions of some of the programs have since been evaluated — see details below.

Comments on the criteria used, or on the evaluation of particular programs, are welcome. They will be incorporated in future versions of this document, either directly or by modifying the criteria or evaluations.

For more details about the criteria, see Dallwitz, Paine and Zurcher (1998, 2000).

When testing an interactive-key program, it is tempting to pretend to do an identification, that is, to go through the motions of an identification when you already know the answer. This is not generally a good test of the value of a program for real identification. Ideally, the test should be carried out with taxa with which you are unfamiliar (the less familiar, the better: for example, taxa in a kingdom other than your speciality). If this is not possible, pay special attention to the easy availability of features which help an inexperienced user: ‘best’ characters, character illustrations and notes, error tolerance, and expressing uncertainty by selecting more than one state.

Programs Compared

Intkey

Free. Windows 95/NT or later. Reviewed August 2000. Version tested: 5.10, August 2000. http://delta-intkey.com.

IdentifyIt

Commercial. Macintosh, Windows. Reviewed July 1999. Version tested: ‘Euphausiids of the World Ocean’, Macintosh version, January 1999. http://www.eti.uva.nl.

Lucid

Commercial. Windows. Reviewed March 2000. Version tested: 1.5 Build 18, September 1999. http://lucidcentral.org.

For a critique of this document by K. Thiele (one of the authors of Lucid), see http://delta-intkey.com/www/thiele.htm.

MEKA

Free. Windows. Reviewed July 1999. Version tested: 3.00, January 1996. http://ucjeps.berkeley.edu/meacham/meka/.

NaviKey

Free. WWW. Reviewed July 1999. Version tested: 2.2, July 1999. Internet Archive: http://www.herbaria.harvard.edu/computerlab/web_keys/navikey/. For later versions, see http://delta-intkey.com/www/idprogs.htm.

PollyClave

Free. WWW. Reviewed July 1999. http://prod.library.utoronto.ca:8090/polyclave/.

XID

Commercial. Windows, Macintosh. Reviewed July 1999. Version tested: Demonstration Version 1.31D, January 1999. http://www.xidservices.com.

Comparative Table — General Features

Notes

Each entry in the first column of the table is explained in the Notes section below.

Importance of the feature

+++            Essential for correct identification.
++             Very important for identification.
+              Important for identification.
No ‘+’ or ‘–’  Useful for identification or for other purposes.
–              Undesirable for identification.

Scoring

Y      The implementation of the feature is satisfactory.
y      The implementation of the feature is unsatisfactory (incomplete, inconvenient, or error prone). See Notes for details.
blank  The feature is not implemented.
?      Not known.
*      See Notes.
–      Undesirable for identification.
A      In Intkey, normally available only in Advanced mode. (Any feature can be made available in Normal mode, at the discretion of the author of a data set.)

Weighted count of features

The weighted count of features in the first line of the table is calculated as follows. The basic count depends on the importance of the feature: ‘+++’ counts 16, ‘++’ counts 8, ‘+’ counts 4, ‘–’ counts 0, and other features count 2. The basic count is halved if the implementation of the feature is unsatisfactory (for example, an unsatisfactory implementation of a feature marked ‘++’ would be counted as 4).
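The calculation can be sketched in a few lines of Python. This is an illustration only: the function and variable names are hypothetical, '-' stands in for the ‘–’ rating, features that are not implemented are assumed to contribute nothing, and each cell is assumed to have already been reduced to ‘Y’, ‘y’ or blank (the text does not say how cells marked only ‘*’ or ‘?’ are counted).

```python
# Basic counts by importance, as given in the text. '-' stands for the '–'
# (undesirable) rating; features with no rating count 2.
BASIC_COUNT = {"+++": 16, "++": 8, "+": 4, "-": 0}
UNRATED_COUNT = 2


def feature_count(importance: str, score: str) -> int:
    """Contribution of one feature to a program's weighted count.

    importance: '+++', '++', '+', '-' or '' (unrated).
    score: 'Y' (satisfactory), 'y' (unsatisfactory) or '' (not implemented).
    """
    if score not in ("Y", "y"):
        return 0  # assumed: a feature that is not implemented counts nothing
    basic = BASIC_COUNT.get(importance, UNRATED_COUNT)
    # An unsatisfactory implementation earns half the basic count.
    return basic // 2 if score == "y" else basic


def weighted_count(features):
    """Sum the contributions of (importance, score) pairs."""
    return sum(feature_count(imp, score) for imp, score in features)


# Hypothetical scores for one program over six features.
example = [("++", "Y"), ("++", "y"), ("+", "Y"), ("", "y"), ("-", "Y"), ("+++", "")]
print(weighted_count(example))  # 8 + 4 + 4 + 1 + 0 + 0 = 17
```

The second pair reproduces the example in the text: an unsatisfactory ‘++’ feature contributes 4.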

 

 

Feature                               Intkey     IdentifyIt Lucid      MEKA       NaviKey    PollyClave XID
Weighted count of features (max. 236) 217        56         133        62         71         92         77

Advantages over conventional keys
Unrestricted character use ++         Y          Y          Y          Y          Y          Y          Y
Character deletion and changing +     Y          Y          Y*         Y          Y          Y          Y
Error tolerance ++                    Y*                    Y*
Locating errors +                     Y                     *
Expressing uncertainty ++             Y          *          Y*         y*                    Y          y*
Numeric characters ++                 Y          *          Y                     Y          Y

Guidance in character selection
Best characters ++                    Y          Y          y*                               y*         Y
Separating a taxon +                  YA                    *                                          y*
Best routes                                                 Y
Differentiating attributes                                  y –*       Y –*
Removing redundant characters         Y*         Y*         Y*         Y*         Y*                    Y*
Removing redundant character states                         y –*                  y –*                  y –*
Character reliabilities +             Y                                                      Y          Y
Attribute reliabilities +             *
Searching the character list +        Y                     Y                                y*

Recording and matching character values
Retaining unknowns +++                Y          *          Y          Y          Y          Y          y*
Character dependencies +              Y                     y*
Automatic controlling characters +    Y*
Gaps for integer numerics +           Y                     Y                     Y          Y
Text characters                       Y                                          y*         y*
Special values for keys +             *                     y*
Probabilistic identification +                              *
Inapplicable versus unknown           Y                                          y*         Y
Expanded ranges for numerics          y*                                                     y*
Unknown state values                                        Y          Y
Exact characters                      YA
Fixing character values               YA                    Y

Subsets
Named subsets of characters +         Y*                    Y*         Y*                               Y*
Global subsets of characters +        Y*                    Y*         Y*
Local subsets of characters           YA                                                              y*
Named subsets of taxa +               Y
Global subsets of taxa +              Y*
Local subsets of taxa                 YA

Character interpretation
Character notes +                     Y*         y*         Y*                                          Y*
Glossaries +                                     *
Character illustrations +             Y*         y*         y*                               Y*         Y*
State selection from illustrations +  Y*         y*         y*

Images and sounds
Taxon illustrations +                 Y          Y          Y          Y          Y          y*         y*
Taxon illustrations by subject +      Y*         Y*         Y*
Flexible display of illustrations +   Y          *          *          *          *          *          *
Text with illustrations               Y                     Y                                          Y
Sounds                                Y          Y          Y
Videos                                Y          Y          Y
Running without illustrations         Y*         *          *          *          *          *          *

Linked keys
Integral hierarchical keys            Y
Separate hierarchical keys            Y                     Y

Information retrieval
Searching the taxon names             Y                     y*                               y*
Descriptions from the data            Y          Y          Y                     Y          Y          Y
Differences between taxa +            Y*         y*         y*
Similarities between taxa             YA*        y*         y*
Diagnostic descriptions +             Y*
Taxon retrieval by attributes         Y                     y*         *          *          *          *
Control of value matching             YA*                   y*         y*                               y*
Character-value distributions         YA                                                              y*
Most similar taxa                     *                     Y
Text files attached to taxa           Y          Y          Y

Data sharing
Importing DELTA format                Y                     y*                    Y          Y
Exporting DELTA format                Y                     y*                    Y          Y
Data output                           YA*
Links with description writing        Y                                          Y          Y
Links with key generation             Y                                          Y          Y
Links with classification             Y                                          Y          Y

Usability
Online help                           Y          y*         y*                    y*         y*         Y
Command files or macros               Y                     y*         y*
User-definable toolbar +              Y
External program text                 Y                     Y
Log files                             YA                    *
Unlimited data size +                 Y          y?*        Y?         Y?         Y?         Y?         Y
Unlimited field lengths               Y*         *          y*         *          Y*         Y*         Y
No special memory requirements        Y          Y          Y          Y          Y          Y          Y
Fast execution +                      Y*         Y*         *          Y*         *          *          Y*
Internet capability                   Y*                                          Y*         Y*
Installation unnecessary              Y                                Y                                Y

Comparative Table — User Interface

The table shows the number of operations (mainly mouse clicks or double clicks) required for components of an identification. Full details of the operations are given in the Notes (see ‘Details of user interface’). Separate figures are given for using text only (T) and using illustrations (I). The use of illustrations is particularly important for users who are unfamiliar with the group treated in the key.

In some cases, the number of operations depends on the circumstances. For example, when character states are individually illustrated, it may not be necessary to view all the illustrations. In these cases, the number is given as a range. When changing values of a character, it may be necessary to find the required character in a list. The number of operations required is indeterminate, but in large data sets it would usually be quite large. For simplicity, it has been scored by adding 1 to the upper end of the range, and appending ‘+’.

For a comparison of the main screens of Intkey and Lucid, see Dallwitz (2000).

 

 

                            Intkey  IdentifyIt      Lucid           MEKA    NaviKey PollyClave      XID
                            T or I  T       I       T       I       T       T       T       I       T       I
Start a new identification  1       2       2       2       2       1       1       1       1       1       1
Basic cycle with ‘best’     2       3       4–8     3       3–7                     4       6       3       6–9
Use a multistate character  2       3       4–10    2       3–9     1       2       4       6       3       6–12
Use a numeric character     3                       4       5–7             3       4       6
Use multiple state values   3                       3       5–13    4               5       7       6       9–18
Change state values         3       5–6+    6–11+   2–4+    (4–9+)  3–4+    4+      5–6+    7–8+    4–5+    7–11+
Display character notes     1       1–6             3–6     2–6                                     2–9     0

Notes

Advantages over conventional keys

Unrestricted character use

No restrictions on the order in which characters can be used (apart from restrictions imposed by ‘Character dependencies’ — see below).

Intkey. Yes. IdentifyIt. Yes. Lucid. Yes. MEKA. Yes. NaviKey. Yes. PollyClave. Yes. XID. Yes.

Character deletion and changing

Removing characters used in an identification, or changing their values.

Intkey. Yes. IdentifyIt. Yes.

Lucid. Yes, but not when using the ‘character-graphics’ window — see ‘Change state values’ below.

MEKA. Yes. NaviKey. Yes. PollyClave. No. XID. Yes.

Error tolerance

The ability to reach a correct identification after errors have been made, or if there are errors in the data.

Intkey. The error tolerance can be set manually or automatically (when no taxa match the specimen).

IdentifyIt. No.

Lucid. The error tolerance can only be set manually.

MEKA. No. NaviKey. No. PollyClave. No. XID. No.
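A minimal sketch of how an error tolerance might be applied, assuming each description maps character names to sets of states. A taxon stays in contention if the number of characters on which it conflicts with the specimen does not exceed the tolerance; the ‘automatic’ variant raises the tolerance one step at a time until some taxon remains. The names, the data and the one-step increment are hypothetical; this is not Intkey's actual code.

```python
from typing import Dict, List, Set, Tuple

# character name -> set of recorded states (hypothetical representation)
Description = Dict[str, Set[str]]


def mismatches(taxon: Description, specimen: Description) -> int:
    """Count characters on which the specimen conflicts with the taxon.

    A character that is not recorded for the taxon never counts as a mismatch.
    """
    return sum(
        1
        for char, states in specimen.items()
        if taxon.get(char) and not (taxon[char] & states)
    )


def remaining(taxa: Dict[str, Description], specimen: Description, tolerance: int) -> List[str]:
    """Taxa with no more than `tolerance` mismatches."""
    return [name for name, desc in taxa.items() if mismatches(desc, specimen) <= tolerance]


def remaining_auto(taxa, specimen, tolerance: int = 0) -> Tuple[List[str], int]:
    """Raise the tolerance automatically until at least one taxon matches."""
    while True:
        names = remaining(taxa, specimen, tolerance)
        # A tolerance equal to the number of characters used admits every taxon.
        if names or tolerance >= len(specimen):
            return names, tolerance
        tolerance += 1


taxa = {
    "A": {"flower colour": {"red"}, "leaf shape": {"ovate"}},
    "B": {"flower colour": {"white"}, "leaf shape": {"linear"}},
    "C": {"flower colour": {"red", "white"}, "leaf shape": {"linear"}},
    "D": {"flower colour": {"red"}, "leaf shape": {"linear"}},
}
specimen = {"flower colour": {"white"}, "leaf shape": {"ovate"}}
print(remaining(taxa, specimen, 0))     # []
print(remaining_auto(taxa, specimen))   # (['A', 'B', 'C'], 1)
```

With no tolerance every taxon is eliminated; allowing one error keeps the three taxa that differ from the specimen in a single character.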

Locating errors

When the error tolerance is non-zero, a taxon that differs from the specimen can remain in contention in an identification. The differences may be due to errors by the user, errors in the data, or both. The program should be able to display these differences.

Intkey. Yes. IdentifyIt. No.

Lucid. No. There is no direct way to find the differences between the specimen and a taxon that is still in the ‘taxa remaining’ window. It can be done indirectly by reducing the error tolerance to 0, selecting the required taxon in the ‘taxa discarded’ window (where it can be difficult to find because all the taxa are usually there), and pressing ‘why discarded?’.

MEKA. No.

NaviKey. No. PollyClave. No. XID. No.
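Assuming each description maps character names to sets of states, displaying the differences amounts to listing the characters on which the specimen and a remaining taxon conflict. A sketch, with hypothetical names and data:

```python
from typing import Dict, Set, Tuple

Description = Dict[str, Set[str]]  # character name -> recorded states


def differences(taxon: Description, specimen: Description) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Map each conflicting character to (specimen states, taxon states).

    Characters not recorded for the taxon are skipped, since they cannot conflict.
    """
    return {
        char: (states, taxon[char])
        for char, states in specimen.items()
        if taxon.get(char) and not (taxon[char] & states)
    }


taxon_a = {"flower colour": {"red"}, "leaf shape": {"ovate"}, "habit": {"tree"}}
specimen = {"flower colour": {"white"}, "leaf shape": {"ovate"}, "petals": {"5"}}
for char, (seen, recorded) in differences(taxon_a, specimen).items():
    print(f"{char}: specimen {sorted(seen)}, taxon {sorted(recorded)}")
# flower colour: specimen ['white'], taxon ['red']
```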

Expressing uncertainty

The user can specify uncertainty by entering more than one state value, or a range of numeric values.

Intkey. Yes.

IdentifyIt. No. Selected states are combined with ‘and’. Users need to be aware that selecting more than one state will usually lead to an incorrect identification.

Lucid. Yes. Difficult when using character images (a non-default setting is required, and the selected states cannot be seen).

MEKA. Yes. The user has the option of marking states ‘present’, ‘absent’, ‘or-present’, or ‘or-absent’. These markings are then combined with ‘and’. To express uncertainty, the user must press ‘Begin or’, mark states ‘present’ or ‘absent’ as required, then press ‘End or’. Multiple states sometimes cannot be selected from the ‘subset’ windows (because states are removed from these windows if they do not discriminate between the remaining taxa).

NaviKey. No. PollyClave. Yes.

XID. Yes. The user has the option of marking states ‘yes’, ‘no’, or ‘or’. These markings are then combined with ‘and’. To express uncertainty, the user must mark states with ‘or’. Users need to be aware that marking more than one state with ‘yes’ will usually lead to an incorrect identification.
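The difference between combining selected states with ‘or’ and with ‘and’ can be shown with two one-line tests. In this sketch (hypothetical names), a user who is unsure whether the flowers are red or purple selects both states. A taxon recorded only as red survives under ‘or’ but is eliminated under ‘and’.

```python
from typing import Set


def matches_or(taxon_states: Set[str], selected: Set[str]) -> bool:
    """Selected states are alternatives: the taxon needs only one of them."""
    return bool(taxon_states & selected)


def matches_and(taxon_states: Set[str], selected: Set[str]) -> bool:
    """Selected states are all required: the taxon must exhibit every one."""
    return selected <= taxon_states


recorded = {"red"}               # the taxon's recorded flower colour
uncertain = {"red", "purple"}    # the user cannot decide between the two
print(matches_or(recorded, uncertain))   # True: taxon retained
print(matches_and(recorded, uncertain))  # False: taxon wrongly eliminated
```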

Numeric characters

Using numeric characters directly (without converting to multistate by dividing the values into ranges).

Intkey. Yes.

IdentifyIt. No. Numerical values are stored as multistate characters, although in the identification process the user may enter a numerical value, and the program selects the corresponding state.

Lucid. Yes. MEKA. No. NaviKey. Yes. PollyClave. Yes. XID. No.

Guidance in character selection

Best characters

Advice on the most suitable characters for use at any stage of an identification. It is important that numeric characters be included, as they tend to be better than multistate characters.

Intkey. Yes. IdentifyIt. Yes.

Lucid. Yes. Numeric characters are not included. There is no indication of which characters do not have any separating power (although these can be removed with a separate command; see ‘Removing redundant characters’ below). Also, the computation is prohibitively slow for moderately sized data sets (a few hundred characters and taxa) unless only a small number of taxa remain (see ‘Fast execution’ below).

MEKA. No. NaviKey. No.

PollyClave. Yes. There is no indication of which characters do not have any separating power.

XID. Yes.
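The notes do not specify how each program computes its ‘best’ characters. The following Python sketch uses a simple stand-in measure, not the coefficient used by Intkey or any other program: the mean fraction of remaining taxa that a character would eliminate, assuming each of its observed states is equally likely. Taxa for which the character is not recorded are retained whatever state is chosen. A character scoring 0 cannot separate the remaining taxa, so it is dropped. Numeric characters are omitted for brevity, and all names and data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set

Description = Dict[str, Set[str]]  # character name -> recorded states


def separating_power(char: str, taxa: Dict[str, Description]) -> float:
    """Mean fraction of taxa eliminated, averaged over the character's states."""
    states = set().union(*(d[char] for d in taxa.values() if d.get(char)))
    if not states:
        return 0.0  # character not recorded for any remaining taxon

    def kept(state: str) -> int:
        # Unrecorded (unknown) taxa are retained, whatever the state.
        return sum(1 for d in taxa.values() if not d.get(char) or state in d[char])

    return mean(1 - kept(s) / len(taxa) for s in states)


def best_characters(taxa: Dict[str, Description], characters: List[str]) -> List[str]:
    """Characters ordered from most to least useful; useless ones removed."""
    scores = {c: separating_power(c, taxa) for c in characters}
    useful = [c for c in characters if scores[c] > 0]
    return sorted(useful, key=lambda c: scores[c], reverse=True)


taxa = {
    "A": {"flower colour": {"red"}, "leaf shape": {"ovate"}, "habit": {"tree"}},
    "B": {"flower colour": {"white"}, "leaf shape": {"ovate"}, "habit": {"tree"}},
    "C": {"flower colour": {"yellow"}, "leaf shape": {"linear"}, "habit": {"tree"}},
    "D": {"flower colour": {"red"}, "leaf shape": {"ovate"}, "habit": {"tree"}},
}
print(best_characters(taxa, ["habit", "leaf shape", "flower colour"]))
# ['flower colour', 'leaf shape']  ('habit' cannot separate these taxa)
```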

Separating a taxon

Ranking characters according to how well they separate a given taxon from the rest. This is useful for confirming a tentative identification.

Intkey. Yes. IdentifyIt. No.

Lucid. No. The ‘Diagnose’ option searches only for single attributes that are diagnostic for any of the remaining taxa.

MEKA. No. NaviKey. No. PollyClave. No.

XID. Yes, but the characters cannot be directly used from the list.
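Ranking characters for separating one taxon from the rest can be sketched by counting, for each character, how many other taxa have recorded states that do not overlap those of the target taxon. The measure, names and data are illustrative assumptions. In this representation, unrecorded values never count as a separation.

```python
from typing import Dict, List, Set, Tuple

Description = Dict[str, Set[str]]  # character name -> recorded states


def separation_ranking(
    target: str, taxa: Dict[str, Description], characters: List[str]
) -> List[Tuple[str, int]]:
    """(character, number of other taxa it separates from `target`), best first."""
    own = taxa[target]

    def separated(char: str) -> int:
        if not own.get(char):
            return 0  # nothing recorded for the target taxon
        return sum(
            1
            for name, d in taxa.items()
            if name != target and d.get(char) and not (d[char] & own[char])
        )

    counts = [(c, separated(c)) for c in characters]
    return sorted((pair for pair in counts if pair[1] > 0), key=lambda p: p[1], reverse=True)


taxa = {
    "A": {"flower colour": {"red"}, "leaf shape": {"ovate"}},
    "B": {"flower colour": {"white"}, "leaf shape": {"ovate"}},
    "C": {"flower colour": {"yellow"}, "leaf shape": {"linear"}},
    "D": {"flower colour": {"red"}, "leaf shape": {"ovate"}, "habit": {"shrub"}},
}
print(separation_ranking("B", taxa, ["leaf shape", "habit", "flower colour"]))
# [('flower colour', 3), ('leaf shape', 1)]
```

Unlike the single-attribute search described for Lucid, every character is ranked, so a character that separates the target from most (but not all) of the other taxa still appears.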

Best routes

Paths, similar to conventional keys, embedded in the interactive key. A path may be followed from the start of an identification; after it is left, ordinary interactive identification is resumed. This provides some guidance in the choice of characters. The method is inherently much less flexible than ‘Best characters’ (see above), and this limits its usefulness.

Intkey. No. IdentifyIt. No. Lucid. Yes. MEKA. No. NaviKey. No. PollyClave. No. XID. No.

Differentiating attributes

Attributes that are exhibited by only a small number of the remaining taxa. This is not a desirable feature for identification, but it is included here because it is implemented in some programs. These characters tend to be the worst to use in identification, apart from those that do not differentiate the taxa at all.

Intkey. No. IdentifyIt. No.

Lucid. Yes. Attributes are listed only if exhibited by only one of the remaining taxa.

MEKA. Yes. Attributes are sorted by the number of remaining taxa that exhibit the attribute.

NaviKey. No. PollyClave. No. XID. No.

Removing redundant characters

Removing from the list of available characters those that cannot separate the remaining taxa in an identification.

Intkey. Yes. This is done as part of the ‘Best characters’ computation (see above).

IdentifyIt. Yes. This is done as part of the ‘Best characters’ computation (see above). The characters are not actually removed, but the separating power is shown as 0.

Lucid. Yes. However, the computation is prohibitively slow for moderately sized data sets (a few hundred characters and taxa) unless only a small number of taxa remain. Furthermore, there is no way of aborting the calculation once it has started. The implementation is scored ‘satisfactory’, because execution speed has been scored separately (see ‘Fast execution’ below).

MEKA. Yes. This is done as part of the ‘Differentiating attributes’ computation.

NaviKey. Yes. The computation is rather slow (see ‘Fast execution’ below).

PollyClave. No.

XID. Yes. This is done as part of the ‘Best characters’ computation (see above).

Removing redundant character states

This is a way of preventing the selection of character states that are not exhibited by any of the remaining taxa in an identification. It can be done either by removing the states entirely from the display, or by greying them (which is preferable, as the user is then aware of the situation). Numeric characters can be treated similarly, by not allowing the entry of values not exhibited by the remaining taxa. This is not a desirable feature for identification, but it is included here because it is implemented in some programs. The feature encourages the user to enter values consistent with the values of previously used characters, thereby hindering the detection and correction of errors.

Intkey. No. IdentifyIt. No.

Lucid. Yes. This is combined with the option for ‘Removing redundant characters’ (see above).

MEKA. No.

NaviKey. Yes. This is combined with the option for ‘Removing redundant characters’ (see above).

PollyClave. No.

XID. Yes. This is always done automatically.

Character reliabilities

The ‘reliability’ of a character is a subjective measure, usually supplied by the author, of the character’s accuracy and/or ease of use. The ‘Best characters’ algorithm (see above) should take into account both the separating power of a character, and its reliability. This is especially important with large character lists, so that the user does not have to skip over large numbers of inconvenient or unreliable characters when choosing a character.

Intkey. Yes. IdentifyIt. No. Lucid. No. MEKA. No. NaviKey. No. PollyClave. Yes. XID. Yes.
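One simple way of combining separating power with reliability is to multiply them, so that a powerful but hard-to-use character can fall below a less powerful but reliable one. The 0–10 reliability scale, the default value and the multiplicative combination are all assumptions made for this sketch; Intkey's own weighting may differ.

```python
from typing import Dict, List


def rank_with_reliability(
    powers: Dict[str, float],
    reliabilities: Dict[str, float],
    default_reliability: float = 5.0,
) -> List[str]:
    """Order characters by separating power weighted by reliability.

    powers: character -> separating power in [0, 1], computed elsewhere.
    reliabilities: character -> author-supplied reliability on a 0-10 scale.
    """

    def score(char: str) -> float:
        return powers[char] * reliabilities.get(char, default_reliability) / 10

    return sorted((c for c in powers if score(c) > 0), key=score, reverse=True)


powers = {"flower colour": 0.67, "leaf shape": 0.5, "stamen number": 0.8}
reliabilities = {"flower colour": 9, "leaf shape": 8, "stamen number": 3}
print(rank_with_reliability(powers, reliabilities))
# ['flower colour', 'leaf shape', 'stamen number']
```

Ranked by separating power alone, ‘stamen number’ would come first; its low reliability moves it to the end of the list.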

Attribute reliabilities

An ‘attribute’ is the value or values of a character for a particular taxon. The ‘reliability’ of an attribute is a subjective measure, supplied by the author, of the attribute’s accuracy and/or ease of use. It is recorded as an increase or decrease in the overall or average reliability of the character (see ‘Character reliabilities’ above). Attribute reliabilities allow better performance of the ‘Best characters’ algorithm (see above).

Intkey. Not yet implemented. See Dallwitz, Paine and Zurcher (1993b, 1993c) for proposals for storing this information in DELTA format.

IdentifyIt. No. Lucid. No. MEKA. No. NaviKey. No. PollyClave. No. XID. No.

Searching the character list

Finding text strings in the character list.

Intkey. Yes. IdentifyIt. No. Lucid. Yes. MEKA. No. NaviKey. No.

PollyClave. Yes, via the Web browser.

XID. No.

Recording and matching character values

Retaining unknowns

Taxa for which a character is not recorded are retained when that character is used (with any value) in an identification.

Intkey. Yes.

IdentifyIt. No. ‘Unknown’ is treated as an additional state of each character. Missing values in the database will lead to incorrect identifications.

Lucid. Yes. MEKA. Yes. NaviKey. Yes. PollyClave. Yes.

XID. No. There is no distinction between ‘unknown’ and ‘inapplicable’. Missing values are designated ‘not specified’, and taxa for which a character is ‘not specified’ are eliminated if the character is scored ‘YES’ or ‘OR’. The author of XID recommends that database authors should treat ‘not specified’ as ‘inapplicable’, and code ‘all states present’ if the values are unknown. Treating ‘not specified’ as ‘unknown’ will lead to incorrect identifications.
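The consequence of treating ‘unknown’ as an ordinary state can be seen in a small sketch (hypothetical names and data). Taxon B has no recorded flower colour. When unknowns are retained, B survives the selection of ‘red’. When ‘unknown’ is a state of its own, B is eliminated even though it may well have red flowers.

```python
from typing import Callable, Dict, List, Optional, Set

States = Optional[Set[str]]  # None means the character is not recorded


def keep_retaining_unknowns(recorded: States, selected: Set[str]) -> bool:
    # An unrecorded character cannot rule the taxon out.
    return recorded is None or bool(recorded & selected)


def keep_unknown_as_state(recorded: States, selected: Set[str]) -> bool:
    # 'unknown' treated as just another state, which the user never selects.
    return bool((recorded if recorded is not None else {"unknown"}) & selected)


def remaining(
    taxa: Dict[str, Dict[str, Set[str]]],
    char: str,
    selected: Set[str],
    keep: Callable[[States, Set[str]], bool],
) -> List[str]:
    return [name for name, desc in taxa.items() if keep(desc.get(char), selected)]


taxa = {
    "A": {"flower colour": {"red"}},
    "B": {},  # flower colour not recorded
    "C": {"flower colour": {"white"}},
}
print(remaining(taxa, "flower colour", {"red"}, keep_retaining_unknowns))  # ['A', 'B']
print(remaining(taxa, "flower colour", {"red"}, keep_unknown_as_state))    # ['A']
```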

Character dependencies

Relationships specifying that some characters are inapplicable when other characters take certain values.

Intkey. Yes. IdentifyIt. No.

Lucid. Yes. By default, inapplicable characters are automatically removed from the ‘available’ list. However, dependency relationships in the information entered by the user are not checked, and this can lead to incorrect results. (Also, the Lucid Builder does not check the data for consistency in relation to the dependencies.)

MEKA. No. NaviKey. No. PollyClave. No. XID. No.

Automatic controlling characters

Automatically setting controlling characters to the appropriate value(s) when dependent characters are used. The author must have the option of overriding the automatic setting in the (rare) cases when it would lead to incorrect or inconvenient results. In these cases, the user should be required to use the controlling character before the dependent character.

Intkey. Yes. IdentifyIt. No. Lucid. No. MEKA. No. NaviKey. No. PollyClave. No. XID. No.
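A minimal sketch of both mechanisms, using a hypothetical rule that ‘leaf shape’ and ‘leaf margin’ are inapplicable when ‘leaves’ is ‘absent’. Using a dependent character automatically sets the controlling character to the states that make it applicable. An attempt to use a character that the values already entered have made inapplicable is rejected: this is the consistency check that the note on Lucid finds missing. The rule table and all names are assumptions, and the author override described above is not modelled.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical rule table: (controlling character, states) -> dependent characters
# that are inapplicable when the controlling character takes only those states.
DEPENDENCIES: Dict[Tuple[str, FrozenSet[str]], Set[str]] = {
    ("leaves", frozenset({"absent"})): {"leaf shape", "leaf margin"},
}
ALL_STATES = {"leaves": {"present", "absent"}}


def inapplicable(specimen: Dict[str, Set[str]]) -> Set[str]:
    """Characters made inapplicable by the values already entered."""
    out: Set[str] = set()
    for (ctrl, states), dependents in DEPENDENCIES.items():
        if ctrl in specimen and specimen[ctrl] <= states:
            out |= dependents
    return out


def use_character(specimen: Dict[str, Set[str]], char: str, states: Set[str]) -> Dict[str, Set[str]]:
    """Record a value, setting any unset controlling character automatically."""
    if char in inapplicable(specimen):
        raise ValueError(f"{char!r} is inapplicable given the values already entered")
    updated = dict(specimen)
    updated[char] = set(states)
    for (ctrl, ctrl_states), dependents in DEPENDENCIES.items():
        if char in dependents and ctrl not in updated:
            # Only the states that keep `char` applicable remain possible.
            updated[ctrl] = ALL_STATES[ctrl] - ctrl_states
    return updated


print(use_character({}, "leaf shape", {"ovate"}))
# {'leaf shape': {'ovate'}, 'leaves': {'present'}}
```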

Gaps for integer numerics

The possibility of gaps in recorded values for integer numeric characters, e.g. ‘5 or 10’ distinguishable from ‘5 to 10’.

Intkey. Yes. IdentifyIt. No. Lucid. Yes. MEKA. No. NaviKey. Yes. PollyClave. Yes. XID. No.
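Storing each numeric attribute as a list of closed intervals is one way to keep ‘5 or 10’ distinct from ‘5 to 10’. It also lets the user express uncertainty by entering a range rather than a single value. A sketch, in which the representation and names are assumptions:

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval (low, high)


def overlaps(recorded: List[Interval], observed: Interval) -> bool:
    """True if the observed value or range overlaps any recorded interval."""
    lo, hi = observed
    return any(a <= hi and lo <= b for a, b in recorded)


five_or_ten = [(5, 5), (10, 10)]   # '5 or 10'
five_to_ten = [(5, 10)]            # '5 to 10'
print(overlaps(five_or_ten, (7, 7)))   # False: 7 falls in the gap
print(overlaps(five_to_ten, (7, 7)))   # True
print(overlaps(five_or_ten, (6, 10)))  # True: the range includes 10
```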

Text characters

Storing and searching free-text information about taxa.

Intkey. Yes. IdentifyIt. No. Lucid. No. MEKA. No.

NaviKey. Stored, but not searchable.

PollyClave. Stored, but not searchable.

XID. No.

Special values for keys

Flagging values in the data for use only in keys. This would be done for values which are not strictly exhibited by a taxon, but which a user might be likely to assign erroneously to a specimen belonging to the taxon. The use of these values should be under the control of the user of the key, as the true values give better discrimination.

Intkey. Not yet implemented. See Dallwitz, Paine and Zurcher (1993b, 1993c) for proposals for storing this information in DELTA format.

IdentifyIt. No.

Lucid. Yes, except for numeric characters.

MEKA. No. NaviKey. No. PollyClave. No. XID. No.

Probabilistic identification

Using probabilities of state values in the taxa, and probabilities of user errors, to calculate the probabilities that the specimen belongs to a given taxon.

Intkey. No. IdentifyIt. No.

Lucid. No. Taxa having state values designated as ‘rare’ are sorted to the end of the list of remaining taxa.

MEKA. No. NaviKey. No. PollyClave. No. XID. No.
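None of the programs implements this, but the idea can be illustrated with a naive-Bayes-style calculation. The calculation assumes that characters are independent, that with a fixed probability the user replaces the true state with one chosen at random from the character's states, and that the taxa are equally likely a priori unless priors are supplied. Every name and probability, and the error model itself, are assumptions made for the sketch.

```python
from math import prod
from typing import Dict, Optional

# taxon -> character -> {state: probability}; each distribution lists every
# state of the character (hypothetical representation).
Model = Dict[str, Dict[str, Dict[str, float]]]


def posterior(
    taxa: Model,
    observed: Dict[str, str],
    error_rate: float = 0.05,
    priors: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Probability that the specimen belongs to each taxon, given observed states."""
    if priors is None:
        priors = {name: 1 / len(taxa) for name in taxa}

    def likelihood(name: str) -> float:
        terms = []
        for char, state in observed.items():
            dist = taxa[name][char]
            # Recorded correctly with prob. 1 - e; otherwise a state is chosen
            # uniformly at random from the character's states.
            terms.append((1 - error_rate) * dist.get(state, 0.0) + error_rate / len(dist))
        return prod(terms)

    weights = {name: priors[name] * likelihood(name) for name in taxa}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


taxa = {
    "A": {"flower colour": {"red": 0.9, "white": 0.1}},
    "B": {"flower colour": {"red": 0.2, "white": 0.8}},
}
result = posterior(taxa, {"flower colour": "red"})
print({name: round(p, 3) for name, p in result.items()})  # {'A': 0.804, 'B': 0.196}
```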

Inapplicable versus unknown

Distinguishing inapplicable values, including those not resulting from character dependencies, from unknown values.

Intkey. Yes. IdentifyIt. No. Lucid. No. MEKA. No.

NaviKey. Inapplicables not resulting from dependencies are distinguished, but those resulting from dependencies are treated as unknown.

PollyClave. No.

XID. No. Missing values are designated ‘not specified’. The author of XID recommends that database authors should treat ‘not specified’ as ‘inapplicable’, and code ‘all states present’ if the values are unknown.

Expanded ranges for numerics

Expanding the range of values recorded for a numeric character that has been poorly sampled in a taxon.

Intkey. Single numeric values in the data can be transformed to a range during the translation from DELTA to Intkey format. However, the original value is not available to the Intkey user for other purposes.

IdentifyIt. No. Lucid. No. MEKA. No. NaviKey. No.

PollyClave. As for Intkey.

XID. No.
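The note on Intkey points to a drawback of expanding the range during translation: the original value is no longer available. One alternative is to keep the recorded value and derive the expanded range only when matching. It is sketched here under stated assumptions: the ±10% expansion and the class name are hypothetical and are not the rule used by Intkey or PollyClave.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NumericAttribute:
    low: float
    high: float  # equal to `low` when only a single value was recorded

    def for_identification(self, fraction: float = 0.1) -> Tuple[float, float]:
        """Range used when matching; a single value is widened by +/- fraction.

        Assumes positive measurements (a negative value would swap the bounds).
        """
        if self.low == self.high:
            return (self.low * (1 - fraction), self.high * (1 + fraction))
        return (self.low, self.high)


petal_length = NumericAttribute(20.0, 20.0)   # one specimen measured: 20 mm
print(petal_length.for_identification())      # about (18.0, 22.0)
print(petal_length.low)                       # 20.0, still available for descriptions
```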

Unknown state values

Recording individual state values of multistate characters (as opposed to the character as a whole) as unknown.

Intkey. No. IdentifyIt. No. Lucid. Yes. MEKA. Yes. NaviKey. No. PollyClave. No. XID. No.

Exact characters

Specifying characters whose values are assumed not to be subject to error.

Intkey. Yes. The feature is used in two ways in Intkey. Firstly, exact characters are not subject to the ‘Error tolerance’ mechanism (see above). Secondly, if the user enters a value for a controlling character that has been specified as exact, then all dependent values that are inconsistent with the specified controlling value are effectively removed from the data matrix.

IdentifyIt. No. Lucid. No. MEKA. Yes. NaviKey. No. PollyClave. No. XID. No.
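The first of the two uses described for Intkey can be sketched by extending an error-tolerance test. A mismatch on an exact character then eliminates a taxon outright, whatever the tolerance. Names and data are hypothetical, and the second use (removing inconsistent dependent values) is not modelled.

```python
from typing import Dict, Set

Description = Dict[str, Set[str]]  # character name -> recorded states


def in_contention(taxon: Description, specimen: Description, tolerance: int, exact: Set[str]) -> bool:
    """Apply the error tolerance, except that exact characters must always match."""
    errors = 0
    for char, states in specimen.items():
        recorded = taxon.get(char)
        if recorded and not (recorded & states):  # unrecorded values never conflict
            if char in exact:
                return False  # no tolerance for exact characters
            errors += 1
    return errors <= tolerance


taxon = {"flower colour": {"red"}, "habit": {"tree"}}
specimen = {"flower colour": {"white"}, "habit": {"tree"}}
print(in_contention(taxon, specimen, tolerance=1, exact=set()))              # True
print(in_contention(taxon, specimen, tolerance=1, exact={"flower colour"}))  # False
```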

Fixing character values

Specifying character values that are not to be cleared when a new identification is started. This is convenient when identifying several specimens that are known (or thought) to share some attributes, usually the place of origin or membership of a higher taxon.

Intkey. Yes. IdentifyIt. No. Lucid. Yes. MEKA. No. NaviKey. No. PollyClave. No. XID. No.

Subsets

Named subsets of characters

A mechanism for naming subsets of the characters.

Intkey. Yes. Can be defined by the author or the user.

IdentifyIt. No.

Lucid. Yes. Must be defined by the author.

MEKA. Yes. Must be defined by the author.

NaviKey. No. PollyClave. No.

XID. Yes. Must be defined by the author.

Global subsets of characters

Specifying subsets of characters to which all subsequent operations will be restricted.

Intkey. Yes. Characters can be restricted to the sets named by the author or the user, or to sets made up of any combinations of the named sets and individual characters or characters selected by the user.

IdentifyIt. No.

Lucid. Yes. Characters can be restricted to one of the sets defined by the author.

MEKA. Yes. Characters can be restricted to one of the sets defined by the author.

NaviKey. No. PollyClave. No. XID. No.

Local subsets of characters

Specifying subsets of characters for a single operation.

Intkey. Yes. IdentifyIt. No. Lucid. No. MEKA. No. NaviKey. No. PollyClave. No.

XID. Yes. Available only for selecting characters (as arranged in the data) in an identification. The ‘best’ operation cannot be applied to a subset.

Named subsets of taxa

A mechanism for naming subsets of the taxa.

Intkey. Yes. Can be defined by the author or the user.

IdentifyIt. No. Lucid. No. MEKA. No. NaviKey. No. PollyClave. No. XID. No.