Import Format for the GPC Prototype Database


Import format specification for datasets

To streamline the import of datasets, the October 1998 meeting of the IOPI checklist committee decided to define a tabular specification that closely follows the model of the provisional checklist database. Data adhering to this specification can be imported directly into the GPC database, and feedback can be provided to data donors if their records have been involved in the taxonomic editing process.

The tables can be provided in any common format (comma delimited, dBase, etc.), but provision in MS-Access format is preferred. Please be aware of the caveats involving diacritics and special characters, and note our experiences with common errors.


Table Specifications

Model diagram (.gif) | Model diagram (Word97) | PotTaxonName | GeoTDWGAssoc | RefTitle | StatusAssignment | Other data

The following tables and attributes are supported for import into the database. Those marked ** are obligatory, those marked * are obligatory under certain conditions. Of the remainder, you can pick those you can supply and ignore the rest.

Table: PotTaxonName **

This table holds all information on the name as well as any additional information provided by the source (with the exception of TDWG standard geographic distribution records, see note).

At present, only records of names of generic rank or below should be included. Hybrid formulae are not yet supported.

PotTaxon_Pk1 *
Identifier of the entire source dataset. Value provided by IOPI.
Numeric (integer). Not null. Set to 0 for single new dataset. In the IOPI database, the value must correspond to a value of the attribute RefTitle_Pk in the table RefTitle.
* Obligatory if the data provided are to replace existing data in the GPC database.
PotTaxon_Pk2 *
Primary key identifier of the name record within the source dataset.
Char(50). Unique. Not null.
Note: this key value will be used to communicate results of the taxonomic editing process to data providers.
* Obligatory if used to link synonyms to accepted names.
Rank_Fk **
Indicates the rank of the name. Foreign key to the table Rank.
Numeric (integer). Must correspond to a value of the attribute Rank_Pk in the table Rank. Please communicate ranks not cited. The prefix "notho" should be replaced by a hybrid marker (see below).
** Obligatory.
SourceHigherTaxon **
Higher taxon for the name in the record (e.g. family name), as used in the source dataset.
Char(100).
** Obligatory.
SourceStatusDesignation **
The status of the name in the record as used by the source. E.g. "Accepted", "Synonym", "Undefined".
Char(10).
** Obligatory.
GeneraKew_Fk **
Foreign key to the generic record in the table GeneraKew corresponding to the name in the attribute Genus below (see the section on generic classification).
Numeric (integer). Must correspond to a value of the attribute GeneraKew_Pk in the table GeneraKew. Please communicate genera not cited.
** Obligatory.
Genus **
Generic name.
Char(50). Do not include hybrid marker.
** Obligatory (but see note on names of taxa).
Spepi **
Species or infrageneric epithet.
Char(50). Do not include hybrid marker.
** Obligatory (but see note on names of taxa).
Inepi **
Infraspecific epithet.
Char(50). Do not include hybrid marker.
** Obligatory (but see note on names of taxa).
AuthorString **
Full citation of authors of name, including parenthetical authors and "ex-authors" but not "in-authors" (the latter being part of the literature citation). In the case of monomials, the citation refers to the entire name. For bi- or trinomials, it refers to the final epithet of the name being entered, or, in the case of autonyms, the name the autonym was derived from.
Char(100).
** Obligatory (but see note on names of taxa).
GenHybMarker *
Generic hybrid marker.
Char(1). Either X for hybrid, or + for graft chimaera, or <space> 
* Obligatory where applicable (also see note on names of taxa).
SpHybMarker *
Species or infrageneric hybrid marker.
Char(1). Either x for hybrid, or + for graft chimaera, or <space> 
* Obligatory where applicable (also see note on names of taxa).
InHybMarker *
Infraspecific hybrid marker.
Char(1). Either X for hybrid, or + for graft chimaera, or <space> 
* Obligatory where applicable (also see note on names of taxa).
NameString *
Name of taxon (genus or lower) without name authors and hybrid markers; i.e. a monomial, binomial, or trinomial with the rank abbreviation inserted where necessary according to the rules of botanical nomenclature.
Char(100).
* Obligatory, but can be calculated (see note on names of taxa).
FullName *
The full name as used in the source. Includes the name string as well as the author team and appropriately intercalated hybrid markers. May include additional information (see note on names of taxa).
Char(200).
* Obligatory, but can be calculated (see note on names of taxa).
TaxCitation
Nomenclatural literature citation, including "in authors" where applicable.
Char(150).
Type
Citation of the type specimen, illustration, or name.
Char(150).
CommonNames
Common name(s).
Char(250). If possible, use ";" between individual names.
SecSourceCitation
One or more literature (or dataset) citations to be credited as a secondary source for the information in the source dataset record. A short text referencing a URL may be used if the available space is insufficient.
Char(150).
Notes
Other information which does not fit elsewhere.
Char(240).
GlobalDistributionPhrase **
All data on higher level geographic distribution (concatenated) of the taxon (see also note on geographic distribution).
Char(150).
** Obligatory (for future datasets).
GlobalIntrodCultivPhrase
The global distribution as an introduced or cultivated plant.
Char(150).
GlobalNativeDistributionPhrase
The complete native global distribution.
Char(150).
GlobalCompletenessFlag
Indicates that the global distribution cited in the GlobalDistributionPhrase or as given by the BRU's (see note on geographic distribution) is considered to be complete.
Bit (1 for true, 0 for false).
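
The obligatory markers and field widths listed above can be checked before a dataset is submitted. The following is a minimal sketch in Python, assuming each PotTaxonName record is held as a dict keyed by attribute name. The function name check_pot_taxon_name and the choice of which attributes to check are illustrative assumptions, not part of the IOPI specification.

```python
# Maximum widths for a subset of PotTaxonName attributes, as listed in the table above.
MAX_WIDTH = {
    "PotTaxon_Pk2": 50,
    "SourceHigherTaxon": 100,
    "SourceStatusDesignation": 10,
    "Genus": 50,
    "Spepi": 50,
    "Inepi": 50,
    "AuthorString": 100,
    "NameString": 100,
    "FullName": 200,
}

# Attributes marked ** without any further condition in the table above.
ALWAYS_REQUIRED = (
    "Rank_Fk",
    "SourceHigherTaxon",
    "SourceStatusDesignation",
    "GeneraKew_Fk",
)


def check_pot_taxon_name(record):
    """Return a list of problems found in one record (hypothetical helper)."""
    problems = []
    for field in ALWAYS_REQUIRED:
        if record.get(field) in (None, ""):
            problems.append(f"{field}: obligatory value missing")
    for field, width in MAX_WIDTH.items():
        value = record.get(field)
        if isinstance(value, str) and len(value) > width:
            problems.append(f"{field}: longer than {width} characters")
    for field in ("Rank_Fk", "GeneraKew_Fk"):
        value = record.get(field)
        if value is not None and not isinstance(value, int):
            problems.append(f"{field}: must be an integer")
    return problems
```

The sketch does not verify that Rank_Fk and GeneraKew_Fk correspond to existing keys in the tables Rank and GeneraKew; that check needs the reference tables themselves.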

Table GeoTDWGAssoc

This table resolves the n:m relationship between the tables PotTaxonName and GeoTDWG. The latter contains the abbreviations and full-text designations for the standard geographical units according to Hollis & Brummitt (1992); the standard abbreviations for the different levels of the standard are used as the primary key. It can be downloaded from http://www.bgbm.fu-berlin.de/TDWG/geo/default.htm.

PotTaxon_Fk1, PotTaxon_Fk2
Corresponding primary key values for the taxon name referenced.
ContinentCode_Fk1
Standard code for continent.
Integer. Not null. Combination of _Fk1 to _Fk4 must correspond to a respective combination of  _Pk (primary key) values in table GeoTDWG.
Example: 9 (for "Antarctic")
RegionCode_Fk2
Standard code for region.
Char(5). Not null. Combination of _Fk1 to _Fk4 must correspond to a respective combination of  _Pk (primary key) values in table GeoTDWG.
Example: 80 (for "Mesoamerica")
BotCountryCode_Fk3
Standard code for botanical country.
Char(6). Not null. Combination of _Fk1 to _Fk4 must correspond to a respective combination of  _Pk (primary key) values in table GeoTDWG.
Example: LBS (for "Lebanon-Syria")
BasicRecUnitCode_Fk4
Standard code for basic recording unit.
Char(5). Not null. Combination of _Fk1 to _Fk4 must correspond to a respective combination of  _Pk (primary key) values in table GeoTDWG.
Example: SY (for "Syria")
Doubtful
May contain a question mark to express doubtful presence of taxon in the geographic unit.
Char(1).
Please note that a record may be referred to any level in the hierarchy. For example, a record may be referred to the continent of Africa, in which case the region, botanical country and basic recording unit would be left empty (empty strings, not null). However, this does not imply that all lower units are automatically included.
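
A record that refers to a higher level of the hierarchy, with the lower levels left as empty strings, might be built as in the sketch below. The function name geo_assoc_record is hypothetical. Two points are assumptions rather than statements of the specification: that a lower level may only be filled in when every level above it is also filled in, and that an empty string is stored in Doubtful when there is no doubt.

```python
def geo_assoc_record(pk1, pk2, continent, region="", bot_country="", bru="",
                     doubtful=False):
    """Build one GeoTDWGAssoc record as a dict (hypothetical helper)."""
    # Assumption: no gaps in the hierarchy, i.e. a lower-level code is only
    # allowed if all levels above it are given.
    seen_empty = False
    for code in (region, bot_country, bru):
        if code == "":
            seen_empty = True
        elif seen_empty:
            raise ValueError("lower-level code given without its parent level")
    return {
        "PotTaxon_Fk1": pk1,
        "PotTaxon_Fk2": pk2,
        "ContinentCode_Fk1": int(continent),
        # Unused levels are empty strings, not null (None).
        "RegionCode_Fk2": region,
        "BotCountryCode_Fk3": bot_country,
        "BasicRecUnitCode_Fk4": bru,
        # Assumption: empty string when presence is not in doubt.
        "Doubtful": "?" if doubtful else "",
    }
```

The resulting combination of the four codes still has to be matched against the GeoTDWG table, which this sketch does not load.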

Table RefTitle **

This table contains the dataset-related metadata currently supported by the GPC database. In addition, for "living" datasets (which continue to be developed at their source), a mechanism can be implemented to directly query the current state of the source database for the displayed record.

RefTitle_Pk *
A unique identifier of the entire source dataset. Value provided by IOPI.
Numeric (integer). Not null. Set to 0 for single new dataset. Must correspond to a value of the attribute PotTaxon_Pk1 in the table PotTaxonName.
* Obligatory if the data provided are to replace existing data in the GPC database.
RefTitleString **
A reference to your dataset as you wish to have it cited as "Source" output.
Char(255).
** Obligatory.
RefURL
A URL you want to include as a link (inserted after the RefTitleString in the "Source" output).
Char(50).
QueryURL
A URL to a form where the source dataset can be queried "live".
Char(50).
QuerySyntax
The full URL to query a specific record in the "live" source dataset. Include IOPI field names in braces ({}) where the value must be included. Possible fields: NameString or a combination of Genus, Spepi, Inepi, and RankAbbrev.
Example (for the IOPI provisional GPC):   http://www.bgbm.fu-berlin.de/scripts/asp/gpc/entry.asp?name={NameString}
Char(255).
ExportDate **
Date
** Obligatory for datasets generated from a database.
Permission **
Person or legal entity who is considered to be the owner of the dataset and/or gave the permission to use it in the GPC.
Char(50).
** Obligatory.
MaintainedBy **
Person or legal entity who maintained the dataset at the time of the export.
Char(50).
** Obligatory.
Conversion
Notes on the conversion process from original source dataset to IOPI format dataset.
Text.
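
The brace placeholders described for QuerySyntax can be expanded as in the following sketch. The function name expand_query_syntax is hypothetical, and URL-quoting the inserted values is an assumption; the specification only states that field names in braces mark where a value is inserted.

```python
import re
from urllib.parse import quote


def expand_query_syntax(template, record):
    """Replace each {FieldName} in template with the value from record."""
    def substitute(match):
        field = match.group(1)
        # Assumption: values are percent-encoded before insertion.
        return quote(str(record[field]))

    return re.sub(r"\{(\w+)\}", substitute, template)
```

With the QuerySyntax example given above and a record whose NameString is "Vicia faba", the function returns the same URL ending in name=Vicia%20faba. A field name that is missing from the record raises a KeyError.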

Table StatusAssignment

StatusAssignment_Pk *
Primary key for the status assignment. Value provided by IOPI.
* Obligatory if the data provided are to replace existing data in the GPC database.
RefTitle_Fk *
Reference assigning the status. Value provided by IOPI.
Numeric (integer). Not null. Set to 0 for single new dataset. Must correspond to a value of the attribute RefTitle_Pk in the table RefTitle.
* Obligatory if the data provided are to replace existing data in the GPC database.
AssignedStatus_Fk **
Reference to the status assigned to the name defined by PotTaxon_Fk1 and PotTaxon_Fk2 below. A foreign key to the table AssignedStatus, which currently contains only 3 values: 1 for accepted (or "preferred") name, 2 for synonym, and 3 for unresolved synonym.
Numeric (integer). Not null. Must be either 1, 2, or 3.
** Obligatory.
PotTaxon_Fk1, PotTaxon_Fk2 **
Foreign keys referencing the name to which the status is assigned.
The combination of key values must correspond to a respective combination of  _Pk (primary key) values in table PotTaxonName.
** Obligatory.
Further rules: Every record in PotTaxonName must have at least one corresponding record in StatusAssignment. If a record in StatusAssignment assigns Accepted status (1) or Unresolved synonym (3) to a record in PotTaxonName, there must not be another record for that name in StatusAssignment. Several StatusAssignment records with Synonym (2) status may co-exist.
AcceptedPotTaxon_Fk1, AcceptedPotTaxon_Fk2 *
Foreign keys referencing the accepted name for a synonym.
The combination of key values must correspond to a respective combination of  _Pk (primary key) values in table PotTaxonName.
* Obligatory for records with AssignedStatus_Fk = 2, must be Null for others.
DoubtfulFlag
Flag indicating reservations with respect to the status assignment. Normally used to indicate doubtful acceptance of a name.
Bit (1 for true, 0 for false).
LevelNo **
Expressing the degree of taxonomic editing within the IOPI GPC.
Numeric (integer). Set to 0 for source datasets.
** Obligatory.
SynStatSuffix
A phrase added to the output of synonyms and (especially) concept synonyms for fully or partially edited entries (LevelNo >0). Example: "presumably included in toto".
Char(50).
StaAssignmNotes
Notes referring to the status assignment for fully or partially edited entries (LevelNo >0).
Char(255).
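
The rules for StatusAssignment records stated above can be checked mechanically. The sketch below assumes that the name keys are available as a set of (PotTaxon_Pk1, PotTaxon_Pk2) tuples and that each assignment is a dict keyed by attribute name. The function name check_status_assignments and the wording of the messages are illustrative only.

```python
from collections import defaultdict

ACCEPTED, SYNONYM, UNRESOLVED = 1, 2, 3


def check_status_assignments(name_keys, assignments):
    """Return a list of rule violations (hypothetical helper)."""
    problems = []
    by_name = defaultdict(list)
    for a in assignments:
        key = (a["PotTaxon_Fk1"], a["PotTaxon_Fk2"])
        by_name[key].append(a)
        accepted_key = (a.get("AcceptedPotTaxon_Fk1"), a.get("AcceptedPotTaxon_Fk2"))
        if a["AssignedStatus_Fk"] == SYNONYM:
            # A synonym must point to an existing name record.
            if accepted_key not in name_keys:
                problems.append(f"{key}: synonym without a valid accepted name")
        elif accepted_key != (None, None):
            problems.append(f"{key}: accepted-name keys must be null")
    for key in sorted(name_keys):
        statuses = [a["AssignedStatus_Fk"] for a in by_name.get(key, [])]
        if not statuses:
            problems.append(f"{key}: no status assignment")
        elif len(statuses) > 1 and any(s in (ACCEPTED, UNRESOLVED) for s in statuses):
            problems.append(f"{key}: accepted or unresolved status must be the only assignment")
    return problems
```

The sketch does not check whether the name referenced as accepted actually carries Accepted status itself; the specification above does not state that requirement explicitly.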

Further data

If a "proprietary" encoding of the geographical distribution is used, please provide a text or table describing it (it will be included as a URL in the notes field of checklist output; examples: Flora Europaea, Med-Checklist). The same should be done for encoding schemes used for diacritical characters not defined in the ANSI character set.


Diacritics and Other Special Characters

The GPC database uses the standard character set ISO 8859-1 (Latin 1 or ANSI). These characters are converted to HTML tags during a query when output to the World Wide Web is generated. 

We can convert any character encoding to ANSI, as long as the codes are unique and the specification is provided with the dataset. However, source data files which use ANSI or HTML tags are the least troublesome. Characters not in the ANSI character set must be transliterated or encoded by the provider of the dataset. This information should be included in the description of the dataset. 
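
Data providers may wish to find characters that fall outside ISO 8859-1 before exporting, and to see what an HTML-encoded form of a value looks like. The sketch below uses only the Python standard library; the function names are hypothetical, and the conversion is an illustration, not the routine used by the GPC database.

```python
from html.entities import codepoint2name


def outside_latin1(text):
    """Return the characters of text that cannot be stored in ISO 8859-1."""
    return [ch for ch in text if ord(ch) > 0xFF]


def to_html_entities(text):
    """Replace non-ASCII characters that have a named HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0x7F and code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(ch)
    return "".join(out)
```

Any character reported by outside_latin1 needs a transliteration or an encoding scheme documented with the dataset, as described above.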


Common errors

Some common errors we found in past export sets: 
- Names of non-vascular plants were included in the dataset 
- Autonyms included the author name in the "NameString" field 
- Leading X used for hybrids in NameString field (must be put into the GenHybMarker field) 
- Hybrid formulae (not yet supported by the database) 
- Infraspecific names included the species author team 
- Names of taxa of a rank higher than Genus (not yet supported by the database) 
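
Two of the errors listed above (hybrid markers and hybrid formulae in the NameString field) can be detected with a simple token check. This is a heuristic sketch only; the function name name_string_problems and the set of marker characters tested are assumptions.

```python
# Tokens treated as hybrid or graft-chimaera markers (assumed set).
HYBRID_TOKENS = {"x", "X", "\u00d7", "+"}


def name_string_problems(name_string):
    """Return a list of suspected problems in a NameString value."""
    problems = []
    tokens = name_string.split()
    if tokens and tokens[0] in HYBRID_TOKENS:
        problems.append("leading hybrid marker")
    elif any(token in HYBRID_TOKENS for token in tokens):
        problems.append("hybrid marker or hybrid formula inside the name")
    return problems
```

A marker written without a following space is not detected by this check.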


Generic classification

The current (provisional) database fails to implement an important feature of the IOPI model (as published in Taxon 46:283-309, 1997), namely, support for alternative classifications. Although the classification provided by the source record is cited ("SourceHigherTaxon"), assignment of higher taxa for record retrieval purposes is based on a very simple hierarchical scheme. All records in the PotTaxonName table (i.e., all names in the database) are linked to a generic record in the table GeneraKew, which in turn provides the family name. 

The data in the GeneraKew table are based on the publication Vascular Plant Families and Genera compiled by R.K.Brummitt and published by the Royal Botanic Gardens, Kew in 1992 (available from Kew's Mail Order Department). The file is currently undergoing major revision, and an updated version will be provided. For the time being, however, the file (with the additions introduced in the process of importing datasets for the IOPI database) continues to be used as the generic backbone of the GPC.

Assigning a source record to a genus in the table may be hampered by the presence of generic homonyms. Since the authors are not cited from the Kew list, this is only relevant (for the functioning of the GPC) if the family assignment differs between the homonyms. The complete table can be provided upon request.


Names of taxa in the provisional GPC database

A certain redundancy of name-related attributes may be noted in the table PotTaxonName. This is due partly to the effort to maintain the original data as far as possible and partly to technical reasons. The attributes NameString and FullName could be calculated from the name elements given in Genus, Spepi, Inepi, RankAbbrev, AuthorString, and the hybrid markers. However, the FullName may include information present in the dataset which is not otherwise supported (e.g. intercalation of species authors in an infraspecific name, or additional ranks cited, such as a subspecies epithet for a variety). The NameString attribute (in combination with the respective family designation from the GeneraKew table) is output in the lists displayed as the result of a query.
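
How NameString and FullName could be calculated from the name elements is sketched below for the simple cases of genus, species, and infraspecific names. The function names are hypothetical. Several points are assumptions: the hybrid marker is placed before its name element and followed by a space, the author string is appended at the end, and names of infrageneric rank and autonyms (where the author citation does not follow the final epithet) are not handled.

```python
def name_string(genus, spepi="", inepi="", rank_abbrev=""):
    """Monomial, binomial, or trinomial without authors and hybrid markers."""
    parts = [genus]
    if spepi:
        parts.append(spepi)
    if inepi:
        if rank_abbrev:
            parts.append(rank_abbrev)
        parts.append(inepi)
    return " ".join(parts)


def full_name(genus, spepi="", inepi="", rank_abbrev="", author="",
              gen_hyb="", sp_hyb="", in_hyb=""):
    """Name string with hybrid markers intercalated and authors appended."""
    def marked(marker, element):
        marker = marker.strip()  # a single space means "no marker"
        return f"{marker} {element}" if marker else element

    parts = [marked(gen_hyb, genus)]
    if spepi:
        parts.append(marked(sp_hyb, spepi))
    if inepi:
        if rank_abbrev:
            parts.append(rank_abbrev)
        parts.append(marked(in_hyb, inepi))
    if author:
        parts.append(author)
    return " ".join(parts)
```

For example, name_string("Vicia", "sativa", "nigra", "subsp.") gives "Vicia sativa subsp. nigra". A FullName taken directly from the source may still differ from the calculated form for the reasons given above.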


Geographic distribution

Geographic distributions for fully edited taxon records in the GPC database are to use the Botanical Recording Units (BRU's) as published in Hollis, S. & Brummitt, R. (1992): World Geographical Scheme for Recording Plant Distributions, Plant Taxonomic Database Standards No. 2, International Working Group on Taxonomic Databases for Plant Sciences (TDWG), Hunt Institute for Botanical Documentation, Pittsburgh. The data are available in electronic form at http://www.bgbm.fu-berlin.de/TDWG/geo/default.htm (a second version of the document will be published in 1999).

However, most of the source datasets provided to IOPI do not adhere to this standard, so the "distribution phrase" attributes are used to accommodate non-standard (or proprietary standard) geographical data. In parallel, BRU's can be accommodated by means of the table GeoTDWGAssoc.


Please send us your comments. 

© 1996-1999 by The International Organization for Plant Information. 
Updated: 10-Feb-99