Test Data Sets
Paul's Three Data Sets
Webprojects has three Specify databases, one for each of Paul's three test data sets. The mapping of database names to data sets is:
- testfrog: dataset a
- testfish: dataset b
- testfern: dataset c
The database names do not have any particular significance (e.g. there are no frog records in testfrog).
Umbfp has one Specify database:
- testfish: dataset b
The duplicate (same collector number and collector name) records:
| Data sets | CatalogNumber | Barcode | CollectorNumber | Collectors | Taxon |
| a & c | 431 | 000000267 | 4887 | B. A. Krukoff | Tovomita krukovii |
| | 432 | 000000268 | 4887 | B. A. Krukoff | null null |
| b & c | 448 | 000000284 | 1866 | G. Klug | Graffenrieda colombiana |
| | 453 | 000000289 | 1866 | G. Klug | Graffenrieda colombiana |
| a & b | [none] |
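The duplicate criterion above (same collector number and same collector name) can be sketched in Python. This is only an illustration: it pools the four rows listed above into one in-memory list, which is an assumption made for the example, and the function name `find_duplicates` is hypothetical. In practice the records would come from the Specify databases.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group records by (collector number, collector name).

    Returns only the groups that contain more than one record.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["CollectorNumber"], rec["Collectors"])
        groups[key].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# The four rows from the table above, pooled into one list for illustration.
records = [
    {"CatalogNumber": "431", "Barcode": "000000267",
     "CollectorNumber": "4887", "Collectors": "B. A. Krukoff"},
    {"CatalogNumber": "432", "Barcode": "000000268",
     "CollectorNumber": "4887", "Collectors": "B. A. Krukoff"},
    {"CatalogNumber": "448", "Barcode": "000000284",
     "CollectorNumber": "1866", "Collectors": "G. Klug"},
    {"CatalogNumber": "453", "Barcode": "000000289",
     "CollectorNumber": "1866", "Collectors": "G. Klug"},
]

duplicates = find_duplicates(records)
for (number, collector), recs in duplicates.items():
    print(number, collector, [r["CatalogNumber"] for r in recs])
```

Note that the taxon is deliberately not part of the key: records 431 and 432 count as duplicates even though their determinations differ.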
Simple Data set of 20 records
Two tables with a one-to-one relationship, slightly more complex than DarwinCore, but only trivially so:
Collection object with a barcode/catalog number, a determination, a type status, and a collector's number.
Fields: BARCODE HERB_ACRONYM FORMAT REPRO TAXON TYPE_STATUS COLLECTOR_NO SITE_ID where site_id is the internal HUH database foreign key value for the site of the collecting event at which this specimen was collected, and collector_no is the field number/collector's number that the botanist who collected the specimen assigned to it at the time of collection.
Tab delimited file: Image:Test1_collection_object.csv
Collecting event with a collector, locality description, and geopolitical placement.
Fields: SITE_ID COLLECTOR START_YEAR START_MONTH START_DAY LOCALITY H_COUNTRY_NAME H_PRIMARY_NAME where start year, month, and day are the date collected, and h_primary_name is the name of the state/province level geopolitical entity within country where the collecting event occurred.
Tab delimited file: Image:Test1_collecting_event.csv
select BARCODE, HERB_ACRONYM, FORMAT, REPRO, TAXON, TYPE_STATUS, COLLECTOR_NO, COLLECTOR, START_YEAR, START_MONTH, START_DAY, LOCALITY, H_COUNTRY_NAME, H_PRIMARY_NAME from collection_object left join collecting_event on collection_object.SITE_ID = collecting_event.SITE_ID
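For readers working from the two tab-delimited files rather than a database, the same left join on SITE_ID can be sketched in Python with the standard `csv` module. This is a minimal sketch: the helper names `read_tab_delimited` and `left_join` are hypothetical, and the sample rows are made-up placeholders used only to exercise the code, not values from the real files.

```python
import csv
import io


def read_tab_delimited(handle):
    """Read a tab-delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def left_join(objects, events, key="SITE_ID"):
    """Left join collection objects to collecting events on `key`.

    Objects with no matching event keep their own fields and get None
    for the event fields, mirroring NULLs in the SQL left join.
    """
    events_by_site = {row[key]: row for row in events}
    event_fields = [f for f in (events[0] if events else {}) if f != key]
    joined = []
    for obj in objects:
        event = events_by_site.get(obj[key])
        merged = dict(obj)
        for field in event_fields:
            merged[field] = event[field] if event else None
        joined.append(merged)
    return joined


# Placeholder data (NOT from the real files); only a subset of the columns.
objects_tsv = (
    "BARCODE\tTAXON\tCOLLECTOR_NO\tSITE_ID\n"
    "0001\tExample taxon\t12\t100\n"
    "0002\tOther taxon\t13\t999\n"
)
events_tsv = (
    "SITE_ID\tCOLLECTOR\tSTART_YEAR\n"
    "100\tA. Collector\t1900\n"
)

objects = read_tab_delimited(io.StringIO(objects_tsv))
events = read_tab_delimited(io.StringIO(events_tsv))
rows = left_join(objects, events)
```

With the real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles on the two downloads.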
Generated from ASA with query:
select barcode, herb_acronym, format, repro, taxon, type_status, collector_no,
collector, start_year, start_month, start_day, locality, h_country_name, h_primary_name
from view_specimen_leftjoin_item, view_type_specimen_join_taxon, view_site_and_geo
where (collector_id = 138403 or collector_id = 109283 or collector_id = 104041)
and view_specimen_leftjoin_item.id_specimen = view_type_specimen_join_taxon.specimen_id
and view_specimen_leftjoin_item.site_id = view_site_and_geo.id_site
order by taxon_id