Demo Scenario

From ETaxonomy

SPNHC Demo Scenario

Scene One

Goal: Show the data quality problems

Workflow composition and function:

  • CollectionRead actor: reads the specimen data collection from a database and imports it into the workflow
  • Visualization actor: uses Google Maps to show the following quality issues:
    • P1. Point out of map:
      • visible on the map
      • correct locality but wrong latitude/longitude
      • will be fixed by using GeoMancer/GeoLocate
    • P2. Species distribution outlier (US only): (GeoMancer/GeoLocate)
      • the user queries one species from the dataset
      • the map shows the distribution polygon (states) of the species using Hong’s data and flags the records outside the polygon
      • correct locality but wrong latitude/longitude
      • will be fixed by using GeoMancer/GeoLocate
    • P3. Collecting event outlier:
      • the user queries one collector and one month from the dataset
      • the map shows the distribution of collecting event points
      • the outlier is visible
      • it is actually an error in the collecting time
      • the workflow will use CollectingEventOutlierFinder to find and flag it
    • P4. Scientific name error: (IPNI)
      • provide a button for scientific name validation
      • search each scientific name in IPNI (local database/cache)
      • flag on the map the records whose names are not found
    • P5. Flowering time error: (Hong’s data)
      • provide a button for flowering time validation
      • compare the flowering times in the dataset with Hong’s data and flag the problematic records
      • show the correct flowering time
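The P2 and P5 checks above amount to simple lookups against reference data. The sketch below shows one way to express them in Python, assuming Hong’s data can be reduced to a set of US states per species and a (first month, last month) flowering window per species, and that flowering specimens are marked "flowering" in their reproductive condition. The `Record` fields, the `DISTRIBUTION` and `FLOWERING` tables and the species name are hypothetical placeholders, not the demo's actual data or actor code.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    catalog_number: str
    scientific_name: str
    state_province: str
    month: int
    reproductive_condition: str
    flags: list = field(default_factory=list)


# Hypothetical stand-ins for Hong's data (not the real dataset):
# the US states where each species is known to occur, and its
# flowering window as (first month, last month), inclusive.
DISTRIBUTION = {"Aus bus": {"Massachusetts", "New Hampshire"}}
FLOWERING = {"Aus bus": (4, 6)}


def in_window(month, first, last):
    """True if month falls in the window; a window like (11, 2) wraps past December."""
    if first <= last:
        return first <= month <= last
    return month >= first or month <= last


def flag_distribution_outliers(records):
    """P2: flag records whose state lies outside the species' known states."""
    for r in records:
        states = DISTRIBUTION.get(r.scientific_name)
        if states is not None and r.state_province not in states:
            r.flags.append("P2: outside known distribution")


def flag_flowering_time(records):
    """P5: flag flowering specimens collected outside the flowering window."""
    for r in records:
        window = FLOWERING.get(r.scientific_name)
        # Only specimens recorded as flowering can contradict the window.
        if window is None or r.reproductive_condition != "flowering":
            continue
        first, last = window
        if not in_window(r.month, first, last):
            r.flags.append(f"P5: flowering expected in months {first}-{last}")
```

With these tables, a flowering record of Aus bus from Texas gets a P2 flag and one collected in September gets a P5 flag. Species missing from the tables are left unflagged rather than treated as errors.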

Scene Two

Goal: Show how easy it is to add an actor to the workflow (drag and drop onto the wire) and fix some quality problems.

Workflow composition and function:

  • CollectionRead actor
  • GeoLocate actor: fix P1 and P2 and annotate the problematic data
  • Visualization actor: shows that P1 and P2 are gone
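Scene Two relies on actors being independent steps joined by wires, so adding one leaves the others untouched. The Python sketch below models that idea with plain functions run in list order; it is not the workflow system's actual actor API. It assumes P1 means coordinates outside the valid latitude/longitude ranges, and the hard-coded records and the `GAZETTEER` lookup standing in for the GeoLocate service are hypothetical.

```python
# Hypothetical gazetteer standing in for the GeoLocate service.
GAZETTEER = {"Cambridge, Massachusetts": (42.37, -71.11)}


def valid(r):
    """A point can be placed on the map only if its coordinates are in range."""
    return -90 <= r["lat"] <= 90 and -180 <= r["lon"] <= 180


def collection_read(_):
    """Stand-in for CollectionRead: returns hard-coded records instead of querying a database."""
    return [
        {"catalog_number": "1", "locality": "Cambridge, Massachusetts", "lat": 42.37, "lon": -71.11},
        {"catalog_number": "2", "locality": "Cambridge, Massachusetts", "lat": 142.37, "lon": -71.11},
    ]


def geolocate(records):
    """Stand-in for GeoLocate: re-georeference out-of-range points from their locality and annotate them."""
    for r in records:
        if not valid(r) and r["locality"] in GAZETTEER:
            old = (r["lat"], r["lon"])
            r["lat"], r["lon"] = GAZETTEER[r["locality"]]
            r.setdefault("annotations", []).append(f"coordinates {old} replaced from locality")
    return records


def visualization(records):
    """Stand-in for Visualization: report the points that cannot be placed on the map."""
    print("points off the map:", [r["catalog_number"] for r in records if not valid(r)])
    return records


def run(workflow):
    """Pass the output of each actor to the next, like wires between actors."""
    records = None
    for actor in workflow:
        records = actor(records)
    return records


scene_one = [collection_read, visualization]
# "Dropping" GeoLocate onto the wire is a single list insertion.
scene_two = scene_one[:1] + [geolocate] + scene_one[1:]
```

Running `run(scene_one)` reports record 2 as off the map, while `run(scene_two)` reports none, without any change to the other two functions.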

Scene Three

Goal: Show a workflow that corrects all the errors by adding more actors.

Workflow composition and function:

  • CollectionRead actor
  • GeoLocate actor: fix P1 and P2 and annotate the problematic data
  • CollectingEventOutlierFinder actor: find and annotate outlier records using a clustering method (the method itself is not the focus)
  • DuplicatesFinder actor: find duplicates using a clustering method
  • Fuse actor: create a fused record for each duplicate set
  • ScientificNameProcessor actor: for each fused record, annotate names that cannot be found in IPNI/GNI; for names found in IPNI or GNI (lexical group), add the LSID.
  • FloweringTimeValidator actor: for each fused record, correct the records with a wrong flowering time using Hong’s data and annotate these problematic records.
  • DuplicatesUnifier actor: use the fused record to unify each duplicate set (self-cleaning)
  • Visualization actor: shows that all problems are gone except the following, which can be fixed by a curator:
    • the collecting event outlier is flagged, but it is actually a collecting-time error
    • the scientific name of one record is wrong: in its duplicate set, two records have the wrong name and only one has the correct name, and the fuse strategy selects the wrong name for the fused record.
  • CurationSummary actor: present a curation summary report
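The residual name error described above follows directly from how duplicates are fused and unified. The sketch below assumes a per-field majority vote (the page does not say which strategy the Fuse actor uses) and replaces the clustering in DuplicatesFinder with a simple grouping on collector and field number; the field names and example values are hypothetical.

```python
from collections import Counter, defaultdict


def find_duplicates(records):
    """Stand-in for DuplicatesFinder: group on (collector, field number)
    rather than clustering; only groups with 2+ records are duplicates."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["collector"], r["field_number"])].append(r)
    return [g for g in groups.values() if len(g) > 1]


def fuse(group, fields):
    """Assumed fuse strategy: per-field majority vote. Counter.most_common
    breaks ties in favour of the value seen first."""
    return {f: Counter(r[f] for r in group).most_common(1)[0][0] for f in fields}


def unify(group, fused):
    """Stand-in for DuplicatesUnifier: copy the fused values into every
    duplicate and annotate each value that changes."""
    for r in group:
        for f, v in fused.items():
            if r[f] != v:
                r.setdefault("annotations", []).append(f"{f}: {r[f]!r} -> {v!r}")
                r[f] = v


# Two of the three duplicates carry the misspelling, so it wins the vote
# and overwrites the one correct name.
records = [
    {"catalog_number": "A1", "collector": "Smith", "field_number": "101", "scientific_name": "Aus bux"},
    {"catalog_number": "B7", "collector": "Smith", "field_number": "101", "scientific_name": "Aus bux"},
    {"catalog_number": "C3", "collector": "Smith", "field_number": "101", "scientific_name": "Aus bus"},
]
for group in find_duplicates(records):
    unify(group, fuse(group, ["scientific_name"]))
print([r["scientific_name"] for r in records])
```

Whatever strategy the Fuse actor actually uses, an automated choice can still pick the wrong value, which is why Scene Four brings a curator into the loop.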

Scene Four

Goal: Show how easily an actor can be replaced and how human interaction is introduced to further help curation.

Workflow composition and function:

  • CollectionRead actor
  • BioGeoMancer actor (replacing the GeoLocate actor): fix P1 and P2 and annotate the problematic data. The two services are not compared; the point is only to show that different services can be used.
  • CollectingEventOutlierFinder actor: find and annotate outlier records using a clustering method (the method itself is not the focus)
  • DuplicatesFinder actor: find duplicates using a clustering method
  • Fuse actor: create a fused record for each duplicate set
  • ScientificNameProcessor actor: for each fused record, annotate names that cannot be found in IPNI/GNI; for names found in IPNI or GNI (lexical group), add the LSID.
  • FloweringTimeValidator actor: for each fused record, correct the records with a wrong flowering time using Hong’s data and annotate these problematic records.
  • OAuthAuthenticator actor: obtain an OAuth token and secret from Google
  • DataImporter actor: import the data that needs curator review into a spreadsheet
  • CurationCollector actor: collect the curator’s curation decisions
  • DuplicatesUnifier actor: use the fused record to unify each duplicate set (self-cleaning)
  • Visualization actor: shows that all problems are gone
  • CurationSummary actor: present a curation summary report
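In Scene Four the curator’s decisions come back from the spreadsheet and are applied before the summary report is produced. A minimal sketch, assuming the spreadsheet is exported as CSV with hypothetical columns `catalog_number`, `field` and `value`; it does not call the Google Spreadsheets or OAuth APIs, and the annotation tuple format (source, field, old value, new value) is likewise an assumption.

```python
import csv
import io
from collections import Counter


def collect_curation(csv_text):
    """Stand-in for CurationCollector: parse decisions exported from the
    curator's spreadsheet (assumed columns: catalog_number, field, value)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def apply_curation(records, decisions):
    """Apply each curator decision to the matching record and annotate the change."""
    by_id = {r["catalog_number"]: r for r in records}
    for d in decisions:
        r = by_id.get(d["catalog_number"])
        if r is None:
            continue  # the decision refers to a record not in this batch
        old = r.get(d["field"])
        r[d["field"]] = d["value"]
        r.setdefault("annotations", []).append(("curator", d["field"], old, d["value"]))
    return records


def curation_summary(records):
    """Stand-in for CurationSummary: count annotations by source (actor name or "curator")."""
    return Counter(a[0] for r in records for a in r.get("annotations", []))
```

Keeping the source in each annotation lets the summary separate automated fixes from curator decisions, which is also the information a provenance view needs.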

Scene Five

Show the curation history in the provenance browser.

Scene Six

Push the annotations back to the FPush network and view them in the web client.

Terms

  • Institution code
  • Collection code
  • Dataset name
  • Catalog number
  • Field number
  • Collecting year, month, day
  • Reproductive condition
  • Decimal latitude
  • Decimal longitude
  • Coordinate uncertainty in meters
  • Geodetic datum
  • Country, state/province, county, municipality, locality
  • Minimum and maximum elevation in meters (?)
  • Identified by
  • Date identified
  • Family
  • Scientific name
  • Scientific name authorship
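These terms match Darwin Core term names. The sketch below lists the corresponding Darwin Core terms and checks the header of a CSV export for missing ones before it is read into the workflow; the `missing_terms` helper is hypothetical, and treating the two elevation terms as optional (they are marked "?" in the list) is an assumption.

```python
import csv
import io

# Darwin Core terms corresponding to the list above.
REQUIRED_TERMS = [
    "institutionCode", "collectionCode", "datasetName", "catalogNumber",
    "fieldNumber", "year", "month", "day", "reproductiveCondition",
    "decimalLatitude", "decimalLongitude", "coordinateUncertaintyInMeters",
    "geodeticDatum", "country", "stateProvince", "county", "municipality",
    "locality", "identifiedBy", "dateIdentified", "family",
    "scientificName", "scientificNameAuthorship",
]
# Marked "?" in the list, so treated as optional here (an assumption).
OPTIONAL_TERMS = ["minimumElevationInMeters", "maximumElevationInMeters"]


def missing_terms(csv_text):
    """Return the required terms absent from the header row of a CSV export.
    An empty file has no header, so every required term is reported."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [t for t in REQUIRED_TERMS if t not in header]
```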