Talk:PubMan Func Spec Ingestion

From MPDLMediaWiki
Revision as of 12:59, 24 February 2009 by Uat (talk | contribs) (→‎Phase 2)

work in progress

Phase 1

  • Provide multiple-item submission (batch import) for local EndNote files, BibTeX files and WoS records that contain more than one reference.

The following formats have to be supported: EndNote, eSciDoc XML, BibTeX, WoS, RIS --Ulla 12:58, 24 February 2009 (UTC)

  • For EndNote, consider files in different versions: either version 1.x to 7 or version 8.x
    • The encoding of the files depends on the EndNote version: 1.x to 7 support ASCII, 8.x supports UTF-8
    • The mapping to PubMan genres depends on the EndNote version (different mappings are needed)
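
The version-dependent encoding could be handled roughly as follows. This is only a sketch: the function names and the idea of passing the major version as an integer are assumptions for illustration, not part of PubMan.

```python
def encoding_for_endnote(major_version: int) -> str:
    """Return the text encoding assumed for an EndNote file of this major version."""
    if 1 <= major_version <= 7:
        return "ascii"   # versions 1.x to 7 support ASCII
    if major_version >= 8:
        return "utf-8"   # versions 8.x support UTF-8
    raise ValueError(f"unsupported EndNote version: {major_version}")


def decode_export(raw: bytes, major_version: int) -> str:
    """Decode the raw bytes of an EndNote file according to its version (hypothetical helper)."""
    return raw.decode(encoding_for_endnote(major_version))
```

Decoding strictly means that a file declared as version 7 but containing non-ASCII bytes fails early instead of producing garbled author names.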

First priority: the EndNote versions in use by ICE and MPI pflanze --Ulla 12:58, 24 February 2009 (UTC)

  • For WoS, consider:
    • include "times cited"? (AEI)
      Please note that this information is not stable but evolves over time. Newly published articles have not been cited at all yet --Inga 10:25, 28 July 2008 (UTC)

Exactly; rather provide this feature via a look-up service --Ulla 12:58, 24 February 2009 (UTC)

  • For BibTeX, consider that the record can contain a URL tag that points to the fulltext belonging to the bibliographic record; this fulltext should be uploaded to PubMan (see also the BibTeX mapping)
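
To illustrate the URL tag, the sketch below pulls the value of a `url` field out of a single BibTeX record with a regular expression. It is deliberately minimal and not a full BibTeX parser; the function name `fulltext_url` and the sample record are made up for this example.

```python
import re
from typing import Optional

# Matches url = {...} or url = "..." (case-insensitive field name).
_URL_FIELD = re.compile(r'\burl\s*=\s*[{"]\s*([^}"\s]+)\s*[}"]', re.IGNORECASE)


def fulltext_url(bibtex_record: str) -> Optional[str]:
    """Return the value of the url field of a BibTeX record, or None if absent."""
    match = _URL_FIELD.search(bibtex_record)
    return match.group(1) if match else None


record = """@article{doe2008,
  title = {An Example},
  url = {http://example.org/paper.pdf}
}"""
print(fulltext_url(record))
```

Fetching the file behind the URL and attaching it to the created item would be a separate step that is not shown here.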

Important scenario for ICE: references are maintained in EndNote and ingested into PubMan from time to time. Therefore, the use case has to include:

  • decision by the user whether the ingest
    • a) creates new data, or
    • b) overwrites existing data (based on the local EndNote ID)
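
One possible reading of this decision is sketched below. It assumes that option a) always creates new items and that option b) replaces an item carrying the same local EndNote ID, creating a new item when no match exists. The in-memory list of dicts and the `endnote_id` key are illustrative assumptions, not the PubMan data model.

```python
from typing import Dict, List


def ingest(items: List[dict], records: List[dict], mode: str) -> Dict[str, int]:
    """Ingest records into items; mode is "create" (a) or "overwrite" (b)."""
    if mode not in ("create", "overwrite"):
        raise ValueError(f"unknown ingest mode: {mode}")
    # Position of each existing item, keyed by its local EndNote ID.
    index = {item["endnote_id"]: pos for pos, item in enumerate(items)}
    counts = {"created": 0, "overwritten": 0}
    for record in records:
        pos = index.get(record["endnote_id"]) if mode == "overwrite" else None
        if pos is None:
            items.append(dict(record))
            index[record["endnote_id"]] = len(items) - 1
            counts["created"] += 1
        else:
            items[pos] = dict(record)
            counts["overwritten"] += 1
    return counts
```

Note that a plain overwrite would discard any enrichment made directly in PubMan, which is the concern raised in the open questions below.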

to be clarified:

  • In case the EndNote import contains both new and modified references: can the user select "last modified entries" in the local EndNote library in order to import only modified entries into PubMan?
  • Is the institute aware that PubMan is richer than EndNote, i.e. that additional data submissions/modifications might have to be done in PubMan? --Ulla 12:58, 24 February 2009 (UTC)

Phase 2

duplicate identification, duplicate handling

basic duplicate identification (based on ID) should be part of Phase 1--Ulla 12:59, 24 February 2009 (UTC)

Phase 3

workflow based ingestion, incl. task manager and processing of ingested items

Functional specification

UC_PM_IN_01 import file in structured format

The user wants to upload a file in a structured format such as BibTeX, EndNote Export Format or RIS, for example in order to save manual typing.

Status/Schedule

  • status: in specification
  • schedule: R 4.2

Triggers

  • the user wants to upload a file in structured format in order to create eSciDoc items

Actors

  • Import manager

Pre-Conditions

  • Target collection has to be selected.
  • Validation rules for the import have to be selected.

Flow of events

  • 1. The user starts to import a file to the system
    • 1.1 The system prompts for the path of the import file and for the import format (BibTeX, EndNote Export Format, RIS).
    • 1.2 The user enters the path to the file, specifies the import format and confirms the input.
    • 1.3 The system imports the file.
    • 1.4 The system checks the file size against the defined maximum file size and checks that the import format is valid.
      • 1.4.1a. The file size is less than or equal to the maximum file size.
      • 1.4.1b. The import format is valid.
      • 1.4.2a. The file size is greater than the maximum file size. The system discards the file and displays an error message. Continue with 1.
      • 1.4.2b. The import format is invalid. The system discards the file and displays an error message. Continue with 1.
    • 1.5 The system creates new PubMan entries and displays them after the successful import in the import manager workspace.
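
The checks in step 1.4 can be sketched as follows. The maximum size of 10 MB, the names `check_import` and `SUPPORTED_FORMATS`, and the choice to validate only the declared format name (not the file content) are assumptions made for this illustration.

```python
import os
from typing import Optional

# Formats named in step 1.1; the size limit is a placeholder value.
SUPPORTED_FORMATS = {"BibTeX", "EndNote Export Format", "RIS"}
MAX_FILE_SIZE = 10 * 1024 * 1024  # bytes, hypothetical default


def check_import(path: str, import_format: str,
                 max_size: int = MAX_FILE_SIZE) -> Optional[str]:
    """Return an error message if the file must be discarded, else None."""
    size = os.path.getsize(path)
    if size > max_size:                         # 1.4.2a
        return f"file size {size} exceeds the maximum of {max_size} bytes"
    if import_format not in SUPPORTED_FORMATS:  # 1.4.2b
        return f"import format {import_format!r} is not supported"
    return None                                 # 1.4.1a and 1.4.1b hold
```

A caller would display the returned message and return to step 1, or continue with step 1.5 when the result is `None`.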

Post conditions

New PubMan entries have been created.

Future development

  • Duplicate checking.
  • Automatic upload (give the URL of the server from where to get the data on a regular basis).