Talk:PubMan Func Spec Ingestion


work in progress

Phase 1

  • provide multiple-item submission (batch import) for local EndNote files, BibTeX files and WoS records that contain more than one reference.

The following formats have to be supported: EndNote, eSciDoc XML, BibTeX, WoS, RIS--Ulla 12:58, 24 February 2009 (UTC)

  • For EndNote, consider files in various versions: either version 1.x to 7 or version 8.x
    • encoding of files depends on the EndNote version: 1.x to 7 support ASCII, 8.x supports UTF-8
    • Mapping to PubMan Genres depends on endnote version (different mappings needed)
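The version-dependent encoding rule above could be sketched roughly as follows. This is only an illustration, not an agreed design: the function names `encoding_for_endnote` and `decode_endnote` are hypothetical, and the version string is assumed to start with the major version number (for example `7` or `8.1`).

```python
def encoding_for_endnote(version: str) -> str:
    """Pick the text encoding for an EndNote export, based on its major version."""
    major = int(version.split(".")[0])
    if 1 <= major <= 7:
        return "ascii"   # versions 1.x to 7 support ASCII
    if major == 8:
        return "utf-8"   # version 8.x supports UTF-8
    raise ValueError(f"unsupported EndNote version: {version}")


def decode_endnote(raw: bytes, version: str) -> str:
    """Decode the raw bytes of an export file using the version's encoding."""
    return raw.decode(encoding_for_endnote(version))
```

The same major-version check could also be used to select the genre mapping, since a different mapping is needed per version.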

First Prio: Endnote version in use by ICE and MPI pflanze--Ulla 12:58, 24 February 2009 (UTC)

  • For WoS, consider:
    • include "times cited"? (AEI)
      Please note, that this information is not stable, but evolves over time. Newly published articles are not cited at all --Inga 10:25, 28 July 2008 (UTC)

exactly, rather provide feature by look-up service--Ulla 12:58, 24 February 2009 (UTC)

  • for BibTeX, consider that the record can contain a URL tag pointing to the fulltext belonging to the bibliographic record, which should be uploaded to PubMan (see also BibTeX mapping)
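To illustrate the BibTeX point, a minimal sketch of how the URL tag might be picked out of a record so that the fulltext can be fetched later. It uses a simple regular expression rather than a full BibTeX parser, and the name `fulltext_url` is hypothetical.

```python
import re

# Matches a field such as: url = {http://...}  or  url = "http://..."
URL_FIELD = re.compile(r'\burl\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE)


def fulltext_url(bibtex_entry: str):
    """Return the value of the url field of a BibTeX entry, or None if absent."""
    match = URL_FIELD.search(bibtex_entry)
    return match.group(1).strip() if match else None
```

Fetching and uploading the file itself is left out here, since that depends on how PubMan handles fulltext upload.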

Important scenario for ICE: references are maintained in Endnote and ingested from time to time to PubMan. Therefore, use case has to include:

  • decision by user whether the ingest a) creates new data or b) overwrites existing data (based on local Endnote ID)
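One possible reading of this decision, sketched under assumptions: in "create" mode every record becomes a new item, while in "overwrite" mode records whose local EndNote ID is already known replace the existing item and all others are created. The key `endnote_id` and the function `plan_ingest` are hypothetical names.

```python
from typing import Iterable, List, Set, Tuple


def plan_ingest(existing_ids: Set[str], records: Iterable[dict],
                mode: str) -> Tuple[List[dict], List[dict]]:
    """Split records into (to_create, to_overwrite) according to the user's choice."""
    if mode not in ("create", "overwrite"):
        raise ValueError(f"unknown mode: {mode}")
    to_create, to_overwrite = [], []
    for record in records:
        # Only overwrite when the user asked for it and the ID is already known.
        if mode == "overwrite" and record["endnote_id"] in existing_ids:
            to_overwrite.append(record)
        else:
            to_create.append(record)
    return to_create, to_overwrite
```

Whether "create" mode should skip already known IDs instead of duplicating them is one of the points still to be clarified.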

to be clarified:

  • in case endnote import contains both new references and modified references...can user select "last modified entries" in his local endnote library to import only modified entries to pubman?
  • is the institute aware that PubMan is richer than EndNote, i.e. additional data submissions/modifications might have to be done in PubMan?--Ulla 12:58, 24 February 2009 (UTC)

Phase 2

duplicate identification, duplicate handling

basic duplicate identification (based on ID) should be part of Phase 1--Ulla 12:59, 24 February 2009 (UTC)

Phase 3

workflow based ingestion, incl. task manager and processing of ingested items

Functional specification

UC_PM_IN_01 import file in structured format

To save manual typing, for example, the user wants to upload a file in a structured format such as BibTeX, EndNote Export Format or RIS.

we should refer here to a separate page, where we list all supported structured formats. (incl. escidoc xml, WOS) and their respective mappings. so we do not have to update the use case each time we provide additional format--Ulla 17:50, 15 March 2009 (UTC)

Status/Schedule

  • status: in specification
  • schedule: R 5

Triggers

  • the user wants to upload a file in structured format in order to create eSciDoc items

Actors

  • Import manager

Pre-Conditions

  • Target collection has to be selected.
  • validation rules for import have to be selected.

What kind of validation rules are meant?--Ulla 17:48, 15 March 2009 (UTC)

Flow of events

  • 1. The user starts to import a file to the system
    • 1.1 The system prompts for the path of the import file and for the import format (BibTeX, EndNote Export Format, RIS). Furthermore, the user can specify whether the import file contains customizable fields and where their values should be mapped to in PubMan.
    • 1.2 The user enters the path to the file, specifies the import format and confirms the input.
    • 1.3 The system imports the file.
    • 1.4 The system checks the size of the file against the defined maximum file size and the import format.

Do we have a defined maximum size?--Ulla 17:51, 15 March 2009 (UTC)

      • 1.4.1a. The file size is less than or equal to the maximum file size.
      • 1.4.1b. The import format is valid.
      • 1.4.2a. The file size is greater than the maximum file size. The system discards the file and displays an error message. Continue with Step 1.
      • 1.4.2b. The import format is invalid. The system discards the file and displays an error message. Continue with Step 1.
    • 1.5 The system creates new PubMan entries in status pending and displays them after the successful import in the import manager workspace.
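The checks in step 1.4 could look roughly like the sketch below. The maximum size is a placeholder, since no value has been defined yet, and "valid format" is interpreted here only as "one of the formats named in step 1.1"; checking that the file content really matches the format would be a further step. All names are hypothetical.

```python
SUPPORTED_FORMATS = {"bibtex", "endnote", "ris"}
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # placeholder only, no maximum is defined yet


def check_import(file_size: int, import_format: str,
                 max_size: int = MAX_FILE_SIZE_BYTES) -> list:
    """Return a list of error messages; an empty list means the file is accepted."""
    errors = []
    if file_size > max_size:  # 1.4.2a
        errors.append(f"file size {file_size} exceeds maximum of {max_size} bytes")
    if import_format.lower() not in SUPPORTED_FORMATS:  # 1.4.2b
        errors.append(f"import format '{import_format}' is not supported")
    return errors
```

Returning all messages at once lets the error display state every reason why the file was discarded before the user goes back to Step 1.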

The error handling in case the import fails is not clear to me. The user needs a clear indication:

  • why import failed
  • in case import failed, complete import should be cancelled. I.e. user has to re-start complete import. (I assume, we do not import some items, some not.)--Ulla 18:00, 15 March 2009 (UTC)
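The "all or nothing" behaviour suggested in this comment might be sketched as follows: every record is converted first, and the first failure cancels the whole import with a message that says which record failed and why. `ImportFailed`, `to_item` and `import_batch` are hypothetical names, and the "missing title" rule is only a stand-in for real validation.

```python
class ImportFailed(Exception):
    """Raised when any record of a batch cannot be imported."""


def to_item(record: dict) -> dict:
    # Stand-in conversion: a record without a title is treated as invalid.
    if not record.get("title"):
        raise ValueError("missing title")
    return {"title": record["title"], "status": "pending"}


def import_batch(records: list) -> list:
    """Convert all records, or raise ImportFailed without returning any item."""
    items = []
    for position, record in enumerate(records, start=1):
        try:
            items.append(to_item(record))
        except ValueError as exc:
            # Cancel the complete import and report the reason.
            raise ImportFailed(f"record {position}: {exc}") from exc
    return items
```

Because nothing is handed over until every record has been converted, a failure leaves no partially imported items behind.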

Post conditions

New PubMan entries have been created, each with information on when it was imported and on its import source.

Continue with UC_PM_IN_02.
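As an illustration of the post condition, the information attached to each new entry might be modelled like this. The class and field names are hypothetical, and using UTC timestamps is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ImportedItem:
    metadata: dict
    import_source: str     # e.g. the name of the uploaded file
    imported_at: datetime  # when the import took place
    status: str = "pending"


def new_imported_item(metadata: dict, import_source: str) -> ImportedItem:
    """Create a pending item stamped with the current time and its source."""
    return ImportedItem(metadata=metadata,
                        import_source=import_source,
                        imported_at=datetime.now(timezone.utc))
```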

Future development

  • Duplicate checking
  • Automatic upload (give the URL of the server from where to get the data on a regular basis).