Difference between revisions of "Talk:PubMan Func Spec Ingestion"

===Flow of events===
*1. The user selects one or more items from the import workspace and specifies
**1.1. The system performs a duplicate check on item level and shows the user the possible duplicates.
***1.1.1 No duplicates have been found: the system releases the imported items. The use case ends successfully.

Isn't that depending on the workflow for the target context, i.e. simple or standard workflow?--[[User:Uat|Ulla]] 18:04, 15 March 2009 (UTC)
:: Don't get the point.
***1.1.2 Possible duplicates have been found: the system provides the user with the following possibilities to proceed.
****a) create new MD for one or more items
****b) create new revision for one or more items
****c) cancel the batch release.
****d) remove one or more items from the batch operation

On which basis do we detect possible duplicates? on identifier (e.g. WOS/ISI identifier, on local identifier, if provided, on title matching?) this is relevant to understand how ICE can do import of endnote references, which have been modified locally, i.e. import only references which have been modified in endnote--[[User:Uat|Ulla]] 18:08, 15 March 2009 (UTC)
**1.2 The system checks if the creators within the imported data already exist within the CoNE Service.
***1.2.1 The creators don't exist in the CoNE Service: New unauthorized entries are being created in CoNE.
***1.2.2 The creators already exist in the CoNE Service: The system displays the possible entry in CoNE. The user can decide to use the CoNE person or to add a new person in CoNE or to add name variants, affiliations etc. to an existing CoNE entry.
**2. One or more items have been released.

==UC_PM_IN_03 batch delete items==

Revision as of 16:09, 22 March 2009

work in progress

Phase 1

  • Provide multiple-item submission (batch import) for local EndNote files, BibTeX files and WoS records that contain more than one reference.

The following formats have to be supported: EndNote, eSciDoc XML, BibTeX, WoS, RIS--Ulla 12:58, 24 February 2009 (UTC)

  • For EndNote, consider files in different versions: either version 1.x to 7 or version 8.x
    • The encoding of the files depends on the EndNote version: versions 1.x to 7 support ASCII, version 8.x supports UTF-8
    • The mapping to PubMan genres depends on the EndNote version (different mappings needed)
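
The version-dependent encoding could be handled along the following lines. This is only a minimal sketch: the function names `endnote_encoding` and `decode_export` are hypothetical, and it assumes the user states the EndNote version when uploading, which the notes above do not specify.

```python
def endnote_encoding(version: str) -> str:
    """Return the text encoding assumed for an EndNote export.

    Follows the rule noted above: versions 1.x to 7 use ASCII,
    version 8.x uses UTF-8. Other versions are rejected because
    the notes do not cover them.
    """
    major = int(version.split(".")[0])
    if 1 <= major <= 7:
        return "ascii"
    if major == 8:
        return "utf-8"
    raise ValueError(f"EndNote version not covered by the notes: {version}")


def decode_export(raw: bytes, version: str) -> str:
    """Decode the raw bytes of an export file using the version's encoding."""
    return raw.decode(endnote_encoding(version))
```

The same version switch would be the natural place to select the genre mapping, since that also differs between the two version groups.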

First priority: the EndNote version in use by ICE and MPI pflanze--Ulla 12:58, 24 February 2009 (UTC)

  • For WoS, consider:
    • include "times cited"? (AEI)
      Please note that this information is not stable but evolves over time. Newly published articles are not cited at all --Inga 10:25, 28 July 2008 (UTC)

Exactly; rather provide this feature via a look-up service--Ulla 12:58, 24 February 2009 (UTC)

  • For BibTeX, consider that the record can contain a URL tag pointing to the fulltext that belongs to the bibliographic record; this fulltext should be uploaded to PubMan (see also BibTeX mapping)

Important scenario for ICE: references are maintained in EndNote and ingested into PubMan from time to time. Therefore, the use case has to include:

  • a decision by the user whether the ingest
    • a) creates new data, or
    • b) overwrites existing data (based on the local EndNote ID)
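
The create-or-overwrite decision could be sketched roughly as follows. Everything here is an assumption made for illustration: the function `plan_ingest`, the `endnote_id` key, and the idea that existing items are indexed by their local EndNote ID are not part of the notes above.

```python
from typing import Dict, List, Tuple

Reference = Dict[str, str]


def plan_ingest(
    existing: Dict[str, Reference],
    incoming: List[Reference],
    overwrite: bool,
) -> Tuple[List[Reference], List[Reference]]:
    """Split incoming references into (to_create, to_overwrite).

    `existing` maps a local EndNote ID (hypothetical key "endnote_id")
    to the item already stored. With overwrite=False every reference
    is created as new data (option a). With overwrite=True a reference
    whose ID is already known replaces the stored item (option b).
    """
    to_create: List[Reference] = []
    to_overwrite: List[Reference] = []
    for ref in incoming:
        local_id = ref.get("endnote_id")
        if overwrite and local_id is not None and local_id in existing:
            to_overwrite.append(ref)
        else:
            to_create.append(ref)
    return to_create, to_overwrite
```

References without a local ID always end up in the "create" list in this sketch, since there is nothing to match them against.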

To be clarified:

  • If an EndNote import contains both new and modified references: can the user select the "last modified entries" in the local EndNote library in order to import only modified entries into PubMan?
  • Is the institute aware that PubMan is richer than EndNote, i.e. that additional data submissions/modifications might have to be done in PubMan?--Ulla 12:58, 24 February 2009 (UTC)

Phase 2

duplicate identification, duplicate handling

basic duplicate identification (based on ID) should be part of Phase 1--Ulla 12:59, 24 February 2009 (UTC)

Phase 3

workflow based ingestion, incl. task manager and processing of ingested items

Functional specification

UC_PM_IN_01 import file in structured format

To save manual typing, for example, the user wants to upload a file in a structured format such as BibTeX, EndNote Export Format or RIS.

We should refer here to a separate page where we list all supported structured formats (incl. eSciDoc XML, WoS) and their respective mappings, so that we do not have to update the use case each time we provide an additional format--Ulla 17:50, 15 March 2009 (UTC)

Status/Schedule

  • status: in specification
  • schedule: R 5

Triggers

  • the user wants to upload a file in structured format in order to create eSciDoc items

Actors

  • Import manager (?)
  • Moderator (?)

Pre-Conditions

  • Target collection has to be selected.
  • validation rules for import have to be selected.

What kind of validation rules are meant?--Ulla 17:48, 15 March 2009 (UTC)

Michael always uses very simple, or no validation rules for the import of eDoc items. I suppose for the start this would be easy to use. --Nicole 08:30, 20 March 2009 (UTC)

Flow of events

  • 1. The user starts to import a file to the system
    • 1.1 The system prompts for the path of the import file, the import format (BibTeX, EndNote Export Format, RIS) and the context into which s/he would like to import the items. Furthermore, the user can specify whether the import file contains customizable fields and to which PubMan fields their values should be mapped.
    • 1.2 The user enters the path to the file, specifies the import format and confirms the input.
    • 1.3 The system checks the import format and the available Fedora storage.
      • 1.3.1 The import format is valid. Continue with step 1.4
      • 1.3.2 The import format is invalid. The system discards the file and displays an error message saying that the import format is invalid and that the user should check the file and try to upload again.
    • 1.4 The file is uploaded to the system.
    • 1.5 The system performs a duplicate check on item level and shows the user the possible duplicates.
      • 1.5.1 No duplicates have been found. Continue with step 1.6
      • 1.5.2 Possible duplicates have been found: the system provides the user with a report of possible duplicates and with the following possibilities to proceed.
        • a) import only the non duplicate items
        • b) create new version for one or more duplicate items and no new version for the new items.
        • c) remove the duplicate items and copy only the new items
        • d) cancel the upload
    • 1.6 The system checks if the creators within the imported data already exist within the CoNE Service.
      • 1.6.1 The creators don't exist in the CoNE Service: New unauthorized entries are being created in CoNE.
      • 1.6.2 The creators already exist in the CoNE Service: The system displays the possible entry in CoNE. The user can decide to use the CoNE person or to add a new person in CoNE or to add name variants, affiliations etc. to an existing CoNE entry.
    • 1.7 The system creates new PubMan entries in status pending and displays them after the successful import in the import manager workspace.
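
Steps 1.3 and 1.5 above could look roughly like the following. This is a sketch under stated assumptions, not the PubMan implementation: the first-line markers are a simple heuristic (real files may start with comments or a preamble), the `identifier` key is hypothetical, and matching on an identifier is only one possible basis for the duplicate check.

```python
from typing import Dict, List, Set, Tuple

Item = Dict[str, str]

# Heuristic first-line markers for the three formats named in step 1.1.
FORMAT_MARKERS = {
    "bibtex": "@",     # BibTeX entries start with "@type{"
    "endnote": "%",    # EndNote export tags start with "%", e.g. "%0"
    "ris": "TY  - ",   # RIS records start with the TY tag
}


def format_is_valid(text: str, declared_format: str) -> bool:
    """Step 1.3: check that the file content matches the declared format."""
    marker = FORMAT_MARKERS.get(declared_format.lower())
    if marker is None:
        return False
    for line in text.splitlines():
        if line.strip():
            # Only the first non-empty line is inspected.
            return line.lstrip().startswith(marker)
    return False


def split_duplicates(
    imported: List[Item], existing_ids: Set[str]
) -> Tuple[List[Item], List[Item]]:
    """Step 1.5: split imported items into (new_items, possible_duplicates)."""
    new_items: List[Item] = []
    duplicates: List[Item] = []
    for item in imported:
        if item.get("identifier") in existing_ids:
            duplicates.append(item)
        else:
            new_items.append(item)
    return new_items, duplicates
```

The second list is what the duplicate report in step 1.5.2 would be built from; option a) then amounts to importing only the first list.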

Comment 1: The imported items shall only get into status pending if the functionality "batch release" is available. Otherwise the items should directly be released. --Nicole 08:55, 20 March 2009 (UTC)

Comment 2: There is no maximum file size. The size of the file that can be uploaded is variable and depends on how many users are trying to upload or are currently uploading at the same time (talked to Willi). So it can happen that a file cannot be uploaded at first, but can be uploaded a few minutes later when there is less traffic. Don't know how to integrate that into the specification. Any ideas? --Nicole 08:54, 20 March 2009 (UTC)

Comment 3: Please leave 1.6 out if not possible for now. --Nicole 16:01, 22 March 2009 (UTC)

Post conditions

New PubMan Entries have been created with the information on when they have been imported and information on the import source.

Continue with UC_PM_IN_02.

Future development

  • Check for CoNE ID
  • Automatic upload (give the URL of the server from where to get the data on a regular basis).
  • Check if journal names within the import file are already in CoNE.

UC_PM_IN_02 Batch release imported items

The user wants to make items, especially imported ones, visible to the public via PubMan while saving time.

Status/Schedule

  • status: in specification
  • schedule: R 5

Triggers

  • the user wants to release items in order to make them publicly available via PubMan

Actors

  • Import manager (?)
  • Moderator (?)

Pre-Conditions

  • One or more items have to be selected (e.g. via basket)
  • validation rules for release item have to be selected.

Flow of events

  • 1. The user selects one or more items from the import workspace, which s/he would like to release.
  • 2. The system provides the user with an interface, on which s/he can specify which item set s/he would like to batch release.
  • 3. The user selects the item set, which s/he would like to release. Only items from the same context are allowed for a set.
  • 4. The system checks the items against the validation rules, provided for the context.
    • 4.1 The items are valid. Continue with step 5.
    • 4.2 One or more items are invalid. The system shows the invalid items and gives the user the possibility to edit them. Continue with step 4.
  • 5. One or more items have been released.
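
The flow above can be sketched as a single function. This is only an illustration under assumptions: the keys `id`, `context` and `status`, the function `batch_release`, and the shape of the validation callback are all hypothetical. It reads step 4.2 as "nothing is released until every item in the set is valid".

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]
# A validator returns a list of problems; an empty list means "valid".
Validator = Callable[[Item], List[str]]


def batch_release(
    items: List[Item], validate: Validator
) -> Tuple[List[Item], Dict[str, List[str]]]:
    """Return (released_items, problems_by_item_id) for one item set."""
    # Step 3: only items from the same context are allowed in a set.
    contexts = {item["context"] for item in items}
    if len(contexts) > 1:
        raise ValueError("all items in a set must belong to the same context")

    # Step 4: check every item against the context's validation rules.
    invalid: Dict[str, List[str]] = {}
    for item in items:
        problems = validate(item)
        if problems:
            invalid[item["id"]] = problems

    # Step 4.2: report invalid items; the caller edits them and retries.
    if invalid:
        return [], invalid

    # Steps 4.1 and 5: all items are valid, so they are released.
    released = [dict(item, status="released") for item in items]
    return released, {}
```

A caller would loop: show the reported problems, let the user edit the items, and call the function again until the second return value is empty.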

UC_PM_IN_03 batch delete items

The user wants to delete several items from the import manager interface because they were duplicates of items that already existed in PubMan and are no longer needed.

UC_PM_IN_04 batch attach local tags

The user wants to assign one or more local tags to a set of items.

UC_PM_IN_05 batch assign organizational units

The user wants to assign one or more OUs to a set of items.

Future development

  • Check for new version of item. It should be possible to check if a newer version of the PubMan item has been created at the import source.
  • Regular automated imports including an update of the existing items.