Difference between revisions of "Talk:PubMan Func Spec Ingestion"

From MPDLMediaWiki



Revision as of 11:48, 22 January 2008

work in progress

Implementation approach:

Note: this is currently based on the assumption that no workflow engine will be implemented to support more complex ingestion tasks/processing of items.

Phase 1

  • provide multiple item submission (batch import) for local EndNote files and WoS records
  • eDoc format? eSciDoc xml?
  • ingestion done by depositor for any collection where he has depositing rights
  • simple workflow: submit and immediate release
  • no duplicate checking (?, maybe identification?)
  • supported EndNote versions: up to 6, 6.x
  • validation rules?
  • check pubmed as possible provider?
  • no mapping of customizable EndNote fields to eSciDoc is provided

Phase 2

  • fetch metadata from an external system by providing an external identifier (arXiv) => OAI-PMH?
  • provide BibTeX import (generic styles possible?)
  • automatically fetch the fulltext from an external system based on a fulltext locator

Phase 3

  • duplicate identification
  • duplicate handling
  • workflow-based ingestion, incl. task manager and processing of ingested items

Comments on Functional specification

in progress!

UC_PM_ING_01 upload file in structured format

Status/Schedule

  • Status: in specification
  • Schedule: R3

Motivation

  • The user wants to upload a locally created EndNote file or a reference file from Web of Science, containing one or more references.

Expected outcome

References are batch uploaded to a collection on PubMan and are immediately released.

The items created on PubMan can be edited/modified afterwards.

Actors

  • Depositor

Steps

  1. The user chooses to upload a file in structured format.
  2. The user chooses a collection where he has depositor privileges. [Natasa]: ...and where the collection settings allow multiple upload of items? --Natasa 12:47, 22 January 2008 (CET)
  3. The user indicates which format and which version (of EndNote) he is using (see constraints below).
  4. (optionally) The user decides to de-activate the validation rules for the upload (i.e. the validation rules defined for the validation point "submission" for the collection). [Nicole]: I would suggest having one collection per institute for ingests. This collection will have no validation point for submit, only one for modify item. [Natasa]: Agreed, see also the modification in the second step for this issue. --Natasa 12:47, 22 January 2008 (CET)
  5. The user uploads the file. [Natasa]: The system uploads the file, creates items and releases them. --Natasa 12:47, 22 January 2008 (CET)
  6. The user gets a success message.

Alternatives

6. The user gets an error message indicating the type of error (timeout during upload, invalid file, validation rules not met).

6a. The user tries the upload again. Continue with step 3.

6b. The user cancels the upload procedure.
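
The success path and the alternatives 6, 6a and 6b could be sketched roughly as follows. This is only an illustration, not a proposed implementation: the names `UploadError`, `UploadResult` and `upload_with_retry` are hypothetical, and the two callables stand in for the real upload and for the user's decision to retry or cancel.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class UploadError(Enum):
    """Error types named in alternative 6 (hypothetical enum)."""
    TIMEOUT = "time out during upload"
    INVALID_FILE = "invalid file"
    VALIDATION_FAILED = "validation rules not met"


@dataclass
class UploadResult:
    success: bool
    message: str


def upload_with_retry(
    try_upload: Callable[[], Optional[UploadError]],
    wants_retry: Callable[[UploadError], bool],
) -> UploadResult:
    """Repeat the upload until it succeeds (step 6) or the user cancels (6b)."""
    while True:
        error = try_upload()  # None means the upload succeeded
        if error is None:
            return UploadResult(True, "Upload successful.")
        if not wants_retry(error):  # 6b: user cancels
            return UploadResult(False, f"Upload cancelled after error: {error.value}.")
        # 6a: user tries again; the loop repeats the upload


# Usage: the first attempt times out, the second one succeeds.
outcomes = iter([UploadError.TIMEOUT, None])
result = upload_with_retry(lambda: next(outcomes), lambda err: True)
```

In a real UI, a retry would return the user to step 3 (format and version selection) rather than simply repeating the same call.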

Data involved

EndNote files, from EndNote versions 1.x to 7

EndNote files, from EndNote version 8.x

Reference file from Web of Science

=> all files are in structured format (.txt files)
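
As an illustration of what "structured format" could mean for the import, here is a minimal parsing sketch. It assumes the EndNote file uses the tagged export layout in which each line starts with a two-character tag such as `%0` (reference type), `%A` (author) or `%T` (title), and in which `%0` or a blank line starts a new record. This layout is an assumption, not something fixed by this page; the function name `parse_tagged_records` and the sample data are hypothetical.

```python
def parse_tagged_records(text):
    """Split '%X value' tagged text into a list of {tag: [values]} records.

    Lines that do not start with '%' are ignored in this sketch.
    """
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the record collected so far.
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("%") and len(line) >= 2:
            tag, value = line[:2], line[2:].strip()
            # A new '%0' tag also starts a new record.
            if tag == "%0" and current:
                records.append(current)
                current = {}
            current.setdefault(tag, []).append(value)
    if current:
        records.append(current)
    return records


SAMPLE = """%0 Journal Article
%A Doe, J.
%A Roe, R.
%T First title

%0 Book
%A Poe, P.
%T Second title
"""

records = parse_tagged_records(SAMPLE)
```

Each resulting record would then be mapped to a PubMan item; a Web of Science file would need its own parser, since it uses a different tag set.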

Actors involved

user with depositing rights for at least one collection

Constraints

  • The encoding of files depends on the EndNote version:

1.x to 7 support ASCII

8.x supports UTF-8

  • The mapping to PubMan genres depends on the EndNote version (different mappings needed).
  • The file upload is only successful if all references have been uploaded. No partial upload is possible.
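
The encoding constraint and the all-or-nothing rule could be expressed roughly like this. It is a sketch under assumptions: the function names are hypothetical, the version is passed as a major version number, and `is_valid` stands in for whatever check decides that a single reference can be turned into an item.

```python
def encoding_for_endnote(major_version):
    """Pick the text encoding from the EndNote major version (see constraints)."""
    if 1 <= major_version <= 7:
        return "ascii"
    if major_version == 8:
        return "utf-8"
    raise ValueError(f"unsupported EndNote version: {major_version}")


def decode_upload(raw_bytes, major_version):
    """Decode the uploaded .txt file; raises UnicodeDecodeError on a mismatch."""
    return raw_bytes.decode(encoding_for_endnote(major_version))


def ingest_batch(references, is_valid):
    """Accept the batch only if every reference is valid (no partial upload)."""
    invalid = [i for i, ref in enumerate(references) if not is_valid(ref)]
    if invalid:
        raise ValueError(f"upload rejected; invalid references at positions {invalid}")
    return list(references)


# Usage: a UTF-8 file from EndNote 8.x and a batch that passes completely.
text = decode_upload("Müller".encode("utf-8"), 8)
accepted = ingest_batch(["ref one", "ref two"], bool)
```

Checking the whole batch before creating any item is one simple way to avoid partial uploads; a transactional rollback on the repository side would be an alternative.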

Comments on Abstract Prototype

In general, some more information on the options would be good, e.g.: is the list of options final, or will it change/grow significantly in forthcoming releases?

A little bit more background on options/criteria would be fine to decide for suitable controls (probably not in this prototype because it is pretty clear here)

e.g.

- is only one option possible/necessary, or more?
- estimates on the options would be good: one is important, one might be rarely used, one depends on ...
- is it mandatory to choose here explicitly, or is it more optional (pass this with a good default)?

Rupert 17:35, 20 December 2007 (CET)