Difference between revisions of "ESciDoc Ingest Tool"

From MPDLMediaWiki

Revision as of 14:29, 30 April 2008

Functional and Technical Requirements for an eSciDoc Ingest Tool

Being able to ingest large numbers (100,000 - 200,000) of objects is crucial for the successful roll-out of most eSciDoc Solutions. Nearly all the data has to be migrated/ingested by the end of May 2008. We therefore need to provide a working solution as soon as possible to meet the goal of a timely roll-out, followed by a much improved version of the ingest tool after the initial productive deployment of PubMan and other solutions. For this reason, we have split the planning into short-term and mid-term goals.

Improved Performance of Object Manager

Short-term Goals

  • Ingest rate of 1 item/s or ~ 80,000 objects/day (with 2 components and one descriptive MD record besides DC)
  • Create a separate ingest method in Object Manager which allows for
    • the explicit setting of the item status (pending, submitted, released)
    • inclusion of existing PIDs
    • delayed synch of the triple store
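
The ingest-rate target above can be checked with simple arithmetic. The following is a minimal sketch, not part of any eSciDoc component; the function names are hypothetical and chosen only for illustration. It converts a sustained rate in items per second into objects per day and estimates how long a given backlog would take.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def objects_per_day(items_per_second: float) -> float:
    """Objects ingested in 24 hours at a sustained rate (hypothetical helper)."""
    return items_per_second * SECONDS_PER_DAY


def days_to_ingest(total_objects: int, items_per_second: float) -> float:
    """Days of continuous ingest needed for a backlog (hypothetical helper)."""
    return total_objects / objects_per_day(items_per_second)


if __name__ == "__main__":
    rate = 1  # items per second, the short-term target
    print(f"{rate} item/s = {objects_per_day(rate):,.0f} objects/day")
    for backlog in (100_000, 200_000):
        print(f"{backlog:,} objects: {days_to_ingest(backlog, rate):.1f} days")
```

At exactly 1 item/s the daily figure is 86,400, so the rounded ~80,000 objects/day quoted above is slightly below the exact value. A backlog of 100,000 to 200,000 objects would need roughly 1.2 to 2.3 days of uninterrupted ingest at that rate.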

Mid-term Goals

  • Ingest rate of 5 items/s or ~ 400,000 objects/day (with 2 components and one descriptive MD record besides DC)
  • Automatic transformation of item XML into separate FOXML 1.1 objects

Ingest Tool

Short-term Goals

  • Read in validated item XML files from local disc and ingest them into Object Manager
  • Item XML must include proper references to Context, Creator, and Content Type (mra: is this sufficient? correct?)
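
The short-term flow described above (read item XML files from a local directory, then hand each one to Object Manager) could be sketched as follows. This is an illustrative sketch under stated assumptions: the `ingest` callable stands in for whatever ingest method Object Manager will expose and is hypothetical, as are the function names. The check shown here only confirms that each file is well-formed XML; validation against the item schema and the checks for Context, Creator, and Content Type references are assumed to happen elsewhere.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple
import xml.etree.ElementTree as ET


def ingest_directory(
    directory: Path,
    ingest: Callable[[bytes], str],  # hypothetical: sends item XML, returns an object ID
) -> Dict[str, List[Tuple[str, str]]]:
    """Ingest every *.xml file in `directory`, collecting successes and failures."""
    results: Dict[str, List[Tuple[str, str]]] = {"ingested": [], "failed": []}
    for path in sorted(directory.glob("*.xml")):
        data = path.read_bytes()
        try:
            # Well-formedness check only; this is NOT schema validation.
            ET.fromstring(data)
            object_id = ingest(data)
        except Exception as exc:  # parse errors and ingest errors alike
            results["failed"].append((path.name, str(exc)))
            continue
        results["ingested"].append((path.name, object_id))
    return results
```

Passing the ingest step in as a callable keeps the file handling independent of the Object Manager interface, which is still to be defined, and recording failures per file lets a large batch continue past individual bad items.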

Mid-term Goals

  • Accept objects from sources other than the local disc
  • Accept more formats than just item XML
  • Better support for collections

Open Questions

  • How to handle dates (should the original creation date be maintained as property)?
  • Shall the Ingest Tool support a stylesheet transformation from input format to item XML?