ESciDoc Ingest Tool
Revision as of 14:29, 30 April 2008
Functional and Technical Requirements for an eSciDoc Ingest Tool
Being able to ingest large numbers of objects (100,000 to 200,000) is crucial for the successful roll-out of most eSciDoc Solutions. Nearly all the data has to be migrated or ingested by the end of May 2008. We therefore need to provide a working solution as soon as possible to support a timely roll-out, followed by a much improved version of the ingest tool after the initial productive deployment of PubMan and other solutions. For this reason, we split the planning into short-term and mid-term goals.
Improved Performance of Object Manager
Short-term Goals
- Ingest rate of 1 item/s or ~ 80,000 objects/day (with 2 components and one descriptive MD record besides DC)
- Create a separate ingest method in Object Manager which allows for:
  - the explicit setting of the item status (pending, submitted, released)
  - the inclusion of existing PIDs
  - delayed synchronization of the triple store
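The three capabilities requested for the separate ingest method can be pictured with a minimal sketch. Everything named here (`ItemStatus`, `IngestRequest`, `BatchIngester`, and the `store` and `sync_triple_store` callables) is hypothetical and only illustrates the shape of such an interface; it is not the Object Manager API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class ItemStatus(Enum):
    """Item states named in the requirement."""
    PENDING = "pending"
    SUBMITTED = "submitted"
    RELEASED = "released"


@dataclass
class IngestRequest:
    """One item to ingest (hypothetical structure)."""
    item_xml: str
    status: ItemStatus = ItemStatus.PENDING  # set explicitly by the caller
    pid: Optional[str] = None                # existing PID, if the item has one


class BatchIngester:
    """Stores items immediately, but syncs the triple store only on flush()."""

    def __init__(self, store: Callable[[IngestRequest], None],
                 sync_triple_store: Callable[[], None]) -> None:
        self._store = store              # assumed: writes one item to the repository
        self._sync = sync_triple_store   # assumed: rebuilds or updates the triple store
        self._dirty = False

    def ingest(self, request: IngestRequest) -> None:
        self._store(request)
        self._dirty = True               # remember that a sync is still pending

    def flush(self) -> None:
        # Run the (expensive) sync once per batch instead of once per item.
        if self._dirty:
            self._sync()
            self._dirty = False
```

Deferring the sync to a single `flush()` call per batch is one way to read "delayed synch"; whether the sync should run per batch, on a timer, or manually is not stated in the requirement.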
Mid-term Goals
- Ingest rate of 5 items/s or ~ 400,000 objects/day (with 2 components and one descriptive MD record besides DC)
- Automatic transformation of item XML into separate FOXML 1.1 objects
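As a quick check of the ingest-rate targets, the following sketch converts a sustained rate in items per second into objects per day and estimates how long the 100,000 to 200,000 objects mentioned in the introduction would take. It assumes uninterrupted ingest around the clock, which a real migration is unlikely to achieve; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def objects_per_day(items_per_second: float) -> float:
    """Objects ingested in 24 hours at a constant rate."""
    return items_per_second * SECONDS_PER_DAY


def days_to_ingest(total_objects: int, items_per_second: float) -> float:
    """Days of continuous ingest needed for total_objects."""
    return total_objects / objects_per_day(items_per_second)


if __name__ == "__main__":
    for rate in (1, 5):  # short-term and mid-term targets
        print(f"{rate} item/s = {objects_per_day(rate):,.0f} objects/day; "
              f"200,000 objects take {days_to_ingest(200_000, rate):.2f} days")
```

At exactly 1 item/s the daily total is 86,400 and at 5 items/s it is 432,000, so the figures of roughly 80,000 and 400,000 objects/day quoted in the goals are rounded down.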
Ingest Tool
Short-term Goals
- Read in validated item XML files from local disk and ingest them into Object Manager
- Item XML must include proper references to Context, Creator, and Content Type (mra: is this sufficient? correct?)
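The short-term behavior above can be sketched as a small loop: read each XML file from a directory, check that the required references are present, and hand the file to an ingest call. The element names in `REQUIRED_REFERENCES` are placeholders, not the actual item XML schema, and `ingest` is a caller-supplied function standing in for the Object Manager call; both are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical element names; the real item XML schema may name these differently.
REQUIRED_REFERENCES = ("context", "creator", "content-type")


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def missing_references(xml_text: str, required=REQUIRED_REFERENCES) -> list:
    """Return the required reference elements that do not occur in the item XML."""
    root = ET.fromstring(xml_text)
    present = {local_name(el.tag) for el in root.iter()}
    return [name for name in required if name not in present]


def ingest_directory(directory, ingest):
    """Ingest every *.xml file in directory; skip files with missing references.

    `ingest` is any callable taking the item XML as a string.
    Returns (ingested file names, [(rejected file name, missing references)]).
    """
    ingested, rejected = [], []
    for path in sorted(Path(directory).glob("*.xml")):
        xml_text = path.read_text(encoding="utf-8")
        missing = missing_references(xml_text)
        if missing:
            rejected.append((path.name, missing))
        else:
            ingest(xml_text)
            ingested.append(path.name)
    return ingested, rejected
```

This check only tests for the presence of the elements. Whether the referenced Context, Creator, and Content Type actually exist in the target repository would need a separate lookup, which is part of the open question noted in the list item.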
Mid-term Goals
- Accept objects from sources other than the local disk
- Accept more formats than just item XML
- Better support for collections
Open Questions
- How to handle dates (should the original creation date be maintained as a property)?
- Shall the Ingest Tool support a stylesheet transformation from input format to item XML?