EsciDoc Item Origin

From MPDLMediaWiki

When fetching data from an external service, it is desirable to record where and when the data was fetched. For long-term archiving, the originally fetched metadata record should be stored as well.


Four approaches have been discussed; two of them have since been rejected:

  • Store the fetched data as a separate item and relate it to the corresponding escidoc item.
    • Pro: Versioning; selective retrieval (only the needed data is retrieved); the source is indicated in the relation
    • Con: More complex overall; possible problems with the LTA interface; exporting our data becomes more complex
    • Open: Where should the origin metadata (PREMIS?) be stored, in the origin item or in the escidoc item? How should the relation be handled when an item is updated?
  • Store the fetched data as component of the created escidoc item
    • Pro: Metadata can be stored in different formats, not only XML
    • Con: Requires logic changes (the component has to be invisible)
    • Open:
    • Status: rejected due to item overload
  • Create additional metadata record where this info is stored
    • Pro: The metadata is stored directly in the item
    • Con: FoXML size increases due to versioning
    • Open: How can we handle fetched components (full texts)?
  • Store all data in one item (the fetched data is the first version, the escidoc data the second, etc.)
    • Pro: No overhead; easy retrieval of data
    • Con:
    • Open:
    • Status: rejected, as there is no clear way to always match back to version 1

I would vote for one of two alternatives: an additional metadata record or an additional component. The additional metadata record is logically the better approach, but it raises technical issues with large FoXMLs and versioning. The more pragmatic approach would therefore probably be to store the origin information as a component with a special content category. The content of this component should be XML, i.e. a PREMIS metadata record that provides information on the original identifier, the original repository, the time of creation, etc. This could be made more detailed by including metadata on rights etc.; see Ulla's remark below. --Natasa 15:08, 26 February 2009 (UTC)
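
To make the component proposal above more concrete, the following is a minimal sketch of how such an origin record could be assembled as XML. It is only an illustration: the function name build_origin_record and all element names (origin-record, original-identifier, source-repository, fetched-at, rights) are hypothetical placeholders and do not follow the PREMIS schema, and the sample values are invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_origin_record(original_id, source_repository, fetched_at, rights=None):
    """Return an XML string describing where and when an item was fetched.

    The element names are illustrative placeholders, NOT the PREMIS schema.
    """
    root = ET.Element("origin-record")
    ET.SubElement(root, "original-identifier").text = original_id
    ET.SubElement(root, "source-repository").text = source_repository
    # Store the fetch time as an ISO 8601 timestamp.
    ET.SubElement(root, "fetched-at").text = fetched_at.isoformat()
    # Rights information is optional and only written when supplied.
    if rights is not None:
        ET.SubElement(root, "rights").text = rights
    return ET.tostring(root, encoding="unicode")


# Invented sample values, for illustration only.
record = build_origin_record(
    original_id="example:12345",
    source_repository="example-source",
    fetched_at=datetime(2009, 2, 26, 15, 8, tzinfo=timezone.utc),
    rights="re-use restricted by source",
)
print(record)
```

A record like this could be attached as the content of the special-category component; a real implementation would presumably map these fields onto the corresponding PREMIS elements instead.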

Why we want to store origin information

  • Storing of origin information as a legal issue
  • Source-dependent BibTeX export
  • Different visualization in the GUI for imported items?
  • Source-dependent rights handling
  • Maintenance issues (a richer transformation becomes possible after a mapping rework)


  • Do we handle metadata and full texts differently?
  • If fetched data are transformed a) into escidoc items and/or b) into another format (e.g. as part of authoring tools), the idea of storing the original data in relation to the escidoc data sounds good to me. In any case, information on date and source should be kept, in case we have to justify the quality of the fetched data. In addition, we might be confronted with rights information related to the source, which has to be stored as well (e.g. constraints on re-use imposed by the source, or an indication of ownership). --Ulla 13:37, 26 February 2009 (UTC)

Possible workflows

MPI ICE Workflow

  • All data are stored in one large EndNote file
  • The original data (EndNote) have to be stored
  • A duplicate check is necessary (at EndNote or item level?)
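
As an illustration of the duplicate check mentioned above, here is a minimal sketch that compares records by a normalized title and year. It assumes the records have already been parsed into dictionaries; the field names title and year and the functions duplicate_key and find_duplicates are hypothetical, and the question of whether the check runs at EndNote or item level is left open.

```python
import re


def duplicate_key(record):
    """Build a comparison key from a normalized title and the year.

    The field names 'title' and 'year' are assumptions for this sketch.
    """
    # Lowercase the title and collapse punctuation and whitespace runs.
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, str(record.get("year", "")).strip())


def find_duplicates(records):
    """Return (first_index, duplicate_index) pairs for records sharing a key."""
    seen = {}
    duplicates = []
    for index, record in enumerate(records):
        key = duplicate_key(record)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates


# Invented sample records, for illustration only.
records = [
    {"title": "Ice Core Records", "year": 2008},
    {"title": "ice  core records.", "year": "2008"},
    {"title": "Another Paper", "year": 2008},
]
print(find_duplicates(records))  # prints [(0, 1)]
```

Matching on title and year alone is deliberately simple; a real check would probably also compare authors or identifiers to avoid false positives.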