eSciDoc Component Checksum Calculation

Calculating checksums for eSciDoc components is important for several reasons:

  • Long-term preservation
  • Ensuring that users get exactly the content that is stored in an eSciDoc repository instance

The eSciDoc Object Manager core service must make sure that a checksum is calculated for all internally managed content of an eSciDoc component.

Scenarios

  • A checksum is always calculated for newly created files
  • A checksum is always calculated when new file content is uploaded, replacing the previous content

Post-condition

  • The checksum of the internally managed content and the algorithm used are delivered with the component properties or within the content properties

eSciDoc and Fedora built-in checksum functionality

Fedora can calculate a checksum for each datastream in the repository. In its default configuration, however, Fedora does not calculate checksums for object datastreams.

For more details see Fedora checksum calculation

There are two options for enabling checksum calculation:

  • override the default Fedora checksum behaviour (as described in the Fedora documentation)
  • provide a separate checksum operation within the internal logic of the Object Manager component


Enabling checksum calculation for the content datastream in Fedora was tested successfully. How the checksum is delivered inside the component still needs to be discussed. If the checksum method (e.g. MD5, SHA-1) is delivered inside the component, it is easy to implement setting the method when creating an object; it may also be possible to change the method with every content update. Frank 13:26, 12 March 2009 (UTC)

The eSciDoc Infrastructure provides the checksum and the method for component content. They are displayed as two additional elements inside the component properties:

...
<properties>
  ...
  <prop:checksum>ed7d42ce826da388e64e3cdbf62ae1f1</prop:checksum>
  <prop:checksum-algorithm>md5</prop:checksum-algorithm>
  ...
</properties>
<content
    xlink:type="simple" 
    xlink:title="Content escidoc:7" 
    xlink:href="/ir/item/escidoc:8/components/component/escidoc:7/content" 
    storage="internal-managed"
    />
...

Checksum Algorithms

Questions

  • Some systems create checksums with several different algorithms at the same time. Is this also important for the eSciDoc repository?
  • Is a checksum also needed to prevent malicious manipulation? If so, signatures over these checksums are needed, assuming that someone who is able to change content in the storage backend may also be able to change the FOXML in the storage backend. Frank 13:12, 12 March 2009 (UTC)
No requirement so far. --Natasa 10:41, 30 March 2009 (UTC)
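
Regarding the first question, calculating several checksums at the same time does not require reading the content more than once. The sketch below illustrates this under stated assumptions: `compute_checksums` is a hypothetical name, and the choice of MD5 and SHA-1 only mirrors the methods mentioned on this page.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable


def compute_checksums(stream: BinaryIO,
                      algorithms: Iterable[str] = ("md5", "sha1"),
                      chunk_size: int = 64 * 1024) -> Dict[str, str]:
    """Calculate several digests over one stream in a single pass."""
    digests = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        # Feed every chunk to each digest, so the content is read only once.
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


if __name__ == "__main__":
    for name, value in compute_checksums(io.BytesIO(b"example content")).items():
        print(name, value)
```

Such checksums only detect changes. They do not address the second question, because anyone who can modify both the content and the stored checksum can make them match again.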