Difference between revisions of "ESciDoc Component Checksum Calculation"

From MPDLMediaWiki
(7 intermediate revisions by 3 users not shown)

Latest revision as of 09:38, 7 January 2011

Checksum calculation for eSciDoc components is important for several reasons:

  • long-term preservation
  • ensuring that users receive exactly the content that is stored in an eSciDoc repository instance

The eSciDoc Object Manager core service must ensure that a checksum is calculated for the internally managed content of every eSciDoc component.

Scenarios

  • A checksum is always calculated for newly created files
  • A checksum is always calculated when new file content is uploaded, replacing the previous content
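The two scenarios can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory `Component` class and hypothetical helpers `create_component` and `upload_content`; none of these are part of the eSciDoc API. It only shows that every create and every upload recomputes the checksum and records the algorithm used.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """Hypothetical stand-in for an eSciDoc component (not the real API)."""
    content: bytes
    checksum: str
    checksum_algorithm: str


def _digest(data: bytes, algorithm: str) -> str:
    # hashlib.new accepts algorithm names such as "md5", "sha1" or "sha256".
    return hashlib.new(algorithm, data).hexdigest()


def create_component(content: bytes, algorithm: str = "md5") -> Component:
    # Scenario 1: a checksum is always calculated for a newly created file.
    return Component(content, _digest(content, algorithm), algorithm)


def upload_content(component: Component, new_content: bytes,
                   algorithm: Optional[str] = None) -> None:
    # Scenario 2: new content replaces the previous content, so the checksum
    # is recalculated. The algorithm may optionally change with each upload;
    # otherwise the previously used algorithm is kept.
    algorithm = algorithm or component.checksum_algorithm
    component.content = new_content
    component.checksum = _digest(new_content, algorithm)
    component.checksum_algorithm = algorithm
```

For example, `create_component(b"")` yields the MD5 checksum `d41d8cd98f00b204e9800998ecf8427e`, and a later `upload_content(c, b"abc", algorithm="sha1")` switches that component to SHA-1.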

Post-condition

  • The checksum of the internally managed content and the algorithm used are delivered with the component properties or within the content properties

eSciDoc and Fedora built-in checksum functionality

Fedora can calculate a checksum for each datastream in the repository. In its default configuration, however, Fedora does not calculate checksums for object datastreams.

For more details, see Fedora checksum calculation.

To get checksums calculated anyway, there are two options:

  • override the default Fedora checksum configuration (as described in the Fedora documentation)
  • implement a separate checksum calculation within the internal logic of the Object Manager component
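For the second option, the Object Manager would compute the checksum itself. A minimal sketch, assuming the content arrives as a binary stream; `stream_checksum` is a hypothetical helper, and the real Object Manager is not written in Python. Reading in fixed-size chunks keeps memory use constant even for very large files.

```python
import hashlib
import io
from typing import BinaryIO


def stream_checksum(stream: BinaryIO, algorithm: str = "md5",
                    chunk_size: int = 64 * 1024) -> str:
    """Return the hex checksum of a binary stream, read chunk by chunk.

    Hypothetical helper: reading fixed-size chunks means large uploads are
    never held in memory in full.
    """
    hasher = hashlib.new(algorithm)
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    print(stream_checksum(io.BytesIO(b"abc")))  # 900150983cd24fb0d6963f7d28e17f72
```

The result is identical to hashing the whole content at once; only the memory profile differs.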


Enabling checksum calculation for the content datastream in Fedora was tested successfully. It still needs to be discussed how the checksum is delivered inside the component. If the checksum method (e.g. MD5, SHA-1) is delivered inside the component, it is easy to set the method when an object is created; it may also be possible to change the method with every content update. Frank 13:26, 12 March 2009 (UTC)

The eSciDoc Infrastructure delivers the checksum of the component content and the algorithm used, displayed as two additional elements inside the component properties:

...
<properties>
  ...
  <prop:checksum>ed7d42ce826da388e64e3cdbf62ae1f1</prop:checksum>
  <prop:checksum-algorithm>md5</prop:checksum-algorithm>
  ...
</properties>
<content
    xlink:type="simple" 
    xlink:title="Content escidoc:7" 
    xlink:href="/ir/item/escidoc:8/components/component/escidoc:7/content" 
    storage="internal-managed"
    />
...
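A client can use these two properties to confirm that it received exactly the stored content: recompute the checksum of the downloaded bytes and compare it with `prop:checksum`. A minimal sketch, assuming a placeholder namespace URI for the `prop` prefix (the fragment above omits its declaration, so this is not eSciDoc's real namespace); `verify_content` is a hypothetical helper.

```python
import hashlib
import xml.etree.ElementTree as ET

# Placeholder namespace: an assumption, not the real eSciDoc properties URI.
PROP_NS = "urn:example:escidoc-properties"


def verify_content(properties_xml: str, content: bytes) -> bool:
    """Recompute the checksum of downloaded content and compare it with the
    prop:checksum / prop:checksum-algorithm values of the component."""
    root = ET.fromstring(properties_xml)
    expected = root.findtext(f"{{{PROP_NS}}}checksum")
    algorithm = root.findtext(f"{{{PROP_NS}}}checksum-algorithm")
    if expected is None or algorithm is None:
        raise ValueError("properties carry no checksum information")
    actual = hashlib.new(algorithm.strip().lower(), content).hexdigest()
    return actual == expected.strip().lower()


if __name__ == "__main__":
    props = (f'<properties xmlns:prop="{PROP_NS}">'
             '<prop:checksum>900150983cd24fb0d6963f7d28e17f72</prop:checksum>'
             '<prop:checksum-algorithm>md5</prop:checksum-algorithm>'
             '</properties>')
    print(verify_content(props, b"abc"))  # True
    print(verify_content(props, b"abd"))  # False
```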

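Checksum Algorithms

  • see Hash functions overview (http://en.wikipedia.org/wiki/Cryptographic_hash_function)
  • see MD5 algorithm (http://en.wikipedia.org/wiki/MD5)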
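A quick sanity check when reading a delivered checksum is that its length matches the stated algorithm: MD5 produces 16 bytes (32 hex characters) and SHA-1 produces 20 bytes (40 hex characters). The sample value `ed7d42ce826da388e64e3cdbf62ae1f1` in the properties example has 32 hex characters, consistent with `md5`. A minimal sketch using Python's `hashlib`; the helper name `is_well_formed` is hypothetical.

```python
import hashlib
import string


def is_well_formed(checksum: str, algorithm: str) -> bool:
    """Check that a hex checksum has exactly the length the algorithm
    produces (digest_size bytes, two hex characters per byte)."""
    expected_len = hashlib.new(algorithm.lower()).digest_size * 2
    return len(checksum) == expected_len and all(
        c in string.hexdigits for c in checksum)


if __name__ == "__main__":
    print(is_well_formed("ed7d42ce826da388e64e3cdbf62ae1f1", "md5"))   # True
    print(is_well_formed("ed7d42ce826da388e64e3cdbf62ae1f1", "sha1"))  # False
```

This only detects a mismatched or malformed value; it says nothing about whether the checksum matches the content.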

Questions

  • Some systems calculate checksums with several different algorithms at the same time. Is this also important for the eSciDoc Repository?
  • Is a checksum also needed to protect against malicious manipulation? If so, signatures over these checksums are needed, since anyone who is able to change content in the storage backend may also be able to change the FOXML in the storage backend. Frank 13:12, 12 March 2009 (UTC)
No requirement so far. --Natasa 10:41, 30 March 2009 (UTC)
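Regarding the first question: calculating checksums with several algorithms does not require reading the content several times, because each chunk can be fed to multiple hash objects in one pass. A minimal sketch, assuming Python's `hashlib` and a hypothetical helper `multi_checksum`; the default list of algorithms is only an example, not a decision for eSciDoc.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable


def multi_checksum(stream: BinaryIO,
                   algorithms: Iterable[str] = ("md5", "sha1", "sha256"),
                   chunk_size: int = 64 * 1024) -> Dict[str, str]:
    """Compute several checksums of one stream in a single read pass."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        # Every chunk is fed to every hash object, so the content is read once.
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


if __name__ == "__main__":
    for name, value in multi_checksum(io.BytesIO(b"abc")).items():
        print(name, value)
```

The extra cost is mainly CPU time for each additional algorithm; I/O stays the same as for a single checksum.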