Metadata Encoding and Transmission Standard



"The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation."[1]

"METS is intended to provide a standardized XML format for transmission of complex digital library objects between systems".[2] One METS file corresponds to one digital object (i.e. one digitized work) and provides separate sections for descriptive metadata, administrative metadata, structural metadata, files and behaviors. The structural parts are directly defined by the METS standard, while the other sections incorporate "extension schemas", e.g. MARC/Dublin Core for descriptive metadata or MIX for technical metadata. METS is very powerful for grouping together various digital items into one research object, e.g. to combine scans and TEI transcription of one work.

METS is highly flexible and allows multiple representations of the same digital object. In particular, METS does not restrict which metadata schemas are used (it only names a set of endorsed "extension schemas"), and the structural maps can be organized in multiple ways. Therefore, the standard by itself does not guarantee interoperability. METS profiles can reduce this problem to a certain extent.


METS structure

An example METS XML document is available from the Fedora homepage,[3] and a METS structure diagram is provided as well.[4]
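
As orientation for the sections described below, the overall layout of a METS document can be sketched as follows. This is an outline only and is not meant to validate as is: the OBJID and LABEL values are placeholders, and the comments stand in for content that a real document would have to supply. The order of the sections follows the METS schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<METS:mets xmlns:METS="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           OBJID="example-object-1" LABEL="Example object">
  <!-- information about the METS document itself -->
  <METS:metsHdr CREATEDATE="2007-05-20T06:32:00"/>
  <!-- descriptive metadata, repeatable -->
  <METS:dmdSec ID="DMD1"><!-- mdRef or mdWrap goes here --></METS:dmdSec>
  <!-- administrative metadata, repeatable -->
  <METS:amdSec><!-- techMD, rightsMD, sourceMD, digiprovMD --></METS:amdSec>
  <!-- inventory of all files -->
  <METS:fileSec><!-- fileGrp elements go here --></METS:fileSec>
  <!-- structural map: the only section that is required -->
  <METS:structMap TYPE="physical">
    <METS:div TYPE="book"/>
  </METS:structMap>
  <!-- optional, in this order: METS:structLink, then METS:behaviorSec -->
</METS:mets>
```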

Header (metsHdr)

Information about the METS document itself, e.g. various time stamps and the institutions and/or individuals (agent) involved in creating the package:

<METS:metsHdr ID="BOOK1" CREATEDATE="2007-05-20T06:32:00" LASTMODDATE="2007-05-22T06:32:00" RECORDSTATUS="A">
  <METS:agent ROLE="CREATOR" TYPE="ORGANIZATION">
    <METS:name>Max Planck Institute for History of European Law</METS:name>
  </METS:agent> 
</METS:metsHdr>

Descriptive Metadata (dmdSec)

One or more bibliographic records describing the work, in any metadata format. Descriptive metadata may be embedded in the METS document (mdWrap) or stored externally and referenced (mdRef).

<METS:dmdSec ID="DMD1">
  <METS:mdRef LOCTYPE="URL" MIMETYPE="application/xml" MDTYPE="OTHER"
    LABEL="MAB record"></METS:mdRef>
</METS:dmdSec>
<METS:dmdSec ID="DMD2">
  <METS:mdWrap MIMETYPE="application/mab" MDTYPE="OTHER" LABEL="MAB Record">
    <METS:binData>0471nM2.01010024      h001 66230�002 19941207000000.0�003 20070608000000.0�030 zz5d||rz||||7�050 ||||||||||||||�051 n||||||�077 �c0�100 Oertel, Christian Gottfried�331 VollstÉandiges corpus gravaminum evangelicorum�359 An das Licht gestellet von Christian Gottfried Oertel�410aRegensburg�412aNeubauer�501 Erschienen: 1 (1771) - [8] (1775). -Bd. [8] im Verl. Montag, Regensburg, erschienen�710 Corpus Evangelicorum / Gravamen�902   |Corpus Evangelicorum�902   |Gravamen��
    </METS:binData>
  </METS:mdWrap>
</METS:dmdSec>
<METS:dmdSec ID="DMD3">
  <METS:mdWrap MIMETYPE="text/xml" MDTYPE="DC" LABEL="Dublin Core Metadata">
    <METS:xmlData>
      <dc:title>Vollständiges corpus gravaminum evangelicorum</dc:title>
      <dc:creator>Oertel, Christian Gottfried</dc:creator>
      <dc:date>1 (1771) - [8] (1775)</dc:date>
      <dc:publisher>Montag, Regensburg</dc:publisher>
      <dc:type>text</dc:type>
    </METS:xmlData>
  </METS:mdWrap>
</METS:dmdSec>
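
An mdRef normally also carries the location of the external record, which the first section of the example above leaves out. A minimal sketch, assuming the record is reachable by URL; the ID DMD4 and the example.org address are placeholders and not part of the original example:

```xml
<METS:dmdSec ID="DMD4"
             xmlns:METS="http://www.loc.gov/METS/"
             xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- hypothetical location of an externally stored Dublin Core record -->
  <METS:mdRef LOCTYPE="URL" MIMETYPE="text/xml" MDTYPE="DC"
              LABEL="Dublin Core record (external)"
              xlink:href="http://example.org/records/dc-record.xml"/>
</METS:dmdSec>
```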

Administrative Metadata (amdSec)

A collection of administrative metadata available for a METS document and/or its components. This can be:

  1. technical metadata (techMD): information regarding the file, e.g. compression, bit depth, etc.
  2. intellectual property rights (IPR) metadata (rightsMD): copyright and/or license statement
  3. source metadata (sourceMD): "descriptive and administrative metadata regarding the analog source from which a digital library object derives".[5]
  4. digital provenance metadata (digiprovMD): "information regarding source/destination relationships between files".[5]

Again, the information can be embedded (mdWrap) or just pointed to (mdRef).

<METS:amdSec>
  <METS:techMD ID="TMD1">
    <METS:mdWrap MDTYPE="OTHER" MIMETYPE="text/xml" OTHERMDTYPE="TECHMD">
      <METS:xmlData>
        <techmd:compression NAME="LZW"/>
        <techmd:image>
          <techmd:bitDepth>24</techmd:bitDepth>
          <techmd:storage PLANARCONFIGURATION="UNKNOWN" SEGMENT="STRIP"/>
          [...]
        </techmd:image>
      </METS:xmlData>
    </METS:mdWrap>
  </METS:techMD>
  <METS:rightsMD ID="RMD1">
    <METS:mdWrap MDTYPE="OTHER" MIMETYPE="text/xml" OTHERMDTYPE="RIGHTSMD">
      <METS:xmlData>
        <rightsmd:versionStatement>Copyright by MPIeR</rightsmd:versionStatement>
      </METS:xmlData>
    </METS:mdWrap>
  </METS:rightsMD>
</METS:amdSec>
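
The example above embeds both records. The pointer variant might look like the following sketch, which assumes that the technical metadata is stored externally as a MIX record (MDTYPE "NISOIMG"); the ID TMD2 and the URL are placeholders.

```xml
<METS:amdSec xmlns:METS="http://www.loc.gov/METS/"
             xmlns:xlink="http://www.w3.org/1999/xlink">
  <METS:techMD ID="TMD2">
    <!-- hypothetical location of an external MIX record for one image -->
    <METS:mdRef LOCTYPE="URL" MIMETYPE="text/xml" MDTYPE="NISOIMG"
                xlink:href="http://example.org/techmd/12433-mix.xml"/>
  </METS:techMD>
</METS:amdSec>
```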

File List (fileSec)

The file list is the inventory of all files which comprise the digital object. This section is not repeatable, thus each file is listed once and then referenced from the structural map. The inventory arranges the files into groups (fileGrp), which may represent the hierarchy of the document. Every file element (file) may optionally reference descriptive as well as administrative metadata.

The content data streams may be referenced (FLocat with xlink:href) or embedded in the METS document (FContent).

<METS:fileSec>
  <METS:fileGrp ID="DATASTREAMS">
    <METS:fileGrp ID="DS1" USE="MASTER IMAGE">
      <METS:file ID="DS1.0" CREATED="2007-05-20T06:32:00" MIMETYPE="image/tiff" SIZE="8238866"
        ADMID="TMD1 RMD1" DMDID="DMD1" OWNERID="E">
        <METS:FLocat LOCTYPE="URL" xlink:href="http://www.escidoc.mpg.de/virr/12433.tiff"/>
      </METS:file>
      [...]
    </METS:fileGrp>
    <METS:fileGrp ID="DS2" USE="text/tei">
      <METS:file ID="DS2.0" CREATED="2007-10-20T06:32:00" MIMETYPE="text/xml" SIZE="7343"
        ADMID="RMD1" DMDID="DMD1" OWNERID="X">
         <METS:FLocat LOCTYPE="URL" xlink:href="http://www.escidoc.mpg.de/virrbeame.xml"/>
      </METS:file>
      [...]
    </METS:fileGrp>
  </METS:fileGrp>
</METS:fileSec>
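
The example above only references its files. In the embedded case, the file element holds an FContent child instead of FLocat, with the content given either as base64-encoded binData or as xmlData. A minimal sketch with placeholder IDs and a made-up five-byte payload:

```xml
<METS:fileSec xmlns:METS="http://www.loc.gov/METS/">
  <METS:fileGrp ID="DS3" USE="note">
    <METS:file ID="DS3.0" MIMETYPE="text/plain">
      <METS:FContent>
        <!-- base64 encoding of the five bytes "Hello" -->
        <METS:binData>SGVsbG8=</METS:binData>
      </METS:FContent>
    </METS:file>
  </METS:fileGrp>
</METS:fileSec>
```

Embedding keeps the object in one file but increases the size of the METS document, so it is probably practical only for small data streams.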

Structural Map (structMap)

A representation of the complete object, modeled as a tree structure. "The structural map is the heart of a METS document, defining the hierarchical arrangement of a primary source document which has been digitized. This hierarchy is encoded as a tree of div elements. Any given div can point to another METS document via the mptr element, or to a single file, to a group of files, or to segments of individual files or groups of files through the fptr and subsidiary elements."[2]

Maps may focus on the physical composition of the digitized work (e.g. a series with books with pages, etc.) or on the intellectual structure of the work (e.g. a book with table of contents, chapters, etc.). The TYPE attribute specifies which kind of structural map is provided.

The div element supports parallel numbering via the ORDER, ORDERLABEL, and LABEL attributes. "[...] imagine a text with 10 roman numbered pages followed by 10 arabic numbered pages. Page iii would have an ORDER of '3', an ORDERLABEL of 'iii' and a LABEL of 'Page iii', while page 3 would have an ORDER of '13', an ORDERLABEL of '3' and a LABEL of 'Page 3'".[2]

<METS:structMap TYPE="physical">
  <METS:div TYPE="multiVolume" LABEL="Vollstaendiges corpus gravaminum evangelicorum">
    <METS:div TYPE="book" LABEL="Vollstaendiges corpus gravaminum evangelicorum, Band 1" ORDERLABEL="Band 1" ORDER="1">
      <METS:div TYPE="page" LABEL="Blank page" ORDER="1"></METS:div>
      <METS:div TYPE="page" LABEL="Page i: Half title page" ORDERLABEL="i" ORDER="2">
        <METS:fptr FILEID="DS1.0"/>
        <METS:fptr FILEID="DS2.0"/>
      </METS:div>
      <METS:div TYPE="page" LABEL="Page ii: Blank page" ORDERLABEL="ii" ORDER="3"></METS:div>
      [...]
    </METS:div>
  </METS:div>
</METS:structMap>
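
The roman/arabic page numbering case from the quotation above can be sketched as follows. The attribute values are taken from the quotation; the enclosing book div and its label are placeholders.

```xml
<METS:structMap TYPE="physical" xmlns:METS="http://www.loc.gov/METS/">
  <METS:div TYPE="book" LABEL="Example text">
    <!-- third of the ten roman-numbered pages -->
    <METS:div TYPE="page" ORDER="3" ORDERLABEL="iii" LABEL="Page iii"/>
    <!-- ... remaining pages omitted ... -->
    <!-- third arabic-numbered page, thirteenth page overall -->
    <METS:div TYPE="page" ORDER="13" ORDERLABEL="3" LABEL="Page 3"/>
  </METS:div>
</METS:structMap>
```

ORDER gives the position in the sequence, ORDERLABEL the numbering printed in the source, and LABEL the text shown to the user.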

Structural Link (structLink)

"The Structural Links section of METS allows METS creators to record the existence of hyperlinks between nodes in the hierarchy outlined in the Structural Map. This is of particular value in using METS to archive Websites."[5]

Note: Probably not required within the ViRR context
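
A minimal sketch of such a link, assuming an archived website with two pages. The div IDs P1 and P2 and the labels are placeholders; the smLink element refers to the divs through their ID attributes.

```xml
<METS:mets xmlns:METS="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <METS:structMap TYPE="logical">
    <METS:div TYPE="website">
      <METS:div ID="P1" TYPE="page" LABEL="Home page"/>
      <METS:div ID="P2" TYPE="page" LABEL="Contact page"/>
    </METS:div>
  </METS:structMap>
  <METS:structLink>
    <!-- records a hyperlink from the home page to the contact page -->
    <METS:smLink xlink:from="P1" xlink:to="P2"/>
  </METS:structLink>
</METS:mets>
```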

Behaviors

"A behavior section can be used to associate executable behaviors with content in the METS object."[5]

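A behavior section (behaviorSec) might look like the following sketch. Each behavior names an interface definition and a mechanism, i.e. the executable code that implements the interface. All IDs, labels and example.org addresses are placeholders; STRUCTID is assumed to point at a div with ID "DIV1" in the structural map.

```xml
<METS:behaviorSec ID="BS1"
                  xmlns:METS="http://www.loc.gov/METS/"
                  xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- STRUCTID refers to a div in the structural map (hypothetical ID) -->
  <METS:behavior ID="BEH1" STRUCTID="DIV1" LABEL="Page viewer">
    <!-- abstract definition of the set of behaviors -->
    <METS:interfaceDef LABEL="Viewer interface" LOCTYPE="URL"
                       xlink:href="http://example.org/behaviors/viewer-interface"/>
    <!-- executable code or service that implements the interface -->
    <METS:mechanism LABEL="Viewer service" LOCTYPE="URL"
                    xlink:href="http://example.org/behaviors/viewer-service"/>
  </METS:behavior>
</METS:behaviorSec>
```
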

Tools for METS generation

An overview of METS tools is available on the METS homepage.

  • 7Train - "an XSLT 2.0 tool for generating METS files from XML input. It builds the basic METS structure so that the user can worry about what is specific to the user's project. Includes examples for generating METS from OAI and CONTENTdm records."
  • METS Java Toolkit - "for the procedural construction, validation, and marshalling and unmarshalling for METS"


METS profiles

"METS Profiles are intended to describe a class of METS documents in sufficient detail to provide both document authors and programmers the guidance they require to create and process METS documents conforming with a particular profile."[6]

METS profiles define which extension schemas are used, set out rules of description, and specify technical characteristics. They allow implementers to narrow the flexibility of METS to the options they want to support.
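
A METS document can state which profile it follows through the PROFILE attribute of its root element. A minimal sketch; the profile address is a placeholder and does not refer to a registered profile:

```xml
<METS:mets xmlns:METS="http://www.loc.gov/METS/"
           PROFILE="http://example.org/profiles/digitized-book.xml">
  <!-- content is expected to follow the rules of the named profile -->
  <METS:structMap TYPE="physical">
    <METS:div TYPE="book"/>
  </METS:structMap>
</METS:mets>
```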


References

Further METS examples and documents