Talk:ESciDoc Container Toc

From MPDLMediaWiki

Example of an eSciDoc TOC:

  • The TOC schema does not allow xlink:label, which would be useful for referencing within one document, as we need between the logical and physical sections. Any chance FIZ can add this?--Kleinfercher 10:04, 23 September 2008 (UTC)
This seems to me to be related to xlink:type="locator". The XLink spec requires an extended-type element as the parent of a locator-type element. The xlink:label may then be used to define arcs. Maybe the ID attributes of div and ptr can be used to reference within one document. METS smLink is out of the scope of TOC, in my opinion. Frank 11:00, 23 September 2008 (UTC)
  • I would suggest that the toc element can also contain further toc elements. As the schema stands now, I have to put the logical and the physical TOC sections in divs.--Kleinfercher 10:04, 23 September 2008 (UTC)
The structure is intended to contain one TOC. A toc is related to METS struct-map. So, physical and logical should be defined as two different TOCs. Frank 11:00, 23 September 2008 (UTC)
True, the structure contains only 1 TOC (one root element). Everything below it will be divided to switch between physical and logical. Also agreed at the internal meeting (Rike, Markus, Natasa) --Natasa 10:08, 9 October 2008 (UTC)

Relation between TOC, Containers and Items for different use cases

  • Proposal: During ingestion a default TOC Object is generated with all the information about the physical section. Additionally a default logical section is generated with the book as the root element (type=monograph), as a user will not create the book as the first structural element.--Kleinfercher 13:43, 23 September 2008 (UTC)

ViRR as new TOC use case

We have discovered a new use case (coming up from the ViRR project; for more info please check the ViRR pages).

Deep-level TOC

For a table of contents: the TOC for e.g. scanned books should contain the whole structure of a container, i.e. there is one single TOC for the whole top-level container (e.g. a book).

Isn't the requirement of a "deep-level TOC" for the top-level container an indicator that a digitized book would be better represented as an individual item? I'm not sure if the same requirement would be true for other kinds of containers as well. In addition, the idea of a "deep-level TOC" sounds to me as if the container sub-structure is no longer used as the main browse entry, but the TOC is. Keeping TOC and container structure in sync could create a massive overhead (see alternatives provided below) and I would recommend avoiding these dependencies at this point in time --Inga 23:00, 15 March 2008 (CET)
I really agree with that concern about container sub-structure. This should certainly be discussed once more. Frank 10:52, 17 March 2008 (CET)
Frank, Inga, I think you are right! See below the "Attempt to map to OAI-ORE" :) --Natasa 18:48, 17 March 2008 (CET)

Synchronization of TOC objects with the Container structure

  • alternative: generated TOC is not automatically in sync with the real structure of the container. Utility for validating the TOC structure based on the container structure can be provided. The modification of a container is not automatically changing/invalidating the TOC.
  • alternative: generated TOC is always in sync with the real structure of the container. Modification of a container invalidates related TOC objects (it does not modify them but marks them as invalid). Users who are then responsible for TOC object editing should re-generate the TOC object by using a system utility.
  • alternative: generated TOC is always in sync with the real structure of the container. Modification of a container validates related TOC objects, removes nodes no longer existing in the container structure and adds nodes new in the container structure. Users who are then responsible for TOC object editing may manually remove TOC object node(s) if they do NOT want some object nodes to appear in the TOC after the automatic generation.


I think it's agreed that the infrastructure validates the TOC (no reference to a resource outside the container hierarchy is allowed) and modifies the TOC when members are added to or removed from the container hierarchy. If entries are added to the TOC automatically, they should be marked as invisible. Frank 15:34, 17 March 2008 (CET)
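The agreed behaviour could be sketched roughly as follows. This is only an illustration under assumptions: `sync_toc` is a hypothetical helper, the container hierarchy is given as a plain list of hrefs, the generated `auto…` IDs are invented, and a div whose pointers all lead to removed members is dropped together with its sub-divs.

```python
import xml.etree.ElementTree as ET

TOC_NS = "http://www.escidoc.de/schemas/toc/0.4"
XLINK_NS = "http://www.w3.org/1999/xlink"
DIV = f"{{{TOC_NS}}}div"
PTR = f"{{{TOC_NS}}}ptr"
HREF = f"{{{XLINK_NS}}}href"


def sync_toc(root_div, member_hrefs):
    """Hypothetical helper: align the divs below root_div with the container hierarchy."""
    members = set(member_hrefs)

    def prune(div):
        # Look at direct child divs only; recurse into the ones that are kept.
        for child in list(div.findall(DIV)):
            hrefs = [p.get(HREF) for p in child.findall(PTR)]
            if hrefs and not any(h in members for h in hrefs):
                div.remove(child)  # all pointers lead to removed members
            else:
                prune(child)

    prune(root_div)

    # Every member that is not referenced yet gets a new entry, marked invisible.
    referenced = {p.get(HREF) for p in root_div.iter(PTR)}
    for n, href in enumerate(sorted(members - referenced), start=1):
        div = ET.SubElement(root_div, DIV, {"ID": f"auto{n}", "visible": "false"})
        ET.SubElement(div, PTR, {"ID": f"auto{n}Ptr", HREF: href})
    return root_div


root = ET.fromstring(
    f'<toc:div xmlns:toc="{TOC_NS}" xmlns:xlink="{XLINK_NS}" ID="rootNode">'
    '<toc:div ID="item287"><toc:ptr ID="item287Ptr" xlink:href="/ir/item/escidoc:287"/></toc:div>'
    '<toc:div ID="item289"><toc:ptr ID="item289Ptr" xlink:href="/ir/item/escidoc:289"/></toc:div>'
    '</toc:div>'
)
# escidoc:289 has left the hierarchy, escidoc:300 is new.
sync_toc(root, ["/ir/item/escidoc:287", "/ir/item/escidoc:300"])
```

Marking added entries as invisible keeps the editor in control: nothing new shows up in the displayed TOC until somebody decides where it belongs.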

Separating TOC object from Container

We would probably need to remove the TOC datastream from Containers (as it is valid for a single level only) and work with a separate "TOC object".

In case we create separate TOC objects that can be related to a container, we should also provide a utility to "generate initial TOC" based on the Container structure (starting from a specific container node and traversing through all other nodes). What the utility should generate is still open: RSS lists, as already depicted with the toc-view? A METS document?
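A "generate initial TOC" utility might look roughly like the sketch below. It is not an existing eSciDoc API: the nested dictionary stands in for whatever the repository returns for a container, `generate_initial_toc` and `build_div` are hypothetical names, and the output uses the toc/div/ptr structure of the example schema discussed on this page.

```python
import xml.etree.ElementTree as ET

TOC_NS = "http://www.escidoc.de/schemas/toc/0.4"
XLINK_NS = "http://www.w3.org/1999/xlink"
DIV = f"{{{TOC_NS}}}div"
PTR = f"{{{TOC_NS}}}ptr"
HREF = f"{{{XLINK_NS}}}href"


def build_div(node, order=None, prefix=""):
    """Turn one container/item node (assumed dict layout) into a toc:div."""
    div_id = node["kind"] + node["id"].split(":")[-1]
    attrs = {"ID": div_id, "LABEL": node["title"]}
    label = prefix
    if order is not None:
        label = f"{prefix}{order}"
        attrs["ORDER"] = str(order)
        attrs["ORDERLABEL"] = label
    div = ET.Element(DIV, attrs)
    ET.SubElement(div, PTR, {"ID": div_id + "Ptr",
                             HREF: f"/ir/{node['kind']}/{node['id']}"})
    # Traverse all members, numbering them in the order they are listed.
    child_prefix = f"{label}." if label else ""
    for i, member in enumerate(node.get("members", []), start=1):
        div.append(build_div(member, i, child_prefix))
    return div


def generate_initial_toc(container):
    """Hypothetical entry point: one toc element with one root div."""
    toc = ET.Element(f"{{{TOC_NS}}}toc")
    toc.append(build_div(container))
    return toc


book = {"id": "escidoc:10", "kind": "container", "title": "Book", "members": [
    {"id": "escidoc:11", "kind": "container", "title": "Chapter", "members": [
        {"id": "escidoc:287", "kind": "item", "title": "Page"}]}]}
toc = generate_initial_toc(book)
```

The result can be serialized with `ET.tostring(toc)` and then edited by hand, which matches the idea that the generated TOC is only a starting point.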

TOC as user-readable structural map

We would probably need to reconsider whether the current TOC should be treated as a "user-readable structural map" of a container (actually the RSS channel is a very good idea for this).

What is meant by "user-readable"? --Inga 23:00, 15 March 2008 (CET)

Logical order

Another requirement coming up from the same project: we need to establish a "logical" order of elements in the container (in addition to the "structural" grouping). The logical order should depict something like a book with book pages (both sides of a page in case the page is printed only one-sided and on the back side there are some remarks, comments):

  • previous object
  • current object
    • current object-recto
    • current object-verso
  • next object
    • next object-recto
    • next object-verso

In a real example, this would mean that the logical order is the "labeling" (in the example marked with bold), and the physical order is the "numbered order as understood by a machine" (in the example marked with italic) e.g. :

Example 1:

  • Book (top level node, no labeling, no numbering yet)
    • Book Chapter I - 1
    • Book Chapter II - 2
      • Page A - 2.1
      • Page B - 2.2
      • Page B-1 - 2.3
    • Book Chapter III - 3

Note: user requirement when navigating is to be able to directly go to page "B-1", or to list pages A through C.
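To make the navigation requirement concrete, here is a small sketch that uses the pages of Chapter II from Example 1. The function names are hypothetical, and the assumption is that every entry carries both a label (logical order) and a dotted number (physical order).

```python
# (label, physical number) pairs taken from Example 1, Chapter II.
pages = [("A", "2.1"), ("B", "2.2"), ("B-1", "2.3")]


def physical_key(number):
    """Turn a dotted number such as '2.3' into a sortable tuple (2, 3)."""
    return tuple(int(part) for part in number.split("."))


def go_to(entries, label):
    """Jump directly to a label; returns the physical number."""
    for entry_label, number in entries:
        if entry_label == label:
            return number
    raise KeyError(label)


def list_range(entries, first, last):
    """List all labels between two labels, following the physical order."""
    lo = physical_key(go_to(entries, first))
    hi = physical_key(go_to(entries, last))
    ordered = sorted(entries, key=lambda entry: physical_key(entry[1]))
    return [label for label, number in ordered
            if lo <= physical_key(number) <= hi]


print(go_to(pages, "B-1"))            # 2.3
print(list_range(pages, "A", "B-1"))  # ['A', 'B', 'B-1']
```

The labels alone cannot be sorted reliably ("B-1" versus "C"), so the range is resolved through the physical numbering.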

Example 2:

  • Encyclopedia (top level node, no labeling, no numbering yet)
    • Part A-M - 1
    • Part N-T - 2
    • Part U-Z - 3

Note: user requirement when navigating is to be able to directly go to part "A-M" (to list parts A through Z would be too ambitious to my understanding, but listing parts A-M through N-T can be a requirement).

Example 3a:

  • Architectural drawings collection of N.N. (top level node, no labeling, no numbering yet)
      • Drawing XII - 1.1 - foreface - recto
      • Drawing XII - 1.2 - backface - verso

Example 3b:

  • Architectural drawings collection of N.N. (top level node, no labeling, no numbering yet)
      • Drawing XII - 1 - (image of foreface, image of backface)


Note: example 3 (a, b) is not very well thought through here, as we are not yet certain whether foreface and backface would be considered separate objects as depicted above or separate components of the same object.

Whether we do it in a container or we try to depict it with the TOC object only - it is something we need to clarify.

Are digitized books (ViRR) a good container show case?

The following conclusions have been copied from User:Inga/container_tocs and only represent my personal view.

  • Considering the ViRR requirements, I believe that digitized books are strong entities and the individual scanned-in pages are not independent resources. Thus every change in the description of one page or in the table of contents should create a new version of the book. Therefore, I would suggest implementing digitized books as items with a structural map datastream. This would also help us provide METS exports at a later stage because we would operate on the same granularity. Anyway, the ViRR project could still be a test bed for containers, because books need to be grouped into multivolumes, and books and multivolumes need to be grouped in the ViRR collection. On that level no "deep-level TOC" is required; it's fine to provide a grouped list of direct members first.
    Note: Frank pointed to the fact that this approach would conflict with the basic principle that eSciDoc items represent the smallest logical units (= digitized page). Anyway, with introducing the METS format the "world of digitizers" point out that the book is their most important entity.
But of course they have different representations for a "book part". The DFG-Viewer-METS format shows me that they actually did grouping on page level: images with different resolution for one page. Frank 11:45, 18 March 2008 (CET)
Yes, but all of this is done within one object without any need to split the information to several components. --Inga 12:33, 25 March 2008 (CET)
In this context - does it mean we will actually have a Container or Item with 3 components (at the moment in Faces, VIRR we put as Item with 3 components :) ..--Natasa 12:42, 18 March 2008 (CET)
  • The TOC is an optional, but integral component/member of a digitized book
    =>Changes in the TOC object should version the container object in any case
  • Members are independent from their container(s), thus each item can be member of many containers.
    => In cases where users would like to provide an additional TOC for an existing container for which they have no privilege ("non-editor"), they still could create their own container including the same set (or subset) of the items.
  • I would suggest to synchronize definitions, re-considering terms used and harmonizing notations
    • container, i.e. in regard to hierarchical structure
    • table of contents/tableOfContents/TOC/toc - if the escidoc toc is an ordered and grouped overview of [selected] members, it may be semantically in sync with the METS concept "structMap" -> renaming to avoid confusion?
    • StructuralMap/struct-map/ - if a structural map is the "flat" list of item reference, it may be semantically in accordance to the METS concept "fileSec" -> renaming?
    • /container/resources/members - what is the difference between the members property and the structMap?

--Inga 23:31, 15 March 2008 (CET)

Attempt to map to OAI-ORE

Last update: Beta version of OAI-ORE standard available at [1]

On synchronization of definitions: maybe we can use something from the OAI-ORE specification?--Natasa 17:26, 17 March 2008 (CET). Interesting set of examples at http://www.openarchives.org/ore/0.2/datamodel#Introduction :)

Started the derivation as a test for common understanding:

  • Container (OAI-ORE Concept: Aggregation, see http://www.openarchives.org/ore/0.2/overview#Aggregation)
  • Member of a container (OAI-ORE Concept: Aggregated resource - same link as above, however in eSciDoc terms aggregated resource as a member of a container can not be of eSciDoc type component, but only of eSciDoc type Item or Container - i.e. what eSciDoc considers as a resource)
  • The primary difference between Aggregation and ResourceMap: Aggregation is a set of resources and the ResourceMap is a resource itself and it describes the aggregation
  • So far it seems that the ResourceMap concept is closest to the original eSciDoc TOC idea, with the limitation of a single ResourceMap per container (i.e. Aggregation). In fact that would mean our TOC objects would be something completely different, if they are actually needed.
Update - as of beta release [2] each Aggregation has at least one resource map. --Natasa 16:02, 3 June 2008 (CEST)
  • Important: "An Aggregation consists of one or more Aggregated Resources, which are logically the constituents of the Aggregation. To enumerate these, the Resource Map MUST express one or more triples where the subject is the URI of the respective Aggregation, the Predicate is "ore:aggregates", and the object is URI of a Resource that MUST NOT be the Resource Map or the Aggregation itself."
    • The above states that the ResourceMap itself cannot be related to the Aggregation (i.e. the Container, if my mapping is right) as a member, but only with the relation "ore:describes"
update: this statement is relaxed so it is only stated that object must not be URI of the Aggregation. Not yet certain on the meaning for us. --Natasa 17:40, 3 June 2008 (CEST)


  • Very interesting part on "5.7.2 Relationships among nested Aggregations" from http://www.openarchives.org/ore/0.2/datamodel#Identification of an Aggregation
    • Each ResourceMap must contain "ore:aggregates" to its members
    • it does not assume the automatic existence of "ore:isAggregatedBy" in this ResourceMap (interesting when nesting aggregations)
      • "The authority authoring and managing a nested Aggregation MAY choose to inform consuming clients of the "part/whole" nature of such nested Aggregations by including triples with the ore:isAggregatedBy Predicate in the Resource Maps that describe the nested Aggregations. In this manner a client accessing one of the issues has direct access back to the aggregating journal. The figure below illustrates this use of the ore:isAggregated relationship for nested Aggregations. This is the same "journal/issue" example shown above, but simplified to show only one "issue" AR-1, aggregated in the "journal" A-1. As shown, ReM-2 expresses a ore:isAggregated relationship for its respective nested Aggregation, AR-1 thereby providing information about the "containment" of AR-1, the "issue", in A-1, the "journal".
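The triples discussed in this section can be written down as plain tuples to test the common understanding. This is only a sketch: the identifiers A-1, AR-1, ReM-1 and ReM-2 are borrowed from the quoted journal/issue example, "item-1" is invented, and `resource_map` and `violations` are hypothetical helpers rather than part of any ORE library. The check implements the stricter 0.2 wording quoted above (the beta release only forbids the Aggregation itself as object).

```python
AGGREGATES = "ore:aggregates"
DESCRIBES = "ore:describes"
IS_AGGREGATED_BY = "ore:isAggregatedBy"


def resource_map(rem, aggregation, members, parent=None):
    """Triples a Resource Map would express for one Aggregation."""
    triples = [(rem, DESCRIBES, aggregation)]
    triples += [(aggregation, AGGREGATES, member) for member in members]
    if parent is not None:
        # Optional back link for nested Aggregations.
        triples.append((aggregation, IS_AGGREGATED_BY, parent))
    return triples


def violations(triples, rem, aggregation):
    """ore:aggregates triples whose object is the ReM or the Aggregation itself."""
    return [t for t in triples
            if t[0] == aggregation and t[1] == AGGREGATES
            and t[2] in (rem, aggregation)]


journal = resource_map("ReM-1", "A-1", ["AR-1"])
issue = resource_map("ReM-2", "AR-1", ["item-1"], parent="A-1")
broken = resource_map("ReM-1", "A-1", ["ReM-1"])
```

In eSciDoc terms this would mean that a TOC object mapped to a ResourceMap is never listed among the members of its container; it only describes the container.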

Another trial

  • Item -> Aggregation (identified by Handle URI hdl:1234/567)

How many TOC per container?

The only reason to provide more than one TOC for a container resource would be to have different selections of the container resource members. It is assumed that there is no use case for different selections of members of one single container.

This assumption is probably out-dated by now. The ViRR use cases show that various types ("physical", "logical") may be provided --Inga 23:16, 15 March 2008 (CET)
Yes indeed, it is out-dated. For me it's not clear if there is a "physical" and a "logical" TOC or one single TOC should be transformed into a METS with a "physical" and a "logical" structMap, in the ViRR use case. Frank 10:46, 17 March 2008 (CET)

Discussion from December 2007

  • TOC of a container should support
    • possibility to give title and description to a member
    • to give members an order
    • grouping members
probably this is a very good reason to really distinguish between structural map (the container members on the 1st level) as internal part of the container and the TOC (which can comprise any level of container members) as object which can be related to a container --Natasa 16:50, 10 December 2007 (CET)
Did you mean members of members if you say comprise any level of container members? See the next point but one.
Yes, I saw all the points mentioned (see also the reasoning under Questions and Discussion on this page). The idea is that we can treat TOC objects like any other objects, i.e. content items that are related to container objects via the content relations isTOCFor and hasToc. The users should decide up to which level they would like to go when they create the TOC (we may only give them some utility methods for that). Thus we keep the structural map intact, and it really contains only the members on the first level (items, containers). So my point was to make a strict distinction between struct map and TOC. --Natasa 18:32, 10 December 2007 (CET)
  • I see multiple TOCs as different views on the members of a container. Different representations (html, rtf ...) are transformations of a main format (xml).
  • We should basically decide if the TOC of a container is about the container members or additionally about the members of all subcontainers. The latter one is very complex and doubles the object structure.

Discussion from March 2008

The newest TOC version looks like

<?xml version="1.0" encoding="UTF-8"?>
<toc:toc ID="meins" TYPE="LOGICAL" LABEL="Table of Content" 
xml:base="http://localhost:8080" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:toc="http://www.escidoc.de/schemas/toc/0.4" 
xmlns:xlink="http://www.w3.org/1999/xlink"
xsi:schemaLocation="http://www.escidoc.de/schemas/toc/0.4 TOC-v3.xsd">
	<!-- 
		ID: 	xsd:ID       optional 
						an optional XML ID value.
		TYPE: 	xsd:string   optional  
						an optional string attribute specifying the type of toc provided. 
						Typical values will be "PHYSICAL" for a map which describes 
						the physical composition of the original work (a series with individual 
						monographs with pages) and "LOGICAL" for one which describes 
						the intellectual structure of the work (a monograph with TOC, 
						forward, chapters, index., etc.);
		LABEL: 	xsd:string   optional  
						an optional string attribute which may be used to describe the toc
						to users. This is primarily useful where more than one toc is 
						provided for a single object (e.g., both logical and physical toc).
	-->
	<toc:div ID="rootNode" LABEL="[the containers title]" TYPE="monograph">
		<toc:ptr ID="rootNodePtr" USE="DEFAULT" LOCTYPE="URL" xlink:href="/ir/container/escidoc:10" xlink:type="simple" 
                xlink:title="[the containers title]"/>
		<!-- attribute LOCTYPE has fixed value "URL" -->
		<!-- type of "toc:ptr" refers xlink:simpleLink in http://www.loc.gov/standards/mets/xlink.xsd -->
		<toc:div ID="container11" ORDER="1" ORDERLABEL="1." LABEL="[the containers title]" TYPE="chapter">
			<toc:ptr ID="container11Ptr" USE="DEFAULT" MIMETYPE="text/xml" xlink:href="/ir/container/escidoc:11" 
                        xlink:title="[the containers title]"/>
			<toc:div ID="item287" ORDER="1" ORDERLABEL="1.1" LABEL="[the items title]" TYPE="page">
				<toc:ptr ID="item287Ptr" xlink:href="/ir/item/escidoc:287" xlink:title="[the items title]"/>
				<my:additional-metadata xmlns:my="http://my.domain.org">
					<my:description>A photo of the front side.</my:description>
				</my:additional-metadata>
			</toc:div>
			<toc:div ID="item289" ORDER="2" ORDERLABEL="1.2" LABEL="[the items title]" TYPE="page">
				<my:additional-metadata xmlns:my="http://my.domain.org">
					<my:description>A textual description.</my:description>
				</my:additional-metadata>
				<toc:ptr ID="item289Ptr" USE="DEFAULT" xlink:href="/ir/item/escidoc:289" 
                                xlink:title="[the items title]"/>
			</toc:div>
		</toc:div>
		<toc:div ID="container12" ORDER="2" ORDERLABEL="2." LABEL="[the containers title]" TYPE="chapter">
			<toc:ptr ID="container12Ptr" USE="DEFAULT" xlink:href="/ir/container/escidoc:12" 
                        xlink:title="[the containers title]"/>
			<toc:div ID="container13" ORDER="1" ORDERLABEL="2.1" LABEL="[the containers title]" TYPE="section">
				<toc:ptr ID="container13Ptr" USE="DEFAULT" xlink:href="/ir/container/escidoc:13" 
                                xlink:title="[the containers title]"/>
				<my:additional-metadata xmlns:my="http://my.domain.org">
					<my:description>The back side.</my:description>
				</my:additional-metadata>
				<toc:div ID="item275" ORDER="1" ORDERLABEL="2.1.1" LABEL="[the items title]" TYPE="page">
					<toc:ptr ID="item275Ptr" USE="DEFAULT" xlink:href="/ir/item/escidoc:275" 
                                        xlink:title="[the items title]"/>
					<my:additional-metadata xmlns:my="http://my.domain.org">
						<my:description>A photo of the back side.</my:description>
					</my:additional-metadata>
				</toc:div>
				<toc:div ID="item277" ORDER="2" ORDERLABEL="2.1.2" LABEL="[the items title]" TYPE="page">
					<toc:ptr ID="item277Ptr" USE="DEFAULT" xlink:href="/ir/item/escidoc:277" 
                                        xlink:title="[the items title]"/>
					<my:additional-metadata xmlns:my="http://my.domain.org">
						<my:description>A textual description.</my:description>
					</my:additional-metadata>
				</toc:div>
			</toc:div>
		</toc:div>
	</toc:div>
</toc:toc>

Info from Frank: The main changes are:

  • <ptr> is optional. Therefore, <div> elements without a pointer are possible.
  • <ptr> is unbounded. A <div> may have several pointers.
  • <ptr> may have additional attributes "USE" and "MIMETYPE"
This information is required, but mainly on component (file) level, but ptr points to items or containers. It feels like we are mixing things up --Inga 13:04, 25 March 2008 (CET)

The link of a ptr element may point to a container or an item or binary-content of a component. Later the infrastructure should ensure, that a link points into the "member-tree" of the container the TOC is bound to. That means: a link points to a container that is member of the container or that is a member of a member of the container OR a link points to an item that is member of the container or that is a member of a member of the container OR a link points to binary-content of a component of an item that is member of the container or that is a member of a member of the container. The "member of a member" may also be a grandchild-member of the container etc. ;-)
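The "member-tree" rule could be checked along the following lines. This is a sketch under assumptions: the member tree is given as a dictionary from an href to the hrefs of its direct members (binary content of components would simply appear as further hrefs below an item), `invalid_pointers` is a hypothetical helper, and the container itself is also accepted as a target because the root div of the example points to it.

```python
import xml.etree.ElementTree as ET

TOC_NS = "http://www.escidoc.de/schemas/toc/0.4"
XLINK_NS = "http://www.w3.org/1999/xlink"
PTR = f"{{{TOC_NS}}}ptr"
HREF = f"{{{XLINK_NS}}}href"


def member_tree_hrefs(tree, root):
    """All hrefs reachable below root: members, members of members, and so on."""
    seen = set()
    stack = list(tree.get(root, []))
    while stack:
        href = stack.pop()
        if href not in seen:
            seen.add(href)
            stack.extend(tree.get(href, []))
    return seen


def invalid_pointers(toc_root, tree, container_href):
    """IDs of ptr elements whose link leaves the member tree of the container."""
    # Assumption: the container the TOC is bound to is itself a valid target.
    allowed = member_tree_hrefs(tree, container_href) | {container_href}
    return [p.get("ID") for p in toc_root.iter(PTR) if p.get(HREF) not in allowed]


tree = {
    "/ir/container/escidoc:10": ["/ir/container/escidoc:11"],
    "/ir/container/escidoc:11": ["/ir/item/escidoc:287"],
}
toc = ET.fromstring(
    f'<toc:toc xmlns:toc="{TOC_NS}" xmlns:xlink="{XLINK_NS}"><toc:div ID="rootNode">'
    '<toc:ptr ID="rootNodePtr" xlink:href="/ir/container/escidoc:10"/>'
    '<toc:div ID="item287"><toc:ptr ID="item287Ptr" xlink:href="/ir/item/escidoc:287"/></toc:div>'
    '<toc:div ID="stray"><toc:ptr ID="strayPtr" xlink:href="/ir/item/escidoc:999"/></toc:div>'
    '</toc:div></toc:toc>'
)
print(invalid_pointers(toc, tree, "/ir/container/escidoc:10"))  # ['strayPtr']
```

The grandchild case from the text is covered by the traversal: escidoc:287 is accepted although it is only a member of a member.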

With these changes the discussed use cases (especially DFG-Viewer) should be possible, even though the XML structure does not ensure that the TOC can be transformed into valid DFG-Viewer-METS without additional knowledge and/or information.

Has this been tested/confirmed by anyone? We don't expect this transformation to be efficient and fast, true? --Inga 13:14, 25 March 2008 (CET)

Below, the xsd for the above example:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.escidoc.de/schemas/toc/0.4" xmlns:toc="http://www.escidoc.de/schemas/toc/0.4" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink">

	<xs:import namespace="http://www.w3.org/1999/xlink" schemaLocation="http://www.loc.gov/standards/mets/xlink.xsd"/>
	<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/>
	
	<xs:element name="toc">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="toc:div"/>
			</xs:sequence>
			<xs:attribute name="ID" type="xs:ID" use="optional"/>
			<xs:attribute name="TYPE" type="xs:string" use="optional">
				<xs:annotation>
					<xs:documentation>TYPE: an optional string attribute specifying the type of structural map provided.  Typical values will be "PHYSICAL" for a map which describes the physical composition of the original work (a series with individual monographs with pages) and "LOGICAL" for one which describes the intellectual structure of the work (a monograph with TOC, forward, chapters, index., etc.);
				</xs:documentation>
				</xs:annotation>
			</xs:attribute>
			<xs:attribute name="LABEL" type="xs:string" use="optional">
				<xs:annotation>
					<xs:documentation>LABEL: an optional string attribute which may be used to describe the structMap to users.  This is primarily useful where more than one structMap is provided for a single object (e.g., both logical and physical structMap).
				</xs:documentation>
				</xs:annotation>
			</xs:attribute>
			<xs:attribute ref="xml:base" use="optional"/>
		</xs:complexType>
	</xs:element>

	<xs:element name="div">
		<xs:complexType>
			<xs:sequence>
				<xs:choice minOccurs="0" maxOccurs="unbounded">
					<xs:element ref="toc:ptr"/>
					<xs:element ref="toc:div"/>
					<xs:any namespace="##other" processContents="skip"/>
				</xs:choice>
			</xs:sequence>
			<xs:attribute name="ID" type="xs:ID" use="optional"/>
			<xs:attribute name="ORDER" type="xs:integer" use="optional">
				<xs:annotation>
					<xs:documentation>ORDER: an optional integer representation of this div's order among its siblings (e.g., its sequence).
				</xs:documentation>
				</xs:annotation>
			</xs:attribute>
			<xs:attribute name="ORDERLABEL" type="xs:string" use="optional">
				<xs:annotation>
					<xs:documentation>ORDERLABEL: an optional string representation of this div's  order among its siblings (e.g., "xii"), or a non-integer native numbering system.  It is presumed that this value will still be machine-actionable (e.g., supports a page 'go to' function), and is not a replacement/substitute for the LABEL attribute.
				</xs:documentation>
				</xs:annotation>
			</xs:attribute>
			<xs:attribute name="LABEL" type="xs:string" use="optional">
				<xs:annotation>
					<xs:documentation>LABEL: an optional string label to describe this div to an end user viewing the document, as per a table of contents entry (NB: a div LABEL should be specific to its level in the structural map.  In the case of a book with chapters, the book div LABEL should have the book title, and the chapter div LABELS should have the individual chapter titles, rather than having the chapter div LABELs combine both book title and chapter title).
				</xs:documentation>
				</xs:annotation>
			</xs:attribute>
			<xs:attribute name="TYPE" type="xs:string" use="optional">
				<xs:annotation>
					<xs:documentation>TYPE: an optional string attribute for specifying a type of division (e.g., chapter, article, page, etc.).
				</xs:documentation>
				</xs:annotation>
			</xs:attribute>
			<xs:attribute name="visible" type="xs:string" default="true">
				<xs:annotation>
					<xs:documentation>Indicates if this div (and its sub-elements) should be displayed when displaying this toc.</xs:documentation>
				</xs:annotation>
			</xs:attribute>
		</xs:complexType>
	</xs:element>

	<xs:element name="ptr">
		<xs:complexType>
			<xs:attribute name="ID" type="xs:ID" use="required">
				<xs:annotation>
					<xs:documentation>ID: a required XML ID value</xs:documentation>
				</xs:annotation>
			</xs:attribute>
			<xs:attribute name="LOCTYPE" type="xs:string" fixed="URL"/>
			<xs:attribute name="USE" type="xs:string" use="optional">
				<xs:annotation>
					<xs:documentation>USE: an optional string attribute indicating the intended use of the resource (e.g., master, reference, thumbnails for image files).
				</xs:documentation>
				</xs:annotation>
			</xs:attribute>
			<xs:attribute name="MIMETYPE" type="xs:string" use="optional">
				<xs:annotation>
					<xs:documentation>MIMETYPE: an optional string attribute providing the MIME type for the resource.
				</xs:documentation>
				</xs:annotation>
			</xs:attribute>
			<xs:attributeGroup ref="xlink:simpleLink"/>
		</xs:complexType>
	</xs:element>

</xs:schema>
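How a client might read the div attributes defined above can be sketched as follows. This is an illustration only: `outline` is a hypothetical function, it sorts siblings by ORDER, prints ORDERLABEL and LABEL, and skips divs with visible="false". Since ElementTree does not apply schema defaults, the default "true" for the visible attribute is supplied in the code.

```python
import xml.etree.ElementTree as ET

TOC_NS = "http://www.escidoc.de/schemas/toc/0.4"
DIV = f"{{{TOC_NS}}}div"


def outline(div, depth=0):
    """Return indented text lines for a div and its visible sub-divs."""
    if div.get("visible", "true") != "true":
        return []  # hidden divs hide their sub-elements as well
    parts = [div.get("ORDERLABEL"), div.get("LABEL")]
    lines = ["  " * depth + " ".join(p for p in parts if p)]
    # Siblings without ORDER sort first and keep their document order.
    children = sorted(div.findall(DIV), key=lambda d: int(d.get("ORDER", "0")))
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


toc = ET.fromstring(
    f'<toc:toc xmlns:toc="{TOC_NS}"><toc:div ID="rootNode" LABEL="Book">'
    '<toc:div ID="c2" ORDER="2" ORDERLABEL="2." LABEL="Second"/>'
    '<toc:div ID="c1" ORDER="1" ORDERLABEL="1." LABEL="First"/>'
    '<toc:div ID="c3" ORDER="3" ORDERLABEL="3." LABEL="Hidden" visible="false"/>'
    '</toc:div></toc:toc>'
)
print("\n".join(outline(toc.find(DIV))))
```

Run against the sample above, this prints "Book" followed by the indented entries "1. First" and "2. Second"; the hidden third entry is left out.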