ESciDoc Content Model in Fedora

From MPDLMediaWiki
Revision as of 11:49, 31 July 2009 by Frank (talk | contribs)

!!! in progress !!!


eSciDoc Content Models

In the eSciDoc infrastructure a content model describes the specialization of an Item or Container object and is stored inside the infrastructure as a Content Model Object. Content Model Objects are managed via the Content Model Handler.

A Content Model Object is a versionable eSciDoc resource with a publication workflow. Thus it contains the common object properties of an eSciDoc resource, including version information.

Additionally it holds key-value pairs governing the behavior and form of the specialized content object. These are e.g. the initial state, whether versioning is enabled, the name of a mandatory metadata record etc.

In order to flexibly define the structure of a content object in its variable parts, the Content Model Object may include or refer to a rule document.

In general the term "content model object" refers to a concrete digital object that holds - or consists of - the description of a content model. A digital object usually refers to a content model object, which means it conforms (or should conform) to the corresponding content model. Instead of "conforms to" one can say the digital object is of the type defined by the content model.

See also eSciDoc Content Model Object.

Common Object Properties

  • id (objid)
  • name
  • description
  • creation date
  • creator (created by)
  • modification date (last-modification-date)
  • status (public-status, version-status) + comment (values are pending, in-revision, submitted, released)
  • PID
  • context
  • content model (either define a content model object for content models OR do not state this property; note: a content model is not a special Item)
  • lock-status, lock-date, lock-owner
  • version, latest-version, release

Questions and Notes

  • status withdrawn not needed
  • name and description? (cf. Item and Container)
  • What does the context of a Content Model Object mean?

Key-Value Pairs

This section covers how to state this information and the form in which it is represented inside the resource representation of a Content Model Object. For a list of possible or needed values see eSciDoc Content Model Object (http://colab.mpdl.mpg.de/mediawiki/ESciDoc_Content_Model_Object#Predefined_Content_Model_Properties).

There are single values, such as the initial state, and lists of values, such as mime-types or names of metadata records. Lists usually require a definition of whether a particular occurrence is allowed or forbidden, whether the list is closed or open etc. Therefore lists may be better covered by rules inside a rule document.

Most single values must be accessible for the creation of a resource; in the following these are called creation information. Some must be accessible for specific operations (status transitions, transformation description for dc mapping); in the following these are called transition information. Transition information may also be creation information but not vice versa. Creation information is necessary for the creation process, while transition information is necessary for one or more service operations.

List values can be used as input for a resource validation, which must be done before or after every operation. They need not be available for individual read access. Therefore it is sufficient to formulate them in a rule document (cf. List (Properties)). Other values should be available for fast access.

Questions and Notes

Rule Document

The rule document (if any) is stored as a valid XML document inside the Content Model Object or as a reference to a valid XML document. It contains restrictions on an Item or Container expressed in a rule language (e.g. Schematron-XML).

This approach does not support content validation in Fedora. Even if the rule document related to the eSciDoc resource is in a language Fedora can evaluate, it will describe the eSciDoc resource and not the Fedora object.

Questions and Notes

Fedora Content Models

See Fedora Content Model Architecture (CMA).

One important reason to introduce CMA in Fedora was the need to decouple content objects and Fedora disseminators. Now the behavior of objects - in the form of disseminators - is bound to the content model object of a content object. This is of minor interest for eSciDoc and eSciDoc Content Models and is therefore not discussed here. It does, however, enable the eSciDoc Infrastructure to use disseminators, and in the future eSciDoc content models should be extended to describe the behavior of eSciDoc objects.

Each Fedora object refers to a Fedora content model object by the property info:fedora/fedora-system:def/model#hasModel. This relation is stored in the RELS-EXT datastream of the Fedora object.
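To make the hasModel relation concrete, the following Python sketch builds and reads a minimal RELS-EXT document. It is only an illustration: the PIDs are hypothetical, the function names are made up for this page, and a real RELS-EXT datastream usually carries further statements.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MODEL_NS = "info:fedora/fedora-system:def/model#"


def build_rels_ext(object_pid, content_model_pids):
    """Return RELS-EXT RDF/XML with one hasModel statement per content model."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("fedora-model", MODEL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{object_pid}"},
    )
    for pid in content_model_pids:
        ET.SubElement(
            description,
            f"{{{MODEL_NS}}}hasModel",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{pid}"},
        )
    return ET.tostring(root, encoding="unicode")


def read_content_models(rels_ext_xml):
    """Return the URIs of all content models stated via hasModel."""
    root = ET.fromstring(rels_ext_xml)
    return [
        element.get(f"{{{RDF_NS}}}resource")
        for element in root.iter(f"{{{MODEL_NS}}}hasModel")
    ]
```

For example, `read_content_models(build_rels_ext("escidoc:1", ["escidoc:cm1"]))` yields the single URI of the hypothetical content model `escidoc:cm1`.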

Fedora CMA supports complex single-object models, commonly called "compound", and multi-object models, commonly called "atomistic" or "linked"((http://fedora-commons.org/confluence/display/FCR30/Content+Model+Architecture)). Since no concrete means of expressing either could be found, this seems to be a theoretical statement.

Content models "include structural, behavioral and semantic information [and] a description of the permitted, excluded, and required relationships to other digital objects or identifiable entities."((http://fedora-commons.org/confluence/display/FCR30/Content+Model+Architecture)) This is defined by means of a "content modeling language".

Fedora Content Modeling Language

Fedora 3 contains a reference implementation of a content modeling language. This implementation seems to be limited to the definition of a very simple XML format. It allows a list of elements, each of which specifies the ID of a datastream that must exist. Additionally, a list of allowed MIME types may be specified for each datastream ID. The Fedora client code includes a validator implementation for that language.

The content modeling language document is stored in the mandatory datastream "DS-COMPOSITE-MODEL" of a Content Model Object.
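The kind of check this language allows can be sketched in Python as follows. The example document follows the dsCompositeModel format as far as it is understood here (dsTypeModel elements with an ID, form elements with a MIME attribute); the datastream IDs and the function names are assumptions for illustration, and this is not the Fedora validator itself.

```python
import xml.etree.ElementTree as ET

DSCM_NS = "info:fedora/fedora-system:def/dsCompositeModel#"

# Hypothetical model: two required datastreams with their allowed MIME types.
EXAMPLE_MODEL = f"""
<dsCompositeModel xmlns="{DSCM_NS}">
  <dsTypeModel ID="escidoc">
    <form MIME="text/xml"/>
  </dsTypeModel>
  <dsTypeModel ID="content">
    <form MIME="application/pdf"/>
    <form MIME="image/jpeg"/>
  </dsTypeModel>
</dsCompositeModel>
"""


def parse_composite_model(xml_text):
    """Map each required datastream ID to its set of allowed MIME types."""
    root = ET.fromstring(xml_text)
    model = {}
    for type_model in root.findall(f"{{{DSCM_NS}}}dsTypeModel"):
        forms = type_model.findall(f"{{{DSCM_NS}}}form")
        model[type_model.get("ID")] = {form.get("MIME") for form in forms}
    return model


def validate_datastreams(model, datastreams):
    """Return a list of problems; `datastreams` maps datastream ID to MIME type."""
    problems = []
    for ds_id, allowed in model.items():
        if ds_id not in datastreams:
            problems.append(f"missing datastream {ds_id}")
        elif allowed and datastreams[ds_id] not in allowed:
            problems.append(f"datastream {ds_id} has MIME type {datastreams[ds_id]}")
    return problems
```

Note that the check only looks at the datastreams named in the model; additional datastreams of the object pass unnoticed.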

Figure: http://fedora-commons.org/confluence/download/attachments/4718710/cmodel.png

Questions and Notes


Enhanced Content Models for Fedora

Enhanced Content Models for Fedora (ECM) is an enhancement to Fedora CMA providing a validator implementation, a web service to create compound views from atomistic objects, templates for object creation, an extension to Fedora's content modeling language and an ontology datastream that extends what a content model can express. ECM is currently available in version 0.8.

See http://ecm.wiki.sourceforge.net/

Presentation at Open Repositories 2009 by Asger Blekinge-Rasmussen

Supports Content Model Inheritance.

Validator

By binding an object to a special Content Model defined by ECM, the validator can be called as a disseminator of that object. Besides that, it can be called via a specific HTTP URL containing the ID of the object to be validated. There seems to be no tool to validate a FOXML document outside Fedora. At least when modifying an existing Fedora object, the validation must be done after persisting the modification.

Extension to Fedora content modeling language

The Fedora content modeling language is extended by an element that makes it possible to specify an XML Schema (stored inside the Content Model Object) for a datastream with MIME type text/xml, in addition to the possibility of defining MIME types for a datastream (see "Fedora Content Modeling Language" above). So a further specification of XML datastreams is possible.

Ontology Datastream

The Ontology Datastream contains an ontology about the objects conforming to the content model. It consists of an RDF/XML document that uses RDFS and OWL (Lite) to define a class and the predicates an instance of this class may have.

Therefore, among other things, the relations of objects, which are expressed as RDF/XML in the RELS-EXT of these objects, may be restricted to a defined set and to the kind (content model) of objects they point to. This approach fits the idea of the RELS-EXT datastream perfectly.

Templates

ECM defines a predicate, to be stated in the RDF/XML document stored in the RELS-EXT of the Content Model Object, that refers to an object which should be used as a template. New objects conforming to that content model can be created from that template by means of a web service.

Questions and Notes

  • eSciDoc Content Models cannot define templates in the above sense, because the structure of Fedora objects is completely hidden by the eSciDoc Infrastructure?
  • The eSciDoc Infrastructure is able to create new objects from existing objects.
  • ECM template objects in Fedora have the state inactive.

Mapping between eSciDoc and Fedora Content Model Objects

In Fedora an object points to its content model by the predicate info:fedora/fedora-system:def/model#hasModel, which may be reported as http://escidoc.de/core/01/structural-relations/content-model inside the eSciDoc XML representation of an object.

Because a Fedora object may refer to more than one content model object, and it is recommended to also refer to the Fedora-defined "Basic Content Model" whenever a self-defined content model is referred to((http://fedora-commons.org/confluence/display/FCR30/Fedora+Digital+Object+Model#FedoraDigitalObjectModel-ContentModelObject)), it may be hard to figure out which of these relations to state in the eSciDoc representation.
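One conceivable way to pick the relation to report is to ignore content models from Fedora's system namespace and to require that exactly one other content model remains. The Python sketch below assumes that system-defined content models can be recognized by the URI prefix info:fedora/fedora-system: and that an eSciDoc object has exactly one self-defined content model; both are assumptions, not settled decisions.

```python
# Assumption: content models shipped with Fedora live in the "fedora-system" namespace.
SYSTEM_MODEL_PREFIX = "info:fedora/fedora-system:"


def escidoc_content_model(has_model_uris):
    """Return the single non-system content model URI stated via hasModel."""
    candidates = [
        uri for uri in has_model_uris if not uri.startswith(SYSTEM_MODEL_PREFIX)
    ]
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one self-defined content model, found {len(candidates)}"
        )
    return candidates[0]
```

Raising an error for zero or several candidates keeps the ambiguity visible instead of silently choosing one relation.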

Fedora Datastreams of an eSciDoc Object

The only optional Fedora datastreams of an eSciDoc Object are content streams (Item only) and the metadata records.

Metadata records have a name, which is the ID of the corresponding XML datastream in Fedora. To define that a metadata record with a given name must appear in an eSciDoc object, the Fedora content modeling language is sufficient. It is easy to map a list of mandatory names for metadata records - which may be specified in an eSciDoc Content Model Object - to an appropriate Fedora content modeling language document.
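The mapping of mandatory metadata record names can be sketched in a few lines of Python. The element and attribute names follow the dsCompositeModel format as understood here; the function name and the record names in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

DSCM_NS = "info:fedora/fedora-system:def/dsCompositeModel#"


def composite_model_for(md_record_names):
    """Build a DS-COMPOSITE-MODEL document requiring one XML datastream per name."""
    ET.register_namespace("", DSCM_NS)  # serialize with a default namespace
    root = ET.Element(f"{{{DSCM_NS}}}dsCompositeModel")
    for name in md_record_names:
        type_model = ET.SubElement(root, f"{{{DSCM_NS}}}dsTypeModel", {"ID": name})
        # Metadata records are XML, so text/xml is the only allowed form.
        ET.SubElement(type_model, f"{{{DSCM_NS}}}form", {"MIME": "text/xml"})
    return ET.tostring(root, encoding="unicode")
```

Calling `composite_model_for(["escidoc", "dc"])` would produce a document with two dsTypeModel elements, one per mandatory record.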

Accordingly, using the ECM extension of the Fedora content modeling language, the XML schema a metadata record must conform to can be defined and mapped.

Relations

structural vs. content relations

Defined by ONTOLOGY of ECM. Hardly by users.

Compound Views

Key-Value Pairs

Single values defined by an eSciDoc content model (see "eSciDoc Content Models" above) are stored inside the Fedora Content Model Object and may have consequences for the set of datastreams. E.g. if an object should not be versioned, it does not need the versioning datastream.

Rule vs. Content Modeling Language

The Fedora content modeling language (even when extended by ECM) does not fulfil the requirements for a rule language describing an eSciDoc object.

A rule language is one idea to restrict the kind and content of Components. A Fedora content model (with ECM) can define the relation between an Item and its Components but no constraints on the Component Object. Not even a cardinality of Components other than 0 or 1 can be specified. The only way to specify Components by means of Fedora and ECM is to introduce separate Content Models for Components and to state that there must be at least one Component of a specific Content Model.
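To illustrate the kind of constraint a rule on the eSciDoc side could express beyond that, here is a small Python sketch of a cardinality check on Components. The rule format (content-category mapped to a minimum and an optional maximum) and the category names are hypothetical; an actual rule document would be written in a rule language such as Schematron.

```python
from collections import Counter


def check_component_cardinality(categories, rules):
    """Check how often each content-category occurs among an Item's Components.

    `categories` lists the content-category of every Component;
    `rules` maps a content-category to (minimum, maximum), maximum None = unbounded.
    """
    counts = Counter(categories)
    problems = []
    for category, (minimum, maximum) in rules.items():
        found = counts[category]
        if found < minimum:
            problems.append(f"{category}: expected at least {minimum}, found {found}")
        if maximum is not None and found > maximum:
            problems.append(f"{category}: expected at most {maximum}, found {found}")
    return problems
```

A rule such as `{"FULLSIZE": (1, 1), "THUMBNAIL": (0, 3)}` states exactly one FULLSIZE Component and up to three THUMBNAIL Components.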

Questions and Notes

  • How to separate the basic content model from the self-defined content model in the RELS-EXT of a Fedora Content Model Object?
  • How to store key/value pairs? RELS-EXT?
  • Can everything that needs to be defined for an eSciDoc object be defined in a Fedora Content Model Object using CMA and ECM?
    • If not, two overlapping validation stages?
  • Is it necessary to consider key/value pairs when generating the Fedora content modeling language document? E.g. version-history.


eSciDoc Content Model in Fedora

Values stated in an eSciDoc Content Model Object defining content and behavior of a content object (e.g. Item, Container) are separated into three categories:

  1. Creation; values pertaining to the initial state
  2. Transition; values defining possible transitions or effects of specific transitions
  3. State; values describing content independent of the previous or next state

Creation

Information considered in the creation process.

  • initial state
  • versioning enabled
  • name of main metadata record
  • schema of main metadata record
  • dc mapping
  • (content checksum enabled)
  • (applies to object pattern)

Transition

Information considered for specific operations in order to decide whether the operation is allowed and/or which sub-operations must be triggered or are prerequisites for that operation.

  • status transitions (e.g. from pending to submitted)
  • cascade information (e.g. for containers, whether a release of all members should be attempted on release)
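As a rough illustration of how status transitions from a content model might be consulted before an operation, consider the following Python sketch. The transition table is hypothetical; only the status names (pending, submitted, in-revision, released) are taken from this page.

```python
# Hypothetical transition table of one content model: current status -> allowed targets.
ALLOWED_TRANSITIONS = {
    "pending": {"submitted"},
    "submitted": {"in-revision", "released"},
    "in-revision": {"submitted"},
    "released": set(),
}


def can_transition(current, target, transitions=ALLOWED_TRANSITIONS):
    """Return True if the content model allows moving from `current` to `target`."""
    return target in transitions.get(current, set())
```

A content model that skips the submission step would simply list "released" among the targets of "pending".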

State

Information used to validate the current state of the resource.

  • Name, schema, and occurrence of additional metadata records.
  • mime-types of content
  • Name, content-category etc. of Component
  • content model of allowed members, occurrences

Note: The listings above are not necessarily complete.
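The three categories could be kept apart in code roughly as in the Python sketch below. All class names, field names and default values are assumptions made for illustration and, like the listings above, the fields are not complete.

```python
from dataclasses import dataclass, field


@dataclass
class CreationInfo:
    initial_state: str = "pending"
    versioning_enabled: bool = True
    main_md_record_name: str = "escidoc"  # assumed name of the main metadata record
    main_md_record_schema: str = ""  # URI of the XML Schema, left empty here


@dataclass
class TransitionInfo:
    # current status -> set of allowed target statuses
    status_transitions: dict = field(default_factory=dict)
    cascade_release_to_members: bool = False


@dataclass
class StateInfo:
    # name of an additional metadata record -> URI of its XML Schema
    additional_md_records: dict = field(default_factory=dict)
    content_mime_types: set = field(default_factory=set)
    allowed_member_models: set = field(default_factory=set)


@dataclass
class ContentModelValues:
    creation: CreationInfo = field(default_factory=CreationInfo)
    transition: TransitionInfo = field(default_factory=TransitionInfo)
    state: StateInfo = field(default_factory=StateInfo)
```

Keeping the groups in separate objects mirrors the access pattern described earlier: creation and transition values need fast access, while state values are only needed for validation.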


Creation and Transition

Validation

Information related to the state of a resource (the content object) is used to validate the object. Such a validation may be done in Fedora based on the information from the dsCompositeModel and possibly also the ONTOLOGY datastream.

Components and Members

The description of which Components of an Item and what kind of members of a Container are allowed may be validated in Fedora as relations. In fact both are stored as relations in Fedora (see "Structural Relations" below), but information about the kind of the related resource is needed to reach the intended level of description.

Relations and Datastreams

With CMA and ECM the relations and datastreams of a Fedora Object can be validated. Both can easily be mapped from a description of an eSciDoc resource into a description of a Fedora object except for cardinality of relations.

Relations of an eSciDoc resource are not directly mapped to relations of the corresponding Fedora object (see below "Content Relations").

Metadata Datastreams

Metadata records of an eSciDoc resource are defined by a name and an XML Schema. Technically such a record is stored as a datastream of MIME type text/xml in a Fedora object, where the name of the metadata record is the name of the datastream, the XML schema applies to the content of the datastream and the datastream is marked as an eSciDoc metadata record.

The description of an eSciDoc metadata record can be mapped to the description of a Fedora datastream and vice versa. So a validation of the eSciDoc resource is possible as well as a Fedora object validation based on Fedora's modeling language extended by ECM.

This approach lacks the possibility to state optional metadata records or to restrict the set of metadata records to the defined set.

Content Datastreams

Content (also referred to as binary content) of an eSciDoc resource is defined by a name (an individual name in the case of a content-stream in an Item and the name "content" in a Component) and a mime-type. These values can be accurately mapped to the values of a datastream in Fedora. The storage-type of the content can be freely chosen and is not restricted by the content model. So a validation of the eSciDoc resource is possible as well as a Fedora object validation based on Fedora's modeling language extended by ECM. The content itself is not considered for validation by the content model.

This approach lacks the possibility to state optional content-streams in an Item or to restrict the set of content-streams in an Item to a defined set.

Content Relations

Idea: Define a global ontology which allows all defined relations for all possible objects. With ECM: "All allowed relations must be defined in the ontology". So every content model object in Fedora must hold every possible relation.

BUT technically content relations are objects in their own right. So every Fedora content model derived from an eSciDoc content model just has to allow relations to Content Relation Objects. From this point of view it is an advantage that there is only one predicate referring to Content Relations. Otherwise the ECM rule that the complete set of possible relations must always be stated would break the idea of freely relating resources by Content Relations regardless of the ownership and content model of the resources.

Structural Relations

Yes. No cardinality other than 1 or 0.

The validation of structural relations in Fedora may cover the validation of the descriptions of Components and members in eSciDoc.

Members conform to a specified set of content models but ... (? Does ECM allow everything from OWL Lite?). Allowed member content models are expressed by an allValuesFrom restriction with a union of content model classes.
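What such a restriction checks can be written down as a short Python sketch (the function name and identifiers are hypothetical). Two points may matter for the open question: an allValuesFrom restriction is also satisfied by a Container without any members, and in the W3C OWL language overview unionOf is listed among the OWL DL constructs rather than those of OWL Lite.

```python
def disallowed_members(member_models, allowed_models):
    """Return (member, content model) pairs violating the allValuesFrom restriction.

    `member_models` maps each member ID to the ID of its content model;
    `allowed_models` is the union of content models allowed for members.
    """
    allowed = set(allowed_models)
    return [
        (member, model)
        for member, model in member_models.items()
        if model not in allowed
    ]
```

An empty result means the restriction holds; it says nothing about how many members exist.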

To be able to map the description of a set of Components including values of content-category, mime-type etc., a content model for Component objects is necessary.

For every component-type stated in the eSciDoc content model a separate Fedora content model object is created. The name (aka content-category) of a component-type and the ID of the eSciDoc Content Model Object are used to generate the ID of the Component Content Model Object. Allowed metadata records are modeled as for common eSciDoc resources (see above "Metadata Datastreams"). The allowed mime-types are listed for the datastream "content". The content-category is ensured by a hasValue restriction (it must be checked whether ECM supports this).
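How the ID of a Component Content Model Object might be derived is sketched below in Python. The naming scheme (suffix "-component-" plus the content-category) and the character replacement are purely hypothetical; the only input taken from the description above is that the ID is generated from the content model ID and the content-category.

```python
import re


def component_model_pid(content_model_pid, content_category):
    """Derive a PID for the content model of one component-type (hypothetical scheme)."""
    namespace, _, local_id = content_model_pid.partition(":")
    # Replace characters that are assumed not to be safe in a PID with "_".
    safe_category = re.sub(r"[^A-Za-z0-9._~-]", "_", content_category)
    return f"{namespace}:{local_id}-component-{safe_category}"
```

Because different categories can map to the same sanitized name (e.g. "pre print" and "pre_print"), a real scheme would need a collision check.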


File:ContentModel-draft.xml

File:ContentModel-draft-dsCompositeModel.xml

File:ContentModel-draft-ONTOLOGY.xml

File:ContentModel-draft-CM component FULLSIZE-dsCompositeModel.xml

File:ContentModel-draft-CM component FULLSIZE-ONTOLOGY.xml