ESciDoc Content Model Object


Note: The following is a reworked proposal for the eSciDoc Developer Workshop on 14./15.10.08.

See also ESciDoc_Content_Model_in_Fedora.

eSciDoc Content Model

In the eSciDoc infrastructure, a content model describes the specialization of an Item or Container object and is stored inside the infrastructure as a Content Model Object. Content Model Objects are managed via the ContentModelService.

The content of a Content Model Object consists of common object properties, as known for Item Objects and Container Objects etc., and of a set of predefined content model properties governing the behavior and form of an Item or Container (e.g. initial state). In the future, a rule-document may be assigned to the Content Model Object. The rule-document is stored as a valid XML document inside the Content Model Object and contains restrictions on an Item or Container expressed in a rule or modeling language (e.g. Schematron XML).

See also eSciDoc Content Models.

Common Object Properties

Common object properties, as known for Item Objects and Container Objects etc.

  • id (objid)
  • name
  • description
  • creation date
  • creator (created by)
  • modification date (last-modification-date)
  • status (public-status) + comment
  • PID
  • context (?)
  • lock-status, lock-date, lock-owner (?)
  • version, latest-version, release (in case of versioned content model)

Content Model Metadata

A Dublin Core (DC) metadata record that provides human-readable information about this content model.

Are the properties name and description sufficient? (see above Common Object Properties) Frank 07:35, 24 September 2009 (UTC)
As a final solution these are not sufficient. It is important to cover something like: author of the content model (not necessarily the user who created it), institution, some more descriptive information (maybe even in different languages). That is why descriptive metadata differ from properties. --Natasa 11:51, 24 September 2009 (UTC)

Predefined Content Model Properties

Predefined content model properties, governing the behavior and form of an Item or Container.

There are predefined content model properties in form of key-value pairs and in form of lists of values.

Key/Value

  • initial state (e.g. pending, submitted or released)
  • status transitions
  • versioning enabled
  • name of main metadata record
  • schema of main metadata record
  • transformation description from main metadata record to DC metadata (XSLT in datastream "DC-MAPPING")
  • applies to object pattern, or "specialise" (if the content model is further specification of an Item or a Container)
  • whether a PID must be assigned for release (version and object PID; whether a PID is required for content must be defined with the component definition)

In the future, there may also be flags stating whether specific methods create a new version.

Lists

  • additional metadata records
Name and schema of additional metadata records. Maybe a flag to just allow additional unspecified metadata records.
  • allowed formats (Item only)
A list of possible components. A component is described by a name of the kind used for the property content-category, a list of allowed mime-types and a list of metadata records (see previous point).
  • aggregation / description of members (Container only)
content model of allowed members, mandatory or optional
  • ...
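
To make the key/value and list properties above more concrete, here is a minimal sketch of how they might be held in memory. All class names, field names and default values are hypothetical and chosen only for illustration; the proposal does not define a concrete representation, and only a subset of the listed properties is shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataRecordSpec:
    """Name and schema of a metadata record (hypothetical shape)."""
    name: str
    schema: str  # identifier or location of the XML schema


@dataclass
class ContentModelProperties:
    """Illustrative subset of the predefined content model properties."""
    applies_to: str                          # "item" or "container"
    initial_state: str = "pending"           # assumed default
    versioning_enabled: bool = True          # assumed default
    pid_required_for_release: bool = False   # assumed default
    main_metadata: Optional[MetadataRecordSpec] = None
    additional_metadata: List[MetadataRecordSpec] = field(default_factory=list)
    # Flag mentioned in the text: allow additional unspecified records.
    allow_unspecified_metadata: bool = False

    def accepts_metadata_record(self, name: str) -> bool:
        """Return True if a metadata record with this name is permitted."""
        if self.main_metadata is not None and name == self.main_metadata.name:
            return True
        if any(spec.name == name for spec in self.additional_metadata):
            return True
        return self.allow_unspecified_metadata
```

Keeping the "allow unspecified records" flag separate from the list lets a content model name a few well-known records and still decide whether anything else may be attached.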

Questions:

Should number and kind of components of an Item be defined as property or in Rules?
Not in the form of a rule language document! We would like to evaluate content models in Fedora, so, if a rule language is used, we need a description of the Fedora object. Frank 07:45, 28 July 2009 (UTC)
Should cascade behavior be defined as standalone property or with aggregation description? See "Cascade" below.
Are values optional and if so, what is the meaning of no value?

Rule Document

The rule document of a content model is a valid XML document included in the Content Model Object.

Components

The description of an Item should contain at least a list of mime-types of component content (see above "allowed formats" in "Lists").

For further specification the rule-document may be used (see above "Rule Document"). Or the description of an Item may include an extended list of allowed formats where an entry consists of

  • mime-type
  • content-category
  • occurrence
  • metadata records (name, schema, and occurrence)
  • ...
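
The extended list of allowed formats could be evaluated roughly as follows. This is a sketch under stated assumptions: the proposal does not say how occurrence is expressed, so a minimum/maximum pair is assumed here, a component is reduced to a (mime-type, content-category) tuple, and `AllowedFormat` and `check_components` are hypothetical names. Metadata records are left out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class AllowedFormat:
    """One entry of the extended list of allowed formats (assumed shape)."""
    mime_type: str
    content_category: str
    min_occurs: int = 0
    max_occurs: Optional[int] = None  # None means unbounded


def check_components(
    components: Iterable[Tuple[str, str]],
    allowed: Iterable[AllowedFormat],
) -> List[str]:
    """Return a list of violations; an empty list means the Item conforms."""
    rules = {(a.mime_type, a.content_category): a for a in allowed}
    counts = Counter(components)
    problems = []
    # Components whose (mime-type, content-category) pair is not listed.
    for key in counts:
        if key not in rules:
            problems.append(f"component {key} is not allowed")
    # Occurrence constraints for every listed entry.
    for key, rule in rules.items():
        n = counts.get(key, 0)
        if n < rule.min_occurs:
            problems.append(f"{key}: expected at least {rule.min_occurs}, found {n}")
        if rule.max_occurs is not None and n > rule.max_occurs:
            problems.append(f"{key}: expected at most {rule.max_occurs}, found {n}")
    return problems
```

Returning all violations at once, rather than failing on the first, is a design choice of this sketch; it makes it easier to report to a depositor everything that is wrong with an Item in one pass.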

DC Mapping

Since the entire object is indexed for search and the text content of every XML element is available via filter, the DC mapping may be dropped for components. Until then, the default mapping (not user-configurable) is applied to the main metadata record.

Questions

Content Streams

Should be described like components (list and/or rule-document), but with just mime-type and occurrence. A name may be defined but may collide with occurrence. If the mime-type is text/xml, it should be possible to specify an XML schema.

Tech. note: Because content streams are not separate objects inside the infrastructure, they have no objid but a name, as metadata records have.

Aggregation

The description of a Container may include an extended list of allowed member types where an entry consists of

  • content model (unique inside the list; means that members conforming to that content model are allowed)
  • cascade information (see below "Cascade")
  • unique parent (see below "Unique Parent")
  • occurrence

The list itself may carry information like

  • order(ed)
What does ordered mean? Or should it be order and give a key for standard order?
Originally, it was meant to be the native physical order of the members in a container, i.e. the order in which they are created, and to enable adding them after e.g. the 10th element in the aggregation. This could be used to automatically derive the default TOC. But it is no longer certain whether this is needed at all, because the ordering logic would then be split into two places: TOC and Container. If it would help in future to automatically generate a TOC object, then we should keep it. --Natasa 12:14, 10 November 2008 (UTC)
So I would propose to drop it. But if not, it seems to me it's a boolean "ordered" flag!? Frank 17:36, 10 November 2008 (UTC)
  • TOC Content Model ID

Cascade

As attribute to aggregation.

The value of cascade can be one of SUBMIT, RELEASE, WITHDRAW, LOCK, UNLOCK, a comma-separated list combining these values, or simply ALL. Additionally, the value RELAXED can be set to state that cascading is allowed to fail. An implementation MUST accept the values in lower case.

The value of cascade states if members of an aggregation MUST, SHOULD or MUST NOT be affected by state change or lock.
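
A minimal sketch of how an implementation might parse such a cascade value is given below. The names `Cascade` and `parse_cascade` are hypothetical, and two behaviors are assumptions not settled by the text: surrounding whitespace is ignored, and an empty value means that nothing cascades.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Action names taken from the description of the cascade attribute.
ACTIONS = frozenset({"SUBMIT", "RELEASE", "WITHDRAW", "LOCK", "UNLOCK"})


@dataclass(frozen=True)
class Cascade:
    actions: FrozenSet[str]  # actions that cascade to the members
    relaxed: bool            # True if cascading is allowed to fail


def parse_cascade(value: str) -> Cascade:
    """Parse a cascade value such as "submit,release,relaxed"."""
    # Upper-casing each token accepts lower-case (and mixed-case) input.
    tokens = {t.strip().upper() for t in value.split(",") if t.strip()}
    relaxed = "RELAXED" in tokens
    tokens.discard("RELAXED")
    if "ALL" in tokens:
        tokens = (tokens - {"ALL"}) | ACTIONS
    unknown = tokens - ACTIONS
    if unknown:
        raise ValueError(f"unknown cascade value(s): {sorted(unknown)}")
    return Cascade(actions=frozenset(tokens), relaxed=relaxed)
```

Because RELAXED is a single flag on the whole value, this representation cannot express a relaxed submission together with a required release; that would need one rule per action.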

See also ESciDoc_Content_Models#Aggregations_and_internal_life-cycle_of_resources for a description (from our previous talks). I am not certain that these can go to the level of granularity we wish, i.e. better like the previous one with tried/required. --Natasa 12:02, 10 November 2008 (UTC)
There seem to be two limitations with this approach: 1) One single cascade attribute cannot state relaxed submission and required release at the same time. 2) No members-first; but I did not really get what that means. Frank 10:17, 17 March 2009 (UTC)

Here and at ESciDoc_Content_Models#Aggregations_and_internal_life-cycle_of_resources we discuss the following two approaches:

<aggregation cascade="SUBMIT,RELEASE,WITHDRAW,RELAXED" ...
<aggregation submit-rule="members-tried" release-rule="members-tried" withdraw-rule="members-tried" ...

Unique Parent

As attribute to aggregation.

The attribute 'unique-parent' states whether the container described by this content model must be the only parent of its members, or whether its members may have other parents as long as none of those has the same content model.

[unique-parent="false"] The members of the container described by this content model may have other parents.

[unique-parent="true"] The members of the container described by this content model are not allowed to have other parents.

[unique-parent="typed"] The members of the container described by this content model may have other parents but not of the same content model as the described container.
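
The three values can be read as a simple membership check, sketched below. The function name and parameters are hypothetical. The sketch only evaluates the rule of the container that is about to receive the member; a complete implementation would presumably also have to honor the unique-parent rules of the member's existing parents.

```python
from typing import Iterable


def may_add_member(
    unique_parent: str,
    container_model: str,
    existing_parent_models: Iterable[str],
) -> bool:
    """Decide whether a resource may become a member of a container.

    unique_parent          -- "false", "true" or "typed" (from the aggregation)
    container_model        -- content model ID of the receiving container
    existing_parent_models -- content model IDs of the resource's current parents
    """
    parents = list(existing_parent_models)
    mode = unique_parent.strip().lower()
    if mode == "false":
        return True                            # other parents are allowed
    if mode == "true":
        return not parents                     # no other parent at all
    if mode == "typed":
        return container_model not in parents  # no parent of the same model
    raise ValueError(f"unknown unique-parent value: {unique_parent!r}")
```

For example, with unique-parent="typed" a resource that already belongs to a container of another content model may still be added, while one that already has a parent of the same content model is rejected.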

  • I think in most cases it's a bad idea to introduce seemingly boolean property names and then let them have more than two values. So something like 'parent="unique|typed"' may be more appropriate. Robert 12:06, 10 November 2008 (UTC)
Agreed, but I wanted to have one attribute for that unique parent behavior. What could be the value of the attribute "parent" if the parent must NOT be unique? Just not stating that attribute seems to me a bad idea, too. Frank 17:49, 10 November 2008 (UTC)
Robert, I was actually editing the same page with very same remark:) .. - as true/false are expected values on their own, and we need to try to avoid "type" when we talk about content models, would suggest naming this differently i.e.
  • unique-parent =>parents-allowed
  • false=> single-container-any-model
  • true => many-containers-any-model
  • typed => many-containers-distinct-model

--Natasa 12:10, 10 November 2008 (UTC)

General remarks/discussion

  • As pointed out somewhere above, I would really try to make the CModel document self-contained. The rules part, such as Schematron-defined rules (or rules defined in any other language) that are pre-compiled every time a CModel definition is changed, can be generated separately. This would allow a human-readable CModel definition and interoperability across implementations. --Natasa 12:18, 10 November 2008 (UTC)