ESciDoc Content Model Object

Note: The following is a reworked proposal for the eSciDoc Developer Workshop on 14./15.10.08.

See also ESciDoc_Content_Model_in_Fedora.

= eSciDoc Content Model =
In the eSciDoc infrastructure, a content model describes the specialization of an Item or Container object and is stored inside the infrastructure as a Content Model Object. Content Model Objects are managed via the ContentModelService.

The content of a Content Model Object consists of common object properties, as known for Item Objects and Container Objects etc., and of a set of predefined content model properties governing the behavior and form of an Item or Container (e.g. initial state). In the future, a rule document may be assigned to the Content Model Object. The rule document is stored as a valid XML document inside the Content Model Object and contains restrictions on an Item or Container, expressed in a rule or modeling language (e.g. Schematron XML).

See also eSciDoc Content Models.

Common Object Properties
Common object properties, as known for Item Objects and Container Objects etc.


 * id (objid)
 * name
 * description
 * creation date
 * creator (created by)
 * modification date (last-modification-date)
 * status (public-status) + comment
 * PID
 * context (?)
 * lock-status, lock-date, lock-owner (?)
 * version, latest-version, release (in case of versioned content model)

Content Model Metadata
A DC metadata record that provides human-readable information about this content model.


 * Are the properties name and description sufficient? (see above Common Object Properties) Frank 07:35, 24 September 2009 (UTC)
 * as final solution these are not sufficient. It is important to do smth like: author of the content model (not necessarily the user who created it), institution, some more descriptive information (maybe even in different languages). That is why descriptive metadata differ from properties. --Natasa 11:51, 24 September 2009 (UTC)

Predefined Content Model Properties
Predefined content model properties, governing the behavior and form of an Item or Container.

Predefined content model properties come either as key-value pairs or as lists of values.

Key/Value

 * initial state (e.g. pending, submitted or released)
 * status transitions
 * versioning enabled
 * name of main metadata record
 * schema of main metadata record
 * transformation description from main metadata record to DC metadata (XSLT in datastream "DC-MAPPING")
 * applies to object pattern, or "specialise" (whether the content model is a further specification of an Item or a Container)
 * whether a PID must be assigned for release (version and object PID; whether content requires a PID must be defined with the component definition)

In the future, there may also be flags stating whether specific methods create a new version.
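As an illustration only, the key/value properties above could be held in a structure like the following. This is a minimal sketch, not the eSciDoc API: the class name `ContentModelProperties`, all field names, the default values, and the representation of status transitions as a mapping from a state to its reachable states are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ContentModelProperties:
    """Hypothetical holder for the key/value properties of a content model."""

    # Initial state of a newly created Item or Container (default is assumed).
    initial_state: str = "pending"
    # Assumed shape: state -> set of states that may follow it.
    status_transitions: Dict[str, Set[str]] = field(default_factory=dict)
    versioning_enabled: bool = True
    # Name and schema of the main metadata record (no defaults assumed).
    main_md_record_name: Optional[str] = None
    main_md_record_schema: Optional[str] = None
    # XSLT used to map the main metadata record to DC, if any.
    dc_mapping_xslt: Optional[str] = None
    # Object pattern the model specialises: "item" or "container".
    applies_to: str = "item"
    pid_required_for_release: bool = False

    def can_transition(self, current: str, target: str) -> bool:
        """Return True if the model allows moving from current to target."""
        return target in self.status_transitions.get(current, set())


props = ContentModelProperties(
    status_transitions={"pending": {"submitted"}, "submitted": {"released"}},
)
print(props.can_transition("pending", "submitted"))
print(props.can_transition("pending", "released"))
```

A missing entry in `status_transitions` is treated here as "no transition allowed"; the open question below about the meaning of absent values applies to this choice as well.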

Lists

 * additional metadata records
 * Name and schema of additional metadata records. Maybe a flag to just allow additional unspecified metadata records.


 * allowed formats (Item only)
 * a list of possible components. A component is described by a name of the kind used for the property content-category, a list of allowed mime-types and a list of metadata records (see previous point)


 * aggregation / description of members (Container only)
 * content model of allowed members, mandatory or optional



Questions:
 * Should number and kind of components of an Item be defined as property or in Rules?
 * Not in form of a rule language document! Because we would like to evaluate content models in Fedora, so - if rule language - we need a description of the Fedora object. Frank 07:45, 28 July 2009 (UTC)
 * Should cascade behavior be defined as standalone property or with aggregation description? See "Cascade" below.
 * Are values optional and if so, what is the meaning of no value?

Rule Document
The rule document of a content model is a valid XML document included in the Content Model Object.

Components
The description of an Item should contain at least a list of mime-types of component content (see "allowed formats" under "Lists" above).

For further specification, the rule document may be used (see "Rule Document" above). Alternatively, the description of an Item may include an extended list of allowed formats, where each entry consists of


 * mime-type
 * content-category
 * occurrence
 * metadata records (name, schema, and occurrence)
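To make the extended list of allowed formats more concrete, the following sketch checks the components of an Item against such a list. It is an assumption-laden illustration: `AllowedFormat`, `check_components`, and the encoding of occurrence as a minimum and an optional maximum are invented for this example, and the metadata record part of an entry is left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AllowedFormat:
    """Hypothetical entry of the extended allowed-formats list."""

    mime_type: str
    content_category: str
    min_occurs: int = 0                 # assumed encoding of "occurrence"
    max_occurs: Optional[int] = None    # None means unbounded


def check_components(
    components: List[Tuple[str, str]], allowed: List[AllowedFormat]
) -> List[str]:
    """Check (mime_type, content_category) pairs; return violation messages."""
    errors = []
    counts = Counter(components)
    rules = {(a.mime_type, a.content_category): a for a in allowed}
    # Every component must match one entry of the list.
    for key in counts:
        if key not in rules:
            errors.append(f"{key[0]} / {key[1]}: format not allowed")
    # Every entry must be satisfied with respect to its occurrence.
    for key, rule in rules.items():
        n = counts.get(key, 0)
        if n < rule.min_occurs:
            errors.append(f"{key[0]} / {key[1]}: at least {rule.min_occurs} required, found {n}")
        if rule.max_occurs is not None and n > rule.max_occurs:
            errors.append(f"{key[0]} / {key[1]}: at most {rule.max_occurs} allowed, found {n}")
    return errors


allowed = [
    AllowedFormat("application/pdf", "publisher-version", min_occurs=1, max_occurs=1),
    AllowedFormat("image/jpeg", "supplementary-material"),
]
print(check_components([("application/pdf", "publisher-version")], allowed))
```

The content-category values used here are placeholders; a real content model would use whatever categories the installation defines.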

DC Mapping
Since the entire object is indexed for search and the text content of every XML element is available via filter, the DC mapping may be dropped for components. Until then, the default mapping (not user-configurable) is applied to the main metadata record.

Content Streams
Content streams should be described like components (list and/or rule document), but with just mime-type and occurrence. A name may be defined but may collide with occurrence. If the mime-type is text/xml, it should be possible to specify an XML schema.

Tech. note: Because content streams are not separate objects inside the infrastructure, they have no objid but a name, as metadata records do.

Aggregation
The description of a Container may include an extended list of allowed member types where an entry consists of


 * content model (unique inside the list; means members conforming to that content model are allowed)
 * cascade information (see below "Cascade")
 * unique parent (see below "Unique Parent")
 * occurrence

The list itself may carry information like


 * order(ed)
 * What does ordered mean? Or should it be order and give a key for standard order?
 * Originally, it was meant to be as the native physical order of the members in a container, i.e. the order in which they are created, and to enable adding them after e.g. 10th element in the aggregation. This could be used to automatically derive the default TOC. But no longer certain if needed at all, because then the ordering logic would be split into 2 places: TOC and Container. If it would in future help to automatically generate a TOC object - then we should keep it. --Natasa 12:14, 10 November 2008 (UTC)
 * So i would propose to drop it. But if not, it seems to me it's a boolean "ordered" flag!? Frank 17:36, 10 November 2008 (UTC)


 * TOC Content Model ID

Cascade
As attribute to aggregation.

The value of cascade can be one of SUBMIT, RELEASE, WITHDRAW, LOCK, UNLOCK, a comma-separated list combining these values, or simply ALL. Additionally, the value RELAXED can be set to state that cascading is allowed to fail. An implementation MUST accept the values in lower case.

The value of cascade states if members of an aggregation MUST, SHOULD or MUST NOT be affected by state change or lock.
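One possible reading of the cascade attribute value can be sketched as a small parser. The function name `parse_cascade` and its return shape are assumptions for illustration; accepting mixed case goes slightly beyond the stated requirement that lower case MUST be accepted.

```python
from typing import FrozenSet, Tuple

# The five cascadable actions named in the text.
ACTIONS = frozenset({"SUBMIT", "RELEASE", "WITHDRAW", "LOCK", "UNLOCK"})


def parse_cascade(value: str) -> Tuple[FrozenSet[str], bool]:
    """Parse a cascade attribute value into (actions, relaxed).

    Hypothetical helper: tokens are compared case-insensitively,
    ALL expands to every action, RELAXED is reported separately.
    """
    tokens = {t.strip().upper() for t in value.split(",") if t.strip()}
    relaxed = "RELAXED" in tokens
    tokens.discard("RELAXED")
    if "ALL" in tokens:
        tokens = (tokens - {"ALL"}) | ACTIONS
    unknown = tokens - ACTIONS
    if unknown:
        raise ValueError("unknown cascade value(s): " + ", ".join(sorted(unknown)))
    return frozenset(tokens), relaxed


actions, relaxed = parse_cascade("submit,release,relaxed")
print(sorted(actions), relaxed)
```

Because RELAXED is a single flag for the whole attribute, this representation cannot express a relaxed submit together with a required release.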


 * see also ESciDoc_Content_Models for description (from our previous talks). I am not certain that these can go to the level of granularity we wish i.e. better like the previous with tried/required.--Natasa 12:02, 10 November 2008 (UTC)
 * There seem to be two limitations with this approach: 1) One single cascade attribute cannot state relaxed submission and required release at the same time. 2) No members-first; but I did not really get what that means. Frank 10:17, 17 March 2009 (UTC)

Here and at ESciDoc_Content_Models we discuss the following two approaches:

<aggregation cascade="SUBMIT,RELEASE,WITHDRAW,RELAXED" ...

<aggregation submit-rule="members-tried" release-rule="members-tried" withdraw-rule="members-tried" ...

Unique Parent
As attribute to aggregation.

The attribute 'unique-parent' states whether the container described by this content model must be the only parent of its members, or whether its members may have other parents as long as none of those parents has the same content model.

[unique-parent="false"] The members of the container described by this content model may have other parents.

[unique-parent="true"] The members of the container described by this content model are not allowed to have other parents.

[unique-parent="typed"] The members of the container described by this content model may have other parents but not of the same content model as the described container.
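The three values can be read as a simple membership check, sketched below. The function `may_add_member` and its parameters are hypothetical; the check is made only from the point of view of the container being described, so any restrictions declared by the member's other parents are not considered here.

```python
from typing import Iterable


def may_add_member(
    unique_parent: str,
    container_model: str,
    existing_parent_models: Iterable[str],
) -> bool:
    """Decide whether an object may become a member of the described container.

    unique_parent: "false", "true" or "typed", as in the text above.
    container_model: content model id of the described container.
    existing_parent_models: content model ids of the object's current parents.
    """
    parents = list(existing_parent_models)
    if unique_parent == "false":
        return True                              # other parents are fine
    if unique_parent == "true":
        return not parents                       # no other parent at all
    if unique_parent == "typed":
        return container_model not in parents    # none of the same model
    raise ValueError(f"unknown unique-parent value: {unique_parent!r}")


print(may_add_member("typed", "cm:journal", ["cm:collection"]))
```

The content model ids `cm:journal` and `cm:collection` are made up for the example.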


 * I think in most cases it's a bad idea to introduce seemingly boolean property names and then let them have more than two values. So something like 'parent="unique|typed"' may be more appropriate. Robert 12:06, 10 November 2008 (UTC)
 * Agreed, but wanted to have one attribute for that unique parent behavior. What could be the value to attribute "parent" if parent must NOT be unique? Just not to state that attribute seems to me a bad idea, too. Frank 17:49, 10 November 2008 (UTC)


 * Robert, I was actually editing the same page with very same remark:) .. - as true/false are expected values on their own, and we need to try to avoid "type" when we talk about content models, would suggest naming this differently i.e.
 * unique-parent => parents-allowed
 * false => single-container-any-model
 * true => many-containers-any-model
 * typed => many-containers-distinct-model
--Natasa 12:10, 10 November 2008 (UTC)

General remarks/discussion

 * As pointed somewhere above, would really try to make the CModel document self-contained. Rules part such as Schematron defined rules (or any other language defined rules) that are pre-compiled every time a cmodel definition is changed can be generated separately. This would allow human-readable CModel definition and interoperability across implementations. --Natasa 12:18, 10 November 2008 (UTC)