Difference between revisions of "ESciDoc Content Model Object"

From MPDLMediaWiki


* additional metadata records
: Name, schema, and occurrence of additional metadata records. Maybe a flag to just allow additional unspecified metadata records.
: Name and schema of additional metadata records. Maybe a flag to just allow additional unspecified metadata records.
::Name and schema would be better, as the content model, besides descriptive information that holds only a flag, should also offer the possibility to map between different schemas. --[[User:Natasab|Natasa]] 11:53, 10 November 2008 (UTC)
::What is meant by occurrence of additional metadata records? This is not clear; we should probably not keep this information. --[[User:Natasab|Natasa]] 11:53, 10 November 2008 (UTC)
::: Yes, name and occurrence may be in conflict, because name of a metadata record must be unique inside one object. [[User:Frank|Frank]] 10:37, 11 November 2008 (UTC)
:::: Obviously we need name-schema pairs (a md-record with name ''name'' must conform to ''schema''). Additionally, it would be nice to state whether a md-record is optional or mandatory (a md-record with name ''name'' must exist and must conform to ''schema''). Furthermore, for the entire object, it should be possible to state whether it is allowed to add md-records not defined in the content model (a flag to allow additional unspecified md-records; default should be ''not allowed''). [[User:Frank|Frank]] 08:52, 17 March 2009 (UTC)
* allowed formats (Item only)
: a list of mime-types. The binary content bound to a component of an Item must match one of these mime-types. (In conjunction with occurrence, metadata for a specific mime-type, type, type-label, etc., maybe better defined as rule or as set of properties applied to a component. See also content streams.)

Revision as of 15:11, 27 July 2009

Note: The following is a reworked proposal for eSciDoc Developer Workshop on 14./15.10.08.

eSciDoc Content Model

In the eSciDoc infrastructure, a content model describes the specialization of an Item or Container object and is stored inside the infrastructure as a Content Model Object. Content Model Objects are managed via ContentModelHandler.

The content of a Content Model Object consists of common object properties, as known from Item and Container Objects, a set of predefined content model properties governing the behavior and form of an Item or Container (e.g. initial state), and a rule-document. The rule-document is stored as a valid XML document inside the Content Model Object and contains restrictions on an Item or Container expressed in Schematron XML (or possibly another rule language in the future).

See also eSciDoc Content Models.

Common Object Properties

  • id (objid)
  • name
  • description
  • creation date
  • creator (created by)
  • modification date (last-modification-date)
  • status (public-status) + comment
  • PID
  • context
  • content model (either define a content model object for content models OR do not state this property; note: a content model is not a special item)
  • lock-status, lock-date, lock-owner
  • version, latest-version, release (in case of versioned content model)

Predefined Content Model Properties

Predefined content model properties come in two forms: key-value pairs and lists of values.

Key/Value

  • initial state (e.g. pending, submitted or released)
  • status transitions
  • versioning enabled
  • name of main metadata record
  • schema of main metadata record
  • transformation description from main metadata record to DC metadata (XSLT in datastream "DC-MAPPING")
  • applies to object pattern, or "specialise" (if the content model is a further specification of an Item or a Container)
  • allowed contexts (not recommended if the content model is to be reused)

Maybe also flags stating whether specific methods create a new version.

Lists

  • additional metadata records
Name and schema of additional metadata records. Maybe a flag to just allow additional unspecified metadata records.
  • allowed formats (Item only)
a list of mime-types. The binary content bound to a component of an Item must match one of these mime-types. (In conjunction with occurrence, metadata for a specific mime-type, type, type-label, etc., maybe better defined as rule or as set of properties applied to a component. See also content streams.)
I would not agree here. First we need to define whether an item in the cModel can have components or not, and then define the allowed formats for the components. That is a bit more complex, but better structured. --Natasa 11:53, 10 November 2008 (UTC)
What is the difference between allowed formats and a list of allowed mime-types?--Natasa 11:53, 10 November 2008 (UTC)
As content-category of the component is a component property, possible values should be defined within the cModel as well imho. --Natasa 11:53, 10 November 2008 (UTC)
So, should we skip allowed formats at all and just define the mandatory/optional components together with mime-type, md-records etc.? As I understand we had a requirement for a list of valid mime-types as we do for the entire system in one list, for now. But maybe such a list should be in the Context Object??? Frank 08:59, 17 March 2009 (UTC)
  • aggregation / description of members (Container only)
content model of allowed members, occurrences
  • ...
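
The name and schema pairs for additional metadata records, together with the proposed mandatory/optional marker and the flag for unspecified records, could be checked roughly as follows. This is a minimal sketch under stated assumptions: `MdRecordRule`, `MdRecordPolicy` and `check_md_records` are hypothetical names, not part of any eSciDoc API, and the check only compares declared schema identifiers instead of validating XML against the schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class MdRecordRule:
    """One name-schema pair from the content model (hypothetical structure)."""
    name: str
    schema: str
    mandatory: bool = False


@dataclass
class MdRecordPolicy:
    """All md-record rules of one content model (hypothetical structure)."""
    rules: List[MdRecordRule] = field(default_factory=list)
    # Default is "not allowed", as suggested in the discussion.
    allow_unspecified: bool = False


def check_md_records(policy: MdRecordPolicy, records: Dict[str, str]) -> List[str]:
    """Return a list of problems; `records` maps md-record name to its schema id."""
    problems = []
    known = {rule.name for rule in policy.rules}
    for rule in policy.rules:
        if rule.name not in records:
            if rule.mandatory:
                problems.append(f"missing mandatory md-record '{rule.name}'")
        elif records[rule.name] != rule.schema:
            problems.append(f"md-record '{rule.name}' must conform to {rule.schema}")
    if not policy.allow_unspecified:
        for name in records:
            if name not in known:
                problems.append(f"md-record '{name}' is not defined in the content model")
    return problems
```

Because md-record names are unique inside one object, a plain mapping from name to schema is enough here, and no separate occurrence count is needed.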

Questions:

Should number and kind of components of an Item be defined as property or in Rules?
What is the difference between property and Rules?--Natasa 11:53, 10 November 2008 (UTC)
Aha, after reading I got it. But I would think that the content model should be formally defined in a manner that is not implementation specific (but still conforming to a certain structure, i.e. we started before to define a schema on this). Then the Rule document can be generated based on the content of this structure. That way, we could ensure that our content models are really interoperable. The rules are needed for internal validation in eSciDoc (and could possibly be offered as a service). But they are probably not so clear for the end user. --Natasa 11:56, 10 November 2008 (UTC)
The idea is to be able to assign rule documents that can be validated against Validation Service. Maybe PubMan validation schemas can be used here. (See "Rule Document" below) Frank 17:30, 10 November 2008 (UTC)
Should cascade behavior be defined as standalone property or with aggregation description? See "Cascade" below.
Are values optional and if so, what is the meaning of no value?

Rule Document

The rule document of a content model is a valid XML document included in the Content Model Object.

A first approach could be simply to store the validation schemas used by PubMan inside the Content Model Object.

This would not be possible, as the validation schemas depend not only on the CModel, but also on the metadata version and the context.
The CModel can only store some very generic validation schema (i.e. default validation points). --Natasa 13:32, 10 November 2008 (UTC)

Components

The description of an Item should contain at least a list of mime-types of component content (see above "allowed formats" in "Lists").

For further specification the rule-document may be used (see above "Rule Document"). Or the description of an Item may include an extended list of allowed formats where an entry consists of

  • mime-type
  • content-category
  • occurrence
  • metadata records (name, schema, and occurrence)
  • ...
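
An extended list of allowed formats could be evaluated against the components of an Item along these lines. This is only a sketch: `FormatEntry` and `check_components` are hypothetical names, and modelling "occurrence" as a minimum/maximum count per entry is an assumption, since the proposal does not define how occurrence is expressed. Metadata records of a component are left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FormatEntry:
    """One entry of the extended allowed-formats list (hypothetical structure)."""
    mime_type: str
    content_category: str
    min_occurs: int = 0                # assumption: occurrence as min/max counts
    max_occurs: Optional[int] = None   # None means unbounded


def check_components(entries: List[FormatEntry],
                     components: List[Tuple[str, str]]) -> List[str]:
    """Check (mime_type, content_category) pairs of an Item's components."""
    problems = []
    allowed = {(e.mime_type, e.content_category): e for e in entries}
    counts = Counter(components)
    # Every component must match one of the allowed entries.
    for key in counts:
        if key not in allowed:
            problems.append(f"component {key} is not allowed")
    # Every entry's occurrence bounds must be respected.
    for key, entry in allowed.items():
        n = counts.get(key, 0)
        if n < entry.min_occurs:
            problems.append(f"too few components {key}: {n} < {entry.min_occurs}")
        if entry.max_occurs is not None and n > entry.max_occurs:
            problems.append(f"too many components {key}: {n} > {entry.max_occurs}")
    return problems
```

An empty result means the Item satisfies every entry; the same logic could equally be expressed in the rule-document.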

Content Streams

Should be described like a component (list and/or rule-document), but with just mime-type and occurrence. A name may be defined but may collide with occurrence. If the mime-type is text/xml, it should be possible to specify an XML schema.

Tech. note: Because content streams are not a separate object inside the infrastructure, they have no objid but a name, as metadata records have.

Aggregation

The description of a Container may include an extended list of allowed member types where an entry consists of

  • content model (unique inside the list; means that members conforming to that content model are allowed)
  • cascade information (see below "Cascade")
  • unique parent (see below "Unique Parent")
  • occurrence

The list itself may carry information like

  • order(ed)
What does ordered mean? Or should it be order, giving a key for the standard order?
Originally, it was meant to be as the native physical order of the members in a container, i.e. the order in which they are created, and to enable adding them after e.g. 10th element in the aggregation. This could be used to automatically derive the default TOC. But no longer certain if needed at all, because then the ordering logic would be split into 2 places: TOC and Container. If it would in future help to automatically generate a TOC object - then we should keep it. --Natasa 12:14, 10 November 2008 (UTC)
So I would propose to drop it. But if not, it seems to me it's a boolean "ordered" flag!? Frank 17:36, 10 November 2008 (UTC)
  • TOC Content Model ID

Cascade

As attribute to aggregation.

The value of cascade can be one of SUBMIT, RELEASE, WITHDRAW, LOCK, UNLOCK, a comma-separated list of these values, or simply ALL. Additionally, the value RELAXED can be set to state that cascading is allowed to fail. An implementation MUST accept the values in lower case.

The value of cascade states if members of an aggregation MUST, SHOULD or MUST NOT be affected by state change or lock.
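
Parsing such a cascade value might look like the sketch below. `parse_cascade` is a hypothetical helper, not an existing eSciDoc function. It assumes that RELAXED may be combined with any other value (including ALL), that whitespace around commas is tolerated, and that unknown tokens are rejected; none of these details are fixed by the proposal.

```python
ACTIONS = ("SUBMIT", "RELEASE", "WITHDRAW", "LOCK", "UNLOCK")


def parse_cascade(value):
    """Return (frozenset of cascaded actions, relaxed flag) for a cascade value."""
    actions = set()
    relaxed = False
    for raw in value.split(","):
        token = raw.strip().upper()  # lower-case input must be accepted
        if not token:
            continue
        if token == "ALL":
            actions.update(ACTIONS)
        elif token == "RELAXED":
            relaxed = True
        elif token in ACTIONS:
            actions.add(token)
        else:
            raise ValueError(f"unknown cascade value: {token!r}")
    return frozenset(actions), relaxed
```

For example, `parse_cascade("submit,release,relaxed")` yields the actions SUBMIT and RELEASE with the relaxed flag set. A single flag applies to the whole value, so this form cannot mark one action as relaxed and another as required.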

see also ESciDoc_Content_Models#Aggregations_and_internal_life-cycle_of_resources for description (from our previous talks). I am not certain that these can go to the level of granularity we wish i.e. better like the previous with tried/required.--Natasa 12:02, 10 November 2008 (UTC)
There seem to be two limitations with this approach: 1) A single cascade attribute cannot state relaxed submission and required release at the same time. 2) No members-first; but I did not really get what that means. Frank 10:17, 17 March 2009 (UTC)

Here and at ESciDoc_Content_Models#Aggregations_and_internal_life-cycle_of_resources we discuss the following two approaches:

<aggregation cascade="SUBMIT,RELEASE,WITHDRAW,RELAXED" ...
<aggregation submit-rule="members-tried" release-rule="members-tried" withdraw-rule="members-tried" ...

Unique Parent

As attribute to aggregation.

The attribute 'unique-parent' states if the container described by this content model is the only parent of its members or not, or if there is no other parent of its members with the same content model.

[unique-parent="false"] The members of the container described by this content model may have other parents.

[unique-parent="true"] The members of the container described by this content model are not allowed to have other parents.

[unique-parent="typed"] The members of the container described by this content model may have other parents but not of the same content model as the described container.
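
The three values could be enforced when a member is added to a container, for instance as sketched here. `may_add_member` is a hypothetical function, and the sketch only looks at the rule of the container being joined; the rules of the member's existing parents would have to be checked in the same way.

```python
def may_add_member(unique_parent, container_model, existing_parent_models):
    """Decide whether a member may join a container.

    unique_parent          -- "false", "true" or "typed" from the container's content model
    container_model        -- content model id of the container being joined
    existing_parent_models -- content model ids of the member's current parents
    """
    if unique_parent == "false":
        # Other parents are always fine.
        return True
    if unique_parent == "true":
        # No other parent is allowed at all.
        return len(existing_parent_models) == 0
    if unique_parent == "typed":
        # Other parents are fine unless one has the same content model.
        return container_model not in existing_parent_models
    raise ValueError(f"unknown unique-parent value: {unique_parent!r}")
```

Taking the attribute value as a string mirrors the three-valued proposal; a renamed attribute, as suggested in the discussion, would only change the literals.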

  • I think in most cases it's a bad idea to introduce seemingly boolean property names and then let them have more than two values. So something like 'parent="unique|typed"' may be more appropriate. Robert 12:06, 10 November 2008 (UTC)
Agreed, but wanted to have one attribute for that unique parent behavior. What could be the value to attribute "parent" if parent must NOT be unique? Just not to state that attribute seems to me a bad idea, too. Frank 17:49, 10 November 2008 (UTC)
Robert, I was actually editing the same page with very same remark:) .. - as true/false are expected values on their own, and we need to try to avoid "type" when we talk about content models, would suggest naming this differently i.e.
  • unique-parent =>parents-allowed
  • false => many-containers-any-model
  • true => single-container-any-model
  • typed => many-containers-distinct-model

--Natasa 12:10, 10 November 2008 (UTC)

General remarks/discussion

  • As pointed out somewhere above, I would really try to make the CModel document self-contained. The rules part, such as Schematron-defined rules (or rules defined in any other language) that are pre-compiled every time a cmodel definition is changed, can be generated separately. This would allow a human-readable CModel definition and interoperability across implementations. --Natasa 12:18, 10 November 2008 (UTC)