Difference between revisions of "ESciDoc Content Model Object"

From MPDLMediaWiki
 
(41 intermediate revisions by 3 users not shown)
Line 1: Line 1:
<accesscontrol>eSciDoc</accesscontrol>
Note: The following is a reworked proposal for eSciDoc Developer
Note: The following is a reworked proposal for eSciDoc Developer
Workshop on 14./15.10.08.
Workshop on 14./15.10.08.
See also [[ESciDoc_Content_Model_in_Fedora]].


= eSciDoc Content Model =
= eSciDoc Content Model =
Line 7: Line 8:
an Item or Container object and is stored inside the infrastructure as
an Item or Container object and is stored inside the infrastructure as
Content Model Object.  
Content Model Object.  
Content Model Objects are managed via ContentModelHandler.
Content Model Objects are managed via ContentModelService.


The content of a Content Model Object consists of common object properties, as known  
The content of a Content Model Object consists of common object properties, as known  
Line 14: Line 15:
properties, governing the behavior and form of an Item or Container  
properties, governing the behavior and form of an Item or Container  
(e.g. initial
(e.g. initial
state), and a rule-document. The rule-document is stored as valid XML
state). In the future, a rule-document may be assigned to the Content Model Object. The rule-document is stored as valid XML
document inside the Content Model Object and contains restrictions to  
document inside the Content Model Object and contains restrictions to  
an Item or Container expressed in Schematron-XML (or maybe another
an Item or Container expressed in a Rule or Modeling Language (e.g. Schematron-XML).
rule-language, in the future).


See also [[ESciDoc_Content_Models | eSciDoc Content Models ]].
See also [[ESciDoc_Content_Models | eSciDoc Content Models ]].


== Common Object Properties ==
== Common Object Properties ==
Common object properties, as known for Item Objects and Container Objects etc.


* id (objid)
* id (objid)
Line 29: Line 31:
* creator (created by)
* creator (created by)
* modification date (last-modification-date)
* modification date (last-modification-date)
* status (public-status)
* status (public-status) + comment
* context
* PID
* content model (either defines a content model object for content models OR don't state this property; note: content model is not a special item)
* context (?)
* lock-status, lock-date, lock-owner (?)
* version, latest-version, release (in case of versioned content model)
 
== Content Model Metadata ==
 
A DC metadata record in order to provide human readable information about this content model.


: Are the properties ''name'' and ''description'' sufficient? (see above ''Common Object Properties'') [[User:Frank|Frank]] 07:35, 24 September 2009 (UTC)
::as final solution these are not sufficient. It is important to do smth like: author of the content model (not necessarily the user who created it), institution, some more descriptive information (maybe even in different languages). That is why descriptive metadata differ from properties. --[[User:Natasab|Natasa]] 11:51, 24 September 2009 (UTC)


== Predefined Content Model Properties ==
== Predefined Content Model Properties ==
Predefined content model properties, governing the behavior and form of an Item or Container.
There are predefined content model properties in form of key-value pairs and
There are predefined content model properties in form of key-value pairs and
in form of lists of values.
in form of lists of values.
Line 40: Line 53:
=== Key/Value ===
=== Key/Value ===


* initial state (pending, submitted or released)
* initial state (e.g. pending, submitted or released)
* status transitions
* status transitions
* versioning enabled
* versioning enabled
* name of main metadata record
* name of main metadata record
* schema of main metadata record
* schema of main metadata record
:for sake of simplicity we could always name the main metadata record "eSciDoc"--[[User:Natasab|Natasa]] 11:46, 10 November 2008 (UTC)
* transformation description from main metadata record to DC metadata (XSLT in datastream "DC-MAPPING")
* transformation description from main metadata record to DC metadata
* applies to object pattern, or "specialise" (if the content model is further specification of an Item or a Container)
* applies to object pattern (if the content model is further specification of an Item or a Container)
* if a PID must be assigned for release (version and object PID, necessity of PID for content must be defined with component definition)
:and according last workshop a "TableOfContent - TOC"--[[User:Natasab|Natasa]] 11:46, 10 November 2008 (UTC)


Maybe flags, if specific methods create new version.
In the future, maybe flags if specific methods create new version.
:do not understand what this means--[[User:Natasab|Natasa]] 13:30, 10 November 2008 (UTC)
::Probably we should also add a flag if a content-stream is allowed or not for resources of this CModel--[[User:Natasab|Natasa]] 13:30, 10 November 2008 (UTC)


=== Lists ===
=== Lists ===


* model specific properties
: defines which properties (key-value pairs) are allowed to put in the content-model-specific section of an Item or Container. An allowed property is defined with name, datatype, and occurence. (May be obsolete because content-model-specific in Item and Container is seen as metadata and may be removed.)
::I would still not keep with this assumption. We have in content-model-specific properties in R4 put "local tags" i.e. those which are valid for publication items only, and are not indeed real Tag-relations.--[[User:Natasab|Natasa]] 11:53, 10 November 2008 (UTC)
* additional metadata records
* additional metadata records
: Name, schema, and occurrence of additional metadata records. Maybe a flag to just allow additional unspecified metadata records.
: Name and schema of additional metadata records. Maybe a flag to just allow additional unspecified metadata records.
::Name and schema would be better, as the content model - beside descriptive information that holds only flag should also offer the possibility to map between differents chemas. --[[User:Natasab|Natasa]] 11:53, 10 November 2008 (UTC)
::What is meant by occurence of additional metadata records? not clear with this information, probably we should not keep this information. --[[User:Natasab|Natasa]] 11:53, 10 November 2008 (UTC)
* allowed formats (Item only)
* allowed formats (Item only)
:In here would not agree. First we need to define if item in cmodel can have components or not, and then for the components define allowed formats. That is a bit more complex, but better structured. --[[User:Natasab|Natasa]] 11:53, 10 November 2008 (UTC)
: a list of possible components. A component is described by a ''name of the kind'' used for property ''content-category'', a list of allowed mime-types and a list of metadata records (see previous point)
: a list of mime-types. The binary content bound to a component of an Item must match one of these mime-types. (In conjunction with occurrence, metadata for a specific mime-type, type, type-label, etc., maybe better defined as rule or as set of properties applied to a component. See also content streams.)
::What is the difference between allowed formats and a list of allowed mime-types?--[[User:Natasab|Natasa]] 11:53, 10 November 2008 (UTC)
::As content-category of the component is a component property, possible values should be defined within the cModel as well imho. --[[User:Natasab|Natasa]] 11:53, 10 November 2008 (UTC)
* aggregation / description of members (Container only)
* aggregation / description of members (Container only)
: content model of allowed members, occurrences
: content model of allowed members, mandatory or optional
* ...
* ...


Questions:
Questions:
:Should number and kind of components of an Item be defined as property or in Rules?
:Should number and kind of components of an Item be defined as property or in Rules?
::What is the difference between property and Rules?--[[User:Natasab|Natasa]] 11:53, 10 November 2008 (UTC)
:: Not in form of a rule language document! Because we would like to evaluate content models in Fedora, so - if rule language - we need a description of the Fedora object. [[User:Frank|Frank]] 07:45, 28 July 2009 (UTC)
:::Aha, after reading I got it. But i would think that the content model should be formally defined in a manner that is not implementation specific (but still conforming to a certain structure i.e. we started before to define a schema on this). Then the Rule document can be generated based on the content of this structure. That way, we could allow that our content models are really interoperable. The rules are needed for internal validation in eSciDoc (and evtl. could be offered as a service). But they are probably not so clear for the end user. --[[User:Natasab|Natasa]] 11:56, 10 November 2008 (UTC)
:Should cascade behavior be defined as standalone property or with aggregation description? See "Cascade" below.
:Should cascade behavior be defined as standalone property or with aggregation description? See "Cascade" below.
:Are values optional and if so, what is the meaning of no value?
:Are values optional and if so, what is the meaning of no value?
Line 82: Line 83:
The rule document of a content model is a valid XML document included in
The rule document of a content model is a valid XML document included in
the Content Model Object.
the Content Model Object.
A first approach can be, just to store validation schemas in use by PubMan
inside the Content Model Object.
:This would not be possible, as the validation schema depend not only on CModel, but also on metadata version and the context.
::CModel can only store some very generic validation schema (i.e. default validation points) --[[User:Natasab|Natasa]] 13:32, 10 November 2008 (UTC)


== Components ==
== Components ==
Line 100: Line 96:
* metadata records (name, schema, and occurrence)
* metadata records (name, schema, and occurrence)
* ...
* ...
==== DC Mapping ====
Since the entire object is indexed for search and every XML element text content available via filter the DC mapping may be dropped for components.
Till then the default mapping (! no user configurable mapping) is applied to the  ''main metadata record''.
==== Questions ====


== Content Streams ==
== Content Streams ==
Line 113: Line 115:
''allowed member types'' where an entry consists of
''allowed member types'' where an entry consists of


* content model (unique inside the list)
* content model (unique inside the list; means members conform to that content model are allowed)
* cascade information (see below "Cascade")
* cascade information (see below "Cascade")
* unique parent (see below "Unique Parent")
* unique parent (see below "Unique Parent")
Line 123: Line 125:
: What means ''ordered''? Or should it be ''order'' and give a key for standard order?
: What means ''ordered''? Or should it be ''order'' and give a key for standard order?
::Originally, it was meant to be as the native physical order of the members in a container, i.e. the order in which they are created, and to enable adding them after e.g. 10th element in the aggregation. This could be used to automatically derive the default TOC. But no longer certain if needed at all, because then the ordering logic would be split into 2 places: TOC and Container. If it would in future help to automatically generate a TOC object - then we should keep it. --[[User:Natasab|Natasa]] 12:14, 10 November 2008 (UTC)
::Originally, it was meant to be as the native physical order of the members in a container, i.e. the order in which they are created, and to enable adding them after e.g. 10th element in the aggregation. This could be used to automatically derive the default TOC. But no longer certain if needed at all, because then the ordering logic would be split into 2 places: TOC and Container. If it would in future help to automatically generate a TOC object - then we should keep it. --[[User:Natasab|Natasa]] 12:14, 10 November 2008 (UTC)
::: So i would propose to drop it. But if not, it seems to me it's a boolean "ordered" flag!? [[User:Frank|Frank]] 17:36, 10 November 2008 (UTC)
* TOC Content Model ID


== Cascade ==
== Cascade ==
Line 140: Line 145:


:see also [[ESciDoc_Content_Models#Aggregations_and_internal_life-cycle_of_resources]] for description (from our previous talks). I am not certain that these can go to the level of granularity we wish i.e. better like the previous with tried/required.--[[User:Natasab|Natasa]] 12:02, 10 November 2008 (UTC)
:see also [[ESciDoc_Content_Models#Aggregations_and_internal_life-cycle_of_resources]] for description (from our previous talks). I am not certain that these can go to the level of granularity we wish i.e. better like the previous with tried/required.--[[User:Natasab|Natasa]] 12:02, 10 November 2008 (UTC)
:: There seem to be two limitations with this approach: 1) One single cascade attribute can not state relaxed submission and required release at the same time. 2) No ''members-first''; but I did not really get what that mean. [[User:Frank|Frank]] 10:17, 17 March 2009 (UTC)
Here and at [[ESciDoc_Content_Models#Aggregations_and_internal_life-cycle_of_resources]] we discuss the following two approaches:
<aggregation cascade="SUBMIT,RELEASE,WITHDRAW,RELAXED" ...
<aggregation submit-rule="members-tried" release-rule="members-tried" withdraw-rule="members-tried" ...


== Unique Parent ==
== Unique Parent ==
Line 158: Line 169:


*I think in most cases it's a bad idea to introduce seemingly boolean property names and then let them have more than two values. So something like 'parent="unique|typed"' may be more appripriate. [[User:Robert|Robert]] 12:06, 10 November 2008 (UTC)
*I think in most cases it's a bad idea to introduce seemingly boolean property names and then let them have more than two values. So something like 'parent="unique|typed"' may be more appripriate. [[User:Robert|Robert]] 12:06, 10 November 2008 (UTC)
::: Agreed, but wanted to have one attribute for that unique parent behavior. What could be the value to attribute "parent" if parent must NOT be unique? Just not to state that attribute seems to me a bad idea, too. [[User:Frank|Frank]] 17:49, 10 November 2008 (UTC)


:Robert, I was actually editing the same page with very same remark:) .. - as true/false are expected values on their own, and we need to try to avoid "type" when we talk about content models, would suggest naming this differently i.e.  
:Robert, I was actually editing the same page with very same remark:) .. - as true/false are expected values on their own, and we need to try to avoid "type" when we talk about content models, would suggest naming this differently i.e.  
Line 170: Line 182:


[[Category:eSciDoc|Content Models]]
[[Category:eSciDoc|Content Models]]
[[Category:Content Models|ESciDoc Content Model Object]]

Latest revision as of 11:51, 24 September 2009

Note: The following is a reworked proposal for the eSciDoc Developer Workshop on 14./15.10.08.

See also ESciDoc_Content_Model_in_Fedora.

eSciDoc Content Model

In the eSciDoc infrastructure, a content model describes the specialization of an Item or Container object and is stored inside the infrastructure as a Content Model Object. Content Model Objects are managed via ContentModelService.

A Content Model Object consists of common object properties, as known for Item Objects, Container Objects etc., and of a set of predefined content model properties that govern the behavior and form of an Item or Container (e.g. initial state). In the future, a rule-document may be assigned to the Content Model Object. The rule-document is stored as a valid XML document inside the Content Model Object and contains restrictions on an Item or Container, expressed in a rule or modeling language (e.g. Schematron-XML).

See also eSciDoc Content Models.

Common Object Properties

Common object properties, as known for Item Objects and Container Objects etc.

  • id (objid)
  • name
  • description
  • creation date
  • creator (created by)
  • modification date (last-modification-date)
  • status (public-status) + comment
  • PID
  • context (?)
  • lock-status, lock-date, lock-owner (?)
  • version, latest-version, release (in case of versioned content model)

Content Model Metadata

A DC metadata record that provides human-readable information about this content model.

Are the properties name and description sufficient? (see above Common Object Properties) Frank 07:35, 24 September 2009 (UTC)
as final solution these are not sufficient. It is important to do smth like: author of the content model (not necessarily the user who created it), institution, some more descriptive information (maybe even in different languages). That is why descriptive metadata differ from properties. --Natasa 11:51, 24 September 2009 (UTC)

Predefined Content Model Properties

Predefined content model properties, governing the behavior and form of an Item or Container.

There are predefined content model properties in form of key-value pairs and in form of lists of values.

Key/Value

  • initial state (e.g. pending, submitted or released)
  • status transitions
  • versioning enabled
  • name of main metadata record
  • schema of main metadata record
  • transformation description from main metadata record to DC metadata (XSLT in datastream "DC-MAPPING")
  • applies to object pattern, or "specialise" (if the content model is further specification of an Item or a Container)
  • if a PID must be assigned for release (version and object PID, necessity of PID for content must be defined with component definition)

In the future, there may also be flags that state whether specific methods create a new version.
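The key/value properties above could be held in a simple mapping. The following is only a minimal sketch: the property keys, the transition table, and the state name "withdrawn" are assumptions for illustration and are not defined by this proposal.

```python
# Hypothetical key/value properties of one content model.
# Keys and the transition table are illustrative assumptions.
properties = {
    "initial-state": "pending",
    "status-transitions": {
        "pending": {"submitted"},
        "submitted": {"released", "pending"},
        "released": {"withdrawn"},
    },
    "versioning-enabled": True,
}


def can_transition(props, current, target):
    """Return True if the content model allows moving from `current` to `target`."""
    return target in props["status-transitions"].get(current, set())
```

With such a structure, a handler could reject a state change early, before touching the object, by calling `can_transition(properties, "pending", "released")`.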

Lists

  • additional metadata records
Name and schema of additional metadata records. Maybe a flag to just allow additional unspecified metadata records.
  • allowed formats (Item only)
A list of possible components. A component is described by a name of its kind (used for the property content-category), a list of allowed mime-types, and a list of metadata records (see previous point).
  • aggregation / description of members (Container only)
content model of allowed members, mandatory or optional
  • ...

Questions:

Should number and kind of components of an Item be defined as property or in Rules?
Not in form of a rule language document! Because we would like to evaluate content models in Fedora, so - if rule language - we need a description of the Fedora object. Frank 07:45, 28 July 2009 (UTC)
Should cascade behavior be defined as standalone property or with aggregation description? See "Cascade" below.
Are values optional and if so, what is the meaning of no value?

Rule Document

The rule document of a content model is a valid XML document included in the Content Model Object.

Components

The description of an Item should contain at least a list of mime-types of component content (see above "allowed formats" in "Lists").

For further specification the rule-document may be used (see above "Rule Document"). Or the description of an Item may include an extended list of allowed formats where an entry consists of

  • mime-type
  • content-category
  • occurrence
  • metadata records (name, schema, and occurrence)
  • ...
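To make the extended list of allowed formats more concrete, here is a minimal sketch of how the components of an Item might be checked against such a list. It assumes that occurrence is expressed as a minimum and an optional maximum count, which this proposal does not define; the class name, the function name, and the content-category values used in any call are hypothetical. Metadata records are left out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AllowedFormat:
    mime_type: str
    content_category: str
    min_occurs: int = 0                # assumed encoding of "occurrence"
    max_occurs: Optional[int] = None   # None means unbounded


def check_components(components, allowed):
    """Check (mime_type, content_category) pairs against the allowed formats.

    Returns a list of human-readable violation messages (empty if valid).
    """
    counts = Counter(components)
    rules = {(a.mime_type, a.content_category): a for a in allowed}
    problems = []
    # Components whose format is not listed at all.
    for key in counts:
        if key not in rules:
            problems.append(f"component {key} is not allowed")
    # Listed formats that occur too rarely or too often.
    for key, rule in rules.items():
        n = counts.get(key, 0)
        if n < rule.min_occurs:
            problems.append(f"{key}: expected at least {rule.min_occurs}, found {n}")
        if rule.max_occurs is not None and n > rule.max_occurs:
            problems.append(f"{key}: expected at most {rule.max_occurs}, found {n}")
    return problems
```

Returning all violations instead of failing on the first one is a design choice of this sketch; it would let a client show every problem of an Item at once.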

DC Mapping

Since the entire object is indexed for search and the text content of every XML element is available via filter, the DC mapping may be dropped for components. Until then, the default mapping (not user-configurable) is applied to the main metadata record.

Questions

Content Streams

Content streams should be described like components (list and/or rule-document), but with just mime-type and occurrence. A name may be defined but may collide with occurrence. If the mime-type is text/xml, it should be possible to specify an XML schema.

Tech. note: Because content streams are not separate objects inside the infrastructure, they have no objid but a name, as metadata records have.

Aggregation

The description of a Container may include an extended list of allowed member types where an entry consists of

  • content model (unique inside the list; members that conform to that content model are allowed)
  • cascade information (see below "Cascade")
  • unique parent (see below "Unique Parent")
  • occurrence

The list itself may carry information like

  • order(ed)
What does ordered mean? Or should it be order and give a key for a standard order?
Originally, it was meant to be as the native physical order of the members in a container, i.e. the order in which they are created, and to enable adding them after e.g. 10th element in the aggregation. This could be used to automatically derive the default TOC. But no longer certain if needed at all, because then the ordering logic would be split into 2 places: TOC and Container. If it would in future help to automatically generate a TOC object - then we should keep it. --Natasa 12:14, 10 November 2008 (UTC)
So i would propose to drop it. But if not, it seems to me it's a boolean "ordered" flag!? Frank 17:36, 10 November 2008 (UTC)
  • TOC Content Model ID

Cascade

As attribute to aggregation.

The value of cascade can be one of SUBMIT, RELEASE, WITHDRAW, LOCK, UNLOCK, a comma-separated list combining these values, or simply ALL. Additionally, the value RELAXED can be set to state that cascading is allowed to fail. An implementation MUST accept the values in lower case.

The value of cascade states if members of an aggregation MUST, SHOULD or MUST NOT be affected by state change or lock.
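As a rough illustration of the value syntax described above, the following sketch parses a cascade attribute value into a set of actions plus a relaxed flag. The names `Cascade` and `parse_cascade` are hypothetical, and two points are assumptions of this sketch rather than part of the proposal: that ALL may be combined with RELAXED, and that unknown values are rejected.

```python
from dataclasses import dataclass

# Actions named in the text above; "ALL" expands to every one of them.
CASCADE_ACTIONS = ("SUBMIT", "RELEASE", "WITHDRAW", "LOCK", "UNLOCK")


@dataclass(frozen=True)
class Cascade:
    actions: frozenset  # actions that cascade to the members
    relaxed: bool       # True if cascading is allowed to fail


def parse_cascade(value: str) -> Cascade:
    """Parse a cascade attribute value such as "submit,release,relaxed"."""
    # Upper-casing each token makes lower-case input acceptable.
    tokens = [t.strip().upper() for t in value.split(",") if t.strip()]
    actions = set()
    relaxed = False
    for token in tokens:
        if token == "ALL":
            actions.update(CASCADE_ACTIONS)
        elif token == "RELAXED":
            relaxed = True
        elif token in CASCADE_ACTIONS:
            actions.add(token)
        else:
            raise ValueError(f"unknown cascade value: {token!r}")
    return Cascade(frozenset(actions), relaxed)
```

Note that the single `relaxed` flag applies to all listed actions at once, which mirrors the attribute as described here.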

see also ESciDoc_Content_Models#Aggregations_and_internal_life-cycle_of_resources for description (from our previous talks). I am not certain that these can go to the level of granularity we wish i.e. better like the previous with tried/required.--Natasa 12:02, 10 November 2008 (UTC)
There seem to be two limitations with this approach: 1) One single cascade attribute can not state relaxed submission and required release at the same time. 2) No members-first; but I did not really get what that mean. Frank 10:17, 17 March 2009 (UTC)

Here and at ESciDoc_Content_Models#Aggregations_and_internal_life-cycle_of_resources we discuss the following two approaches:

<aggregation cascade="SUBMIT,RELEASE,WITHDRAW,RELAXED" ...
<aggregation submit-rule="members-tried" release-rule="members-tried" withdraw-rule="members-tried" ...
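The second approach gives each action its own attribute, so a different rule can be stated per action. The sketch below reads such attributes with Python's standard `xml.etree.ElementTree`. The function name `read_rules` is hypothetical, and the rule value "members-required" used in the usage line is an assumption derived from the "tried/required" wording in the discussion above; only "members-tried" appears in the proposal.

```python
import xml.etree.ElementTree as ET

# Attribute names from the second approach, mapped to the action they govern.
RULE_ATTRIBUTES = {
    "submit-rule": "SUBMIT",
    "release-rule": "RELEASE",
    "withdraw-rule": "WITHDRAW",
}


def read_rules(xml_text):
    """Return {action: rule} for every rule attribute present on the element."""
    element = ET.fromstring(xml_text)
    return {
        action: element.get(attribute)
        for attribute, action in RULE_ATTRIBUTES.items()
        if element.get(attribute) is not None
    }
```

For example, `read_rules('<aggregation submit-rule="members-tried" release-rule="members-required"/>')` yields a relaxed rule for submission and a strict rule for release in the same element, which a single cascade attribute with one RELAXED flag cannot express.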

Unique Parent

As attribute to aggregation.

The attribute 'unique-parent' states if the container described by this content model is the only parent of its members or not, or if there is no other parent of its members with the same content model.

[unique-parent="false"] The members of the container described by this content model may have other parents.

[unique-parent="true"] The members of the container described by this content model are not allowed to have other parents.

[unique-parent="typed"] The members of the container described by this content model may have other parents but not of the same content model as the described container.
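The three values can be read as a membership check performed when a member is added to a container. The following is a minimal sketch under that reading; the function name and its parameters are hypothetical, and content models are represented by plain identifier strings.

```python
def may_add_member(unique_parent, container_model, existing_parent_models):
    """Decide whether a member may be added to a container.

    unique_parent: value of the attribute in the container's content model.
    container_model: content model id of the container the member is added to.
    existing_parent_models: content model ids of the member's current parents.
    """
    if unique_parent == "false":
        # Other parents are allowed without restriction.
        return True
    if unique_parent == "true":
        # The member must not have any other parent.
        return len(existing_parent_models) == 0
    if unique_parent == "typed":
        # Other parents are allowed, but none with the same content model.
        return container_model not in existing_parent_models
    raise ValueError(f"unknown unique-parent value: {unique_parent!r}")
```

The check only covers the moment of adding a member; whether other containers must also be prevented from adopting the member later is left open here.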

  • I think in most cases it's a bad idea to introduce seemingly boolean property names and then let them have more than two values. So something like 'parent="unique|typed"' may be more appropriate. Robert 12:06, 10 November 2008 (UTC)
Agreed, but wanted to have one attribute for that unique parent behavior. What could be the value to attribute "parent" if parent must NOT be unique? Just not to state that attribute seems to me a bad idea, too. Frank 17:49, 10 November 2008 (UTC)
Robert, I was actually editing the same page with very same remark:) .. - as true/false are expected values on their own, and we need to try to avoid "type" when we talk about content models, would suggest naming this differently i.e.
  • unique-parent =>parents-allowed
  • false=> single-container-any-model
  • true => many-containers-any-model
  • typed => many-containers-distinct-model

--Natasa 12:10, 10 November 2008 (UTC)

General remarks/discussion

  • As pointed somewhere above, would really try to make the CModel document self-contained. Rules part such as Schematron defined rules (or any other language defined rules) that are pre-compiled every time a cmodel definition is changed can be generated separately. This would allow human-readable CModel definition and interoperability across implementations. --Natasa 12:18, 10 November 2008 (UTC)