Difference between revisions of "ESciDoc Logical Data Model"

From MPDLMediaWiki

Latest revision as of 09:37, 10 November 2011

Introduction

Understanding the structure and nature of the data is essential for managing the various types of content held in an eSciDoc repository. The eSciDoc logical data model was therefore developed to enable, on the one hand, the implementation of core data services based on high-level abstractions and, on the other hand, further specialization of the data and the implementation of specialized services. This page may be used as a starting point for understanding the eSciDoc data structures. eSciDoc manages two general categories of data:


  • Resources (content resources) - the content of the repository, such as articles, books, images, image albums, scanned manuscripts, pages, etc.
  • MasterData - additional classes of data used to manage Resources, such as organizational units and contexts.


Put simply, the two categories can be distinguished as follows: resources are the actual content, which can be further extended, shared and preserved; master data are used to administer content (i.e. resources) and serve as important referenced entities. Master data can also be referenced by objects outside the core eSciDoc repository.




Figure: LDMExplained.png


Content Resources: Item and Container

Content resources are defined by two generic object patterns: Item and Container.

  • An Item resource consists of metadata records (e.g. eSciDoc publication metadata, SISIS MAB record, MODS record, Dublin Core record) and optionally of components that represent the actual content (e.g. PDF file, JPEG file, XML file). See also SurrogateItem.
  • A Container resource aggregates other resources, i.e. items or other containers. Like an Item, a Container can be described by multiple metadata records.
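The Item and Container patterns above can be sketched as plain Python data classes. This is a minimal, hypothetical model for illustration only: the class and field names (`MetadataRecord`, `Component`, `md_records`, `members`, `all_items`) and the example identifiers and URL are assumptions, not the eSciDoc API, and the repository-managed properties of real objects are omitted. It shows an Item holding metadata records plus optional components, and a Container recursively aggregating Items and other Containers.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class MetadataRecord:
    name: str     # hypothetical record name, e.g. "escidoc", "mods" or "dc"
    content: str  # serialized record, e.g. an XML string


@dataclass
class Component:
    mime_type: str                      # e.g. "application/pdf", "image/jpeg"
    content: Optional[bytes] = None     # bytestream stored in the repository, or
    external_url: Optional[str] = None  # a reference to content held elsewhere


@dataclass
class Item:
    id: str
    md_records: list[MetadataRecord]
    components: list[Component] = field(default_factory=list)  # optional


@dataclass
class Container:
    id: str
    md_records: list[MetadataRecord]
    members: list[Union[Item, Container]] = field(default_factory=list)

    def all_items(self) -> list[Item]:
        """Collect every Item reachable through nested Containers (assumes no cycles)."""
        found: list[Item] = []
        for member in self.members:
            if isinstance(member, Item):
                found.append(member)
            else:
                found.extend(member.all_items())
        return found


# Hypothetical example objects.
article = Item(
    "escidoc:1",
    [MetadataRecord("escidoc", "<publication/>")],
    [Component("application/pdf", external_url="http://example.org/a.pdf")],
)
nested = Container("escidoc:3", [], [Item("escidoc:4", [MetadataRecord("dc", "<dc/>")])])
album = Container("escidoc:2", [MetadataRecord("dc", "<dc/>")], [article, nested])
print([i.id for i in album.all_items()])  # ['escidoc:1', 'escidoc:4']
```

Typing `members` as either kind of resource is what lets a Container nest other Containers; the traversal assumes the membership graph contains no cycles.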

Contexts

Each resource (Item or Container) is maintained in a single administrative Context. Contexts are the responsibility of organizations (e.g. one or more project groups, institutions, etc.). The organizations responsible for a context define its settings through the so-called Admin Descriptor mechanism, according to their needs, to express rules for content creation, update, metadata quality assurance, dissemination, preservation, authorization policies, submission policies, etc.
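A hypothetical sketch of how a resource's single Context governs it. The admin descriptor is modelled here as a plain key/value map, and the setting name `submission-policy`, its value `two-step`, and the helper `setting_for` are invented for the example; the actual format of eSciDoc admin descriptors is not described on this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    id: str
    responsible_ou_ids: list[str]  # organizational units responsible for the context
    # Hypothetical: admin descriptor settings as a simple key/value map.
    admin_descriptor: dict[str, str] = field(default_factory=dict)


@dataclass
class Resource:
    id: str
    context_id: str  # one field, not a list: each Item/Container has a single Context


def setting_for(resource: Resource, contexts: dict[str, Context], key: str,
                default: Optional[str] = None) -> Optional[str]:
    """Look up a rule for a resource in the admin descriptor of its Context."""
    context = contexts[resource.context_id]
    return context.admin_descriptor.get(key, default)


# Hypothetical context and resource.
contexts = {"ctx:1": Context("ctx:1", ["ou:1"], {"submission-policy": "two-step"})}
item = Resource("escidoc:1", "ctx:1")
print(setting_for(item, contexts, "submission-policy"))  # two-step
```

Storing `context_id` as a single value rather than a collection encodes the rule that a resource belongs to exactly one administrative Context.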

Content models

Items and Containers are very generic resources: by themselves they say nothing about the content they represent or about their own structure, e.g. what kind of metadata may be associated with them, what kinds of members a container aggregates, or what kind of resource they represent semantically. Therefore, the eSciDoc logical data model introduces the concept of a Content model. Each Item or Container must claim to be an instance of exactly one content model. There is no limit on the number of instances that may claim a specific content model.

In general, a content model defines:

  • the type and structure of the content resources (item, container, members)
  • a set of services that may be associated with the content resources

Because the content model definition is formalized, it is also used (as of eSciDoc core-service release 1.3) to validate instantiated resources.
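The role of the content model in validation can be illustrated with a small sketch, assuming a content model that only lists required metadata record names and allowed component MIME types. These rules, the identifiers (`cm:publication`, `validate`) and the field names are hypothetical; real content model definitions express more than this. The sketch also reflects the cardinality rules above: each Item names exactly one content model, while any number of Items may name the same one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentModel:
    id: str
    # Hypothetical structural rules; real eSciDoc content models are richer.
    required_md_records: set[str] = field(default_factory=set)
    allowed_mime_types: set[str] = field(default_factory=set)  # empty = any type


@dataclass
class Item:
    id: str
    content_model_id: str  # exactly one content model per resource
    md_record_names: list[str]
    component_mime_types: list[str] = field(default_factory=list)


def validate(item: Item, models: dict[str, ContentModel]) -> list[str]:
    """Return a list of violations; an empty list means the item is valid."""
    model = models.get(item.content_model_id)
    if model is None:
        return [f"unknown content model {item.content_model_id!r}"]
    errors: list[str] = []
    for name in sorted(model.required_md_records - set(item.md_record_names)):
        errors.append(f"missing metadata record {name!r}")
    if model.allowed_mime_types:
        for mime in item.component_mime_types:
            if mime not in model.allowed_mime_types:
                errors.append(f"component type {mime!r} not allowed")
    return errors


models = {"cm:publication": ContentModel("cm:publication", {"escidoc"}, {"application/pdf"})}
ok = Item("escidoc:1", "cm:publication", ["escidoc"], ["application/pdf"])
bad = Item("escidoc:2", "cm:publication", ["dc"], ["image/jpeg"])
print(validate(ok, models))   # []
print(validate(bad, models))  # ["missing metadata record 'escidoc'", "component type 'image/jpeg' not allowed"]
```

Both Items claim the same content model, which is allowed; only `ok` satisfies its structural rules.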

Organizational units

Organizational units represent institutions, their internal organization (institutes, departments, projects, etc.) and their structural changes over time (Organizational units history). Like the content resources in the system, each organizational unit may be described by a metadata record containing the most important information about the organization (according to the needs of the system's users).

An organizational unit may be the parent of many other organizational units. Conversely, an organizational unit may have several parent organizational units at the same time.

As master data, organizational units are referenced in Contexts (to indicate institutional responsibility), may be referenced in the metadata of resources, may be used to define authorization rules for content access (institutional visibility), etc.
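Because an organizational unit may have several parents, the hierarchy forms a graph rather than a strict tree. The following hypothetical sketch stores the parent links as a mapping from OU id to parent ids and computes all ancestors of a unit, the kind of lookup an institutional-visibility rule might need. The identifiers and the `ancestors`/`children_of` helpers are illustrative assumptions, not part of eSciDoc.

```python
from __future__ import annotations

from collections import deque


def ancestors(ou_id: str, parents: dict[str, list[str]]) -> set[str]:
    """Return all direct and indirect parents of ou_id (breadth-first)."""
    seen: set[str] = set()
    queue = deque(parents.get(ou_id, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # skip shared ancestors already visited
            seen.add(parent)
            queue.extend(parents.get(parent, []))
    return seen


def children_of(parents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the parent links: map each OU to the OUs that list it as a parent."""
    children: dict[str, list[str]] = {}
    for child, ous in parents.items():
        for parent in ous:
            children.setdefault(parent, []).append(child)
    return children


# Hypothetical hierarchy: one department with two parents that share a root.
parents = {
    "ou:dept-a": ["ou:institute-1", "ou:project-x"],
    "ou:institute-1": ["ou:society"],
    "ou:project-x": ["ou:society"],
}
print(sorted(ancestors("ou:dept-a", parents)))  # ['ou:institute-1', 'ou:project-x', 'ou:society']
print(children_of(parents)["ou:society"])       # ['ou:institute-1', 'ou:project-x']
```

The `seen` set ensures a shared ancestor such as `ou:society` is visited only once, even though it is reachable through two parents.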