ESciDoc Logical Data Model

From MPDLMediaWiki
Revision as of 09:09, 28 September 2009 by Natasab

Status: In PROGRESS

Restricted Access to eSciDoc group

Introduction

Understanding the structure and nature of the data is essential for meeting the requirements of managing various types of content within an eSciDoc repository. The eSciDoc logical data model was therefore developed to enable, on the one hand, the implementation of core data services based on high-level abstractions and, on the other hand, further specialization of data and the implementation of specialized services. This page may be used as a starting point for understanding the eSciDoc data structures. eSciDoc manages two general categories of data:


  • Resources (content resources) - the content of the repository such as: articles, books, images, image albums, scanned manuscripts, pages etc.
  • MasterData - additional classes of data that are used for management of Resources such as: organizational units, contexts.


A simple delineation between these two data categories may be stated in the following manner: resources are the real content that can be further extended, shared and preserved. Master data are used for content (i.e. resource) administration and as referenced entities of importance. Master data can also be referenced by objects outside of the core eSciDoc repository.





[Figure: LDMExplained.png]

On content resources

Content resources are defined by two generic object patterns: Item and Container.

  • An Item resource consists of metadata records (e.g. eSciDoc publication metadata, SISIS MAB record, MODS record, Dublin Core record) and optionally of components that represent the actual content (e.g. PDF file, JPEG file, XML file).
  • A Container resource aggregates other resources, which may be items or further containers. Like an Item, a Container can be described by multiple metadata records.
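The two patterns can be pictured as a small recursive data structure. The following is only an illustrative sketch, not the eSciDoc schema or API: the class names, field names and the helper `count_items` are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Component:
    """Actual content of an Item, e.g. a PDF, JPEG or XML file."""
    file_name: str
    mime_type: str


@dataclass
class Item:
    """Metadata records plus optional components."""
    name: str
    # Keyed by a schema label such as "dc" or "mods" (labels are illustrative).
    metadata_records: dict[str, dict[str, str]] = field(default_factory=dict)
    components: list[Component] = field(default_factory=list)


@dataclass
class Container:
    """Aggregation of items and/or other containers."""
    name: str
    metadata_records: dict[str, dict[str, str]] = field(default_factory=dict)
    members: list[Union[Item, Container]] = field(default_factory=list)


def count_items(resource: Union[Item, Container]) -> int:
    """Count the items reachable from a resource, descending into containers."""
    if isinstance(resource, Item):
        return 1
    return sum(count_items(member) for member in resource.members)


# An item with one metadata record and one component.
article = Item(
    name="article",
    metadata_records={"dc": {"title": "An example article"}},
    components=[Component("article.pdf", "application/pdf")],
)
# An item without metadata, nested one level deeper inside a container.
image = Item(name="image", components=[Component("scan.jpg", "image/jpeg")])
album = Container(name="album", members=[image])
collection = Container(name="collection", members=[article, album])
```

Because a Container may hold further containers, anything that walks the members (counting, exporting, validating) has to be recursive, as `count_items` is here.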


On Contexts

Each resource (Item or Container) is maintained in a single administrative Context. Contexts are the responsibility of organizations (e.g. one or more project groups, institutions etc.). The organizations responsible for a context define its settings (by means of the so-called Admin Descriptor) according to their needs, expressing rules for content creation, update, quality assurance of the metadata, dissemination, preservation, authorization policies, submission policies, etc.
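As a rough illustration of this relationship, the sketch below attaches exactly one context to each resource and reads a rule from the context's settings. It is a simplification under stated assumptions: the real Admin Descriptor format is not shown on this page, so the `admin_descriptor` dictionary, its `required_metadata_fields` key and the function `may_submit` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """Administrative context owned by one or more organizations."""
    name: str
    responsible_organizations: list[str]
    # Stand-in for the Admin Descriptor: settings chosen by the responsible
    # organizations. The key used below is an invented example.
    admin_descriptor: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Resource:
    """An Item or Container, reduced to what matters for this sketch."""
    name: str
    context: Context  # a resource is maintained in exactly one context


def may_submit(resource: Resource, metadata: dict[str, str]) -> bool:
    """Apply a hypothetical submission rule taken from the context settings."""
    required = resource.context.admin_descriptor.get("required_metadata_fields", [])
    # A field counts as present only if it has a non-empty value.
    return all(metadata.get(name) for name in required)


context = Context(
    name="Publications",
    responsible_organizations=["Example Institute"],
    admin_descriptor={"required_metadata_fields": ["title", "creator"]},
)
draft = Resource(name="draft", context=context)
```

Keeping the rules on the context rather than on each resource means that an organization can change its policy in one place for everything it administers.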


On Content models

Items and Containers are very generic resources. They do not speak for themselves about the content they represent or about their own structure, e.g. what kind of metadata may be associated with them, what kind of members a container aggregates, or what kind of resources they represent semantically. Therefore, the eSciDoc logical data model introduces the concept of a Content model. Each Item or Container has to claim that it is an instance of exactly one content model. There is no limit on the number of instances that may claim to be of a specific content model.

In general, a content model defines:

  • the type and structure of the content resources (item, container, members)
  • a set of services that may be associated with the content resources


A Content Model is a formal representation of a set of content resources, such as an integrated image and text view of a scanned manuscript page or a precisely documented collection of images.

Example: a content model named "CModel: Publication" defines a resource which is an Item, has a bibliographic metadata record in accordance with the eSciDoc Publication metadata profile, and may have several associated PDF files that represent the publisher version, the pre-print or some supplementary material. It is used to represent content resources which are published Articles, Conference Papers, Books etc.


Example: a digitized book is a Container of book page items and related transcription items. The content model of the book container (see image below) defines that it has bibliographic metadata based on the MODS and MAB metadata schemas. A digitized book container aggregates page items. Each page item consists of the digitized image of the book page and a metadata record. The metadata record may contain metadata inherited from the book container metadata. In addition, it may have its own, page item specific metadata such as: page number (e.g. 1, 2, 3, 4 or I, II, III, IV), chapter information.
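The inheritance of book metadata by page items can be sketched as a simple merge. This is only one possible reading of the example: which fields are inherited, the field names and the function `page_metadata` are assumptions made for illustration, not part of the eSciDoc model.

```python
from __future__ import annotations


def page_metadata(
    book_metadata: dict[str, str],
    page_specific: dict[str, str],
    inherited_fields: list[str],
) -> dict[str, str]:
    """Build a page record from inherited book fields plus page-specific ones."""
    # Copy only the fields that the (hypothetical) content model marks as inherited.
    merged = {
        name: book_metadata[name]
        for name in inherited_fields
        if name in book_metadata
    }
    # Page-specific values are added last, so they win on a name clash.
    merged.update(page_specific)
    return merged


book = {"title": "Example Book", "author": "Example Author", "shelfmark": "X 1"}
pages = [
    page_metadata(book, {"page_number": number, "chapter": "Preface"}, ["title", "author"])
    for number in ["I", "II"]
]
```

Here each page carries the book's title and author but not its shelfmark, and adds its own page number and chapter.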


[Figure: CModel Example.jpg]



Besides providing semantic information about the content and structure of a resource, a content model may additionally define services that are applicable to the resources that are instances of the content model. Examples of such services are specialized image viewers, TEI-formatted text viewers, and services that offer various transformations.

Because it is formalized, the content model definition is additionally used (as of core-service release 1.3 of eSciDoc) to validate the instantiated resources.
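What validating a resource against its content model could involve is sketched below. This is not the validation performed by the eSciDoc core services; the checks (resource type, required metadata records, permitted member models), the class and field names, and the two example models are assumptions chosen to mirror the digitized book example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentModel:
    name: str
    resource_type: str  # "item" or "container"
    required_metadata: frozenset[str]
    allowed_member_models: frozenset[str] = frozenset()


@dataclass
class Resource:
    resource_type: str
    content_model: str  # every resource claims exactly one content model
    metadata_records: dict[str, dict] = field(default_factory=dict)
    members: list[Resource] = field(default_factory=list)


def validate(resource: Resource, models: dict[str, ContentModel]) -> list[str]:
    """Return a list of problems; an empty list means the resource conforms."""
    model = models.get(resource.content_model)
    if model is None:
        return [f"unknown content model: {resource.content_model}"]
    problems = []
    if resource.resource_type != model.resource_type:
        problems.append(
            f"resource type is {resource.resource_type!r}, "
            f"content model expects {model.resource_type!r}"
        )
    for schema in sorted(model.required_metadata - set(resource.metadata_records)):
        problems.append(f"missing metadata record: {schema}")
    # Only the content models claimed by direct members are checked here;
    # the members themselves are not validated recursively.
    for member in resource.members:
        if member.content_model not in model.allowed_member_models:
            problems.append(f"member model not allowed: {member.content_model}")
    return problems


# Hypothetical models for a digitized book and its pages.
MODELS = {
    "book": ContentModel("book", "container", frozenset({"mods", "mab"}), frozenset({"page"})),
    "page": ContentModel("page", "item", frozenset({"page-md"})),
}
page = Resource("item", "page", {"page-md": {"page_number": "I"}})
book = Resource("container", "book", {"mods": {}, "mab": {}}, [page])
```

Returning a list of problems rather than raising on the first one lets a caller report everything that is wrong with a resource in a single pass.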

Data model explained

  • Item
  • Container
  • Context
  • Organizational unit