eSciDoc Logical Data Model

Understanding the structure and nature of the data is essential for managing various types of content within an eSciDoc repository. The eSciDoc logical data model was therefore developed, on the one hand, to enable the implementation of core data services based on high-level abstractions and, on the other hand, to allow for further specialization of data and the implementation of specialized services. This page may be used as a starting point for understanding the eSciDoc data structures. eSciDoc manages two general categories of data:

  • Resources (content resources) - the content of the repository, such as articles, books, images, image albums, scanned manuscripts, pages etc.
  • MasterData - additional classes of data used for the management of Resources, such as organizational units and contexts.

A simple delineation between these two data categories may be stated as follows: resources are the real content, which can be further extended, shared and preserved. Master data are used for content (i.e. resource) administration and serve as important referenced entities. Master data can also be referenced by objects outside of the core eSciDoc repository.


Content Resources: Item and Container

Content resources are defined by two generic object patterns: Item and Container.

  • An Item resource consists of metadata records (e.g. eSciDoc publication metadata, SISIS MAB record, MODS record, Dublin Core record) and optionally of components that represent the actual content (e.g. PDF file, JPEG file, XML file). See also SurrogateItem.
  • A Container resource is an aggregation of other resources, i.e. of Items or other Containers. Like an Item, a Container can be described by multiple metadata records.


Each resource (Item or Container) is maintained in a single administrative Context. Contexts are the responsibility of organizations (e.g. one or more project groups, institutions etc.). The organizations responsible for a context define its settings (by means of the so-called Admin Descriptor) in accordance with their needs, expressing rules for content creation, update, quality assurance of the metadata, dissemination, preservation, authorization policies, submission policies, etc.
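
The relationships described above can be sketched in a few lines of Python. This is only an illustration of the logical model, not the eSciDoc API or its XML representation: all class names, field names and the admin descriptor key are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Context:
    """Administrative context (hypothetical fields)."""
    name: str
    responsible_org_units: List[str]
    # Stands in for the Admin Descriptor; the keys are invented for illustration.
    admin_descriptor: Dict[str, str] = field(default_factory=dict)


@dataclass
class Component:
    """The actual content of an Item, e.g. a PDF or JPEG file."""
    file_name: str
    mime_type: str


@dataclass
class Item:
    context: Context                      # exactly one Context per resource
    metadata_records: Dict[str, dict]     # several records, keyed by name
    components: List[Component] = field(default_factory=list)  # optional


@dataclass
class Container:
    context: Context
    metadata_records: Dict[str, dict]
    # Members may be Items or other Containers.
    members: List[Union["Item", "Container"]] = field(default_factory=list)


def count_items(resource: Union[Item, Container]) -> int:
    """Count the Items reachable from a resource, descending into Containers."""
    if isinstance(resource, Item):
        return 1
    return sum(count_items(member) for member in resource.members)


ctx = Context("Publications", ["ou:1"], {"submission_policy": "moderated"})
article = Item(ctx, {"dc": {"title": "An article"}},
               [Component("article.pdf", "application/pdf")])
page = Item(ctx, {"dc": {"title": "Page 1"}})
album = Container(ctx, {"dc": {"title": "Album"}}, [page])
collection = Container(ctx, {"dc": {"title": "Collection"}}, [article, album])
print(count_items(collection))  # prints 2
```

Modelling members as a list that accepts both types reflects the statement that a Container aggregates Items or other Containers; an Item without components (here `page`) is valid because components are optional.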

Content models

Items and Containers are very generic resources. By themselves they say nothing about the content they represent or about their own structure, e.g. what kind of metadata may be associated with them, what kind of members a container aggregates, or what kind of resources they represent semantically. Therefore, the eSciDoc logical data model introduces the concept of a Content model. Each Item or Container has to declare that it is an instance of exactly one content model. There is no limit on the number of instances that may declare a specific content model.

In general, a content model defines:

  • the type and structure of the content resources (item, container, members)
  • a set of services that may be associated with the content resources

Because it is formalized, the content model definition is additionally used (as of core-service release 1.3 of eSciDoc) to validate the instantiated resources.
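
To make the idea of validation against a content model more concrete, here is a minimal Python sketch. It assumes that a content model lists the permitted metadata record names and, for containers, the content models permitted for members. This is a simplification made for the example: the actual eSciDoc content model format and validation rules are not described on this page, and all names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ContentModel:
    model_id: str
    resource_type: str                                # "item" or "container"
    allowed_metadata: FrozenSet[str] = frozenset()
    allowed_member_models: FrozenSet[str] = frozenset()  # containers only


@dataclass
class Resource:
    resource_type: str
    content_model: ContentModel                       # exactly one per resource
    metadata_records: Dict[str, dict] = field(default_factory=dict)
    members: List["Resource"] = field(default_factory=list)


def validate(resource: Resource) -> List[str]:
    """Return a list of problems; an empty list means the resource is valid."""
    model = resource.content_model
    problems = []
    if resource.resource_type != model.resource_type:
        problems.append(
            f"{model.model_id} describes a {model.resource_type}, "
            f"not a {resource.resource_type}")
    for name in sorted(resource.metadata_records):
        if name not in model.allowed_metadata:
            problems.append(
                f"metadata record '{name}' is not allowed by {model.model_id}")
    for member in resource.members:
        member_model = member.content_model.model_id
        if member_model not in model.allowed_member_models:
            problems.append(
                f"member of model {member_model} is not allowed by {model.model_id}")
    return problems


publication = ContentModel("cm:publication", "item", frozenset({"escidoc", "dc"}))
album_model = ContentModel("cm:album", "container",
                           frozenset({"dc"}), frozenset({"cm:image"}))

article = Resource("item", publication, {"dc": {"title": "An article"}})
album = Resource("container", album_model, {"dc": {"title": "Album"}}, [article])

print(validate(article))  # prints []
print(validate(album))    # reports one problem: cm:publication members are not allowed
```

Returning a list of problems rather than raising on the first one is a design choice of this sketch; it lets a caller report every violation of the content model at once.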

Organizational units

Organizational units represent institutions, their internal organization (institutes, departments, projects etc.) and their structural changes over time (Organizational units history). Like the content resources in the system, each organizational unit may be described by a metadata record containing the most important information about the organization (in accordance with the needs of the users of the system). An organizational unit may be the parent of many other organizational units. Conversely, an organizational unit may have several parent organizational units at the same time. Because they belong to the master data of the system, organizational units are referenced in the Contexts (to point to institutional responsibility), may be referenced in the metadata of the resources, may be used to define authorization rules for access to the content (institutional visibility), etc.
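
Because a unit may have several parents, organizational units form a directed graph rather than a simple tree. The following Python sketch shows one way to collect all ancestors of a unit, which a check such as institutional visibility would presumably need. The class, its fields and the identifiers are assumptions for illustration, and the history of units is left out.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class OrgUnit:
    unit_id: str
    name: str
    parents: List["OrgUnit"] = field(default_factory=list)  # zero, one or many


def ancestor_ids(unit: OrgUnit) -> Set[str]:
    """Collect the ids of all direct and indirect parents of a unit."""
    seen: Set[str] = set()
    stack = list(unit.parents)
    while stack:
        parent = stack.pop()
        if parent.unit_id not in seen:   # shared ancestors are visited only once
            seen.add(parent.unit_id)
            stack.extend(parent.parents)
    return seen


society = OrgUnit("ou:1", "Society")
institute_a = OrgUnit("ou:2", "Institute A", [society])
institute_b = OrgUnit("ou:3", "Institute B", [society])
joint_project = OrgUnit("ou:4", "Joint project", [institute_a, institute_b])

print(sorted(ancestor_ids(joint_project)))  # prints ['ou:1', 'ou:2', 'ou:3']
```

The `seen` set ensures that an ancestor reachable through several parents, such as `society` here, is processed only once.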