EScience Seminar 2009/EScience-Seminar Repository Systems

Revision as of 11:34, 21 April 2009

Introduction

The second MPG eScience seminar of 2009 is devoted to one of the big challenges for research institutes: how to guarantee the persistence of their records and continuous access to them for all interested and authorized researchers. Each institute therefore needs a strategy for managing the increasing amount and complexity of data, for guaranteeing online access to it and for replicating it for preservation purposes. The term "digital repositories" seems to properly describe the layer of functionality that we want. JISC defines the term in the following words [1]: "Repositories are important for universities and colleges in helping to capture, manage, and share institutional assets as a part of their information strategy. A digital repository can hold a wide range of materials for a variety of purposes and users. It can support learning, research and administrative processes."

The concept "Digital Repository" is of course not new, although its meaning has changed rapidly due to new requirements caused mainly by the increasing amount and complexity of data, the Internet and the awareness of long-term accessibility. Within the MPG it was Friedrich Hertweck [2] from RZG (Computer Centre in Garching) who created AMOS (Advanced Multi user Operating System) in the 1970s. AMOS [3] was an excellent piece of software that allowed scientists in Plasma Physics (amongst others) to store and retrieve their large data volumes, and it turned out to be an island of stability for many years. The natural sciences now need to maintain repositories that cover petabytes of data, which in general is highly structured. In the humanities, too, we are now close to maintaining hundreds of terabytes, where the problem is often not the sheer amount but the inherent complexity of the data sets.

For a few years now, the concept of "digital repositories" has been discussed in a number of different contexts. A recent study from the DRIVER project [4], covering 114 digital repositories, revealed that most institutes associate this term with repositories for publications. More than 80% of these repositories contain journal articles and other types of publications; only about 10% also store primary data sets and common data types such as audio and video recordings. This is fully in line with the experiences in the European CLARIN project [5], which wants to build a network of centers functioning as the backbone for offering persistent access to language resources and services. About 80% of the potential centers are busy restructuring their repositories to fulfill the new requirements. From these two results we can conclude

  • that researchers are used to storing ePublications in proper repositories and associating them with proper metadata, for example to support discovery, and
  • that researchers are not used to storing their research data in such a way that other researchers can easily access it.

It seems that in general researchers still use idiosyncratic methods to store their data, that they tend to structure it by minimalistic solutions, such as file names and directory structures, and that long-term accessibility was/is not an issue of primary concern. For an increasing number of researchers, in particular when they are participating in international data driven collaborations (e.g. genomics or climate), it becomes increasingly obvious, however, that they need to change their behavior.

This eScience seminar will therefore focus on repository solutions for research data, be it primary data generated by some types of sensors or secondary data that is generated by researchers to allow interpretations. The field of primary and in particular secondary data is characterized by an extreme heterogeneity of data types, formats, and implicit or explicit semantics, making it a difficult field for abstractions. This is different from ePublications where the data types and formats are widely standardized, where metadata characterization has a long history and where the semantics of the content can be interpreted by the reading researcher.


Views on Digital Repositories

Much has been written about digital repositories in recent years. We would like to cite three initiatives without claiming to be comprehensive (see below). Other important initiatives, such as FEDORA [6] or OAI-ORE [7], have thought about repositories and layers of abstraction as well.

DELOS Digital Library

The DELOS Digital Library project [8] presented a careful analysis of a number of aspects of "Digital Libraries" which can be carried over to digital repositories. They present a summary of the main points of their manifesto [9], which we simply include here. From this manifesto they derive an abstract reference model. There is no clear separation between Digital Libraries and Digital Repositories, but it can be stated that a proper Digital Library model will include a proper Digital Repository as its core.

The Digital Library Manifesto in Brief
It is commonly understood that the Digital Library universe is a complex and multifaceted domain that cannot be captured by a single definition. The Manifesto [10] organizes the pieces constituting the puzzle into a single framework (see the figure at http://www.dlib.org/dlib/march07/castelli/castelli-fig1.jpg).

In particular, it identifies the three different types of systems operating in the Digital Library universe, i.e.

  1. the Digital Library (DL) – the final ‘system’ actually perceived by the end-users as being the digital library;
  2. the Digital Library System (DLS) – the deployed and running software system that implements the DL facilities; and
  3. the Digital Library Management System (DLMS) – the generic software system that supports the production and administration of DLSs and the integration of additional software offering more refined, specialized or advanced facilities.

The Manifesto also organizes the Digital Library universe into domains.

The Resource Domain captures generic characteristics that are common to the other specialized domains. Building on this, the model introduces six orthogonal and complementary domains that together strongly characterize the Digital Library universe and capture its specificities with respect to generic information systems. These specialized domains are:

  1. Content – represents the information made available;
  2. User – represents the actors interacting with the system;
  3. Functionality – represents the facilities supported;
  4. Policy – represents the rules and conditions, including digital rights, governing the operation;
  5. Quality – represents the aspects needed to consider digital library systems from a quality point of view;
  6. Architecture – represents the physical software (and hardware) constituents concretely realizing the whole.

Another contribution of the Manifesto is recognizing the existence of various players acting in the DL universe and cooperating in the operation of the whole. In particular,

  • The DL End-Users are the ultimate clients the Digital Library is going to serve.
  • The DL Designers are the organizers and orchestrators of the Digital Library from the application point of view.
  • The DL System Administrators are the organizers and orchestrators from the physical point of view.
  • The DL Application Developers are the implementers of the software parts needed to realize the Digital Library.

Further, it states that there is the need for modeling focused views. The ultimate goal of the whole reference model activity is to clarify the Digital Library universe to the different actors by tailoring the representation to their specific needs. The three systems organize the universe in concentric layers that are revealed to interested players only. Meanwhile, the six domains constitute the complementary perspectives from which interested players are allowed to see each layer. Thus, the framework is potentially complex because it aims at accommodating all the various needs. However, it is highly modular and can therefore be easily adapted to capture the needs arising in specific application contexts.
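
As a rough illustration of how the three systems, the six domains and the players fit together, the framework can be sketched as a small data model. This is only a sketch under stated assumptions: the names System, Domain, Player, VISIBLE_LAYERS and focused_view are hypothetical, and the mapping of players to visible layers is invented for illustration, since the Manifesto summary above does not say which layers are revealed to which player.

```python
from enum import Enum


class System(Enum):
    """The three system types named in the Manifesto summary."""
    DL = "Digital Library"
    DLS = "Digital Library System"
    DLMS = "Digital Library Management System"


class Domain(Enum):
    """The six specialized domains built on the Resource Domain."""
    CONTENT = "Content"
    USER = "User"
    FUNCTIONALITY = "Functionality"
    POLICY = "Policy"
    QUALITY = "Quality"
    ARCHITECTURE = "Architecture"


class Player(Enum):
    """The four players listed in the Manifesto summary."""
    END_USER = "DL End-User"
    DESIGNER = "DL Designer"
    SYSTEM_ADMINISTRATOR = "DL System Administrator"
    APPLICATION_DEVELOPER = "DL Application Developer"


# ASSUMPTION: this mapping is purely illustrative. The text only says that
# layers are "revealed to interested players only", not which ones.
VISIBLE_LAYERS = {
    Player.END_USER: [System.DL],
    Player.DESIGNER: [System.DL, System.DLS],
    Player.SYSTEM_ADMINISTRATOR: [System.DL, System.DLS, System.DLMS],
    Player.APPLICATION_DEVELOPER: [System.DL, System.DLS, System.DLMS],
}


def focused_view(player):
    """Return (layer, domain) pairs: each visible layer seen from each domain."""
    return [(layer, domain)
            for layer in VISIBLE_LAYERS[player]
            for domain in Domain]
```

Calling focused_view(Player.END_USER) yields one pair per domain for the single DL layer. The point of the sketch is the modularity claimed above: layers and domains are independent axes, so a view for a specific application context is just a selection along both.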

Finally, the Manifesto gives reason for proceeding with different levels of abstraction while laying down the complete framework. These different levels of abstraction, which lead conceptually from the modeling to the implementation, are captured in this figure (http://www.dlib.org/dlib/march07/castelli/castelli-fig5.jpg), where the core role of the Reference Model is illustrated; all the other elements constituting the envisaged DL development methodology chain start from here. It drives the definition of any Reference Architecture that proposes an optimal architectural pattern for a specific class of ...

Footnote to the list of players: It is still under discussion whether two other players should be added to this list, namely Institutions and Industries. By Institutions are meant organizations, either concrete or virtual, having the important role of forming the Digital Library. By Industries are meant the institutions performing economic activities concerned with the Digital Library, by providing either the software or the service.



Venue

RZG Garching


Date

25/26 June 2009


Responsible for content

Malte Dreyer, Andreas Gros (MPDL), Stefan Heinzel (RZG), Peter Wittenburg, Daan Broeder (MPI for Psycholinguistics), Frank Toussaint, Michael Lautenschlager (MPI for Meteorology)

Speakers

Registration

Registration is open at: http://escience.mpg.de/registration_en.html


References