Difference between revisions of "EScience Seminar 2008/EScience-Seminar Aspects of long-term archiving"

Revision as of 06:01, 2 July 2008

Goal

Building on experience acquired in recent years (GWDG and RZG offering services for bitstream preservation, growing awareness of the need for open archive formats), strategies for long-term archiving within the Max Planck Society will be developed. Furthermore, future service offerings and suggestions for file formats and metadata will be discussed. Organisational responsibilities for the lifecycle management of data (format migration, access strategies) within the Max Planck Society will be clarified.

Responsible for content

Dagmar Ullrich (GWDG)
Wolfgang Voges (MPDL)


Contributions

(Slides are uploaded once the respective final version has been submitted.)

General approach to digital Long-Term Preservation (dLTP)

  • Introduction, current situation and work done so far at the MPG (Dagmar Ullrich (GWDG), Wolfgang Voges (MPDL))
  • Dealing with Data: Roles, Rights, Responsibilities and Relationships, Liz Lyon (UKOLN)
  • LTP of digital publications in a memory institution -- a challenge in the triangle of technology, integration and cooperation, Reinhard Altenhöner, DNB, (Slides, 3.8MB)
  • Requirements of e-Science and Grid Projects towards dLTP of Research Data, Jens Klump, GFZ Potsdam, (Slides, 0.3MB)


Technical aspects

  • Metadata for digital Long-Term Preservation, Michael Day (UKOLN)
  • Assessing file formats for dLTP, Caroline van Wijk (KB)
  • Persistent identifier for long-term archived data, Malte Dreyer (MPDL)


Organisational aspects

  • Rule-based Distributed Data Management, Reagan Moore, SDSC, (Slides, 3MB)
  • Standards and Standardization in the Context of eScience and dLTP, Peter Rödig (UniBwM)
  • Trustworthy Digital Archives, Susanne Dobratz, RZ HU Berlin (Slides, 0.5MB)
  • Overview of Sustainable Digital Preservation, Sayeed Choudhury, Blue Ribbon Task Force, (Slides, 0.3MB)
  • Calculating costs of dLTP, Neil Beagrie (Charles Beagrie Limited)


Current practices

  • The role of dLTP in the eSciDoc project, Natasa Bulatovic, MPDL, (Slides, 2.2MB)
  • Digital Long-Term Archiving at GWDG and other Archiving Systems, Dagmar Ullrich, GWDG, (Slides, 0.8MB)
  • Long-Term Archiving of Climate Model Data at WDC Climate and DKRZ, Michael Lautenschlager, MPI-M, (Slides, 2.5MB)
  • Digital Long-Term Preservation of linguistic resources at the MPI for Psycholinguistics, Paul Trilsbeek, Peter Wittenburg (MPIPL)


Future perspective for dLTP in the MPG, final discussion

  • Summary, Dagmar Ullrich (GWDG), Wolfgang Voges (MPDL)


Abstracts

LTP of digital publications in a memory institution -- a challenge in the triangle of technology, integration and cooperation (Reinhard Altenhöner, DNB)

One of the unresolved problems of the global information society is ensuring the long-term accessibility of digital documents. The challenges are especially great for institutions that aim to keep information objects available for several hundred years. Not only technological but also organisational questions have to be answered, not least the question of how long-term preservation should be integrated into the life cycle of a digital information object. The example of kopal (Co-operative Development of a Long-Term Digital Information Archive), a publicly funded and successfully realised cooperative digital archive solution, shows what one possible technological solution looks like. It also shows how the development of subsequent steps helps to understand the specific challenges for libraries and cultural heritage organisations: the underlying technology, the need for cooperation, and the integration of LTP into the life cycle of digital objects.
http://www.kopal.langzeitarchivierung.de/index.php.en


Requirements of e-Science and Grid Projects towards dLTP of Research Data (Jens Klump, GFZ Potsdam)

The enormous amounts of data from Grid projects and the complexity of data from e-science projects suggest that these new types of projects also place new requirements on the long-term archiving of data. On the other hand, Grid technology and semantic tools emerging from e-science might provide new methods that are useful in long-term digital preservation.

The study "Requirements of e-science and Grid projects towards long-term archiving of scientific and scholarly data" investigates, from both a technological and a management perspective, whether existing infrastructures in data-producing e-science and Grid research communities meet the requirements of long-term digital preservation. The study also investigates whether technologies and best practices from e-science and Grid projects can be transferred to organisations and systems in the field of long-term digital preservation.

The interviews conducted as part of this study showed considerable differences between projects in the way they approached long-term digital preservation of data. Their achievements, but also their deficits, are analysed and discussed. The recommendations given in this study are derived from this analysis and from discussion with stakeholders in e-science and Grid projects.


Overview of Sustainable Digital Preservation (Sayeed Choudhury, Blue Ribbon Task Force)

Johns Hopkins University (JHU) has initiated a series of data curation activities that focus on data and publications as new compound objects. This work has been most thoroughly explored in the context of the Virtual Observatory. While much of this work has been technical in nature, an equally important consideration is the economics of sustainability. Choudhury, who leads the JHU Libraries' data curation efforts, is also a member of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA). The BRTF-SDPA is funded by the US National Science Foundation and the Andrew W. Mellon Foundation, in partnership with the US Library of Congress, the UK Joint Information Systems Committee, the US Council on Library and Information Resources and the US National Archives and Records Administration. During the next two years, the BRTF-SDPA will explore the sustainability challenge with the goal of delivering specific recommendations that are economically viable and of use to a broad audience, from individuals to institutions and corporations to cultural heritage centers.

This Task Force will:

  • Conduct an analysis of previous and current models for sustainable digital preservation, and identify current best practices among existing collections, repositories and analogous enterprises.
  • Develop a set of economically viable recommendations to catalyze the development of reliable strategies for the preservation of digital information.
  • Provide a research agenda to organize and motivate future work in the specific area of economic sustainability of digital information.

Sayeed Choudhury will provide an update on the BRTF-SDPA's recent work, featuring a working definition of economic sustainability and highlights from the Task Force's first draft report.


Digital Long-Term Archiving at GWDG and other Archiving Systems (Dagmar Ullrich, GWDG)

This talk presents the current approach to digital long-term archiving at GWDG and shows which technical solutions are already in use for bitstream preservation. It introduces the kopal system, which derives from the kopal project "Co-operative Development of a Long-Term Digital Information Archive" and is hosted at GWDG. A short overview of other existing archiving systems is also given.


Long-Term Archiving of Climate Model Data at WDC Climate and DKRZ (Michael Lautenschlager, MPI-M)

The computing capabilities for producing Earth system model data are growing faster than the prices of mass storage media are falling. If the archive philosophy were left unchanged during migration to the next generation of compute servers, the cost of long-term archiving would rise, and the total archiving budget would tend to exceed the money left for compute services. At WDCC (World Data Center Climate) and DKRZ (German Climate Computing Centre), a new concept for long-term archiving has been developed that addresses this problem and improves overall confidence in the long-term archive. The new concept separates data storage with an expiration date at the scientific project level from the documented long-term archive. The transition to the new archive concept has already started, and at the end we expect to have a completely documented long-term archive with a searchable data catalogue. The concept is supported by a four-level storage hierarchy that reflects the lifetimes of the different data categories.