2011-02-01 SHAMAN nestor LTP WORKSHOP

MPDL

Event: "Introduction to New Approaches in Digital Preservation. Interim Results and Perspectives of the SHAMAN Project", joint SHAMAN - nestor training event, 1 February 2011, Deutsche Nationalbibliothek, Adickesallee 1, Frankfurt am Main, Germany

Participant: Vlad

Project SHAMAN - Sustaining Heritage Access through Multivalent ArchiviNg

Co-funded by EU.

Many partners from both academic and commercial domains (DNB, FernUni in Hagen, Georg-August Uni in Göttingen, Uni of Liverpool, Xerox, Philips, etc.)

SHAMAN will provide an Open Distributed Resource Management Infrastructure Framework based on GRID Resource Integration.

The challenge is to be a "conceptual and technical reference architecture offering a more complete set of features for supporting LTP than any other contemporary systems/approaches".

SHAMAN is an OAIS archive in the context of information lifecycles, with ontologies and context modelling.

Policies
Policies are formally described in the W3C Rule Interchange Format (RIF). Features noted in the RIF presentation:
 * Formal notation (XML) for all kinds of rules: Rules, Abstract Rules, Low Level Rules
 * External references
 * XML for objects
 * WSDL for services

An example of iRODS rules was shown.

Working group pasopol@googlegroups.com (Practical AspectS Of POLicies) has been established. Already in the group: SHAMAN, PoDRI, DRESNET.

Ontologies and Context Modeling
SHAMAN context model should be infrastructure-independent for associated attributes and relations between digital objects.

Based on OAIS extended with lifecycle workflows.

An old presentation, new one coming soon.

Aspects:
 * User requirements in form of UC-DOF1-310, UC-DOF1-630/ UC-DOF1-631
 * Formalization in OWL, RDF/XML form, described classes, relations, etc.
 * Modular Context Ontology with Policies, Processes, Concepts, Actors (for scientific publishing test case).
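
One way to picture such a formalization is as a set of subject-predicate-object triples, which is the data model that RDF/XML serializes. The following is only a minimal sketch in plain Python without any RDF library. All names in it (`Editor`, `PeerReview`, `RetentionPolicy`, `performs`, `governs`) are hypothetical illustrations and are not taken from the SHAMAN ontology.

```python
# Hypothetical triples; none of these names come from the SHAMAN ontology.
TRIPLES = {
    ("Editor", "is_a", "Actor"),
    ("PeerReview", "is_a", "Process"),
    ("RetentionPolicy", "is_a", "Policy"),
    ("Editor", "performs", "PeerReview"),
    ("RetentionPolicy", "governs", "PeerReview"),
}


def match(pattern, triples=TRIPLES):
    """Return the triples matching a (subject, predicate, object) pattern.

    None in the pattern acts as a wildcard, loosely like a SPARQL variable.
    """
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# Everything that relates to the PeerReview process:
print(match((None, None, "PeerReview")))
```

A real ontology would of course be queried with SPARQL over an RDF store; the wildcard pattern here only mimics the basic idea of a triple pattern.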

Ontology modeling tool Protégé, some interesting features:
 * OntoGraf Visualization
 * SPARQL Queries II

Data Grid Technology
Requirements for project:
 * Large amount of data (e.g. 333 GB/day, 10 TB/month)
 * Heterogeneous data
 * Unlimited time of storage
 * Unlimited space of storage
 * Fast ingest
 * High access performance
 * Security/AA/Integrity
 * No admin overhead
 * Cost-effective
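
As a quick sanity check on the volumes quoted in the list above, the daily and monthly figures agree with each other and imply a fairly modest sustained ingest rate. The sketch below assumes decimal units (1 TB = 1000 GB) and a 30-day month; both are assumptions, not something stated at the workshop.

```python
# Assumptions (not from the workshop): decimal units and a 30-day month.
GB_PER_DAY = 333
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 24 * 60 * 60

tb_per_month = GB_PER_DAY * DAYS_PER_MONTH / 1000      # GB -> TB
mb_per_second = GB_PER_DAY * 1000 / SECONDS_PER_DAY    # GB -> MB, averaged per second

print(f"{tb_per_month:.2f} TB/month")         # 9.99 TB/month
print(f"{mb_per_second:.2f} MB/s sustained")  # 3.85 MB/s sustained
```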

Clouds were rejected as a concept for data storage due to security issues.

GRID Systems investigated by SHAMAN
 * SRB
 * D-GRID
 * Cloud
 * iRODS (chosen)

iRODS: Virtualized collections through
 * logical namespace (NS) for files (avoids the overhead that a huge number of files (inodes) causes in Windows/Linux file systems)
 * association of NS with file track
 * State information
 * Provenance
 * Descriptive information
 * management of policies
 * audit trails
 * server-side workflows
 * AA of every action
 * time-dependent control
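
The idea of a virtualized collection can be sketched as a catalog that maps logical paths to physical replicas and keeps descriptive metadata and an audit trail alongside. This is a toy model in plain Python and not the iRODS API; the class names, user name, paths and resource names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    replicas: list = field(default_factory=list)   # (resource, physical_path) pairs
    metadata: dict = field(default_factory=dict)   # descriptive information


class Catalog:
    """Toy logical namespace: logical path -> replicas + metadata."""

    def __init__(self):
        self.entries = {}
        self.audit = []  # audit trail of (user, action, logical_path)

    def register(self, user, logical_path, resource, physical_path, **metadata):
        entry = self.entries.setdefault(logical_path, Entry())
        entry.replicas.append((resource, physical_path))
        entry.metadata.update(metadata)
        self.audit.append((user, "register", logical_path))

    def resolve(self, user, logical_path):
        # Every access is recorded, mirroring the "AA of every action" idea.
        self.audit.append((user, "resolve", logical_path))
        return list(self.entries[logical_path].replicas)


catalog = Catalog()
catalog.register("alice", "/archive/report.pdf", "disk1", "/data/0001", format="PDF")
catalog.register("alice", "/archive/report.pdf", "tape1", "/vol7/0001")
print(catalog.resolve("alice", "/archive/report.pdf"))
```

The point of the indirection is that users only ever see `/archive/report.pdf`, while the physical copies can be moved or replicated without changing that name.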

Organizations in GRID:
 * Xerox
 * Philips
 * GLOBIT
 * InConTec
 * FernUni Hagen

Xeproc© to model document processing pipelines
Developed by Xerox Research Centre Europe, open source under the EPL

Features:
 * Documents flow from one processing step to the next
 * Processing steps enriched
 * Model Driven Architecture (MDA) compliant
 * Eclipse plugin Xeproc Designer
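
The notion of a document flowing from one processing step to the next and being enriched along the way can be sketched generically. This is plain Python and not Xeproc itself; the step names (`identify`, `add_checksum`) and the dictionary layout are hypothetical.

```python
import hashlib
from functools import reduce


def identify(doc):
    # Enrich the document with a (hypothetical) format label based on its name.
    fmt = "PDF" if doc["name"].endswith(".pdf") else "unknown"
    return {**doc, "format": fmt}


def add_checksum(doc):
    # Enrich the document with a SHA-256 checksum of its content.
    return {**doc, "sha256": hashlib.sha256(doc["content"]).hexdigest()}


def run_pipeline(doc, steps):
    """Pass the document through each step in order; each step returns an enriched copy."""
    return reduce(lambda current, step: step(current), steps, doc)


doc = {"name": "paper.pdf", "content": b"%PDF-1.4 ..."}
result = run_pipeline(doc, [identify, add_checksum])
print(sorted(result))  # ['content', 'format', 'name', 'sha256']
```

Each step returns a new dictionary rather than modifying its input, so the original document stays untouched and intermediate states can be inspected.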

SHAMAN information lifecycle (page 9)

 * Pre-Ingest process
 * Creation: Objects
 * Assembly (Objects, Descriptive Information, Context Information, Preservation Information): Xerox Studio, Kolibri
 * SIP - Ingest process
 * Dearchiving to AIP
 * Archival: Cheshire, iRODS, KOPAL Gateway
 * Access (DIP)
 * Post-Access
 * Adoption (Objects, Descriptive Information, Context Information, Preservation Information): User Interfaces for Access, Multivalent Browser
 * Reuse (Objects, Descriptive Information, Context Information, Preservation Information)

SHAMAN demonstrator
is here.

Fab4 Multivalent Browser
"The Multivalent preservation architecture preserves the ability to manipulate the original encoding format of a digital entity"

Multivalent services can automate required processes:
 * Format identification
 * Validation
 * Transformation (e.g. correct invalid files)
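
Format identification and a first validation pass are commonly based on file signatures. The sketch below checks leading bytes against the well-known signatures for PDF (`%PDF-`), JPEG (`FF D8 FF`) and Ogg (`OggS`). The function names are hypothetical, the PDF check is a crude heuristic, and nothing here is meant to describe how Fab4 or Multivalent actually work.

```python
# Well-known file signatures (leading bytes) for three of the formats mentioned.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"OggS", "Ogg"),
]


def identify_format(data: bytes) -> str:
    """Return a format label based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def looks_like_complete_pdf(data: bytes) -> bool:
    """Crude check: PDF header at the start and an %%EOF marker near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


sample = b"%PDF-1.4\n...truncated"
print(identify_format(sample), looks_like_complete_pdf(sample))  # PDF False
```

A file can thus be identified as PDF and still fail even this superficial check, which is the kind of case the transformation step is meant to repair.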

Fab4 can process HTML, PDF, DVI, SVG, JPEG, PPT, Ogg Theora + Vorbis

Invalid files are a big challenge (47% of PDF files do not conform to the specification)

Related Links

 * All presentations
 * [[Media:SHAMAN_ISP1_Training_event_final2.pdf|Agenda]]
 * [[Media:SHAMANnestor_training2011_bios2.pdf|Bio of the speakers]]
 * [[Media:SHAMANnestor_training2011_readinglist.pdf|Reading List]]