User:Bourke/escidocdays2011

=Wednesday 26.10.11=

Keynote: JISC Managing Research Data programme (Simon Hodson)
Some programme links:

Managing Research Data Trusted Cloud Infrastructure

challenges

 * data deluge
 * but also opportunities, e.g. data reuse

drivers of the JISC MRD programme

 * CORDIS "Riding the Wave" report
 * RCUK principles on data policy
 * UK funding agencies (NERC, ESRC) require a "data value checklist", plus 10 years of mandatory storage
 * Similar to the German guidelines "Principles for the Handling of Research Data"

results / lessons learnt
Source: JISC Data Centres Report (link not caught)


 * needs
   * closer cooperation
   * better training of staff
   * improved practice, i.e. before the data is captured/archived ("upstream")


 * outputs and results of the MRD programme
 * support data lifecycle
 * cycle: plan, create (store, annotate), use, appraise (discard, select), publish (identify, describe), discover (access), reuse
 * reused some work of the National Library of Australia ("data verbs").
 * leadership and policy development
 * develop benefits for all stakeholders: PhD student, individual researcher, research team (e.g. do we still have the data when a team member leaves?), university, supra-university
 * need for a bottom-up approach; lessons from the Incremental project of Cambridge/Glasgow universities
 * five training projects on specific areas (keyword for later search: MANTRA)
 * DCC how-to guides on data management
 * MaDAM project helps guide project planning for funding applications (see drivers above; requirement for a DMP, Data Management Plan).
 * social science research data DMP project by the UK Data Archive
 * Data Management Recommendations from same source
 * data management costing tool
 * ERIM: engineering research, specifically looking at engineering mapping datasets (developed cooperatively).
 * sample project in biology: FISHNet (online), like AWOB for freshwater biologists
 * MiSS (transitional project from MaDAM, see above)
 * citation mechanisms: DataCite initiative with the British Library (aligns with DOI)
 * Dryad repository: aligns with Open Access journal initiatives, for holding biochemical data. Takes a portion of the Gold OA fee for long-term archival. Dryad estimates the cost of archiving at $25-75 per article.
 * incentivise long-term archival: research showing that publicly available research data increases citation rate (very important for the sciences, of course). There is a useful blog by a researcher in this area.

eSciDoc Overview
Principles

virtual research environments
(driver: if the researcher is working in a managed environment, the metadata can be captured easily, early and at source).

an eSciDoc application

 * Used as is:
   * login
   * import format
   * searches
   * OAI import
 * Wrapped and extended:
   * authorization
   * item format
 * Items not used at all:
   * Containers
   * Table of contents

Publication Management

 * Import / Display / Export
   * import via 3rd party (...?)
   * SWORD import
   * Manual export via basket: export the basket via email or download
   * Export interface: define a query and export automatically. Useful for repeatable scripts
 * Metadata handling
   * Master data management via CoNE
   * Intelligent cut-and-paste (e.g. for adding multiple authors)
   * Validation of metadata; flexibility via contexts
 * Discovery
   * Browsing via topics
   * Search via CQL
   * Google Sitemaps allow crawlers to index
   * OAI-PMH allows harvesting by other repositories
   * RSS feeds to announce new publications
 * Long-term archiving (all from eSciDoc)
   * PIDs
   * Versioning
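
The "search via CQL" and "define a query and automatically export" points could be scripted roughly as below. This is only a sketch: the endpoint URL and the index names (`dc.title`, `dc.creator`) are placeholders not taken from the talk, and the request parameters are the generic SRU `searchRetrieve` ones, which may differ from what a given installation expects.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the notes do not record the real search URL.
SEARCH_BASE = "https://repository.example.org/srw/search"


def cql_all(terms):
    """Build a CQL query that ANDs index="value" clauses together."""
    clauses = []
    for index, value in terms.items():
        # Escape backslashes and double quotes inside the quoted term.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{index}="{escaped}"')
    return " and ".join(clauses)


def search_url(query, maximum_records=10):
    """Return an SRU searchRetrieve URL carrying the CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": maximum_records,
    }
    return f"{SEARCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    query = cql_all({"dc.title": "research data", "dc.creator": "Klump"})
    print(search_url(query))
```

Because the query is an ordinary string, the same script can be rerun unchanged (e.g. from cron) to produce a repeatable export.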

Max Planck Digital Archive

 * User Support
 * Training
 * Maintenance
 * Branding
 * Import / Migration
 * Integration (1. into MPI homepages, 2. into local MPI databases, 3. into authentication infrastructure)
 * Feature Requests (e.g. automatic creation of the MPG Yearbook)

Open Source Community

 * Installation Support
 * Developer Support
 * Feature Requests
 * Installer Development

Technology

 * Java
 * JBoss / Tomcat
 * XSLT Transformation

Future Challenges

 * Clear distinction between the open-source and MPG versions of the software
 * Efficient support: toolboxes, e.g. for homepage integration
 * Service spin-offs, e.g. ConeService (done), TransformationService (not quite there), ValidationService (definitely not there)
 * Enhanced configuration

Imeji

 * Bastien Saquet (MPDL)
 * Andreas Vollmer (Computer and Media Service, HU Berlin)
 * Karsten Asshauer & Jörg Busse - HU Berlin, Institute of Art and Visual History
 * Julian Röder & Hai Nguyen - FU Berlin, Institute of Computer Science, Konrad Zuse Internet Archive


 * CMS handles infrastructure questions, policies, standards
 * IAVH: legacy system is imago_mediathek (MS Access, 50,000 images, each with 50 metadata fields)
 * Strategy: migrate to a web-based environment (export to XML, transform, import into eSciDoc/Imeji)
 * Metadata issues. Need custom thesaurus.
 * Konrad Zuse Archive: both a modern multimedia approach and publishing Zuse's legacy in an open-access manner
 * future plans
   * digilib integration
   * annotation
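
The "export to XML, transform, import" strategy might look roughly like the sketch below. Everything specific in it is an assumption: the legacy field names, the target element names and the sample record are invented for illustration, and a plain Python mapping stands in for whatever transformation (e.g. XSLT) the project actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from legacy field names to target element names.
FIELD_MAP = {"Titel": "title", "Kuenstler": "creator", "Datierung": "date"}

# Made-up sample of what an XML export from the legacy database could look like.
LEGACY_EXPORT = """<export>
  <record id="1">
    <field name="Titel">Selbstbildnis</field>
    <field name="Kuenstler">Unbekannt</field>
    <field name="Inventar">A-17</field>
  </record>
</export>"""


def transform(legacy_xml):
    """Map legacy records to target items.

    Returns (target_root, unmapped), where unmapped is a sorted list of
    legacy field names that have no entry in FIELD_MAP.
    """
    source = ET.fromstring(legacy_xml)
    target = ET.Element("items")
    unmapped = set()
    for record in source.iter("record"):
        item = ET.SubElement(target, "item", {"legacy-id": record.get("id", "")})
        for field in record.iter("field"):
            name = field.get("name")
            if name in FIELD_MAP:
                ET.SubElement(item, FIELD_MAP[name]).text = (field.text or "").strip()
            else:
                unmapped.add(name)
    return target, sorted(unmapped)


if __name__ == "__main__":
    items, unmapped = transform(LEGACY_EXPORT)
    print(ET.tostring(items, encoding="unicode"))
    print("fields without a mapping:", unmapped)
```

Reporting the unmapped fields is deliberate: with around 50 metadata fields per image, the list shows which ones still need a mapping or a thesaurus entry before import.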

eKinematix (G. Lonij, RWTH Aachen IGM)

 * virtual research environment for mechanical engineering
 * integrated information structure
 * mechanism technology
 * Documentation
 * Collaboration
 * Publication

Targets

 * make gathering, organising, enhancing and linking information easier.
 * support of research organisation / cooperation
 * Eliminate "reinventing the wheel"
 * developer re-engineering and web services
 * To be achieved via modular development and collaboration tools
 * expand our existing repository

Implementation

 * XML / XSLT-based.
 * Partners with FIZ for hosting and operations (eSciDoc)
 * TU Ilmenau: development of design theory and methodology. Supplies libraries for modelling etc.
 * IGM RWTH Aachen: supplies the mechanical / robotics / mechatronic expertise and specialised software (gecko geospatial tool, easier to use than a full CAD tool).


 * Basic module (eSciDoc)
 * why eSciDoc? Open source, concentration on service-oriented architecture.

Amalia

 * Digital Humanities
 * Tool use in the humanities: an epistemological shift

ENS Lyon, the digital humanities workshop

 * inter-project communication
 * common training
 * a cyber-infrastructure

eSciDoc Japan

 * Masuo
 * very interesting statistics

DigiLifecycle

 * 2 year project
 * 5 MPI participants, 2 associated

goals

 * tools
 * usage guidelines
 * create an expert group
 * Predecessor VIRR

Lifecycle

 * Scanning
 * OCR conversions
 * produces a TEI file
 * bibliographic record (OPAC)
 * ingest + viewing environment (DLC tool)
 * edit stage, add information (DLC tool)
 * virtual research: annotate, reference (DLC tool)
 * planning and preparation
 * Triggers new projects

challenges

 * generic online tools lead to isolated solutions
 * Long-term archiving
 * Full text integration (TEI as a semi-standard)
 * technical complexity (e.g. page breaks need conventions)
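
As an illustration of why page breaks need conventions, a batch ingest could check each TEI file against an agreed rule before accepting it. The rule used here (every `<pb>` carries a unique `n` and a `facs` pointing at the scan) is an assumption made for this sketch, not a convention stated in the talk, and the sample document is invented.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Made-up sample; the third page break deliberately lacks its page number.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <pb n="1" facs="page-0001.jpg"/>
    <p>First page.</p>
    <pb n="2" facs="page-0002.jpg"/>
    <p>Second page.</p>
    <pb facs="page-0003.jpg"/>
  </body></text>
</TEI>"""


def check_page_breaks(tei_xml):
    """Return a list of problems found in the <pb> elements of a TEI file."""
    root = ET.fromstring(tei_xml)
    problems = []
    seen = set()
    for position, pb in enumerate(root.iterfind(".//tei:pb", TEI_NS), start=1):
        n = pb.get("n")
        if n is None:
            problems.append(f"pb #{position}: missing n attribute")
        elif n in seen:
            problems.append(f"pb #{position}: duplicate n={n}")
        else:
            seen.add(n)
        if pb.get("facs") is None:
            problems.append(f"pb #{position}: missing facs attribute")
    return problems


if __name__ == "__main__":
    for problem in check_page_breaks(SAMPLE):
        print(problem)
```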

highlights

 * batch ingest triggered by institutes
 * variety of import and export formats
 * generic online editor for structural metadata
 * annotation and citation mechanism in both text and image parts
 * ability to link and cite texts, images, collections within MPG and beyond

PanMetaDocs

 * Jens Klump (geosciences)
 * some "big data" projects, many "small data" projects. 25 new projects per year.
 * big data projects are easy, they have to budget for data management anyway
 * the problem was with the "small data" projects. Funding requirement to maintain the data, but little money
 * solution: common data structure --> escidoc
 * escidoc as "high rack" storage, which is agnostic to metadata and contents
 * PanMetaDocs (PMD) as forklift
 * written in PHP, based on panMetaWorks
 * Per project, one PMD instance to control access and metadata contexts
 * Syndication via RSS and OAI-PMH also allows the creation of portals for a distributed project
 * data is not held in the application, but in the infrastructure
 * source code available on SourceForge
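
A portal for a distributed project could be assembled from the OAI-PMH feeds of several PMD instances along these lines. This is a sketch under assumptions: the base URL and identifiers are invented, the sample responses are built locally instead of being fetched, and only the standard OAI-PMH `ListRecords` request and response header elements are used.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for one instance."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_headers(response_xml):
    """Extract (identifier, datestamp) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    headers = []
    for header in root.iterfind(".//oai:record/oai:header", OAI):
        identifier = header.findtext("oai:identifier", namespaces=OAI)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI)
        headers.append((identifier, datestamp))
    return headers


def merge_for_portal(responses):
    """Merge several responses, de-duplicated by identifier, newest first."""
    merged = {}
    for xml_text in responses:
        for identifier, datestamp in parse_headers(xml_text):
            merged[identifier] = datestamp
    return sorted(merged.items(), key=lambda pair: pair[1], reverse=True)


def sample_response(records):
    """Build a minimal, made-up ListRecords response for demonstration."""
    rows = "".join(
        f"<record><header><identifier>{i}</identifier>"
        f"<datestamp>{d}</datestamp></header></record>"
        for i, d in records
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        f"<ListRecords>{rows}</ListRecords></OAI-PMH>"
    )


if __name__ == "__main__":
    print(list_records_url("https://pmd.example.org/oai"))
    a = sample_response([("oai:a.example.org:1", "2011-10-01")])
    b = sample_response([("oai:b.example.org:7", "2011-10-20")])
    for identifier, datestamp in merge_for_portal([a, b]):
        print(datestamp, identifier)
```

The portal only holds the merged index; the data itself stays in the infrastructure, which matches the "not held in the application" point above.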

eSciDoc Browser/Admin Tool

 * OUs
 * Contexts (but not pubman contexts)
 * User and Roles (but not groups)
 * Roles and Scopes

=Thursday 27.10.11=

Developer track: Humboldt room. Breakout session: Mozart room.

infrastructure since eSciDoc Days 2011
1.3, Java infrastructure connector; 1.4: SOAP removed

apis and libraries

 * REST interface
 * infrastructure Java connector (IJC)
 * Maven
 * version 1.3 compatible with 1.4
 * escidoc-ijc version 1.4: SOAP will be removed
 * next versions
 * minor object changes (more abstraction)
 * PHP connector

documentation

 * "fairly scattered"
 * API fully documented
 * many Java examples
 * XML representations
 * Java
 * connector
 * REST API calls

infrastructure

 * finely grained authorization system
 * a collection of services for applications
 * not suitable for non-technical users without an additional GUI application
 * not a relational DB.

default installation

 * java installer
 * installation
 * upgrade
 * limited options for configuration and local optimisations

resources

 * XML namespace, href, last-modified-date
 * metadata
 * object-specific content
 * object-specific references to other elements (eg filters)
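
To make the common parts of a resource concrete, the sketch below reads the namespace, href and modification date from a sample XML representation. The sample document is invented: the namespace URI, the element names and the exact attribute spelling (`last-modified-date`, as written in these notes) are assumptions and should be checked against the actual schema documentation.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Made-up resource representation; namespace and attribute names are assumed.
SAMPLE = (
    '<item xmlns="http://example.org/schema/item" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:href="/ir/item/example:1" '
    'last-modified-date="2011-10-27T09:30:00Z">'
    '<md-records><md-record name="default"/></md-records>'
    "</item>"
)


def describe(resource_xml):
    """Return the common properties found on the root element of a resource."""
    root = ET.fromstring(resource_xml)
    # ElementTree writes qualified tags as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = None, root.tag
    stamp = root.get("last-modified-date")
    last_modified = None
    if stamp is not None:
        last_modified = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
    return {
        "namespace": namespace,
        "type": local_name,
        "href": root.get(XLINK_HREF),
        "last_modified": last_modified,
    }


if __name__ == "__main__":
    print(describe(SAMPLE))
```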

Infrastructure 2:

 * load samples 1.3.3, 1.4 coreservice