User:Bourke/escidocdays2011

Wednesday 26.10.11

Keynote: JISC Managing Research Data programme (Simon Hodson)

Some programme links:

Managing Research Data Trusted Cloud Infrastructure

challenges

  • data deluge
  • but also opportunities, e.g. data reuse

drivers of the JISC MRD programme

results / lessons learnt

Source: JISC Data Centres Report (link not caught)

  • needs
    • closer cooperation
    • better training of staff
    • improved practice, i.e. before the data is captured/archived ("upstream")
  • outputs and results of the MRD programme
    • support data lifecycle
      • cycle: plan, create (store, annotate), use, appraise (discard, select), publish (identify, describe), discover (access), reuse
      • reused some work of the National Library of Australia ("data verbs").
    • leadership and policy development
    • develop benefits for all stakeholders (PhD student, individual researcher, research team (e.g. do we still have the data when a team member leaves?), university, supra-university)
    • need for bottom-up approach; lessons from the Incremental project of Cambridge/Glasgow universities
    • five training projects on specific areas (keyword for later search: MANTRA)
    • DCC how-to guides on data management
    • MaDAM project helps guide project planning for funding applications (see drivers above: requirement for a Data Management Plan, DMP).
    • social science research data DMP project by the UK Data Archive
    • Data Management Recommendations from the same source
    • data management costing tool
    • ERIM: engineering research, specifically looking at engineering mapping datasets (developed cooperatively).
    • sample project in biology: FISHNet online, like AWOB, for freshwater biologists
    • MiSS (transitional project from MaDAM, see above); MiSS website
    • citation mechanisms: DataCite initiative with the British Library (link: whatisdatacite); aligns with DOI
    • Dryad repository: aligns with Open Access journal initiatives, for holding biochemical data. Takes a portion of the Gold OA fee for long-term archival. Dryad estimates the cost of archiving at $25-75 per article
    • incentivise long-term archival: research shows that publicly available research data increases citation rate (very important for the sciences, of course); useful blog by a researcher in this area

eSciDoc Overview

Principles

Management of scholarly record

virtual research environments

Driver: if the researcher works in a managed environment, the metadata can be captured easily, early and at source.

integrated data management

PubMan 2011 - aspects

an eSciDoc Application

  • Used as is:
    • login
    • import format
    • searches
    • OAI Import
  • Wrapped and extended
    • authorization
    • item format
  • Items not used at all
    • Containers
    • Table of contents

Publication Management

  • Import / Display / Export
    • import via 3rd party (...?)
    • Sword Import
    • Manual Export via basket, export basket via email, download
    • Export Interface: Define a query and automatically export. Useful for repeatable scripts
  • Metadata handling
    • Master Data Management via CoNE
    • Intelligent cut-and-paste (e.g. for adding multiple authors)
    • Validation of metadata; flexibility via contexts.
  • Discovery
    • Browsing via topics
    • Search via CQL
    • Google Site Maps allows crawlers to index
    • OAI-PMH allows harvesting by other repositories
    • RSS feeds to spread new publications
  • Long-term archiving (all from eSciDoc)
    • PIDs
    • Versioning
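
The "define a query and automatically export" idea, combined with search via CQL, can be sketched as a small script that builds the same search URL on every run. This is only a minimal sketch assuming an SRU-style endpoint: the base URL, the index names (`dc.creator`, `dc.date`) and the result size are hypothetical and not taken from the PubMan documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the search URL of a real installation.
BASE_URL = "https://repository.example.org/search"


def cql_term(index: str, value: str) -> str:
    """Build one CQL clause, quoting the value and escaping embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index}="{escaped}"'


def build_search_url(clauses: list[str], maximum_records: int = 50) -> str:
    """Join CQL clauses with 'and' and wrap them in SRU searchRetrieve parameters."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": " and ".join(clauses),
        "maximumRecords": str(maximum_records),
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Index names below are assumptions; a real installation defines its own.
url = build_search_url([cql_term("dc.creator", "Klump"), cql_term("dc.date", "2011")])
print(url)
```

Because the query lives in the script rather than in a browser session, the same export can be repeated from a scheduled job.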

Max Planck Digital Archive

  • User Support
  • Training
  • Maintenance
  • Branding
  • Import / Migration
  • Integration (1. into MPI homepages, 2. into local MPI databases, 3. into authentication infrastructure)
  • Feature Requests (e.g. automatic creation of the MPG Yearbook)

Open Source Community

  • Installation Support
  • Developer Support
  • Feature Requests
  • Installer Development

Technology

  • Java
  • JBoss / Tomcat
  • XSLT Transformation

Future Challenges

  • Clear distinction between open-source and MPG versions of the software
  • Efficient support: toolboxes, e.g. for homepage integration
  • Service spin-offs, e.g. ConeService (done), TransformationService (not quite there), validationService (definitely not there)
  • Enhanced configuration

Imeji

  • Bastien Saquet (MPDL)
  • Andreas Vollmer (Computer and Media Service, HU Berlin)
  • Karsten Asshauer & Jörg Busse (HU Berlin, Institute of Art and Visual History)
  • Julian Röder & Hai Nguyen (FU Berlin, Institute of Computer Science, Konrad Zuse Internet Archive)
  • CMS handles infrastructure questions, policies, standards
  • IAVH: legacy system is imago_mediathek (MS Access, 50,000 images, each with 50 metadata fields)
  • Strategy: migrate to a web-based environment (export to XML, transform, import into eSciDoc/Imeji)
  • Metadata issues: need a custom thesaurus.
  • Konrad Zuse Archive: both a modern multimedia approach and publishing Zuse's legacy in an open-access manner
  • future plans
  • digilib integration
  • annotation
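
The migration strategy noted above (export to XML, transform, import) can be illustrated with a sketch of the transform step. Everything specific here is an assumption for illustration: the legacy element and field names, the target element names and the field mapping are hypothetical, and the talk did not say whether the real transform used XSLT or something else.

```python
import xml.etree.ElementTree as ET

# Hypothetical export of one legacy record; real field names will differ.
LEGACY_XML = """
<records>
  <record id="17">
    <field name="Titel">Portal, west facade</field>
    <field name="Ort">Berlin</field>
    <field name="Notiz"></field>
  </record>
</records>
"""

# Assumed mapping from legacy field names to target metadata element names.
FIELD_MAP = {"Titel": "title", "Ort": "location"}


def transform(legacy_xml: str) -> ET.Element:
    """Map each legacy <record> to an <item> holding only mapped, non-empty fields."""
    source = ET.fromstring(legacy_xml)
    target = ET.Element("items")
    for record in source.findall("record"):
        item = ET.SubElement(target, "item", {"legacy-id": record.get("id", "")})
        for field in record.findall("field"):
            name = FIELD_MAP.get(field.get("name"))
            text = (field.text or "").strip()
            if name and text:
                ET.SubElement(item, name).text = text
    return target


items = transform(LEGACY_XML)
print(ET.tostring(items, encoding="unicode"))
```

Keeping the mapping in a plain dictionary makes it easy to review with the people who know the 50 legacy metadata fields before any import is attempted.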

eKinematix (G. Lonij, RWTH Aachen IGM)

  • virtual research environment for mechanical engineering
  • integrated information structure
  • mechanism technology
  • Documentation
  • Collaboration
  • Publication

Targets

  • make gathering, organising, enhancing and linking information easier.
  • support of research organisation / cooperation
  • Eliminate "reinventing the wheel"
  • developer reengineering and web services
  • To be achieved via modular development and collaboration tools
  • expand our existing repository

Implementation

  • XML / XSLT-based.
  • Partners with FIZ for hosting and operations (eSciDoc)
  • TU Ilmenau: development of design theory and methodology; supplies libraries for modelling etc.
  • IGM RWTH Aachen: supplies the mechanical / robotics / mechatronic expertise and specialised software (gecko geospatial tool, easier to use than a full CAD tool).
  • Basic module (eSciDoc)
  • Why eSciDoc? Open source, concentration on service-oriented architecture.

Amalia

  • Digital Humanities
  • Tool use in the humanities: an epistemological shift

ENS Lyon, the digital humanities workshop

  • inter-project communication
  • common training
  • a cyber-infrastructure

eSciDoc Japan

  • Masuo
  • very interesting statistics

DigiLifecycle

  • 2-year project
  • 5 MPI participants, two associated

goals

  • tools
  • usage guidelines
  • create an expert group
  • Predecessor VIRR

Lifecycle

  • Scanning
  • OCR conversions
    • produces a TEI file
    • bibliographic record (OPAC)
  • ingest + viewing environment (DLC tool)
  • edit stage, add information (DLC tool)
  • virtual research: annotate, reference (DLC tool)
  • planning and preparation
  • Triggers new projects

challenges

  • generic online tools lead to isolated solutions
  • Long-term archiving
  • Full-text integration (TEI as a semi-standard)
  • technical complexity (e.g. page breaks need conventions)
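
The remark that page breaks need conventions can be made concrete with a sketch that splits TEI full text into pages. It assumes one particular convention, namely that every page starts with a TEI `<pb n="..."/>` milestone; the sample document is hand-written for illustration and is not output of the DLC tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
PB = f"{{{TEI_NS}}}pb"

# Tiny hand-written sample; real OCR output is far richer.
SAMPLE = f"""
<TEI xmlns="{TEI_NS}">
  <text><body>
    <pb n="1"/><p>First page text.</p>
    <pb n="2"/><p>Second page, <hi>still</hi> page two.</p>
  </body></text>
</TEI>
"""


def text_by_page(tei_xml: str) -> dict[str, str]:
    """Collect text per page, treating each <pb n="..."/> as the start of a page."""
    pages: dict[str, list[str]] = {}
    current = None

    def add(text):
        # Text before the first <pb/> has no page and is ignored.
        if current is not None and text and text.strip():
            pages[current].append(text.strip())

    def walk(elem):
        nonlocal current
        if elem.tag == PB:
            current = elem.get("n", str(len(pages) + 1))
            pages.setdefault(current, [])
        else:
            add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)  # text that follows the child inside its parent

    walk(ET.fromstring(tei_xml))
    return {n: " ".join(parts) for n, parts in pages.items()}


print(text_by_page(SAMPLE))
```

A different convention (for example page breaks only allowed between paragraphs) would simplify the code, which is exactly why the convention has to be agreed before ingest.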

highlights

  • batch ingest triggered by institutes
  • variety of import and export formats
  • generic online editor for structural metadata
  • annotation and citation mechanism in both text and image parts
  • ability to link and cite texts, images, collections within MPG and beyond

PanMetaDocs

  • Jens Klump (geosciences)
  • some "big data" projects, many "small data" projects. 25 new projects per year.
  • big data projects are easy; they have to budget for data management anyway
  • the problem was with the "small data" projects: funding requirement to maintain the data, but little money
  • solution: common data structure --> eSciDoc
  • eSciDoc as "high rack" storage, which is agnostic to metadata and contents
  • PanMetaDocs (PMD) as forklift
  • written in PHP, based on panMetaWorks
  • Per project, one PMD instance to control access and metadata contexts
  • Syndication via RSS and OAI-PMH also allows the creation of portals for a distributed project
  • data is not held in the application, but in the infrastructure
  • source code available at SourceForge
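
How a portal could pick up records through OAI-PMH syndication can be sketched as follows. The `ListRecords` verb, the `oai_dc` metadata prefix and the XML namespaces are part of the OAI-PMH and Dublin Core standards; the base URL, the identifier and the sample response are made up for illustration and do not describe a real PMD instance.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a standard OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hand-written sample response with a single record (illustrative only).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:pmd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Borehole temperature log</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(response_xml: str) -> list[tuple[str, str]]:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    return records


# Hypothetical base URL of one project's PMD instance.
print(list_records_url("https://pmd.example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

A portal for a distributed project would run the same parsing over the responses of several instances and merge the resulting lists.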

eSciDoc Browser/Admintool

  • OUs
  • Contexts (but not PubMan contexts)
  • Users and Roles (but not groups)
  • Roles and Scopes

Thursday 27.10.11

Developer track: Humboldt room; breakout session: Mozart room

Infrastructure 1. Basic Concepts eSciDoc

infrastructure since eSciDoc Days 2011

1.3: Java infrastructure connector; 1.4: SOAP removed

APIs and libraries

  • REST interface
  • infrastructure Java connector (IJC)
    • Maven
    • version 1.3 compatible with 1.4
    • escidoc-ijc version 1.4: SOAP will be removed
  • next versions
    • minor object changes (more abstraction)
    • PHP connector

documentation

  • "fairly scattered"
  • API fully documented
  • many Java examples
    • XML representations
    • Java
      • connector
      • REST API calls

infrastructure

  • finely grained authorization system
  • a collection of services for applications
    • without an additional GUI application, not suitable for non-technical users
    • not a relational DB.


community

default installation

  • Java installer
    • installation
    • upgrade
    • limited options for configuration and local optimisations

resources

  • XML namespace, href, last-modified-date
  • metadata
  • object-specific content
  • object-specific references to other elements (eg filters)
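
The common parts of a resource listed above (XML namespace, href, last-modified-date) can be read generically, as in this sketch. The sample XML is an illustrative assumption, not a real eSciDoc representation: the schema namespace, the href value and the timestamp format are made up, and only the XLink namespace is a real standard URI.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical resource representation; schema namespace and href are invented.
SAMPLE = """
<item xmlns="http://example.org/schemas/item"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xlink:href="/ir/item/example:1"
      last-modified-date="2011-10-27T09:30:00Z">
  <md-records/>
</item>
"""


def describe(resource_xml: str) -> dict:
    """Extract namespace, type, href and last-modified-date from the root element."""
    root = ET.fromstring(resource_xml)
    # ElementTree writes namespaced tags as "{namespace}local"; assumes a namespaced root.
    namespace, _, local_name = root.tag[1:].partition("}")
    stamp = root.get("last-modified-date")
    return {
        "namespace": namespace,
        "type": local_name,
        "href": root.get(f"{{{XLINK_NS}}}href"),
        # Assumed timestamp format (UTC with a literal "Z").
        "last_modified": datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"),
    }


print(describe(SAMPLE))
```

Reading these attributes in one place means client code can handle any resource type in the same way before looking at the object-specific content.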

Infrastructure 2:

  • load samples 1.3.3, 1.4 coreservice

Applications 1: PubMan, imeji

Applications 2: configuring