User:Bourke/escidocdays2011
Wednesday 26.10.11
Keynote: JISC Managing Research Data programme (Simon Hodson)
Some programme links:
Managing Research Data Trusted Cloud Infrastructure
challenges
- data deluge
- but also opportunities, e.g. data reuse
drivers of the JISC MRD programme
- CORDIS "Riding the Wave" report
- RCUK principles on data policies (RCUK data policies)
- UK funding agencies NERC and ESRC require a "data value checklist", plus 10 years of mandatory storage
- similar to the German guidelines, "Principles for the handling of research data"
results / lessons learnt
Source: JISC Data Centres Report (link not caught)
- needs
- closer cooperation
- better training of staff
- improved practice, i.e. before the data is captured/archived ("upstream")
- outputs and results of the MRD programme
- support data lifecycle
- cycle: plan, create (store, annotate), use, appraise (discard, select), publish (identify, describe), discover (access), reuse
- reused some work of the National Library of Australia ("data verbs").
- leadership and policy development
- develop benefits for all stakeholders: PhD student, individual researcher, research team (e.g. do we still have the data when a team member leaves?), university, supra-university
- need for a bottom-up approach; lessons from the Incremental project [1] of Cambridge/Glasgow universities
- five training projects on specific areas (keyword for later search: mantra)
- DCC how-to guides on data management
- madam project (madam) helps guide project planning for funding applications (see drivers above: requirement for a DMP, Data Management Plan).
- social science research data DMP project by the UK Data Archive
- Data Management Recommendations from same source
- data management costing tool
- Erim: engineering research, specifically looking at engineering mapping datasets (developed cooperatively).
- sample project in biology: fishnet online, like AWOB for freshwater biologists
- Miss (transitional project from Madam, see above): miss website
- citation mechanisms: DataCite initiative with the British Library (whatisdatacite); aligns with DOI
- Dryad repository: aligns with Open Access journal initiatives, for holding biochemical data. Takes a portion of the Gold OA fee for long-term archival. Dryad estimates the cost of archiving at $25-75 per article
- incentivise long-term archival: research shows that publicly available research data increases citation rate (very important for the sciences, of course); useful blog by a researcher in this area
- support data lifecycle
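The data lifecycle in these notes is a cycle, so the stage after reuse is plan again. A minimal Python sketch of that structure follows. The stage names come from the notes, the grouping of sub-activities is one reading of them, and `LIFECYCLE` and `next_stage` are hypothetical names used only for illustration.

```python
# Stages of the research data lifecycle as listed in the notes, each with
# its sub-activities (empty where none were noted). The grouping is an
# assumption; the notes do not make it fully explicit.
LIFECYCLE = [
    ("plan", []),
    ("create", ["store", "annotate"]),
    ("use", []),
    ("appraise", ["discard", "select"]),
    ("publish", ["identify", "describe"]),
    ("discover", ["access"]),
    ("reuse", []),
]


def next_stage(stage):
    """Return the stage that follows `stage`, wrapping around at the end."""
    names = [name for name, _ in LIFECYCLE]
    index = names.index(stage)  # raises ValueError for an unknown stage
    return names[(index + 1) % len(names)]


if __name__ == "__main__":
    for name, activities in LIFECYCLE:
        detail = f" ({', '.join(activities)})" if activities else ""
        print(f"{name}{detail} -> {next_stage(name)}")
```

Keeping the stages in an ordered list rather than a dictionary of successors makes the wrap-around explicit and keeps the sub-activities next to their stage.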
eSciDoc Overview
Principles
Management of scholarly record
virtual research environments
(driver) If the researcher is working in a managed environment, the metadata can be captured easily, early and at source.
integrated data management
PubMan 2011 - aspects
An eSciDoc application
- Used as is:
- login
- import format
- searches
- OAI Import
- Wrapped and extended
- authorization
- item format
- Items not used at all
- Containers
- Table of contents
Publication Management
- Import / Display / Export
- import via 3rd party (...?)
- Sword Import
- Manual Export via basket, export basket via email, download
- Export Interface: Define a query and automatically export. Useful for repeatable scripts
- Metadata handling
- Master Data Management via Cone
- Intelligent cut-and-paste (eg for adding multiple authors)
- Validation of metadata; flexibility via contexts
- Discovery
- Browsing via topics
- Search via CQL
- Google Sitemaps allow crawlers to index
- OAI-PMH allows harvesting by other repositories
- RSS-Feeds to spread new publications
- Long-term Archiving (all from escidoc)
- PIDs
- Versioning
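For the OAI-PMH harvesting mentioned under Discovery, a harvester simply issues HTTP requests with a `verb` and a few arguments. The sketch below only builds such a request URL. `verb`, `metadataPrefix`, `from` and `set` are standard OAI-PMH arguments, while the endpoint address and the helper name `list_records_url` are assumptions for illustration, not PubMan specifics.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a given base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec is not None:
        params["set"] = set_spec  # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint, not a real repository address.
    print(list_records_url("https://repository.example.org/oai", from_date="2011-10-01"))
```

A real harvester would also follow the `resumptionToken` returned in partial responses; that part is left out here.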
Max Planck Digital Archive
- User Support
- Training
- Maintenance
- Branding
- Import / Migration
- Integration (1. into MPI homepages, 2. into local MPI databases, 3. into authentication infrastructure)
- Feature Requests (e.g. automatic creation of the MPG Yearbook)
Open Source Community
- Installation Support
- Developer Support
- Feature Requests
- Installer Development
Technology
- Java
- JBoss / Tomcat
- XSLT Transformation
Future Challenges
- Clear distinction between OpenSource and MPG versions of the software
- Efficient Support: Toolboxes eg on Homepage integration
- Service Spin-Off - e.g. ConeService (done), TransformationService (not quite there), validationService (definitely not there)
- Enhanced configuration
Imeji
- Bastien Saquet (MPDL)
- Andreas Vollmer - (computer and media service: HU Berlin)
- Karsten Asshauer & Jörg Busse - HU Berlin, Institute of Art and Visual History
- Julian Röder & Hai Nguyen - FU Berlin, Institute of Computer Science, Konrad Zuse Internet Archive (zuse archive)
- CMS handles infrastructure questions, policies, standards
- IAVH: legacy system is imago_mediathek (ms-access, 50,000 images, each with 50 metadata fields)
- Strategy: migrate to a web-based environment (export to XML, transform, import into eSciDoc/Imeji)
- Metadata issues. Need custom thesaurus.
- Konrad Zuse Archive: both a modern multimedia approach and publishing Zuse's legacy in an open-access manner
- future plans
- digilib integration
- annotation
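The IAVH migration strategy (export to XML, transform, import) can be sketched for the transform step. The talk did not show the actual transformation and the notes do not say which tool performs it. Since the Python standard library has no XSLT processor, this sketch swaps in a plain ElementTree mapping. All field names, the sample record and the `FIELD_MAP` table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from legacy field names to target field names.
FIELD_MAP = {"titel": "title", "kuenstler": "creator", "datierung": "date"}

# Hypothetical XML export of one legacy database row.
LEGACY_EXPORT = """\
<images>
  <image id="17">
    <titel>West portal</titel>
    <kuenstler>unknown</kuenstler>
    <intern>shelf 4</intern>
  </image>
</images>
"""


def transform(legacy_xml):
    """Map each legacy <image> to a <record>, dropping unmapped fields."""
    source = ET.fromstring(legacy_xml)
    target = ET.Element("records")
    for image in source.findall("image"):
        record = ET.SubElement(target, "record", {"source-id": image.get("id", "")})
        for field in image:
            new_name = FIELD_MAP.get(field.tag)
            if new_name is None:
                continue  # field has no counterpart in the target schema
            ET.SubElement(record, new_name).text = (field.text or "").strip()
    return target


if __name__ == "__main__":
    print(ET.tostring(transform(LEGACY_EXPORT), encoding="unicode"))
```

With around 50 metadata fields per image, keeping the mapping in one table makes it easy to review against a custom thesaurus before any import is attempted.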
eKinematix (G. Lonij, RWTH Aachen IGM)
- virtual research environment for mechanical engineering
- integrated information structure
- mechanism technology
- Documentation
- Collaboration
- Publication
Targets
- make gathering, organising, enhancing and linking information easier.
- support of research organisation / cooperation
- Eliminate "reinventing the wheel"
- developer reengineering and web services
- To be achieved via modular development and collaboration tools
- expand our existing repository
Implementation
- XML / XSLT-based.
- Partners with FIZ for hosting and operations (eSciDoc)
- TU Ilmenau. Development of design theory and methodology. Supply libraries for modelling etc.
- IGM RWTH-Aachen. Supply the mechanical / robotics / mechatronic expertise. Specialised software (gecko geospatial tool, easier to use than a full CAD tool).
- Basic Module (EsciDoc)
- why escidoc? Open source, concentration on service-oriented architecture.
Amalia
- Digital Humanities
- Tool use in humanities. An epistemological shift
ENS Lyon, the digital humanities workshop
- inter-project communication
- common training
- a cyber-infrastructure
eSciDoc Japan
- Masuo
- very interesting statistics
DigiLifecycle
- 2 year project
- 5 MPI participants, two associated
goals
- tools
- usage guidelines
- create an expert group
- Predecessor VIRR
Lifecycle
- Scanning
- OCR conversions
- produces a TEI file
- bibliographic record (OPAC)
- ingest + viewing environment (DLC tool)
- edit stage, add information (DLC tool)
- virtual research: annotate, reference (DLC tool)
- planning and preparation
- Triggers new projects
challenges
- generic online tools lead to isolated solutions
- Long-term archiving
- Full text integration (TEI as a semi-standard)
- technical complexity (e.g. page-breaks need conventions)
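To illustrate why page-breaks need conventions: TEI marks them with `<pb/>` milestone elements that can sit anywhere in the text, so a viewer has to agree on how to attribute text to pages. The sketch below assumes one such convention (text belongs to the most recent `<pb>`, whose `n` attribute holds the page number). The sample document and the helper name `text_by_page` are made up for illustration and are not taken from the DLC tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Made-up sample: the first paragraph runs across a page break.
SAMPLE = f"""\
<TEI xmlns="{TEI_NS}">
  <text><body>
    <p><pb n="1"/>First page text that runs <pb n="2"/>onto the second page.</p>
    <p>Still on page two.</p>
  </body></text>
</TEI>
"""


def text_by_page(tei_xml):
    """Return {page number: text}, splitting at <pb n="..."/> milestones."""
    pages = {}
    current = None  # text before the first <pb> is ignored

    def add(text):
        if current is not None and text and text.strip():
            pages[current].append(" ".join(text.split()))

    def walk(element):
        nonlocal current
        if element.tag == f"{{{TEI_NS}}}pb":
            current = element.get("n")
            pages.setdefault(current, [])
        add(element.text)
        for child in element:
            walk(child)
            add(child.tail)  # text following the child, e.g. after a <pb/>

    walk(ET.fromstring(tei_xml))
    return {page: " ".join(chunks) for page, chunks in pages.items()}


if __name__ == "__main__":
    for page, text in text_by_page(SAMPLE).items():
        print(page, "|", text)
```

Other conventions are possible (for example placing `<pb>` only between block elements), which is exactly why a project has to fix one before full texts from different institutes can be combined.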
highlights
- batch ingest triggered by institutes
- variety of import and export formats
- generic online editor for structural metadata
- annotation and citation mechanism in both text and image parts
- ability to link and cite texts, images, collections within MPG and beyond
panmetadocs
- Jens Klump (geosciences)
- some "big data" projects, many "small data" projects. 25 new projects per year.
- big data projects are easy, they have to budget for data management anyway
- the problem was with the "small data" projects. Funding requirement to maintain the data, but little money
- solution: common data structure --> escidoc
- escidoc as "high rack" storage, which is agnostic to metadata and contents
- PanMetaDocs (PMD) as forklift
- written in PHP, based on panmetaworks
- Per project, one PMD instance to control access and metadata contexts
- Syndication via RSS and OAI-PMH also allows the creation of portals for a distributed project
- data is not held in the application, but in the infrastructure
- sourcecode available at sourceforge
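The portal idea (one PMD instance per project, syndicated via RSS) can be sketched as merging the feeds of several instances into one listing. This is only an illustration in Python of the aggregation step, not PanMetaDocs code (which is PHP). The feed contents, addresses and the helper name `merge_feeds` are invented.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented RSS 2.0 feeds standing in for two project instances.
FEED_A = """<rss version="2.0"><channel><title>Site A</title>
<item><title>Borehole log 12</title><link>https://a.example.org/12</link>
<pubDate>Tue, 25 Oct 2011 09:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Site B</title>
<item><title>Seismic profile 3</title><link>https://b.example.org/3</link>
<pubDate>Wed, 26 Oct 2011 10:30:00 +0000</pubDate></item>
<item><title>Sample list</title><link>https://b.example.org/1</link>
<pubDate>Mon, 24 Oct 2011 08:00:00 +0000</pubDate></item>
</channel></rss>"""


def merge_feeds(feeds):
    """Collect the items of several RSS 2.0 documents, newest first."""
    entries = []
    for feed_xml in feeds:
        channel = ET.fromstring(feed_xml).find("channel")
        source = channel.findtext("title", default="")
        for item in channel.findall("item"):
            entries.append({
                "source": source,
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "published": parsedate_to_datetime(item.findtext("pubDate")),
            })
    return sorted(entries, key=lambda entry: entry["published"], reverse=True)


if __name__ == "__main__":
    for entry in merge_feeds([FEED_A, FEED_B]):
        print(entry["published"].date(), entry["source"], entry["title"])
```

Because the portal only reads syndicated metadata and links, the data itself stays in the infrastructure, in line with the last points of the list above.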
eSciDoc Browser/Admintool
- OUs
- Contexts (but not pubman contexts)
- User and Roles (but not groups)
- Roles and Scopes
Thursday 27.10.11
developer track: Humboldt room; breakout session: Mozart room
Thursday 27.10.11
Infrastructure 1. Basic Concepts eSciDoc
infrastructure since escidoc days 2011
1.3, java infrastructure connector, 1.4 soap removed
apis and libraries
- rest interface
- infrastructure java connector (ijc)
- maven
- version 1.3 compatible with 1.4
- escidoc-ijc version 1.4: soap will be removed
- next versions
- minor object changes (more abstraction)
- PHP connector
documentation
- "rather scattered"
- API fully documented
- many Java examples
- xml representations
- java
- connector
- rest api calls
infrastructure
- finely grained authorization system
- a collection of services for applications
- not suitable for non-technical users without an additional GUI application
- not a relational db.
community
default installation
- java installer
- installation
- upgrade
- limited options for configuration and local optimisations
resources
- xml namespace, href, last-modified-date
- metadata
- object-specific content
- object-specific references to other elements (eg filters)
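The common attributes of a resource representation (XML namespace, href, last-modified-date) can be read generically, whatever the resource type. The sketch below assumes the href is an XLink attribute on the root element. The element namespace, the identifier and the helper name `describe` are placeholders for illustration and are not copied from the eSciDoc schemas.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Illustrative representation; namespace and identifier are placeholders.
SAMPLE = f"""\
<item xmlns="http://example.org/schemas/item"
      xmlns:xlink="{XLINK_NS}"
      xlink:href="/ir/item/example:1"
      last-modified-date="2011-10-27T09:15:00.000Z">
  <md-records/>
</item>
"""


def describe(resource_xml):
    """Extract the attributes shared by all resource representations."""
    root = ET.fromstring(resource_xml)
    # ElementTree writes qualified names as "{namespace}local-name".
    namespace, _, local_name = root.tag[1:].partition("}")
    return {
        "namespace": namespace,
        "type": local_name,
        "href": root.get(f"{{{XLINK_NS}}}href"),
        "last_modified": root.get("last-modified-date"),
    }


if __name__ == "__main__":
    for key, value in describe(SAMPLE).items():
        print(f"{key}: {value}")
```

A client would typically keep `href` to address the resource again and `last_modified` to detect whether its copy is stale.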
Infrastructure 2:
- load samples 1.3.3, 1.4 coreservice