Difference between revisions of "User:Bourke/escidocdays2011"


Latest revision as of 13:43, 29 March 2012

Wednesday 26.10.11

Keynote: JISC Managing Research Data programme (Simon Hodgson)

Some programme links:

Managing Research Data, Trusted Cloud Infrastructure

challenges

  • data deluge
  • but also opportunities, e.g. data reuse

drivers of the JISC MRD programme

results / lessons learnt

Source: JISC Data Centres Report (link not caught)

  • needs
    • closer cooperation
    • better training of staff
    • improved practice, i.e. before the data is captured/archived ("upstream")
  • outputs and results of the MRD programme
    • support data lifecycle
      • cycle: plan, create (store, annotate), use, appraise (discard, select), publish (identify, describe), discover (access), reuse
      • reuses some work of the National Library of Australia ("data verbs")
    • leadership and policy development
    • develop benefits for all stakeholders: PhD student, individual researcher, research team (e.g. do we still have the data when a team member leaves?), university, supra-university
    • need for a bottom-up approach, lessons from the Incremental project of Cambridge/Glasgow universities
    • five training projects on specific areas (keyword for later search: MANTRA)
    • DCC how-to guides on data management
    • MaDAM project helps guide project planning for funding applications (see drivers above: requirement for a Data Management Plan, DMP)
    • social science research data DMP project by the UK Data Archive
    • Data Management Recommendations from the same source
    • data management costing tool
    • ERIM (engineering research): specifically looking at engineering mapping datasets (developed cooperatively)
    • sample project in biology: FISHNet, online, like AWOB for freshwater biologists
    • MiSS (transitional project from MaDAM, see above): MiSS website
    • citation mechanisms: DataCite initiative with the British Library, see "What is DataCite" (aligns with DOI)
    • Dryad repository: aligns with Open Access journal initiatives, holds biochemical data. Takes a portion of the Gold OA fee for long-term archival. Dryad's estimated cost of archiving: $25-75 per article
    • incentivise long-term archival: research shows that publicly available research data increases citation rate (very important for the sciences, of course); useful blog by a researcher in this area

eSciDoc Overview

Principles

Management of scholarly record

virtual research environments

(driver: if the researcher is working in a managed environment, the metadata can be captured easily, early and at source)

integrated data management

PubMan 2011 - aspects

an eSciDoc Application

  • Used as is:
    • login
    • import format
    • searches
    • OAI Import
  • Wrapped and extended
    • authorization
    • item format
  • Items not used at all
    • Containers
    • Table of contents

Publication Management

  • Import / Display / Export
    • import via 3rd party (...?)
    • SWORD import
    • Manual export via basket, export basket via email, download
    • Export interface: define a query and export automatically. Useful for repeatable scripts
  • Metadata handling
    • Master data management via CoNE
    • Intelligent cut-and-paste (e.g. for adding multiple authors)
    • Validation of metadata, with flexibility via contexts
  • Discovery
    • Browsing via topics
    • Search via CQL
    • Google Sitemaps allow crawlers to index
    • OAI-PMH allows harvesting by other repositories
    • RSS feeds to announce new publications
  • Long-term Archiving (all from eSciDoc)
    • PIDs
    • Versioning
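
As an illustration of the OAI-PMH harvesting mentioned under Discovery, here is a minimal sketch of how another repository might build a ListRecords request. The verb and the metadataPrefix, set and from parameters come from the OAI-PMH protocol; the base URL and the helper name build_list_records_url are assumptions made for this example, not PubMan's actual endpoint.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access happens here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    if from_date:
        params["from"] = from_date  # optional: only records changed since this date
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; replace with the repository's real OAI-PMH base URL.
url = build_list_records_url("https://repository.example.org/oai", from_date="2011-10-01")
print(url)
```

A harvester would fetch this URL and keep requesting with the returned resumptionToken until the list is complete.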

Max Planck Digital Archive

  • User Support
  • Training
  • Maintenance
  • Branding
  • Import / Migration
  • Integration (1. into MPI homepages, 2. into local MPI databases, 3. into authentication infrastructure)
  • Feature Requests (e.g. automatic creation of the MPG Yearbook)

Open Source Community

  • Installation Support
  • Developer Support
  • Feature Requests
  • Installer Development

Technology

  • Java
  • JBoss / Tomcat
  • XSLT Transformation

Future Challenges

  • Clear distinction between open-source and MPG versions of the software
  • Efficient support: toolboxes, e.g. for homepage integration
  • Service spin-offs, e.g. ConeService (done), TransformationService (not quite there), ValidationService (definitely not there)
  • Enhanced configuration

imeji

  • Bastien Saquet (MPDL)
  • Andreas Vollmer (Computer and Media Service, CMS, HU Berlin)
  • Karsten Asshauer & Jörg Busse - HU Berlin, Institute of Art and Visual History (IAVH), http://www.kunstgeschichte.hu-berlin.de/
  • Julian Röder & Hai Nguyen - FU Berlin, Institute of Computer Science, Konrad Zuse Internet Archive, http://www.zip.de/zuse
  • CMS handles infrastructure questions, policies, standards
  • IAVH: legacy system is imago_mediathek (MS Access, 50,000 images, each with 50 metadata fields)
  • Strategy: migrate to a web-based environment (export to XML, transform, import into eSciDoc/imeji)
  • Metadata issues. Need custom thesaurus.
  • Konrad Zuse Archive (http://en.wikipedia.org/wiki/Konrad_Zuse): both a modern multimedia approach and publishing Zuse's legacy in an open-access manner
  • future plans
    • digilib integration
    • annotation
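
The IAVH migration strategy (export to XML, transform, import) can be sketched roughly as follows. This is only an illustration using Python's standard library: the legacy element names, the target element names and the FIELD_MAP mapping are all invented for the example and are not the real imago_mediathek or imeji formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical export of one legacy record; real field names will differ.
LEGACY_XML = """<records>
  <record id="42">
    <Titel>Portrait study</Titel>
    <Kuenstler>Unknown</Kuenstler>
    <Datei>img_0042.jpg</Datei>
  </record>
</records>"""

# Hypothetical mapping from legacy field names to target metadata names.
FIELD_MAP = {"Titel": "title", "Kuenstler": "creator", "Datei": "file"}


def transform(legacy_xml):
    """Map each legacy record onto a target <image> element."""
    source = ET.fromstring(legacy_xml)
    target = ET.Element("images")
    for record in source.findall("record"):
        image = ET.SubElement(target, "image", {"legacy-id": record.get("id")})
        for field in record:
            name = FIELD_MAP.get(field.tag)
            if name is None:
                continue  # unmapped fields are skipped here; a real migration should log them
            ET.SubElement(image, name).text = (field.text or "").strip()
    return target


result = transform(LEGACY_XML)
print(ET.tostring(result, encoding="unicode"))
```

With 50 metadata fields per image, keeping the mapping in one table like FIELD_MAP makes it easier to review together with the custom thesaurus.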

eKinematix (G. Lonij, RWTH Aachen IGM)

  • virtual research environment for mechanical engineering
  • integrated information structure
  • mechanism technology
  • Documentation
  • Collaboration
  • Publication

Targets

  • make gathering, organising, enhancing and linking information easier
  • support of research organisation / cooperation
  • Eliminate "reinventing the wheel"
  • developer re-engineering and web services
  • To be achieved via modular development and collaboration tools
  • expand our existing repository

Implementation

  • XML / XSLT-based.
  • Partners with FIZ for hosting and operations (eSciDoc)
  • TU Ilmenau: development of design theory and methodology. Supplies libraries for modelling etc.
  • IGM RWTH Aachen: supplies the mechanical / robotics / mechatronic expertise and specialised software (gecko geospatial tool, easier to use than a full CAD tool).
  • Basic Module (eSciDoc)
  • Why eSciDoc? Open source, focus on service-oriented architecture.

Amalia

  • Digital Humanities
  • Tool use in humanities: an epistemological shift

ENS Lyon, the digital humanities workshop

  • inter-project communication
  • common training
  • a cyber-infrastructure

eSciDoc Japan

  • Masuo
  • very interesting statistics

DigiLifecycle

  • 2 year project
  • 5 MPI participants, 2 associated

goals

  • tools
  • usage guidelines
  • create an expert group
  • Predecessor: VIRR

Lifecycle

  • Scanning
  • OCR conversions
    • produces a TEI file
    • bibliographic record (OPAC)
  • ingest + viewing environment (DLC tool)
  • edit stage, add information (DLC tool)
  • virtual research: annotate, reference (DLC tool)
  • planning and preparation
  • Triggers new projects

challenges

  • generic online tools lead to isolated solutions
  • Long-term archiving
  • Full text integration (TEI as a semi-standard)
  • technical complexity (e.g. page breaks need conventions)
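
To make the page-break point concrete: TEI marks page breaks with pb elements, and a tool can only link text to scans if everyone fills them in the same way. The sketch below assumes one possible convention (every pb carries a page number in n and an image reference in facs); the sample document and the function name page_breaks are made up for illustration and are not taken from the DLC tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample; a real OCR result would be far larger.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <pb n="1" facs="page_0001.jpg"/>
    <p>First page text.</p>
    <pb n="2" facs="page_0002.jpg"/>
    <p>Second page text.</p>
  </body></text>
</TEI>"""


def page_breaks(tei_xml):
    """Return (page number, image reference) for every <pb/> in document order."""
    root = ET.fromstring(tei_xml)
    return [(pb.get("n"), pb.get("facs")) for pb in root.iterfind(".//tei:pb", TEI_NS)]


pages = page_breaks(SAMPLE_TEI)
print(pages)
```

A pb without n or facs would show up as None in the result, which gives a simple check for files that break the convention.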

highlights

  • batch ingest triggered by institutes
  • variety of import and export formats
  • generic online editor for structural metadata
  • annotation and citation mechanism in both text and image parts
  • ability to link and cite texts, images, collections within MPG and beyond

PanMetaDocs

  • Jens Klump, Geosciences
  • some "big data" projects, many "small data" projects. 25 new projects per year.
  • big data projects are easy, they have to budget for data management anyway
  • the problem was with the "small data" projects: funders require the data to be maintained, but there is little money for it
  • solution: common data structure --> eSciDoc
  • eSciDoc as "high rack" storage, which is agnostic to metadata and contents
  • PanMetaDocs (PMD) as forklift
  • written in PHP, based on panmetaworks
  • Per project, one PMD instance to control access and metadata contexts
  • Syndication via RSS and OAI-PMH also allows the creation of portals for a distributed project
  • data is not held in the application, but in the infrastructure
  • source code available at SourceForge: http://sourceforge.net/projects/panmetadocs/
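
The idea of building a portal for a distributed project from syndicated feeds can be sketched as below. This is a rough illustration, not PanMetaDocs code (which is PHP): it parses standard RSS 2.0 with Python's standard library, and the feed content, URLs and the function names feed_items and build_portal are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Invented feed from one hypothetical project instance.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Project A datasets</title>
  <item><title>Dataset 1</title><link>https://pmd.example.org/a/1</link></item>
  <item><title>Dataset 2</title><link>https://pmd.example.org/a/2</link></item>
</channel></rss>"""


def feed_items(rss_xml):
    """Extract title and link of every item, tagged with the feed it came from."""
    channel = ET.fromstring(rss_xml).find("channel")
    source = channel.findtext("title")
    return [
        {"source": source, "title": item.findtext("title"), "link": item.findtext("link")}
        for item in channel.findall("item")
    ]


def build_portal(feeds):
    """Merge the items of several project feeds into one listing."""
    entries = []
    for rss_xml in feeds:
        entries.extend(feed_items(rss_xml))
    return entries


portal = build_portal([SAMPLE_RSS])
for entry in portal:
    print(entry["source"], "-", entry["title"], entry["link"])
```

Because the data stays in the infrastructure, the portal only needs the feed entries and their links, not copies of the datasets.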

eSciDoc Browser/Admin Tool

  • OUs
  • Contexts (but not PubMan contexts)
  • Users and Roles (but not groups)
  • Roles and Scopes

Thursday 27.10.11

developer track: Humboldt room
breakout session: Mozart room

thursday 2.10.11

Infrastructure 1: Basic Concepts of eSciDoc

infrastructure since escidoc days 2011

1.3: Java infrastructure connector; 1.4: SOAP removed

APIs and libraries

  • REST interface
  • Infrastructure Java Connector (IJC)
    • Maven
    • version 1.3 compatible with 1.4
    • escidoc-ijc version 1.4: SOAP will be removed
  • next versions
    • minor object changes (more abstraction)
    • PHP connector
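
For readers without the Java connector, a call against the REST interface can be sketched with plain HTTP. The example only builds the request and sends nothing. The host name, the "/ir/item/<id>" path layout and the identifier are assumptions made for illustration; check the eSciDoc API documentation for the actual resource paths.

```python
from urllib.request import Request


def item_request(base_url, item_id):
    """Build (but do not send) a GET request for the XML representation of one item."""
    # "/ir/item/<id>" is an assumed path layout used only for this sketch.
    url = f"{base_url.rstrip('/')}/ir/item/{item_id}"
    return Request(url, headers={"Accept": "application/xml"}, method="GET")


# Hypothetical host and identifier.
req = item_request("https://escidoc.example.org/", "escidoc:1")
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would perform the call; a connector library wraps this kind of request and the XML parsing behind typed objects.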

documentation

  • "rather scattered"
  • API fully documented
  • many Java examples
    • XML representations
    • Java
      • connector
      • REST API calls

infrastructure

  • finely grained authorization system
  • a collection of services for applications
    • without an additional GUI application, not suitable for non-technical users
    • not a relational database


community

default installation

  • Java installer
    • installation
    • upgrade
    • limited options for configuration and local optimisations

resources

  • XML namespace, href, last-modified-date
  • metadata
  • object-specific content
  • object-specific references to other elements (eg filters)
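
The common parts of a resource listed above can be read from its XML representation in a few lines. The sample document below is invented: the namespace URI, the use of xlink for href and the attribute name last-modified-date (taken from these notes) are assumptions and may not match the real eSciDoc schemas.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Invented resource representation, for illustration only.
SAMPLE = """<item xmlns="http://example.org/schemas/item"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xlink:href="/ir/item/example:1"
      last-modified-date="2011-10-27T09:00:00Z">
  <md-records><md-record name="default"/></md-records>
</item>"""


def common_attributes(resource_xml):
    """Read the namespace, href and modification date from the root element."""
    root = ET.fromstring(resource_xml)
    # ElementTree writes qualified tags as "{namespace}localname".
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else None
    return {
        "namespace": namespace,
        "href": root.get(f"{{{XLINK}}}href"),
        "last_modified": root.get("last-modified-date"),
    }


info = common_attributes(SAMPLE)
print(info)
```

The modification date matters for updates: a client would typically send back the value it last read so the server can detect conflicting changes.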

Infrastructure 2:

  • load samples 1.3.3, 1.4 coreservice

Applications 1: PubMan, imeji

Applications 2: configuring