Difference between revisions of "User:Bourke/escidocdays2011"
Latest revision as of 13:43, 29 March 2012
Wednesday 26.10.11
Keynote: JISC Managing Research Data programme (Simon Hodgson)
Some programme links:
Managing Research Data Trusted Cloud Infrastructure
challenges
- data deluge
- but also opportunities, e.g. data reuse
drivers of the JISC MRD programme
- CORDIS report "Riding the Wave"
- RCUK principles on data policies
- UK funding agencies (NERC, ESRC) require a "data value checklist" plus 10 years of mandatory storage
- Similar to the German guidelines "Principles for the Handling of Research Data"
results / lessons learnt
Source: JISC Data Centres Report (link not caught)
- needs
- closer cooperation
- better training of staff
- improved practice, i.e. before the data is captured/archived ("upstream")
- outputs and results of the MRD programme
- support data lifecycle
- cycle: plan, create (store, annotate), use, appraise (discard, select), publish (identify, describe), discover (access), reuse
- reused some work of the National Library of Australia ("data verbs").
- leadership and policy development
- develop benefits for all stakeholders: PhD student, individual researcher, research team (e.g. do we still have the data when a team member leaves?), university, supra-university
- need for a bottom-up approach, lessons from the Incremental project of Cambridge/Glasgow universities
- five training projects on specific areas (keyword for later search: mantra)
- DCC how-to guides on data management
- madam project helps guide project planning for funding applications (see drivers above, requirement for a DMP, Data Management Plan).
- social science research data DMP project by UK Data Archive
- Data Management Recommendations from same source
- data management costing tool
- Erim engineering research, specifically looking at engineering mapping datasets (developed cooperatively).
- sample project in biology: fishnet online, like AWOB for freshwater biologists
- Miss (transitional project from Madam, see above)
- citation mechanisms. DataCite initiative with British Library (aligns with DOI)
- Dryad repository. Aligns with Open Access journal initiatives, for holding biochemical data. Takes a portion of the Gold OA fee for long-term archival. Estimated cost of archiving: $25-75 per article
- incentivise long-term archival. Research shows that publicly available research data increases citation rate (very important for the sciences, of course); there is a useful blog by a researcher in this area
- support data lifecycle
Escidoc Overview
Principles
Management of scholarly record
virtual research environments
(driver) If the researcher is working in a managed environment, then the metadata can be captured easily, early and at source.
integrated data management
Pubman 2011 - aspects
an eSciDoc Application
- Used as is:
  - login
  - import format
  - searches
  - OAI Import
- Wrapped and extended
  - authorization
  - item format
- Items not used at all
  - Containers
  - Table of contents
Publication Management
- Import / Display / Export
  - import via 3rd party (...?)
  - SWORD import
  - Manual export via basket, export basket via email, download
  - Export interface: define a query and export automatically. Useful for repeatable scripts
- Metadata handling
  - Master Data Management via Cone
  - Intelligent cut-and-paste (e.g. for adding multiple authors)
  - Validation of Metadata flexibility via contexts.
- Discovery
  - Browsing via topics
  - Search via CQL
  - Google Sitemaps allow crawlers to index
  - OAI-PMH allows harvesting by other repositories
  - RSS feeds to spread new publications
- Long-term Archiving (all from escidoc)
  - PIDs
  - Versioning
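As a rough illustration of the OAI-PMH harvesting point above, the following sketch builds a ListRecords request URL. The verb and parameter names (verb, metadataPrefix, from, until, set) come from the OAI-PMH protocol itself; the base URL and the helper name build_list_records_url are hypothetical and not taken from the talk or from PubMan.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional selective-harvesting arguments defined by the protocol.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# The base URL is a placeholder, not a real PubMan endpoint.
url = build_list_records_url("https://repository.example.org/oai",
                             from_date="2011-10-01")
print(url)
```

A harvester would fetch this URL and follow any resumptionToken in the response to page through the result set.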
Max Planck Digital Archive
- User Support
- Training
- Maintenance
- Branding
- Import / Migration
- Integration (1. into MPI homepages, 2. into local MPI databases, 3. into authentication infrastructure)
- Feature Requests (e.g. automatic creation of the MPG Yearbook)
Open Source Community
- Installation Support
- Developer Support
- Feature Requests
- Installer Development
Technology
- Java
- JBoss / Tomcat
- XSLT Transformation
Future Challenges
- Clear distinction between OpenSource and MPG versions of the software
- Efficient Support: Toolboxes, e.g. on homepage integration
- Service Spin-Off, e.g. ConeService (done), TransformationService (not quite there), validationService (definitely not there)
- Enhanced configuration
Imeji
- Bastien Saquet (MPDL)
- Andreas Vollmer (Computer and Media Service, HU Berlin)
- Karsten Asshauer & Jörg Busse (HU Berlin, Institute of Art and Visual History)
- Julian Röder & Hai Nguyen (FU Berlin, Institute of Computer Science, Konrad Zuse Internet Archive)
- CMS handles infrastructure questions, policies, standards
- IAVH: legacy system is imago_mediathek (MS Access, 50,000 images, each with 50 metadata fields)
- Strategy: migrate to a web-based environment (export to XML, transform, import into eSciDoc/Imeji)
- Metadata issues. Need custom thesaurus.
- Konrad Zuse Archive: both a modern multimedia approach and publishing Zuse's legacy in an open-access manner
- future plans
- digilib integration
- annotation
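The migration strategy noted above (export to XML, transform, import) can be sketched roughly as follows. Everything specific here is an assumption for illustration: the legacy field names, the target element names, and the FIELD_MAP table are invented, and plain Python stands in for whatever transformation technology the project actually used.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export from the legacy database; field names are invented.
LEGACY_EXPORT = """<images>
  <image id="42">
    <Titel>Portrait study</Titel>
    <Kuenstler>Unknown</Kuenstler>
    <Interne_Notiz>not migrated</Interne_Notiz>
  </image>
</images>"""

# Assumed mapping from legacy field names to target metadata elements.
FIELD_MAP = {"Titel": "title", "Kuenstler": "creator"}


def transform(legacy_xml):
    """Map legacy records onto the target schema, dropping unmapped fields."""
    root = ET.fromstring(legacy_xml)
    out = ET.Element("records")
    for image in root.findall("image"):
        record = ET.SubElement(out, "record", {"source-id": image.get("id")})
        for field in image:
            target = FIELD_MAP.get(field.tag)
            if target is None:
                continue  # field has no counterpart in the target schema
            ET.SubElement(record, target).text = (field.text or "").strip()
    return out


records = transform(LEGACY_EXPORT)
print(ET.tostring(records, encoding="unicode"))
```

The resulting XML would then be handed to the import step; a custom thesaurus, as mentioned in the notes, would slot in as a value mapping alongside FIELD_MAP.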
eKinematix (G. Lonij, RWTH Aachen IGM)
- virtual research environment for mechanical engineering
- integrated information structure
- mechanism technology
- Documentation
- Collaboration
- Publication
Targets
- make gathering, organising, enhancing and linking information easier.
- support of research organisation / cooperation
- Eliminate "reinventing the wheel"
- developer reengineering and web services
- To be achieved via modular development and collaboration tools
- expand our existing repository
Implementation
- XML / XSLT-based.
- Partners with FIZ for hosting and operations (eSciDoc)
- TU Ilmenau. Development of design theory and methodology. Supplies libraries for modelling etc.
- IGM RWTH-Aachen. Supplies the mechanical / robotics / mechatronic expertise and specialised software (gecko geospatial tool, easier to use than a full CAD tool).
- Basic Module (eSciDoc)
- why eSciDoc? Open source, concentration on service-oriented architecture.
Amalia
- Digital Humanities
- Tool use in humanities. An epistemological shift
ENS Lyon, the digital humanities workshop
- inter-project communication
- common training
- a cyber-infrastructure
ESciDoc Japan
- Masuo
- very interesting statistics
DigiLifecycle
- 2 year project
- 5 MPI participants, two associated
goals
- tools
- usage guidelines
- create an expert group
- Predecessor VIRR
Lifecycle
- Scanning
- OCR conversions
  - produces a TEI file
  - bibliographic record (OPAC)
- ingest + viewing environment (DLC tool)
- edit stage, add information (DLC tool)
- virtual research: annotate, reference (DLC tool)
- planning and preparation
- Triggers new projects
challenges
- generic online tools lead to isolated solutions
- Long-term archiving
- Full text integration (TEI as a semi-standard)
- technical complexity (e.g. page breaks need conventions)
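To make the page-break point above concrete: TEI marks page boundaries with the empty pb element, usually carrying the page number in its n attribute. The sketch below lists the page breaks in a small TEI fragment. The sample document and the function name page_numbers are made up for illustration; the TEI namespace and the pb element are standard TEI, but nothing here reflects the actual DLC tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Minimal invented TEI fragment with two page breaks.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <pb n="1"/><p>First page text.</p>
    <pb n="2"/><p>Second page text.</p>
  </body></text>
</TEI>"""


def page_numbers(tei_xml):
    """Return the n attribute of every page break, in document order."""
    root = ET.fromstring(tei_xml)
    return [pb.get("n") for pb in root.findall(".//tei:pb", TEI_NS)]


pages = page_numbers(SAMPLE_TEI)
print(pages)
```

A convention is still needed for cases this sketch ignores, such as a page break falling inside a paragraph or a pb without an n attribute.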
highlights
- batch ingest triggered by institutes
- variety of import and export formats
- generic online editor for structural metadata
- annotation and citation mechanism in both text and image parts
- ability to link and cite texts, images, collections within MPG and beyond
panmetadocs
- Jens Klump, Geosciences
- some "big data" projects, many "small data" projects. 25 new projects per year.
- big data projects are easy, they have to budget for data management anyway
- the problem was with the "small data" projects. Funding requirement to maintain the data, but little money
- solution: common data structure --> escidoc
- escidoc as "high rack" storage, which is agnostic to metadata and contents
- PanMetaDocs (PMD) as forklift
- written in PHP, based on panmetaworks
- Per project, one PMD instance to control access and metadata contexts
- Syndication via RSS and OAI-PMH also allows the creation of portals for a distributed project
- data is not held in the application, but in the infrastructure
- source code available at sourceforge
escidoc Browser/Admintool
- OUs
- Contexts (but not pubman contexts)
- Users and Roles (but not groups)
- Roles and Scopes
Thursday 27.10.11
developer track humboldt room
breakout session mozart room
thursday 2.10.11
Infrastructure 1. Basic Concepts escidoc
infrastructure since escidoc days 2011
1.3, java infrastructure connector, 1.4 soap removed
apis and libraries
- REST interface
- infrastructure java connector (ijc)
  - maven
  - version 1.3 compatible with 1.4
  - escidoc-ijc version 1.4: soap will be removed
- next versions
  - minor object changes (more abstraction)
  - PHP connector
documentation
- "fairly scattered"
- API fully documented
- many Java examples
  - xml representations
  - java
    - connector
    - rest api calls
infrastructure
- finely grained authorization system
- a collection of services for applications
  - without a further GUI application, not suitable for non-technical users
  - not a relational db.
community
default installation
- java installer
  - installation
  - upgrade
  - limited options for configuration and local optimisations
resources
- xml namespace, href, last-modified-date
- metadata
- object-specific content
- object-specific references to other elements (eg filters)
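The common parts of a resource representation listed above can be illustrated with a small parsing sketch. The XML below is invented for this purpose: the namespace URI, the element names, and the unprefixed href and last-modified-date attributes follow the wording of these notes, not the real eSciDoc schema, which should be checked in the API documentation.

```python
import xml.etree.ElementTree as ET

# Invented resource representation; not the real eSciDoc schema.
SAMPLE_RESOURCE = """<item xmlns="http://example.org/hypothetical/item"
      href="/ir/item/example:1"
      last-modified-date="2011-10-27T09:00:00Z">
  <metadata><title>Sample item</title></metadata>
  <content>object-specific content goes here</content>
</item>"""


def describe_resource(xml_text):
    """Extract the parts every resource representation is said to share."""
    root = ET.fromstring(xml_text)
    # ElementTree stores a namespaced tag as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    return {
        "type": local_name,
        "namespace": namespace,
        "href": root.get("href"),
        "last_modified": root.get("last-modified-date"),
        "children": [child.tag.split("}", 1)[-1] for child in root],
    }


info = describe_resource(SAMPLE_RESOURCE)
print(info)
```

In an update workflow the last-modified value would typically be sent back with the changed representation so the server can detect concurrent edits; that use is an assumption here, not something stated in the session notes.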
Infrastructure 2:
- load samples 1.3.3, 1.4 coreservice