ESciDoc Developer Workshop 2009-07-29/30

Date: 29/30.07.2009 Start time: 13:00

Location: Munich

Participants MPDL: Malte Dreyer, Michael Franke, Thomas Endres, Wilhelm Frank, ...

Participants FIZ: Matthias Razum, Harald Kappus, ...

=Agenda=

Discussions

 * What have we achieved?
 * Original eSciDoc proposal:
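 * "Production-ready system" ("Produktionsreifes System")
 * "Integrated publication platform" ("integrierte Publikationsplattform")
 * "Multidisciplinary research environment" ("Multidisziplinäre Forschungsumgebung")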
 * "e-research platform"
 * Aim: Publication management + 3 data collections online
 * eSciDoc is close to that
 * 3 solutions online
 * eLab/eLib taken out
 * Scholarly Workbench: project was transformed by input from many institutes
 * eSciDoc = e-science infrastructure ?
 * VREs: 9 / 50 new project proposals deal with eSciDoc
 * eSciDoc still not connected to grid technologies
 * A lot of attention in public
 * eSciDoc compared to other publicly funded projects (TextGrid, OPUS)


 * Development opinion:
 * Extreme solution transformation (PubMan 1.0 -> PubMan 5.0)
 * Services evolution
 * Rise in interest (e.g. Faces)
 * Things work!
 * GUI advances
 * communication evolution (FIZ <-> MPDL)
 * Proper implementations facilitate many things (e.g. CSS)
 * Good working development environment (Maven, Continuum, Archiva)
 * Large set of functionalities in framework, but still some requirements left.
 * eSciDoc is a team
 * Even small components went through continuous changes
 * Barriers for other users shall be lowered


 * What haven't we achieved?
 * Full feature coverage
 * Community has not yet fully evolved


 * What makes eSciDoc special, compared to other solutions (e.g., Fedora, Islandora, Jackrabbit)?
 * Killer features ?
 * Versioning
 * LTP (long-term preservation)
 * Broad theoretical basis
 * eSciDoc interfaces are better than Fedora's
 * Can the strengths of eSciDoc and Fedora be combined?


 * What important features are still missing?
 * Performance like a database
 * Shibboleth (basic functionality finally working)
 * More solutions
 * Semantic features
 * XML corpora
 * Tools (e.g. for backup, admin)
 * Transactions?


 * What do we want to achieve over the next 12/24 months?
 * 100 PubMan instances
 * 30-40 solutions
 * active development community
 * mature, but not outdated


 * How do we want to evolve from a funded project to a community project?
 * Cooperations with third parties already started
 * Unify platform(s):
 * Source code
 * Documentation (servers, formats)
 * Enhance documentation
 * Check if it is possible to publish video conference colab pages.


 * How do we see the collaboration of MPDL and FIZ in the future?
 * FIZ eSciDoc team stays as it is, but now has new eSciDoc projects
 * try a developer exchange (e.g. swap 1 MPDL developer with 1 FIZ developer for 2 weeks):
 * learn about software
 * learn about workflows and dev. environment

Timing: 1.30 hours

2 Fundamental changes

 * Fundamental changes (mainly derived from duality between database and archive)


 * Ideas from FIZ Team:
 * dropping SOAP?
 * Reason: different XML representation
 * Addressing subresources and external resources
 * Unification of XML difficult
 * SOAP good for objects, we have XML (string)
 * Development effort for SOAP is minimal, but developing the test suite takes considerable effort and leads to long test times
 * MPDL will check


 * Replace atomistic model for Items/Components with compound model and RELS-INT
 * Customized retrieval method would probably work faster with compound objects
 * Also creation of complex items
 * imho, this topic makes no sense until the versioning issue (see Day 2 below) has been properly resolved by Fedora (at least for MPDL); however, FIZ may implement it as an option during eSciDoc core installation or within the content model definition. --Natasa 18:20, 29 July 2009 (UTC)


 * Replace DB-Cache with asynchronous Lucene Index and/or Object Database
 * Item creation is complex
 * Cache is only useful in some cases (only latest version)


 * Idea 1: synchronous Lucene Index
 * Compile XACML2CQL instead of XACML2SQL possible
 * Enhance by latest version
 * Indexing metadata takes about 300ms (without fulltext)
 * Has it been tested how long indexing takes with one fulltext and with several fulltexts? --Natasa 18:17, 29 July 2009 (UTC)


 * Fulltext/related objects come later
 * therefore we need not take a decision on this one before proper evaluation and tests, as it requires completely new development --Natasa 18:17, 29 July 2009 (UTC)
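
Idea 1 (metadata indexed synchronously, fulltext and related objects later) can be illustrated with a minimal sketch. It is not the planned Lucene implementation: plain dictionaries stand in for the index, and the class name `TwoStageIndexer`, its methods and the id `item:1` are hypothetical names chosen for this illustration.

```python
import queue
import threading


class TwoStageIndexer:
    """Index metadata synchronously; defer fulltext to a background worker."""

    def __init__(self):
        self.metadata_index = {}   # item id -> metadata dict (stand-in for the index)
        self.fulltext_index = {}   # item id -> normalized fulltext
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def store(self, item_id, metadata, fulltext=None):
        # Synchronous part: metadata is searchable as soon as store() returns.
        self.metadata_index[item_id] = dict(metadata)
        # Asynchronous part: fulltext is only queued here and indexed later.
        if fulltext is not None:
            self._pending.put((item_id, fulltext))

    def _drain(self):
        # Background loop that indexes queued fulltexts one by one.
        while True:
            item_id, text = self._pending.get()
            self.fulltext_index[item_id] = text.lower()
            self._pending.task_done()

    def wait_for_fulltext(self):
        # Block until every queued fulltext has been indexed.
        self._pending.join()
```

In this sketch a caller sees the metadata immediately after `store()`, while the fulltext only becomes visible once the worker has caught up. A real evaluation would need to measure exactly that delay.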

 * DB-Cache should be addressed by a joint working group
 * Idea 2: Persistent data objects in relational DB
 * Can deliver partial objects or different representations
 * Might serve for batch operation
 * Will be slower
 * Will be mandatory (current cache is optional)
 * Drop latest-version from object representation
 * Representations cached with latest-release and latest-version
 * When a new version is created the cached items are either invalid or have to be updated
 * Might be no problem when DB-cache issue is solved
 * imho, the problem is not purely the idea of the DB-Cache, as one sees no dramatic difference between idea 1, idea 2 and the current cache, with the exception that the DB-Cache is fastest. The problems with the latest-version and latest-release info are not a problem of the cache, but of how these two are managed throughout the eSciDoc core architecture. I would also like to remind everyone that the cache implementation is still partial, therefore the proposal for a joint cache working group is very reasonable --Natasa 18:13, 29 July 2009 (UTC)
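
The point about cached representations under the latest-release and latest-version aliases can be sketched as follows. This shows only the "update on new version" option, not the current DB-Cache; the class `RepresentationCache`, its methods and the ids are assumptions made for illustration.

```python
class RepresentationCache:
    """Cache item representations under two aliases per item."""

    def __init__(self):
        # (item id, alias) -> (version number, representation)
        self._entries = {}

    def put_version(self, item_id, version, representation, released=False):
        # Every new version replaces the cached "latest-version" entry.
        self._entries[(item_id, "latest-version")] = (version, representation)
        # Only a released version also replaces the "latest-release" entry.
        if released:
            self._entries[(item_id, "latest-release")] = (version, representation)

    def get(self, item_id, alias):
        # Returns None when nothing is cached for this item and alias.
        return self._entries.get((item_id, alias))
```

The alternative discussed above, invalidating instead of updating, would delete the entry in `put_version` and rebuild it on the next read.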


 * Remove mapping of "escidoc" MD-record to DC record in Components (set title directly)
 * Seems to be too complicated for use case where users do not want dc metadata
 * Will be checked by MPDL


 * Get rid of content-model-specific properties (my personal "Ceterum censeo Carthaginem esse delendam" ;-)
 * Will be checked by MPDL


 * issues/problems in the current core architecture (input from MPDL team, FIZ Team)
 * issues/problems in the current solutions architecture (input from FIZ team, MPDL Team)

3 hours

Specifics

 * Search and administrative search (additionally date indexes)
 * Search: Examples needed to define required handling of dates like " > 200904 "...
 * Administrative search:
 * Idea: Synchronous indexing of metadata, asynchronous indexing of fulltexts and linked information
 * AA will be integrated like in current DB-cache (in CQL).
 * FIZ will prototype the planned administrative search
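
As one possible example for the date handling question raised under Search, the sketch below treats a partial date such as `200904` as the whole period it names, so that `> 200904` matches only dates after 30 April 2009. This interpretation is an assumption, not an agreed rule, and the function names are hypothetical.

```python
import calendar
from datetime import date


def partial_date_range(text):
    """Expand 'YYYY', 'YYYYMM' or 'YYYYMMDD' into an inclusive (first, last) day range."""
    text = text.strip()
    if not text.isdigit() or len(text) not in (4, 6, 8):
        raise ValueError(f"unsupported partial date: {text!r}")
    year = int(text[:4])
    if len(text) == 4:
        return date(year, 1, 1), date(year, 12, 31)
    month = int(text[4:6])
    if len(text) == 6:
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    day = int(text[6:8])
    return date(year, month, day), date(year, month, day)


def is_after(candidate, partial):
    """True if candidate lies after the whole period named by the partial date."""
    _, last = partial_date_range(partial)
    return candidate > last
```

Other readings are equally defensible, for example comparing against the first day of the period, which is why concrete examples are needed before the behaviour is fixed.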


 * Admin Tools development
 * A joint admin tool shall be developed
 * It has to be checked:
 * What are the functional requirements (colab page)
 * How should the development be organized ("Community")
 * What technology shall be used


 * Ingest of large data sets: how to avoid downtime for recaching and reindexing
 * FIZ will implement the following:
 * In addition to the existing recache/reindex methods there will be methods that work without deleting all items in advance
 * The ingest tool will be able to remember the list of ingested items and index/cache them on user request

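
The two planned changes (reindexing without deleting everything first, and an ingest tool that remembers what it ingested) could look roughly like the sketch below. It does not describe the FIZ implementation: dictionaries stand in for the repository and the index, and `IngestSession` and its methods are hypothetical names.

```python
class IngestSession:
    """Remember ingested item ids so that only those are indexed later."""

    def __init__(self, index):
        self.index = index        # shared index: item id -> indexed record
        self.repository = {}      # stand-in for the object store
        self.ingested_ids = []    # remembered until the user triggers indexing

    def ingest(self, item_id, record):
        # Store the object but do not touch the index yet.
        self.repository[item_id] = record
        self.ingested_ids.append(item_id)

    def reindex_ingested(self):
        # Index only the remembered items; existing entries stay searchable.
        for item_id in self.ingested_ids:
            self.index[item_id] = self.repository[item_id]
        count = len(self.ingested_ids)
        self.ingested_ids = []
        return count
```

Because the existing index is never cleared, searches keep working during a large ingest, and the indexing step runs only when requested.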

 * Trying to add/remove members to a very large container fails with 500 Internal eSciDoc System Error
 * No quick solution from Fedora
 * Possible solution: Temporarily deactivate versioning (i.e. the system can behave as if within a transaction); this does not solve all problems
 * Different use cases for ViRR and Faces
 * Rethink again

2 hours

Other

 * Alignment of tools and processes (e.g., Maven)
 * Possible joint code-base, deployment configurations such as e.g. PubMan+core only, Faces+core Only, core only etc. ? --Natasa 10:11, 21 July 2009 (UTC)
 * FIZ will build up Maven infrastructure. MPDL and FIZ will short-cut.
 * MPDL will enable solutions to run together with core-services
 * Project module documentation will be merged onto one server (e.g. new parent pom for "Services")


 * Improved and harmonized communication of eSciDoc
 * which information goes to Colab, which to escidoc.org?
 * MPDL will set up a new wiki for eSciDoc with the look&feel of escidoc.org
 * eSciDoc contents will be migrated from Colab
 * = http://wiki.escidoc.org


 * eSciDoc Blog? --Natasa 10:12, 21 July 2009 (UTC)
 * General news, conference reports, thoughts
 * MPDL will set up the blog and adapt GUI to eSciDoc


 * service names and classification
 * Voting: services will be divided into two groups:
 * Core services (Item, Container, Context, Content model, OU, AA, Search, Statistics, Admin)
 * Common services (Naming?)
 * "Manager" grouping will be abandoned


 * service-architecture board (but also important for bringing to a community project)
 * architectural decisions will be published on Colab and can be commented on by the community
 * eSciDoc technical board (starts with FIZ + MPDL)
 * Service development life cycle


 * documentation of services
 * clean-up of unused and untested methods from the documentation --Natasa 10:19, 21 July 2009 (UTC)
 * These methods will remain in the documentation, but will be marked as "deprecated" or "experimental"


 * installation guides
 * yes
 * Naming conventions will be applied
 * Look&feel will be unified


 * eSciDoc Lab: Colab page gathering experimental modules


 * Exchange of staff members for specific developments
 * Potential candidates:
 * Admin tool
 * DB-cache, administrative search
 * Solr
 * select mixed smaller team for specific developments --Natasa 10:18, 21 July 2009 (UTC)

Planning

 * short-term, 6 months (by end of the year): MPDL / FIZ to discuss the roadmap in more detail --Natasa 10:14, 21 July 2009 (UTC)
 * see https://www.escidoc.org/jira/browse/INFR?report=com.atlassian.jira.plugin.system.project%3Aroadmap-panel
 * non-functional release (1.3?)
 * DB-cache
 * Code/performance review
 * Test Mulgara