ESciDoc Developer Workshop 2008-11-18


Date: 2008-11-18 Start time: 15:00 (moved from 14:30 for this day only)

Location: Karlsruhe, München (Video conference)

Participants MPDL: Wilhelm Frank, Natasa Bulatovic

Participants FIZ: Harald Kappus, Matthias Razum

Previous workshop

 * ESciDoc_Developer_Workshop_2008-11-04

Next workshop

 * ESciDoc_Developer_Workshop_2008-11-24-25

=Agenda=

Filter issues
('/md-records/md-record/publication/creator/person/family-name', '/md-records/md-record/publication/creator/organization/organization-name')
 * Sorting by multi-valued metadata such as "/creator/family-name"
 * Applying wildcard characters in filter criteria
 * Enabling OR between multiple occurrences of the same filter parameter
 * Enabling functions in sort criteria for multi-valued metadata
 * Problem:
 * Creators can be sorted by either creator/person/complete-name or creator/organization/name (depending on which one exists)
 * so in effect this is a compound index that would have to be expressed in the filter query.
 * ran some tests; at first glance this looks straightforward if we allow the single filter criterion to be specified as


 * Note: sorting on a lower-cased index gives correct results for European characters
 * check cache behaviour for Japanese characters
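The compound sort key described under "Problem" can be sketched as follows. This is only an illustration under stated assumptions: the item structure, the dictionary keys and the rule "use the smallest creator name of an item" are hypothetical and are not taken from the eSciDoc filter implementation.

```python
# Hypothetical, simplified records: each creator is either a person or an
# organization, mirroring the two metadata paths discussed above.
def creator_sort_key(creator: dict) -> str:
    """Return the family name if present, otherwise the organization name."""
    name = creator.get("family-name") or creator.get("organization-name") or ""
    # casefold() gives a case-insensitive key; it does not apply
    # language-specific collation rules.
    return name.casefold()


def item_sort_key(item: dict) -> str:
    """Reduce a multi-valued field to one key (assumed rule: smallest value)."""
    keys = [creator_sort_key(c) for c in item.get("creators", [])]
    return min(keys) if keys else ""


items = [
    {"id": "item:1", "creators": [{"family-name": "Zimmer"}, {"family-name": "Bauer"}]},
    {"id": "item:2", "creators": [{"organization-name": "Max Planck Society"}]},
    {"id": "item:3", "creators": [{"family-name": "adler"}]},
]

sorted_ids = [item["id"] for item in sorted(items, key=item_sort_key)]
print(sorted_ids)
```

Items without any creator get an empty key and therefore sort first; whether that is the desired behaviour is one of the requirements that still has to be clarified.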

Outcome

 * topic is important for André
 * important to understand the requirements
 * MPDL can send more Japanese data for test purposes

Duplicate checking

 * must the item already be created in order to run a duplicate check?
 * can we provide only the XML of an item (not yet created) to get possible duplicates?
 * can we have duplicate checking for "pending" items?

Outcome

 * This is one of André's tasks, but he currently works on integrating Group Policies into the Cache (after returning from his vacation)
 * Input from Matthias:
 * takes the full text (FT) and very little metadata, extracts the text from the FT and creates hash values for 7-word groups
 * formats: same as for FT indexes (ASCII, PDF, Word?, XML, HTML?)
 * FT is fine at present, but MD is not yet - there is a much bigger issue in deciding functionally which metadata are actually candidates for distinguishing separate objects from duplicates.
 * 7-word groups will probably not work for metadata elements, so we will have to identify (and implement) better-suited algorithms for metadata elements.
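A minimal sketch of the 7-word-group idea mentioned by Matthias, assuming overlapping word groups, SHA-1 hashes and a Jaccard-style overlap score. All three choices are assumptions made for illustration; the notes do not say how the actual duplicate checker forms or compares its groups.

```python
import hashlib
import re


def word_group_hashes(text: str, group_size: int = 7) -> set[str]:
    """Hash every run of `group_size` consecutive words in the text."""
    words = re.findall(r"\w+", text.casefold())
    if len(words) < group_size:
        return set()  # too short to form even one group
    return {
        hashlib.sha1(" ".join(words[i:i + group_size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - group_size + 1)
    }


def overlap(a: str, b: str, group_size: int = 7) -> float:
    """Share of word-group hashes the two texts have in common (0.0 to 1.0)."""
    ha = word_group_hashes(a, group_size)
    hb = word_group_hashes(b, group_size)
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)


full_text = "the quick brown fox jumps over the lazy dog near the river bank"
print(overlap(full_text, full_text))
print(overlap("A short title", "A short title"))
```

The second call returns 0.0 because a three-word string never forms a 7-word group, which illustrates why this approach is unlikely to work for short metadata elements.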

Handles
"We currently have Eric Auer from Nijmegen with us, who is helping us with the design and setup of services based on the Handle System. Since the MPDL will be the first serious user of the system, it would be useful for us to get as precise a description as possible of the client-side service interface, so that we can adapt the services to be implemented to it. Perhaps you already have specifications for this that we can adopt."
 * Quoted from the mail from Ulrich Schwardmann

We pointed to the PID Manager description on the escidoc-project pages
 * will this suffice, or is there more information that needs to be provided?
 * who will be the contact from FIZ?

Outcome

 * Steffen is the contact at FIZ
 * documentation is included in the ZIP file available from the download page: http://www.escidoc-project.de/software/builds/stable-releases/latest_release/escidoc-pid-manager-release.zip

Tests after migration of data from eDoc

 * Axis reports OutOfMemory errors when we retrieve items (total number of items: 2700)
 * any experience with this on the FIZ side?
 * this is an issue with the export of items

Tip

 * check JBoss memory configuration
 * check the recommendations on the wiki pages
 * may be easier to handle with REST

Content Model

 * see concept and definition at ESciDoc_Content_Models
 * see implementation and example object at ESciDoc_Content_Model_Object

JHOVE integration

 * discussion at ESciDoc_JHove_Integration

Default MD-Records

 * discussion at ESciDoc_Metadata_Records_Manipulation

Outcome

 * Not really happy with this as eSciDoc-core is resource-oriented and should be able to deliver complete resources
 * FIZ needs to check the concept proposal on the linked page first (not done yet)

TOC

 * discussion at ESciDoc_Toc

Outcome

 * Natasa will put the page on Colab

Ontology Manager

 * see ESciDoc_Content_Relations

OAI-PMH

 * MPDL: set up requirements for sets

Outcome

 * what would be good criteria for defining sets?
 * what kind of sets would we like to expose externally?
 * we need to be able to check whether a set definition can be fulfilled by filters
 * FIZ will check whether it is able to expose filters as sets
 * MPDL will refine the requirements and put them on Colab
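The idea of exposing filters as OAI-PMH sets could look roughly like the sketch below. The set names, the item properties and the representation of a filter as a Python predicate are all hypothetical; they only illustrate how a set definition might be checked against filter criteria.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]
Filter = Callable[[Item], bool]

# Hypothetical set definitions: OAI-PMH setSpec -> filter predicate.
SETS: Dict[str, Filter] = {
    "context:pubman": lambda item: item.get("context") == "pubman",
    "status:released": lambda item: item.get("status") == "released",
}


def set_specs_for(item: Item) -> List[str]:
    """List every set whose filter matches the item."""
    return sorted(spec for spec, matches in SETS.items() if matches(item))


def items_in_set(items: List[Item], set_spec: str) -> List[Item]:
    """Select the items that belong to one set."""
    matches = SETS[set_spec]
    return [item for item in items if matches(item)]


items = [
    {"id": "item:1", "context": "pubman", "status": "released"},
    {"id": "item:2", "context": "pubman", "status": "pending"},
]
print(set_specs_for(items[0]))
print([item["id"] for item in items_in_set(items, "status:released")])
```

A set definition that cannot be written as such a predicate over filterable properties would be a candidate for the requirements list on Colab.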

Extras

 * Evil item from Matthias:
 * PubMan binding into same JBoss (Tom)
 * check/change timelines again
 * R4 by end of the year


 * PRONOM identifier: Hard-coded example object
 * At present it is not available
 * Should be within the technical metadata extraction service


 * Group policies, group handler (Michael, André, Rozita)
 * support for Japanese in search (Michael)
 * no resources to work on administrative searches this year (Michael, André) unless support for Japanese search is rescheduled
 * check on wildcards in filter criteria?
 * performance issues
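For the wildcard and OR questions raised under "Filter issues", the intended semantics could be sketched like this: OR between several patterns of the same filter parameter, AND between different parameters. The record layout and the shell-style wildcard syntax (`*`, `?`) are assumptions for illustration, not the syntax of the eSciDoc filter interface.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def matches(record: Dict[str, List[str]], criteria: Dict[str, List[str]]) -> bool:
    """AND across different parameters, OR across patterns of one parameter."""
    for path, patterns in criteria.items():
        # Compare case-insensitively by casefolding both values and patterns.
        values = [v.casefold() for v in record.get(path, [])]
        if not any(fnmatchcase(v, p.casefold()) for v in values for p in patterns):
            return False
    return True


record = {"family-name": ["Bulatovic", "Frank"], "status": ["released"]}

print(matches(record, {"family-name": ["bul*", "raz*"]}))
print(matches(record, {"family-name": ["raz*"], "status": ["released"]}))
```

Because every value is compared with every pattern, the cost grows with the number of values and patterns, which ties in with the performance concern noted above.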