ESciDoc Committer Meeting 2011-02-22

Date: 22.02.2011
Start time: 14:30
End time: 15:30

Location: Karlsruhe, München
Participants MPDL: Natasa Bulatovic, Wilhelm Frank, Michael Franke
Participants FIZ: Michael Hoppe, Steffen Wagner, Harald Kappus

Previous committer meeting
 * ESciDoc_Committer_Meeting_2011-02-15

Next committer meetings
 * ESciDoc_Committer_Meeting_2011-03-01

=Topics=

1.3 Beta release

 * feedback from MPDL: not yet fully tested
 * expected final release
 * expected migration actions
 * Admin tool (downloadable?)
 * experiences with Vaadin
 * Java API client status

Outcome

 * not yet clear when RC or stable release will be finalized
 * MPDL can run tests with a newer release

PostgreSQL connections

 * feedback on the patched version

Outcome

 * Modification of the XPath functions seems to have resolved the memory problems MPDL was having

Issues with integrity

 * item present in list.property, missing from list.item, present in list.fedora
 * would 1.3 support transactions when saving an item to Fedora + riTriples and Lucene?

Outcome

 * no transactions or similar integrity check support planned for 1.3

Support for operation

 * discussed in previous workshop
 * self-sanity and integrity checks
   * e.g. we have objects created by (or modified by) users who no longer exist (per id)
 * some statistical reports gathered internally from the core services
   * e.g. resource and user counts, average size of the items, average size of the fulltexts, MIME type statistics, number of fulltexts not indexed, etc.
 * reindexing
   * incremental:
     * it takes too long to figure out what needs to be reindexed, see https://www.escidoc.org/jira/browse/INFR-1072
     * the chosen option on what to incrementally reindex has no effect
     * see https://www.escidoc.org/jira/browse/INFR-1070 - what would be the approach?
     * message displays on the Admin tool
   * partial reindexing
     * enable the possibility to reindex an exactly selected set of items or other resources
     * Ingest API and reindexing
   * full reindexing
     * would be good to consider creating a separate index database and replacing the files once the full reindexing is finished
   * feedback on PDF extraction
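
The dangling-user check mentioned in the list above could look roughly like the following Python sketch. It assumes that object records and the current user ids have already been exported from the core services. The record layout (`id`, `created_by`, `modified_by`), the function name and the sample ids are hypothetical and are not part of the eSciDoc API.

```python
from collections import defaultdict


def find_dangling_user_refs(objects, existing_user_ids):
    """Group object references by user ids that no longer exist.

    `objects` is an iterable of dicts with the hypothetical keys
    "id", "created_by" and "modified_by".
    Returns {missing_user_id: [(object_id, field_name), ...]}.
    """
    existing = set(existing_user_ids)
    dangling = defaultdict(list)
    for obj in objects:
        for field in ("created_by", "modified_by"):
            user_id = obj.get(field)
            # Only report references that point to an unknown user id.
            if user_id is not None and user_id not in existing:
                dangling[user_id].append((obj["id"], field))
    return dict(dangling)


# Illustrative data only; real ids would come from the repository.
sample_objects = [
    {"id": "item:1", "created_by": "user:1", "modified_by": "user:2"},
    {"id": "item:2", "created_by": "user:3", "modified_by": "user:1"},
]
report = find_dangling_user_refs(sample_objects, ["user:1"])
for user_id, refs in sorted(report.items()):
    print(user_id, refs)
```

Such a report only lists the affected objects. How they should be repaired is a separate decision that the minutes leave open.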

Outcome

 * incremental reindexing
   * candidate discovery can probably be quicker if scan is used instead of search in the "check-for-indexed-objid" part
   * first tests show very good results
   * not in 1.3
 * fulltext reindexing (PDF extraction)
   * MPDL will implement chaining of several PDF tools into a single external indexing tool
   * no possibility to have indexing work in two rounds (metadata first and then fulltexts)
 * reindexing at present
   * the MBean can be configured to process multiple messages, but there are identified issues with the Lucene IndexWriter, so it is set to 1
 * reindexing in 1.3 and future releases
   * the possibility to speed it up was mentioned, e.g. by having separate message beans for each index
   * at the moment this would only be possible for the distinction between synchronous and asynchronous indexes
   * certainly a lot of potential to improve it and make it faster, but this requires a lot of effort
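
The candidate discovery idea for incremental reindexing can be illustrated with a small Python sketch. It assumes that "scan" means reading the indexed object ids in one sorted pass instead of issuing one search per object. That reading, the function name and the sample ids are assumptions made for illustration only, not the actual eSciDoc implementation.

```python
from typing import Iterable, Iterator


def reindex_candidates(repository_ids: Iterable[str],
                       indexed_ids: Iterable[str]) -> Iterator[str]:
    """Yield repository ids that are missing from the index.

    Both inputs must be sorted ascending. The index is walked once
    (the "scan"), so no per-object lookup is needed.
    """
    indexed = iter(indexed_ids)
    current = next(indexed, None)
    for objid in repository_ids:
        # Advance the scan until it reaches or passes the repository id.
        while current is not None and current < objid:
            current = next(indexed, None)
        if current != objid:
            yield objid


# Illustrative ids only.
repo = ["escidoc:1", "escidoc:2", "escidoc:4"]
index = ["escidoc:1", "escidoc:3", "escidoc:4"]
candidates = list(reindex_candidates(repo, index))
print(candidates)
```

The single pass replaces N separate lookups with one ordered read of each list, which is one plausible reason why a scan-based check would be faster.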

Future topics

 * scalability/load test environment (14.12.10)
   * partnering on such tests can be arranged as soon as we have the environment defined
   * MPDL may have some free resources for setting up such an environment next year
 * on-behalf-of deposit (14.12.10)
   * see also new features for v2.0
   * will be addressed in v2.0 (together with the transfer-of-ownership issue)
 * eSciDoc-Colab Page setup
 * installation guides

=Connections=

 * VidCo: IP 192.129.1.132, ISDN 089.38602-595
 * TelCo: ISDN 089.38602-213
 * Phone: Natasa 089.38602-223