eSciDoc Committer Meeting 2011-02-22


Date: 22.02.2011
Start time: 14:30
End time: 15:30

Location: Karlsruhe, München
Participants MPDL: Natasa Bulatovic, Wilhelm Frank, Michael Franke
Participants FIZ: Michael Hoppe, Steffen Wagner, Harald Kappus

Previous committer meeting

Next committer meetings


1.3 Beta release

  • feedback MPDL - not yet tested fully
  • expected final release
  • expected migration actions
  • Admin tool (downloadable?)
    • experiences with Vaadin
  • Java API client status


  • not yet clear when RC or stable release will be finalized
  • MPDL can try tests with newer release

postgres connections

  • feedback on patched version


  • Modification of the XPath functions seems to have resolved the memory problems MPDL was having

Issues with integrity

  • Item in list.property, not in list.item, in list.fedora
  • would 1.3 support transactions when saving item to Fedora+riTriples, Lucene?


  • no transactions or similar integrity check support planned for 1.3
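Since no transactional support is planned, an inconsistency like the one above could in principle be detected after the fact by comparing the object ids known to each store. The following is a minimal sketch of that idea only; the function name `find_inconsistencies` and the store labels are hypothetical, and how the ids are actually obtained from each store is left out.

```python
def find_inconsistencies(stores):
    """Report ids that are missing from at least one store.

    `stores` maps a store label (hypothetical, e.g. "property", "item",
    "fedora") to an iterable of object ids found in that store.
    Returns a dict: object id -> sorted list of labels where it is missing.
    """
    id_sets = {name: set(ids) for name, ids in stores.items()}
    all_ids = set().union(*id_sets.values())
    report = {}
    for objid in sorted(all_ids):
        missing = sorted(name for name, ids in id_sets.items() if objid not in ids)
        if missing:
            report[objid] = missing
    return report


if __name__ == "__main__":
    # Made-up ids, purely for illustration.
    example = {
        "property": ["escidoc:1", "escidoc:2"],
        "item": ["escidoc:1"],
        "fedora": ["escidoc:1", "escidoc:2"],
    }
    print(find_inconsistencies(example))
```

Such a check only reports the mismatch; deciding which store is authoritative and how to repair the others is a separate question.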

Support for operation

  • discussed in previous workshop
    • self-sanity and integrity checks
      • e.g. we have objects created by (or modified by) users which no longer exist (per id)
    • some statistical reports internally gathered from the core services
      • e.g. resource and user counts, average size of the items, average size of the fulltexts, mimetype statistics, number of fulltexts not yet indexed etc.
  • reindexing
    • incremental:
    • message displays on the Admin tool
    • partial reindexing
      • enable the possibility to re-index an exactly selected set of items or other resources
    • Ingest API and reindexing
    • full reindexing
      • would be good to consider creation of separate index database and replacement of the files when the full reindexing is finalized
    • feedback on PDF extraction
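The suggestion for full reindexing (build a separate index, then replace the files once it is finished) could look roughly like the sketch below. It is an illustration under assumptions, not the eSciDoc implementation: `rebuild_and_swap` and `build_index` are hypothetical names, the index is assumed to be a plain directory, and the staging directory is created next to it so that the renames stay on one filesystem.

```python
import os
import shutil
import tempfile


def rebuild_and_swap(live_dir, build_index):
    """Build a fresh index in a staging directory, then swap it into place.

    `build_index(path)` is a hypothetical callback that writes the complete
    new index into `path`. The live index stays untouched until the build
    has finished; if the build fails, the staging directory is discarded.
    """
    live = os.path.abspath(live_dir)
    parent = os.path.dirname(live)
    staging = tempfile.mkdtemp(prefix="index-staging-", dir=parent)
    try:
        build_index(staging)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise

    backup = live + ".old"
    if os.path.exists(backup):
        shutil.rmtree(backup)
    if os.path.exists(live):
        os.rename(live, backup)      # move the old index out of the way
    os.rename(staging, live)         # put the new index in place
    shutil.rmtree(backup, ignore_errors=True)
```

There is a short window between the two renames in which no index directory exists, so a real deployment would still need to pause or retry searches during the swap.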


  • incremental reindexing
    • candidate discovery can probably be quicker if scan is used instead of search in the "check-for-indexed-objid" part
    • first tests show very good results
    • not in 1.3
  • fulltext reindexing (PDF extraction)
    • MPDL will implement chaining of several PDF tools into a single external indexing tool
    • no possibility to have indexing working in two rounds (metadata first and then fulltexts)
  • reindexing at present
    • MBean can be configured to process multiple messages, but there are identified issues with the Lucene IndexWriter, therefore it is set to 1
  • reindexing in 1.3 and future releases
    • mentioned possibility to speed up, e.g. have separate message beans for each index
    • at the moment this would only be possible for distinction between synchronous and asynchronous indexes
    • certainly a lot of potential to improve it and make it faster - but requires a lot of effort
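The point about candidate discovery for incremental reindexing can be illustrated with a small sketch. It assumes that "search" means one index lookup per object id and "scan" means enumerating all indexed object ids in a single pass; that reading, and the names `is_indexed` and `scan_indexed_ids`, are assumptions for illustration and not eSciDoc APIs.

```python
def find_candidates_by_search(repository_ids, is_indexed):
    """One index lookup per object: N round trips for N objects."""
    return [objid for objid in repository_ids if not is_indexed(objid)]


def find_candidates_by_scan(repository_ids, scan_indexed_ids):
    """Enumerate the indexed ids once, then compare in memory."""
    indexed = set(scan_indexed_ids())
    return [objid for objid in repository_ids if objid not in indexed]


if __name__ == "__main__":
    # Made-up ids, purely for illustration.
    repo = ["escidoc:1", "escidoc:2", "escidoc:3"]
    index = ["escidoc:1", "escidoc:3"]
    print(find_candidates_by_search(repo, lambda objid: objid in index))
    print(find_candidates_by_scan(repo, lambda: iter(index)))
```

Both variants return the same candidates; the scan variant trades many small index requests for one larger one plus an in-memory set of ids.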

future topics

  • scalability/load tests environment (14.12.10)
    • partnering on such tests can be set up as soon as the environment is defined
    • MPDL may have some free resources for setting up such an environment next year
  • On Behalf-of deposit (14.12.10)
  • eSciDoc-Colab Page setup
  • installation guides


VidCo: ip ISDN 089.38602-595
TelCo: ISDN 089.38602-213
phone: Natasa 089.38602-223