eSciDoc Committer Meeting 2011-02-22
Date: 22.02.2011
Start time: 14:30
End time: 15:30
Location: Karlsruhe, München
Participants MPDL: Natasa Bulatovic, Wilhelm Frank, Michael Franke
Participants FIZ: Michael Hoppe, Steffen Wagner, Harald Kappus
Topics
1.3 Beta release
- feedback from MPDL: not yet fully tested
- expected final release
- expected migration actions
- Admin tool (downloadable?)
- experiences with Vaadin
- Java API client status
Outcome
- not yet clear when the RC or stable release will be finalized
- MPDL can run tests with a newer release
Postgres connections
- feedback on the patched version
Outcome
- Modification of the XPath functions seems to have resolved the memory problems MPDL was having
Issues with integrity
- Item in list.property, not in list.item, in list.fedora
- would 1.3 support transactions when saving an item to Fedora+riTriples, Lucene?
Outcome
- no transactions or similar integrity check support planned for 1.3
Support for operation
- discussed in previous workshop
- self-sanity and integrity checks
- e.g. we have objects created by (or modified by) users which no longer exist (per id)
- some statistical reports internally gathered from the core services
- e.g. resource and user counts, average size of the items, average size of the fulltexts, mimetype statistics, number of fulltexts not indexed, etc.
- reindexing
- incremental:
- it takes too long to figure out what needs to be reindexed, see https://www.escidoc.org/jira/browse/INFR-1072
- the chosen option on what to incrementally reindex has no effect
- see https://www.escidoc.org/jira/browse/INFR-1070; what would be the approach?
- message displays on the Admin tool
- partial reindexing
- enable the possibility to re-index an exactly selected set of items or other resources
- Ingest API and reindexing
- full reindexing
- it would be good to consider creating a separate index database and replacing the files once the full reindexing is finished
- feedback on PDF extraction
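The full-reindexing suggestion above (build a separate index, then replace the files once reindexing is finished) could look roughly like the following Java sketch. The class name IndexSwap, the method swapIn and the ".old" backup directory are hypothetical and not part of eSciDoc. The sketch assumes that both directories are on the same file system and that the caller closes and reopens any index readers and writers around the swap.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: swaps a freshly built index directory into place. */
public class IndexSwap {

    /**
     * Moves the live index aside and moves the fresh index into its place.
     * Each move is a rename on the same file system, so the live path is
     * missing only for the short moment between the two renames.
     *
     * @return the directory that now holds the previous index
     */
    public static Path swapIn(Path liveIndex, Path freshIndex) throws IOException {
        Path backup = liveIndex.resolveSibling(liveIndex.getFileName() + ".old");
        // Drop a backup left over from an earlier swap, if there is one.
        deleteRecursively(backup);
        if (Files.exists(liveIndex)) {
            Files.move(liveIndex, backup, StandardCopyOption.ATOMIC_MOVE);
        }
        Files.move(freshIndex, liveIndex, StandardCopyOption.ATOMIC_MOVE);
        return backup;
    }

    /** Deletes a directory tree, children before parents; no-op if absent. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts the deepest entries first.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Keeping the previous index as a backup allows a rollback if the new index turns out to be incomplete; whether that is wanted would need to be decided separately.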
Outcome
- incremental reindexing
- candidate discovery can probably be faster if scan is used instead of search in the "check-for-indexed-objid" part
- first tests show very good results
- not in 1.3
- fulltext reindexing (PDF extraction)
- MPDL will implement chaining of several PDF tools into a single external indexing tool
- no possibility to have indexing working in two rounds (metadata first and then fulltexts)
- reindexing at present
- the MBean can be configured to process multiple messages, but there are known issues with the Lucene IndexWriter, so it is set to 1
- reindexing in 1.3 and future releases
- mentioned possibility to speed it up, e.g. by having separate message beans for each index
- at the moment this would only be possible for the distinction between synchronous and asynchronous indexes
- certainly a lot of potential to improve it and make it faster, but this requires a lot of effort
Future topics
- scalability/load tests environment (14.12.10)
- partnering on such tests can be arranged as soon as the environment is defined
- MPDL may have some free resources for setting up such an environment next year
- On Behalf-of deposit (14.12.10)
- see also new features for v2.0
- will be addressed in v2.0 (together with the ownership / transfer-of-ownership issue)
- eSciDoc-Colab Page setup
- installation guides
Connections
VidCo: ip 192.129.1.132 ISDN 089.38602-595
TelCo: ISDN 089.38602-213
phone: Natasa 089.38602-223