Difference between revisions of "ESciDoc Developer Workshop 14 15 07 2011"

Line 40:
**so practically Solr can be set up now, but it takes a lot of configuration
**todo: send config and test locally

==Future development plans==
*Future development plans - short term roadmap on versions/features for release 1.4 to 2.0
**critical: Internal managed vs. externally managed datastreams of MD records
Line 46:
***https://www.escidoc.org/jira/browse/INFR-947
***https://www.escidoc.org/jira/browse/INFR-1190

*Admin tool 1.3 offers only repository information

Revision as of 12:18, 14 July 2011

Developer Workshop

Participants MPDL

Participants FIZ

  • Steffen Wagner
  • Michael Hoppe
  • Christian Herlambang
  • Matthias Razum

Agenda 14.07.2011

Fulltext indexing

    • enhanced with own XSLT - questions from MPDL related to:
    • configuration of the search results output (rather than the complete item/component/container)
    • highlighting of search results (e.g. get the last page break tag)
    • fulltext indexing for all FT visibility levels, with searching and snippet display according to privileges
    • selective indexing from the Admin tools
    • incremental indexing
  • Solr support and interfaces

Outcome on Fulltext indexing

  • current approach is not bad: one document is generated with many file-highlight fields - but it has to be checked whether the number of highlighted fields is limited to 100, and what the performance implications are
    • performance issues could also be caused by highlighting during indexing, but mostly by search performance itself
    • more input from FIZ after analysis, as the problem is clear
  • fulltext search
    • search returns the items for which the user has privileges, but when searching fulltext with restricted privileges: if the user has rights on one fulltext of an item and not on another, she will get highlights from both.
      • workaround: always exclude fulltext highlighting for non-public texts, or include visibility when populating the highlighting (again with a potential performance cost)
  • selective indexing
    • would be good to prioritize it (from October)
    • todo: send more info on internal script
    • indexing fulltext only or metadata only (in selective reindexing) would not be possible (unless the indexes are split)
  • incremental reindexing - does not work in 1.2 (to be checked in 1.3)
  • Solr
    • GSearch can index via Solr
    • we have to create an XML document that Solr understands (similar to the one for Lucene)
    • GSearch can do it
    • however, many Solr settings have to be made, i.e. the configuration of specific fields
    • so practically Solr can be set up now, but it takes a lot of configuration
    • todo: send config and test locally
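
The Solr and highlighting points above can be illustrated with a minimal sketch. It builds one document in Solr's XML update format (`<add><doc><field name="...">`), with every fulltext searchable but only public fulltexts copied into a highlight field, following the workaround discussed above. The function name, the field names (`id`, `title`, `fulltext`, `highlight`) and the visibility value `public` are assumptions made for illustration; the real names would have to match the Solr schema and the output of the indexing XSLT, which are not specified in these notes.

```python
import xml.etree.ElementTree as ET


def build_solr_add_doc(item_id, title, fulltexts):
    """Build a Solr <add> XML message for one item.

    fulltexts is a list of (visibility, text) tuples, one per
    fulltext component. All names here are hypothetical.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = item_id
    ET.SubElement(doc, "field", name="title").text = title
    for visibility, text in fulltexts:
        # Every fulltext is indexed for searching.
        ET.SubElement(doc, "field", name="fulltext").text = text
        # Only public fulltexts are stored for highlighting, so that
        # snippets of restricted files are never returned.
        if visibility == "public":
            ET.SubElement(doc, "field", name="highlight").text = text
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_solr_add_doc(
        "item-1",
        "Example item",
        [("public", "open text"), ("private", "restricted text")],
    ))
```

The resulting XML would then be posted to the Solr update handler. Both `fulltext` and `highlight` would need to be declared as multi-valued fields in the Solr schema, which is part of the configuration effort mentioned above.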

Future development plans


  • Admin tool 1.3 offers only repository information
  • Digilib integration
    • plans, ideas, replacements?
  • Scalability & Performance
    • creation of items (MPDL provides some numbers)
    • reindexing
    • statistics - a separate store, faster and not dependent on escidoc-core and Fedora?
  • stress testing, mass data generation, monitoring of the core service - how it is done internally at FIZ, with reference to the FIZ Fedora Performance and Scalability Wiki
  • JBoss, other AS, Tomcat
    • supporting newer versions of JBoss
    • support for other AS
    • Tomcat
  • LTA (long-term archiving)
  • workflow settings (content model, context?)
    • items immediately released (with proper indexing afterwards) -> one call to the service
    • item event log (update, insert comments?)
  • Content Models
    • Any plans from MPDL side?
    • Pragmatic and iterative approach - some ideas


  • SPO

Agenda 15.07.2011