PubMan 7 7

This page records every change made during a QA release of the version named above. If it's not here, it never happened!

= PubMan 7.7 Release =

== Core Infrastructure ==

 * copy jboss-log4j.xml from srv11 /home/siedersleben/escidoc-core-1.3.10-For-Release7.7 to JBOSS_HOME/server/default/conf
 * copy escidoc-core-1.3.10-SNAPSHOT-build72.ear, fedoragsearch.war, srw.war from /home/siedersleben/escidoc-core-1.3.10-For-Release7.7 to JBOSS_HOME/server/default/deploy
 * chown jboss:jboss ....

== Core Properties ==
 * escidoc-core.properties: remove the PDF extractor properties from escidoc-core.properties and move the corresponding properties to fedoragsearch.properties (gs: changes added on srv11 in escidoc-core.properties.release7.7)

gsearch.ignoreTextExtractionErrors = true
gsearch.escidoc.indexingStylesheet = http://coreservice.mpdl.mpg.de/mpdlEscidocXmlToLucene.xslt
gsearch.escidocou.indexingStylesheet =
gsearch.pdfTextExtractorCommand = /usr/bin/java -classpath /usr/share/jboss/server/default/conf/pdf-extraction/classes:/usr/share/jboss/server/default/conf/pdf-extraction/lib/iText-5.0.6.jar de.mpg.escidoc.services.extraction.ExtractionChain
lucene.analyzer = de.escidoc.sb.common.lucene.analyzer.EscidocAnalyzer
 * 1) true|false: defines what happens if an Exception occurs while extracting the text from a PDF for indexing
 * 2) if set to true, Exception is ignored and object is indexed without the fulltext.
 * 3) if set to false, Exception is thrown and object is not indexed at all.
 * 1) Location of the indexingStylesheet that generates the indexInformation-Document for gsearch-indexing.
 * 2) has to be a URL
 * 3) currently the eSciDoc-Core-Infrastructure provides 2 index-databases: escidoc_all and escidocou_all
 * 4) stylesheet-path-property for index escidoc_all is gsearch.escidoc.indexingStylesheet
 * 5) stylesheet-path-property for index escidocou_all is gsearch.escidocou.indexingStylesheet
 * 6) gsearch.escidoc.indexingStylesheet = http://escidoc1.escidoc.mpg.de/resources/searchIndexDefinition/mpdlEscidocXmlToLucene_1.2.xslt
 * 1) if pdfBox (internally used by gsearch to extract text from pdfs) is not working well for your pdfs,
 * 2) define a command-line command for a custom PDF text extractor (has to be installed separately)
 * 3) define command with full path, define inputfile with and outputfile with
 * 4) example: C:/Programme/xpdf-3.02pl2-win32/pdftotext -cfg C:/Programme/xpdf-3.02pl2-win32/xpdfrc
 * 5) gsearch.pdfTextExtractorCommand = /usr/bin/pdftotext -cfg /etc/xpdfrc
 * 1) Analyzer to use for indexing and search


 * add new property for skipping reindex to escidoc-core.properties (gs: already in escidoc-core.properties.release7.7)

escidoc-core.skip.notify.indexer.methods = assignObjectPid, assignVersionPid
 * 1) Comma-separated list of method names for which automatic indexing is skipped


 * fedoragsearch.properties: add the following properties to JBOSS_HOME/conf/search/config/fedoragsearch.properties (gs: changes added on srv11 in fedoragsearch.properties.release7.7)

fedoragsearch.pdfTextExtractorCommand=/usr/bin/java -classpath /usr/share/jboss/server/default/conf/pdf-extraction/classes:/usr/share/jboss/server/default/conf/pdf-extraction/lib/itextpdf-5.5.1.jar de.mpg.escidoc.services.extraction.ExtractionChain
 * 1) if pdfBox (internally used by gsearch to extract text from pdfs) is not working well for your pdfs,
 * 2) use a command-line tool.
 * 3) If you want to use a command-line tool,
 * 4) define a command-line command for a custom PDF text extractor (has to be installed separately)
 * 5) define command with full path, define inputfile with and outputfile with
 * 6) example: C:/Programme/xpdf-3.02pl2-win32/pdftotext -cfg C:/Programme/xpdf-3.02pl2-win32/xpdfrc

fedoragsearch.ignoreTextExtractionErrors=true
 * 1) true|false: defines what happens if an Exception occurs while extracting the text from a PDF for indexing
 * 2) if set to true, Exception is ignored and object is indexed without the fulltext.
 * 3) if set to false, Exception is thrown and object is not indexed at all.


 * copy directory pdf-extraction and pdfbox-app-1.8.6.jar from /home/siedersleben/escidoc-core-1.3.10-For-Release7.7 to JBOSS_HOME/conf (gs: already done)

== Core Index Properties ==
 * Reindex: set the following properties in $JBOSS_HOME/conf/search/config/index/escidoc_all/index.properties (remove the comment sign)

# Use this property for bulk index operations: the index is held in memory until ramBufferSize is reached.
# Make sure this property does not conflict with fgsindex.maxBufferedDocs.
fgsindex.ramBufferSizeMb = 128

# Use this property to minimize garbage collection during indexing.
# Be careful: the index is not thread safe when running in this mode.
fgsindex.indexMode = 1

 * same for item_container_admin
 * set indexing to asynchronous for item_container_admin during reindex in $JBOSS_HOME/conf/search/config/index/item_container_admin/index.object-types.properties

Resource.Item.indexAsynchronous=true

 * check that PDF extraction works properly by looking for log messages like the following in fedoragsearch.log (the WARN level can be ignored)

WARN 2014-08-06 09:17:54,269 (TransformerToText)(http-0.0.0.0-8080-4) error while transforming pdf to text with external tool: Extracting PDF content
Infile: /usr/share/jboss-4.2.3.GA/server/default/1407309473650.pdf
Outfile: /usr/share/jboss-4.2.3.GA/server/default/1407309473650.txt
Wed Aug 06 09:17:53 CEST 2014 -- started
Extracting with xPDF
Wed Aug 06 09:17:54 CEST 2014 -- finished successfully
Extraction took 314


 * don't forget to reset all modified properties once the reindex has finished

== Core Lucene Index ==

 * Take indexing stylesheets from wildfly branch, not from trunk:
 * https://subversion.mpdl.mpg.de/repos/common/wildfly_migration/common_services/framework_access/src/main/resources/

== PubMan Properties ==
 * escidoc.transformation.wos.stylesheet.filename=/usr/share/jboss/server/default/conf/transformation/transformations/otherFormats/xslt/wosxml2escidoc.xslt (check that the transformation exists)
 * escidoc.framework_access.framework.url=http://localhost:8080 (instead of coreservice)
 * CHANGE: escidoc.pubman.favicon.url=/pubman/faces/javax.faces.resources/pubman_favicon_32_32.png
 * escidoc.dataaquisition.resources.fop.configuration (?)
 * escidoc.cone.modelsxml.path (?)
 * escidoc.transformation.edoc.stylesheet.filename (?)
 * escidoc.transformation.endnote.ice.stylesheet.filename
 * escidoc.transformation.endnote.stylesheet.filename (?)
 * escidoc.transformation.edoc.configuration.filename (?)
 * escidoc.transformation.escidoc2marcxml.stylesheet.filename (?)
 * escidoc.aa.public.key.file
 * escidoc.aa.private.key.file
 * escidoc.aa.config.file

== PubMan Apache ==

 * Add ProxyPassReverse for /cone, /sword-app, /dataacquisition in Apache 2 config, if not done yet

== PubMan Wildfly ==
  
 * Add pubman module, which should contain all properties and configuration files for PubMan:
 * Create directory WILDFLY_HOME/modules/pubman/main
 * Add a file called module.xml to this directory, containing the following xml
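
A minimal sketch of such a module.xml, assuming the module only needs to expose the files in its own directory as classpath resources. The module name pubman is inferred from the directory path above; the namespace version urn:jboss:module:1.1 and the resource root "." are assumptions and may need adjusting for the installed WildFly version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only (assumed content): resource-only module that puts the files in
     WILDFLY_HOME/modules/pubman/main on the classpath. -->
<module xmlns="urn:jboss:module:1.1" name="pubman">
    <resources>
        <!-- "." refers to the directory that contains this module.xml -->
        <resource-root path="."/>
    </resources>
</module>
```

With a resource root of ".", property files copied into the same directory become visible to any deployment that depends on the module.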

  
 * Add all necessary property files to this directory (pubman.properties, solution.properties, auth.properties, cone.properties, conf.xml, apache-fop-config.xml)
 * Make this module global by adding the following xml snippet to standalone.xml, subsystem urn:jboss:domain:ee
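
A global-module declaration for this step could look like the sketch below. It assumes the module is named pubman and lives in the default main slot, as implied by the directory WILDFLY_HOME/modules/pubman/main; the snippet belongs inside the existing urn:jboss:domain:ee subsystem element of standalone.xml.

```xml
<!-- Sketch only (assumed content): place inside the urn:jboss:domain:ee subsystem in standalone.xml -->
<global-modules>
    <!-- slot="main" is the default and matches the directory modules/pubman/main -->
    <module name="pubman" slot="main"/>
</global-modules>
```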


 * Wildfly has a default maximum POST size of 10 MB and a default limit of 1000 POST parameters, which is too low for large file uploads
 * Increase max-post-size and max-parameters in standalone.xml, subsystem urn:jboss:domain:undertow, by raising these attributes on the http-listener (e.g. to 1024 MB)
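
As a sketch, an http-listener with raised limits might look as follows. The listener name default and the socket binding http are the usual WildFly defaults and are assumed here; max-post-size is given in bytes, and the max-parameters value of 10000 is an arbitrary illustrative choice, not a value taken from this release.

```xml
<!-- Sketch only (assumed values): inside the server element of the urn:jboss:domain:undertow subsystem.
     max-post-size is in bytes: 1024 * 1024 * 1024 = 1073741824 (1024 MB). -->
<http-listener name="default" socket-binding="http"
               max-post-size="1073741824" max-parameters="10000"/>
```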

== eSciDoc-OAI-Provider ==

 * copy escidoc-oaiprovider.war from /home/walter