PubMan 7 7

This page lists every change made during a QA release of the version named above. If it's not here, it never happened!

PubMan 7.6 Release

Affected Servers

Prepare read only system

Fedora

Coreservice Apache

Coreservice JBoss

Core Infrastructure

Core Properties

  • escidoc-core.properties: remove the PDF extractor properties

# true|false: defines what happens if an exception occurs while extracting the text from a PDF for indexing.
# If set to true, the exception is ignored and the object is indexed without the fulltext.
# If set to false, the exception is thrown and the object is not indexed at all.

gsearch.ignoreTextExtractionErrors = true

# Location of the indexingStylesheet that generates the indexInformation document for gsearch indexing.
# Has to be a URL.
# Currently the eSciDoc Core Infrastructure provides 2 index databases: escidoc_all and escidocou_all.
# The stylesheet-path property for index escidoc_all is gsearch.escidoc.indexingStylesheet.
# The stylesheet-path property for index escidocou_all is gsearch.escidocou.indexingStylesheet.
#gsearch.escidoc.indexingStylesheet = http://escidoc1.escidoc.mpg.de/resources/searchIndexDefinition/mpdlEscidocXmlToLucene_1.2.xslt

gsearch.escidoc.indexingStylesheet = http://coreservice.mpdl.mpg.de/mpdlEscidocXmlToLucene.xslt
gsearch.escidocou.indexingStylesheet =

# If pdfBox (used internally by gsearch to extract text from PDFs) is not working well for your PDFs,
# define a command-line command for a custom PDF text extractor (has to be installed separately).
# Define the command with its full path; mark the input file with <inputfile> and the output file with <outputfile>.
# Example: C:/Programme/xpdf-3.02pl2-win32/pdftotext -cfg C:/Programme/xpdf-3.02pl2-win32/xpdfrc <inputfile> <outputfile>
#gsearch.pdfTextExtractorCommand = /usr/bin/pdftotext -cfg /etc/xpdfrc <inputfile> <outputfile>

gsearch.pdfTextExtractorCommand = /usr/bin/java -classpath /usr/share/jboss/server/default/conf/pdf-extraction/classes:/usr/share/jboss/server/default/conf/pdf-extraction/lib/iText-5.0.6.jar de.mpg.escidoc.services.extraction.ExtractionChain <inputfile> <outputfile>

# Analyzer to use for indexing and search

lucene.analyzer = de.escidoc.sb.common.lucene.analyzer.EscidocAnalyzer
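To confirm that the removal from escidoc-core.properties took effect, a small check along the following lines may help. This is a minimal sketch, not part of the release tooling. It assumes that "pdf extractor properties" means the two keys gsearch.pdfTextExtractorCommand and gsearch.ignoreTextExtractionErrors, and it reads the properties format in a simplified way (only `=` as separator, no line continuations). The function names are hypothetical.

```python
# Assumption: these two keys are the "pdf extractor properties" to be removed.
PDF_KEYS = (
    "gsearch.pdfTextExtractorCommand",
    "gsearch.ignoreTextExtractionErrors",
)


def active_keys(properties_text):
    """Collect the keys of all lines that are not blank and not commented out."""
    keys = set()
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comment lines
        keys.add(line.split("=", 1)[0].strip())
    return keys


def leftover_pdf_keys(properties_text):
    """Return the PDF extractor keys that are still active, sorted by name."""
    return sorted(active_keys(properties_text) & set(PDF_KEYS))
```

Passing the contents of escidoc-core.properties to `leftover_pdf_keys` should yield an empty list once the properties have been removed or commented out.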

Add the following properties to JBOSS_HOME/conf/sear/config/fedoragsearch.properties:

# If pdfBox (used internally by gsearch to extract text from PDFs) is not working well for your PDFs,
# use a command-line tool.
# If you want to use a command-line tool,
# define a command-line command for a custom PDF text extractor (has to be installed separately).
# Define the command with its full path; mark the input file with <inputfile> and the output file with <outputfile>.
# Example: C:/Programme/xpdf-3.02pl2-win32/pdftotext -cfg C:/Programme/xpdf-3.02pl2-win32/xpdfrc <inputfile> <outputfile>

fedoragsearch.pdfTextExtractorCommand=/usr/bin/java -classpath /usr/share/jboss/server/default/conf/pdf-extraction/classes:/usr/share/jboss/server/default/conf/pdf-extraction/lib/iText-5.0.6.jar de.mpg.escidoc.services.extraction.ExtractionChain <inputfile> <outputfile>

# true|false: defines what happens if an exception occurs while extracting the text from a PDF for indexing.
# If set to true, the exception is ignored and the object is indexed without the fulltext.
# If set to false, the exception is thrown and the object is not indexed at all.

fedoragsearch.ignoreTextExtractionErrors=true
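The interplay of the two properties can be illustrated with a short Python sketch. It is not the gsearch implementation, only a model of the behaviour the comments describe: the placeholders <inputfile> and <outputfile> are filled into the command template, and a failing extraction is either swallowed or re-raised depending on the flag. The function names and the whitespace-based splitting of the command are assumptions made for illustration.

```python
import shlex
import subprocess


def build_command(template, input_path, output_path):
    """Split the template into arguments, then fill in the two placeholders."""
    args = shlex.split(template)
    return [
        arg.replace("<inputfile>", input_path).replace("<outputfile>", output_path)
        for arg in args
    ]


def extract_text(template, input_path, output_path, ignore_errors):
    """Run the extractor and return the text it wrote to output_path.

    Returns None if extraction failed and ignore_errors is true, which
    corresponds to indexing the object without fulltext.
    """
    try:
        subprocess.run(build_command(template, input_path, output_path), check=True)
        with open(output_path, encoding="utf-8") as handle:
            return handle.read()
    except (OSError, subprocess.CalledProcessError):
        if ignore_errors:
            return None  # object would be indexed without the fulltext
        raise  # object would not be indexed at all
```

With `ignore_errors=True`, a missing or crashing extractor leaves the object searchable by metadata only, so extraction problems can go unnoticed unless the logs are checked.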

Core Index Properties

Core Lucene Index

PubMan EAR

PubMan Properties

  • escidoc.transformation.wos.stylesheet.filename=/usr/share/jboss/server/default/conf/transformation/transformations/otherFormats/xslt/wosxml2escidoc.xslt

Also check that the transformation stylesheet exists at this path.
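The existence check for the stylesheet could be scripted roughly as follows. This is a minimal sketch under the assumption that "exists" should mean the file is present, is well-formed XML, and has an XSLT root element. The function name `check_stylesheet` is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def check_stylesheet(path):
    """Return None if the stylesheet looks usable, else a short problem description."""
    if not os.path.isfile(path):
        return "missing: " + path
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as exc:
        return "not well-formed XML: %s" % exc
    # XSLT allows either xsl:stylesheet or xsl:transform as the root element.
    if root.tag not in ("{%s}stylesheet" % XSL_NS, "{%s}transform" % XSL_NS):
        return "root element is not an XSLT stylesheet: " + root.tag
    return None
```

Calling `check_stylesheet` with the value of escidoc.transformation.wos.stylesheet.filename on the target server returns None when the file passes all three checks.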

PubMan Apache

PubMan JBoss

PubMan PidCache

AA

Validation Database

Migration database

CoNE

eSciDoc Admin

Data Migration

PubMan Software Homepage

Miscellaneous