PubMan 7.6 Release

  • Remove the PDF extractor properties:
  1. true|false: defines what happens if an exception occurs while extracting text from a PDF for indexing.
  2. If set to true, the exception is ignored and the object is indexed without its full text.
  3. If set to false, the exception is thrown and the object is not indexed at all.

gsearch.ignoreTextExtractionErrors = true

  1. Location of the indexing stylesheet that generates the indexInformation document for gsearch indexing.
  2. Must be a URL.
  3. The eSciDoc Core Infrastructure currently provides two index databases: escidoc_all and escidocou_all.
  4. The stylesheet path property for index escidoc_all is gsearch.escidoc.indexingStylesheet.
  5. The stylesheet path property for index escidocou_all is gsearch.escidocou.indexingStylesheet.
  6. gsearch.escidoc.indexingStylesheet =

gsearch.escidoc.indexingStylesheet =
gsearch.escidocou.indexingStylesheet =

  1. If PDFBox (used internally by gsearch to extract text from PDFs) does not work well for your PDFs, define the command line for a custom PDF text extractor (must be installed separately).
  2. Give the command with its full path, and mark the input file with <inputfile> and the output file with <outputfile>.
  3. Example: C:/Programme/xpdf-3.02pl2-win32/pdftotext -cfg C:/Programme/xpdf-3.02pl2-win32/xpdfrc <inputfile> <outputfile>
  4. Another example: gsearch.pdfTextExtractorCommand = /usr/bin/pdftotext -cfg /etc/xpdfrc <inputfile> <outputfile>

gsearch.pdfTextExtractorCommand = /usr/bin/java -classpath /usr/share/jboss/server/default/conf/pdf-extraction/classes:/usr/share/jboss/server/default/conf/pdf-extraction/lib/iText-5.0.6.jar <inputfile> <outputfile>

  1. Analyzer to use for indexing and search

lucene.analyzer =
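To verify the removal, here is a minimal Java sketch that loads a properties file and reports which of the keys listed above are still defined, even with an empty value. The class and method names are hypothetical. The section does not say which file these properties live in, so the sketch takes the content as input (an invented inline sample here); in practice you could pass Files.newBufferedReader(path) to Properties.load. Lines commented out with # are ignored by Properties.load, so only active settings are reported.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: lists which of the removed gsearch/lucene keys are still set. */
public class RemovedPropertyCheck {

    // Keys listed under "Remove the PDF extractor properties" in this release note.
    static final List<String> REMOVED_KEYS = List.of(
        "gsearch.ignoreTextExtractionErrors",
        "gsearch.escidoc.indexingStylesheet",
        "gsearch.escidocou.indexingStylesheet",
        "gsearch.pdfTextExtractorCommand",
        "lucene.analyzer");

    /** Returns the keys from REMOVED_KEYS that are still defined (an empty value counts as defined). */
    static List<String> leftovers(Properties props) {
        List<String> found = new ArrayList<>();
        for (String key : REMOVED_KEYS) {
            if (props.containsKey(key)) {
                found.add(key);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Invented sample content; replace with a reader for the real file.
        props.load(new StringReader(
            "gsearch.ignoreTextExtractionErrors = true\n"
            + "lucene.analyzer =\n"
            + "some.other.property = kept\n"));
        System.out.println("still present: " + leftovers(props));
    }
}
```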

  • Add the following properties to JBOSS_HOME/conf/sear/config/

  1. If PDFBox (used internally by gsearch to extract text from PDFs) does not work well for your PDFs, you can use a command-line tool instead.
  2. To do so, define the command line for a custom PDF text extractor (must be installed separately).
  3. Give the command with its full path, and mark the input file with <inputfile> and the output file with <outputfile>.
  4. Example: C:/Programme/xpdf-3.02pl2-win32/pdftotext -cfg C:/Programme/xpdf-3.02pl2-win32/xpdfrc <inputfile> <outputfile>

fedoragsearch.pdfTextExtractorCommand=/usr/bin/java -classpath /usr/share/jboss/server/default/conf/pdf-extraction/classes:/usr/share/jboss/server/default/conf/pdf-extraction/lib/iText-5.0.6.jar <inputfile> <outputfile>
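The <inputfile> and <outputfile> placeholders are replaced with real paths before the command runs. The Java sketch below shows one way such a template could be turned into an argument list for ProcessBuilder. It is an illustration, not gsearch's actual code; the class name and the sample paths are hypothetical, and splitting on whitespace means paths containing spaces are not supported.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands a pdfTextExtractorCommand template into ProcessBuilder arguments. */
public class PdfExtractorCommand {

    static List<String> build(String template, String inputFile, String outputFile) {
        List<String> args = new ArrayList<>();
        // Naive whitespace split: each placeholder must appear as its own token.
        for (String token : template.trim().split("\\s+")) {
            if (token.equals("<inputfile>")) {
                args.add(inputFile);
            } else if (token.equals("<outputfile>")) {
                args.add(outputFile);
            } else {
                args.add(token); // command path and options are kept verbatim
            }
        }
        return args;
    }

    public static void main(String[] args) {
        String template = "/usr/bin/pdftotext -cfg /etc/xpdfrc <inputfile> <outputfile>";
        List<String> cmd = build(template, "/tmp/in.pdf", "/tmp/out.txt");
        System.out.println(cmd);
        // To actually run it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing a list to ProcessBuilder avoids going through a shell, so the substituted file names are not subject to shell quoting.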

  1. true|false: defines what happens if an exception occurs while extracting text from a PDF for indexing.
  2. If set to true, the exception is ignored and the object is indexed without its full text.
  3. If set to false, the exception is thrown and the object is not indexed at all.
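The true|false behaviour described above can be sketched in Java as follows. This is a hypothetical illustration, not gsearch's implementation: the flag parameter is named after gsearch.ignoreTextExtractionErrors, and the IndexEntry record, the extractor function, and the object ID are invented for the example (records need Java 16 or later).

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical illustration of the ignore-extraction-errors switch. */
public class IndexingSketch {

    /** Invented stand-in for an index entry: object ID plus optional full text. */
    record IndexEntry(String objectId, Optional<String> fullText) {}

    static IndexEntry index(String objectId, Function<String, String> extractor,
                            boolean ignoreTextExtractionErrors) {
        try {
            return new IndexEntry(objectId, Optional.of(extractor.apply(objectId)));
        } catch (RuntimeException e) {
            if (ignoreTextExtractionErrors) {
                // true: ignore the error and index the object without its full text
                return new IndexEntry(objectId, Optional.empty());
            }
            // false: rethrow, so the object is not indexed at all
            throw e;
        }
    }

    public static void main(String[] args) {
        Function<String, String> broken = id -> {
            throw new IllegalStateException("cannot read PDF for " + id);
        };
        System.out.println(index("escidoc:123", broken, true));
        try {
            index("escidoc:123", broken, false);
        } catch (IllegalStateException e) {
            System.out.println("not indexed: " + e.getMessage());
        }
    }
}
```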


  • escidoc.transformation.wos.stylesheet.filename=/usr/share/jboss/server/default/conf/transformation/transformations/otherFormats/xslt/wosxml2escidoc.xslt

and check that the transformation exists.
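The existence check can be automated with a small Java sketch like the one below, which reads escidoc.transformation.wos.stylesheet.filename and tests whether it names an existing, readable regular file. The class name is hypothetical, and the sketch loads the property from an inline string rather than from the real configuration file, whose name this section does not give.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper: checks that the WoS transformation stylesheet file is present. */
public class TransformationCheck {

    static final String KEY = "escidoc.transformation.wos.stylesheet.filename";

    /** Returns true if the property is set and points to an existing, readable regular file. */
    static boolean transformationExists(Properties props) {
        String value = props.getProperty(KEY);
        if (value == null || value.isBlank()) {
            return false;
        }
        Path path = Path.of(value.trim());
        return Files.isRegularFile(path) && Files.isReadable(path);
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Inline copy of the setting above; load the real file instead in practice.
        props.load(new StringReader(KEY
            + "=/usr/share/jboss/server/default/conf/transformation/transformations/otherFormats/xslt/wosxml2escidoc.xslt\n"));
        System.out.println(KEY + " points to an existing file: " + transformationExists(props));
    }
}
```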

