Difference between revisions of "PubMan 7 7"
Siedersleben (talk | contribs)
Revision as of 06:27, 10 July 2014
This page shall contain every change that is made during a QA release of the version mentioned above. If it's not here, it never happened!
PubMan 7.6 Release
Affected Servers
Prepare read only system
Fedora
Coreservice Apache
Coreservice JBoss
Core Infrastructure
Core Properties
- escidoc-core.properties: remove pdf extractor properties
# true|false: defines what happens if an exception occurs while extracting the text from a PDF for indexing.
# If set to true, the exception is ignored and the object is indexed without the fulltext.
# If set to false, the exception is thrown and the object is not indexed at all.
gsearch.ignoreTextExtractionErrors = true
# Location of the indexing stylesheet that generates the indexInformation document for gsearch indexing.
# Has to be a URL.
# Currently the eSciDoc core infrastructure provides 2 index databases: escidoc_all and escidocou_all.
# The stylesheet path property for index escidoc_all is gsearch.escidoc.indexingStylesheet.
# The stylesheet path property for index escidocou_all is gsearch.escidocou.indexingStylesheet.
#gsearch.escidoc.indexingStylesheet = http://escidoc1.escidoc.mpg.de/resources/searchIndexDefinition/mpdlEscidocXmlToLucene_1.2.xslt
gsearch.escidoc.indexingStylesheet = http://coreservice.mpdl.mpg.de/mpdlEscidocXmlToLucene.xslt
gsearch.escidocou.indexingStylesheet =
# If pdfBox (used internally by gsearch to extract text from PDFs) does not work well for your PDFs,
# define a command-line command for a custom PDF text extractor (has to be installed separately).
# Define the command with its full path; mark the input file with <inputfile> and the output file with <outputfile>.
# Example: C:/Programme/xpdf-3.02pl2-win32/pdftotext -cfg C:/Programme/xpdf-3.02pl2-win32/xpdfrc <inputfile> <outputfile>
# gsearch.pdfTextExtractorCommand = /usr/bin/pdftotext -cfg /etc/xpdfrc <inputfile> <outputfile>
gsearch.pdfTextExtractorCommand = /usr/bin/java -classpath /usr/share/jboss/server/default/conf/pdf-extraction/classes:/usr/share/jboss/server/default/conf/pdf-extraction/lib/iText-5.0.6.jar de.mpg.escidoc.services.extraction.ExtractionChain <inputfile> <outputfile>
# Analyzer to use for indexing and search
lucene.analyzer = de.escidoc.sb.common.lucene.analyzer.EscidocAnalyzer
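Removing the extractor properties by hand is easy to get wrong when commented-out variants sit next to the active ones. The following is a minimal sketch, not part of the release procedure itself: it drops active assignments of selected keys from properties text and leaves comments untouched. The function name `remove_properties` and the key set are assumptions (the two keys are the ones that reappear with the `fedoragsearch.` prefix); the parser only handles simple `key = value` lines, not continuation lines or `:` separators.

```python
def remove_properties(text, keys_to_remove):
    """Return properties text without active assignments of the given keys."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Comments and blank lines are always kept as they are.
        if stripped and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
            if key in keys_to_remove:
                continue
        kept.append(line)
    return "\n".join(kept)


# Assumed set of "pdf extractor properties"; adjust to your installation.
PDF_EXTRACTOR_KEYS = {
    "gsearch.pdfTextExtractorCommand",
    "gsearch.ignoreTextExtractionErrors",
}

# Shortened, hypothetical sample of escidoc-core.properties content.
sample = "\n".join([
    "gsearch.ignoreTextExtractionErrors = true",
    "# gsearch.pdfTextExtractorCommand = /usr/bin/pdftotext <inputfile> <outputfile>",
    "gsearch.pdfTextExtractorCommand = /usr/bin/pdftotext <inputfile> <outputfile>",
    "lucene.analyzer = de.escidoc.sb.common.lucene.analyzer.EscidocAnalyzer",
])

cleaned = remove_properties(sample, PDF_EXTRACTOR_KEYS)
print(cleaned)
```

Comparing the result with the original file before replacing it is advisable, since the key list above is only a guess at what "pdf extractor properties" covers.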
Add the following properties to JBOSS_HOME/conf/sear/config/fedoragsearch.properties:
# If pdfBox (used internally by gsearch to extract text from PDFs) does not work well for your PDFs,
# use a command-line tool instead.
# To use a command-line tool,
# define a command-line command for a custom PDF text extractor (has to be installed separately).
# Define the command with its full path; mark the input file with <inputfile> and the output file with <outputfile>.
# Example: C:/Programme/xpdf-3.02pl2-win32/pdftotext -cfg C:/Programme/xpdf-3.02pl2-win32/xpdfrc <inputfile> <outputfile>
fedoragsearch.pdfTextExtractorCommand=/usr/bin/java -classpath /usr/share/jboss/server/default/conf/pdf-extraction/classes:/usr/share/jboss/server/default/conf/pdf-extraction/lib/iText-5.0.6.jar de.mpg.escidoc.services.extraction.ExtractionChain <inputfile> <outputfile>
# true|false: defines what happens if an exception occurs while extracting the text from a PDF for indexing.
# If set to true, the exception is ignored and the object is indexed without the fulltext.
# If set to false, the exception is thrown and the object is not indexed at all.
fedoragsearch.ignoreTextExtractionErrors=true
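To make the interplay of the two properties concrete, here is a small sketch of how a command template with `<inputfile>` and `<outputfile>` placeholders could be expanded and run, and how an "ignore errors" flag changes the outcome. This is an illustration under assumptions, not the actual gsearch implementation; `build_command` and `extract_text` are hypothetical names.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def build_command(template, input_path, output_path):
    """Expand <inputfile>/<outputfile> in a command template into an argv list."""
    # Split first, substitute afterwards, so paths with spaces stay one argument.
    return [
        arg.replace("<inputfile>", str(input_path)).replace("<outputfile>", str(output_path))
        for arg in shlex.split(template)
    ]


def extract_text(template, input_path, ignore_errors):
    """Run the extractor; on failure return "" or re-raise, depending on the flag."""
    with tempfile.TemporaryDirectory() as tmp:
        output_path = Path(tmp) / "out.txt"
        try:
            subprocess.run(build_command(template, input_path, output_path), check=True)
            return output_path.read_text(encoding="utf-8", errors="replace")
        except (OSError, subprocess.CalledProcessError):
            if ignore_errors:
                # Corresponds to "indexed without the fulltext".
                return ""
            # Corresponds to "not indexed at all": the caller sees the error.
            raise


argv = build_command(
    "/usr/bin/pdftotext -cfg /etc/xpdfrc <inputfile> <outputfile>",
    "/tmp/my file.pdf",
    "/tmp/out.txt",
)
print(argv)
```

With `ignore_errors=True` a broken or missing extractor silently yields objects without fulltext, so a failing command can go unnoticed; running the configured command once by hand on a sample PDF is a cheap sanity check.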
Core Index Properties
Core Lucene Index
PubMan EAR
PubMan Properties
- escidoc.transformation.wos.stylesheet.filename=/usr/share/jboss/server/default/conf/transformation/transformations/otherFormats/xslt/wosxml2escidoc.xslt
(check that the transformation file exists at this path)
- escidoc.framework_access.framework.url=http://localhost:8080 (instead of coreservice)
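The check that the transformation file exists can be scripted. The sketch below is one possible approach, assuming a plain `key=value` properties file; `parse_properties` and `missing_files` are hypothetical helper names, and the parser ignores continuation lines and escapes.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple key=value lines; comments and blank lines are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_files(props, keys):
    """Return the keys whose configured path does not point to an existing file."""
    return [key for key in keys if not Path(props.get(key, "")).is_file()]


properties_text = (
    "escidoc.transformation.wos.stylesheet.filename="
    "/usr/share/jboss/server/default/conf/transformation/transformations/"
    "otherFormats/xslt/wosxml2escidoc.xslt\n"
    "escidoc.framework_access.framework.url=http://localhost:8080\n"
)
props = parse_properties(properties_text)
for key in missing_files(props, ["escidoc.transformation.wos.stylesheet.filename"]):
    print("missing file for", key, "->", props.get(key))
```

Run on the target server, the loop prints nothing when the stylesheet is in place.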