Talk:ESciDoc Services Search&Export

MPDL

For the "startRecord" and "maximumRecords" to work for me, I also need the "total number of records" to be returned in the snippet for the search query. -- Andreas Gros 09:00, 7 April 2009 (UTC)

= Performance problems =

Citation Styles export
All measurements are in ms, taken directly from the JBoss log on dev-pubman
 * APA:
 * Request:

http://dev-pubman.mpdl.mpg.de:8080/search/SearchAndExport?cqlQuery=escidoc.context.name=%22Citation%20Style%20Testing%20Context%22&exportFormat=APA&outputFormat=pdf&language=all&sortOrder=ascending
 * Result: APA in pdf format, 135 items


 * Result: APA in pdf format, 1000 items


 * Scalability test, dev-pubman instance

 Number of items                          150      300      1000      1500
 Runtime for the report generation, ms    2855     7683     53874     111091
                                          2990     7453     53824     112850
                                          2901     7505     53630     112850
                                          2910     7442     53535     111009
                                          2847     8623     53828     112483
 Average, ms                              2900.6   7741.2   53738.2   111560
 Average per item, ms                     19.26    26.09    53.68     74.24
 * Conclusion: problems with scalability
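The averages can be rechecked from the five listed runs. Below is a minimal Python sketch; the run values are copied from the scalability table, while the names `runs_ms` and `summarize` are only illustrative. Recomputed this way, the 1500-item average and the per-item values come out slightly different from the ones listed, so those may stem from other runs or a different rounding. The overall picture (per-item cost growing with the number of items) stays the same.

```python
from statistics import mean

# Runtimes in ms for the report generation, copied from the table
# (five runs per item count, dev-pubman instance).
runs_ms = {
    150: [2855, 2990, 2901, 2910, 2847],
    300: [7683, 7453, 7505, 7442, 8623],
    1000: [53874, 53824, 53630, 53535, 53828],
    1500: [111091, 112850, 112850, 111009, 112483],
}


def summarize(runs):
    """Return {item_count: (average_ms, average_ms_per_item)}."""
    return {n: (mean(ts), mean(ts) / n) for n, ts in runs.items()}


summary = summarize(runs_ms)
for n, (avg, per_item) in summary.items():
    print(f"{n:5d} items: average {avg:9.1f} ms, {per_item:6.2f} ms per item")
```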


 * AJP:
 * Request:

http://dev-pubman.mpdl.mpg.de:8080/search/SearchAndExport?cqlQuery=escidoc.context.name=%22Citation%20Style%20Testing%20Context%22&exportFormat=AJP&outputFormat=pdf&language=all&sortOrder=ascending
 * Result: AJP in pdf format, 135 items


 * "First call" means the first call of the citation manager after JBoss start.


 * I also took some measurements: 150 items in the citation manager take 9797 ms to be exported to PDF on my local machine, whereas 1500 items take 503750 ms (>8 min). This is a factor of 51. --MFranke 12:56, 9 April 2009 (UTC)
 * Retested on the dev-pubman instance, please look above. A local PC is not a reference performance test bench. --Makarenko 11:30, 10 April 2009 (UTC)
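To put the two measurements side by side, one can estimate how the runtime grows with the number of items. The sketch below assumes a simple power law t ~ n^k between two data points (an assumption made only for illustration; the function name is hypothetical). It uses the local numbers above and the dev-pubman averages from the table; both give an exponent well above 1 (roughly 1.7 locally and 1.6 on dev-pubman), i.e. clearly worse than linear growth.

```python
import math


def scaling_exponent(n_small, t_small, n_large, t_large):
    """Estimate k in t ~ n**k from two (item count, runtime) measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


# Local machine: 150 items -> 9797 ms, 1500 items -> 503750 ms
local_k = scaling_exponent(150, 9797, 1500, 503750)

# dev-pubman averages as listed in the table: 2900.6 ms and 111560 ms
dev_k = scaling_exponent(150, 2900.6, 1500, 111560)

print(f"local exponent: {local_k:.2f}, dev-pubman exponent: {dev_k:.2f}")
```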


 * Solutions for Citation Styles
 * Run a dummy style processing call during citation manager initialization (easy and fast; speeds up the real calls by a factor of 2-3)
 * Rework report generation:
 * move variables to the scriptlet (to be tested first, a lot of work during implementation)
 * reduce the number of variables
 * maybe paging of the source data might also help (see here). Or is this already implemented? --MFranke 11:45, 9 April 2009 (UTC)
 * No, data source paging is not implemented. IMO, it can help to process huge data sources (item lists) without heap memory overflow, but will not do much for the performance issues. To be analyzed. --Makarenko 12:27, 9 April 2009 (UTC)
 * Try to use Jaxen XPath implementation instead of Xalan (pro: performance factor 2 for elements, contra: poor XPath implementation)
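The first solution (a dummy processing call at initialization) is a plain warm-up: the one-time setup cost is paid at start-up instead of by the first real request. A language-neutral sketch in Python follows; `CitationManagerSketch`, its methods and the `sleep` placeholder are hypothetical and do not reflect the real citation manager API.

```python
import time


class CitationManagerSketch:
    """Hypothetical stand-in for the citation manager; not the real API."""

    def __init__(self, warm_up=True):
        self._compiled = {}      # cache of prepared styles
        self.compile_count = 0   # how often the expensive setup ran
        if warm_up:
            # Dummy processing run: triggers the one-time setup at start-up.
            self.export("APA", [])

    def _compile(self, style):
        self.compile_count += 1
        time.sleep(0.01)  # placeholder for the expensive one-time work
        return f"compiled:{style}"

    def export(self, style, items):
        if style not in self._compiled:
            self._compiled[style] = self._compile(style)
        return [f"{self._compiled[style]}:{item}" for item in items]


manager = CitationManagerSketch()           # setup cost is paid here
result = manager.export("APA", ["item-1"])  # first real call reuses the cache
```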

JasperReports should be tested directly (independently of the citation style manager) to find out the impact of the following factors on the report generation performance:
 * 1) number of items in the data source (scalability of jasperreports)
 * 2) number of fields
 * 3) number of variables
 * 4) repeatable elements usage (i.e. scriptlets)
 * 5) jasperreports version
 * 6) jaxen XPath implementation
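Such a test could be organized as a small matrix of factor combinations, each timed over several runs. The Python sketch below shows only the harness; `generate_report` is a hypothetical placeholder with a dummy workload (a real test would call JasperReports directly), and the field and variable counts are assumed values.

```python
import itertools
import time


def time_call(fn, *args, repeats=5):
    """Run fn(*args) `repeats` times and return the runtimes in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def generate_report(n_items, n_fields, n_variables):
    """Hypothetical placeholder for a direct report generation run."""
    return sum(i * n_fields * n_variables for i in range(n_items))


factors = {
    "n_items": [150, 300, 1000, 1500],  # item counts from the scalability test
    "n_fields": [5, 20],                # assumed values
    "n_variables": [5, 20],             # assumed values
}


def run_matrix(fn, factors, repeats=5):
    """Time fn for every combination of factor values; return average ms."""
    names = list(factors)
    results = {}
    for combo in itertools.product(*(factors[name] for name in names)):
        timings = time_call(fn, *combo, repeats=repeats)
        results[combo] = sum(timings) / len(timings)
    return results


results = run_matrix(generate_report, factors, repeats=3)
for combo, avg_ms in sorted(results.items()):
    print(combo, f"{avg_ms:.3f} ms")
```

Varying one factor at a time while holding the others fixed makes it easier to see which of them drives the runtime.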

If no improvements can be achieved with JasperReports, the current version of FOP will be evaluated and compared with JasperReports.

BIBTEX Export
All measurements are in ms, taken with the Firefox add-on Lori

Request: http://dev-pubman.mpdl.mpg.de:8080/search/SearchAndExport?cqlQuery=escidoc.context.name=%22Citation%20Style%20Testing%20Context%22&exportFormat=BIBTEX&outputFormat=pdf&language=all&sortOrder=ascending

Result: BIBTEX in txt format, 135 items
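The test requests on this page differ only in the exportFormat parameter, so they can be generated rather than copied by hand. A minimal Python sketch; the function name and its defaults are my own, while the base URL and parameter names are taken from the requests above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://dev-pubman.mpdl.mpg.de:8080/search/SearchAndExport"


def build_export_url(export_format, output_format="pdf",
                     context_name="Citation Style Testing Context"):
    """Build a SearchAndExport request URL for the given export format."""
    params = {
        "cqlQuery": f'escidoc.context.name="{context_name}"',
        "exportFormat": export_format,
        "outputFormat": output_format,
        "language": "all",
        "sortOrder": "ascending",
    }
    # quote (not quote_plus) encodes spaces as %20; "=" is kept literal
    # inside the CQL query, as in the requests listed on this page.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe="=")


for fmt in ("APA", "AJP", "BIBTEX"):
    print(build_export_url(fmt))
```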


 * Solutions for BIBTEX
 * Use XSLT functions carefully, e.g. the one with the special characters mapping should be moved to a Java function. See here for solution.

--Makarenko 11:18, 18 May 2009 (UTC): Fixed, see ticket: http://zim01.gwdg.de:8080/browse/AS-767

Common solutions

 * delay warning in HTTP response, JIRA Ticket
 * --Makarenko 11:20, 18 May 2009 (UTC). Fixed, see http://zim01.gwdg.de:8080/browse/AS-730


 * set  forcibly to some reasonable value
 * --Makarenko 11:22, 18 May 2009 (UTC): Decided not to set any limits for the moment.

Miscellaneous

 * For the future: Generation of the huge reports with JRVirtualizer. Article

FACES Export
I also tested the performance of export in FACES. I used an album with 300 items and then ran the different exports available in FACES. --Bastien 14:55, 9 April 2009 (UTC)


 * CSV files + pictures:
 * Thumbnails resolution: 52.2 s
 * Web resolution: 55.3 s
 * Original resolution: 201.1 s
 * Thumbnails + web resolutions: 91.5 s


 * XML files + pictures:
 * Thumbnails: 53.1 s
 * Web resolution: 56.5 s
 * Original:
 * Thumbnails + web: 91.1 s


 * Only pictures:
 * Thumbnails: 47.2 s
 * Web resolution: 54.7 s
 * Original:
 * Thumbnails + web: 92.1 s


 * CSV file: 12.0 s


 * XML file: 300 items: 11.368 s


 * Remarks:
 * Unfortunately, these are the best results I got. Performance often gets worse (up to 1.5 times slower). I guess the problem comes from downloading content from the FW, whose performance is not really stable.
 * These measurements were taken from within FACES itself. It would be very interesting to test the export directly through Search&Export.

--Makarenko 11:38, 18 May 2009 (UTC): The album export doesn't use JasperReports; it consists only of XSLT transformations and archive generation. The performance is limited only by the framework.