ZfN project overview

MPDL

2013-05-28 13:15 MPDL 213

MPDL ZfN project

 * Project description
 * http://www.znaturforsch.com/
 * http://zfn.mpdl.mpg.de/xtf/home
 * https://projects.gwdg.de/projects/zfn

Possible solutions for ZfN

 * pubman + coreservice
 * pros: fine-grained search, ready for LTA, support, in-house solution
 * cons: too many resources (technical and human) are needed for a relatively small project
 * WordPress blog as enhancement (similar to Sengbusch Collection)
 * http://xtf.cdlib.org/

XTF (eXtensible Text Framework)

 * about
 * technologies
 * XSLT 2.0
 * Java
 * CSS + Ajax (YUI)
 * Lucene
 * Tomcat
 * architecture
 * 4 main components
 * crossQuery: The front-end to the collection search system.
 * dynaXML: Interface to individual documents.
 * Text Engine: Used by crossQuery and dynaXML to perform text searches.
 * Indexer: Full-text indexer based on Lucene.
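To illustrate how the two front-end components are addressed, here is a minimal sketch that builds a crossQuery search URL and a dynaXML document URL. The base URL is the one listed in these notes; the `search`/`view` paths and the `keyword`/`docId` parameters follow the stock XTF servlet mapping and are assumptions about this installation, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://zfn.mpdl.mpg.de/xtf"  # base URL taken from the notes


def search_url(keyword, base=BASE):
    """crossQuery: collection-wide search (path and parameter name assumed)."""
    return f"{base}/search?{urlencode({'keyword': keyword})}"


def view_url(doc_id, base=BASE):
    """dynaXML: view of a single document (path and parameter name assumed)."""
    # Keep slashes readable, since document ids are typically relative paths.
    return f"{base}/view?{urlencode({'docId': doc_id}, safe='/')}"
```

Both components hand the actual text search to the Text Engine, which reads the Lucene index built by the Indexer.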

dev/prod process

 * ZfN project @ GWDG chili
 * git infrastructure
 * main gitosis repo @ vm65
 * prod repo in webapps @ vm65
 * trace repo @ git.projects.gwdg.de
 * auto-deployment and pushes with git post-receive scripts
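The auto-deployment step could look roughly like the following post-receive hook, written here in Python as a minimal sketch. Git feeds the hook one line per updated ref on stdin (`<old-sha> <new-sha> <refname>`); the sketch checks the deploy branch out into the web application directory. The repository path, work tree path, and branch name are hypothetical, since the notes do not give them.

```python
#!/usr/bin/env python3
"""Sketch of a git post-receive hook that deploys a pushed branch."""
import subprocess

# Hypothetical locations; the real paths on vm65 are not stated in the notes.
GIT_DIR = "/srv/git/zfn.git"
WORK_TREE = "/srv/tomcat/webapps/xtf"
DEPLOY_REF = "refs/heads/master"


def parse_updates(lines):
    """Parse hook input: one '<old-sha> <new-sha> <refname>' triple per line."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            updates.append(tuple(parts))
    return updates


def deploy_command(git_dir, work_tree, refname):
    """Build the git call that force-checks-out a branch into the work tree."""
    branch = refname[len("refs/heads/"):]
    return ["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}",
            "checkout", "-f", branch]


def handle_push(lines, run=subprocess.run):
    """Deploy only when the deploy branch was among the pushed refs."""
    for _old, _new, refname in parse_updates(lines):
        if refname == DEPLOY_REF:
            run(deploy_command(GIT_DIR, WORK_TREE, refname), check=True)
```

Installed as `hooks/post-receive` in the bare repository, the script would end with `import sys; handle_push(sys.stdin)`. Forwarding the same push to the trace repo could be added as a second command in `handle_push`.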

data ingestion/qa process

 * partners (external)
 * 1) scan to PDF/A
 * 2) OCR of PDFs (https://github.com/kermitt2/grobid)
 * 3) creation of related TEIs (semi automatic)
 * 4) upload of the bundles (PDFs+TEIs) on the ftp server rzblx9.uni-regensburg.de
 * MPDL (internal)
 * 1) nightly rsync from rzblx9.uni-regensburg.de to vm65.mpdl.mpg.de (crontab)
 * 2) XTF indexing of the incoming bundles (crontab)
 * 3) checking of data consistency (web interface)
 * 4) correction of faulty TEIs (where possible; otherwise clarify with partners) + commit to git
 * 5) rsync of the TEIs to vm65.mpdl.mpg.de
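The internal steps above can be sketched as a small helper that builds the nightly rsync call and flags incomplete bundles before indexing. This is an illustration only: the remote and local directories are hypothetical, and it assumes that each bundle consists of a PDF and a TEI file (`.xml`) sharing the same base name, which the notes do not confirm.

```python
from pathlib import Path

# Hypothetical directories; only the host name appears in the notes.
REMOTE = "rzblx9.uni-regensburg.de:/incoming/zfn/"
LOCAL = "/data/zfn/incoming/"


def rsync_command(remote=REMOTE, local=LOCAL):
    """Pull new bundles; -a preserves file metadata, -z compresses in transit."""
    return ["rsync", "-az", remote, local]


def unmatched_bundles(root):
    """Return (PDFs without a TEI, TEIs without a PDF) as sorted base names."""
    root = Path(root)
    pdfs = {p.stem for p in root.rglob("*.pdf")}
    teis = {p.stem for p in root.rglob("*.xml")}
    return sorted(pdfs - teis), sorted(teis - pdfs)
```

A cron job could run the rsync command, call `unmatched_bundles(LOCAL)`, and report any leftovers to the partners before the XTF indexing job starts.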

TODO

 * GUI issues
 * last bundles to be OCRed and uploaded (partners)
 * data cleanup (MPDL + partners)
 * LTA/backup issues