Difference between revisions of "Talk:MPDL Project XML Workflow"
Revision as of 11:31, 5 November 2008
Agenda Meeting 16.10.2008
Project and current status
- https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content
- https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-software
The XML Workflow project started in September 2008. The project has two main parts:
- Defining a working process (workflow) for the production of XML texts, and documenting this process so that it can be reused:
  - Digitization of e.g. manuscripts
  - Transcription of the text in the manuscripts
  - Markup of parts of the XML texts
  - Enrichment of the XML texts
- Developing tools and infrastructure to:
  - enable access to documents and linking between documents and their internal parts
  - build functions for searching, indexing and retrieval of relevant results
The main motivation is to standardize the working processes and to establish a competence center that provides guidelines for the transcription of texts.
- The project is currently in the initial design phase
- One of the main goals is to use repository functionality for all artefacts and to make reuse easier, for example by importing the texts to be worked on into a tool installed on a local system and by easily submitting the modified texts back to the repository
Introduction and status overview of the eSciDoc project
High-Level Requirements (functional, technical) for the XML Workflow project
Repository
The repository basically needs to provide the following functions:
- persistent storage for resources that come from various projects
- versioning of resources
- persistent identification of resources
- possibility to access arbitrary functions of XML documents
- transformation of XML documents to XHTML for presentation purposes
- enrichment of data such as:
  - links to language-specific functionality
  - links to sources (available on the web)
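The storage, versioning and persistent-identification functions listed above can be illustrated with a minimal in-memory sketch. It is not the eSciDoc API or the project's design: the class names `Resource` and `Repository`, the `hypothetical:` identifier prefix and the 1-based version numbering are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass
class Resource:
    """One stored XML document together with its full version history."""
    pid: str  # persistent identifier, assigned once and never reused
    versions: List[str] = field(default_factory=list)


class Repository:
    """Hypothetical in-memory stand-in for the repository described above."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def create(self, xml_text: str) -> str:
        """Store a new resource and return its persistent identifier."""
        pid = "hypothetical:" + uuid.uuid4().hex  # assumed identifier scheme
        self._resources[pid] = Resource(pid, [xml_text])
        return pid

    def submit(self, pid: str, xml_text: str) -> int:
        """Add a modified text as a new version and return its version number."""
        resource = self._resources[pid]
        resource.versions.append(xml_text)
        return len(resource.versions)

    def retrieve(self, pid: str, version: int = 0) -> str:
        """Return the given version (1-based); 0 means the latest version."""
        versions = self._resources[pid].versions
        return versions[-1] if version == 0 else versions[version - 1]
```

In this sketch, `retrieve` followed by `submit` corresponds to importing a text into a local tool and submitting the modified text back; earlier versions stay retrievable under the same identifier.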
Searching functionality (general)
- as the core system for searching within the XML documents, the project team is considering the eXist database, Lucene or Oracle 11g
- two types of queries need to be supported:
  - structural queries of XML documents (in particular trees and subsets of trees)
  - full-text searching (with integrated language technology)
- support for different languages/scripts such as: Latin, Greek, Chinese, European languages, Sanskrit
- Digilib: to be enabled as a service for viewing in-line images such as figures and diagrams
  - mature, robust tools are needed for working with images
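The difference between the two query types named above can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration, not one of the candidate systems (eXist, Lucene, Oracle 11g); the sample document, its element names and the two helper functions are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions.
SAMPLE = """<text>
  <chapter n="1"><p>Motion of falling bodies</p><figure ref="fig1"/></chapter>
  <chapter n="2"><p>On the lever</p></chapter>
</text>"""


def structural_query(root, path):
    """Structural query: select subtrees by their position in the XML tree."""
    return root.findall(path)


def full_text_query(root, term):
    """Naive full-text query: elements whose own text contains the term."""
    term = term.lower()
    return [el for el in root.iter() if el.text and term in el.text.lower()]


root = ET.fromstring(SAMPLE)
# Structural: all chapters that have a figure as a direct child.
chapters_with_figures = structural_query(root, ".//chapter[figure]")
# Full-text: all elements whose text mentions "lever", regardless of structure.
hits = full_text_query(root, "lever")
```

The structural query depends only on the shape of the tree, while the full-text query depends only on the character content; a production search system would typically combine both and add language-aware analysis.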
Lemmatized search (in more detail)
MPIWG intends to build functionality for searching for a word by its citation form
- for European languages, stemming is usually sufficient
- for other languages (e.g. Sanskrit, Chinese), words need to be indexed
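Searching by citation form can be sketched as an index keyed by lemma instead of by surface form. The sketch below is an assumption-laden illustration, not the planned implementation: the `LEMMAS` table is a tiny hand-made stand-in for a language-specific morphology service, and whitespace tokenization would not work for scripts without word separators such as Chinese.

```python
from collections import defaultdict

# Hypothetical lookup table from inflected form to citation form (lemma).
# A real system would obtain this from a language-specific morphology service.
LEMMAS = {
    "corpus": "corpus", "corporis": "corpus", "corpora": "corpus",
    "movet": "moveo", "movetur": "moveo",
}


def lemma_of(word):
    """Return the citation form, falling back to the lowercased word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


def build_index(documents):
    """Map each lemma to the ids of the documents containing any of its forms."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[lemma_of(token.strip(".,;:"))].add(doc_id)
    return index


def search(index, query):
    """Look up the query by its lemma, so every inflected form matches."""
    return sorted(index.get(lemma_of(query), set()))


docs = {"d1": "Corpus movetur.", "d2": "corporis natura"}
index = build_index(docs)
```

With this index, a query for the citation form `corpus` finds both documents, although `d2` only contains the inflected form `corporis`.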