Difference between revisions of "Talk:MPDL Project XML Workflow"

From MPDLMediaWiki
Line 1: Line 1:
==Agenda Meeting 16.10.2008==
=Agenda Meeting 16.10.2008=
===Project and current status ===
==Project and current status ==
*https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content
*https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-content
*https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-software
*https://itgroup.mpiwg-berlin.mpg.de:8080/tracs/mpdl-project-software
Line 21: Line 21:
*One of the main goals is to use a repository functionality for all artefacts and enable easier reuse such as: importing the texts on which a work needs to be done into tool installed on a local system, as well as easily submitting the modified texts back to the repository
*One of the main goals is to use a repository functionality for all artefacts and enable easier reuse such as: importing the texts on which a work needs to be done into tool installed on a local system, as well as easily submitting the modified texts back to the repository


===Introduction&status overview of eSciDoc project===
==Introduction&status overview of eSciDoc project==


===High-Level Requirements (functional, technical) for XMl Workflow project===
==High-Level Requirements (functional, technical) for XMl Workflow project==
===Repository===


*Repository - the functions that need to be provided are basically:
Functions that need to be provided are basically:
**persistent storage for resources that come from various projects
**persistent storage for resources that come from various projects
**versioning of resources
**versioning of resources
Line 34: Line 35:
***links to language specific functionality
***links to language specific functionality
***links to sources (available on the web)
***links to sources (available on the web)
*searching functionality  
 
===searching functionality (general)===
**project team considers as a core system to search within the XML documents the following: eXist database, Lucene or Oracle 11g  
**project team considers as a core system to search within the XML documents the following: eXist database, Lucene or Oracle 11g  
**two types of queries need to be supported:
**two types of queries need to be supported:
Line 42: Line 44:
*Digilib - to be enabled as a service for viewing in-line images such as figures, diagrams
*Digilib - to be enabled as a service for viewing in-line images such as figures, diagrams
**need to have the possibility to use quite mature/robust tools for working with images
**need to have the possibility to use quite mature/robust tools for working with images
*Lemmatized search


===Relation between the two projects and possibility for reuse===
===Lemmatized search (more precized)===
 
The functionality MPI WG intends to build is to enable searching by a word in its citation form
*for EU languages stemming is usually fine
*for other languages (e.g. Sanscrit, Chinese) words need to be indexed
 
==Relation between the two projects and possibility for reuse==

Revision as of 11:31, 5 November 2008

Agenda Meeting 16.10.2008

Project and current status

The XML Workflow project started in September 2008. The project has two main parts:

  • Defining a working process (workflow) for production of XML texts (and documenting this process so that it can be reused)
    • Digitization of e.g. manuscripts
    • Transcription of the text presented on the manuscripts
    • Markup of parts of the XML texts
    • Enrichment of the XML texts
  • Development of tools and infrastructure to
    • enable access to documents, and linking between documents and between internal parts of documents
    • build functions for searching, indexing and retrieval of relevant results

The main motivation is to standardize the working processes and to establish a center of competence that provides guidelines for the transcription of texts.

  • Currently the project is in the initial design phase
  • One of the main goals is to use repository functionality for all artefacts and to make reuse easier, for example by importing the texts that need work into a tool installed on a local system and by easily submitting the modified texts back to the repository

Introduction and status overview of the eSciDoc project

High-Level Requirements (functional, technical) for the XML Workflow project

Repository

The functions that need to be provided are essentially:

    • persistent storage for resources that come from various projects
    • versioning of resources
    • persistent identification of resources
    • possibility to access arbitrary functions of XML Documents
    • transformation of XML documents to XHTML for presentation purposes
    • enrichment of data, such as:
      • links to language specific functionality
      • links to sources (available on the web)

Searching functionality (general)

    • as the core system for searching within the XML documents, the project team is considering the eXist database, Lucene or Oracle 11g
    • two types of queries need to be supported:
      • structural queries of XML documents (in particular trees, subsets of trees)
      • full-text searching (integrated language technology)
      • support for different languages/scripts such as: Latin, Greek, Chinese, European languages, Sanskrit
  • Digilib - to be enabled as a service for viewing in-line images such as figures and diagrams
    • mature, robust tools for working with images need to be available
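
The two query types listed above can be illustrated with a minimal sketch. It uses only Python's standard library rather than eXist, Lucene or Oracle 11g, and the sample document, its element names (text, chapter, p, figure) and the helper functions are assumptions made up for illustration; they are not part of the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<text>
  <chapter n="1">
    <p>De motu corporum</p>
    <figure src="fig1.jpg"/>
  </chapter>
  <chapter n="2">
    <p>De viribus et motu</p>
  </chapter>
</text>"""

root = ET.fromstring(SAMPLE)


def structural_query(root, path):
    """Structural query: select subtrees by their position and shape.

    ElementTree supports only a limited subset of XPath.
    """
    return root.findall(path)


def full_text_query(root, term):
    """Full-text query: return <p> elements whose text contains the term.

    This is a naive case-insensitive substring match, without any
    language technology such as stemming or lemmatization.
    """
    term = term.lower()
    return [p for p in root.iter("p") if term in "".join(p.itertext()).lower()]


# Structural: chapters that have a <figure> as a direct child.
chapters_with_figures = structural_query(root, ".//chapter[figure]")

# Full-text: paragraphs that mention "motu".
hits = full_text_query(root, "motu")
```

The structural query depends only on the shape of the tree, while the full-text query depends only on the character content. A production system would most likely need to combine both in one query.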

Lemmatized search (in more detail)

The functionality MPI WG intends to build is searching for a word by its citation form:

  • for European languages, stemming is usually sufficient
  • for other languages (e.g. Sanskrit, Chinese) words need to be indexed
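
Searching by citation form can be sketched as an inverted index keyed on lemmas instead of surface forms. The sketch below is only an illustration under stated assumptions: the lemma table is a tiny hand-made Latin example, and the names LEMMAS, lemma_of, build_index and search are hypothetical, not taken from any MPI WG tool.

```python
from collections import defaultdict

# Hypothetical hand-made lemma table (inflected form -> citation form).
# A real system would use a morphological analyser or a lexicon instead.
LEMMAS = {
    "motu": "motus",
    "motum": "motus",
    "motus": "motus",
    "corporum": "corpus",
    "corpora": "corpus",
    "corpus": "corpus",
}


def lemma_of(token):
    """Map a token to its citation form; unknown tokens map to themselves."""
    token = token.lower()
    return LEMMAS.get(token, token)


def build_index(docs):
    """Build an inverted index from lemma to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[lemma_of(token)].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing any form of the query word."""
    return sorted(index.get(lemma_of(query), set()))


# Hypothetical mini-corpus for illustration.
DOCS = {"d1": "de motu corporum", "d2": "corpora in motum", "d3": "de viribus"}
INDEX = build_index(DOCS)
```

With this index, a query for the citation form motus finds documents that contain only motu or motum, which a plain string match would miss.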

Relation between the two projects and possibility for reuse