Difference between revisions of "Talk:PubMan Func Spec Submission/Generic TEI 2 PubItem Mapping"

 
(15 intermediate revisions by 2 users not shown)
Line 1: Line 1:
==Mapping of TEI Genres==
* We need a default genre for the case that no level attribute is provided. I would propose 'Article'.
==Mapping of TEI Fields==
* Author/Editor:
** '.name or .orgName are mapped to .Name' - what happens if both are provided?
** What happens with sub-elements like forename, persName, nameLink etc.?
** Add author.email to address (?)
* publicationStmt.*.address
** Can this be added to the field PublishingInfo.Place or .Publisher ?
* teiHeader.fileDesc.publicationStmt.authority
** dc:rights? ("(release authority) supplies the name of a person or other agency responsible for making an electronic file available, other than a publisher or distributor.")
* date
** What about the different date types? For example:
<monogr>
  <imprint>
      <date type="Accepted" when="2009-02-17"/>
  </imprint>
</monogr>
* ...meeting.title
** "Genre to "Conference Paper" (however, the Genres could also include "Talk at Event" or "Conference Report" etc.)" - We should only map if the mapping is distinct
* biblStruct.monogr.imprint.biblScope 
** "This field may contain subfields like pubPlace, extent etc." - I am afraid we have to list relevant elements again, to be able to write the transformation (have to check with Julia)
* teiHeader.fileDesc.sourceDesc
** Unfortunately, an XSLT cannot distinguish between a phrase and a title, so a clear mapping is needed. "extract everything that can be automatically recognized and mapping it to other fields, and putting the rest into Title" - Not clear: what can be in there? Please provide an example.
* teiHeader.fileDesc.seriesStmt
** The genre type mapping is not clear to me.
* ...biblStruct.monogr.imprint.biblScope@type="chap"
** would not map to Source.SequenceNumber
* *.address
** Needs more specific mapping, check example
* publicationStmt.date
** Map to dcterms:dateCopyrighted (?)
* meeting.date  
** Map to: Event.StartDate, Event.EndDate. We should provide an example here to see whether start date and end date can be distinguished.
* *.classCode.list.item
** Would not map this, as the usage is unclear to me. Do you have an example?


==General==
 
* "Organization editors are not supported in this mapping. " - Why not? Is there any restriction on the TEI side? (Same for creator.)
* *.idno: We have to specify what happens to identifier types which are not provided in PubMan


== Examples==
Line 63: Line 28:
  <author>La Fayette, Marie Madeleine Pioche de la Vergne, comtesse de (1634–1693)</author>
: All info within round brackets in an author element should be parsed out. This should apply to all person names (editor, contributor etc.)
'''If a persName element exists, apply the following mapping:'''
The element ''author'' stands for all roles that might appear in this mapping (editor, contributor etc.).
{|{{table}}
|-
! '''TEI Element'''
! '''[[PubMan_Metadata_Sets|PubMan Metadata Set]]'''
!'''Description''' 
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.''author''.persName ||Publication.Creator.Person.FamilyName <br/> '''Note:''' Only if ...author.persName.surname is empty  ||
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.''author''.persName.forename || Publication.Creator.Person.GivenName ||Add middle name to first name.
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.''author''.persName.surname || Publication.Creator.Person.FamilyName||--
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.''author''.persName.roleName ||not mapped ||contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank. Dr, Miss, M.Tech (degree)
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.''author''.persName.nameLink||add to Publication.Creator.Person.FamilyName <br/> Separated from the family name by a space ||contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.''author''.persName.genName||add to Publication.Creator.Person.GivenName <br/> Separated from the given name by a comma||contains a generational name component like Jr.
|}


===Address===
Line 78: Line 64:
     <name type="country">United Kingdom</name>
  </address>
Mapping of address: All sub-elements are concatenated, separated by a comma and a space, and mapped to the corresponding PubItem address field.
===Dates===
{|border="2"
|-
! width="300" |'''change/date'''
! width="300" |'''Pubman date type'''
|- style="height:20px"
|Received||created
|-
|Revised||modified
|-
|Accepted||accepted
|-
|Registration||not mapped
|-
|Online||published online
|-
|Submitted||submitted
|-
|publication||published in print
|-
|Published||published in print
|-
|ePublished||published online
|}
If a date of the same type occurs more than once, the most recent date is mapped.
'''Note:''' The value in the left column can be found either in the type attribute of a date element or in the text content of a date or change element. <br/>
         
<change when="2009-01-16">Revised</change>
<change when="2009-01-28">Accepted</change>
<date type="publication" when="2008-11-18"/>
<date when="2008-11-08">Online</date>
<date type="Accepted" when="2009-02-17"/>
===General===
(http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD7)
<pre>
<teiHeader>
<fileDesc>
  <titleStmt>
  <title>Common sense, a machine-readable transcript</title>
  <author>Paine, Thomas (1737-1809)</author>
  <respStmt>
    <resp>compiled by</resp>
    <name>Jon K Adams</name>
  </respStmt>
  </titleStmt>
  <editionStmt>
  <edition>
    <date>1986</date>
  </edition>
  </editionStmt>
  <publicationStmt>
  <distributor>Oxford Text Archive.</distributor>
  <address>
    <addrLine>Oxford University Computing Services,</addrLine>
    <addrLine>13 Banbury Road,</addrLine>
    <addrLine>Oxford OX2 6RB,</addrLine>
    <addrLine>UK</addrLine>
  </address>
  </publicationStmt>
  <notesStmt>
  <note>Brief notes on the text are in a
      supplementary file.</note>
  </notesStmt>
  <sourceDesc>
  <biblStruct>
    <monogr>
    <editor>Foner, Philip S.</editor>
    <title>The collected writings of Thomas Paine</title>
    <imprint>
      <pubPlace>New York</pubPlace>
      <publisher>Citadel Press</publisher>
      <date>1945</date>
    </imprint>
    </monogr>
  </biblStruct>
  </sourceDesc>
</fileDesc>
<encodingDesc>
  <samplingDecl>
  <p>Editorial notes in the Foner edition have not
      been reproduced. </p>
  <p>Blank lines and multiple blank spaces, including paragraph
      indents, have not been preserved. </p>
  </samplingDecl>
  <editorialDecl>
  <correction status="high" method="silent">
    <p>The following errors
        in the Foner edition have been corrected:
    <list>
      <item>p. 13 l. 7 cotemporaries contemporaries </item>
      <item>p. 28 l. 26 [comma] [period] </item>
      <item>p. 84 l. 4 kin kind </item>
      <item>p. 95 l. 1 stuggle struggle </item>
      <item>p. 101 l. 4 certainy certainty </item>
      <item>p. 167 l. 6 than that </item>
      <item>p. 209 l. 24 publshed published </item>
    </list>
    </p>
  </correction>
  <normalization>
    <p>No normalization beyond that performed
        by Foner, if any. </p>
  </normalization>
  <quotation marks="all" form="std">
    <p>All double quotation marks
        rendered with ", all single quotation marks with
        apostrophe. </p>
  </quotation>
  <hyphenation eol="none">
    <p>Hyphenated words that appear at the
        end of the line in the Foner edition have been reformed.</p>
  </hyphenation>
  <stdVals>
    <p>The values of <att>when-iso</att> on the <gi>time</gi>
        element always end in the format <val>HH:MM</val> or
    <val>HH</val>; i.e., seconds, fractions thereof, and time
        zone designators are not present.</p>
  </stdVals>
  <interpretation>
    <p>Compound proper names are marked. </p>
    <p>Dates are marked. </p>
    <p>Italics are recorded without interpretation. </p>
  </interpretation>
  </editorialDecl>
  <classDecl>
  <taxonomy xml:id="lcsh">
    <bibl>Library of Congress Subject Headings</bibl>
  </taxonomy>
  <taxonomy xml:id="lc">
    <bibl>Library of Congress Classification</bibl>
  </taxonomy>
  </classDecl>
</encodingDesc>
<profileDesc>
  <creation>
  <date>1774</date>
  </creation>
  <langUsage>
  <language ident="en" usage="100">English.</language>
  </langUsage>
  <textClass>
  <keywords scheme="#lcsh">
    <list>
    <item>Political science</item>
    <item>United States -- Politics and government —
          Revolution, 1775-1783</item>
    </list>
  </keywords>
  <classCode scheme="#lc">JC 177</classCode>
  </textClass>
</profileDesc>
<revisionDesc>
  <change when="1996-01-22">
  <name>CMSMcQ</name> finished proofreading
  </change>
  <change when="1995-10-30">
  <name>L.B. </name> finished proofreading
  </change>
  <change when="1995-07-20">
  <name>R.G. </name> finished proofreading
  </change>
  <change when="1995-07-04">
  <name>R.G. </name> finished data entry
  </change>
  <change when="1995-01-15">
  <name>R.G. </name> began data entry
  </change>
</revisionDesc>
</teiHeader>
</pre>
'''Wikipedia'''
(http://de.wikipedia.org/wiki/Text_Encoding_Initiative#Praxisbeispiel)
<pre>
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>Auf dem Brocken</title>
                <author>Heinrich Heine (1797–1856)</author>
                <respStmt>
                    <name>Wiki Autor</name>
                    <resp>Umwandlung in TEI-konformes XML</resp>
                </respStmt>
            </titleStmt>
            <publicationStmt>
                <p>aus Wikisource, der freien Quellensammlung
                    (<ptr target="http://de.wikisource.org/wiki/Auf_dem_Brocken"/>)</p>
            </publicationStmt>
            <sourceDesc>
                <biblFull>
                    <titleStmt>
                        <title level="a">Auf dem Brocken</title>
                        <title level="m">Buch der Lieder</title>
                        <title level="m" type="sub">Aus der Harzreise</title>
                        <author>Heine, Heinrich</author>
                    </titleStmt>
                    <publicationStmt>
                        <publisher>Hoffmann und Campe</publisher>
                        <pubPlace>Hamburg</pubPlace>
                        <date>1827</date>
                        <availability>
                            <p>Gemeinfrei, keine Nutzungsbeschränkungen</p>
                        </availability>
                    </publicationStmt>
                </biblFull>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <pb n="302"/>
            <head>Auf dem Brocken.</head>
            <lg type="stanza">
                <l>Heller wird es schon im Osten</l>
                <l>Durch der Sonne kleines Glimmen,</l>
                <l>Weit und breit die Bergesgipfel,</l>
                <l>In dem Nebelmeere schwimmen.</l>
            </lg>
            <lg type="stanza">
                <l n="5">Hätt’ ich Siebenmeilenstiefel,</l>
                <l>Lief ich, mit der Hast des Windes,</l>
                <l>Ueber jene Bergesgipfel,</l>
                <l>Nach dem Haus des lieben Kindes.</l>
            </lg>
            <lg type="stanza">
                <l>Von dem Bettchen, wo sie schlummert,</l>
                <l n="10">Zög’ ich leise die Gardinen,</l>
                <l>Leise küßt’ ich ihre Stirne,</l>
                <l>Leise ihres Munds Rubinen.</l>
            </lg>
            <lg type="stanza">
                <l>Und noch leiser wollt’ ich flüstern</l>
                <l>In die kleinen Lilien-Ohren:</l>
                <l n="15">Denk’ im Traum, daß wir uns lieben,</l>
                <l>Und daß wir uns nie verloren.</l>
            </lg>
        </body>
    </text>
</TEI>
</pre>

Latest revision as of 12:58, 8 September 2010

Mapping of TEI Fields

  • Author/Editor:
    • Add author.email to address (?)
  • meeting.date
    • Map to: Event.StartDate, Event.EndDate. We should provide an example here to see whether start date and end date can be distinguished.


Examples

StartPage/EndPage

<biblScope type="pp">381-417</biblScope>
<biblScope type="pp" from="12" to="34"/>
<biblScope type="pp">12</biblScope> 
<biblScope type="pages">3-46</biblScope>
First check whether the attributes 'from' and 'to' are provided.
Otherwise, if the content consists of two numbers separated by '-', the first number is mapped to StartPage and the second to EndPage; in all other cases the whole string is mapped to StartPage.
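
The page rule above can be sketched in Python as follows. This is only an illustration, not the actual transformation: the function name map_pages is hypothetical, and the sketch assumes that the 'from'/'to' attributes take precedence over the element content whenever either one is present.

```python
import re
import xml.etree.ElementTree as ET


def map_pages(bibl_scope_xml):
    """Return (StartPage, EndPage) for a single biblScope element."""
    elem = ET.fromstring(bibl_scope_xml)

    # Step 1: the attributes 'from' and 'to' win if they are provided.
    start, end = elem.get("from"), elem.get("to")
    if start is not None or end is not None:
        return start, end

    # Step 2: two numbers separated by '-' become start and end page.
    text = (elem.text or "").strip()
    match = re.fullmatch(r"(\d+)\s*-\s*(\d+)", text)
    if match:
        return match.group(1), match.group(2)

    # Step 3: otherwise the whole string goes to StartPage.
    return (text or None), None
```

Applied to the four examples above, the sketch yields a start and end page for the ranges and only a StartPage for the single value "12".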

Extent

<extent>78 p.</extent> -mapped
<extent>19 pp.</extent> -mapped
<extent>3200 sentences</extent> -not mapped
<extent>between 10 and 20 Mb</extent> -not mapped
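
The examples above do not state the rule explicitly. One possible reading, shown here as a Python sketch, is that an extent is mapped only when it is a number followed by "p." or "pp.". That reading, the function name is_page_extent and the regular expression are assumptions made for illustration.

```python
import re

# Assumption: only "<number> p." or "<number> pp." counts as a page extent.
_PAGE_EXTENT = re.compile(r"\s*\d+\s*pp?\.?\s*")


def is_page_extent(extent_text):
    """Return True if the extent value looks like a page count."""
    return _PAGE_EXTENT.fullmatch(extent_text) is not None
```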

Author

<author>Paine, Thomas (1737-1809)</author>
<author>La Fayette, Marie Madeleine Pioche de la Vergne, comtesse de (1634–1693)</author>
All info within round brackets in an author element should be parsed out. This should apply to all person names (editor, contributor etc.)
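
A minimal Python sketch of this step, assuming that "parsed out" means the bracketed part is removed from the name and returned separately. The function name split_parenthetical is hypothetical, and nested brackets are not handled.

```python
import re


def split_parenthetical(name):
    """Split a person name into the bare name and any bracketed extras."""
    # Collect the content of every "(...)" group, e.g. life dates.
    extras = re.findall(r"\(([^()]*)\)", name)
    # Remove the groups, including the whitespace in front of them.
    cleaned = re.sub(r"\s*\([^()]*\)", "", name).strip()
    return cleaned, extras
```

The same function could be applied to editor and contributor names, since it does not depend on the role.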

If a persName element exists, apply the following mapping. The element author stands for all roles that might appear in this mapping (editor, contributor etc.).

{|{{table}}
|-
! TEI Element
! PubMan Metadata Set
! Description
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.author.persName ||Publication.Creator.Person.FamilyName <br/> Note: Only if ...author.persName.surname is empty ||
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.author.persName.forename ||Publication.Creator.Person.GivenName ||Add middle name to first name.
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.author.persName.surname ||Publication.Creator.Person.FamilyName ||--
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.author.persName.roleName ||not mapped ||contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank. Dr, Miss, M.Tech (degree)
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.author.persName.nameLink ||add to Publication.Creator.Person.FamilyName <br/> Separated from the family name by a space ||contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of
|-
|TEIHeader.fileDesc.sourceDesc.biblStruct.analytic.author.persName.genName ||add to Publication.Creator.Person.GivenName <br/> Separated from the given name by a comma ||contains a generational name component like Jr.
|}
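
The persName mapping could look roughly like this in Python. The sketch makes several assumptions that the mapping does not settle: the function name map_pers_name is hypothetical, the input carries no XML namespace, nameLink is placed in front of the surname, and genName is appended after the given names.

```python
import xml.etree.ElementTree as ET


def map_pers_name(pers_name_xml):
    """Map one persName element to GivenName and FamilyName."""
    pers = ET.fromstring(pers_name_xml)

    def texts(tag):
        # Non-empty text of all direct children with the given tag.
        values = [(child.text or "").strip() for child in pers.findall(tag)]
        return [value for value in values if value]

    # Several forenames (first and middle names) are joined by a space.
    given = " ".join(texts("forename"))
    # genName is added to the given name, separated by a comma.
    given = ", ".join(([given] if given else []) + texts("genName"))

    surnames = texts("surname")
    if not surnames:
        # Fallback: the plain persName content is used as family name.
        plain = (pers.text or "").strip()
        surnames = [plain] if plain else []
    # nameLink is added to the family name, separated by a space.
    family = " ".join(texts("nameLink") + surnames)

    # roleName is deliberately ignored (not mapped).
    return {"GivenName": given, "FamilyName": family}
```

For a made-up input such as a persName with forename "Johannes", forename "Diderik", nameLink "van der" and surname "Waals", the sketch returns the given name "Johannes Diderik" and the family name "van der Waals".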

Address

<address>
   <addrLine>Oxford University Computing Services,</addrLine>
   <addrLine>13 Banbury Road,</addrLine>
   <addrLine>Oxford OX2 6RB,</addrLine>
   <addrLine>UK</addrLine>
</address>
<address>
   <street>110 Southmoor Road</street>
   <name type="city">Oxford</name>
   <postCode>OX2 6RB</postCode>
   <name type="country">United Kingdom</name>
</address>

Mapping of address: All sub-elements are concatenated, separated by a comma and a space, and mapped to the corresponding PubItem address field.
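
A sketch of the concatenation in Python. The function name map_address is hypothetical. Dropping a trailing comma from each sub-element is an assumption added here, because the addrLine values in the first example already end with a comma and would otherwise produce ",,".

```python
import xml.etree.ElementTree as ET


def map_address(address_xml):
    """Concatenate all sub-elements of an address with ', '."""
    address = ET.fromstring(address_xml)
    parts = []
    for child in address:
        # Collapse whitespace inside the sub-element.
        text = " ".join("".join(child.itertext()).split())
        # Assumption: drop a trailing comma to avoid doubled commas.
        text = text.rstrip(",").strip()
        if text:
            parts.append(text)
    return ", ".join(parts)
```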

Dates

{|border="2"
|-
! change/date
! Pubman date type
|-
|Received||created
|-
|Revised||modified
|-
|Accepted||accepted
|-
|Registration||not mapped
|-
|Online||published online
|-
|Submitted||submitted
|-
|publication||published in print
|-
|Published||published in print
|-
|ePublished||published online
|}

If a date of the same type occurs more than once, the most recent date is mapped.

Note: The value in the left column can be found either in the type attribute of a date element or in the text content of a date or change element.
<change when="2009-01-16">Revised</change>
<change when="2009-01-28">Accepted</change>
<date type="publication" when="2008-11-18"/>
<date when="2008-11-08">Online</date>
<date type="Accepted" when="2009-02-17"/>
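
The date table and the "most recent date" rule can be sketched in Python as below. The names DATE_TYPE_MAP and map_dates are hypothetical. The sketch assumes that the lookup is case-insensitive, that 'when' holds full ISO dates (so plain string comparison orders them correctly), and that "same type" refers to the resulting PubMan date type.

```python
import xml.etree.ElementTree as ET

# Left column of the table (lower-cased) -> PubMan date type.
# "Registration" is missing on purpose: it is not mapped.
DATE_TYPE_MAP = {
    "received": "created",
    "revised": "modified",
    "accepted": "accepted",
    "online": "published online",
    "submitted": "submitted",
    "publication": "published in print",
    "published": "published in print",
    "epublished": "published online",
}


def map_dates(fragment_xml):
    """Map date/change elements to PubMan date types, keeping the latest."""
    # Wrap the fragment so that several sibling elements can be parsed.
    root = ET.fromstring("<root>" + fragment_xml + "</root>")
    result = {}
    for elem in root.iter():
        if elem.tag not in ("date", "change"):
            continue
        when = elem.get("when")
        # The label is either the type attribute or the element text.
        label = elem.get("type") or (elem.text or "").strip()
        target = DATE_TYPE_MAP.get(label.lower())
        if when is None or target is None:
            continue
        # Keep only the most recent date per PubMan date type.
        if target not in result or when > result[target]:
            result[target] = when
    return result
```

For the five example elements above, "Accepted" occurs twice (2009-01-28 and 2009-02-17), and the sketch keeps 2009-02-17.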

General

(http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD7)

<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>Common sense, a machine-readable transcript</title>
   <author>Paine, Thomas (1737-1809)</author>
   <respStmt>
    <resp>compiled by</resp>
    <name>Jon K Adams</name>
   </respStmt>
  </titleStmt>
  <editionStmt>
   <edition>
    <date>1986</date>
   </edition>
  </editionStmt>
  <publicationStmt>
   <distributor>Oxford Text Archive.</distributor>
   <address>
    <addrLine>Oxford University Computing Services,</addrLine>
    <addrLine>13 Banbury Road,</addrLine>
    <addrLine>Oxford OX2 6RB,</addrLine>
    <addrLine>UK</addrLine>
   </address>
  </publicationStmt>
  <notesStmt>
   <note>Brief notes on the text are in a
       supplementary file.</note>
  </notesStmt>
  <sourceDesc>
   <biblStruct>
    <monogr>
     <editor>Foner, Philip S.</editor>
     <title>The collected writings of Thomas Paine</title>
     <imprint>
      <pubPlace>New York</pubPlace>
      <publisher>Citadel Press</publisher>
      <date>1945</date>
     </imprint>
    </monogr>
   </biblStruct>
  </sourceDesc>
 </fileDesc>
 <encodingDesc>
  <samplingDecl>
   <p>Editorial notes in the Foner edition have not
       been reproduced. </p>
   <p>Blank lines and multiple blank spaces, including paragraph
       indents, have not been preserved. </p>
  </samplingDecl>
  <editorialDecl>
   <correction status="high" method="silent">
    <p>The following errors
         in the Foner edition have been corrected:
    <list>
      <item>p. 13 l. 7 cotemporaries contemporaries </item>
      <item>p. 28 l. 26 [comma] [period] </item>
      <item>p. 84 l. 4 kin kind </item>
      <item>p. 95 l. 1 stuggle struggle </item>
      <item>p. 101 l. 4 certainy certainty </item>
      <item>p. 167 l. 6 than that </item>
      <item>p. 209 l. 24 publshed published </item>
     </list>
    </p>
   </correction>
   <normalization>
    <p>No normalization beyond that performed
         by Foner, if any. </p>
   </normalization>
   <quotation marks="all" form="std">
    <p>All double quotation marks
         rendered with ", all single quotation marks with
         apostrophe. </p>
   </quotation>
   <hyphenation eol="none">
    <p>Hyphenated words that appear at the
         end of the line in the Foner edition have been reformed.</p>
   </hyphenation>
   <stdVals>
    <p>The values of <att>when-iso</att> on the <gi>time</gi>
         element always end in the format <val>HH:MM</val> or
    <val>HH</val>; i.e., seconds, fractions thereof, and time
         zone designators are not present.</p>
   </stdVals>
   <interpretation>
    <p>Compound proper names are marked. </p>
    <p>Dates are marked. </p>
    <p>Italics are recorded without interpretation. </p>
   </interpretation>
  </editorialDecl>
  <classDecl>
   <taxonomy xml:id="lcsh">
    <bibl>Library of Congress Subject Headings</bibl>
   </taxonomy>
   <taxonomy xml:id="lc">
    <bibl>Library of Congress Classification</bibl>
   </taxonomy>
  </classDecl>
 </encodingDesc>
 <profileDesc>
  <creation>
   <date>1774</date>
  </creation>
  <langUsage>
   <language ident="en" usage="100">English.</language>
  </langUsage>
  <textClass>
   <keywords scheme="#lcsh">
    <list>
     <item>Political science</item>
     <item>United States -- Politics and government —
           Revolution, 1775-1783</item>
    </list>
   </keywords>
   <classCode scheme="#lc">JC 177</classCode>
  </textClass>
 </profileDesc>
 <revisionDesc>
  <change when="1996-01-22">
   <name>CMSMcQ</name> finished proofreading
  </change>
  <change when="1995-10-30">
   <name>L.B. </name> finished proofreading
  </change>
  <change when="1995-07-20">
   <name>R.G. </name> finished proofreading
  </change>
  <change when="1995-07-04">
   <name>R.G. </name> finished data entry
  </change>
  <change when="1995-01-15">
   <name>R.G. </name> began data entry
  </change>
 </revisionDesc>
</teiHeader>

Wikipedia (http://de.wikipedia.org/wiki/Text_Encoding_Initiative#Praxisbeispiel)

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>Auf dem Brocken</title>
                <author>Heinrich Heine (1797–1856)</author>
                <respStmt>
                    <name>Wiki Autor</name>
                    <resp>Umwandlung in TEI-konformes XML</resp>
                </respStmt>
            </titleStmt>
            <publicationStmt>
                <p>aus Wikisource, der freien Quellensammlung 
                    (<ptr target="http://de.wikisource.org/wiki/Auf_dem_Brocken"/>)</p>
            </publicationStmt>
            <sourceDesc>
                <biblFull>
                    <titleStmt>
                        <title level="a">Auf dem Brocken</title>
                        <title level="m">Buch der Lieder</title>
                        <title level="m" type="sub">Aus der Harzreise</title>
                        <author>Heine, Heinrich</author>
                    </titleStmt>
                    <publicationStmt>
                        <publisher>Hoffmann und Campe</publisher>
                        <pubPlace>Hamburg</pubPlace>
                        <date>1827</date>
                        <availability>
                            <p>Gemeinfrei, keine Nutzungsbeschränkungen</p>
                        </availability>
                    </publicationStmt>
                </biblFull>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <pb n="302"/>
            <head>Auf dem Brocken.</head>
            <lg type="stanza">
                <l>Heller wird es schon im Osten</l>
                <l>Durch der Sonne kleines Glimmen,</l>
                <l>Weit und breit die Bergesgipfel,</l>
                <l>In dem Nebelmeere schwimmen.</l>
            </lg>
            <lg type="stanza">
                <l n="5">Hätt’ ich Siebenmeilenstiefel,</l>
                <l>Lief ich, mit der Hast des Windes,</l>
                <l>Ueber jene Bergesgipfel,</l>
                <l>Nach dem Haus des lieben Kindes.</l>
            </lg>
            <lg type="stanza">
                <l>Von dem Bettchen, wo sie schlummert,</l>
                <l n="10">Zög’ ich leise die Gardinen,</l>
                <l>Leise küßt’ ich ihre Stirne,</l>
                <l>Leise ihres Munds Rubinen.</l>
            </lg>
            <lg type="stanza">
                <l>Und noch leiser wollt’ ich flüstern</l>
                <l>In die kleinen Lilien-Ohren:</l>
                <l n="15">Denk’ im Traum, daß wir uns lieben,</l>
                <l>Und daß wir uns nie verloren.</l>
            </lg>
        </body>
    </text>
</TEI>