Difference between revisions of "PubMan Func Spec Submission/Generic TEI 2 PubItem Mapping"

From MPDLMediaWiki

Revision as of 09:21, 10 July 2009

This page specifies the mapping from TEI XML to eSciDoc Publication Metadata.

(A PEER-specific mapping can be found here)

Mapping of TEI Genres

Genres are derived from the @level attribute values of the title element (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-title.html).

@level value PubMan Genre Comment
a Article -
m Book -
j Journal -
s Series -
u Other or Thesis? (if not commercially published)
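
The genre table above can be read as a simple lookup. The following is a minimal sketch, not part of the specification: the names `LEVEL_TO_GENRE` and `genre_for_level` are hypothetical, and mapping "u" to "Other" is an assumption, since the table leaves open whether "Thesis" would be the better target.

```python
# Hypothetical lookup table built from the @level rows above.
# "u" is mapped to "Other" here; the table leaves "Thesis" as an open question.
LEVEL_TO_GENRE = {
    "a": "Article",
    "m": "Book",
    "j": "Journal",
    "s": "Series",
    "u": "Other",
}


def genre_for_level(level):
    """Return the PubMan genre for a TEI title @level value, or None if unknown."""
    if level is None:
        return None
    return LEVEL_TO_GENRE.get(level.strip().lower())
```

Returning None for a missing or unknown @level leaves the genre decision to the other rules on this page.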

Mapping of TEI fields

This mapping is based on the TEI P5 Guidelines.

TEI Element PubMan Metadata Set Comment
*.affiliation *.Creator.Person.Organization From any affiliation field that is a subfield of a person (e.g. author, editor), Organizations can be derived: .name or .orgName are mapped to .Name, .address to .Address.
*.title *.Title Only if the @type attribute is "main" or empty; otherwise, map to *.AlternativeTitle. If there is no main title, the first alternative title is mapped to Title. If there is more than one main title, only the first is mapped to Title and all others are mapped to AlternativeTitle. This rule applies in addition to the rules below.
teiHeader.fileDesc.titleStmt.title Title A @level attribute is unusual here, but the specification does not forbid it, so if one is present a Genre can be derived from it using the genre mapping above.
teiHeader.fileDesc.titleStmt.author Creator.Person.CompleteName Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Author".
Please note that this field may additionally contain the years of birth and death. Check example
teiHeader.fileDesc.titleStmt.editor Creator.Person.CompleteName Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Editor".
teiHeader.fileDesc.titleStmt.sponsor - -
teiHeader.fileDesc.titleStmt.funder - -
teiHeader.fileDesc.titleStmt.principal Creator.Person.CompleteName Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Contributor".
teiHeader.fileDesc.titleStmt.respStmt.resp - This field contains creator roles other than the aforementioned. If the field content matches one of the remaining PubMan CreatorRoles, the Creator.CreatorRole may be set to that Role instead of "Contributor" (see also respStmt.name below).
teiHeader.fileDesc.titleStmt.respStmt.name Creator.Person.CompleteName Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Contributor".
teiHeader.fileDesc.editionStmt.edition PublishingInfo.Edition -
teiHeader.fileDesc.editionStmt.edition.date Date.Date Set Date.DateType to "created".
teiHeader.fileDesc.editionStmt.respStmt.resp - If the field content matches one of the PubMan CreatorRoles, the Creator.CreatorRole may be set to that Role instead of "Contributor" (see also respStmt.name below).
teiHeader.fileDesc.editionStmt.respStmt.name Creator.Person.CompleteName Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Contributor".
teiHeader.fileDesc.extent TotalNumberOfPages only if the field contains "pages" or "pp" etc.
check Example
teiHeader.fileDesc.publicationStmt PublishingInfo.Publisher only if there are no subfields (see below).
teiHeader.fileDesc.publicationStmt.publisher PublishingInfo.Publisher -
teiHeader.fileDesc.publicationStmt.distributor PublishingInfo.Publisher only if publicationStmt.publisher is empty.
teiHeader.fileDesc.publicationStmt.authority PublishingInfo.Publisher only if publicationStmt.publisher and publicationStmt.distributor are empty.
teiHeader.fileDesc.publicationStmt.*.pubPlace PublishingInfo.Place only if it is a subfield of the publicationStmt. subfield that is mapped to PublishingInfo.Publisher (see above). Please note that .pubPlace may also be a direct subfield of publicationStmt, in which case it refers to "the publisher, distributor, or release authority most recently mentioned", so if this is not the same that is mapped to the PublishingInfo.Publisher field, it must not be mapped.
teiHeader.fileDesc.publicationStmt.*.address - -
teiHeader.fileDesc.publicationStmt.idno Identifier.Id only if the @type attribute matches one of the PubMan IdTypes, which then determines the Identifier.IdType. (check mapping here)
teiHeader.fileDesc.publicationStmt.availability - Alternatively, it could be mapped to a dc:rights field, like in PubMan_Func_Spec_Submission/TEI_2_PubItem_Mapping.
teiHeader.fileDesc.publicationStmt.date Date.Date Set Date.DateType to "created".
teiHeader.fileDesc.seriesStmt Source.Title if there are no subfields. It will probably be impossible to extract only the title, so the Source.Title field might also contain volume numbers, ISSNs etc. Set Source.Genre to "Journal". If the item can be identified as a book (e.g. if it has an ISBN), set Source.Genre to "Series" instead.
teiHeader.fileDesc.seriesStmt.title Source.Title Set Source.Genre to "Journal". If the item can be identified as a book (e.g. if it has an ISBN), set Source.Genre to "Series" instead.
teiHeader.fileDesc.seriesStmt.idno Source.Identifier.Id only if the @type attribute matches one of the PubMan IdTypes, which then determines the Identifier.IdType. Alternatively, if the @type attribute identifies the idno as a volume number (e.g. "vol", "volume", "v"), map to Source.Volume instead. In this case, the field may also contain an issue number (in brackets, or separated by a dot), which then is mapped to Source.Issue.
teiHeader.fileDesc.seriesStmt.respStmt.name Source.Creator.Person.CompleteName Set Creator.CreatorRole to "Editor".
teiHeader.fileDesc.seriesStmt.respStmt.resp - This field can be scanned for "ed"/"editor"/"edited" etc. to make sure that it's actually an editor. If the respStmt.resp is found to match another CreatorRole instead, the Creator.CreatorRole should be set accordingly (see above).
teiHeader.fileDesc.notesStmt - -
teiHeader.fileDesc.sourceDesc - If this field only contains 'a phrase such as "born digital"', it is ignored. Otherwise, and if it has no subfields, and if the metadata of the item is not already contained in previous fileDesc fields, I suggest mapping the whole field to Title (or trying to extract everything that can be automatically recognized and mapping it to other fields, and putting the rest into Title). sourceDesc fields may contain direct subfields like title, which may have a @level attribute and can thus be easily mapped to the corresponding PubMan fields.
teiHeader.fileDesc.sourceDesc.bibl Title only if titleStmt.title is empty. If this field has no subfields, one could try to extract everything that can be automatically recognized and map it to other fields, and put the rest into Title. If this field has subfields, and if the data in these subfields is not already given in previous fileDesc fields, they should be mapped as described in sourceDesc.biblStruct (see below).
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.title Title only if titleStmt.title is empty. Set Genre to "Book" (except if it can be somehow identified as a "Proceedings" or "Thesis" etc. I'm not sure how this could work, though). Please note: if there is also a sourceDesc.*.analytic field, map monogr.title to Source.Title instead and set Source.Genre to "Book" and Genre to "Book Item" (it could also be "Proceedings" and "Conference Paper", though - I'm not sure how to determine this).
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.author Creator.Person.CompleteName only if titleStmt.author is empty. Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Author".
Please note that this field may additionally contain the years of birth and death and other additional data. Check example
If there is also a sourceDesc.*.analytic field, map monogr.author to Source.Creator instead.
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.editor Creator.Person.CompleteName Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Editor".
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.edition PublishingInfo.Edition only if editionStmt is empty.
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.edition.date Date.Date only if editionStmt is empty. Set Date.DateType to "created".
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.idno Identifier.Id only if the @type attribute matches one of the PubMan IdTypes, which then determines the Identifier.IdType. Alternatively, if the @type attribute identifies the idno as a volume number (e.g. "vol", "volume", "v"), map to Source.Volume instead.
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.extent TotalNumberOfPages only if the field contains the string "pages", "pp" or "p" and if biblStruct.analytic is empty.
Check example
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.imprint [see comment] This field may contain subfields like pubPlace, extent etc., which should be mapped accordingly if they are not already given in previous fileDesc fields (see above). Additionally, it may contain the subfield biblScope (see below).
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.imprint.biblScope Source.Volume only if @type = "vol".
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.imprint.biblScope Source.Issue only if @type = "issue".
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.imprint.biblScope Source.StartPage, Source.EndPage only if @type = "pp" or "pages". Check example
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.imprint.biblScope Source.SequenceNumber only if @type = "chap" or "part" and if it's actually a number, not a chapter title.
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.meeting.title Event.Title If biblStruct.analytic is not empty, set Source.Genre to "Proceedings" and Genre to "Conference Paper" (however, the Genres could also include "Talk at Event" or "Conference Report" etc.). If biblStruct.analytic is empty, set Genre to "Proceedings".
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.meeting.date Event.StartDate, Event.EndDate -
teiHeader.fileDesc.sourceDesc.biblStruct.monogr.meeting.address Event.Place -
teiHeader.fileDesc.sourceDesc.biblStruct.analytic.title Title only if titleStmt.title is empty. Set Genre to "Article" (except if it can be somehow identified as a "Book Item" or "Conference Paper" etc. I'm not sure how this could work, though).
teiHeader.fileDesc.sourceDesc.biblStruct.analytic.author Creator.Person.CompleteName only if titleStmt.author is empty. Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Author".
Please note that this field may additionally contain the years of birth and death. Check example
teiHeader.fileDesc.sourceDesc.biblStruct.analytic.editor Creator.Person.CompleteName only if titleStmt.editor is empty. Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Editor". (I'm not sure if this field is ever used, i.e. if there are edited journal articles or something like that.)
teiHeader.fileDesc.sourceDesc.biblStruct.analytic.respStmt.resp - This field contains creator roles other than the aforementioned. If the field content matches one of the remaining PubMan CreatorRoles, the Creator.CreatorRole may be set to that Role instead of "Contributor" (see also respStmt.name below).
teiHeader.fileDesc.sourceDesc.biblStruct.analytic.respStmt.name Creator.Person.CompleteName Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Contributor".
teiHeader.fileDesc.sourceDesc.biblStruct.series.editor Source.Creator.Person.CompleteName only if biblStruct.analytic and seriesStmt are empty. Set Creator.CreatorType to "Person". Set Creator.CreatorRole to "Editor".
teiHeader.fileDesc.sourceDesc.biblStruct.series.biblScope Source.Volume only if biblStruct.analytic and seriesStmt are empty.
teiHeader.fileDesc.sourceDesc.biblStruct.series.title Source.Title only if biblStruct.analytic and seriesStmt are empty.
teiHeader.fileDesc.sourceDesc.biblStruct.series.respStmt.resp - This field contains creator roles other than editor. If the field content matches one of the remaining PubMan CreatorRoles, the Creator.CreatorRole may be set to that Role instead of "Contributor" (see also respStmt.name below).
teiHeader.fileDesc.sourceDesc.biblStruct.series.respStmt.name Source.Creator.Person.CompleteName only if biblStruct.analytic and seriesStmt are empty. Set Source.Creator.CreatorType to "Person". Set Source.Creator.CreatorRole to "Contributor".
teiHeader.fileDesc.sourceDesc.bibl.biblFull [see comment] If the data is not already given in previous fileDesc fields, all subfields should be mapped as stated above in this mapping.
teiHeader.encodingDesc - -
teiHeader.profileDesc.creation Date.Date only if all previous .date fields are empty. Set Date.DateType to "created".
teiHeader.profileDesc.langUsage.language Language Scan the @ident attribute for ISO-639 codes, and if they are found, use them instead.
Default is 'en'
teiHeader.profileDesc.textClass.keywords.list.item Subject -
teiHeader.profileDesc.textClass.classCode.list.item Subject Check the @scheme attribute: if the classification used is not widespread, ignore this field. Otherwise map the code, and if the code is not recognizable on its own, add the scheme as a prefix.
teiHeader.profileDesc.textClass.catRef - -
teiHeader.revisionDesc.change [see comment] Map the content of the @when attribute to Date.Date and set Date.DateType to "modified".
text.front [see comment] Map the content of a div field to TableOfContents if its @type attribute is "contents", and to Abstract if its @type attribute is "abstract".
text.body - -
text.back - -
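
The *.title rule from the table above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_titles` and the input shape (a list of `(type_attribute, text)` pairs in document order) are hypothetical and not part of PubMan or the TEI specification.

```python
def split_titles(titles):
    """Split TEI titles into (Title, [AlternativeTitle, ...]).

    `titles` is a list of (type_attribute, text) pairs in document order;
    a missing @type is represented as None or "".
    """
    # The first title whose @type is "main" or empty becomes Title.
    main_index = next(
        (i for i, (type_attr, _) in enumerate(titles)
         if not type_attr or type_attr == "main"),
        None,
    )
    # Without any main title, the first alternative title is promoted.
    if main_index is None and titles:
        main_index = 0
    if main_index is None:
        return None, []
    title = titles[main_index][1]
    # Every other title, including further main titles, is an alternative.
    alternatives = [text for i, (_, text) in enumerate(titles) if i != main_index]
    return title, alternatives
```

For example, `split_titles([("sub", "A"), ("main", "B"), (None, "C")])` selects "B" as Title and keeps "A" and "C" as alternative titles.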

Annotations

Short Description of the Sections

Section Description Comment
fileDesc (file description) contains a full bibliographic description of an electronic file -
publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text -
seriesStmt (series statement) groups information about the series, if any, to which a publication belongs -
sourceDesc (source description) describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence -
biblStruct (structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order -
monogr (monographic level) contains bibliographic elements describing an item (e.g. a book or journal) published as an independent item -
analytic (analytic level) contains bibliographic elements describing an item (e.g. an article or poem) published within a monograph or journal and not as an independent publication. -
series (series information) contains information about the series in which a book or other bibliographic item has appeared -
profileDesc (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting -

Examples

TEI P5 example header

(http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD7)

<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>Common sense, a machine-readable transcript</title>
   <author>Paine, Thomas (1737-1809)</author>
   <respStmt>
    <resp>compiled by</resp>
    <name>Jon K Adams</name>
   </respStmt>
  </titleStmt>
  <editionStmt>
   <edition>
    <date>1986</date>
   </edition>
  </editionStmt>
  <publicationStmt>
   <distributor>Oxford Text Archive.</distributor>
   <address>
    <addrLine>Oxford University Computing Services,</addrLine>
    <addrLine>13 Banbury Road,</addrLine>
    <addrLine>Oxford OX2 6RB,</addrLine>
    <addrLine>UK</addrLine>
   </address>
  </publicationStmt>
  <notesStmt>
   <note>Brief notes on the text are in a
       supplementary file.</note>
  </notesStmt>
  <sourceDesc>
   <biblStruct>
    <monogr>
     <editor>Foner, Philip S.</editor>
     <title>The collected writings of Thomas Paine</title>
     <imprint>
      <pubPlace>New York</pubPlace>
      <publisher>Citadel Press</publisher>
      <date>1945</date>
     </imprint>
    </monogr>
   </biblStruct>
  </sourceDesc>
 </fileDesc>
 <encodingDesc>
  <samplingDecl>
   <p>Editorial notes in the Foner edition have not
       been reproduced. </p>
   <p>Blank lines and multiple blank spaces, including paragraph
       indents, have not been preserved. </p>
  </samplingDecl>
  <editorialDecl>
   <correction status="high" method="silent">
    <p>The following errors
         in the Foner edition have been corrected:
    <list>
      <item>p. 13 l. 7 cotemporaries contemporaries </item>
      <item>p. 28 l. 26 [comma] [period] </item>
      <item>p. 84 l. 4 kin kind </item>
      <item>p. 95 l. 1 stuggle struggle </item>
      <item>p. 101 l. 4 certainy certainty </item>
      <item>p. 167 l. 6 than that </item>
      <item>p. 209 l. 24 publshed published </item>
     </list>
    </p>
   </correction>
   <normalization>
    <p>No normalization beyond that performed
         by Foner, if any. </p>
   </normalization>
   <quotation marks="all" form="std">
    <p>All double quotation marks
         rendered with ", all single quotation marks with
         apostrophe. </p>
   </quotation>
   <hyphenation eol="none">
    <p>Hyphenated words that appear at the
         end of the line in the Foner edition have been reformed.</p>
   </hyphenation>
   <stdVals>
    <p>The values of <att>when-iso</att> on the <gi>time</gi>
         element always end in the format <val>HH:MM</val> or
    <val>HH</val>; i.e., seconds, fractions thereof, and time
         zone designators are not present.</p>
   </stdVals>
   <interpretation>
    <p>Compound proper names are marked. </p>
    <p>Dates are marked. </p>
    <p>Italics are recorded without interpretation. </p>
   </interpretation>
  </editorialDecl>
  <classDecl>
   <taxonomy xml:id="lcsh">
    <bibl>Library of Congress Subject Headings</bibl>
   </taxonomy>
   <taxonomy xml:id="lc">
    <bibl>Library of Congress Classification</bibl>
   </taxonomy>
  </classDecl>
 </encodingDesc>
 <profileDesc>
  <creation>
   <date>1774</date>
  </creation>
  <langUsage>
   <language ident="en" usage="100">English.</language>
  </langUsage>
  <textClass>
   <keywords scheme="#lcsh">
    <list>
     <item>Political science</item>
     <item>United States -- Politics and government —
           Revolution, 1775-1783</item>
    </list>
   </keywords>
   <classCode scheme="#lc">JC 177</classCode>
  </textClass>
 </profileDesc>
 <revisionDesc>
  <change when="1996-01-22">
   <name>CMSMcQ</name> finished proofreading
  </change>
  <change when="1995-10-30">
   <name>L.B. </name> finished proofreading
  </change>
  <change when="1995-07-20">
   <name>R.G. </name> finished proofreading
  </change>
  <change when="1995-07-04">
   <name>R.G. </name> finished data entry
  </change>
  <change when="1995-01-15">
   <name>R.G. </name> began data entry
  </change>
 </revisionDesc>
</teiHeader>
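
Applied to the header above, the publisher rules of the mapping (publisher first, then distributor, then authority, or the bare publicationStmt text if it has no subfields) would select the distributor. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `pick_publisher` and the reduced `SAMPLE` string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the publicationStmt from the example header above.
SAMPLE = """<teiHeader>
 <fileDesc>
  <publicationStmt>
   <distributor>Oxford Text Archive.</distributor>
  </publicationStmt>
 </fileDesc>
</teiHeader>"""


def pick_publisher(header):
    """Return the value for PublishingInfo.Publisher, or None if nothing fits."""
    stmt = header.find("fileDesc/publicationStmt")
    if stmt is None:
        return None
    # No subfields: the text of publicationStmt itself is used.
    if len(stmt) == 0:
        return (stmt.text or "").strip() or None
    # Otherwise fall back in the order given by the mapping table.
    for tag in ("publisher", "distributor", "authority"):
        node = stmt.find(tag)
        if node is not None and (node.text or "").strip():
            return node.text.strip()
    return None


publisher = pick_publisher(ET.fromstring(SAMPLE))
```

The sketch assumes a header without an XML namespace, as in the example above; a namespaced document would need qualified element names.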

Wikipedia

(http://de.wikipedia.org/wiki/Text_Encoding_Initiative#Praxisbeispiel)

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>Auf dem Brocken</title>
                <author>Heinrich Heine (1797–1856)</author>
                <respStmt>
                    <name>Wiki Autor</name>
                    <resp>Umwandlung in TEI-konformes XML</resp>
                </respStmt>
            </titleStmt>
            <publicationStmt>
                <p>aus Wikisource, der freien Quellensammlung 
                    (<ptr target="http://de.wikisource.org/wiki/Auf_dem_Brocken"/>)</p>
            </publicationStmt>
            <sourceDesc>
                <biblFull>
                    <titleStmt>
                        <title level="a">Auf dem Brocken</title>
                        <title level="m">Buch der Lieder</title>
                        <title level="m" type="sub">Aus der Harzreise</title>
                        <author>Heine, Heinrich</author>
                    </titleStmt>
                    <publicationStmt>
                        <publisher>Hoffmann und Campe</publisher>
                        <pubPlace>Hamburg</pubPlace>
                        <date>1827</date>
                        <availability>
                            <p>Gemeinfrei, keine Nutzungsbeschränkungen</p>
                        </availability>
                    </publicationStmt>
                </biblFull>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <pb n="302"/>
            <head>Auf dem Brocken.</head>
            <lg type="stanza">
                <l>Heller wird es schon im Osten</l>
                <l>Durch der Sonne kleines Glimmen,</l>
                <l>Weit und breit die Bergesgipfel,</l>
                <l>In dem Nebelmeere schwimmen.</l>
            </lg>
            <lg type="stanza">
                <l n="5">Hätt’ ich Siebenmeilenstiefel,</l>
                <l>Lief ich, mit der Hast des Windes,</l>
                <l>Ueber jene Bergesgipfel,</l>
                <l>Nach dem Haus des lieben Kindes.</l>
            </lg>
            <lg type="stanza">
                <l>Von dem Bettchen, wo sie schlummert,</l>
                <l n="10">Zög’ ich leise die Gardinen,</l>
                <l>Leise küßt’ ich ihre Stirne,</l>
                <l>Leise ihres Munds Rubinen.</l>
            </lg>
            <lg type="stanza">
                <l>Und noch leiser wollt’ ich flüstern</l>
                <l>In die kleinen Lilien-Ohren:</l>
                <l n="15">Denk’ im Traum, daß wir uns lieben,</l>
                <l>Und daß wir uns nie verloren.</l>
            </lg>
        </body>
    </text>
</TEI>
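
The document above declares the TEI namespace and has no langUsage element, so under the language rule of the mapping it would receive the default 'en'. The following is a minimal sketch of that rule with Python's standard `xml.etree.ElementTree`; the function name `language_of`, the two reduced sample strings, and the regular expression (which only checks the shape of a two- or three-letter code, not membership in ISO 639) are assumptions.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# Assumption: only the shape of the code is checked, not the ISO 639 registry.
CODE_SHAPE = re.compile(r"^[a-z]{2,3}$")

NO_LANG = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc/></teiHeader></TEI>'
WITH_LANG = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc><langUsage>'
    '<language ident="de">Deutsch</language>'
    '</langUsage></profileDesc></teiHeader></TEI>'
)


def language_of(root, default="en"):
    """Return the Language value for a namespaced TEI root element."""
    node = root.find(
        "tei:teiHeader/tei:profileDesc/tei:langUsage/tei:language", TEI_NS
    )
    if node is None:
        return default
    # Prefer a code found in @ident; use the primary subtag of tags like "en-US".
    ident = (node.get("ident") or "").strip().lower().split("-")[0]
    if CODE_SHAPE.match(ident):
        return ident
    # Otherwise keep the element text, falling back to the default.
    return (node.text or "").strip() or default
```

Note that the default applies regardless of the actual text language, so a German poem without langUsage would still be recorded as 'en' under this rule.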
