PubMan Func Spec Submission/arXiv mapping

From MPDLMediaWiki
Revision as of 08:22, 9 April 2008 by Inga

arXiv

Schemas

arXiv currently provides the following metadata formats via its OAI-PMH interface:

format   | schema          | example            | note
oai_dc   | OAI_DC Schema   | Record in oai_dc   | Parsing is needed; see the mappings below
arXiv    | arXiv Schema    | Record in arXiv    | The schema is probably a bit outdated: the "doi" element is not included in the schema but appears in the results; see the mappings below
arXivRaw | arXivRaw Schema | Record in arXivRaw | Not complete yet and subject to modification, according to the schema comments
arXivOld | arXivOld Schema | Record in arXivOld | Not considered

To start, we will use the arXiv metadata format, as it seems to require the least parsing when mapping metadata values to a PubItem.
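As a rough illustration of how a format is selected, the following sketch builds an OAI-PMH GetRecord request URL. The verb and parameter names (verb, identifier, metadataPrefix) come from the OAI-PMH protocol; the base URL, the function name get_record_url, and the identifier 0000.0000 are assumptions made for this example and should be checked against the current arXiv documentation.

```python
from urllib.parse import urlencode

# Assumption: base URL of the arXiv OAI-PMH endpoint; verify before use.
BASE_URL = "http://export.arxiv.org/oai2"


def get_record_url(arxiv_id, metadata_prefix="arXiv", base_url=BASE_URL):
    """Build a GetRecord request URL for one arXiv item.

    metadata_prefix is one of the formats listed above,
    e.g. "arXiv" or "oai_dc".
    """
    params = {
        "verb": "GetRecord",
        "identifier": "oai:arXiv.org:" + arxiv_id,
        "metadataPrefix": metadata_prefix,
    }
    return base_url + "?" + urlencode(params)


# "0000.0000" is a placeholder identifier, not a real record.
url = get_record_url("0000.0000")
```

Switching to the oai_dc format only changes the metadataPrefix argument, which makes it easy to compare both outputs for the same record.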

Mapping from arXiv to PubItem

arXiv metadata format

1. header/identifier => identifier (without the "oai:arXiv.org:" prefix)
(Note: this identifier is important because in the output it points to the exact version, e.g. v1 or v2, which arXiv uses in "citeAs".)

2. Authors
2.1. author/keyname => LastName
2.2. author/forename => Firstname
2.3. author/affiliation => External organization 

3. title => title

4. report-no => source/sequence-number (only if journal-ref is present; otherwise do not map?)
5. journal-ref => source/title (parsing is not part of R3)
6. msc-class => dc:subject
7. abstract => abstract
8. categories => dc:subject
9. doi => dc:identifier (DOI)

10. http://arxiv.org/abs/<header/identifier value> => dc:identifier (OTHER)

Note: dates are missing in this format; affiliations are included.
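The mapping rules above can be sketched in Python as follows. This is a minimal illustration, not the PubMan implementation: the function name, the dictionary keys, and the sample record are all invented for this example, the element names are taken from the mapping list as written, and the abs URL is assumed to use the identifier without its prefix. The "{*}" wildcard (Python 3.8 or later) is used so that the lookup works whatever namespace the record declares.

```python
import xml.etree.ElementTree as ET

OAI_PREFIX = "oai:arXiv.org:"

# Hypothetical sample record, invented for illustration only.
SAMPLE = """<record>
  <header><identifier>oai:arXiv.org:0000.0000</identifier></header>
  <metadata>
    <arXiv>
      <authors>
        <author>
          <keyname>Doe</keyname>
          <forename>Jane</forename>
          <affiliation>Example Institute</affiliation>
        </author>
      </authors>
      <title>An example title</title>
      <categories>hep-th math.AG</categories>
      <msc-class>14J60</msc-class>
      <report-no>EX-1</report-no>
      <journal-ref>Ex. J. 1 (2008) 1</journal-ref>
      <doi>10.0000/example</doi>
      <abstract>An example abstract.</abstract>
    </arXiv>
  </metadata>
</record>"""


def _text(node, path):
    """Return the stripped text of the first match, or None."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None


def map_arxiv_record(xml_string):
    """Map one record in arXiv format to a flat dict (hypothetical keys)."""
    record = ET.fromstring(xml_string)
    raw_id = _text(record, "{*}header/{*}identifier") or ""
    # Rule 1: drop the "oai:arXiv.org:" prefix.
    arxiv_id = raw_id[len(OAI_PREFIX):] if raw_id.startswith(OAI_PREFIX) else raw_id

    meta = record.find("{*}metadata/{*}arXiv")
    # Rule 2: one creator per author element.
    creators = [
        {
            "LastName": _text(author, "{*}keyname"),
            "Firstname": _text(author, "{*}forename"),
            "External organization": _text(author, "{*}affiliation"),
        }
        for author in meta.iterfind(".//{*}author")
    ]

    # Rules 4 and 5: report-no is mapped only when journal-ref is present;
    # journal-ref is copied unparsed into the source title.
    journal_ref = _text(meta, "{*}journal-ref")
    source = None
    if journal_ref:
        source = {
            "title": journal_ref,
            "sequence-number": _text(meta, "{*}report-no"),
        }

    # Rules 6 and 8: msc-class and categories both become subjects.
    candidates = (_text(meta, "{*}msc-class"), _text(meta, "{*}categories"))
    subjects = [value for value in candidates if value]

    # Rules 9 and 10: DOI (if any) plus the abs URL as an OTHER identifier.
    identifiers = [("OTHER", "http://arxiv.org/abs/" + arxiv_id)]
    doi = _text(meta, "{*}doi")
    if doi:
        identifiers.append(("DOI", doi))

    return {
        "identifier": arxiv_id,
        "creators": creators,
        "title": _text(meta, "{*}title"),          # rule 3
        "abstract": _text(meta, "{*}abstract"),    # rule 7
        "subjects": subjects,
        "source": source,
        "identifiers": identifiers,
    }


item = map_arxiv_record(SAMPLE)
```

The categories value is kept as a single string here, because the mapping does not say whether it should be split into separate subjects.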

OAI_DC metadata format

  • Fairly simple, as we also use dc metadata in the publication profile, but the result is not correct unless the values are parsed.
1. dc:description => dc:abstract (if dc:description does not start with "Comment")
2. dc:date => lists the dates of all existing versions (the earliest date is the date of submission to arXiv; all others are dates on which a new revision was made)
3. dc:identifier => partly an identifier, partly source information (the journal reference, i.e. the value of journal-ref in the arXiv metadata format)
4. dc:subject => dc:subject (the full names of the category IDs that are delivered in the arXiv format)
5. dc:creator => affiliations are missing.
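The dc:description and dc:date rules above could be handled roughly as in the sketch below. It assumes that the values have already been extracted from the record as lists of strings and that dates are ISO formatted (YYYY-MM-DD), so that string comparison orders them correctly. The function names and sample values are hypothetical.

```python
def pick_abstract(descriptions):
    """Return the first dc:description that does not start with "Comment"."""
    for description in descriptions:
        if not description.strip().startswith("Comment"):
            return description.strip()
    return None


def split_dates(dates):
    """Split dc:date values into (submission date, revision dates).

    Assumes ISO dates, for which lexicographic order is chronological.
    """
    if not dates:
        return None, []
    ordered = sorted(dates)
    return ordered[0], ordered[1:]


# Invented sample values, for illustration only.
abstract = pick_abstract(["Comment: 12 pages", "An example abstract."])
submitted, revised = split_dates(["2008-03-01", "2008-01-15", "2008-02-20"])
```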

Issues

  • Affiliations: affiliation strings such as "MPI für XXX" cannot be parsed and resolved, because the organizational units service does not fully support search by organization name.
    • No issue has been created as an extra requirement for core services, because it is not yet certain whether we want this within the controlled vocabulary or would ask core services directly for search-organizations methods. It might become an internal requirement for the controlled vocabulary service (institutions).
  • Parsing of journal names: check whether it is feasible and, if so, whether it can be related in future to the controlled vocabulary service (journals).

Examples