Difference between revisions of "PubMan Func Spec Submission/arXiv mapping"

Revision as of 12:40, 8 April 2008

arXiv

Schema

arXiv currently provides the following metadata formats via the OAI-PMH interface (see http://export.arxiv.org/oai2?verb=ListMetadataFormats):

  • oai_dc (OAI_DC Schema: http://www.openarchives.org/OAI/2.0/oai_dc.xsd)
  • arXiv (arXiv Schema: http://arxiv.org/OAI/arXiv.xsd; note: probably a bit outdated, as the "doi" element is not included in the schema but appears in the results)
  • arXivRaw (arXivRaw Schema: http://arxiv.org/OAI/arXivRaw.xsd)
  • arXivOld (arXivOld Schema: http://arxiv.org/OAI/arXivOld.xsd; note: not considered)


To start, we will use the arXiv metadata format, as it seems to require the least parsing to map the metadata values to a PubItem.
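
As an illustration of requesting a record in the arXiv metadata format, the following minimal sketch builds an OAI-PMH GetRecord URL. The base URL is the one referenced in this section; the function name `get_record_url` and the example identifier `0804.0001` are hypothetical and used only for illustration. No network request is made.

```python
from urllib.parse import urlencode

# OAI-PMH endpoint referenced in this section.
BASE_URL = "http://export.arxiv.org/oai2"


def get_record_url(arxiv_id, metadata_prefix="arXiv"):
    """Build a GetRecord request URL (hypothetical helper, not part of PubMan)."""
    params = {
        "verb": "GetRecord",
        # OAI identifiers carry the "oai:arXiv.org:" prefix.
        "identifier": "oai:arXiv.org:" + arxiv_id,
        "metadataPrefix": metadata_prefix,
    }
    return BASE_URL + "?" + urlencode(params)


# "0804.0001" is a made-up example identifier.
url = get_record_url("0804.0001")
print(url)
```

Passing `metadata_prefix="oai_dc"` or `"arXivRaw"` would request one of the other formats listed above.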

Mapping from arXiv to PubItem

1. header/identifier => identifier (without the "oai:arXiv.org:" prefix)
(note: this identifier is important because in the output it points to the exact version, i.e. v1, v2, which arXiv uses in "citeAs")

2. Authors
2.1. author/keyname => LastName
2.2. author/forename => Firstname
2.3. author/affiliation => External organization

3. title => title

4. report-no => Source/sequence-number (only if journal-ref is present; otherwise do not map?)
5. journal-ref => source/title (parsing not in R3)
6. msc-class => dc:subject
7. abstract => abstract
8. categories => dc:subject
9. doi => dc:identifier (DOI)

10. http://arxiv.org/abs/<header/identifier value> => dc:identifier (OTHER)
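
The mapping above can be sketched as follows. This is only an illustration under stated assumptions: the sample record is hand-written with made-up values, XML namespaces are omitted for brevity (real OAI-PMH responses are namespaced), the element names follow the list above, and the output is a plain dictionary whose keys (`creators`, `source`, `identifiers`, ...) are hypothetical stand-ins for the PubItem structure. Mapping report-no only when journal-ref is present follows the open question in item 4 and is an assumption, not a decision.

```python
import xml.etree.ElementTree as ET

OAI_PREFIX = "oai:arXiv.org:"

# Hand-written sample record with made-up values; namespaces omitted.
SAMPLE = """\
<record>
  <header>
    <identifier>oai:arXiv.org:0804.0001</identifier>
  </header>
  <metadata>
    <arXiv>
      <authors>
        <author>
          <keyname>Doe</keyname>
          <forename>Jane</forename>
          <affiliation>Example Institute</affiliation>
        </author>
      </authors>
      <title>An example title</title>
      <report-no>EX-2008-01</report-no>
      <journal-ref>J. Example Phys. 1 (2008) 1</journal-ref>
      <msc-class>14J32</msc-class>
      <abstract>An example abstract.</abstract>
      <categories>hep-th math.AG</categories>
      <doi>10.1000/example</doi>
    </arXiv>
  </metadata>
</record>
"""


def text(node, path):
    """Return the stripped text of the first element matching path, or None."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None


def map_record(xml_string):
    """Map one arXiv-format record to a dict (hypothetical PubItem stand-in)."""
    record = ET.fromstring(xml_string)
    meta = record.find("metadata/arXiv")

    # 1. header/identifier without the "oai:arXiv.org:" prefix
    arxiv_id = text(record, "header/identifier") or ""
    if arxiv_id.startswith(OAI_PREFIX):
        arxiv_id = arxiv_id[len(OAI_PREFIX):]

    # 2. authors: keyname, forename, affiliation
    creators = [
        {
            "last_name": text(a, "keyname"),
            "first_name": text(a, "forename"),
            "organization": text(a, "affiliation"),
        }
        for a in meta.findall("authors/author")
    ]

    # 4./5. report-no is mapped only if journal-ref is present (assumption)
    journal_ref = text(meta, "journal-ref")
    source = None
    if journal_ref:
        source = {"title": journal_ref,
                  "sequence_number": text(meta, "report-no")}

    # 6./8. msc-class and categories both go to dc:subject, unparsed
    subjects = [s for s in (text(meta, "msc-class"),
                            text(meta, "categories")) if s]

    # 9./10. DOI and the arXiv abstract URL as dc:identifier entries
    identifiers = [("OTHER", "http://arxiv.org/abs/" + arxiv_id)]
    doi = text(meta, "doi")
    if doi:
        identifiers.insert(0, ("DOI", doi))

    return {
        "identifier": arxiv_id,
        "creators": creators,
        "title": text(meta, "title"),
        "source": source,
        "subjects": subjects,
        "abstract": text(meta, "abstract"),
        "identifiers": identifiers,
    }


item = map_record(SAMPLE)
print(item)
```

The journal-ref value is copied into the source title as a whole string here, since parsing it is out of scope for R3.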

Issues

  • Affiliations: strings such as "MPI für XXX" cannot be parsed, because the organizational units service does not fully support search by organization name.
    • No issue has been created as an extra requirement for core services, because it is not yet certain whether this should be handled within the controlled vocabulary or by asking core services directly for search-organizations methods. It might become an internal requirement for the controlled vocabulary service (institutions).
  • Parsing of journal names: check whether it is feasible and whether it can be related in future to the controlled vocabulary service (journals).