PubMan Func Spec Submission/arXiv mapping

From MPDLMediaWiki

Revision as of 12:42, 8 April 2008

arXiv

Schema

arXiv currently provides the following arXiv metadata formats via the OAI-PMH interface:

  • oai_dc (OAI_DC Schema. Note: arXiv's OAI-PMH interface does not deliver the complete metadata in this schema; e.g. dc:description is used for both the abstract and the comment.)
  • arXiv (arXiv Schema. Note: probably slightly outdated, as the "doi" element is not included in the schema but does appear in the results.)
  • arXivRaw (arXivRaw Schema. Note: not yet complete and subject to modification according to the schema comments.)
  • arXivOld (arXivOld Schema, Note: not considered)
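All of these formats are requested through the same OAI-PMH interface and differ only in the `metadataPrefix` argument. The following is a minimal sketch of building the request URLs with the Python standard library; the helper name `oai_request_url` and the record identifier `oai:arXiv.org:0804.1234` are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Base URL of arXiv's OAI-PMH interface (as linked above).
OAI_BASE_URL = "http://export.arxiv.org/oai2"


def oai_request_url(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL (hypothetical helper) for a verb and its arguments."""
    query = {"verb": verb, **params}
    return f"{OAI_BASE_URL}?{urlencode(query)}"


# List the metadata formats arXiv offers (oai_dc, arXiv, arXivRaw, arXivOld).
formats_url = oai_request_url("ListMetadataFormats")

# Request one record in the arXiv format; the identifier is a made-up example.
record_url = oai_request_url(
    "GetRecord", identifier="oai:arXiv.org:0804.1234", metadataPrefix="arXiv"
)
```

Fetching `record_url` (for example with `urllib.request.urlopen`) would return a `GetRecord` response whose `<metadata>` element holds the record in the arXiv format.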


To start, we will use the arXiv metadata format, as it seems to require the least parsing when mapping metadata values to PubItem.

Mapping from arXiv to PubItem

1. header/identifier => identifier (without the "oai:arXiv.org:" prefix)
(Note: this identifier is important because in the output it points to the exact version, e.g. v1 or v2, which arXiv uses in "citeAs".)

2. Authors
2.1 author/keyname => LastName
2.2 author/forenames => FirstName
2.3 author/affiliation => External organization

3. title => title

4. report-no => Source/sequence-number (only if journal-ref is present; otherwise do not map?)
5. journal-ref => source/title (parsing not in R3)
6. msc-class => dc:subject
7. abstract => abstract
8. categories => dc:subject
9. doi => dc:identifier (DOI)

10. http://arxiv.org/abs/<header/identifier value> => dc:identifier (OTHER)
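The ten mapping rules above can be sketched as a single Python function over one OAI-PMH `<record>`. This is a minimal sketch, not the PubMan implementation: the PubItem side is modeled as a plain dict whose keys (such as `creators` and `ExternalOrganizations`) are hypothetical, the arXiv namespace URI `http://arxiv.org/OAI/arXiv/` and the element name `forenames` are assumed from arXiv's OAI output, and all sample values are invented. It also assumes that rule 10 uses the identifier without its OAI prefix, that `categories` is space-separated, and it follows the "only if journal-ref is present" reading of rule 4.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Tuple

# OAI-PMH namespace (standard) and arXiv-format namespace (assumed URI).
NS = {"oai": "http://www.openarchives.org/OAI/2.0/", "ax": "http://arxiv.org/OAI/arXiv/"}
OAI_PREFIX = "oai:arXiv.org:"
ABS_URL = "http://arxiv.org/abs/"


def text(elem: ET.Element, path: str) -> str:
    """Stripped text of the first match for path, or "" if nothing matches."""
    return elem.findtext(path, default="", namespaces=NS).strip()


def map_record(record: ET.Element) -> Dict[str, Any]:
    """Map one arXiv-format <record> to a hypothetical PubItem dict."""
    # 1. header/identifier without the "oai:arXiv.org:" prefix.
    arxiv_id = text(record, "oai:header/oai:identifier")
    if arxiv_id.startswith(OAI_PREFIX):
        arxiv_id = arxiv_id[len(OAI_PREFIX):]
    meta = record.find("oai:metadata/ax:arXiv", NS)
    if meta is None:
        raise ValueError("record has no arXiv-format metadata")
    item: Dict[str, Any] = {"identifier": arxiv_id}
    # 2. Authors: keyname -> LastName, forenames -> FirstName, affiliations -> external orgs.
    item["creators"] = [
        {
            "LastName": text(a, "ax:keyname"),
            "FirstName": text(a, "ax:forenames"),
            "ExternalOrganizations": [
                aff.text.strip() for aff in a.findall("ax:affiliation", NS) if aff.text
            ],
        }
        for a in meta.findall("ax:authors/ax:author", NS)
    ]
    # 3. title
    item["title"] = text(meta, "ax:title")
    # 4./5. journal-ref copied unparsed; report-no mapped only when journal-ref exists.
    journal_ref = text(meta, "ax:journal-ref")
    if journal_ref:
        item["source/title"] = journal_ref
        report_no = text(meta, "ax:report-no")
        if report_no:
            item["source/sequence-number"] = report_no
    # 6./8. msc-class and the space-separated categories become dc:subject entries.
    subjects: List[str] = []
    msc = text(meta, "ax:msc-class")
    if msc:
        subjects.append(msc)
    subjects.extend(text(meta, "ax:categories").split())
    item["dc:subject"] = subjects
    # 7. abstract (surrounding whitespace stripped)
    item["abstract"] = text(meta, "ax:abstract")
    # 9./10. DOI (if any) and the abstract-page URL as typed dc:identifier entries.
    identifiers: List[Tuple[str, str]] = []
    doi = text(meta, "ax:doi")
    if doi:
        identifiers.append(("DOI", doi))
    identifiers.append(("OTHER", ABS_URL + arxiv_id))
    item["dc:identifier"] = identifiers
    return item


# Invented sample record, shaped like an OAI-PMH GetRecord result in the arXiv format.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:arXiv.org:0804.1234</identifier></header>
  <metadata>
    <arXiv xmlns="http://arxiv.org/OAI/arXiv/">
      <authors>
        <author><keyname>Doe</keyname><forenames>Jane</forenames>
          <affiliation>Example Institute</affiliation></author>
        <author><keyname>Roe</keyname><forenames>R.</forenames></author>
      </authors>
      <title>An Example Title</title>
      <categories>hep-ph math.CO</categories>
      <report-no>REPORT-2008-01</report-no>
      <journal-ref>Example J. 1 (2008) 1</journal-ref>
      <doi>10.1234/example.doi</doi>
      <abstract>  Some abstract text.
</abstract>
    </arXiv>
  </metadata>
</record>"""

item = map_record(ET.fromstring(SAMPLE_RECORD))
```

Keeping rule 4 inside the `journal-ref` branch makes the open question explicit in one place; if the answer turns out to be "always map", only that branch needs to change.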

Issues

  • Affiliations: affiliations such as "MPI für XXX" cannot be parsed, because the organizational units service does not fully support search by organization name.
    • As it is not yet certain whether we want this within the controlled vocabulary or will ask core services directly for search-organization methods, no issue has been created as an additional requirement for core services. It might become an internal requirement for the controlled vocabulary service (institutions).
  • Parsing of journal names: check whether it is feasible and, if so, whether it can later be linked to the controlled vocabulary service (journals).