Difference between revisions of "PubMan Func Spec Submission/arXiv mapping"
Revision as of 09:01, 9 April 2008
arXiv
Schemas
arXiv currently provides the following metadata formats via its OAI-PMH interface:
format | schema | example record | notes |
---|---|---|---|
oai_dc | OAI_DC Schema (http://www.openarchives.org/OAI/2.0/oai_dc.xsd) | Record in oai_dc (http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:astro-ph/0702728v2&metadataPrefix=oai_dc) | Parsing is needed, see the mappings below; source information needs to be parsed from a string, e.g. "<dc:identifier>Phys.Rev. D75 (2007) 083523</dc:identifier>" |
arXiv | arXiv Schema (http://arxiv.org/OAI/arXiv.xsd) | Record in arXiv (http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:astro-ph/0702728v2&metadataPrefix=arXiv) | Schema is probably a bit outdated, as the "doi" element is not included in it but appears in the results, see the mappings below; subject categories are only available as codes, e.g. "hep-th" (map to a descriptor, e.g. "High Energy Physics - Theory"?); more complete/precise version dates are missing; source information needs to be parsed from a string, e.g. "<journal-ref>Phys.Rev. D75 (2007) 083523</journal-ref>" |
arXivRaw | arXivRaw Schema (http://arxiv.org/OAI/arXivRaw.xsd) | Record in arXivRaw (http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:astro-ph/0702728v2&metadataPrefix=arXivRaw) | Not complete yet and subject to modification, according to the schema comments; authors need to be parsed from a string, e.g. "<authors>J. Santos, J.S. Alcaniz, N. Pires, M.J. Reboucas</authors>" |
arXivOld | arXivOld Schema | Record in arXivOld | Not considered |
To start, we will use the arXiv metadata format, as it seems to require the least parsing when mapping the metadata values to PubItem.
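As a rough illustration of how a record in the chosen format could be requested, the sketch below builds an OAI-PMH GetRecord URL. The endpoint, identifier prefix, and metadata prefixes are taken from the example records above; the function name `get_record_url` is hypothetical and not part of any PubMan service.

```python
from urllib.parse import urlencode

# Endpoint used by the example records listed in the table above.
OAI_BASE_URL = "http://export.arxiv.org/oai2"


def get_record_url(arxiv_id: str, metadata_prefix: str = "arXiv") -> str:
    """Build an OAI-PMH GetRecord request URL for a single arXiv item.

    `arxiv_id` is the identifier without the "oai:arXiv.org:" prefix,
    e.g. "astro-ph/0702728v2".
    """
    params = {
        "verb": "GetRecord",
        "identifier": f"oai:arXiv.org:{arxiv_id}",
        "metadataPrefix": metadata_prefix,
    }
    # Keep ":" and "/" unescaped so the URL matches the form used above.
    return f"{OAI_BASE_URL}?{urlencode(params, safe=':/')}"


print(get_record_url("astro-ph/0702728v2"))
```

Passing `metadata_prefix="oai_dc"` or `"arXivRaw"` would request the other formats in the same way.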
Mapping from arXiv to PubItem
arXiv metadata format
1. header/identifier => identifier (without "oai:arXiv.org:" prefix)
(note: this identifier is important because in the output it points to the exact version, i.e. v1 or v2, which arXiv uses in "citeAs")
2. Authors
2.1. author/keyname => LastName
2.2. author/forename => Firstname
2.3. author/affiliation => External organization
3. title => title
4. report-no => source/sequence-number (only if journal-ref is present; otherwise do not map?)
5. journal-ref => source/title (parsing not in R3)
6. msc-class => dc:subject
7. abstract => abstract
8. categories => dc:subject
9. doi => dc:identifier (DOI)
10. http://arxiv.org/abs/<header/identifier value> => dc:identifier (OTHER)
Note: dates are missing; affiliations are included.
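The mapping rules for the arXiv metadata format can be sketched as follows. This is only an illustration under several assumptions: the sample XML is hand-written and simplified (the title, affiliation, and DOI values are placeholders), the element names follow the list above and should be checked against the schema (live records may, for example, name the forename element "forenames"), and the output dictionary is a hypothetical stand-in for a PubItem, not its real structure.

```python
import xml.etree.ElementTree as ET

OAI_PREFIX = "oai:arXiv.org:"

# Hand-written, simplified sample; values other than the identifier,
# author names and journal-ref are placeholders.
SAMPLE = """<record>
  <header><identifier>oai:arXiv.org:astro-ph/0702728v2</identifier></header>
  <metadata><arXiv>
    <authors>
      <author><keyname>Santos</keyname><forename>J.</forename>
        <affiliation>Example University</affiliation></author>
      <author><keyname>Alcaniz</keyname><forename>J.S.</forename></author>
    </authors>
    <title>Example title</title>
    <categories>astro-ph hep-th</categories>
    <journal-ref>Phys.Rev. D75 (2007) 083523</journal-ref>
    <doi>10.0000/example</doi>
    <abstract>Example abstract.</abstract>
  </arXiv></metadata>
</record>"""


def text(parent, path):
    """Return the stripped text of the first match, or None."""
    node = parent.find(path)
    return node.text.strip() if node is not None and node.text else None


def map_arxiv_record(xml_string):
    root = ET.fromstring(xml_string)
    # Drop XML namespaces so the same paths work on namespaced records.
    for el in root.iter():
        el.tag = el.tag.split("}", 1)[-1]

    oai_id = text(root, ".//header/identifier") or ""
    # Rule 1: identifier without the "oai:arXiv.org:" prefix.
    arxiv_id = oai_id[len(OAI_PREFIX):] if oai_id.startswith(OAI_PREFIX) else oai_id

    meta = root.find(".//arXiv")
    journal_ref = text(meta, "journal-ref")
    subjects = (text(meta, "categories") or "").split()
    if text(meta, "msc-class"):
        subjects.append(text(meta, "msc-class"))

    return {
        "identifier": arxiv_id,
        "creators": [  # Rule 2: keyname, forename, affiliation
            {
                "last_name": text(a, "keyname"),
                "first_name": text(a, "forename") or text(a, "forenames"),
                "organization": text(a, "affiliation"),
            }
            for a in meta.findall("authors/author")
        ],
        "title": text(meta, "title"),
        "source": {
            "title": journal_ref,  # Rule 5: unparsed journal-ref string
            # Rule 4: report-no is mapped only when journal-ref is present.
            "sequence_number": text(meta, "report-no") if journal_ref else None,
        },
        "subjects": subjects,  # Rules 6 and 8
        "abstract": text(meta, "abstract"),
        "identifiers": {  # Rules 9 and 10
            "DOI": text(meta, "doi"),
            "OTHER": f"http://arxiv.org/abs/{arxiv_id}",
        },
    }


item = map_arxiv_record(SAMPLE)
print(item["identifier"], item["identifiers"]["OTHER"])
```

Subject categories are kept as codes here; translating them to descriptors would need a separate lookup table.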
OAI_DC metadata format
- Fairly simple, as we also use dc metadata in the publication profile, but the result is not correct unless the values are parsed
1. dc:description => dc:abstract (if dc:description does not start with "Comment")
2. dc:date => lists the dates of all existing versions (the earliest date is the date of submission to arXiv; all others are dates on which a new revision was made)
3. dc:identifier => partly an identifier, partly source information (the journal reference, i.e. the value of journal-ref in the arXiv metadata format)
4. dc:subject => dc:subject (the full name of the category IDs that are delivered in the arXiv format)
5. dc:creator => affiliations are missing
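The parsing that oai_dc values would need can be sketched as below. The function name, the output keys, and the sample dates are hypothetical. The sketch also assumes that dates arrive as ISO strings (YYYY-MM-DD), so that sorting them as text orders them chronologically, and that a DOI, if present, appears as a dc:identifier value starting with "doi:"; neither assumption is confirmed by the notes above.

```python
def split_oai_dc(descriptions, dates, identifiers):
    """Separate oai_dc values into the pieces the mapping distinguishes."""
    # dc:description is the abstract unless it starts with "Comment".
    abstract = next(
        (d.strip() for d in descriptions if not d.strip().startswith("Comment")),
        None,
    )
    # Earliest date = submission to arXiv; later dates = revisions.
    ordered = sorted(dates)
    result = {
        "abstract": abstract,
        "submitted": ordered[0] if ordered else None,
        "revised": ordered[1:],
        "identifiers": [],
        "doi": None,
        "source_info": [],
    }
    for value in identifiers:
        value = value.strip()
        if value.startswith(("http://", "https://")):
            result["identifiers"].append(value)
        elif value.lower().startswith("doi:"):  # assumed prefix
            result["doi"] = value[4:].strip()
        else:
            # Anything else is treated as a candidate journal reference.
            result["source_info"].append(value)
    return result


parsed = split_oai_dc(
    descriptions=["Example abstract.", "Comment: 5 pages"],
    dates=["2007-04-20", "2007-02-27"],  # hypothetical dates
    identifiers=[
        "http://arxiv.org/abs/astro-ph/0702728",
        "Phys.Rev. D75 (2007) 083523",
    ],
)
print(parsed["submitted"], parsed["source_info"])
```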
Issues
- Affiliations: there is no possibility of parsing "MPI für XXX", as the organizational units service does not fully support search by organization name.
- As it is not yet certain whether we would like to have this within the controlled vocabulary or to request search-organization methods directly from the core services, no issue has been created as an extra requirement for the core services. It might become an internal requirement for the controlled vocabulary service (institutions).
- Parsing of journal names: check whether it is feasible and whether it could in future be related to the controlled vocabulary service (journals).
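To help judge the feasibility of journal-reference parsing, here is a minimal sketch that handles only strings shaped like the example "Phys.Rev. D75 (2007) 083523". The regular expression and the function name are assumptions made for illustration; real journal-ref values vary widely, so anything that does not match falls back to the unparsed string as the source title.

```python
import re

# Matches "<journal> <volume> (<year>) <pages>", e.g. "Phys.Rev. D75 (2007) 083523".
JOURNAL_REF = re.compile(
    r"^(?P<title>.+?)\s+(?P<volume>[A-Z]?\d+)\s*"
    r"\((?P<year>\d{4})\)\s*(?P<pages>[\w-]+)$"
)


def parse_journal_ref(value):
    """Split a journal reference into parts, or keep it whole if unrecognized."""
    match = JOURNAL_REF.match(value.strip())
    if match is None:
        # Fallback: keep the whole string as the source title.
        return {"title": value.strip()}
    return match.groupdict()


print(parse_journal_ref("Phys.Rev. D75 (2007) 083523"))
```

The extracted journal name ("Phys.Rev." in this example) is the part that could later be matched against a controlled vocabulary of journals.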