PubMan Func Spec Submission/arXiv mapping
This page discusses issues related to PubMan Func Spec Submission: UC_PM_SM_04_fetch_metadata_from_external_system
Format Name
Source: arXiv
Target: escidoc-publication
Overview of Schemas supported by arXiv
arXiv currently provides the following metadata formats via its OAI-PMH interface:
format | schema | example records | note |
---|---|---|---|
arXiv | arXiv Schema | Record in arXiv; Record in arXiv (with affiliation); Record in arXiv (with doi); Record in arXiv (with TeX in title and abstract); Record in arXiv (article in German, abstract in English); Record in arXiv (article with report number); Record in arXiv (article with report number) | most complete descriptive metadata format, without complete administrative data |
oai_dc | OAI_DC Schema | Record in oai_dc; Record in oai_dc (with affiliation); Record in oai_dc (with doi) | standard format with some specific mappings to available elements, therefore an arXiv specific parsing would be required |
arXivRaw | arXivRaw Schema | Record in arXivRaw; Record in arXivRaw (with affiliation); Record in arXivRaw (with doi) | more complete administrative data, but descriptive metadata is less structured, e.g. all author names are mapped to one creator element. Note: not complete yet and subject to modification according to schema comments |
arXivOld | arXivOld Schema | Record in arXivOld | not considered |
Decision
To start, we will use the arXiv metadata format, as it seems to require the least parsing to map the metadata values to PubItem.
- Agreed, please note that splitting of journal-ref information is probably desired --Inga 12:11, 9 April 2008 (CEST)
- I would plan the splitting of journal-ref information for R4 (if that is not a problem), as there is no single rule for it, based on checking several records --Natasa 12:46, 9 April 2008 (CEST)
Mapping from arXiv formats to PubItem
arXiv format
- DOI is available, even though not mentioned in schema
1. header/identifier => identifier (without "oai:arXiv.org:" prefix)
(Note: this identifier is important because in the output it points to the exact version, i.e. v1, v2, which arXiv uses in "citeAs".)
2. Authors
2.1. author/keyname => LastName
2.2. author/forename => FirstName
2.3. author/affiliation => External organization
3. title => title
4. report-no => Source/sequence-number (only if journal-ref is present; otherwise do not map?). Inga would rather map it to identifier.
5. journal-ref => source/title/volume/issue/pages (one single field in arxiv, therefore no 'better' mapping possible)
6. msc-class => dcterms:subject
7. abstract => abstract
8. categories => dcterms:subject
9. doi => dc:identifier of type DOI
10. proxy => ???, e.g. <proxy>ccsd hal-00260045</proxy>
11. id => identifier, type=arXiv
12. http://arxiv.org/abs/<header/identifier value> => dc:identifier (arXiv)
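A minimal sketch of how the mapping rules above might be applied to a single record. The sample XML is a hypothetical, simplified record without namespaces, and the keys of the returned dictionary are illustrative names, not the actual PubItem schema. The given name is looked up under both `forename` (as listed above) and `forenames`, since the exact element name should be checked against the arXiv schema.

```python
import xml.etree.ElementTree as ET

OAI_PREFIX = "oai:arXiv.org:"

# Hypothetical, simplified record; real OAI-PMH responses are namespaced.
SAMPLE = """<record>
  <header><identifier>oai:arXiv.org:0804.1234</identifier></header>
  <metadata><arXiv>
    <id>0804.1234</id>
    <authors><author>
      <keyname>Doe</keyname><forename>Jane</forename>
      <affiliation>Example Institute</affiliation>
    </author></authors>
    <title>An example title</title>
    <categories>hep-th gr-qc</categories>
    <abstract>An example abstract.</abstract>
    <doi>10.0000/example</doi>
  </arXiv></metadata>
</record>"""


def local(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def text_of(elem, *names):
    """Text of the first descendant whose local name is in names, else None."""
    for node in elem.iter():
        if local(node.tag) in names and node.text and node.text.strip():
            return node.text.strip()
    return None


def map_record(xml_text):
    """Map one arXiv-format record to an illustrative PubItem-like dict."""
    root = ET.fromstring(xml_text)
    oai_id = text_of(root, "identifier") or ""
    # Rule 1: drop the "oai:arXiv.org:" prefix from header/identifier.
    arxiv_id = oai_id[len(OAI_PREFIX):] if oai_id.startswith(OAI_PREFIX) else oai_id
    # Rule 2: one creator per author element.
    creators = [
        {
            "family_name": text_of(a, "keyname"),
            "given_name": text_of(a, "forename", "forenames"),
            "organization": text_of(a, "affiliation"),
        }
        for a in root.iter() if local(a.tag) == "author"
    ]
    # Rules 9 and 11: DOI and arXiv identifiers.
    identifiers = [{"type": "arXiv", "value": arxiv_id}]
    doi = text_of(root, "doi")
    if doi:
        identifiers.append({"type": "DOI", "value": doi})
    # Rules 6 and 8: categories and msc-class both become subjects.
    subjects = (text_of(root, "categories") or "").split()
    msc = text_of(root, "msc-class")
    if msc:
        subjects.append(msc)
    return {
        "title": text_of(root, "title"),
        "abstract": text_of(root, "abstract"),
        "creators": creators,
        "identifiers": identifiers,
        "subjects": subjects,
        "source": text_of(root, "journal-ref"),  # rule 5: kept as one unsplit string
    }
```

Calling `map_record(SAMPLE)` returns a plain dictionary; report-no and proxy are left out because their mapping is still open in the list above.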
Subject Classes
arXiv delivers subjects as abbreviations, which we can resolve to full names for further use in the publication metadata.
Abbreviation | Keyword Element in Publication Item |
---|---|
astro-ph | Astrophysics ( astro-ph ) |
astro-ph.CO | Astrophysics, Cosmology and Extragalactic Astrophysics ( astro-ph.CO ) |
astro-ph.EP | Astrophysics, Earth and Planetary Astrophysics ( astro-ph.EP ) |
astro-ph.GA | Astrophysics, Galaxy Astrophysics ( astro-ph.GA ) |
astro-ph.HE | Astrophysics, High Energy Astrophysical Phenomena ( astro-ph.HE ) |
astro-ph.IM | Astrophysics, Instrumentation and Methods for Astrophysics ( astro-ph.IM ) |
astro-ph.SR | Astrophysics, Solar and Stellar Astrophysics ( astro-ph.SR ) |
cond-mat | Condensed Matter ( cond-mat ) |
cond-mat.dis-nn | Condensed Matter, Disordered Systems and Neural Networks ( cond-mat.dis-nn ) |
cond-mat.mtrl-sci | Condensed Matter, Materials Science ( cond-mat.mtrl-sci ) |
cond-mat.mes-hall | Condensed Matter, Mesoscale and Nanoscale Physics ( cond-mat.mes-hall ) |
cond-mat.other | Condensed Matter ( cond-mat.other ) |
cond-mat.quant-gas | Condensed Matter, Quantum Gases ( cond-mat.quant-gas ) |
cond-mat.soft | Condensed Matter, Soft Condensed Matter ( cond-mat.soft ) |
cond-mat.stat-mech | Condensed Matter, Statistical Mechanics ( cond-mat.stat-mech ) |
cond-mat.str-el | Condensed Matter, Strongly Correlated Electrons ( cond-mat.str-el ) |
cond-mat.supr-con | Condensed Matter, Superconductivity ( cond-mat.supr-con ) |
physics | Physics ( physics ) |
physics.acc-ph | Physics, Accelerator Physics ( physics.acc-ph ) |
physics.ao-ph | Physics, Atmospheric and Oceanic Physics ( physics.ao-ph ) |
physics.atom-ph | Physics, Atomic Physics ( physics.atom-ph ) |
physics.atm-clus | Physics, Atomic and Molecular Clusters ( physics.atm-clus ) |
physics.bio-ph | Physics, Biological Physics ( physics.bio-ph ) |
physics.chem-ph | Physics, Chemical Physics ( physics.chem-ph ) |
physics.class-ph | Physics, Classical Physics ( physics.class-ph ) |
physics.comp-ph | Physics, Computational Physics ( physics.comp-ph ) |
physics.data-an | Physics, Data Analysis, Statistics and Probability ( physics.data-an ) |
physics.flu-dyn | Physics, Fluid Dynamics ( physics.flu-dyn ) |
physics.gen-ph | Physics, General Physics ( physics.gen-ph ) |
physics.geo-ph | Physics, Geophysics ( physics.geo-ph ) |
physics.hist-ph | Physics, History of Physics ( physics.hist-ph ) |
physics.ins-det | Physics, Instrumentation and Detectors ( physics.ins-det ) |
physics.med-ph | Physics, Medical Physics ( physics.med-ph ) |
physics.optics | Physics, Optics ( physics.optics ) |
physics.ed-ph | Physics, Physics Education ( physics.ed-ph ) |
physics.soc-ph | Physics, Physics and Society ( physics.soc-ph ) |
physics.plasm-ph | Physics, Plasma Physics ( physics.plasm-ph ) |
physics.pop-ph | Physics, Popular Physics ( physics.pop-ph ) |
physics.space-ph | Physics, Space Physics ( physics.space-ph ) |
gr-qc | General Relativity and Quantum Cosmology ( gr-qc ) |
hep-ex | High Energy Physics - Experiment ( hep-ex ) |
hep-lat | High Energy Physics - Lattice ( hep-lat ) |
hep-ph | High Energy Physics - Phenomenology ( hep-ph ) |
hep-th | High Energy Physics - Theory ( hep-th ) |
math-ph | Mathematical Physics ( math-ph ) |
nucl-ex | Nuclear Experiment ( nucl-ex ) |
nucl-th | Nuclear Theory ( nucl-th ) |
quant-ph | Quantum Physics ( quant-ph ) |
math | Mathematics ( math ) |
math.AG | Algebraic Geometry ( math.AG ) |
math.AT | Algebraic Topology ( math.AT ) |
math.AP | Analysis of PDEs ( math.AP ) |
math.CT | Category Theory ( math.CT ) |
math.CA | Classical Analysis and ODEs ( math.CA ) |
math.CO | Combinatorics ( math.CO ) |
math.AC | Commutative Algebra ( math.AC ) |
math.CV | Complex Variables ( math.CV ) |
math.DG | Differential Geometry ( math.DG ) |
math.DS | Dynamical Systems ( math.DS ) |
math.FA | Functional Analysis ( math.FA ) |
math.GM | General Mathematics ( math.GM ) |
math.GN | General Topology ( math.GN ) |
math.GT | Geometric Topology ( math.GT ) |
math.GR | Group Theory ( math.GR ) |
math.HO | History and Overview ( math.HO ) |
math.IT | Information Theory ( math.IT ) |
math.KT | K-Theory and Homology ( math.KT ) |
math.LO | Logic ( math.LO ) |
math.MP | Mathematical Physics ( math.MP ) |
math.MG | Metric Geometry ( math.MG ) |
math.NT | Number Theory ( math.NT ) |
math.NA | Numerical Analysis ( math.NA ) |
math.OA | Operator Algebras ( math.OA ) |
math.OC | Optimization and Control ( math.OC ) |
math.PR | Probability ( math.PR ) |
math.QA | Quantum Algebra ( math.QA ) |
math.RT | Representation Theory ( math.RT ) |
math.RA | Rings and Algebras ( math.RA ) |
math.SP | Spectral Theory ( math.SP ) |
math.ST | Statistics ( math.ST ) |
math.SG | Symplectic Geometry ( math.SG ) |
nlin | Nonlinear Sciences ( nlin ) |
nlin.AO | Adaptation and Self-Organizing Systems ( nlin.AO ) |
nlin.CG | Cellular Automata and Lattice Gases ( nlin.CG ) |
nlin.CD | Chaotic Dynamics ( nlin.CD ) |
nlin.SI | Exactly Solvable and Integrable Systems ( nlin.SI ) |
nlin.PS | Pattern Formation and Solitons ( nlin.PS ) |
cs | Computer Science ( cs ) |
cs.AR | Architecture ( cs.AR ) |
cs.AI | Artificial Intelligence ( cs.AI ) |
cs.CL | Computation and Language ( cs.CL ) |
cs.CC | Computational Complexity ( cs.CC ) |
cs.CE | Computational Engineering, Finance, and Science ( cs.CE ) |
cs.CG | Computational Geometry ( cs.CG ) |
cs.GT | Computer Science and Game Theory ( cs.GT ) |
cs.CV | Computer Vision and Pattern Recognition ( cs.CV ) |
cs.CY | Computers and Society ( cs.CY ) |
cs.CR | Cryptography and Security ( cs.CR ) |
cs.DS | Data Structures and Algorithms ( cs.DS ) |
cs.DB | Databases ( cs.DB ) |
cs.DL | Digital Libraries ( cs.DL ) |
cs.DM | Discrete Mathematics ( cs.DM ) |
cs.DC | Distributed, Parallel, and Cluster Computing ( cs.DC ) |
cs.FL | Formal Languages and Automata Theory ( cs.FL ) |
cs.GL | General Literature ( cs.GL ) |
cs.GR | Graphics ( cs.GR ) |
cs.HC | Human-Computer Interaction ( cs.HC ) |
cs.IR | Information Retrieval ( cs.IR ) |
cs.IT | Information Theory ( cs.IT ) |
cs.LG | Learning ( cs.LG ) |
cs.LO | Logic in Computer Science ( cs.LO ) |
cs.MS | Mathematical Software ( cs.MS ) |
cs.MA | Multiagent Systems ( cs.MA ) |
cs.MM | Multimedia ( cs.MM ) |
cs.NI | Networking and Internet Architecture ( cs.NI ) |
cs.NE | Neural and Evolutionary Computing ( cs.NE ) |
cs.NA | Numerical Analysis ( cs.NA ) |
cs.OS | Operating Systems ( cs.OS ) |
cs.OH | Computer Science ( cs.OH ) |
cs.PF | Performance ( cs.PF ) |
cs.PL | Programming Languages ( cs.PL ) |
cs.RO | Robotics ( cs.RO ) |
cs.SE | Software Engineering ( cs.SE ) |
cs.SD | Sound ( cs.SD ) |
cs.SC | Symbolic Computation ( cs.SC ) |
q-bio | Quantitative Biology ( q-bio ) |
q-bio.BM | Biomolecules ( q-bio.BM ) |
q-bio.CB | Cell Behavior ( q-bio.CB ) |
q-bio.GN | Genomics ( q-bio.GN ) |
q-bio.MN | Molecular Networks ( q-bio.MN ) |
q-bio.NC | Neurons and Cognition ( q-bio.NC ) |
q-bio.OT | Quantitative Biology ( q-bio.OT ) |
q-bio.PE | Populations and Evolution ( q-bio.PE ) |
q-bio.QM | Quantitative Methods ( q-bio.QM ) |
q-bio.SC | Subcellular Processes ( q-bio.SC ) |
q-bio.TO | Tissues and Organs ( q-bio.TO ) |
q-fin | Quantitative Finance ( q-fin ) |
q-fin.CP | Computational Finance ( q-fin.CP ) |
q-fin.GN | General Finance ( q-fin.GN ) |
q-fin.PM | Portfolio Management ( q-fin.PM ) |
q-fin.PR | Pricing of Securities ( q-fin.PR ) |
q-fin.RM | Risk Management ( q-fin.RM ) |
q-fin.ST | Statistical Finance ( q-fin.ST ) |
q-fin.TR | Trading and Market Microstructure ( q-fin.TR ) |
stat | Statistics ( stat ) |
stat.AP | Applications ( stat.AP ) |
stat.CO | Computation ( stat.CO ) |
stat.ML | Machine Learning ( stat.ML ) |
stat.ME | Methodology ( stat.ME ) |
stat.TH | Statistics Theory ( stat.TH ) |
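The lookup described by the table above could be sketched as follows. Only a handful of rows are copied into the dictionary for brevity. The function name, the whitespace-separated input, and the fallback of passing unknown abbreviations through unchanged are assumptions made for illustration.

```python
# A few rows from the table above; a full implementation would hold all of them.
SUBJECTS = {
    "astro-ph": "Astrophysics",
    "astro-ph.CO": "Astrophysics, Cosmology and Extragalactic Astrophysics",
    "hep-th": "High Energy Physics - Theory",
    "math.AG": "Algebraic Geometry",
    "cs.DL": "Digital Libraries",
}


def expand_categories(categories):
    """Turn a whitespace-separated categories string into keyword strings."""
    keywords = []
    for abbreviation in categories.split():
        name = SUBJECTS.get(abbreviation)
        if name is None:
            # Assumption: keep unknown abbreviations as they are.
            keywords.append(abbreviation)
        else:
            keywords.append(f"{name} ( {abbreviation} )")
    return keywords
```

Keeping unknown abbreviations rather than dropping them means that categories added by arXiv later still reach the publication item, only without a resolved name.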
OAI_DC metadata format
- Fairly simple, as we also use dc metadata in the publication profile, but the result is not correct unless the values are parsed
- Affiliations are not available
1. dc:description => dc:abstract (only if dc:description does not start with "Comment")
2. dc:date => lists the dates of all existing versions (the earliest date is the date of submission to arXiv; every other date marks a new revision)
3. dc:identifier => partly an identifier, partly source information (the journal reference, i.e. journal-ref in the arXiv metadata format)
4. dcterms:subject => dcterms:subject (holds the full names of the category ids delivered by the arXiv format)
5. dc:creators => affiliations are missing
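Rules 1 and 2 above can be illustrated with a small sketch that works on already extracted element values. The function names and the returned keys are hypothetical, and the dates are assumed to be ISO 8601 strings (YYYY-MM-DD), which sort chronologically as plain text.

```python
def pick_abstract(descriptions):
    """Return the first dc:description that does not start with 'Comment'."""
    for description in descriptions:
        text = description.strip()
        if text and not text.startswith("Comment"):
            return text
    return None


def split_dates(dates):
    """Split dc:date values into the submission date and later revisions."""
    ordered = sorted(dates)  # assumption: ISO 8601 strings
    if not ordered:
        return {"submitted": None, "revised": []}
    return {"submitted": ordered[0], "revised": ordered[1:]}
```

For example, `split_dates(["2008-04-09", "2008-03-01"])` treats 2008-03-01 as the submission date and 2008-04-09 as a revision.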
See https://dev.livingreviews.org/projects/epubtk/browser/trunk/ePubTk/lib/arxiv.py for an example of how to use arXiv's OAI-PMH interface.
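As a rough illustration of such a request, the sketch below only builds a GetRecord URL and does not contact the server. The `verb`, `identifier` and `metadataPrefix` parameters are standard OAI-PMH; the base URL is an assumption and should be checked against arXiv's current documentation, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: verify the current arXiv OAI-PMH endpoint before use.
BASE_URL = "http://export.arxiv.org/oai2"


def get_record_url(arxiv_id, metadata_prefix="arXiv", base_url=BASE_URL):
    """Build an OAI-PMH GetRecord URL for one arXiv identifier."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": f"oai:arXiv.org:{arxiv_id}",
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"
```

Passing `metadata_prefix="oai_dc"` or `"arXivRaw"` would request one of the other formats listed in the overview table.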
arXivRaw format
Issues
- Affiliations: affiliation strings such as "MPI für XXX" cannot be resolved, because the organizational units service does not fully support search by organization name
- No issue has been created as an extra requirement for core services, because it is not yet certain whether we want this within the controlled vocabulary or would ask core services directly for search-organizations methods. It might become an internal requirement for the controlled vocabulary service (institutions).
- Parsing of journal/source information: check whether it is feasible and whether it can later be related to the controlled vocabulary service (journals)
- Genre: a further problem is that the genre is not apparent in either format; we will presumably default to Article here
- Check for book chapters?
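To explore the feasibility of parsing journal/source information, a heuristic like the following could be tried against a sample of records. It is only a sketch: the pattern covers a single common shape ("Title Volume, Pages (Year)"), the function name and result keys are hypothetical, and anything that does not match is kept as an unsplit title, in line with the observation that there is no single rule for journal-ref.

```python
import re

# One common journal-ref shape, e.g. "Phys. Rev. D 77, 045012 (2008)".
JOURNAL_REF = re.compile(
    r"^(?P<title>.+?)\s+(?P<volume>\d+)\s*[,:]\s*"
    r"(?P<pages>[\w-]+)\s*\((?P<year>\d{4})\)$"
)


def split_journal_ref(journal_ref):
    """Split a journal-ref string if it matches the pattern, else keep it whole."""
    text = journal_ref.strip()
    match = JOURNAL_REF.match(text)
    if match is None:
        # No reliable split: keep the complete string as the source title.
        return {"title": text}
    return match.groupdict()
```

Counting how many journal-ref values fall into the fallback branch would give a first estimate of how much of the data such rules can cover.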