eSciDoc Services DataAcquisitionHandler
ID (Label)
DA
Complete Name
Data Acquisition Service
Status
Implemented
Description
Acquisition service for fetching data from internal and external sources through an unAPI interface.
JavaDoc:
- TODO
Operations Overview
Operation | Status | Input | Output | Description |
---|---|---|---|---|
explainSources | implemented | none | String | Scope: Public. Returns a list of all sources available for acquisition and the formats that can be fetched from each of them. |
doFetch | implemented | sourceName: String, identifier: String | byte[] | Scope: Public. Fetches data from the specified source in the default format defined in sources.xml. |
doFetch | implemented | sourceName: String, identifier: String, Format: String | byte[] | Scope: Public. Fetches data from the specified source and returns it in the requested format. This can be either a format the external source provides or a format that can be produced by transforming one of the formats the source provides. |
doFetch | implemented | sourceName: String, identifier: String, Formats: String[] | byte[] | Scope: Public. Fetches data from the specified source in the requested formats and returns the result as a zip archive. Currently, only file fetching is supported when multiple formats are requested. |
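The operations in the table can be pictured as a Java interface. This is a minimal sketch, assuming the table's inputs and outputs map directly onto Java method parameters and return types; the actual DataAcquisitionHandler signatures may differ. `InMemoryDataAcquisition` is a hypothetical stand-in that serves fixed content per format instead of contacting a real source, so that the difference between the three `doFetch` variants (default format, one format, several formats packed into a zip) is visible. The zip entry naming is also an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Mirrors the operations table; the real service signatures may differ. */
interface DataAcquisition {
    String explainSources();
    byte[] doFetch(String sourceName, String identifier);
    byte[] doFetch(String sourceName, String identifier, String format);
    byte[] doFetch(String sourceName, String identifier, String[] formats);
}

/** Hypothetical in-memory stand-in: one source, fixed content per format. */
final class InMemoryDataAcquisition implements DataAcquisition {
    private final String sourceName;
    private final String defaultFormat;          // stands in for the default declared in sources.xml
    private final Map<String, String> byFormat;  // format -> content, same for every identifier

    InMemoryDataAcquisition(String sourceName, String defaultFormat, Map<String, String> byFormat) {
        this.sourceName = sourceName;
        this.defaultFormat = defaultFormat;
        this.byFormat = byFormat;
    }

    @Override
    public String explainSources() {
        // Sorted so the listing is deterministic.
        return sourceName + ": " + String.join(", ", new TreeMap<>(byFormat).keySet());
    }

    @Override
    public byte[] doFetch(String sourceName, String identifier) {
        // No format given: fall back to the source's default format.
        return doFetch(sourceName, identifier, defaultFormat);
    }

    @Override
    public byte[] doFetch(String sourceName, String identifier, String format) {
        if (!this.sourceName.equals(sourceName)) {
            throw new IllegalArgumentException("Unknown source: " + sourceName);
        }
        String content = byFormat.get(format);
        if (content == null) {
            throw new IllegalArgumentException("Unsupported format: " + format);
        }
        return content.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public byte[] doFetch(String sourceName, String identifier, String[] formats) {
        // Several formats: pack one zip entry per requested format.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String format : formats) {
                zip.putNextEntry(new ZipEntry(identifier.replace(':', '_') + "." + format));
                zip.write(doFetch(sourceName, identifier, format));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Keeping the multi-format case as a separate overload (rather than a zip flag) matches the table, where the `String[]` input is what switches the output to a zip archive.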
Supported Systems
- eSciDoc
- arXiv
- PubMed Central
- Spires (in design)
- BioMed Central (in planning)
Service interfaces
The four steps to fetch data:
1. Choose how the data is presented:
   - dataacquisition/view: displays the fetched data in the browser.
   - dataacquisition/download: provides the fetched data as a download.
2. Call the unAPI service interface:
   - dataacquisition/view/unapi
   - dataacquisition/download/unapi
3. Provide the identifier of the item you want to fetch:
   - dataacquisition/view/unapi?id=escidoc:1234
   - dataacquisition/download/unapi?id=escidoc:1234
4. Provide the format in which you want the fetched item:
   - dataacquisition/view/unapi?id=escidoc:1234&format=bibtex
   - dataacquisition/download/unapi?id=escidoc:1234&format=bibtex
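The four steps amount to assembling a request path. The following is a small Java sketch under the assumption that the paths are exactly those in the examples (relative to the service's host, which is not given here). `UnapiRequest` and `build` are hypothetical names. Unlike the examples, the sketch percent-encodes the query values, assuming the service decodes standard query encoding; this matters for URL identifiers, which can themselves contain `?`, `&` or `/`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that follows the four steps to build an unAPI request path. */
final class UnapiRequest {
    static String build(String presentation, String identifier, String format) {
        // Step 1: choose the presentation, "view" (browser) or "download".
        if (!presentation.equals("view") && !presentation.equals("download")) {
            throw new IllegalArgumentException("presentation must be 'view' or 'download'");
        }
        // Step 2: address the unAPI interface below the chosen presentation.
        StringBuilder path = new StringBuilder("dataacquisition/")
                .append(presentation)
                .append("/unapi");
        // Step 3: the identifier of the item, percent-encoded.
        path.append("?id=").append(URLEncoder.encode(identifier, StandardCharsets.UTF_8));
        // Step 4: the requested format; left out when null, as in the step 3 examples.
        if (format != null) {
            path.append("&format=").append(URLEncoder.encode(format, StandardCharsets.UTF_8));
        }
        return path.toString();
    }
}
```

For example, `UnapiRequest.build("download", "escidoc:1234", "bibtex")` yields `dataacquisition/download/unapi?id=escidoc%3A1234&format=bibtex`.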
Supported Identifiers:
1. An identifier from a supported source (the sources are described at /dataacquisition).
2. Any URL. The eSciDoc DataAcquisition Service has no information about such a source and can only attempt to retrieve the given URL.
- The format must be set to "url". The response is a zip file containing the fetched content. The view option is disabled for URL identifiers.
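The rules for URL identifiers can be expressed as a pre-request check. This is only a sketch: how the service actually tells a URL apart from a source identifier is not documented here, so the heuristic below (an identifier counts as a URL when it parses as a `java.net.URI` with both a scheme and a host) is an assumption. An identifier such as `escidoc:1234` parses as an opaque URI without a host and is therefore treated as a source identifier. `IdentifierCheck` and its methods are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical client-side check of the rules for URL identifiers. */
final class IdentifierCheck {
    /** Assumed heuristic: a URL has both a scheme and a host (escidoc:1234 has no host). */
    static boolean isUrlIdentifier(String identifier) {
        try {
            URI uri = new URI(identifier);
            return uri.getScheme() != null && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Returns an error message, or null if the request satisfies the rules. */
    static String validate(String presentation, String identifier, String format) {
        if (!isUrlIdentifier(identifier)) {
            return null; // source identifiers are not restricted by these rules
        }
        if (!"url".equals(format)) {
            return "URL identifiers require format=url";
        }
        if ("view".equals(presentation)) {
            return "The view option is disabled for URL identifiers";
        }
        return null; // the response will be a zip file of the fetched content
    }
}
```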
Services supporting OAI-PMH
A full list of OAI-PMH data providers is available on the openarchives.org site. Some services that might be of interest:
- arXiv
- PubMed Central
- BioMed Central
- CERN Document Server (http://cdsweb.cern.ch/oai2d?verb=Identify)
The University of Illinois provides an OAI-PMH Data Provider Registry with a search interface (an SRU interface).
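OAI-PMH requests are plain HTTP GET requests against a repository's base URL, with the operation selected by the `verb` parameter, as in the CERN `Identify` link above. The sketch below builds three such requests: `Identify`, `ListMetadataFormats`, and `GetRecord` (which takes `identifier` and `metadataPrefix`; every OAI-PMH repository must support the `oai_dc` prefix). The class name is hypothetical, the CERN base URL is taken from the link above, and the record identifier in the usage example is made up; whether that endpoint is still live is not checked here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical builder for OAI-PMH 2.0 request URLs. */
final class OaiPmhRequest {
    private final String baseUrl;

    OaiPmhRequest(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Identify: describes the repository (name, protocol version, admin contact, ...). */
    String identify() {
        return baseUrl + "?verb=Identify";
    }

    /** ListMetadataFormats: the metadata formats the repository can disseminate. */
    String listMetadataFormats() {
        return baseUrl + "?verb=ListMetadataFormats";
    }

    /** GetRecord: a single record in the requested metadata format. */
    String getRecord(String oaiIdentifier, String metadataPrefix) {
        return baseUrl + "?verb=GetRecord"
                + "&identifier=" + URLEncoder.encode(oaiIdentifier, StandardCharsets.UTF_8)
                + "&metadataPrefix=" + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8);
    }
}
```

For example, `new OaiPmhRequest("http://cdsweb.cern.ch/oai2d").getRecord("oai:example.org:1", "oai_dc")` yields `http://cdsweb.cern.ch/oai2d?verb=GetRecord&identifier=oai%3Aexample.org%3A1&metadataPrefix=oai_dc`.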
Future Development
- Prioritize fetching formats for import (client- or server-side?), e.g. fetch PDF and, if that is not possible, fetch DOC.
- Prevent the Import Manager from becoming a security leak for the sources it fetches from
- Inform arXiv about the unAPI interface
- Extend sources.xml with each source's disclaimer and copyright information and add this information to the unAPI source description
- Extend sources.xml with identifierPrefix and identifierExample and add this information to the unAPI source description
- Also add each source's URL to the unAPI source description
- Support multiple identifiers for one source
- Fetch from Spires
- Fetch from BioMed Central
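The format prioritization idea ("fetch PDF, if not possible fetch DOC") could look like the following sketch. It is deliberately neutral on the open client-versus-server question: the actual fetch is passed in as a function, so the same loop could run in a client or inside the service. `PrioritizedFetch`, `fetchFirstAvailable`, and the convention that an unavailable format yields `Optional.empty()` are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

/** Hypothetical helper: try formats in priority order, keep the first that succeeds. */
final class PrioritizedFetch {
    /**
     * @param identifier        item to fetch
     * @param formatsByPriority e.g. List.of("pdf", "doc")
     * @param fetcher           (identifier, format) -> content, or empty if the format is unavailable
     * @return the chosen format together with its content, or empty if no format worked
     */
    static Optional<Map.Entry<String, byte[]>> fetchFirstAvailable(
            String identifier,
            List<String> formatsByPriority,
            BiFunction<String, String, Optional<byte[]>> fetcher) {
        for (String format : formatsByPriority) {
            Optional<byte[]> content = fetcher.apply(identifier, format);
            if (content.isPresent()) {
                return Optional.of(Map.entry(format, content.get()));
            }
        }
        return Optional.empty();
    }
}
```

Returning the chosen format along with the content lets the importer know which fallback was used, which a plain `byte[]` would hide.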