Springer Open Choice Data Transfer
This page describes various issues regarding the data transfer required in the context of the Springer Open Choice Agreement from 2008-2009.
Workflow Overview
Springer
- after the acceptance process: the corresponding author tags the paper if any creator is affiliated with the MPG
Note: This information is not checked automatically (e.g. by comparing the author's affiliation with the selection), so MPG articles could be missing or external articles could be mapped incorrectly!
- the publication gets processed; in particular, the metadata record and the PDF file are generated
- publication is made available via SpringerLink ("online first")
- publication is issued, thus metadata record is completed
- Springer's data delivery service (DDS) runs an export job on a regular basis and makes the information available to the MPG ("issued" publications only)
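The note in this list says that the author's selection is not cross-checked against the affiliation. The following is a minimal sketch of such a check in Python, assuming the affiliation is available as free text (for example the "Offprint OrgName" column of the delivered Excel sheet). The marker pattern and the function names are illustrative assumptions, not part of any Springer or MPDL tool, and a real check would need a much more careful affiliation match.

```python
import re

# Hypothetical markers; real affiliation strings vary widely.
MPG_MARKERS = re.compile(r"max[\s-]*planck|\bMPI", re.IGNORECASE)


def looks_like_mpg(affiliation):
    """Return True if the free-text affiliation mentions the MPG or an MPI."""
    return bool(MPG_MARKERS.search(affiliation or ""))


def flag_suspicious(rows):
    """Yield the DOIs of articles tagged as MPG whose affiliation does not look like MPG.

    rows is an iterable of (doi, affiliation) pairs.
    """
    for doi, affiliation in rows:
        if not looks_like_mpg(affiliation):
            yield doi
```

Flagged DOIs are only candidates for manual review: an empty or abbreviated affiliation is flagged as well, even if the article is in fact an MPG article.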
MPG
status: proposed revision from March 2009; see talk page for former proposals
- download data from Springer's FTP server (by temporary assistant/Inga)
- process data, i.e.
- [not implemented]: check for plausibility and correctness.
- [not implemented]: submit late reports ("Nachmeldungen") to Springer? (manually only, maybe once a year)
- compile overview file and manually map publications to institutes (by temporary assistant)
- upload data to the Subversion repository (e.g. https://devtools.mpdl.mpg.de/repos/mdbase) (by temporary assistant/Inga)
The repository was formerly checked out daily to a web server whose access is restricted to the MPG IP range, but this service was stopped in 2017, see https://devtools.mpdl.mpg.de/projects/vlib/ticket/4018
- check current eDoc status, i.e. identify how many Springer OC articles are already represented with a full text on eDoc (by Inga)
- notification of MPG librarians (by Antje), including
- overview file grouped by institute
- [not implemented]: instructions on how to proceed in order to attach the full text to the eDoc record
- potentially/after a defined time frame: manual upload to eDoc (by temporary assistant/Nicole)
- attach full text to existing eDoc record
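The download step at the top of this list can be sketched with Python's standard `ftplib`. This is a sketch under stated assumptions: the host name is the one given on this page, while the remote directory (`data/in`, taken from the upload path in Springer's delivery note), the credentials and the local directory name are placeholders that may differ from the real setup.

```python
from ftplib import FTP
from pathlib import Path


def select_new_archives(remote_names, local_dir):
    """Return the remote ZIP names that are not yet present in local_dir."""
    have = {p.name for p in Path(local_dir).glob("*.zip")}
    return sorted(
        n for n in remote_names if n.lower().endswith(".zip") and n not in have
    )


def download_new_archives(ftp, local_dir):
    """Download every new ZIP archive from the current FTP directory."""
    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in select_new_archives(ftp.nlst(), target):
        with open(target / name, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        fetched.append(name)
    return fetched


def run(user, password, host="ftp.springer-dds.com",
        remote_dir="data/in", local_dir="springer_oc"):
    """Connect, change to the (assumed) delivery directory and fetch new archives."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        return download_new_archives(ftp, local_dir)
```

Archives that already exist locally are skipped, so the function can be called repeatedly without downloading the same delivery twice.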
Current status of Data Transfer: FTP
On 1 September, we received an Excel sheet (manually generated by G. Schaefer?) and a delivery note (automatically generated by Springer's Data Delivery Service)
1. The Excel sheet includes the following information:
ArticleDOI | OpenChoice OrgName | ArticleTitle | Accepted | Corresp. Author | Offprint OrgName | Author(s) |
10.1365/s10337-008-0603-9 | Germany - Max-Planck-Gesellschaft/-Institut | Establishment of the model CD40 cell membrane chromatography and its chromatographic | 20080229 | Rong Lin | University of xi'an jiaotong | Guangde Yang, Rong Lin, Zhen Hu, Jiye Zhang, Chunjie Han, Langchong He, Weirong Wang |
10.1007/s10791-008-9048-x | Germany - Max-Planck-Gesellschaft/-Institut | Output-sensitive autocompletion search | 20080123 | Ingmar Weber | MPI Informatik - Geb. 46.1 | Holger Bast, Christian W. Mortensen, Ingmar Weber |
10.1007/s10827-008-0088-4 | Germany - Max-Planck-Gesellschaft/-Institut | A Neurocomputational Model for Optimal Temporal Processing | 20080227 | Joachim Haß | MPIDS | Joachim Haß, Stefan Blaschke, Thomas Rammsayer, J. Michael Herrmann |
10.1007/s10876-008-0183-8 | Germany - Max-Planck-Gesellschaft/-Institut | Two New One-Dimensional Chain-Like Compounds Constructed from the Sandwich-Type Polyoxotungstate Clusters | 20080229 | Enbo Wang | | Xiong Gan, Zhiming Zhang, Shuang Yao, Weilin Chen, Enbo Wang, Hong Zhang |
10.1007/s10878-008-9139-z | Germany - Max-Planck-Gesellschaft/-Institut | A Lagrangian relaxation approach for the multiple sequence alignment problem | | Stefan Canzar | MPI fuer Informatik | Ernst Althaus, Stefan Canzar |
2. The delivery note (log excerpt):
02.09.2008 11:21:06 I dds_send: Handling customer Max_Planck_Gesellschaft_OpenAccess (ID 3693)
02.09.2008 11:21:06 I dds_send: Determining new/resend units
02.09.2008 11:26:05 I dds_send: ... 2355 new/resend unit(s)
[...]
02.09.2008 11:26:06 I dds_send: Fetching units:
02.09.2008 11:26:06 I dds_send: ... units fetched
02.09.2008 11:26:06 D dds_send: Using cached BLOB for UnitID 31392944
[...]
02.09.2008 11:27:19 I dds_zip_struct_conv: Starting new ZIP archive: ftp_PUB_08-09-02_11-27-18.zip
02.09.2008 11:27:19 D dds_zip_struct_conv: add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114/JournalOnlineFirst/ART=2008_406
02.09.2008 11:27:19 D dds_zip_struct_conv: add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114/JournalOnlineFirst
02.09.2008 11:27:19 D dds_zip_struct_conv: add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114
02.09.2008 11:27:19 D dds_zip_struct_conv: add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00122/JournalOnlineFirst/ART=2008_847
[...]
02.09.2008 11:29:04 D dds_send: Storing 0 failed units in ResendUnits
[...]
02.09.2008 11:29:05 I dds_send: FTP upload to ftp.springer-dds.com started
02.09.2008 11:29:05 I dds_send: Uploading local directory /u01/DDS/blobcache/dds_ftp_todo,customer=3693 to ftpmpg:******@ftp.springer-dds.com/data/in
02.09.2008 11:29:05 D dds_send: Transfering /u01/DDS/blobcache/dds_ftp_todo,customer=3693/ftp_PUB_08-09-02_11-27-18.zip => ./ftp_PUB_08-09-02_11-27-18.zip
02.09.2008 11:29:14 I dds_send: FTP upload to ftp.springer-dds.com completed
3. Download via FTP: afterwards, the ZIP file can be downloaded from ftp.springer-dds.com (login details are stored in the internal wiki)
4. The ZIP file provides a complex tree structure which contains one metadata record (A++ headers format) and one full text file (searchable PDF) for each item. The structure contains information on the corresponding publisher, journal and publication status. Example "issued":
PUB=Springer-Verlag-Berlin-Heidelberg\JOU=00109\VOL=2008.86\ISU=9\...
Example "Online First":
PUB=Springer-Verlag-Berlin-Heidelberg\JOU=00114\JournalOnlineFirst\...
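Since the directory names carry the publisher, journal and publication status, they can be read programmatically. The sketch below splits a delivery path into its `KEY=value` segments. Only the keys visible on this page (`PUB`, `JOU`, `VOL`, `ISU`, `ART`) are known; the rule used to derive the status is an assumption based on the two examples above, and the function name is hypothetical.

```python
import re


def parse_delivery_path(path):
    """Split a delivery path into its KEY=value segments and derive the status."""
    # Accept both separators: the examples use "\", the delivery note uses "/".
    parts = [p for p in re.split(r"[\\/]", path) if p and p not in (".", "...")]
    info = {}
    for part in parts:
        if "=" in part:
            key, value = part.split("=", 1)
            info[key] = value
    # Assumed rule: a JournalOnlineFirst folder marks "online first",
    # a volume plus issue folder marks "issued".
    if "JournalOnlineFirst" in parts:
        info["status"] = "online first"
    elif "VOL" in info and "ISU" in info:
        info["status"] = "issued"
    else:
        info["status"] = "unknown"
    return info
```

For the "issued" example above, this yields journal `00109`, volume `2008.86`, issue `9` and the status `issued`.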
Additional information and Discussion
- Publication status: "You will receive the data either in Online First status (this will be the largest share in future incremental deliveries) or as part of an issue. The corresponding information is contained in the A++ data and is reflected in the delivery structure."
- Do we really need the "online first" articles or should we try to receive "issued" versions only? In the latter case, Springer could deliver complete metadata (incl. issue, pages) and we could skip any updating/merging procedures for Springer imports --Inga 14:41, 11 September 2008 (UTC)
- Documentation of the A++ metadata format and the DTD is available on Springer's Bulk User site: http://production-customer.springer.com (login stored in the internal wiki)
- One ZIP file was downloaded and stored on the internal web space
Data Transfer: FTP Download versus OAI-PMH
Mail to Springer:
For data exchange processes, Springer normally offers a download option via FTP. Since this particular case involves a comparatively small amount of data, which we would like to download irregularly but repeatedly, the offer of an OAI Service Provider would suit us very well. In the repository context, the MPDL already has experience with implementing the OAI-PMH protocol, which we would like to put to use. Mr. van der Stelt has raised "first hopes" in this regard; could you perhaps make these more concrete?
Reminder about the task sent by email to Ms. Schäfer on 9 October 2008. No reply so far.
Perhaps a concrete usage scenario would help? Within the eSciDoc project, for example, a DataAcquisition service is currently being developed, see ESciDoc_Services_DataAcquisitionHandler. It is intended to offer the following functionality:
- Download of the metadata record via OAI-PMH
- Automatic creation of a new object in our repository
- Generation of a URL for the full text download
- Download of the full text and storage with the object
arXiv is planned as the first data source, but in the medium term this is meant to become a generic service that can process records from different providers. To me, this sounds like a "perfect match".
To strengthen the overall visibility of SpringerLink's Open Choice programme, I would even go one step further and suggest a Springer Open Choice repository (see e.g. the PMC Open Archives Service). On this repository, corresponding "sets" could then be set up for the institutions participating in the Open Choice programme, to allow selective download as well.
Reply:
In autumn 2008 we will start to work on a project on OAI protocol so that this delivery feature is not available at the moment. Delivery via DDS (Data Delivery System) will be made via ftp uploads to our ftp site ftp.springer-dds.com for you to download the data.
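For illustration, the first step of the harvesting scenario described in the mail (fetching metadata records via OAI-PMH, optionally restricted to a per-institution set) could look like the sketch below. It uses only standard OAI-PMH request parameters (`verb`, `metadataPrefix`, `set`, `from`). According to the reply, Springer did not offer such an endpoint at the time, so any base URL and set name passed in are purely hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # selective harvesting, e.g. one set per institution
    if from_date:
        params["from"] = from_date  # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Return the header identifiers of all records in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
```

The sketch does not follow `resumptionToken` elements, so it would only see the first page of a large result list. Creating the repository object and downloading the full text are left out.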
Metadata
Initial proposal by Springer
In April 2008, Antje provided us with an example of a metadata record as it could be delivered by Springer and asked for further requirements. The record included:
- article title
- article sub title
- article copyright year
- journal name
- corresponding author with sub elements: family name, given name, suffix, division, organization, address, email
- further authors with sub elements: family name, given name, suffix, division, organization, address, email
- publication dates: received, revised, accepted
- abstract
- keywords
- footnote information
Additional requirements
This initial draft was discussed among the MPDL team, and we additionally requested the following data:
- DOI*
- journal ISSN*
- full text URL of the PDF*
- publication date (online)*
- publication date (in print) - for follow-up deliveries ("Nachlieferungen"), as soon as the article is "issued"
- volume - for follow-up deliveries, as soon as the article is "issued"
- issue - for follow-up deliveries, as soon as the article is "issued"
- pages - for follow-up deliveries, as soon as the article is "issued"
In addition, we defined two options for providing the data in a structured way:
- Option a: Springer provides a proprietary metadata format (and we try to map it to eSciDoc XML)
- Option b: We send the eSciDoc PubMan metadata format to Springer and ask them to adapt it
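Whichever option is chosen, the additionally requested fields can be held in a small record type that shows which data is still outstanding for an article delivered as "online first". The sketch below is illustrative only: the class and field names are assumptions and correspond to neither the A++ format nor the eSciDoc schema, and it assumes the asterisked fields are the ones expected with the first delivery.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArticleRecord:
    # Fields marked with * in the list above (assumed present at first delivery).
    doi: str
    journal_issn: str
    fulltext_url: str
    online_date: str
    # Fields that only become available once the article is "issued".
    print_date: Optional[str] = None
    volume: Optional[str] = None
    issue: Optional[str] = None
    pages: Optional[str] = None


ISSUED_FIELDS = ("print_date", "volume", "issue", "pages")


def missing_issue_data(record):
    """Return the names of the issue-level fields that are still empty."""
    return [name for name in ISSUED_FIELDS if getattr(record, name) in (None, "")]
```

An empty result means the record is complete; a non-empty result marks an article that still needs a follow-up delivery.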
Nachmeldungen (late reports)
By selecting the corresponding option on the submission mask, the submitting author decides whether an article is published under Springer OC or not. There may be several reasons for not choosing OC even though the publication is entitled to it, e.g. reluctance, ignorance or oversight. The current Springer OC contract includes an option to report non-OC articles later in order to correct such errors after publication. However, these publications are not equivalent to original OC articles: they are freely available, but carry a Springer copyright statement "all rights reserved by Springer". In addition, they are not part of the data packages delivered to the MPG.