Springer Open Choice Data Transfer

From MPDLMediaWiki
Revision as of 11:34, 16 August 2012 by Erik

This page describes various issues regarding the data transfer required in the context of the Springer Open Choice Agreement.

Workflow Overview


  • after the acceptance process: the corresponding author tags the paper if any of its authors is affiliated with the MPG (Max-Planck-Gesellschaft)
    Note: This information is not checked automatically (e.g. by comparing the authors' affiliations with the selection); therefore, MPG articles could be missing or external articles could be mapped incorrectly!
  • the publication gets processed; in particular, the metadata record and the PDF file are generated
  • publication is made available via SpringerLink ("online first")
  • publication is issued, thus metadata record is completed
  • Springer's data delivery service (DDS) runs an export job on a regular basis and makes the information available to the MPG ("issued" publications only)

Status: proposed revision from March 2009; see the talk page for former proposals

  • download data from Springer's FTP server (by temporary assistant/Inga)
  • process data, i.e.
    • [not implemented]: check for plausibility and correctness
    • [not implemented]: create late notifications ("Nachmeldungen") to Springer? (manually only, maybe once a year)
    • compile an overview file and manually map publications to institutes (by temporary assistant)
  • upload data to the Subversion repository (e.g. https://devtools.mpdl.mpg.de/repos/mdbase) (by temporary assistant/Inga)
    The repository is checked out daily to a web server whose access is restricted to the MPG IP range (see http://devtools.mpdl.mpg.de/mdbase/trunk/springer_openchoice), so that the original data is available to users within the MPG.
  • check the current eDoc status, i.e. identify how many Springer OC articles are already represented with a full text in eDoc (by Inga)
  • notification of MPG librarians (by Antje), including
    • overview file grouped by institute
    • [not implemented]: instructions on how to attach the full text to the eDoc record
  • potentially, after a defined time frame: manual upload to eDoc (by temporary assistant/Nicole)
    • attach the full text to the existing eDoc record
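The "compile overview file" and "overview file grouped by institute" steps could be scripted once the institute mapping has been done by hand. The following is a minimal Python sketch, assuming each publication has already been reduced to a dictionary with the hypothetical keys `doi`, `title` and `institute`; it does not attempt the mapping itself.

```python
from collections import defaultdict


def build_overview(records):
    """Render a plain-text overview of publications grouped by institute.

    Each record is a dict with the (hypothetical) keys "doi", "title" and
    "institute"; the institute must already have been assigned manually.
    """
    by_institute = defaultdict(list)
    for rec in records:
        # Records without a manual mapping are collected under a placeholder heading
        by_institute[rec.get("institute") or "UNMAPPED"].append(rec)

    lines = []
    for institute in sorted(by_institute):
        items = sorted(by_institute[institute], key=lambda r: r["doi"])
        lines.append(f"{institute} ({len(items)})")
        for rec in items:
            lines.append(f"  {rec['doi']}  {rec['title']}")
    return "\n".join(lines)


records = [
    {"doi": "10.1007/s10791-008-9048-x", "title": "Output-sensitive autocompletion search", "institute": "MPI Informatik"},
    {"doi": "10.1007/s10878-008-9139-z", "title": "A Lagrangian relaxation approach for the multiple sequence alignment problem", "institute": "MPI Informatik"},
    {"doi": "10.1007/s10827-008-0088-4", "title": "A Neurocomputational Model for Optimal Temporal Processing", "institute": "MPIDS"},
]
print(build_overview(records))
```

Unmapped records are kept rather than dropped, so that the librarians' overview shows what still needs manual attention.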

Current status of Data Transfer: FTP

On 1 September, we received an Excel sheet (manually generated by G. Schaefer?) and a delivery note (automatically generated by Springer's Data Delivery Service):

1. The Excel sheet includes the following information:

ArticleDOI | OpenChoice OrgName | ArticleTitle | Accepted | Corresp. Author | Offprint OrgName | Author(s)
10.1365/s10337-008-0603-9 | Germany - Max-Planck-Gesellschaft/-Institut | Establishment of the model CD40 cell membrane chromatography and its chromatographic | 20080229 | Rong Lin | University of xi'an jiaotong | Guangde Yang, Rong Lin, Zhen Hu, Jiye Zhang, Chunjie Han, Langchong He, Weirong Wang
10.1007/s10791-008-9048-x | Germany - Max-Planck-Gesellschaft/-Institut | Output-sensitive autocompletion search | 20080123 | Ingmar Weber | MPI Informatik - Geb. 46.1 | Holger Bast, Christian W. Mortensen, Ingmar Weber
10.1007/s10827-008-0088-4 | Germany - Max-Planck-Gesellschaft/-Institut | A Neurocomputational Model for Optimal Temporal Processing | 20080227 | Joachim Haß | MPIDS | Joachim Haß, Stefan Blaschke, Thomas Rammsayer, J. Michael Herrmann
10.1007/s10876-008-0183-8 | Germany - Max-Planck-Gesellschaft/-Institut | Two New One-Dimensional Chain-Like Compounds Constructed from the Sandwich-Type Polyoxotungstate Clusters | 20080229 | Enbo Wang |  | Xiong Gan, Zhiming Zhang, Shuang Yao, Weilin Chen, Enbo Wang, Hong Zhang
10.1007/s10878-008-9139-z | Germany - Max-Planck-Gesellschaft/-Institut | A Lagrangian relaxation approach for the multiple sequence alignment problem |  | Stefan Canzar | MPI fuer Informatik | Ernst Althaus, Stefan Canzar
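The plausibility check that is still marked as not implemented in the workflow could start from this sheet. A minimal Python sketch, assuming the sheet is saved as tab-separated text with the column headers shown above; the list of MPG name markers is a hypothetical heuristic (a bare "mpi" substring can also match unrelated words), and flagged rows are meant for manual review, not automatic exclusion. Applied to the five rows above, it would flag the first article (Offprint OrgName "University of xi'an jiaotong") and the Enbo Wang article (empty Offprint OrgName).

```python
import csv
import io

# Hypothetical heuristic: lower-case substrings suggesting an MPG affiliation
MPG_MARKERS = ("max-planck", "max planck", "mpi")


def flag_for_review(tsv_text):
    """Return the DOIs whose 'Offprint OrgName' does not look like an MPG institute."""
    flagged = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        org = (row.get("Offprint OrgName") or "").lower()
        if not any(marker in org for marker in MPG_MARKERS):
            # Also catches rows where the organization name is missing
            flagged.append(row["ArticleDOI"])
    return flagged
```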
2. The delivery note contains the log of the data upload process executed for the MPG:
02.09.2008 11:21:06 I dds_send: Handling customer Max_Planck_Gesellschaft_OpenAccess (ID 3693)
02.09.2008 11:21:06 I dds_send: Determining new/resend units
02.09.2008 11:26:05 I dds_send: ... 2355 new/resend unit(s)
02.09.2008 11:26:06 I dds_send:   Fetching units:
02.09.2008 11:26:06 I dds_send:   ... units fetched
02.09.2008 11:26:06 D dds_send:   Using cached BLOB for UnitID 31392944
02.09.2008 11:27:19 I dds_zip_struct_conv: Starting new ZIP archive: ftp_PUB_08-09-02_11-27-18.zip
02.09.2008 11:27:19 D dds_zip_struct_conv:   add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114/JournalOnlineFirst/ART=2008_406
02.09.2008 11:27:19 D dds_zip_struct_conv:   add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114/JournalOnlineFirst
02.09.2008 11:27:19 D dds_zip_struct_conv:   add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114
02.09.2008 11:27:19 D dds_zip_struct_conv:   add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00122/JournalOnlineFirst/ART=2008_847
02.09.2008 11:29:04 D dds_send: Storing 0 failed units in ResendUnits
02.09.2008 11:29:05 I dds_send: FTP upload to ftp.springer-dds.com started
02.09.2008 11:29:05 I dds_send: Uploading local directory /u01/DDS/blobcache/dds_ftp_todo,customer=3693 to ftpmpg:******@ftp.springer-dds.com/data/in
02.09.2008 11:29:05 D dds_send: Transfering /u01/DDS/blobcache/dds_ftp_todo,customer=3693/ftp_PUB_08-09-02_11-27-18.zip => ./ftp_PUB_08-09-02_11-27-18.zip
02.09.2008 11:29:14 I dds_send: FTP upload to ftp.springer-dds.com completed
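To check a delivery without reading the whole log, its key figures can be extracted with regular expressions. A minimal Python sketch, assuming the log lines keep exactly the wording shown above (customer line, unit count, ZIP archive names, failed units, upload completion); any change in Springer's log format would break the patterns.

```python
import re

CUSTOMER_RE = re.compile(r"Handling customer (\S+) \(ID (\d+)\)")
UNITS_RE = re.compile(r"\.\.\. (\d+) new/resend unit\(s\)")
ZIP_RE = re.compile(r"Starting new ZIP archive: (\S+\.zip)")
FAILED_RE = re.compile(r"Storing (\d+) failed units")


def summarize_delivery_note(log_text):
    """Collect customer, unit counts, archive names and upload status from a DDS log."""
    summary = {"customer": None, "customer_id": None, "units": None,
               "failed": None, "zip_files": [], "upload_completed": False}
    for line in log_text.splitlines():
        if m := CUSTOMER_RE.search(line):
            summary["customer"], summary["customer_id"] = m.group(1), int(m.group(2))
        elif m := UNITS_RE.search(line):
            summary["units"] = int(m.group(1))
        elif m := ZIP_RE.search(line):
            summary["zip_files"].append(m.group(1))
        elif m := FAILED_RE.search(line):
            summary["failed"] = int(m.group(1))
        elif "upload to" in line and line.rstrip().endswith("completed"):
            summary["upload_completed"] = True
    return summary
```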

3. Download via FTP: afterwards, the ZIP file can be downloaded from ftp.springer-dds.com (login details are stored in the internal wiki).
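Step 3 could be automated with Python's standard `ftplib`. This is a sketch under stated assumptions: the remote directory `data/in` is taken from the delivery note, the credentials are passed in as arguments (they live in the internal wiki and should not be hard-coded), and an archive counts as new if no file of the same name exists in the local target directory.

```python
from ftplib import FTP
from pathlib import Path


def select_new_archives(remote_names, already_downloaded):
    """Pick ZIP archives on the server that have not been fetched yet.

    Sorting by name is roughly chronological, because DDS archive names
    carry a yy-mm-dd_hh-mm-ss timestamp (e.g. ftp_PUB_08-09-02_11-27-18.zip).
    """
    return sorted(n for n in remote_names
                  if n.endswith(".zip") and n not in already_downloaded)


def download_new_archives(host, user, password, remote_dir, target: Path):
    """Download all new ZIP archives from remote_dir into target; return their names."""
    target.mkdir(parents=True, exist_ok=True)
    done = {p.name for p in target.glob("*.zip")}
    fetched = []
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in select_new_archives(ftp.nlst(), done):
            with open(target / name, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
            fetched.append(name)
    return fetched
```

A call might look like `download_new_archives("ftp.springer-dds.com", user, password, "data/in", Path("springer_oc"))`, where `user`, `password` and the local folder name are placeholders.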

4. The ZIP file contains a complex tree structure with one metadata record (A++ headers format) and one full-text file (searchable PDF) for each item. The structure encodes the publisher, the journal and the publication status. Example "issued":


Example "Online First":


Additional information and Discussion

  • Publication status: "You will receive the data either in Online First status (this will make up the majority of future incremental deliveries) or as part of an issue. The corresponding information is contained in the A++ data and is reflected in the delivery structure."
Do we really need the "online first" articles or should we try to receive "issued" versions only? In the latter case, Springer could deliver complete metadata (incl. issue, pages) and we could skip any updating/merging procedures for Springer imports. --Inga 14:41, 11 September 2008 (UTC)
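Whichever way this is decided, the status of each delivered article can be read from its directory path in the ZIP archive, without parsing the A++ records, e.g. to process issued articles only. A minimal Python sketch, assuming the path layout seen in the delivery note log (PUB=.../JOU=.../JournalOnlineFirst/ART=...) and treating every ART= directory outside a JournalOnlineFirst folder as issued; the exact layout of issued items is not shown on this page.

```python
import zipfile


def classify_names(names):
    """Group ART= directories by status, based only on their path inside the archive."""
    result = {"online_first": [], "issued": []}
    seen = set()
    for name in names:
        parts = name.removeprefix("./").rstrip("/").split("/")
        art = next((p for p in parts if p.startswith("ART=")), None)
        if art is None:
            continue  # publisher or journal directory without an article
        article_dir = "/".join(parts[: parts.index(art) + 1])
        if article_dir in seen:
            continue  # metadata record and PDF share one article directory
        seen.add(article_dir)
        # Assumption: anything not below JournalOnlineFirst is an issued article
        status = "online_first" if "JournalOnlineFirst" in parts else "issued"
        result[status].append(article_dir)
    return result


def classify_archive(zip_path):
    """Apply classify_names to the entries of a delivered ZIP archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return classify_names(zf.namelist())
```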

Data Transfer: FTP Download versus OAI-PMH

Mail to Springer:
Springer normally offers an FTP download for data exchange processes. Since this particular case involves a relatively small amount of data, which we would like to download irregularly but repeatedly, it would suit us very well if Springer offered an OAI-PMH interface (in OAI terms, acting as a data provider). In the repository context, the MPDL already has experience with implementing the OAI-PMH protocol, which we would like to build on. Mr. van der Stelt raised our "initial hopes" in this respect; could you perhaps make these more concrete? Reminder about this task sent by email to Ms. Schäfer on 9 October 2008. No reply so far.

Perhaps a concrete usage scenario would help? As part of the eSciDoc project, for example, a DataAcquisition service is currently being developed, see ESciDoc_Services_DataAcquisitionHandler. It is intended to provide the following functionality:

  • Download of the metadata record via OAI-PMH
  • Automatic creation of a new object in our repository
  • Generation of a URL for the full-text download
  • Download of the full text and storage with the object

arXiv is envisaged as the first data source, but in the medium term this is to become a generic service that can process records from different providers. To me, this sounds like a "perfect match".

To strengthen the overall visibility of SpringerLink's Open Choice program, I would even go one step further and suggest a Springer Open Choice repository (see e.g. the PMC Open Archives Service). In this repository, corresponding "sets" could then be set up for the institutions participating in the Open Choice program, so that selective downloads would also be possible.
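For comparison with the FTP route, a selective OAI-PMH harvest as proposed in the mail needs nothing but plain HTTP requests. A minimal Python sketch: the `verb`, `metadataPrefix`, `set`, `from` and `resumptionToken` arguments are defined by the OAI-PMH 2.0 specification, while the endpoint URL and any set name for an institution are hypothetical, since Springer did not offer such a service. OAI error responses (e.g. `noRecordsMatch`) are not handled here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is exclusive: it may only be combined with the verb
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec      # selective harvesting by set
        if from_date:
            params["from"] = from_date    # incremental harvesting, e.g. "2008-09-01"
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_data):
    """Return the record identifiers of one response page and the next resumptionToken."""
    root = ET.fromstring(xml_data)
    ids = [h.findtext(f"{OAI_NS}identifier") for h in root.iter(f"{OAI_NS}header")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    # An empty or missing resumptionToken marks the last page of the list
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token


def harvest_identifiers(base_url, set_spec=None):
    """Yield all record identifiers, following resumptionTokens page by page."""
    token = None
    while True:
        url = list_records_url(base_url, set_spec=set_spec, resumption_token=token)
        with urlopen(url) as response:
            ids, token = parse_page(response.read())
        yield from ids
        if not token:
            break
```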

In autumn 2008 we will start work on a project on the OAI protocol; this delivery feature is therefore not available at the moment. Delivery via DDS (Data Delivery System) will be made by FTP upload to our FTP site ftp.springer-dds.com, from which you can download the data.


Initial proposal by Springer

In April 2008, Antje provided us with an example of a metadata record as it could be delivered by Springer and asked for further requirements. The record included:

  • article title
  • article subtitle
  • article copyright year
  • journal name
  • corresponding author with sub elements: family name, given name, suffix, division, organization, address, email
  • further authors with sub elements: family name, given name, suffix, division, organization, address, email
  • publication dates: received, revised, accepted
  • abstract
  • keywords
  • footnote information
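For processing, the fields of this proposed record map naturally onto two small data classes. A sketch in Python; the class and attribute names are our own choice, not Springer's element names, and dates are kept as plain strings because this page does not specify a date format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    """Corresponding or further author with the sub elements listed above."""
    family_name: str
    given_name: str
    suffix: Optional[str] = None
    division: Optional[str] = None
    organization: Optional[str] = None
    address: Optional[str] = None
    email: Optional[str] = None


@dataclass
class ProposedRecord:
    """Metadata record as in Springer's initial proposal (April 2008)."""
    article_title: str
    journal_name: str
    corresponding_author: Person
    article_subtitle: Optional[str] = None
    copyright_year: Optional[int] = None
    further_authors: list[Person] = field(default_factory=list)
    received: Optional[str] = None
    revised: Optional[str] = None
    accepted: Optional[str] = None
    abstract: Optional[str] = None
    keywords: list[str] = field(default_factory=list)
    footnote: Optional[str] = None

    def affiliations(self):
        """All organizations given for the corresponding and further authors."""
        people = [self.corresponding_author, *self.further_authors]
        return [p.organization for p in people if p.organization]
```

Collecting all affiliations in one place would make a later comparison with the MPG tag set by the corresponding author straightforward.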

Additional requirements

This initial draft was discussed within the MPDL team, and we additionally requested the following data:

  • DOI*
  • journal ISSN*
  • full-text URL of the PDF*
  • publication date (online)*
  • publication date (in print) - for follow-up deliveries ("Nachlieferungen"), as soon as the article is "issued"
  • volume - for follow-up deliveries, as soon as the article is "issued"
  • issue - for follow-up deliveries, as soon as the article is "issued"
  • pages - for follow-up deliveries, as soon as the article is "issued"
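A delivery could be checked against this list automatically. A minimal Python sketch, assuming records arrive as dictionaries with hypothetical keys, and reading the asterisk as "needed from the first delivery on" (this page does not define the asterisk); the remaining fields are only expected once the article is issued.

```python
# Fields marked with * above; read here as "needed from the first delivery on" (assumption)
REQUIRED_ALWAYS = ("doi", "journal_issn", "fulltext_url", "online_date")
# Fields only expected in follow-up deliveries once the article is issued
REQUIRED_WHEN_ISSUED = ("print_date", "volume", "issue", "pages")


def missing_fields(record, issued):
    """List the requested fields that are absent or empty in a delivered record."""
    required = REQUIRED_ALWAYS + (REQUIRED_WHEN_ISSUED if issued else ())
    return [name for name in required if not record.get(name)]
```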

In addition, we defined two options for providing the data in a structured way:

  • Option a: Springer provides a proprietary metadata format (and we try to map it to eSciDoc XML)
  • Option b: We send the eSciDoc PubMan format to Springer and ask them to adapt their delivery to it
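Option a essentially amounts to a field mapping from Springer's format to ours. A minimal Python sketch; the source element names follow the A++ naming style but are assumptions that need checking against Springer's documentation, and the target keys are placeholders rather than actual eSciDoc element names.

```python
import xml.etree.ElementTree as ET

# Assumed source element names (A++ style) -> placeholder target keys.
# Both sides must be checked against the real A++ and eSciDoc schemas.
FIELD_MAP = {
    "ArticleDOI": "doi",
    "ArticleTitle": "title",
    "JournalTitle": "journal",
    "JournalPrintISSN": "issn",
}


def map_record(xml_text):
    """Copy the text of each mapped source element into a flat dict."""
    root = ET.fromstring(xml_text)
    mapped = {}
    for source, target in FIELD_MAP.items():
        element = root.find(f".//{source}")  # first match anywhere below the root
        if element is not None and element.text:
            mapped[target] = element.text.strip()
    return mapped
```

Keeping the mapping in a single table means that adjusting to the real schemas only touches `FIELD_MAP`, not the parsing logic.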


By selecting the corresponding option in the submission form, the submitting author decides whether an article is published under Springer OC. There may be several reasons for not choosing OC even though the publication is eligible, e.g. reluctance, unawareness or oversight. The current Springer OC contract includes an option to report non-OC articles later in order to correct such errors after publication. However, these publications are not equivalent to original OC articles: they are freely available, but carry a Springer copyright statement ("all rights reserved by Springer"). In addition, they are not part of the data packages delivered to the MPG.