Springer Open Choice Data Transfer

From MPDLMediaWiki
Revision as of 00:19, 8 March 2017 by Inga (talk | contribs) (mdbase web checkout no longer available)


PAGE NO LONGER UPDATED

This page describes various issues regarding the data transfer required in the context of the Springer Open Choice Agreement from 2008-2009.

Workflow Overview

Springer

  • after the acceptance process, the corresponding author indicates whether any creator of the paper is affiliated with MPG
    Note: This information is not checked automatically (e.g. by comparing the authors' affiliations with the selection). Therefore, MPG articles could be missing or external articles could be mapped incorrectly!
  • the publication gets processed; in particular, the metadata record and the PDF file are generated
  • the publication is made available via SpringerLink ("online first")
  • the publication is issued, and the metadata record is thereby completed
  • Springer's data delivery service (dds) runs an export job on a regular basis and makes the information available to MPG ("issued" publications only)

MPG
status: proposed revision from March 2009; see talk page for former proposals

  • download data from Springer's FTP server (by temporary assistant/Inga)
  • process data, i.e.
    • [not implemented]: check for plausibility and correctness
    • [not implemented]: report late registrations (Nachmeldungen) to Springer? (only manually, maybe once a year)
    • compile overview file and manually map publications to institutes (by temporary assistant)
  • upload data to Subversion repository (e.g. https://devtools.mpdl.mpg.de/repos/mdbase) (by temporary assistant/Inga)
    The repository was formerly checked out daily to a web server whose access is restricted to the MPG IP range, but this service was stopped in 2017; see https://devtools.mpdl.mpg.de/projects/vlib/ticket/4018
  • check current eDoc status, i.e. identify how many Springer OC articles are already represented with a full text on eDoc (by Inga)
  • notification of MPG librarians (by Antje), including
    • overview file grouped by institute
    • [not implemented]: instructions on how to proceed in order to attach full text to eDoc record
  • potentially/after a defined time frame: manual upload to eDoc (by temporary assistant/Nicole)
    • attach full text to existing eDoc record
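
The "overview file grouped by institute" step in the workflow above could be produced along the following lines. This is only a minimal sketch: the function names, the tuple layout and the "UNMAPPED" label are assumptions made for illustration, and the institute labels in the sample stand in for the result of the manual mapping.

```python
from collections import defaultdict

# Illustrative input: (DOI, title, institute) after the manual mapping step.
# The institute labels are assumed results of that mapping; None = not yet mapped.
SAMPLE = [
    ("10.1007/s10791-008-9048-x", "Output-sensitive autocompletion search", "MPI Informatik"),
    ("10.1007/s10878-008-9139-z",
     "A Lagrangian relaxation approach for the multiple sequence alignment problem",
     "MPI Informatik"),
    ("10.1007/s10827-008-0088-4",
     "A Neurocomputational Model for Optimal Temporal Processing", "MPIDS"),
    ("10.1007/s10876-008-0183-8",
     "Two New One-Dimensional Chain-Like Compounds Constructed from the "
     "Sandwich-Type Polyoxotungstate Clusters", None),
]


def group_by_institute(rows):
    """Group (doi, title, institute) rows by institute; unmapped rows go to 'UNMAPPED'."""
    groups = defaultdict(list)
    for doi, title, institute in rows:
        groups[institute or "UNMAPPED"].append((doi, title))
    return dict(groups)


def render_overview(groups):
    """Render the grouped records as a plain-text overview, one block per institute."""
    lines = []
    for institute in sorted(groups):
        lines.append(institute)
        for doi, title in sorted(groups[institute]):
            lines.append(f"  {doi}  {title}")
        lines.append("")  # blank line between institutes
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_overview(group_by_institute(SAMPLE)))
```

Keeping unmapped records in their own group makes it visible which publications still need a manual decision before the librarians are notified.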

Current status of Data Transfer: FTP

On 1 September, we received an Excel sheet (manually generated by G. Schaefer?) and a delivery note (automatically generated by Springer's Data Delivery Service).

1. Excel sheet includes the following information:

ArticleDOI | OpenChoice OrgName | ArticleTitle | Accepted | Corresp. Author | Offprint OrgName | Author(s)
10.1365/s10337-008-0603-9 | Germany - Max-Planck-Gesellschaft/-Institut | Establishment of the model CD40 cell membrane chromatography and its chromatographic | 20080229 | Rong Lin | University of xi'an jiaotong | Guangde Yang, Rong Lin, Zhen Hu, Jiye Zhang, Chunjie Han, Langchong He, Weirong Wang
10.1007/s10791-008-9048-x | Germany - Max-Planck-Gesellschaft/-Institut | Output-sensitive autocompletion search | 20080123 | Ingmar Weber | MPI Informatik - Geb. 46.1 | Holger Bast, Christian W. Mortensen, Ingmar Weber
10.1007/s10827-008-0088-4 | Germany - Max-Planck-Gesellschaft/-Institut | A Neurocomputational Model for Optimal Temporal Processing | 20080227 | Joachim Haß | MPIDS | Joachim Haß, Stefan Blaschke, Thomas Rammsayer, J. Michael Herrmann
10.1007/s10876-008-0183-8 | Germany - Max-Planck-Gesellschaft/-Institut | Two New One-Dimensional Chain-Like Compounds Constructed from the Sandwich-Type Polyoxotungstate Clusters | 20080229 | Enbo Wang | | Xiong Gan, Zhiming Zhang, Shuang Yao, Weilin Chen, Enbo Wang, Hong Zhang
10.1007/s10878-008-9139-z | Germany - Max-Planck-Gesellschaft/-Institut | A Lagrangian relaxation approach for the multiple sequence alignment problem | | Stefan Canzar | MPI fuer Informatik | Ernst Althaus, Stefan Canzar


2. delivery note includes the log of the data upload process executed for MPG:

02.09.2008 11:21:06 I dds_send: Handling customer Max_Planck_Gesellschaft_OpenAccess (ID 3693)
02.09.2008 11:21:06 I dds_send: Determining new/resend units
02.09.2008 11:26:05 I dds_send: ... 2355 new/resend unit(s)
[...]
02.09.2008 11:26:06 I dds_send:   Fetching units:
02.09.2008 11:26:06 I dds_send:   ... units fetched
02.09.2008 11:26:06 D dds_send:   Using cached BLOB for UnitID 31392944
[...]
02.09.2008 11:27:19 I dds_zip_struct_conv: Starting new ZIP archive: ftp_PUB_08-09-02_11-27-18.zip
02.09.2008 11:27:19 D dds_zip_struct_conv:   add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114/JournalOnlineFirst/ART=2008_406
02.09.2008 11:27:19 D dds_zip_struct_conv:   add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114/JournalOnlineFirst
02.09.2008 11:27:19 D dds_zip_struct_conv:   add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114
02.09.2008 11:27:19 D dds_zip_struct_conv:   add to ZIP: ./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00122/JournalOnlineFirst/ART=2008_847
[...]
02.09.2008 11:29:04 D dds_send: Storing 0 failed units in ResendUnits
[...]
02.09.2008 11:29:05 I dds_send: FTP upload to ftp.springer-dds.com started
02.09.2008 11:29:05 I dds_send: Uploading local directory /u01/DDS/blobcache/dds_ftp_todo,customer=3693 to ftpmpg:******@ftp.springer-dds.com/data/in
02.09.2008 11:29:05 D dds_send: Transfering /u01/DDS/blobcache/dds_ftp_todo,customer=3693/ftp_PUB_08-09-02_11-27-18.zip => ./ftp_PUB_08-09-02_11-27-18.zip
02.09.2008 11:29:14 I dds_send: FTP upload to ftp.springer-dds.com completed

3. download via FTP: afterwards, the zip file can be downloaded from ftp.springer-dds.com (login details are stored in the internal wiki)

4. zip file provides a complex tree structure which contains one metadata record (A++ headers format) and one full text file (searchable pdf) for each item. The structure contains information on the corresponding publisher, journal and the publication status. Example "issued":

PUB=Springer-Verlag-Berlin-Heidelberg\JOU=00109\VOL=2008.86\ISU=9\...

Example "Online First"

PUB=Springer-Verlag-Berlin-Heidelberg\JOU=00114\JournalOnlineFirst\...
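
Steps 3 and 4 above could be scripted roughly as follows. This is a sketch under assumptions: all function names are hypothetical, the remote directory visible to the MPG login is not documented here and is therefore a parameter, and the rule "a VOL and ISU segment means issued, a JournalOnlineFirst segment means online first" is inferred only from the two example paths. The download function needs real credentials and is not exercised here.

```python
import re
import zipfile
from ftplib import FTP


def download_zip(host, user, password, remote_name, local_path, remote_dir=None):
    """Fetch one delivery archive via FTP and store it at local_path."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        if remote_dir:  # assumption: the directory layout on the server is unknown
            ftp.cwd(remote_dir)
        with open(local_path, "wb") as fh:
            ftp.retrbinary(f"RETR {remote_name}", fh.write)


def classify_path(path):
    """Split a delivery path into KEY=value segments and derive the publication status."""
    # Accept both "\" (as in the examples) and "/" (as in the delivery note log).
    parts = [p for p in re.split(r"[\\/]", path) if p not in ("", ".")]
    info = {}
    for part in parts:
        if "=" in part:
            key, value = part.split("=", 1)
            info[key] = value
    if "JournalOnlineFirst" in parts:
        info["status"] = "online first"
    elif "VOL" in info and "ISU" in info:
        info["status"] = "issued"
    else:
        info["status"] = "unknown"
    return info


def summarize_archive(zip_file):
    """Count the files (not directory entries) in an archive per publication status."""
    counts = {}
    with zipfile.ZipFile(zip_file) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            status = classify_path(member.filename)["status"]
            counts[status] = counts.get(status, 0) + 1
    return counts


if __name__ == "__main__":
    print(classify_path(r"PUB=Springer-Verlag-Berlin-Heidelberg\JOU=00109\VOL=2008.86\ISU=9"))
    print(classify_path("./PUB=Springer-Verlag-Berlin-Heidelberg/JOU=00114/JournalOnlineFirst/ART=2008_406"))
```

A per-status count of this kind gives a quick indication of how many "online first" items a delivery contains before any import is attempted.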

Additional information and Discussion

  • Publication status: "You will receive the data either in Online First status (this will make up the largest share of future incremental deliveries) or as part of an issue. The corresponding information is contained in the A++ data and is reflected in the delivery structure."
Do we really need the "online first" articles or should we try to receive "issued" versions only? In the latter case, Springer could deliver complete metadata (incl. issue, pages) and we could skip any updating/merging procedures for Springer imports --Inga 14:41, 11 September 2008 (UTC)
  • Documentation of the A++ metadata format and the DTD are available on Springer's Bulk User site: http://production-customer.springer.com (login stored in internal wiki)
  • One ZIP file was downloaded and stored on internal web space

Data Transfer: FTP Download versus OAI-PMH

Mail to Springer:
Springer normally offers an FTP download option for data exchange processes. Since this particular case involves a relatively small amount of data, which we would like to download irregularly but repeatedly, the offer of an OAI Service Provider would suit us very well. In the repository context, the MPDL already has experience with implementing the OAI-PMH protocol, which we would like to put to use. Mr van der Stelt gave us "first hopes" in this regard; could you perhaps make these more concrete? A reminder about this task was sent by email to Ms Schäfer on 9 October 2008. No reply so far.

Perhaps a concrete usage scenario is helpful? Within the eSciDoc project, for example, a DataAcquisition service is currently being developed, see ESciDoc_Services_DataAcquisitionHandler. It is intended to offer the following functionality:

  • Download of the metadata record via OAI-PMH
  • Automatic creation of a new object in our repository
  • Generation of a URL for the full-text download
  • Download of the full text and storage with the object

arXiv is envisaged as the first data source, but in the medium term this is to become a generic service that can process records from different providers. To me, this sounds like a "perfect match".

To strengthen the visibility of SpringerLink's Open Choice programme as a whole, I would even go one step further and suggest a Springer Open Choice repository (see e.g. the PMC Open Archives Service). On this repository, corresponding "sets" could then be set up for the institutions participating in the Open Choice programme, in order to also allow selective download.

Reply:
In autumn 2008 we will start to work on a project on OAI protocol so that this delivery feature is not available at the moment. Delivery via DDS (Data Delivery System) will be made via ftp uploads to our ftp site ftp.springer-dds.com for you to download the data.
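
For illustration, the selective harvesting by "sets" suggested in the mail could look roughly like this from the harvester's side. Springer offered no OAI-PMH endpoint at the time, so the base URL and set name are purely hypothetical, as are the function names; only the request parameters (verb, metadataPrefix, set, from) and the response namespace come from the OAI-PMH 2.0 protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build a ListRecords request URL, optionally restricted to a set and a start date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=OAI_NS)
    return identifiers, (token or None)  # an empty token element means "no more pages"


if __name__ == "__main__":
    # Hypothetical endpoint and set name, for illustration only.
    print(list_records_url("https://oai.example.org/oai", set_spec="mpg", from_date="2008-09-01"))
```

Because the `from` parameter restricts a request to records changed since a given date, a harvester can fetch irregularly but repeatedly without re-downloading everything, which is the use case described in the mail.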

Metadata

Initial proposal by Springer

In April 2008, Antje provided us with an example of a metadata record as it could be delivered by Springer and asked for further requirements. The record included:

  • article title
  • article sub title
  • article copyright year
  • journal name
  • corresponding author with sub elements: family name, given name, suffix, division, organization, address, email
  • further authors with sub elements: family name, given name, suffix, division, organization, address, email
  • publication dates: received, revised, accepted
  • abstract
  • keywords
  • footnote information

Additional requirements

This initial draft was discussed among the MPDL team, and we additionally requested the following data:

  • doi*
  • journal issn*
  • full text url of pdf*
  • publication date (online)*
  • publication date (in print) - for subsequent deliveries (Nachlieferungen), as soon as the article is "issued"
  • volume - for subsequent deliveries (Nachlieferungen), as soon as the article is "issued"
  • issue - for subsequent deliveries (Nachlieferungen), as soon as the article is "issued"
  • pages - for subsequent deliveries (Nachlieferungen), as soon as the article is "issued"

In addition, we defined two options for providing the data in a structured way:

  • Option a: Springer provides a proprietary metadata format (and we try to map it to eSciDoc xml)
  • Option b: We send the eSciDoc pubman to Springer and ask them to adapt it
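
Whichever option is chosen, the additionally requested fields listed above could be modelled on the receiving side roughly as follows. This is a sketch, not an agreed format: the class and field names are hypothetical, and it assumes that the starred fields are present from the first delivery while the remaining four only arrive once the article is "issued".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArticleRecord:
    # Starred fields in the list above (assumed to be present in every delivery).
    doi: str
    journal_issn: str
    fulltext_url: str
    online_date: str
    # Fields expected only with a subsequent delivery, once the article is "issued".
    print_date: Optional[str] = None
    volume: Optional[str] = None
    issue: Optional[str] = None
    pages: Optional[str] = None

    def missing_issue_fields(self) -> List[str]:
        """Names of the issue-level fields that have not been delivered yet."""
        wanted = ("print_date", "volume", "issue", "pages")
        return [name for name in wanted if getattr(self, name) is None]

    def is_issued(self) -> bool:
        """True once all issue-level fields are filled in."""
        return not self.missing_issue_fields()


if __name__ == "__main__":
    # ISSN, URL and date are placeholders, not real values.
    record = ArticleRecord(
        doi="10.1007/s10791-008-9048-x",
        journal_issn="0000-0000",
        fulltext_url="https://example.org/fulltext.pdf",
        online_date="2008-01-01",
    )
    print(record.is_issued(), record.missing_issue_fields())
```

A check like `missing_issue_fields` makes it explicit which records still need an update or merge after an "online first" import.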


Nachmeldungen (late registrations)

By selecting the corresponding option on the submission mask, the submitting author decides whether an article is published under Springer OC or not. There may be several reasons for not choosing OC even though the publication is entitled to it, e.g. reluctance, ignorance or oversight. The current Springer OC contract includes an option to report non-OC articles later in order to correct such errors after publication. However, these publications are not equivalent to original OC articles: they are freely available, but carry a Springer copyright statement "all rights reserved by Springer". In addition, they are not part of the data packages delivered to the MPG.