Trip Report: ElPub 2013

MPDL Event: ELPub 2013 in Karlskrona, Sweden. MPDL participant: Andrea Wuchner
 * Program
 * Conference Page
 * Proceedings

=Summaries=

Keynote 1: S. Shakespeare - Getting value out of our digital trace: a strategy for unleashing the economic and social potential of data sharing
Stephan Shakespeare carried out an independent review, commissioned by the British government, of the data generated in the public sector. The British government aims to make this data available to the public in order to achieve the greatest possible value creation, and to this end it is working on a national strategy for data.

The review was based on a market study carried out by Deloitte and on two surveys run via the "YouGov" platform. There, the potential value creation is estimated at around 8 billion euros per year.

To achieve the best possible value creation, the author proposes several measures:

A clear, transparent and sustainable national data strategy should be developed and implemented in the public institutions. It should ensure that, in the short term, data is published as quickly as possible and, in the medium term, in high quality. Citizens' privacy must be protected by an appropriate policy. New skills and competencies in handling public data must be acquired. Reuse of the data must be possible for the public sector, for business and for citizens in a way that offers each group the greatest possible benefit. Data from public institutions should also be drawn on more strongly in the political process as a basis for decisions.

In June 2013 the British Prime Minister will respond to these recommendations.

Panel discussion
As part of the panel discussion, the following participants gave short talks:


 * Stephan Shakespeare: presented the YouGov platform, where people can register and take part in various surveys. The data is used for commercial market research.
 * David Rosenthal: presented the problem of the high, long-term costs of ingest, storage and dissemination of growing volumes of data.
 * Hans Jørgen Marker: presented the Swedish National Data Service (SND), which provides access to research data from the humanities, medicine and the social sciences. Various libraries, archives, universities and funding institutions are involved; the responsibilities still have to be defined. The SND contributes to Europeana and DataCite.
 * Felix Wu: data generated on Twitter or Facebook is interesting as a research subject. It reflects worldwide social relationships, opinions and discussions, which in turn influence our society.

Unfortunately, time was already up after these short talks, so no discussion took place.

P. Olsbo - Does openness and Open Access policy relate to the success of universities?
This contribution examines whether there is a connection between freely accessible research results, Open Access policies and the success of universities. To this end, the "Ranking Web of Universities" and the "Ranking Web of Repositories" were analysed. The university ranking is determined by an indicator composed of several factors, including the operation of repositories, the number of web pages, the number of external links to the university's websites and the number of published full texts. Among the top 20 universities, Harvard, MIT, the University of Michigan and the University of São Paulo also have top-ranked repositories. A comparison of Switzerland, the Netherlands, Denmark, Sweden, Norway, Ireland, Finland and Austria shows that Switzerland, the Netherlands and Denmark are the most successful. In Norway and Sweden there is a connection between the great success of the universities, the high ranking in "openness" and the placement in the RWR. The placement of universities in Finland and Denmark improved after Open Access policies were adopted.

Unfortunately, it could not be clarified exactly how a university's placement and its repository's placement in the RWU/RWR are related. Do successful universities place more value on a high-quality repository, perhaps because more funds are available for it? Or does a high-quality repository contribute to a university's higher placement? The composition of the indicator suggests the latter, as does the growing success of universities in countries with an Open Access policy.

N. Jahn - plosOpenR: Exploring FP7-funded PLOS publications
New technologies and services open up new possibilities for measuring the performance of scientific publications. This study examines various statistical measures (PLOS Article-Level Metrics, PLOS ALM) for FP7-funded publications using the software plosOpenR. These include views of the publications, numbers of citations (PubMed Central, CrossRef, Scopus) and activity in social media services such as Twitter, Facebook, Mendeley and CiteULike.

The analysis was carried out in two steps: using the search API (via plosOpenR), PLOS was searched for FP7-funded articles; then the usage data was retrieved via the PLOS ALM API and visualised.

1,166 publications were found, funded by 624 different FP7 projects. About 10% of the articles were funded by more than one project. Most publications came from the USA, the UK or Germany. The majority of page views and social web activity takes place in the days and months after publication, whereas citations accumulate only slowly; the publications should therefore be evaluated again at a later date. In summary, plosOpenR makes it possible to analyse FP7-funded publications. Future studies on PLOS Article-Level Metrics will address the main problem areas. The results underline the importance of openly accessible research services.

E. Tonkin/S. Taylor - Coversheets considered harmful?
Unter "Coversheets" werden Seiten verstanden, die zusätzlich in Volltexte (meist PDFs) integriert werden und einige zusammenfassende Metadaten enthalten. Sie können zusammenfassende Informationen zur Publikation, Copyright Informationen und "Wasserzeichen" eines Repositories enthalten. Der Beitrag diskutiert Vor- und Nachteile der Verwendung. Es wurde eine Umfrage über britische Mailinglisten durchgeführt, die von 88 Repository Managern beantwortet wurde. In GB werden häufig Coversheets genutzt, meistens werden sie automatisch erstellt. Das wichtigste Argument ist dabei die Erhaltung von Information. Die Autorinnen empfehlen, Alternativen für die Erhaltung der Informationen in Erwägung zu ziehen, z.B. Metadatenfelder innerhalb der Volltext-Formate und den Erstellungsprozess von Coversheets zu überprüfen. Der originale Volltext, ohne Coversheet, sollte gespeichert werden.

Nuno Freire - Facilitating Access and Reuse of Research Materials: the Case of the European Library
This talk presented the "European Library" (EL), the aggregator of digital content for Europeana. The EL portal provides access to the national bibliographies and to scholarly collections of the participating countries. Other important tools are the search APIs and the data available as Linked Open Data. For the institutions supplying the data this means greater visibility and more traffic; the EL acts as a data hub and as a central authority for library data. The EL is also a central entry point for ARROW, a tool for obtaining rights information about digitised works.

In the future, a research platform is to be developed in cooperation with DARIAH and CESSDA.

M. Malta & A. Baptista: A method for the development of Dublin Core Application Profiles
This contribution presents the method "Me4DCAP", which describes a standardised process for creating a Dublin Core Application Profile. The existing guidance in the DCMI documents is not comprehensive enough.

The method is based on the "Design Science Research" approach after A. Hevner.

The team creating the application profile should consist of managers, system analysts, metadata programmers and end users. The method defines when which activities have to be carried out, how they interact and which artifacts they should produce. It also suggests techniques that can be used to develop these artifacts. In total, 12 artifacts have to be developed, in an iterative process in which several stages are run through repeatedly. A detailed description of the method can be found here.

The method is currently undergoing an extensive review in order to adapt it even better to the needs of the metadata community. It is also to be applied in further projects. The results will feed into the next version of Me4DCAP.

H. Roued-Cunliffe - Opening up Digital Publications - suggestions for making Humanities data available for research
As part of her master's thesis, the author examined the possibilities of building a heritage web portal that would allow cross-border searching of heritage datasets in Europe. She found that most existing portals do not offer meaningful access for the general public and for external researchers. She recommends categorising users into three groups: general public, external researchers and internal researchers. The general public should be able to run searches whose output is delivered in several formats, including CSV. External researchers should be able to register, access the data via dynamic web services and, where possible, define new search criteria. Part of the thesis was the development of the REST service "APELLO", which enables dynamic queries of datasets in TEI format. This REST service is currently used for the Vindolanda Tablets Online 2 website and can be reused. For larger collections, security and server capacity need to be considered.

=Interesting people I talked to=


 * Aina Svensson, Uppsala University Library: promoted COAR, collaborates in the DiVA Consortium
 * Lina Andren, Mälardalen University
 * Ina Nordenström, Umeå University Library (Medical Library), collaborates in the DiVA Consortium
 * Jörgen Eriksson, Lund University Library, Head of the Department of Scientific Communication, works on PubLister

=DAY 1=

Keynote 1: Stephan Shakespeare, UK Government's Data Strategy Board - Getting value out of our digital trace: a strategy for unleashing the economic and social potential of data sharing
Stephan Shakespeare published the Shakespeare Review. He reviewed the government's data strategy and made several recommendations.

Introduction

 * Digital trace: any data being produced and stored digitally, things are unwillingly published and stored forever
 * Entering a new strange world:
 * Data sharing brings more benefits than risks
 * There is much potential in our digital trace
 * The way we think, behave and decide will change
 * Data makes our lives longer, healthier and much more productive
 * Government has the most data
 * Role of government in the digital revolution: new world begins in the hands of the government
 * Difference between commitment and real strategy
 * PSI: public sector information
 * Data strategy: highly visible, policy development, predictable
 * Data has to be made useful, publishing is not enough
 * Data people publish has to be made as useful as possible and must not be abused

The review

 * Target audience: British Government
 * Scope
 * Current and future PSI market
 * Potential benefits of using and reusing PSI
 * Evaluates current and anticipated future needs of government to establish a data strategy

Evidence of the review

 * Review based on data of a market assessment report on PSI by Deloitte and two surveys done by YouGov
 * Results
 * Value of PSI: around £6.8bn a year (£1.8bn direct economic benefits and £5bn a year of wider social value)
 * People's opinion of PSI: the public supports opening up certain data, if anonymisation is guaranteed and misuse is penalised. To make the most of PSI, investment in skills, partnership with business and a focus on quality are required. There is support for arrangements where companies share more of their data with the public.
 * Key challenges: accessibility

Recommendations

 * Government should define a clear, predictable, accountable "National Data Strategy"
 * National Data strategy should include a twin-track policy for data release: publish early even if imperfect AND a commitment to a "high quality core"
 * Clear leadership for driving the implementation of the National Data Strategy within the public sector.
 * The trading fund model for organisations like the Met Office and other public data providers should be adapted in a way that maximum value can be delivered from their data.
 * Development of a pragmatic policy on privacy and confidentiality that increases protection for citizens
 * Investment in building skills to handle PSI
 * Look at new ways to gather evidence of the economic and social value of PSI, creating a "data intelligence and innovation group"
 * PSI should be used systematically and transparently within the political process as evidence for decisions.
 * Develop a model of mixed economy, so that everyone can benefit from some forms of sharing between the public and commercial sectors.

Further activities

 * Tomorrow Prime Minister Cameron will respond to the recommendations

Panel discussion: Setting research data free - problems and solutions
Participants:
 * Stephan Shakespeare: UK government's Data Strategy Board
 * Felix Wu: Professor, Computer Science Department, UC Davis, USA
 * David Rosenthal: LOCKSS program, Stanford University
 * Hans Jørgen Marker: Director, Swedish National Data Service

David Rosenthal
 * Problem: Sustainable Economics
 * The Big Data Fetish: Save it all!
 * Value of data not kept = 0
 * Value of data kept < 0, but small
 * Economics of Preservation:
 * Kryder's Law
 * Data grows 60% a year (IDC); IT budgets grow 2% a year (computereconomics.com): if storage now costs 5% of the budget, in 10 years it will need more than 100% (see the sketch after this list)
 * Ingest has to be paid for
 * Dissemination must be paid for
 * Storage must be paid for
 * On-going costs can't be paid from a grant
 * Selective data sharing is a very bad idea, so either all data must be shared or none
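
A rough back-of-the-envelope illustration of this squeeze (my own sketch, not from the talk), assuming for simplicity that the cost per stored byte stays constant, so the real trajectory depends on how far Kryder's Law keeps driving media prices down:

```python
# Sketch: how the storage share of an IT budget evolves if data volume grows
# 60%/year while the budget grows 2%/year. Constant cost per stored byte is an
# assumption made purely for illustration.
share = 0.05           # storage is 5% of the IT budget today
data_growth = 1.60     # IDC: data volume +60% per year
budget_growth = 1.02   # computereconomics.com: IT budgets +2% per year

for year in range(1, 11):
    share *= data_growth / budget_growth
    print(f"year {year:2d}: storage would need {share:.0%} of the budget")
# After 10 years the hypothetical share is several times the whole budget,
# which is why falling media prices have to absorb most of the growth.
```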

Hans Jørgen Marker: Keeping research data accessible
 * Swedish National Data Service: preserve and grant access to research data from humanities, social science and medicine, University of Göteborg pays, Council decides
 * Services
 * Provides an overview of data available
 * Provides access
 * Provides an opportunity for international comparison
 * Provides longterm preservation
 * National cooperation: universities, archives, libraries, research funders, data producers; the responsibility between these actors is still under discussion
 * Involved in DataCite, europeana, etc.
 * Role for librarians: responsible for providing the connection between the research data and the public

Felix Wu: Sharing Open Data
 * Social Informatics
 * Data of Twitter and Facebook are interesting for research
 * Data
 * Content
 * Social Relationships around the content
 * Discussion Interactions over the content
 * Social intelligence being derived, and its potential impact on our society
 * Open Social Data: content attracts social interest and possibly changes social structures; comments being produced become part of the content (for a limited time window)
 * Do we know who read what?
 * Social Informatics API between the research community and the data-producing community

Stephan Shakespeare: YouGov
 * New business model: use shared opinions for survey
 * New platform: 2 million members, who can express themselves in a structured way

P. Olsbo - Does openness and Open Access policy relate to the success of universities?

Introduction
The examination of the report "The state of scientific research in Finland 2012" by the Finnish Academy and of the university rankings seems to show that there could be a connection between the web visibility, ranking and relative citation impact of universities in different countries.

Ranking Web of Universities (RWU)

 * Focus on academic web presence of the universities
 * RWU has analysed over 21,000 universities; the current ranking covers 12,000 universities.
 * Based on a composite indicator built from four sub-indicators (see the sketch after this list):
 * Presence (16.7%): defined as the total number of webpages hosted in the main web domain of the university (as indexed by Google)
 * Impact (50%): evaluated through a "virtual referendum" counting all external links that the university web domain receives from third parties
 * Openness (16.7%): the effort to set up institutional repositories is explicitly recognized, taking into account the number of rich files published in dedicated websites (according to Google Scholar)
 * Excellence (16.7%): analyzes academic papers published in high-impact international journals. Only publications among the 10% most cited papers in their respective scientific fields are taken into account.
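
A minimal sketch of how such a composite indicator can be combined (my own illustration, assuming a simple weighted sum of rank positions; the actual Webometrics aggregation and normalisation may differ):

```python
# Sketch: weighted combination of the four Webometrics components.
# A plain weighted sum of rank positions is an assumption for illustration;
# a lower combined score would mean better overall placement.
WEIGHTS = {"presence": 0.167, "impact": 0.50, "openness": 0.167, "excellence": 0.167}

def composite_score(ranks: dict) -> float:
    """Weighted sum of per-component rank positions (lower is better)."""
    return sum(WEIGHTS[key] * ranks[key] for key in WEIGHTS)

# Example with the University of Jyväskylä ranks listed below (January 2013)
jyvaskyla = {"presence": 107, "impact": 609, "openness": 307, "excellence": 550}
print(round(composite_score(jyvaskyla)))  # universities are then ordered by such a score
```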

Ranking of the University of Jyväskylä

 * January 2013
 * Presence rank 107 (July 2012: 117)
 * Impact rank 609 (411)
 * Openness rank 307 (562)
 * Excellence rank 550 (549)
 * Total rank 357 (299)
 * Interpretation: the highest rank is the presence rank. The university must have done something right with its domain policy (one-domain-name policy).
 * Role of the institutional repository JyX: a site-restricted search in Google shows that over 13% of all search results pointing to the site jyu.fi originate from the JyX archive. In Google Scholar the effect is more significant: over 87% of the search results come from the JyX archive. The repository seems to play an important role in the university's presence rank. The openness rank also seems to develop hand in hand with the repository: although the number of evaluated universities has risen, the University of Jyväskylä's openness rank has improved from 562 to 307, while the rank of the JyX archive has moved from 87 to 80.

Ranking Web of Repositories (RWR)

 * January 2013
 * Done by the Spanish research institution "Consejo Superior de Investigaciones Científicas" (CSIC)
 * 1600 repositories all over the world are ranked
 * Next edition due in August 2013

Connection between RWU and RWR

 * The number one ranked university is Harvard. Harvard also hosts the number one institutional repository, the Smithsonian/NASA Astrophysics Data System
 * Among the other top-5 universities, MIT and the University of Michigan also have highly respected repositories
 * The University of São Paulo is the only university outside the United States/UK ranked within the top 20 (place 19). Its repository is ranked at place 8.

Report "The state of scientific research in Finland 2012"

 * Published in October 2012 by the Finnish Academy
 * Analyzing the relative citation impact of Finnish research articles: publication numbers in Finnish science and research and citations to Finnish articles are at a good level.

Comparison of 8 Countries

 * Countries: Switzerland, Netherlands, Denmark, Sweden, Norway, Ireland, Finland and Austria
 * Results:
 * Indices for these countries have remained more or less unchanged: changes at country level happen very slowly.
 * Switzerland, the Netherlands and Denmark show good values.
 * In the Netherlands and Sweden there is a deep connection between the success of universities, the openness ranking and the repositories.
 * Finland, Denmark and Norway have improved their placing in openness substantially compared to Switzerland and even to the RWU as a whole. These same countries also seem to be on their way up if we look at the development of relative citation impact in recent years.
 * The positions of Switzerland and Austria have weakened in both openness and relative citation impact.
 * Interpretation: One explaining factor could be the Open Access policy of these countries and universities.
 * Impact of Open Access policy:
 * In Finland the dramatic improvement in openness ranking of the University of Helsinki (71 to 5) is at least partly due to their mandate for self-archiving and their development work in repositories.
 * Universities in Jyväskylä and Tampere have strong recommendations for self-archiving and are doing active work in promoting open access.
 * The Danish Open Access Committee published its recommendations for a national Open Access policy in 2011. The number of Danish universities in the openness top 300 has since risen from 2 to 4 in six months. Whether this is due to the country's Open Access policy, coincidence or unreliability of the RWU methodology is hard to say yet.

N. Jahn - plosOpenR: Exploring FP7-funded PLOS publications

Introduction and Motivation

 * New Opportunities for measuring performance of research publications arise
 * Traditional citation analysis is complemented by usage and social media data
 * Public Library of Science (PLOS)
 * Open Access Publisher
 * Displays several indicators including citations, information on usage and social media activity to every article
 * In this study the potential of PLOS ALM (PLOS Article-Level Metrics) is examined by applying them to FP7 grant-supported research publications. For this, a set of tools for the statistical computing environment R is used: plosOpenR.

Background and Data

 * PLOS offers two openly available APIs that plosOpenR uses (a sketch of the retrieval step follows after this list):
 * PLOS Search API: offers access to the full texts of all published PLOS articles. Search fields correspond to article sections.
 * PLOS ALM API: used to retrieve metrics on PLOS articles.
 * Search fields used for retrieval: id (DOI), financial_disclosure (funding acknowledgement), affiliate (affiliation of the authors, free text).
 * Analysed PLOS ALM sources: usage: PLOS, PubMed Central; citations: PubMed Central, CrossRef, Scopus; social media events: Twitter, Facebook, Mendeley, CiteULike, PLOS Comments
 * Exploring the research publications in three steps:
 * Retrieve a set of articles through the Search API
 * Collect the metrics for these articles
 * Visualize the metrics
 * PlosOpenR reuses existing tools to query and analyse PLOS research output that belong to the rplos package.
 * rplos is developed by rOpenSci, a collaborative effort to provide R-based applications for facilitating Open Science.
 * After querying the PLOS APIs, plosOpenR transforms the retrieved JSON and XML outputs into data frame structures to allow easier statistical analysis within R.
 * plosOpenR demonstrates different visualisation techniques, documented on the PLOS API webpage.
 * Visualisation techniques used:
 * Alternative scatterplots to explore ALM distributions
 * Network visualisations to examine collaboration pattern
 * Choropleth maps displaying the authors' country affiliations (the Thematic Mapping API was used)
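
As a rough illustration of the retrieval step (my own sketch, not the plosOpenR/rplos code; the endpoint behaviour, query term and field handling are assumptions based on the notes above), a direct query against the PLOS Search API could look like this:

```python
# Sketch only: query the PLOS Search API (a Solr endpoint) for articles whose
# funding statement mentions FP7, mirroring the first step described above.
# Endpoint and parameter details are assumptions; plosOpenR wraps this in R.
import requests

SEARCH_URL = "http://api.plos.org/search"  # may require an API key today

params = {
    "q": 'financial_disclosure:"Seventh Framework Programme"',  # example query term
    "fl": "id,financial_disclosure,affiliate",  # fields named in the talk
    "wt": "json",
    "rows": 100,
}
response = requests.get(SEARCH_URL, params=params, timeout=30)
response.raise_for_status()

docs = response.json()["response"]["docs"]
print(len(docs), "candidate articles retrieved")
for doc in docs[:5]:
    print(doc["id"])  # the DOI of each article
```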

Results
FP7 Contribution in the PLOS Domain
 * Query was done on 19 July 2012
 * 2562 candidate publications
 * 1166 PLOS articles referenced at least one FP7 research project
 * Moderate growth of FP7 acknowledgement in most PLOS journals since 2008, strong growth in PLOS ONE.
 * PLOS ONE contained 77.78% of the FP7-acknowledged publications.
 * Compound annual growth rate for PLOS ONE: 215.35% from 2009 to 2011
 * 624 FP7 projects were acknowledged (17,736 projects were listed in 2012)
 * FP7 grant acknowledgements were unequally distributed over funding programmes, which is consistent with PLOS's focus on biomedical research and related fields: 27.96% of the projects funded within the Health Research programme published at least once in PLOS; eight funding programmes had no references.
 * Proportion of SC39 funded research was higher than the FP7 funding scheme

Article-Level Metrics
 * Every retrieved publication delivered usage data from both the PLOS journal website and PubMed Central.
 * Citations: between 43% (PubMed Central) and 63% (CrossRef) of the articles were cited.
 * Social media services mentioned between 8% (comments on PLOS articles) and 81% (Mendeley readerships) of the publications. (Twitter mentions within PLOS ALM started on June 1st, 2012)

Collaboration Patterns
 * 9.52% of the PLOS publications acknowledged more than one FP7 project.
 * 26.28% of the 624 FP7 projects were acknowledged together with another FP7 project.
 * 6090 author addresses were obtained.
 * Affiliations originated most frequently from the United Kingdom (12.25%), Germany (11.77%) and the USA (11.69%).
 * More than one in three publications originated from Western Europe

Discussion & Conclusion

 * The software package plosOpenR allows the exploration of grant-supported research publications in PLOS journals.
 * The aggregation of alternative science metrics by the PLOS ALM API was demonstrated.
 * Usage data on a daily basis could be retrieved from the PLOS journal website and the disciplinary repository PubMed Central.
 * In the light of quantitative science studies on performance and impact of research publications, this study is limited in various ways.
 * The findings only partially cover FP7-funded projects, due to the disciplinary scope of PLOS.
 * Particular care needs to be taken if future studies rank research articles according to the different metrics in use and develop comparative indicators that rely on these data.
 * Public media attention has effects on analysing and interpreting research publications.
 * The majority of usage data and social web activity happens in the days and months after publication; citation data accumulate much more slowly.
 * The set of data should be reanalysed at least two years after the last paper in the set has been published.
 * With its data sources and visualization methods, plosOpenR provides tools for easy on-time exploration of PLOS ALM in order to identify irregular patterns and motivate qualitative investigation.
 * Future work and studies on PLOS ALM will focus on main problem areas.
 * With the evolving European federation of usage-data providers, OpenAIRE has the potential to provide additional information about usage events and might complement PLOS ALM as PMC already does.
 * Results highlight the importance of openly available research services.

E. Tonkin / S. Taylor - Coversheets considered harmful?

Definition of Coversheets

 * Additional pages added to a resource (full text) in a digital repository
 * Often prepended to the first page rather than appended as an appendix to the document.
 * Examples:
 * JSTOR: provides work behind a cover sheet which provides a standard view of various information about the document, including citation information.
 * Arizona’s DLIST: does not embed a full-page cover sheet, although some citation or versioning information may be provided within some documents.
 * QUT repository: routinely places a cover sheet at the beginning of PDF content.

Use of Coversheets

 * Content submission cover sheets: identifying the submitter and some basic information about the document. These are a form of process management aid, and inherit from the tradition of the printout cover sheet, which was widely used in many institutions to identify documents when printed.
 * Presentational cover sheets: are used to ensure a standard layout of documents.
 * Cover sheets as an aid for the researcher: contextual information is kept within a document even if it is printed or stored locally.
 * Cover sheets as a means of author identification
 * Cover sheets for data papers: provision of essential indexing information alongside "a package of narrative and references that captures the entire data product" (for example links to the full dataset).
 * Cover sheets containing copyright information
 * Cover sheets as a visual reference for copyright and permission data
 * Cover sheets as a branding exercise

Good reasons for using coversheets

 * Uniformity
 * Information on Copyright, Versioning, Author, title etc…
 * Linking to other repository information, like policy

Good reasons for not using Cover sheets

 * Time and Resources
 * Perception of interference: content is pushed to the second page, unwanted branding, problems with page numbering, preservation issues

Survey on Coversheets
Method
 * Quantitative survey method
 * Short survey
 * Qualitative discussion approach (interview): small subset of repository managers, in order to enable the authors to identify the key issues and concepts to cover during the survey itself, which was initially piloted with a further subset of subjects.
 * Three page questionnaire: Questions 1 and 2 (page 1) explore the repository system used and the use, if any, of cover sheets.
 * Survey was circulated to three mailing lists with a UK focus and Twitter.
 * Open for 4 days

Results

 * 88 respondents
 * Usage of coversheets:
 * 57% of respondents said that their repository included cover sheets on documents.
 * Additionally 11% of respondents stated that they use them for a subset of documents, made use of unconventional cover sheets (placing them at the back of the document) or were considering making use of cover sheets in future.
 * Two thirds of respondents either use, intend to use or instruct repository users to add cover sheets.
 * Metadata included in coversheets.
 * Title and author, copyright, citation and institution information.
 * Followed by persistent IDs, original venue of publication and the status of the publication.
 * Cover sheets are mainly used on full-text documents and rarely used on metadata records.
 * Creation process of coversheets:
 * 49% of repository managers use an automated process for coversheet creation; 8% of these report using a batch-processing approach based on available metadata (see the sketch after this list).
 * Motivation for Coversheets.
 * addition of institutional/departmental branding and document provenance information.
 * followed by citation information and version information.
 * Around a third of this group mentioned copyright concerns in a free-text field of the survey, saying that providing a statement set by the publisher is a requirement for allowing self-archiving
 * Reasons for not using cover sheets: lack of demand from repository users (30%), the question had not arisen, a lack of support from the repository platform, the scalability of the process and the resources involved (both technical and administrative), low priority, and identification of alternative strategies
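
As an illustration of the automated, batch-style coversheet creation mentioned above (my own sketch; the survey does not describe any particular tooling, and pypdf plus a pre-rendered coversheet page are assumptions), prepending a coversheet to every PDF in a folder could look like this:

```python
# Sketch: batch-prepend a coversheet page to every PDF in a directory.
# pypdf and the static coversheet template are illustrative assumptions;
# rendering the coversheet from repository metadata would need an extra
# PDF-generation step.
from pathlib import Path
from pypdf import PdfReader, PdfWriter

cover_page = PdfReader("coversheet_template.pdf").pages[0]  # hypothetical template

for pdf_path in Path("repository_files").glob("*.pdf"):
    writer = PdfWriter()
    writer.add_page(cover_page)                  # coversheet becomes page 1
    for page in PdfReader(pdf_path).pages:       # original content follows
        writer.add_page(page)
    with open(f"with_coversheet_{pdf_path.name}", "wb") as out:
        writer.write(out)
```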

Conclusion

 * The adoption of coversheets is widespread in the UK.
 * A strong argument is the preservation of information, but it is not clear that this preservation has to happen in an embedded coversheet
 * Recommendations for repository managers:
 * Alternative ways of branding, such as watermarks or stamps
 * Hold metadata elsewhere in the document, for example in metadata fields specific to the file format (see the sketch after this list).
 * Review the performance of the coversheet generation process
 * Ensure that the original version of a file is retained
 * Future work in this area should include a broader survey of cover sheet use worldwide, as well as focusing in some detail on the opinions of repository users and contributors
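
To illustrate the "hold metadata elsewhere in the document" alternative (my own sketch; the paper does not prescribe a tool, and pypdf plus the custom key names are assumptions), the same information can be written into a PDF's document information dictionary without touching the page content:

```python
# Sketch: store repository metadata in the PDF Info dictionary instead of a
# prepended coversheet page. pypdf is used here purely as an example tool.
from pypdf import PdfReader, PdfWriter

reader = PdfReader("accepted_manuscript.pdf")   # hypothetical input file
writer = PdfWriter()
for page in reader.pages:                       # copy pages unchanged
    writer.add_page(page)

writer.add_metadata({
    "/Title": "Coversheets considered harmful?",
    "/Author": "E. Tonkin; S. Taylor",
    # Custom keys are allowed in the Info dictionary; the names below are illustrative.
    "/RepositoryIdentifier": "hdl:10000/example",
    "/CopyrightStatement": "Author accepted manuscript; publisher policy applies.",
})

with open("accepted_manuscript_with_metadata.pdf", "wb") as out:
    writer.write(out)   # the original text layout is untouched
```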

=DAY 2=

Nuno Freire - Facilitating Access and Reuse of Research Materials: the Case of the European Library

The European Library

 * Link
 * Data is provided by Archives, Libraries of each member country
 * Data is aggregated nationally
 * Data is delivered to Europeana, the European Library
 * Most visible service is the portal, another important tool is the API
 * Promotes the use and re-use of digital resources in many contexts
 * Resources aggregated by the EL:
 * National bibliographies
 * Traditional Library catalogues
 * Research collections from national and research libraries
 * EL hosts a centralized index of text resources mainly created by OCR

Resource dissemination and reuse services

 * Search APIs
 * Linked Open Data (available since 2013)
 * Expected benefits: higher traffic for data providers, EL as a data hub and authority for library data

ARROW - Accessible Registries of Rights Information and Orphan Works towards Europeana

 * Tool to facilitate rights information management in any digitisation project involving text and image based works
 * Motivation
 * Support mass digitisation by giving information on the rights
 * Allows one to determine, for a work:
 * authors, publishers and other right holders
 * whether it is an orphan work
 * whether it is out of copyright
 * whether it is still commercially available
 * To clarify the rights, a complex process is necessary:
 * Determine the works a book contains
 * Identify other expressions of these works
 * Identify authors, publishers and other right holders
 * Determine the dates of publication at work level
 * Determine whether the works (not the book itself) are still in commerce
 * If necessary, obtain any licenses from the right holders or collective rights organisations
 * EL is a single access point for resources

Projects enabling the use of research materials from libraries

 * CENDARI: Collaborative European Digital Archive Infrastructure
 * Europeana Cloud: started in 2013, makes cultural heritage materials available for research; will set up an infrastructure providing discovery services and tools

Future Developments

 * A research platform will be created together with DARIAH and CESSDA

M. Malta & A. Baptista - A method for the development of Dublin Core Application Profiles

Definition

 * Metadata scheme: a set of “metadata elements designed for a specific purpose, such as describing a particular type of information resource“
 * Dublin Core Metadata Initiative (DCMI): created new instruments aimed at adapting the metadata community to the transformations brought by the Semantic Web
 * Dublin Core Abstract Model (DCAM): one of these instruments; a model for the syntax specifications that present the components and constructs used in DCMI metadata. One of these constructs is the Dublin Core Application Profile (DCAP)
 * Dublin Core Application Profile (DCAP):
 * “A generic construct for designing metadata records”
 * Very important construct to implement interoperability
 * --> A method for developing such a construct in order to give DCAP developers a common ground of work.
 * The only guidelines available are in the Singapore Framework and the DCMI Guidelines for DCAP. Problem: they are too brief. There is no guideline for a life-cycle with standardised activities, and no well-defined design criteria with defined techniques
 * Me4DCAP: a method for the development of DCAPs (description: http://hdl.handle.net/1822/); research is still in progress.

Methodological Approach

 * Based on a design science research (DSR) approach
 * According to A. Hevner DSR has 3 cycles:
 * Relevance Cycle: the "Environment" supplies the research project with the needed requisites and the application context, and "defines acceptance criteria for the evaluation of the results"
 * Design Cycle: multiple iterations of construction and evaluation before contributions are output into the Relevance Cycle and the Rigor Cycle
 * Rigor Cycle: uses the knowledge base ”of scientific theories and engineering methods" as input for the Design Cycle. The project feeds back the knowledge base with new artifacts and experiences and expertise that define the state of the art in the application domain of the research project.
 * Study of best practice; interviews with developers to understand how they work

Description of Me4DCAP V0.1
Work Team
 * 4 types of stakeholders in the DCAP development process: Managers, System Analysts, Metadata Programmers and Final Users
 * Multidisciplinary team is very important

Me4DCAP approach
 * Establishes the way through the DCAP development: when activities must take place, how they interconnect, which artefacts they will bring out. It also suggests which techniques could be used to build these artifacts.
 * Iterative process
 * Components that have to be developed (according to the Singapore Framework):
 * Functional Requirements (Component Stage 1)
 * Domain Model (Component Stage 2)
 * Description Set Profile (Component Stage 3)
 * Usage guidelines (optional) (Component Stage 4)
 * Syntax guidelines (optional) (Component Stage 5)
 * Stages: iterative Process by stages, each stage being built on the results of the previous stage:
 * Scope Definition: Scope, organizing the project team, Functional Requirements
 * Construction: Domain Model
 * Development: Description Set Profile
 * Validation: Developed DCAP is validated
 * Artifacts: Me4DCAP suggests which artifacts have to be produced and when. This is defined by the iterative life-cycle of the development. Artifacts developed at the same time are grouped in the same block.
 * Block 1: Vision Statement, Work Plan, Use Cases high level
 * Block 2: Use Case Model, Functional Requirements, Domain Model
 * Block 3: Integration Dossier
 * Block 4: Validation Dossier (in laboratory)
 * Block 5a: Glossary
 * Block 5b: Usage Guidelines, Syntax Guidelines
 * Block 6: Description Set Profile
 * Block 7: Validation
 * Vision Statement:
 * Component Stage 1, Functional Requirements
 * Simple text document with no more than 200 words
 * Describing the boundaries for the DCAP usage
 * Shows what the developers want to achieve with the DCAP development
 * Defines the scope of the DCAP
 * Technique: brainstorming with all team members, discussion, documentation in simple sentences
 * Work Plan
 * Component Stage 1, Functional Requirements
 * Goal: time planning of the project activities
 * Serves as guide for the work team
 * Contains start and end date of each phase and the dates when each component stage should be finished
 * Can include information about the responsibilities
 * Text document or Gantt Chart or any type of graph or scheme that the work team finds more convenient
 * Should be built and discussed by all team members
 * Has to be modified as the project evolves
 * Use-Case High Level
 * Component Stage 1, Functional Requirements
 * Use-Case Model
 * Component Stage 1, Functional Requirements
 * Used to develop the Functional Requirements
 * Is composed of
 * UML Use Case diagram with the actors that interact in the Use Cases describing the functionality of the system
 * Set of all detailed Use-Cases
 * Should be developed by the managers
 * Work team should revise the use cases, with the System Analyst members of the work-team helping managers to clarify ideas.
 * Functional Requirements
 * Guide the development of the application profile by providing goals and boundaries
 * Are an essential component of a successful application profile development process
 * May involve managers of services, experts in the materials being used, application developers and potential end users
 * Text document, where general goals are mentioned as well as specific tasks.
 * To develop the Functional Requirements, the work team should read the Use Cases together as a group to identify the functional requirements
 * Short sentences should be used
 * The whole team should discuss and review the final use cases
 * Domain Model
 * Component Stage 2
 * Description of what things your metadata will describe, and the relationship between those things
 * Basic blueprint for the construction of the application profile
 * Based on Component Stage 1 and on the Use-Cases Model artifact
 * Should be developed using a UML class diagram with details suppressed. The diagram identifies the classes of objects and the relationships among them, but the methods and attributes are omitted. Alternatively, an Entity-Relationship diagram showing entities and relationships can be used.
 * Integration Dossier
 * Component Stage 3
 * Comprises 3 artifacts:
 * Object Role Modeling (ORM/NIAM) diagram data model: contains the classes of the objects (defined in the Domain Model), attributes and attributes’ constraints such as their repeatability, domain and multi-language option. Classes and Objects should have been already described in plain text in the Usage Guidelines Component Stage 4 by the stakeholders.
 * State of the Art: For every class of the domain model the properties have to be defined. An analysis of existing metadata schemes (described in RDF) should be performed to identify useful properties. If no suitable properties can be found in existing schemes new properties can be created. Both steps should be performed by metadata programmers.
 * Document of Integration: shows in a matrix, per line, every attribute and its constraints, described by the properties of the metadata schemes and encoding schemes chosen. Work should be done by the metadata programmers.
 * Validation Dossier:
 * Comprises 3 mandatory artifacts: Validation Report, a filled-in Matrix and a filled-in Questionnaire
 * Validation Report: the Vision Statement is compared to the developed AP. The Work team should make a report (text document) with the conclusion of the meeting and recommendations.
 * Matrix: the AP is applied to resource samples. The work team should identify a set of resources and final users (chosen by the stakeholders), and the metadata programmers should complete the validation matrix with data referring to each resource. The matrix template should be simple to fill in. The matrix should be accompanied by the two Guidelines (Component Stage 4 and Stage 5).
 * Questionnaire: Final users and Metadata programmers should answer a set of questions to assess difficulties of the validation process.
 * Description Set Profile
 * Mandatory
 * Details the metadata by developing their design in the DSP language defined by Nilsson
 * Glossary
 * Guidelines:
 * Not mandatory:
 * Usage Guidelines
 * Development starts at the same time as the Domain Model.
 * Provide the “how” and “why”.
 * Offer instructions to those who will create metadata records.
 * Explain each property and anticipate the decisions that must be made in the course of creating a metadata record.
 * Developed by stakeholders and metadata programmers
 * Syntax Guidelines
 * Requires that the Integration Dossier has been developed to a certain stage
 * Developed by the metadata programmers
 * Finishing
 * Validation of Development Process should be done
 * Results should be reported to the work team in order to review and assess the DCAP definitions.
 * If there is new information the whole DCAP development process should start from Block 1 and every artifact should be checked against the new information.

Conclusion and Future Work

 * Studies have shown that there is no method to develop such a construct
 * Metadata community needs a common ground of work concerning the development of DCAP.
 * Me4DCAP establishes the way through the DCAP development
 * Future work: the validation process of the DCAP under the scope of the DSR approach, and adaptation of Me4DCAP to the needs of the metadata community. Validation will be done using the Focus Group approach

H. Roued-Cunliffe - Opening up Digital Publications

Background

 * 2007: Thesis "Heritage Portals and Cross-Border Data Interoperability"
 * The thesis examined the possibilities of making a heritage portal that could enable cross-border searching of heritage datasets in Europe.
 * Conclusions
 * Many digital publications online
 * They can be searched and viewed by general public
 * Only a few are available to external researchers in a meaningful way, so that they can define new search criteria and reuse results in further research
 * Some online publications do consider external researchers, but many don't
 * Portable Antiquities Scheme (http://finds.org.uk): encourages external researchers to sign up and allows export of XML and KML
 * Vindolanda Tablets Online: http://vto2classics.ox.ac.uk

Suggestions

 * Dividing users in three categories: general public, external researchers, internal researchers
 * Data available through searches in several download formats (at least CSV) for the general public (see the sketch after this list)
 * Registration scheme for external researchers, which gives access to data from searches through dynamic web services and possibly the opportunity to suggest new search criteria.
 * Internal Researchers have access to the above and the option of contributing new data to the publication.
 * Data sharing policy first: what to share? How to share? Who to share with?
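
As a rough sketch of the first suggestion (my own illustration; the thesis does not prescribe a framework, and the Flask choice, field names and toy dataset are assumptions), a public search endpoint that returns its results as CSV could look like this:

```python
# Sketch: a public search endpoint returning matching records as CSV, as
# suggested for the "general public" user category. Flask and the in-memory
# toy dataset are illustrative assumptions only.
import csv
import io

from flask import Flask, Response, request

app = Flask(__name__)

# Hypothetical records; a real portal would query its heritage database instead.
RECORDS = [
    {"id": "1", "word": "militem", "document": "Tab. Vindol. II 255"},
    {"id": "2", "word": "cervesam", "document": "Tab. Vindol. II 190"},
]

@app.route("/search")
def search() -> Response:
    query = request.args.get("q", "").lower()
    hits = [r for r in RECORDS if query in r["word"].lower()]

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "word", "document"])
    writer.writeheader()
    writer.writerows(hits)
    return Response(buffer.getvalue(), mimetype="text/csv")

if __name__ == "__main__":
    app.run()  # e.g. GET /search?q=cerv returns the matching rows as CSV
```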

Application Programming Interface

 * REST Web Service named Apello was developed as a part of the thesis
 * Enables dynamic searches of TEI formatted XML datasets
 * Used to run the Vindolanda Tablets Online 2 Website
 * Can be used by external researchers, for example to reuse the list of words found in these ancient documents as a look-up for their own research (see the sketch below)
 * Build applications for the API and open the API up to other developers
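
A minimal sketch of the kind of query such a service runs underneath (my own illustration; Apello's actual implementation, file layout and element choices are not described in the notes), extracting word forms from a TEI-encoded document with XPath:

```python
# Sketch: extract word elements from a TEI-encoded XML file, the kind of
# dynamic lookup a service like Apello exposes over REST. The file name and
# the choice of the <w> element are illustrative assumptions.
from lxml import etree

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

tree = etree.parse("vindolanda_tablet.xml")      # hypothetical TEI document
words = [
    w.text.strip()
    for w in tree.xpath("//tei:w", namespaces=TEI_NS)   # <w> marks word tokens in TEI
    if w.text
]
print(sorted(set(words)))   # unique word forms, usable as a look-up list
```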