Trip Report NEERI 2010


  • Conference full title: Networking Event for the European Research Infrastructures (NEERI)
  • Date/Place: 21 October 2010, Vienna

Introduction

NEERI2010 is the second networking event of its kind, a follow-up to NEERI2009 held in Helsinki. The goal of NEERI2010 is to exchange ideas on a number of topics relevant to research infrastructures and to establish common ground on the further development and application of these topics. NEERI focuses on what we share and what we can learn from each other. Examples of such commonalities are architectural issues, communication with users, and integration of services and tools.

Keynote

The event started with a very interesting keynote from Laurent Romary, clarifying the

    Report of the High-Level Expert Group on Scientific Data, October 2010. 
    Riding the Wave: how Europe can gain from the rising tide of scientific data
    File:Laurent Romary Plenary key speaker-HLG-SDI Report ppt - NEERI.pptx

and its impact on the humanities and social sciences.

The Expert Group considers it of utmost importance for the research infrastructures to collaborate on all important organizational and technical aspects, working towards a vision of a "scientific e-Infrastructure that supports seamless access, use, re-use and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure – a valuable asset, on which science, technology, the economy and society can advance."

This would also enable collaboration among researchers, increase the productivity of research, and allow for sharing, use and re-use of data, while preserving data authenticity, integrity and trustworthiness. Establishing proper data management and data integration infrastructures is both a challenge and an opportunity, given the scale, complexity and diversity of the data and their accelerating growth. Beneficiaries of such established, living and collaborating research infrastructures would be not only researchers, but also the general public, funders and policy makers, as well as enterprises and industry. Therefore, EU and national agencies must define clear strategies and ensure sufficient resources for their implementation.

The Expert Group developed an initial wish list (adapted from the PARADE White Paper) containing minimum requirements that such an infrastructure has to fulfill: long-term preservation (bitstream, format migration), persistent identification, standardization of metadata (format- and semantic-level interoperability), proper implementation of access rights, enabling large groups of researchers to operate on the data, regular quality assessment and metrics on data usage, availability and reliability to feed back into further improvement of the infrastructure.

Additionally, the group drew up a list of possible actions to overcome impediments such as financing, trustworthiness, data expertise, usage and complexity of the infrastructures, lack of published data, and unwillingness to cooperate across disciplines.

General message: close collaboration between researchers, funders, implementors of research infrastructures and industry has to be established from the very beginning. Only then can the challenges be addressed successfully.

Connecting the European Grid Infrastructure (EGI) to Research Communities

  • Steve Brewer, EGI.eu, Chief Community Officer

Steve Brewer gave an overview of the EGI development and how EGI works towards its goal of increasing the number of scientists and research groups that actively use and benefit from EGI. Communities have to be actively supported on the technical side by enabling innovation in technologies (Grids, Clouds, virtualization), enabling innovation in software (providing a reliable and persistent platform) and supporting international research (e.g. ESFRI). Additionally, human networks have to be developed and cultivated, since in the end humans are the users of such infrastructures. This happens through both general services (training events, material, helpdesks, user and technology meetings) and discipline-specific services (e.g. Bio Apps). Continuous definition and verification of user requirements has to be established. Grouping these activities into Virtual Organizations, by setting up Virtual Research Communities, helps to address these issues better.

This is certainly an experience that can also be reused in research infrastructures oriented towards the humanities and social sciences.

Grids, Clouds and Research Infrastructure (P. Wittenburg)

  • Peter Wittenburg, Max Planck Institute for Psycholinguistics, Head of the Language Archive

Peter Wittenburg gave a very interesting definition of the terms grid and cloud and explained how a humanities researcher would benefit from them, bearing in mind the nature of research in the humanities: it is highly unpredictable, and projects are usually small and focused while the data are scattered and diverse. The data have to be sustainable, since they are, for example, about human history. Usually a lot of technical services are needed, and these generally have to be inexpensive. However, there are still issues with the willingness of researchers to share their data, most probably due to the amount of non-automated, intellectual work invested in the data: researchers are sensitive to ownership. In general, computing over structured data is not the issue here; rather, the question is how to quickly enable tools that can "simulate human mind".

DARIAH and CLARIN are certainly large European e-Infrastructure projects that try to address some of these issues: to enable reliable, sustainable and trustworthy data storage that ensures data integrity, authenticity, visibility, accessibility, interpretability, etc. Such a data infrastructure must implement appropriate mechanisms for authorization and authentication, but also offer services and tools to work across scattered data resources.

How did Grid/Cloud related projects contribute to the Social Sciences and Humanities (SSH)? SSH are mostly considered out of scope for the Grids, despite projects such as TextGrid (Germany), and Grids are used mainly for data storage. SSH researchers are not even aware of whether they are using cloud services or not (this may also be considered a positive outcome).

Whether SSH can benefit from Grids/Clouds is still not completely clear. There are many issues concerning financing, data ownership and long-term data accessibility, especially when it comes to the usage of Cloud-based services: "is AMAZOOGLE for data what Elseviers are for publications?" Even if Cloud-based services do not have to be commercial, there is a long way to go to enable research Clouds unburdened by commercial use. Standardization is certainly an issue (work has started at DMTF, see http://www.dmtf.org/standards/cloud).

In the end, it is all about the services that are offered to researchers. The question remains whether SSH-related e-Infrastructure projects can make optimal use of all the knowledge and experience from decades of Grid development.

Grids, Clouds and Research Infrastructure

  • Bob Jones, CERN, Senior Staff Member

Bob Jones provided insight into the pan-European computing infrastructure, including high-speed networking (GEANT), and suggested opportunities and challenges for how e-Infrastructures can evolve in future to satisfy the requirements of Europe's research communities.

The European E-Infrastructure Forum (EEF, http://www.einfrastructure-forum.eu/) is seen as one of the instruments for discussing the principles and practices for creating synergies between various distributed e-Infrastructures, with the common goal of making them interoperable. The Forum held several workshops in 2009 and 2010 to gather further information about ESFRI project requirements. A total of 28 projects were consulted, including 5 from the SSH sector (CLARIN, ESS, DARIAH, SHARE, CESSDA). The requirements essentially come down to the usage of standards in technology and metadata, interoperability, rights management, advanced services for collections management, multilingual support, linked data and persistent identifiers, etc.

These issues are especially complex to address when one considers the combination of grids, clouds, supercomputers and volunteer computing as one e-Infrastructure ecosystem. In reality there are many problems and there is not yet a right answer to them; a possible start is to aim for interoperability and keep applications agile. The question remains how this could be applied in the SSH sector (see also P. Wittenburg's presentation).

Next steps could address the harmonization of already existing services by enabling them to talk to each other (interoperability, single sign-on, standards, persistent storage and identification, monitoring, billing and accounting services), as well as improved user support, training and consultancy. Continuous work has to be invested in the requirements and in feedback from users and usage data. The process is long-term and iterative, and can be strengthened by joint projects and activities.