Trip Report NEERI 2010



  • Conference full title: Networking Event for the European Research Infrastructures (NEERI)
  • Date/Place: 21 October 2010, Vienna

Introduction

NEERI2010 is the second Networking Event of its kind, providing a follow-up to NEERI2009 held in Helsinki. The goal of NEERI2010 is to exchange ideas on a number of topics relevant to research infrastructures and to establish common ground on the further development and application of these topics. NEERI focuses on what we share and what we can learn from each other. Examples of such commonalities are architectural issues, communication with users, and integration of services and tools.

Keynote

The event started with a very interesting keynote from Laurent Romary, who explained the

   Report of the High-Level Expert Group on Scientific Data, October 2010. 
   Riding the Wave: how Europe can gain from the rising tide of scientific data
   Image:Laurent Romary Plenary key speaker-HLG-SDI Report ppt - NEERI.pptx

and its impact on the humanities and social sciences. The Expert Group considers it of utmost importance for the research infrastructures to establish collaboration in all important organizational and technical aspects, towards a vision of a "scientific e-Infrastructure that supports seamless access, use, re-use and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure – a valuable asset, on which science, technology, the economy and society can advance." That would enable better collaboration among researchers, increase the productivity of research, and allow for sharing, use and re-use of data, while preserving data authenticity, integrity and trustworthiness. It is both a challenge and an opportunity to establish proper data management and data integration infrastructures, bearing in mind the scale, complexity and diversity of the data and their accelerating growth. Beneficiaries of such established, living and collaborating research infrastructures would be not only researchers, but also the general public, funders and policy makers, as well as enterprises and industry. Therefore, EU and national agencies must define clear strategies and ensure sufficient resources for their implementation. The Expert Group developed an initial wish list (adapted from the PARADE White Paper, see http://www.csc.fi/english/pages/parade/whitepaper) containing minimum requirements that such an infrastructure has to fulfill: long-term preservation (bit stream, format migration); persistent identification; standardization of metadata (interoperability at both the format and the semantic level); proper implementation of access rights; enabling large groups of researchers to operate on the data; and regular quality assessment and metrics on data usage, availability and reliability, to feed back into further improvement of the infrastructure.
Additionally, the group drew up a list of possible actions to overcome impediments such as financing, trustworthiness, data expertise, usage and complexity of the infrastructures, lack of published data, and unwillingness to cooperate across disciplines.

General message: close collaboration between researchers, funders, implementers of research infrastructures and industry has to be established from the very beginning. Only then can the challenges be addressed successfully.

Connecting the European Grid Infrastructure (EGI) to Research Communities

  • Steve Brewer, EGI.eu, Chief Community Officer

Steve Brewer gave an overview of the development of EGI and of how EGI works towards its goal of increasing the number of scientists and research groups that actively use and benefit from it. Communities have to be actively supported on the technical side by enabling innovation in technologies (Grids, Clouds, virtualization), enabling innovation in software (providing a reliable and persistent platform) and supporting international research (e.g. ESFRI). Additionally, human networks have to be developed and cultivated through both general services (training events, material, helpdesks, user and technology meetings) and discipline-specific services (e.g. Bio Apps); in the end, humans are the users of such infrastructures. Continuous definition and verification of user requirements has to be established. Initiatives need to be grouped into Virtual Organizations to better address issues by setting up Virtual Research Communities. This is certainly an experience that can also be re-used in research infrastructures oriented towards the Humanities and Social Sciences.


Research infrastructures and DARIAH approach

  • Sheila Anderson, King's College London

Sheila started by opening up the notion of data, which is important for the SSH (Social Sciences and Humanities) sector:

  • data as source
  • data as information - used and developed by communities

Both of these aspects have to be brought together, and this is what DARIAH addresses. There are two guiding principles in DARIAH in this respect: architectural participation (with the researchers and not for the researchers) and collective intelligence. This is achieved by initiatives that encourage communities of practice to use the DARIAH infrastructure and that support concrete research questions while leveraging the infrastructure through experimental seminars, workshops, discussion groups, visualization tools, data mining etc. Collective intelligence means collecting intelligence assets, starting to integrate data from isolated information sources, and feeding this information back to the community to generate new information and insights. One possibility is to include these activities already in educational institutions via student or master's programs, as well as educating and supporting researchers and federating data with other research infrastructures (e.g. CLARIN). Sheila addressed several questions, such as:

  • How to make researchers aware of networked research infrastructures and help them choose which one to use?

Here it is important to recognize the role of the local institutions, which are able to offer help and work with virtual centers of competence.

  • How to connect people who are in different countries and thus communicate in different languages?

CLARIN has already shown that it is possible to provide some translations; we have to be patient for more results. The situation may also differ from one language to another: this concerns multilingual metadata, keywords, descriptions etc.

  • There is too much information and there are too many networks; which one to choose?

Not only the projects themselves and local institutions, but also funders and scientific evaluators need to raise awareness of the research infrastructures and endorse their use. It is also important to start visiting the institutions and to see whom people ask for help when they need it.

Grids, Clouds and Research Infrastructure (P. Wittenburg)

  • Peter Wittenburg, Max Planck Institute for Psycholinguistics, Head of the Language Archive

Peter Wittenburg gave a very interesting definition of the terms grid and cloud and discussed how a humanities researcher would benefit from them, bearing in mind the nature of research in the humanities: it is highly unpredictable, and projects are usually small and focused but work with scattered and diverse data; the data has to be sustainable, since it is, for example, about human history. However, there are still issues with the willingness of researchers to share their data, most probably due to the amount of non-automated, intellectual work that has gone into the data; researchers are sensitive to ownership. In general, computing over structured data is not the issue here; the question is rather how to quickly enable tools that can "simulate human mind". DARIAH and CLARIN are certainly large European e-Infrastructure projects trying to address some of these issues: to enable reliable, sustainable and trustworthy data storage that ensures data integrity, authenticity, visibility, accessibility, interpretability, etc. Such a data infrastructure must implement corresponding mechanisms for authorization and authentication, but also offer services and tools to work across scattered data resources. How did Grid/Cloud-related projects contribute to the Humanities and Social Sciences (SSH)? Grids are mostly used for data storage in the SSH sector, despite projects such as TextGrid (Germany); SSH researchers are not even aware whether they are using cloud services or not (this may also be considered a positive outcome). Whether SSH can benefit from Grids/Clouds is still not completely clear. There are many issues concerning financing, data ownership and long-term data accessibility, especially when it comes to the use of Cloud-based services: "is AMAZOOGLE for data what Elseviers are for publications"? Even if Cloud-based services do not have to be commercial, there is still a long way to go to enable research Clouds unburdened by commercial use.
Standardization is certainly an issue (work has started at DMTF, see http://www.dmtf.org/standards/cloud). In the end, it is all about the services that will be offered to the researchers. Still, the question is whether SSH-related e-Infrastructure projects can make optimal use of all the knowledge and experience from decades of Grid development.


Grids, Clouds and Research Infrastructure (B. Jones)

  • Bob Jones, CERN, Senior Staff Member

Bob Jones provided insight into the pan-European computing infrastructure, including high-speed networking (GEANT), and suggested opportunities and challenges for how e-Infrastructures can evolve in future to satisfy the requirements of Europe's research communities. The European E-Infrastructure Forum (EEF, http://www.einfrastructure-forum.eu/ ) is seen as one of the instruments for discussing the principles and practices for creating synergies between various distributed e-Infrastructures, with the common goal of making them interoperable. The Forum held several workshops in 2009 and 2010 to gather further information about ESFRI project requirements. A total of 28 projects were consulted, including 5 from the SSH sector (CLARIN, ESS, DARIAH, SHARE, CESSDA). The requirements essentially come down to the use of standards in technology and metadata, interoperability, rights management, advanced services for collections management, multilingual support, linked data and persistent identifiers, etc. These issues are especially complex to address when one considers the combination of grids, clouds, supercomputers and volunteer computing as one e-Infrastructure ecosystem. There are a lot of problems in practice and there is not yet a right answer to them; a possible start is to aim for interoperability and keep applications agile. The question still remains how this could be applied in the SSH sector (see also P. Wittenburg's presentation). Next steps could address the harmonization of already existing services by enabling them to talk to each other (interoperability, single sign-on, standards, persistent storage and identification, monitoring, billing and accounting services), as well as improved user support, training and consultancy. Continuous work has to be invested in the requirements and in feedback from users and usage data. The process is long-running and iterative, and can be strengthened by joint projects and activities.


Grids, Clouds and Research Infrastructure (R. Kramer)

  • Rutger Kramer, DANS, Software Development Coordinator

Rutger Kramer emphasized the possibility of using existing Grid technologies and referenced the CESSDA-PPP Report on the usage of Grids and Clouds in the e-Infrastructure (see http://www.cessda.org/project/doc/D11.1b_Sustainability_CESSDA_e-Infrastructure.pdf). Even though the report gives quite a comprehensive overview of both Grid and Cloud technologies (possibilities and drawbacks), it focuses on several main use cases related to cross-national comparative research, data analysis, data set harmonization and survey producers. The report, however, does not make a clear recommendation for any of the grid/cloud technologies and lacks detail on how the explored use cases overlap within a common e-Research infrastructure. Rutger pointed out that, in general, the problems are not to be seen solely in terms of the possibilities and drawbacks of the existing technologies; rather, it is about accelerating the coordination and distribution of efforts in developing standards for registries and information exchange, services (visualization, transcription, etc.) and persistent storage into a common platform. Considering grids as such a common platform, future e-Infrastructure projects could use the grid single sign-on scheme or the already implemented mechanisms for sharing resources. The platform certainly needs enhancement and adaptation; however, it could be used as a basis for building applications. Re-using the existing grid platform would be less costly and less risky.