Difference between revisions of "AWOB"

From MPDLMediaWiki
m (GAVO moved to AWOB: Astronomers’ Workbench: new project title)

Revision as of 14:45, 28 January 2009

This is a protected page.

Preparation and planning for shared MPDL project "Scholarly Workbench for Astronomy"

Based on the experiences and outcomes of the German Astrophysical Virtual Observatory (GAVO)

Contacts@MPE: Jaiwon Kim, Gerard Lemson, Wolfgang Voges


Scenarios

Collaborative environment

Enable easy, wiki-like setup of a collaborative environment for shared projects. Allow registered users to access the project, its related pages, and linked and/or uploaded data. Link the collaborative platform with the eSciDoc repository to allow long-term archiving and persistent identifiers (PIDs) for stored content.

Example of a shared project workflow (see details in the slides, restricted):

  • definition of shared project and objectives
  • definition of required experiments
  • distribution of responsibilities
  • tracking of activities and results: set up experiments, run experiments, produce data, postprocess data, analyse data, extract scientific results
  • share data, combine results
  • produce publication-ready paper (shared authoring)


Components of the collaborative work

  • Collaborative work is a "publication-in-progress" developed in a wiki environment as output of the research process.
  • Collaborative work comprises:
    • textual components - mostly metadata describing the textual part of a collaborative work, such as abstract, title, authors, affiliations, keywords, annotations of sources, and references, plus structural information such as subject headings and body sections.
    • non-textual components - mostly representative data sets and illustrations that support the scientific results and conclusions, presented in tables and figures. Examples of figures are images, plots, and diagrams.
    • integrated external tools - links to existing external tools and services for both textual and non-textual components. This lets users work in familiar environments and visualize and manipulate non-textual components and their underlying data interactively.

Textual components

Textual components make it possible to:

  • link references to preprints, published papers, etc. (ADS, arXiv)
  • look up annotated sources, i.e. astronomical objects, in databases such as Simbad and NED
  • describe the collaborative work with queryable metadata such as authors, title, abstract, and keywords
  • describe the structure of the collaborative work
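
Linking a reference to ADS or arXiv could be as simple as filling an identifier into a URL template. The following is a minimal sketch, not a design decision: the function name is hypothetical and the two URL patterns are assumptions that should be checked against each service before use.

```python
from urllib.parse import quote

# URL templates are assumptions for illustration; verify them against
# the current documentation of each service.
LINK_TEMPLATES = {
    "arxiv": "https://arxiv.org/abs/{id}",
    "ads": "https://ui.adsabs.harvard.edu/abs/{id}",
}


def reference_link(service: str, identifier: str) -> str:
    """Return a link to a preprint or paper record for a known service."""
    try:
        template = LINK_TEMPLATES[service.lower()]
    except KeyError:
        raise ValueError(f"unknown reference service: {service!r}") from None
    # Keep '/', '.' and ':' readable; other unsafe characters, such as
    # the '&' in some ADS bibcodes, are percent-encoded.
    return template.format(id=quote(identifier, safe="/.:"))


if __name__ == "__main__":
    print(reference_link("arXiv", "0809.1234"))
    print(reference_link("ads", "1998A&A...330..515F"))
```

Keeping the templates in one table means a further service (for example an object lookup in Simbad or NED) would only need one more entry, once its URL pattern has been confirmed.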

[Questions(JK) 1. Is the body of each section also a textual component? 2. Is an equation a textual component? 3. Figure/table caption: could it be metadata of a non-textual component?]

Non-textual components

Non-textual components make it possible to:

  • visually represent research data and illustrations such as an experiment setup. These components can be presented in tables and in figures (images, plots, and diagrams) and may have their own metadata (e.g. image metadata)

[Question(JK) Could you give some specific examples of image metadata? Depending on the context, the types of metadata could be quite different. For example, is it the size and file type of an image, or more science-related metadata, e.g. observation date, location, etc.?]

I think in this case we were thinking more of actual image metadata such as size, resolution, etc. The observation dates, location, etc. would be candidates for the descriptive metadata of the item with which the image is associated. --Natasa 08:34, 21 October 2008 (UTC)
  • show metadata, e.g. for an image

[JK: Please see above]

  • invoke external data collection viewer

[Question(JK) Does this mean invoking an external tool to view the underlying data?]

Yes, you provided some examples at the last meeting. --Natasa 08:34, 21 October 2008 (UTC)
  • download data related to the component
  • open external tool for visualizing and working with the data
    • for tabular data: e.g. TOPCAT
    • for image data: e.g. Aladin
    • for spectral data: e.g. SpecView, Splat, VOSpec
    • PLASTIC enabled
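
The choice of external tool per data kind listed above can be written down as a small lookup table. This is only a sketch: the tool names come from the list, while the key names and the function are hypothetical.

```python
# Data kinds and tool names follow the list above; the dictionary keys
# and the function name are hypothetical.
EXTERNAL_TOOLS = {
    "table": ["TOPCAT"],
    "image": ["Aladin"],
    "spectrum": ["SpecView", "Splat", "VOSpec"],
}


def tools_for(data_kind: str) -> list[str]:
    """Return candidate external tools for a data kind, or [] if none is known."""
    return list(EXTERNAL_TOOLS.get(data_kind.lower(), []))


if __name__ == "__main__":
    for kind in ("table", "image", "spectrum", "movie"):
        print(kind, tools_for(kind))
```

Returning an empty list for an unknown kind lets the platform fall back to a plain download link instead of failing.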

Integrated external tools

Integrated external tools make it possible to link from either textual or non-textual components to existing external astronomical services or tools such as:

  • astronomical paper and data archive services - direct links to widely used publication and preprint archives such as ADS and arXiv, and to astronomical data archives such as NED, Simbad, VizieR, and SkyServer. Also links to registries of astronomical resources, which let users discover and connect to smaller data sets and services.
  • common analysis environments (IDL, ...)
  • standardized data retrieval services for various types of astronomical data, compliant with IVOA standards:

Sharing of content

Enable privileged users to upload and/or link data and to describe it with metadata, comments, and notes.

In astronomy, some data are public and available to everyone, while other data are proprietary to a project or a group of collaborators. Data from observations usually become publicly available after a finite time. It is common practice in astronomy for collaborators to share their private data and analyses. To do so, a group of collaborators typically sets up a site with login/password protection and exchanges data files, mostly in FITS (Flexible Image Transport System) format with little description, via ftp or xxx. Public data can be accessed without special permission (JK: Verify it).

Because metadata are rarely captured properly, science products are often difficult to use, query, and manage in the long term unless they are maintained by large data archives. To provide a consistent and efficient(?) way of sharing data, we provide services to upload data with proper metadata that can be used for querying and xxxx. We support the following data formats:

  • Standardised format: FITS, VOTable
  • Custom format: XML(?), tabular data in comma- or tab-separated form (more input needed)
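
An upload service would need to tell these formats apart. The sketch below guesses the format from the first bytes of a file. It is a heuristic under stated assumptions: a FITS file is recognised by the SIMPLE keyword that opens its primary header, a VOTable by its VOTABLE root element, and delimited text by the first line only. The function name and the returned labels are hypothetical.

```python
def guess_format(head: bytes) -> str:
    """Guess the format of an upload from its first bytes (heuristic only)."""
    if head.startswith(b"SIMPLE  ="):
        # A FITS primary header begins with the SIMPLE keyword.
        return "fits"
    if b"<VOTABLE" in head:
        # VOTable documents are XML with a VOTABLE root element.
        return "votable"
    try:
        text = head.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    # Only the first line is inspected; tabs take precedence over commas.
    if "\t" in first_line:
        return "tsv"
    if "," in first_line:
        return "csv"
    return "unknown"


if __name__ == "__main__":
    print(guess_format(b"SIMPLE  =                    T / conforms to FITS"))
    print(guess_format(b'<?xml version="1.0"?>\n<VOTABLE version="1.1">'))
    print(guess_format(b"ra,dec,mag\n10.1,20.2,15.3\n"))
```

A production service would validate the whole file with a proper FITS or VOTable library; this check is only meant to route an upload to the right handler.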

[Question(JK) 1. Could textual and non-textual components of a collaborative work be shared content? I assume that content in this section is limited to astronomical data. 2. Do we need to describe public/private data, as well as raw data/science products, somewhere? 3. I added a few points that might appear or be clarified in this section. Please ignore the messiness.]

Supported astronomical data types are:

Types of Astronomical Data

  • images (from observations, simulations, etc.)
  • spectra
  • source catalog
  • more input needed (any data that could be stored in a FITS binary table)
  • time series
  • good time intervals
  • light curves
  • source extraction(?)

Auxiliary data types

  • diagram
  • flow chart
  • illustration
  • publication (textual components)

Metadata to be supported

  • Bibliographic metadata
    • title, author, abstract, subject heading, journal metadata
  • Structural metadata/elements
    • section/TOC, annotation, footnote, equation, caption, references
  • Astronomical metadata
    • FITS keywords, values, and comments
    • tabular data column names
    • xxxx
  • Other
    • provenance (input files, make files, plotting scripts, analysis code, simulation code, ...)

[(JK) analysis code, and simulation code may be moved to Astronomical metadata]

    • log files
    • curation (more input needed)
    • PIDs (ADS, IVOA)
    • IVOA standards (VOTable, UCD, UTYPE, Data models, data access protocol, ...)

[Comments(JK) I don't think IVOA standards belong here]
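
To capture "FITS keywords, values, and comments" as metadata, each 80-character header card has to be split into those three parts. The following is a simplified sketch, not a full FITS parser: it assumes the fixed layout of keyword in columns 1-8 and a "= " value indicator in columns 9-10, handles quoted string values, and ignores CONTINUE cards and HIERARCH keywords. The function name is hypothetical and values are returned as text without type conversion.

```python
def parse_card(card: str) -> tuple[str, str, str]:
    """Split one 80-character FITS header card into (keyword, value, comment)."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY, blank) carry free text only.
        return keyword, "", card[8:].strip()
    rest = card[10:]
    stripped = rest.lstrip()
    if stripped.startswith("'"):
        # String value: ends at the first single quote that is not doubled.
        i = 1
        while i < len(stripped):
            if stripped[i] == "'":
                if stripped[i + 1 : i + 2] == "'":
                    i += 2
                    continue
                break
            i += 1
        value = stripped[1:i].replace("''", "'").rstrip()
        tail = stripped[i + 1 :]
    else:
        # Non-string value: everything before the first slash.
        value, slash, tail = rest.partition("/")
        value = value.strip()
        tail = slash + tail
    # The comment, if any, follows the first slash after the value.
    comment = tail.partition("/")[2].strip()
    return keyword, value, comment


if __name__ == "__main__":
    print(parse_card("NAXIS   =                    2 / number of axes".ljust(80)))
    print(parse_card("OBJECT  = 'M 31    '           / target name".ljust(80)))
```

In practice an established FITS library would be used to read headers; the sketch only illustrates which pieces of a card would end up as searchable metadata.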

Shared Authoring

Author tools are provided to enable shared and standardised authoring. Authors are supported in developing publication-ready papers.

  • Provision of a text editor (emacs? TeX IDEs?)
  • Import of LaTeX articles and conversion to HTML (incl. figures, tables)
  • Templates for publication-ready papers (metadata attachments, links, figures, captions)
  • Support for publication-ready figures from visualisation tools

[Question(JK) 1. How does one distinguish 'publication-ready' from 'publication-in-progress'? Does 'publication-ready' mean reviewing the result of the 'publication-in-progress' phase?]

MPDL project - draft

AWOB – Astronomers’ Workbench

Summary

Within the astronomy community, the need for a user-centered community platform for working collaboratively with research data has been identified. The project is planned in three phases. The first phase focuses on building a demonstrator community platform together with two project partners (MPI Extraterrestrische Physik, MPI Astrophysik) that supports shared work throughout a complete scientific project workflow. The platform is integrated in the eSciDoc infrastructure and integrates external astronomical services (such as catalogs and databases) as well as the discipline-specific tools needed to visualize and manipulate the externally stored research data. The second phase focuses on technical consolidation, dissemination, and negotiations with other MPIs about joining for project extensions. The third phase focuses on technical consolidation and the extensions needed for the other partner institutes. In addition, technical aspects of long-term archiving of discipline-specific research data will be considered.

Background

  • Results of the German Astrophysical Virtual Observatory (GAVO)
    • make results (data sets and services) of astronomical research easily available to the community
    • facilitate standardised publication of results (PIDs, Virtual Observatory standards, long-term archiving)
    • focus on interoperability to enable networking (standards in use: IVOA) and automated discovery and re-use
    • make use of standards-aware client tools and services (for cross-matching, visualisation, combination, data mining etc.)
  • Re-use of data leads to more references and scientific improvements => proof of concept: Millennium Run
    • community-based quality control: errors discovered by others have improved data quality
    • still, as no formal revisions were made, old/original data was lost
  • currently, no (or limited) possibilities to add original data to a publication, only representations/shortened examples:
    • e.g. only image representations of multi-dimensional data
    • e.g. only representative samples of large collections (images, spectra, source catalogs)
    • e.g. only static data
  • enable the shift from large data centers/resource registries (based on IVOA, formal, machine-readable, homogeneous) to scientific practice, i.e. collaboration and publications (informal, human-readable, heterogeneous)

Needs

  • online publications linked to/from online published data sets
    [Question(JK) Does 'online published data sets' mean the data sets presented in a publication (paper)?]
  • networking through standardisation
  • collaboration enabled
  • focusing on scientific practice (collaboration, publication), by re-using existing data centers and resource registries, existing standards, and adding a scientific "workbench-environment"
    • no interruption of daily practice
    • facilitate publishing of data
  • online environment should support
    • collaborative authoring for publications in virtual organisations
    • explicit integration of data sets used for/in the final publication(s) by either uploading original data or linking to external data sets
    • annotation of resources with metadata and identifier according to IVOA standard
    • value-added services on known data types (search, mining, visualization, analysis)
    • interfaces to external archives/registries/catalogs via standard protocols
    • integration of client tools (needed and known in community)
    • long-term preservation of resources (publications, data, services)
    • registration of resources in IVOA standard registries

Preparatory and related work

  • GAVO
    • stable storage and curation of data products needed
    • stable environment for deploying Virtual Observatory protocols and other value-added services
  • IVOA standard
  • AstroGrid (?)

Wider context/Re-use for others

  • Long-term storage of data sets used in a publication (cf. deposit mandate?)
  • Open access to all results of scientific research online (cf. Berlin declaration?)
  • Showcase for added value of implemented standards
    • mandated by some funding agencies
    • IVOA dataset identifier in use by ADS (main portal for astronomers)
  • Integration of standards, stable infrastructure and web2.0 technologies to facilitate dynamic and collaborative environments (cf. eSciDoc?)
  • Re-use for the astronomy community within MPS (institutes are listed under "Organizational")

Work description

Pilot phase

  • Set up a community platform for creating shared projects, registering users, and assigning privileges
    • Analysis of existing community-based platforms for linking community-environment to eSciDoc repository
    • Basic user management
      • Author access - authors of the project
      • Administrator access - project coordinator
      • Public access - public users
    • Linking
      • to external data sets/services (URL based)
      • to eSciDoc resources (Wiki extension to support special eSciDoc tag or URL based)
  • Basic integration of community platform with eSciDoc pilot solution
    • enable upload (or referencing) and description of data
    • enable invocation of external selected visualization tool from eSciDoc pilot solution (e.g. for FITS data)
    • integrate arXiv and ADS (if possible) as sources for fetching publications and preprints
  • Explore possibilities for federated search (within community platform, eSciDoc repository, 1-2 external services)
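
The three access levels listed under basic user management could be modelled as roles with a set of permitted actions. This is a minimal sketch: the role names follow the list above, but which actions each role may perform is an assumption made for illustration, as are all identifiers.

```python
from enum import Enum


class Role(Enum):
    PUBLIC = "public"                # public users
    AUTHOR = "author"                # authors of the project
    ADMINISTRATOR = "administrator"  # project coordinator


# The actions granted to each role are assumptions for illustration;
# the pilot-phase list only names the three access levels.
PERMISSIONS = {
    Role.PUBLIC: {"read"},
    Role.AUTHOR: {"read", "edit", "upload"},
    Role.ADMINISTRATOR: {"read", "edit", "upload", "manage_users"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the given role may perform the action."""
    return action in PERMISSIONS[role]


if __name__ == "__main__":
    print(is_allowed(Role.AUTHOR, "upload"))
    print(is_allowed(Role.PUBLIC, "edit"))
```

Keeping the permission table separate from the check makes it easy to revise once the wiki selection and the eSciDoc privilege model are settled.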

ToDo

Clarify:

  • which data to be supported in pilot phase
  • precise functional requirements for data management (Scenario level)
  • eSciDoc managed vs externally referenced data
  • formats (e.g. FITS) and how they should be supported (e.g. storage, search, visualization via external tools, etc.)
  • types of annotated sources and relating them to external services
  • available external client tools for demonstration (quick win)

Work distribution

Workpackages based on pilot approach

  • Wiki selection
  • Basic user management
  • Demonstrator solution
    • Architecture
      • main components (wiki, eSciDoc)
      • interaction between Wiki, eSciDoc repository, eSciDoc solutions
      • identification of existing services to be involved, possibly with modification
      • checking if existing eSciDoc solutions (PubMan) can be re-used and identify necessary modifications
    • Implementation

Required resources

Phase | Human resources | Costs
Phase 1-2 | 1 FTE (E13) for 8 months for implementation of the demonstrator | 63T€ p.a. = 42T€
Phase 1-3 | 0.5 FTE (E13) for coordination, outreach in MPG/other sections | 32T€ p.a. = 84T€
Phase 3 | 3x 1 FTE (E13) for 24 months for development of consolidated solution | 126T€ p.a. = 252T€
Phase 3 | 1 FTE (E13) for 12 months for long-term archiving aspects | 63T€
Phase 3 | 1 FTE (E13) for 24 months for long-term archiving aspects | 63T€ p.a. = 126T€
Phase 3 | 2x 0.5 FTE (E13) for 24 months for partner-specific extensions | 63T€ p.a. = 126T€

Organizational

Proposed project duration

  • Overall: 32 months
  • Phase 1: 6 months
  • Phase 2: 2 months
  • Phase 3: 24 months

Overall costs (human resources, hardware, other)

  • Human Resources
    • SUM Total: 693T€
    • SUM 2009: 63T€

Potential Partners

  1. Phase 1:
    • MPI Extraterrestrische Physik (CPT)
    • MPI Astrophysik (CPT)
  2. Phase 2/3:
    • MPI Astronomie (CPT)
    • MPI Gravitationsphysik (CPT)
    • MPI Kernphysik (CPT)
    • MPI Physik (CPT)
    • MPI Radioastronomie (CPT)
    • MPI Sonnensystemforschung (CPT)

Responsible for proposal

Meetings

15th Sept 2008

  • first brainstorming at MPDL/Munich

25th Sept 2008

  • updated presentation by Gerard/Jaiwon/Wolfgang (see the MPDL SVN, restricted)
  • outcome:
    • First draft of MPDL project proposal by 10th of Oct (Ulla, Natasa), focusing on requirements and approach
    • First draft of eSciDoc HowTo for definition of high-level requirements
    • First draft of eSciDoc HowTo for definition of high-level requirements

References/links