AWOB

From MPDLMediaWiki
Revision as of 13:29, 20 October 2008 by Jkim (talk | contribs) (→‎Textual components: Added that the collaborative works's metadata could be potentially used for query.)


Preparation and planning for shared MPDL project "Scholarly Workbench for Astronomy"

Based on experiences and outcomes of the German Astrophysical Virtual Observatory (GAVO)

Contacts@MPE: Jaiwon Kim, Gerard Lemson, Wolfgang Voges


Scenarios

Collaborative environment

Enable easy, wiki-like setup of a collaborative environment for shared projects. Allow registered users to access the project, its related pages, and linked and/or uploaded data. Link the collaborative platform with the eSciDoc repository to allow long-term archiving and PIDs for stored content.

Example of a shared project workflow (see details in slides, restricted):

  • definition of shared project and objectives
  • definition of required experiments
  • distribution of responsibilities
  • tracking of activities and results: set up experiments, run experiments, produce data, postprocess data, analyse data, extract scientific results
  • share data, combine results
  • produce publication-ready paper (shared authoring)


Components of the collaborative work

  • A collaborative work is a "publication-in-progress" developed in a wiki environment as output of the research process.
  • A collaborative work comprises
    • textual components - mostly metadata describing the textual part of a collaborative work, such as abstract, title, authors, affiliations, keywords, annotations of sources and references, and its structural information, such as subject headings, body sections etc.
    • non-textual components - mostly scientific results, plus representative data sets and illustrations that support the scientific results and conclusions, presented in tables and figures. Examples of figures are images, plots and diagrams.
    • integrated external tools - interactive tools for visualizing non-textual components, manipulating the data underlying these components, querying remote archives etc.

Textual components

Textual components make it possible to:

  • link to references such as preprints and published papers (ADS, arXiv)
  • look up annotated sources, e.g. astronomical objects, in databases such as Simbad and NED
  • describe the collaborative work with metadata that could be used for queries, such as authors, title, abstract, keywords
  • describe the structure of collaborative work
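
A minimal sketch of how the descriptive metadata listed above (authors, title, abstract, keywords) might be used for querying. The record layout, the sample values and the function name `search_works` are assumptions made for illustration only; they are not part of eSciDoc or of any existing wiki.

```python
from typing import Dict, List

# Hypothetical metadata records; the field names follow the list above,
# the values are invented for illustration.
WORKS: List[Dict[str, object]] = [
    {
        "title": "Cluster survey draft",
        "authors": ["A. Author", "B. Author"],
        "abstract": "X-ray selected galaxy clusters.",
        "keywords": ["clusters", "x-ray"],
    },
    {
        "title": "Simulation notes",
        "authors": ["C. Author"],
        "abstract": "Dark matter halo catalogue.",
        "keywords": ["simulation"],
    },
]

# Metadata fields that are searched.
QUERY_FIELDS = ("authors", "title", "abstract", "keywords")


def search_works(works: List[Dict[str, object]], term: str) -> List[str]:
    """Return titles of works whose queryable metadata contains `term` (case-insensitive)."""
    needle = term.lower()
    hits = []
    for work in works:
        for field_name in QUERY_FIELDS:
            value = work.get(field_name, "")
            # List-valued fields (authors, keywords) are joined into one string.
            text = " ".join(value) if isinstance(value, list) else str(value)
            if needle in text.lower():
                hits.append(str(work["title"]))
                break  # one match per work is enough
    return hits
```

Calling `search_works(WORKS, "x-ray")` returns the title of the first sample record only. A real deployment would more likely delegate such queries to the repository's own search service.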

Non-textual components

Non-textual components make it possible to:

  • visually represent research data. This visual representation may have its own metadata (e.g. image metadata)
  • show metadata, e.g. for an image
  • invoke external data collection viewer
  • download data related to the component
  • open external tool for visualizing and working with the data
    • for tabular data: e.g. TOPCAT
    • for image data: e.g. Aladin
    • for spectral data: e.g. SpecView, Splat, VOSpec
    • PLASTIC enabled

Integrated external tools

Integrated external tools make it possible to link from either textual or non-textual components to existing external astronomical services or tools.

Sharing of content

Enable privileged users to upload and/or link data, and to describe it with metadata, comments and notes.

  • Standardised data: FITS, VOTable, Spectra, SQL query results
  • Custom data (more input needed)
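
As a hedged illustration of how uploaded files could be sorted into standardised and custom data, the sketch below inspects the first bytes of a file. It relies on two facts: a FITS file starts with the `SIMPLE` header keyword, and a VOTable is an XML document with a `VOTABLE` root element. The function name `guess_format` is hypothetical, the VOTable check is only a heuristic, and spectra and SQL query results are not handled here.

```python
def guess_format(head: bytes) -> str:
    """Classify the first bytes of an uploaded file as 'FITS', 'VOTable' or 'custom'."""
    # A FITS primary header begins with the keyword SIMPLE, padded to 8 characters,
    # followed by '='.
    if head.startswith(b"SIMPLE  ="):
        return "FITS"
    # Heuristic: XML text that mentions a VOTABLE element near the start.
    text = head.decode("utf-8", errors="ignore").lstrip()
    if text.startswith("<") and "<VOTABLE" in text:
        return "VOTable"
    # Everything else is treated as custom data needing manual description.
    return "custom"
```

In practice the first few hundred bytes of the upload would be passed in, e.g. `guess_format(open(path, "rb").read(512))`, where `path` is whatever file the user supplied.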

Types of data

  • images (radio, optical, x-ray)
  • images (e.g. simulation)
  • spectra
  • source catalog
  • plot (e.g. spectrum)
  • diagram
  • flow chart
  • illustration
  • table (e.g. source catalog)
  • publication (textual components)

Metadata to be supported

  • Bibliographic metadata
    • title, author, abstract, subject heading, journal metadata
  • Structural metadata/elements
    • section/TOC, annotation, footnote, equation, caption, references
  • Other
    • provenance (input files, make files, plotting scripts, analysis code, simulation code, ...)
    • log files
    • curation (more input needed)
    • PIDs (ADS, IVOA)
    • IVOA standards (VOTable, UCD, UTYPE, Data models, data access protocol, ...)
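
One possible way to group the metadata categories above into a single record is sketched below. All class and field names are assumptions chosen to mirror the list; they do not reflect an existing eSciDoc or IVOA schema, and the sample identifier is a placeholder.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Bibliographic:
    # Bibliographic metadata from the list above.
    title: str
    authors: List[str]
    abstract: str = ""
    subject_headings: List[str] = field(default_factory=list)


@dataclass
class Provenance:
    # "Other" metadata: how the component was produced.
    input_files: List[str] = field(default_factory=list)
    scripts: List[str] = field(default_factory=list)  # plotting scripts, analysis code, ...
    log_files: List[str] = field(default_factory=list)


@dataclass
class ComponentMetadata:
    bibliographic: Bibliographic
    provenance: Provenance = field(default_factory=Provenance)
    identifiers: List[str] = field(default_factory=list)  # PIDs


# Hypothetical example record.
record = ComponentMetadata(
    bibliographic=Bibliographic(title="Example figure", authors=["A. Author"]),
    provenance=Provenance(input_files=["run01.fits"], scripts=["plot.py"]),
    identifiers=["pid:example/1"],
)

# Nested plain dictionary, e.g. for storage or export.
as_dict = asdict(record)
```

Keeping provenance separate from bibliographic fields means a figure can carry its input files and scripts even before the surrounding paper has a title or abstract.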


Shared Authoring

Author tools are provided to enable shared and standardised authoring. Authors are supported in developing publication-ready papers.

  • Provision of a text editor (emacs? TeX IDEs?)
  • Import of LaTeX articles and conversion to HTML (incl. figures, tables)
  • Templates for publication-ready papers (metadata attachments, links, figures, captions)
  • Support for publication-ready figures produced by visualisation tools

MPDL project - draft

Summary

  • online publications linked to/from online published data sets
  • networking through standardisation
  • collaboration enabled
  • focusing on scientific practice (collaboration, publication), by re-using existing data centers and resource registries, existing standards, and adding scientific "workbench-environment"
    • no interruption of daily practice
    • facilitate publishing of data
  • online environment should support
    • collaborative authoring for publications in virtual organisations
    • explicit integration of data sets used for/in the final publication(s) by either uploading original data or linking to external data sets
    • annotation of resources with metadata and identifier according to IVOA standard
    • value-added services on known data types (search, mining, visualization, analysis)
    • interfaces to external archives/registries/catalogs via standard protocols
    • integration of client tools (needed and known in community)
    • long-term preservation of resources (publications, data, services)
    • registration of resources in IVOA standard registries

Background

  • Results of German Astrophysical Virtual Observatory (GAVO)
    • make results (data sets and services) of astronomical research easily available to community
    • facilitate standardised publication of results (PIDs, Virtual Observatory standards, long-term archiving)
    • focus on interoperability to enable networking (standards in use: IVOA) and automated discovery and re-use
    • make use of standards-aware client tools and services (for cross-matching, visualisation, combination, data mining etc.)
  • Re-use of data leads to more references and scientific improvements => proof of concept: Millennium Run
    • community-based quality control: errors discovered by others have improved data quality
    • still, as no formal revisions were made, old/original data was lost
  • currently, no (or limited) possibilities to add original data to publication, only representations/shortened examples:
    • e.g. only image representations of multi-dimensional data
    • e.g. only representative samples of large collections (images, spectra, source catalogs)
    • e.g. only static data
  • enable the shift from large data centers/resource registries (based on IVOA, formal, machine-readable, homogeneous) to scientific practice, i.e. collaboration and publications (informal, human-readable, heterogeneous)

Needs

Preparatory and related work

  • GAVO
    • stable storage and curation of data products needed
    • stable environment for deploying Virtual Observatory protocols and other value-added services
  • IVOA standard
  • AstroGrid (?)

Wider context/Re-use for others

  • Long-term storage of data sets used in a publication (cf. deposit mandate?)
  • Open access to all results of scientific research online (cf. Berlin declaration?)
  • Showcase for added value of implemented standards
    • mandated by some funding agencies
    • IVOA dataset identifier in use by ADS (main portal for astronomers)
  • Integration of standards, stable infrastructure and web2.0 technologies to facilitate dynamic and collaborative environments (cf. eSciDoc?)
  • Re-use for astronomy community within MPS
    • MPI Astronomie (Heidelberg)
    • MPI Astrophysik (Garching)
    • MPI extraterrestrische Physik (Garching)
    • MPI Gravitationsphysik (Golm)
    • MPI Kernphysik (Heidelberg)
    • MPI Physik (München)
    • MPI Radioastronomie (Bonn)
    • MPI Sonnensystemforschung (Katlenburg-Lindau)

Work description

Pilot phase

  • Set up a community platform for creation of shared projects, registration of users and assignment of privileges
    • Analysis of existing community-based platforms for linking community-environment to eSciDoc repository
    • Basic user management
      • Author access - authors of the project
      • Administrator access - project coordinator
      • Public access - public users
    • Linking
      • to external data sets/services (URL based)
      • to eSciDoc resources (Wiki extension to support special eSciDoc tag or URL based)
  • Basic integration of community platform with eSciDoc pilot solution
    • enable upload (or referencing) and description of data
    • enable invocation of external selected visualization tool from eSciDoc pilot solution (e.g. for FITS data)
    • integrate arXiv and ADS (if possible) as sources for fetching publications and preprints
  • Explore possibilities for federated search (within community platform, eSciDoc repository, 1-2 external services)
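
The three access levels named under basic user management could be modelled roughly as below. This is a sketch only: the source names the roles but not their rights, so the permission sets and the names `Role`, `PERMISSIONS` and `is_allowed` are assumptions.

```python
from enum import Enum


class Role(Enum):
    PUBLIC = "public"                # public users
    AUTHOR = "author"                # authors of the project
    ADMINISTRATOR = "administrator"  # project coordinator


# Assumed permission sets; adjust once the functional requirements are clarified.
PERMISSIONS = {
    Role.PUBLIC: {"read"},
    Role.AUTHOR: {"read", "edit", "upload", "link"},
    Role.ADMINISTRATOR: {"read", "edit", "upload", "link", "manage_users"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the given role may perform the given action."""
    return action in PERMISSIONS[role]
```

For example, `is_allowed(Role.PUBLIC, "upload")` is false under these assumptions, while `is_allowed(Role.AUTHOR, "upload")` is true.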

ToDo

Clarify:

  • which data to be supported in pilot phase
  • precise functional requirements for data management (Scenario level)
  • eSciDoc managed vs externally referenced data
  • formats (e.g. FITS) and how they should be supported (e.g. storage, search, visualization via external tools, etc.)
  • types of annotated sources and relating them to external services
  • available external client tools for demonstration (quick win)

Work distribution

Workpackages based on pilot approach

  • Wiki selection
  • Basic user management
  • Demonstrator solution
    • Architecture
      • main components (wiki, eSciDoc)
      • interaction between Wiki, eSciDoc repository, eSciDoc solutions
      • identification of existing services to be involved and, where needed, their modification
      • check whether existing eSciDoc solutions (PubMan) can be re-used and identify necessary modifications
    • Implementation

Required resources

  • new staff at institute(s)
  • new staff at MPDL
  • overall costs (human resources, hardware, other)

Organisational

  • Institutes involved

Check the possibility of starting with a pilot phase involving one or two institutes, to deliver quick and convincing results. After a first showcase, other institutes can join.

  • Responsible for proposal
  • Required budget (total, annual)

Meetings

15th Sept 2008

  • first brainstorming at MPDL/Munich

25th Sept 2008

  • updated presentation by Gerard/Jaiwon/Wolfgang (see under SVN of MPDL, restricted)
  • outcome:
    • First draft of MPDL project proposal by 10th Oct (Ulla, Natasa), focusing on requirements and approach
    • First draft of eSciDoc HowTo for definition of high-level requirements