AWOB
Revision as of 14:24, 17 October 2008
Preparation and planning for shared MPDL project "Scholarly Workbench for Astronomy"
Based on the experiences and outcomes of the German Astrophysical Virtual Observatory (GAVO)
Contacts@MPE: Jaiwon Kim, Gerard Lemson, Wolfgang Voges
Scenarios
Collaborative environment
Enable an easy, wiki-like setup of a collaborative environment for shared projects. Allow registered users to access the project, its pages and linked and/or uploaded data. Link the collaborative platform with the eSciDoc repository to allow long-term archiving and persistent identifiers (PIDs) for stored content.
Example of a shared project workflow (see details in slides (restricted)):
- definition of shared project and objectives
- definition of required experiments
- distribution of responsibilities
- tracking of activities and results: set up experiments, run experiments, produce data, postprocess data, analyse data, extract scientific results
- share data, combine results
- produce publication-ready paper (shared authoring)
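The tracking step in the workflow above can be pictured as an ordered series of stages that a shared project moves through. The Python sketch below is hypothetical: the stage names are taken from the list, while the `Stage` enum and the `ProjectTracker` class are invented for illustration and are not part of any wiki or eSciDoc API. It also simplifies by assuming a strictly linear progression.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical activity stages, in the order listed in the workflow."""
    SET_UP_EXPERIMENTS = 1
    RUN_EXPERIMENTS = 2
    PRODUCE_DATA = 3
    POSTPROCESS_DATA = 4
    ANALYSE_DATA = 5
    EXTRACT_RESULTS = 6


@dataclass
class ProjectTracker:
    """Records which project member completed which stage, enforcing the order."""
    project: str
    completed: list = field(default_factory=list)  # (Stage, member) pairs

    def next_stage(self):
        # The next stage follows the last completed one; None once all are done.
        if not self.completed:
            return Stage.SET_UP_EXPERIMENTS
        last = self.completed[-1][0]
        return None if last == Stage.EXTRACT_RESULTS else Stage(last + 1)

    def complete(self, stage, member):
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"expected {expected!r}, got {stage!r}")
        self.completed.append((stage, member))


tracker = ProjectTracker("shared-project")
tracker.complete(Stage.SET_UP_EXPERIMENTS, "author-a")
tracker.complete(Stage.RUN_EXPERIMENTS, "author-b")
print(tracker.next_stage().name)  # PRODUCE_DATA
```

A real platform would likely allow stages to be revisited (e.g. re-running experiments after analysis); the linear check is only there to keep the sketch short.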
Components of the collaborative work
- A collaborative work is a "publication in progress", developed in a wiki environment as an output of the research process.
- A collaborative work comprises:
- textual components: mostly metadata such as abstract, title, authors and subject headings, and structural information such as body sections and annotations of sources
- non-textual components: images, data tables, plots, diagrams, catalogs, etc., derived from and related to the research data
- integrated external tools: visualization of sources, querying of remote archives, etc.
Textual components
Textual components make it possible to:
- follow references to preprints, published papers, etc. (ADS, arXiv)
- look up annotated sources, i.e. astronomical objects, in databases such as Simbad and NED
- describe the collaborative work with metadata such as authors, abstract and subject headings
- describe the structure of the collaborative work
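Following a reference or looking up an annotated source mostly comes down to building a link to the external service. A minimal Python sketch, assuming the current public URL patterns of arXiv (`https://arxiv.org/abs/<id>`), ADS (`https://ui.adsabs.harvard.edu/abs/<bibcode>/abstract`) and SIMBAD (`sim-id?Ident=<name>`). These patterns are external assumptions that should be checked against each service, and the function names are hypothetical.

```python
from urllib.parse import quote, urlencode

ARXIV_ABS = "https://arxiv.org/abs/"
ADS_ABS = "https://ui.adsabs.harvard.edu/abs/"
SIMBAD_ID = "https://simbad.u-strasbg.fr/simbad/sim-id"


def arxiv_link(arxiv_id: str) -> str:
    """Link to the abstract page of an arXiv preprint (old ids keep their '/')."""
    return ARXIV_ABS + quote(arxiv_id)


def ads_link(bibcode: str) -> str:
    """Link to the ADS abstract page for a bibcode."""
    # Some bibcodes contain '&' (e.g. the A&A journal code), so encode everything.
    return ADS_ABS + quote(bibcode, safe="") + "/abstract"


def simbad_link(object_name: str) -> str:
    """Link to the SIMBAD identifier query for an annotated astronomical object."""
    return SIMBAD_ID + "?" + urlencode({"Ident": object_name})


print(arxiv_link("0810.1234"))  # https://arxiv.org/abs/0810.1234
print(simbad_link("M 31"))      # https://simbad.u-strasbg.fr/simbad/sim-id?Ident=M+31
```

Generating links rather than copying content keeps the wiki page lightweight; the external services remain the authoritative sources.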
Non-textual components
Non-textual components make it possible to:
- visually represent research data; this visual representation may have its own metadata (e.g. image metadata)
- show the metadata of a component, e.g. of an image
- invoke an external data collection viewer
- download data related to the component
- open an external tool for visualizing and working with the data
- for tabular data: e.g. TOPCAT
- for image data: e.g. Aladin
- for spectral data: e.g. SpecView, Splat, VOSpec
- PLASTIC-enabled
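The choice of external viewer per data type can be kept as a simple lookup table. The sketch below is hypothetical: the viewer names come from the list above, but the `Component` fields, the data-type vocabulary ("table", "image", "spectrum") and `suggest_viewers` are illustrative assumptions. Actually launching a tool (e.g. through PLASTIC) is out of scope here.

```python
from dataclasses import dataclass

# Viewers named in the list above, keyed by a hypothetical data-type vocabulary.
VIEWERS = {
    "table": ["TOPCAT"],
    "image": ["Aladin"],
    "spectrum": ["SpecView", "Splat", "VOSpec"],
}


@dataclass
class Component:
    """A non-textual component of a collaborative work (illustrative fields)."""
    title: str
    data_type: str  # assumed to be declared in the component's metadata
    data_url: str   # where the underlying data can be downloaded


def suggest_viewers(component: Component) -> list[str]:
    """Return candidate external viewers; an empty list if none is configured."""
    return list(VIEWERS.get(component.data_type.lower(), []))


catalog = Component("Source catalog", "table", "https://example.org/catalog.vot")
print(suggest_viewers(catalog))  # ['TOPCAT']
```

Relying on a declared data type rather than the file extension avoids guessing: a FITS file, for instance, can hold an image, a table or a spectrum.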
Integrated external tools
Integrated external tools make it possible to link from either textual or non-textual components to existing external astronomical services or tools, such as:
- astronomical services, either linked directly (e.g. ADS, arXiv, NED, Simbad, VizieR, SkyServer) or discovered through the registries of astronomical sources
- common analysis environments (IDL, ...)
- services for retrieval of image data (Simple Image Access, SIA)
- services for retrieval of spectra (Simple Spectral Access Protocol, SSA)
- services for retrieval of records from catalogs (Simple Cone Search, SCS)
- simulation databases, SimDB (Simulation Data Model and Simulation Data Access Protocol, SimDAP)
- invoke queries on external services that support query languages such as ADQL (via TAP)
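Several of the listed protocols are plain HTTP GET requests with standardised parameters. A minimal sketch of building request URLs for a Simple Cone Search (`RA`, `DEC` and search radius `SR`, all in decimal degrees) and for a synchronous TAP query in ADQL (`REQUEST=doQuery`, `LANG=ADQL` and `QUERY` sent to the `/sync` endpoint), following our reading of the IVOA specifications. The service base URLs and the table name `example.sources` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def _join(base_url: str, params: dict) -> str:
    """Append encoded parameters, respecting a query string already present."""
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode(params)


def cone_search_url(base_url: str, ra: float, dec: float, radius: float) -> str:
    """Simple Cone Search request: position and radius in decimal degrees."""
    if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0 and radius >= 0.0):
        raise ValueError("need 0 <= RA < 360, -90 <= DEC <= 90 and SR >= 0 (degrees)")
    return _join(base_url, {"RA": ra, "DEC": dec, "SR": radius})


def tap_sync_url(tap_base_url: str, adql: str) -> str:
    """Synchronous TAP request carrying an ADQL query."""
    return _join(tap_base_url.rstrip("/") + "/sync",
                 {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql})


# Hypothetical endpoints, used only to show the resulting URLs.
print(cone_search_url("https://example.org/scs", 10.68, 41.27, 0.1))
# https://example.org/scs?RA=10.68&DEC=41.27&SR=0.1
query = ("SELECT TOP 10 * FROM example.sources WHERE 1 = CONTAINS("
         "POINT('ICRS', ra, dec), CIRCLE('ICRS', 10.68, 41.27, 0.1))")
print(tap_sync_url("https://example.org/tap", query))
```

Because the requests are plain URLs, the wiki could store them as links next to a figure or table, so that readers can re-run the query that produced it.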
Sharing of content
Enable privileged users to upload and/or link data and to describe it with metadata, comments and notes.
- Standardised data: FITS, VOTable, Spectra, SQL query results
- Custom data (more input needed)
Types of data
- images (radio, optical, x-ray)
- images (e.g. simulations)
- spectra
- source catalog
- plot (e.g. spectrum)
- diagram
- flow chart
- illustration
- table (e.g. source catalog)
- publication (textual components)
Metadata to be supported
- Bibliographic metadata
- title, author, abstract, subject heading, journal metadata
- Structural metadata/elements
- section/TOC, annotation, footnote, equation, caption, references
- Other
- provenance (input files, make files, plotting scripts, analysis code, simulation code, ...)
- log files
- curation (more input needed)
- PIDs (ADS, IVOA)
- IVOA standards (VOTable, UCD, UTYPE, data models, data access protocols, ...)
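To illustrate what annotation with IVOA standards means for a data table, the following sketch writes a minimal VOTable whose columns carry units and UCDs, using only Python's standard library. The column names, values and the choice of the VOTable 1.3 namespace are assumptions for illustration; a production system would more likely rely on an existing VOTable library.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
ET.register_namespace("", VOTABLE_NS)  # serialise as the default namespace


def make_votable(columns, rows):
    """Build a VOTable string; columns are (name, datatype, unit, ucd) tuples."""
    ns = "{%s}" % VOTABLE_NS
    votable = ET.Element(ns + "VOTABLE", version="1.3")
    table = ET.SubElement(ET.SubElement(votable, ns + "RESOURCE"), ns + "TABLE")
    # One FIELD per column: the UCD states what the column means physically.
    for name, datatype, unit, ucd in columns:
        ET.SubElement(table, ns + "FIELD", name=name, datatype=datatype,
                      unit=unit, ucd=ucd)
    tabledata = ET.SubElement(ET.SubElement(table, ns + "DATA"), ns + "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, ns + "TR")
        for value in row:
            ET.SubElement(tr, ns + "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


columns = [
    ("ra", "double", "deg", "pos.eq.ra;meta.main"),
    ("dec", "double", "deg", "pos.eq.dec;meta.main"),
    ("flux", "float", "mJy", "phot.flux.density"),
]
print(make_votable(columns, [(10.68, 41.27, 3.2)]))
```

The UCDs are what allow standards-aware tools to recognise, for example, which columns hold the main position without relying on column names.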
Author tools
Author tools are provided to enable shared and standardised authoring. Authors are supported in developing publication-ready papers.
- Provision of a text editor (Emacs? TeX IDEs?)
- Import of LaTeX articles and conversion to HTML (incl. figures, tables)
- Templates for publication-ready papers (metadata attachments, links, figures, captions)
- Support for publication-ready figures produced by visualisation tools
MPDL project - draft
Summary
- online publications linked to/from online published data sets
- networking through standardisation
- collaboration enabled
- focusing on scientific practice (collaboration, publication) by re-using existing data centers, resource registries and standards, and by adding a scientific "workbench environment"
- no interruption of daily practice
- facilitate publishing of data
- online environment should support
- collaborative authoring for publications in virtual organisations
- explicit integration of data sets used for/in the final publication(s) by either uploading original data or linking to external data sets
- annotation of resources with metadata and identifiers according to IVOA standards
- value-added services on known data types (search, mining, visualization, analysis)
- interfaces to external archives/registries/catalogs via standard protocols
- integration of client tools (needed and known in the community)
- long-term preservation of resources (publications, data, services)
- registration of resources in IVOA standard registries
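Annotating resources with IVOA identifiers means attaching URIs of the form `ivo://<authority>/<resource-key>`, optionally followed by a query part (e.g. for a dataset within a resource). A sketch that splits such an identifier with the standard `urllib.parse` module; the example authority and resource key are hypothetical, and the checks are deliberately simpler than the full IVOA Identifiers specification.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class IvoId(NamedTuple):
    authority: str
    resource_key: str
    query: str     # e.g. a dataset part appended after '?'
    fragment: str


def parse_ivoid(identifier: str) -> IvoId:
    """Split an ivo:// identifier into its parts (simplified validation only)."""
    parts = urlsplit(identifier)  # urlsplit lower-cases the scheme
    if parts.scheme != "ivo" or not parts.netloc:
        raise ValueError(f"not an IVOA identifier: {identifier!r}")
    return IvoId(parts.netloc, parts.path.lstrip("/"), parts.query, parts.fragment)


ivoid = parse_ivoid("ivo://example.org/survey/images?obs42")
print(ivoid.authority, ivoid.resource_key, ivoid.query)
# example.org survey/images obs42
```

Keeping the authority separate makes it straightforward to check a resource against the registry entry of the publishing organisation.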
Background
- Results of German Astrophysical Virtual Observatory (GAVO)
- make results (data sets and services) of astronomical research easily available to community
- facilitate standardised publication of results (PIDs, Virtual Observatory standards, long-term archiving)
- focus on interoperability to enable networking (standards in use: IVOA) and automated discovery and re-use
- make use of standards-aware client tools and services (for cross-matching, visualisation, combination, data mining etc.)
- Re-use of data leads to more references and scientific improvements => proof of concept: Millennium Run
- community-based quality control: errors discovered by others have improved data quality
- still, as no formal revision history was kept, old/original data was lost
- currently there are no (or only limited) possibilities to add original data to a publication, only representations or shortened examples:
- e.g. only image representations of multi-dimensional data
- e.g. only representative samples of large collections (images, spectra, source catalogs)
- e.g. only static data
- enable the shift from large data centers/resource registries (based on IVOA: formal, machine-readable, homogeneous) to scientific practice, i.e. collaboration and publications (informal, human-readable, heterogeneous)
Needs
- GAVO
- stable storage and curation of data products needed
- stable environment for deploying Virtual Observatory protocols and other value-added services
- IVOA standard
- AstroGrid (?)
Wider context/Re-use for others
- Long-term storage of data sets used in a publication (cf. deposit mandate?)
- Open access to all results of scientific research online (cf. Berlin declaration?)
- Showcase for added value of implemented standards
- mandated by some funding agencies
- IVOA dataset identifier in use by ADS (main portal for astronomers)
- Integration of standards, stable infrastructure and web2.0 technologies to facilitate dynamic and collaborative environments (cf. eSciDoc?)
- Re-use for the astronomy community within the MPS
- MPI Astronomie (Heidelberg)
- MPI Astrophysik (Garching)
- MPI extraterrestrische Physik (Garching)
- MPI Gravitationsphysik (Golm)
- MPI Kernphysik (Heidelberg)
- MPI Physik (München)
- MPI Radioastronomie (Bonn)
- MPI Sonnensystemforschung (Katlenburg-Lindau)
Work description
Pilot phase
- Set up a community platform for creating shared projects, registering users and assigning privileges
- Analysis of existing community-based platforms for linking the community environment to the eSciDoc repository
- Xwiki
- Wiki2Fedora used for MatDL/NSDL
- other wiki software to be considered?
- Basic user management
- Author access - authors of the project
- Administrator access - project coordinator
- Public access - public users
- Linking
- to external data sets/services (URL-based)
- to eSciDoc resources (wiki extension supporting a special eSciDoc tag, or URL-based)
- Basic integration of the community platform with the eSciDoc pilot solution
- enable upload (or referencing) and description of data
- enable invocation of a selected external visualization tool from the eSciDoc pilot solution (e.g. for FITS data)
- integrate arXiv and ADS (if possible) as sources for fetching publications and preprints
- Explore possibilities for federated search (within community platform, eSciDoc repository, 1-2 external services)
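The basic user management in the pilot phase distinguishes author, administrator and public access. A minimal role-based sketch, assuming (the plan does not specify this) that public users may only read, authors may also edit and upload, and the project coordinator as administrator may additionally manage members. The role and permission names and the `Project` class are hypothetical.

```python
from enum import Enum


class Permission(Enum):
    READ = "read"
    EDIT = "edit"
    UPLOAD = "upload"
    MANAGE_MEMBERS = "manage_members"


# Assumed permission sets per role; the plan only names the roles.
ROLE_PERMISSIONS = {
    "public": {Permission.READ},
    "author": {Permission.READ, Permission.EDIT, Permission.UPLOAD},
    "administrator": set(Permission),  # every permission
}


class Project:
    def __init__(self, name: str, coordinator: str):
        self.name = name
        self.roles = {coordinator: "administrator"}

    def role_of(self, user: str) -> str:
        # Anyone not registered in the project is treated as a public user.
        return self.roles.get(user, "public")

    def can(self, user: str, permission: Permission) -> bool:
        return permission in ROLE_PERMISSIONS[self.role_of(user)]

    def add_author(self, acting_user: str, new_author: str) -> None:
        if not self.can(acting_user, Permission.MANAGE_MEMBERS):
            raise PermissionError(f"{acting_user} may not manage members")
        self.roles[new_author] = "author"


project = Project("shared-project", coordinator="coordinator")
project.add_author("coordinator", "author-a")
print(project.can("author-a", Permission.EDIT))   # True
print(project.can("visitor", Permission.UPLOAD))  # False
```

Treating unknown users as public keeps the default safe: access beyond reading has to be granted explicitly by the coordinator.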
ToDo
Clarify:
- which data are to be supported in the pilot phase
- precise functional requirements for data management (Scenario level)
- eSciDoc managed vs externally referenced data
- formats (e.g. FITS) and how they should be supported (e.g. storage, search, visualization via external tools, etc.)
- types of annotated sources and relating them to external services
- available external client tools for demonstration (quick win)
Work distribution
Work packages based on the pilot approach
- Wiki selection
- Basic user management
- Demonstrator solution
- Architecture
- main components (wiki, eSciDoc)
- interaction between Wiki, eSciDoc repository, eSciDoc solutions
- identification of existing services to be involved, and their possible modification
- checking whether existing eSciDoc solutions (PubMan) can be re-used, and identification of necessary modifications
- Implementation
- Architecture
Required resources
- new staff at institute(s)
- new staff at MPDL
- overall costs (human resources, hardware, other)
Organisational
- Institutes involved
Check the possibility of starting with a pilot phase involving one or two institutes, to deliver quick and convincing results. After a first showcase, other institutes can join.
- Responsible for proposal
- Required budget (total, annual)
Meetings
15 September 2008
- first brainstorming at MPDL/Munich
25 September 2008
- updated presentation by Gerard/Jaiwon/Wolfgang (see MPDL SVN (restricted))
- outcome:
- First draft of MPDL project proposal by 10 October (Ulla, Natasa), focusing on requirements and approach
- First draft of eSciDoc HowTo for definition of high-level requirements