Talk:MPDL Demonstrator Marine Microbiology

MPDL

Initial discussions

 * PubMan for publications about genome, probes etc.
 * Additional metadata in PubMan on location, environment, sequences, probes etc.
 * Project time: 2-4 weeks
 * possible integration with web searches via locators? e.g.

http://www.megx.net/gms/geographic-blast/job/54eae3fc5d43cc565a00745a6d131b4f/0.291878522839397/result.html#zoom=0&lat=0&lon=0&layers=BFF0FFFTFFF
http://beta.arb-silva.de/browser/lsu/FJ805885
 * Input from Meeting at MPI (Malte, Michael)
 * creating a collection of different publications (which might or might not be visible to others) and then allowing for certain typed tagging. Some of this tagging could be done by algorithms from the PDF (for example, for the geolocation); most of it would probably have to be entered manually by a certain role.
 * Alternatives
 * It could be a good idea to offer something that other institutes can easily customize for their own tags, so we can show the potential.
 * CoNE integration - something similar to Imeji profile editor
 * a new Imeji instance, which can just deal with remote-content (only metadata managed) and allows for the very same functionality like now for publications (album feature - maybe at some point also possible and needed)
 * integrate local systems by "knowing" about their URL components, to allow automatic direct linking of the geolocations or genomes found to external systems.
 * special area in the item display in PubMan where, once we have some specific metadata, we display information from external systems ad hoc to give more information about the item in PubMan.

Demonstrator proposal
(Michael, Natasa) With this kind of demonstrator we enter a new era of "publications linked to entities around which they are published": we would start with a smaller extension of PubMan and try to link to the original entities, which are usually located at the institute or in external databases. These entities may be RNA sequences, astronomical sources, other particular subjects of research etc. Note: the word "Entity" is used because publications may talk about different types of things, which are certainly entities in themselves. One may think of a better name - not critical for now. -- I would rather call that resource or, more specifically, primary data. MFranke 14:00, 18 November 2010 (UTC)

As these entities are usually maintained externally, we would not maintain their data in CoNE unless this is explicitly required and the data are maintained by the institute/users. In that case no modifications to the underlying proposal are needed, as CoNE would be treated the same as any other entity storage service.


 * Publications will be stored in the publication repository (PubMan)
 * Publication metadata will be extended with an additional property - primary data reference.
 * Primary data references consist of a general set of metadata and, optionally, a specific set of metadata (depending on Entity type and user requirements).
 * General set of metadata: title, identifier, classification (subject), project, description, creator (standard person/organization), type of entity (see CoNE integration below) -- IMO creator is too specific to make it into the general set MFranke 14:00, 18 November 2010 (UTC)
 * The specific set of metadata is entity-type dependent and optional. This would have to be checked with the institute directly. Note: there is a strong recommendation to RDFize the latter, to enable linked data. -- Who is recommending that? MFranke 14:00, 18 November 2010 (UTC)
 * Natasa recommends it for now. Overall, there are very interesting tools and possibilities for queries and for various visualizations if the data are simply RDFized. --Natasa 12:42, 26 November 2010 (UTC)
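
To make the shape of a primary data reference more concrete, here is a minimal sketch in Python. It is not an agreed data model: the class names, the `to_triples` helper and the `http://example.org/terms/` namespace are assumptions made up for illustration, and flattening to plain tuples is only a crude stand-in for real RDFizing. Creator is kept optional because its place in the general set is disputed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PrimaryDataReference:
    """General metadata set from the proposal plus an optional specific set."""
    title: str
    identifier: str
    entity_type: str                      # name of an entry in the "Entity types" vocabulary
    classification: Optional[str] = None  # subject
    project: Optional[str] = None
    description: Optional[str] = None
    creator: Optional[str] = None         # disputed in the discussion, so optional here
    specific: Dict[str, str] = field(default_factory=dict)  # entity-type dependent


@dataclass
class Publication:
    title: str
    primary_data: List[PrimaryDataReference] = field(default_factory=list)


GENERAL_FIELDS = ("title", "identifier", "entity_type", "classification",
                  "project", "description", "creator")


def to_triples(ref: PrimaryDataReference,
               base: str = "http://example.org/terms/") -> List[Tuple[str, str, str]]:
    """Flatten a reference into (subject, predicate, object) tuples.

    The base namespace is hypothetical; unset general fields are skipped.
    """
    triples = []
    for name in GENERAL_FIELDS:
        value = getattr(ref, name)
        if value is not None:
            triples.append((ref.identifier, base + name, value))
    for name, value in ref.specific.items():
        triples.append((ref.identifier, base + name, value))
    return triples


# Illustrative values only; none of this is real institute data.
ref = PrimaryDataReference(title="Example sequence", identifier="urn:example:1",
                           entity_type="RNA sequence", specific={"habitat": "marine"})
paper = Publication(title="Example publication", primary_data=[ref])
for triple in to_triples(ref):
    print(triple)
```

Keeping the specific set as a free-form mapping leaves room for each institute to define its own fields without touching the general set.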

CoNE integration

 * a vocabulary "Entity types" will be created in CoNE. This vocabulary holds entity types that can be related to a publication.
 * Vocabulary metadata:
 * Entity type name
 * Description
 * Related subject classification
 * DaaS service for fetching of entity metadata (pattern URLs)
 * additional "viewer" services (one or more depending on the entity type, pattern URLs)
 * etc. will be detailed in the implementation
 * Subject classification for the Entities can be imported as an additional subject classification vocabulary and maintained on a regular basis by the institute that uses it
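
As a rough illustration of what one "Entity types" vocabulary entry with pattern URLs might look like, here is a Python sketch. The `{id}` placeholder syntax, the `EntityType` class, the `expand` helper and the `example.org` metadata URL are all assumptions; the viewer pattern is merely generalized from the SILVA browser link quoted in the initial discussions and is not a confirmed service interface.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import quote


@dataclass
class EntityType:
    """One hypothetical entry of the "Entity types" vocabulary."""
    name: str
    description: str
    subject_classification: str
    metadata_url_pattern: str  # service for fetching entity metadata
    viewer_url_patterns: List[str] = field(default_factory=list)  # additional "viewer" services


def expand(pattern: str, identifier: str) -> str:
    """Replace the assumed {id} placeholder with the URL-encoded identifier."""
    return pattern.replace("{id}", quote(identifier, safe=""))


lsu = EntityType(
    name="LSU rRNA sequence",
    description="Illustrative entry only",
    subject_classification="(to be supplied by the institute)",
    metadata_url_pattern="http://example.org/metadata/lsu/{id}",      # hypothetical
    viewer_url_patterns=["http://beta.arb-silva.de/browser/lsu/{id}"],  # assumed pattern
)

print(expand(lsu.viewer_url_patterns[0], "FJ805885"))
# prints http://beta.arb-silva.de/browser/lsu/FJ805885
```

Storing only patterns in the vocabulary would mean that adding a new external viewer is a vocabulary change rather than a code change.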

PubMan extensions

 * Submission
 * additional entry mask for related entities (will be invoked similarly to Local Tags from the Publication item)
 * user may relate one or more entities of different types to a publication (potentially a context-level setting on allowed entity types could be applied - via the context admin descriptor)
 * when the user selects an entity type and enters the entity identifier (or URL), PubMan will fetch the metadata for this entity based on the settings for its type in CoNE (involved services: DaaS, Transformation) and populate the entry form. Note: a prerequisite is that there are online services from which data on related entities can be fetched.
 * View Item
 * Related entities can be displayed in a separate tab (preliminary proposal, finally to be checked with GUI)
 * For each of the entities, additional icons can be served to invoke external services (as many as there are) - see CoNE integration - definitions would be provided in CoNE


 * Search
 * the stylesheet has to be extended to index additional metadata records together with the publications (easy)
 * browse-by for the specific subject classification in use for publications would also be possible (already possible now)
 * as of the next core service release, it will be possible to define arbitrary search indexes, so we can have a separate search index for entities only
 * the latter would in future enable browse/search by entity types/entities
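
The submission step described above (select an entity type, enter an identifier or URL, fetch metadata, populate the form) could be sketched roughly as follows. This is a sketch under stated assumptions, not the PubMan implementation: `prefill_form`, `identifier_from_input`, the `{id}` placeholder, the field map and the stub fetcher are all hypothetical, with `fetch` standing in for the DaaS call and `field_map` for the Transformation step. Taking the last path segment of a URL as the identifier is also only an assumption.

```python
from typing import Callable, Dict
from urllib.parse import quote, urlparse


def identifier_from_input(user_input: str) -> str:
    """Accept a bare identifier, or a URL whose last path segment is the identifier."""
    text = user_input.strip()
    if text.startswith(("http://", "https://")):
        path = urlparse(text).path.rstrip("/")
        return path.rsplit("/", 1)[-1]
    return text


def prefill_form(user_input: str,
                 metadata_url_pattern: str,
                 fetch: Callable[[str], Dict[str, str]],
                 field_map: Dict[str, str]) -> Dict[str, str]:
    """Fetch entity metadata and map it onto entry-form fields."""
    identifier = identifier_from_input(user_input)
    url = metadata_url_pattern.replace("{id}", quote(identifier, safe=""))
    record = fetch(url)  # stands in for the DaaS call
    form = {"identifier": identifier}
    for source_key, form_field in field_map.items():  # stands in for Transformation
        if source_key in record:
            form[form_field] = record[source_key]
    return form


def stub_fetch(url: str) -> Dict[str, str]:
    """Offline stand-in for a remote metadata service; returns made-up values."""
    return {"name": "Example entity", "desc": "Made-up description", "unused": "x"}


form = prefill_form("http://beta.arb-silva.de/browser/lsu/FJ805885",
                    "http://example.org/metadata/lsu/{id}",  # hypothetical service
                    stub_fetch,
                    {"name": "title", "desc": "description"})
print(form)
```

Passing the fetcher in as a parameter keeps the sketch runnable without network access and makes the dependency on an online service explicit.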

ToDos

 * Check which services the institute may offer (exports of metadata, export of subject classifications etc.)
 * Agree on the proposal

Demonstrator proposal decision

 * an additional set of ideas has been developed for the demonstrator proposal
 * see Annotator service

Important things

 * Chemistry - analysis, analytical chemistry, nucleic acid analyses, genome analysis and metagenomics, field work, mathematical simulations, experimental visualizations
 * Publications
 * Expeditions
 * Projects (experimental - field research and probes)
 * Locations
 * Facilities and methods
 * RNA sequences for all three domains of life (Bacteria, Archaea and Eukarya)
 * FISH and probes - online resource for information on the identification of individual microbial cells by fluorescence in situ hybridization (FISH) with ribosomal RNA-targeted oligonucleotide probes
 * Protocols of the SILVA ribosomal database project (e.g. http://www.arb-silva.de/fileadmin/graphics_fish/SILVA_FISH_protocols_fixation_100618.pdf)
 * Marine ecological genomics, integrating genomic, metagenomic and ribosomal RNA data with primary environmental data and curated metadata
 * SILVA - on-line resource for quality checked and aligned ribosomal RNA sequence data, free for academic use
 * contextual data (data about data), see also http://www.arb-silva.de/projects/contextual-data/
 * contextual data, see also http://gensc.org/gc_wiki/index.php/MIENS
 * What is the difference between these two kinds of contextual data? --Kristina 11:14, 25 January 2011 (UTC)
 * Blogs from the expeditions and probes, see http://wissenschafts-blog.abendblatt.de

--> Actually, I do not understand the purpose of this list (important things) here. Some items are only terms, others are linked to some data. I think it would be useful to explain the items listed here in more detail, because at the moment I do not understand this part. --Kristina 11:14, 25 January 2011 (UTC)

MFranke 14:08, 3 February 2011 (UTC) -- I would suggest involving UIE/SvM to define the workflow and the graphical interfaces.

MFranke 14:08, 3 February 2011 (UTC) -- This would require CitMan to query the Annotator store once for each item. I fear this might affect its performance. Maybe we can find another way to feed web pages with that information.
 * this might be an issue for sure. The question is whether CitMan needs to query the Annotator store itself, or whether we need to provide a proper export that queries the Annotator store upfront and feeds CitMan with the necessary data. We shall think about the technicalities once we start the implementation. Ideally, we would have a single XML/RDF document to serve to CitMan --Natasa 12:47, 9 February 2011 (UTC)
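
To illustrate the "query upfront" alternative discussed above, here is a small Python sketch in which an export step asks the store once for all items and hands the merged records on. Everything in it is hypothetical: `AnnotatorStoreStub`, its `get_many` method and `export_with_annotations` are made-up names, and the real Annotator store may not offer a batch lookup at all.

```python
from typing import Dict, Iterable, List


class AnnotatorStoreStub:
    """In-memory stand-in for the Annotator store that counts round trips."""

    def __init__(self, annotations: Dict[str, List[str]]):
        self._annotations = annotations
        self.queries = 0

    def get_many(self, item_ids: Iterable[str]) -> Dict[str, List[str]]:
        """Return the annotations for all requested items in one query."""
        self.queries += 1
        return {item_id: self._annotations.get(item_id, []) for item_id in item_ids}


def export_with_annotations(items: List[dict], store: AnnotatorStoreStub) -> List[dict]:
    """Attach annotations to every item using a single store query."""
    by_id = store.get_many([item["id"] for item in items])
    return [dict(item, annotations=by_id[item["id"]]) for item in items]


store = AnnotatorStoreStub({"item-1": ["example annotation"]})
records = export_with_annotations([{"id": "item-1"}, {"id": "item-2"}], store)
print(store.queries)  # prints 1
print(records)
```

With this shape the consumer never talks to the store directly, so the number of store queries stays at one per export regardless of how many items are rendered.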