Trip Report: Elsevier Visit 13th September 2007
Participants: Curt E. Kohler, Anita de Waard, Barbara Kalumenos, Malte Dreyer, Ulla Tschida, Inga Overkamp
Introducing "Elsevier Labs" & their academic collaborations
Motto: "Scientists publish everywhere - and need to get back all relevant things whenever they need"
Strategy: Establish a web of entities to improve search, retrieval and interlinking of resources
Areas of interest:
- semantic entities, e.g. publications, structures, proteins, etc.
- author support, e.g. to help authors tag content semantically
- research data, which is becoming more important and needs to be stored as well as related to other entities
- desktop tools, e.g. for search and retrieval
- virtual communities, e.g. for scientific collaboration
Exemplary Projects:
- BioImage (UK): to arrive at a metadata standard for images
- DOPE for Economics (University of Mannheim): semantic entities; visualizing content concepts in economics
- Metadata Madness (neuroscience editors): authoring tool to semantically enrich neuroscience publications with bibliographic references, biological references and multimedia entities
- OKKAM (many partners): architecture to build a global web of entities
- Pragmatic Research Article (University of Utrecht). Final goal: To develop a structure to find out how the knowledge is derived from/represented in a publication
- Analyzing the storyboards of scientific papers. Finding: 3-5 episodes and then resolution.
- Analyzing rhetorical moves. Finding: Problems are described in "past" forms
Metadata in the SD publication process
In ScienceDirect each publication is represented by one main document (in SGML/XML, with only a few structured parts inside)
The following metadata are available to represent the article front matter:
- doi/pii
- keywords
- authors
- affiliations
- pagination
- document type
- article title (fulltext, abstract & pdf)
- dates
- issn
- article title
- unique issue key
- email addresses
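As a rough illustration of how such a front-matter record could be modelled by a consuming application, here is a minimal sketch. The class name, field names and types are assumptions made for this example; they do not reflect the actual ScienceDirect SGML/XML schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FrontMatter:
    """Hypothetical container for the front-matter fields listed above."""
    doi: Optional[str] = None
    pii: Optional[str] = None
    article_title: str = ""
    document_type: str = ""
    issn: str = ""
    issue_key: str = ""          # the "unique issue key"
    pagination: str = ""         # kept as free text, e.g. a page range
    authors: List[str] = field(default_factory=list)
    affiliations: List[str] = field(default_factory=list)
    email_addresses: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # Assumed shape: a label such as "received" mapped to a date string.
    dates: Dict[str, str] = field(default_factory=dict)


# Usage with made-up values:
record = FrontMatter(article_title="An example article", keywords=["metadata"])
record.authors.append("A. Author")
```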
In addition, SD derives further metadata from the full text by applying pattern-matching software. Entities that are searched for in the full text include, e.g.
- dois/piis, arXiv preprint references
- urls
- DNA sequence references, chemical structures, EMBL nucleotide sequences, OMIM (NCBI)
Afterwards, the copy editors are asked to confirm the suggestions and the document structure is modified to insert markup. Not all e-products leverage all of these entities.
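The extract-then-confirm workflow described above can be sketched roughly as follows. This is only an illustration: the regular expressions and the function name `suggest_entities` are assumptions made for this example, not the patterns or software used by ScienceDirect, and the DOI and URL in the sample text are made up. The arXiv pattern covers only the numeric identifier scheme.

```python
import re

# Illustrative patterns only (assumptions, not ScienceDirect's actual rules).
PATTERNS = {
    "doi": re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+"),
    "arxiv": re.compile(r"\barXiv:\d{4}\.\d{4,5}(?:v\d+)?", re.IGNORECASE),
    "url": re.compile(r"https?://[^\s\"<>]+"),
}


def suggest_entities(fulltext):
    """Return candidate entities as (kind, text, start_offset), sorted by position.

    The result is a list of suggestions; a copy editor would still have to
    confirm each one before markup is inserted into the document.
    """
    suggestions = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(fulltext):
            # Drop trailing punctuation that belongs to the sentence.
            text = match.group().rstrip(".,;)")
            suggestions.append((kind, text, match.start()))
    return sorted(suggestions, key=lambda s: s[2])


sample = ("See doi 10.1234/example.2007.001, the preprint "
          "arXiv:0704.0001v2 and http://www.example.org/data.")
for kind, text, offset in suggest_entities(sample):
    print(kind, text, offset)
```

Keeping the character offset with each suggestion is a design choice of this sketch: it would let a later step insert markup at the right position once a suggestion has been confirmed.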
Side tracks & Result
- Anita pointed us to CATCH (Continuous Access to Cultural Heritage), a Dutch funding program to develop ontologies for describing cultural heritage objects
- Get in contact again when biomedical/neuroscience issues come up (e.g. protein identification)
- Metadata extraction is of great interest for MPS