Trip Report: Elsevier Visit 13th September 2007

Participants: Curt E. Kohler, Anita de Waard, Barbara Kalumenos, Malte Dreyer, Ulla Tschida, Inga Overkamp

Summary: "Elsevier Labs" (formerly known as "Advanced Technology Group") is a group within the publisher Elsevier Science Ltd. The group is responsible for developing new products as well as evaluating new technologies. They are interested in next-generation architectures for scholarly applications. Their visit to MPDL was driven by an interest in learning more about eSciDoc and exploring potential areas of cooperation.

Result: Although we found a range of issues that both institutions are interested in, no concrete project was envisioned. We may nevertheless keep in contact for further information exchange.

Introducing "Elsevier Labs" & their academic collaborations

Motto: "Scientists publish everywhere - and need to get back all relevant things whenever they need"

Strategy: Establish a web of entities to improve search, retrieval and interlinking of resources

Areas of interest:

semantic entities, e.g. publications, structures, proteins, etc.
author support, e.g. to help tagging the content semantically
research data, which are becoming more important and need to be stored as well as related to other entities
desktop tools, e.g. for search and retrieval
virtual communities, e.g. for scientific collaboration

Exemplary Projects:

BioImage (UK): to come up with a metadata standard for images
DOPE for Economics (University of Mannheim): semantic entities; visualizing content concepts in economics
Metadata Madness (neuroscience editors): authoring tool to semantically enrich neuroscience publications with bibliographic references, biological references, multimedia entities
OKKAM (many partners): architecture to build a global web of entities
Pragmatic Research Article (University of Utrecht). Final goal: To develop a structure to find out how the knowledge is derived from/represented in a publication
- Analyzing the storyboards of scientific papers. Finding: 3-5 episodes and then resolution.
- Analyzing rhetorical moves. Finding: Problems are described in "past" forms

Metadata in the SD publication process

In ScienceDirect (SD), each publication is represented by one main document (in SGML/XML, with few structured parts inside).

The following metadata are available to represent the article front matter:

doi/pii
keywords
authors
affiliations
pagination
document type
article title (fulltext, abstract & pdf)
dates
issn
article title
unique issue key
email addresses

In addition, SD derives some additional metadata from the full text by applying pattern-matching software. Entities recognized in the full text include, e.g.

dois/piis, arxiv preprint references
urls
DNA sequence references, chemical structures, EMBL nucleotide sequences, OMIM (NCBI)

Afterwards, the copy editors are asked to confirm the suggestions and the document structure is modified to insert markup. Not all e-products leverage all of these entities.
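The pattern-matching step described above could look roughly like the following minimal Python sketch. It is only an illustration: the regular expressions, the names PATTERNS and suggest_entities, and the sample sentence are assumptions made for this example and do not reflect the software or rules that SD actually uses. It covers only DOIs, arXiv preprint identifiers in the numeric format introduced in 2007, and URLs.

```python
import re

# Hypothetical, simplified patterns (assumptions for illustration only,
# not the rules used by the SD production system).
PATTERNS = {
    "doi": re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+"),
    "arxiv": re.compile(r"\barXiv:\d{4}\.\d{4,5}(?:v\d+)?", re.IGNORECASE),
    "url": re.compile(r"https?://[^\s\"<>]+"),
}


def suggest_entities(text):
    """Return (kind, matched_text, start_offset) suggestions sorted by position."""
    suggestions = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Drop trailing sentence punctuation that the pattern swallowed.
            value = match.group(0).rstrip(".,;)")
            suggestions.append((kind, value, match.start()))
    return sorted(suggestions, key=lambda s: s[2])


sample = (
    "See doi:10.1016/j.cell.2007.01.001, arXiv:0706.1234v2 "
    "and http://www.example.org/data."
)
for kind, value, offset in suggest_entities(sample):
    print(kind, value, offset)
```

The function only produces suggestions with their character offsets. In a workflow like the one described, each suggestion would still be confirmed by a copy editor before any markup is inserted into the document.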

Elsevier and Long Term Archiving

The long-term strategy of Elsevier Ltd comprises:

Dark archives, e.g. at the Koninklijke Bibliotheek in the Netherlands. Dark archives are not available for public access, but can be opened to all Elsevier customers in case Elsevier goes bankrupt
"de facto" archives, i.e. installations of the SD software run by library consortia (e.g. Hebis)
participation in (C)LOCKSS and Portico

Side tracks & Result

Anita pointed us to CATCH (Continuous Access to Cultural Heritage), a Dutch funding program to develop ontologies to describe cultural heritage objects
Get in contact again, when biomedical/neuroscience issues come up (e.g. protein identification, etc.)
Metadata extraction is of great interest to MPS
