Living Sources in Lexical Description





Revision as of 09:33, 19 May 2008



The Living Sources concept

Summary: Living Sources is an infrastructure to publish scientific data.

Motivation: Why do we need a Living Sources concept?

Current situation:

In contrast to the common practice of publishing and discussing research results, most scientists currently do not disclose the underlying research data. They do not make the data available to a wider audience for various reasons:

  • quality reasons (e.g. the data collection is not finished, it is not properly cross-checked, or the data is not complete)
  • fear of plagiarism (others might not properly acknowledge the data)
  • loss of control over interpretation (others might misunderstand the data, with undeserved blame being cast on the original creator of the data)
  • loss of primacy of discovery (others might come up with important discoveries that the original creator also observed, but did not have time to work out and publish)
  • limited scientific recognition for making data available
  • failure to see wider applicability of data ("Why would anybody be interested in this?")

Solution:

  • securing of scientific recognition and citability
  • incremental publication possible

New possibilities:

  • comments on individual datapoints (discussion)
  • open peer review scheme

Strategy/How to proceed

  • Bottom-up identification of a field that is in need of a concept like Living Sources (science-driven!)

What qualifies the Living Sources idea?:

  • High level quality
  • Support from scientists
  • Editorial board (technical checks, organisation of field)
  • Peer review (content check)
  • bonus: various material is already available (no start from scratch)

Two complementary scenarios:

  • Build-up of a technical infrastructure which enhances the usability of datasets (one stop shop, comparability, searchability, persistence, etc. Envisioned user group: scientists who look for a hosting environment)
  • Standards of interoperability of data portals/journals/archives with a common search engine/browser-like tool (envisioned user group: scientists who want to keep a strong hold on their data)
  • Persistence of data is secured for data submitted to the system (grid-like backup)

Living Sources in Lexical Description

First implementation of the Living Sources concept

Scientific scope

  • Lexical data, with a view to language description and analysis
  • Linguistics (Psycholinguistics, Ethnolinguistics, Lexicography, Terminology, Dialectology, Computational Linguistics)

Infrastructure

Technical issues:

  • Formats (TMF, LMF, TEI/dic.)
  • Technical infrastructure: Lexus (MPI for Psycholinguistics, Nijmegen), eSciDoc
  • Unique identification of data objects
  • Direct reusability of data (local databases, linking of databases)
  • Formats for commentaries
  • Formats for orthography profiles
  • Citation structure (receipts, recipe, granularity)

Means

Needed man-power: Lexical Curator


Functional specification/Requirements

Submission:

Required information, seen as a preface:

  • scientific background/research field
  • editorial background/rationale of the data
  • selection criteria: e.g. sampling, fields, etc.
  • data category/use of data: e.g. ODD specification, schema, specification of orthography, terminology specification etc.
  • links to other databases/sources

Required information about the data itself:

  • upload vs. URL
  • upload on Lexus
  • fulltext/XML
  • webservice
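
The required submission information above can be pictured as a small record with a completeness check. The following Python sketch is only an illustration: the class names, field names, and the three delivery modes ("upload", "url", "webservice") are assumptions derived from the bullet points, not part of any existing Living Sources or Lexus interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed delivery modes, taken loosely from the list above (hypothetical labels).
DELIVERY_MODES = {"upload", "url", "webservice"}


@dataclass
class Preface:
    """Descriptive preface of a submission (hypothetical field names)."""
    research_field: str
    rationale: str
    selection_criteria: str
    data_category: str
    related_sources: List[str] = field(default_factory=list)


@dataclass
class Submission:
    preface: Preface
    delivery: str                   # expected to be one of DELIVERY_MODES
    location: Optional[str] = None  # URL or endpoint when data is not uploaded


def validate(sub: Submission) -> List[str]:
    """Return a list of problems; an empty list means the record is complete."""
    problems = []
    for name in ("research_field", "rationale", "selection_criteria", "data_category"):
        if not getattr(sub.preface, name).strip():
            problems.append(f"preface.{name} is empty")
    if sub.delivery not in DELIVERY_MODES:
        problems.append(f"unknown delivery mode: {sub.delivery}")
    elif sub.delivery != "upload" and not sub.location:
        problems.append("location is required when data is not uploaded")
    return problems
```

A check of this kind would belong to the editorial (technical) step rather than to the peer review, since it only tests whether the preface is present, not whether it is convincing.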

Concept of an open submission and peer review

step 1: Technical check (by editors)

  • (possibly closed) submission to editors
  • editorial check on technical issues (data structure, terminology, preface, etc.)
  • possible retraction for scientific check
  • data remain submitted (possibly with restricted access)
  • these steps can be iterated (each iteration should be time-restricted)

step 2: Content check (by peers)

  • open peer-review submission (time-restricted)
  • critical assessment of the submission as a whole (i.e. commentary on the preface, not on individual entries), leading to a decision on acceptance. This assessment should be kept separate from commentary on individual entries of the data.
  • individual errors/shortcomings can and should be corrected, but should not block publication.
  • result: a published database, meaning "the principle of collecting and organising the data is good, though there might be discussion about individual items"
  • different publication statuses: e.g. "wordlist", "wordform collection (including frequencies, collocations, etc.)", "word field (Wortfeld)", "language-particular dictionary", "comparative dictionary"

Once a submission has passed the technical step (which is already a large hurdle for many traditional lexicographers), it is technically published. We would like to encourage people to publish smaller amounts of data, but such smaller datasets should of course be distinguished from large publications (for example complete dictionaries). To allow for different kinds of publications, some kind of stratification is needed. This stratification of publications will happen through the (open) peer review system.

The two basic modes of publication are "Wordlist" (for onomasiological submissions) and "Wordform collection" (for semasiological submissions). These 'stamps' are given after the technical check, and it is thus actually not very rewarding to have just one of these labels (cf. a lower-rate journal). To get into one of the more rewarding categories,


step 3: living commentary and growth of data

  • addition of more data, corrections, versions
  • discussion about individual items (not time-restricted)
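
The three steps above can be read as a small state machine. The sketch below is one possible interpretation, not a specification: the status names, the event names, and in particular the rule that a submission not accepted by the peers falls back to "technically published" are assumptions, since the text does not say what happens in that case.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    TECHNICALLY_PUBLISHED = "technically published"
    IN_PEER_REVIEW = "in open peer review"
    PUBLISHED = "published"


# (current status, event) -> next status; event names are hypothetical.
TRANSITIONS = {
    # Step 1: the technical check may be iterated until it passes.
    (Status.SUBMITTED, "technical_check_failed"): Status.SUBMITTED,
    (Status.SUBMITTED, "technical_check_passed"): Status.TECHNICALLY_PUBLISHED,
    # Step 2: open, time-restricted content check by peers.
    (Status.TECHNICALLY_PUBLISHED, "open_peer_review"): Status.IN_PEER_REVIEW,
    (Status.IN_PEER_REVIEW, "accepted"): Status.PUBLISHED,
    (Status.IN_PEER_REVIEW, "not_accepted"): Status.TECHNICALLY_PUBLISHED,  # assumption
    # Step 3: living commentary and growth do not change the status.
    (Status.PUBLISHED, "data_added"): Status.PUBLISHED,
    (Status.PUBLISHED, "item_discussed"): Status.PUBLISHED,
}


def advance(status: Status, event: str) -> Status:
    """Apply one event; raise ValueError if it is not allowed in this status."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in status {status.value!r}")


def run(events, start: Status = Status.SUBMITTED) -> Status:
    """Replay a sequence of events and return the final status."""
    status = start
    for event in events:
        status = advance(status, event)
    return status
```

For example, `run(["technical_check_failed", "technical_check_passed", "open_peer_review", "accepted"])` ends in `Status.PUBLISHED`, whereas trying to accept a submission that has not passed the technical check raises an error.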

Open issues:

  • check if Living Reviews infrastructure for the peer review process can be re-used
  • need of a sampling strategy on the data
  1. sample of full entries
  2. full overview of specific fields (e.g. all parts of speech, all etymological fields)
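
The two sampling options can be sketched as follows. This is only an illustration under simple assumptions: entries are represented as plain dictionaries, and the field names used in the example (such as "pos") are hypothetical.

```python
import random
from typing import Dict, List

Entry = Dict[str, str]


def sample_full_entries(entries: List[Entry], k: int, seed: int = 0) -> List[Entry]:
    """Option 1: draw up to k complete entries, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(entries, min(k, len(entries)))


def field_overview(entries: List[Entry], field_name: str) -> List[str]:
    """Option 2: collect every value of one field (e.g. all parts of speech)."""
    return [entry[field_name] for entry in entries if field_name in entry]


# Hypothetical example data.
lexicon = [
    {"lemma": "house", "pos": "noun"},
    {"lemma": "run", "pos": "verb"},
    {"lemma": "blue", "pos": "adjective"},
    {"lemma": "hm"},  # entry without a part-of-speech field
]
reviewer_sample = sample_full_entries(lexicon, k=2)
pos_values = field_overview(lexicon, "pos")
```

A fixed seed lets several reviewers inspect the same sample, which may matter if the assessment is to be discussed openly.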

Rights

  • Open Access
  • Creative Commons Licence for data and metadata (by default: attribution)
  • No copyright transfer
  • agreement with authors that Living Sources in Lexical Description has the right to store and distribute the data under the Creative Commons Licence


Miscellaneous

  • possibility of third party commentaries by any registered user


Support

  • Potential scientific support from MPI for Psycholinguistics, Nijmegen, MPI for Evolutionary Anthropology, Leipzig, and other Max Planck Institutes
  • Potential financial support: ESF call BABEL, Volkswagenstiftung, Heinz-Nixdorf-Stiftung

Other

  • applied for domains livingsources.org, livingsources.com, livingsources.eu (request processed by AEI Potsdam)