Difference between revisions of "Talk:Service for Control of Named Entities"

From MPDLMediaWiki
Line 227: Line 227:


: Series titles may be handled via the journal authority file as well --[[User:Inga|Inga]] 12:19, 28 December 2007 (CET)
== Potential external sources ==
The tables give an overview of potential sources of controlled named entities that are of interest. The information reflects the current situation and has to be updated from time to time. The tables are a work in progress, and other sources may be added.
=== Person ===
Person(s), i.e. full name of persons (authors, editors, referees, etc.)
{|style="font-size=50%" border="1"
! Name of service                               
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
| Library of Congress Name Authority Service 
| To be evaluated in detail
(likely not to cover too many MPG authors)
| [http://www.oclc.org/research/researchworks/authority/default.htm Introduction]
[http://alcme.oclc.org/eprintsUK/services/NACOMatch?wsdl WSDL]
http://authorities.loc.gov
|MARCXML
| SOAP
WSDL
|Records are free of charge<ref name="loc">Information from web site: "users do not have to register or request permission to search, save, print, or email the LC authority records. The only limitation is that authority records may only be saved, printed or emailed one at a time."</ref>
|via [http://authorities.loc.gov web site]
|-
|Personennormdatei (PND)
|approx. 2.6 million names (1 million with individualized records)
To be evaluated in detail (likely not to cover too many MPG authors)
| [http://www.ddb.de/standardisierung/normdateien/pnd.htm Introduction]
|MAB2
USMARC
SUTRS
|Z39.50
|PND, GKD and SWD are only available in combination
[http://www.ddb.de/service/pdf/normdaten_cd.pdf costs]
|CD-ROM (2 CDs) as cumulative new editions, published twice a year in January and July <br/> PND is licensed in MPS; the database is available via the Aleph server<ref>https://dev.livingreviews.org/projects/vlib/wiki/AuthFiles</ref>
|-
|Virtual International Authority File (VIAF)
|First prototype covers LC and DNB personal name authority and related bibliographic records
|[http://www.oclc.org/research/projects/viaf project web site]
|MARC21 (?)
|
|
|Prototype system available at:
http://viaf.org
|-
|Computer Science Bibliography (DBLP)
|Computer Science
|http://dblp.uni-trier.de
|HTML, XML
|
|
|
|-
|Wikipedia Persondata
|
|[http://en.wikipedia.org/wiki/Wikipedia:Persondata info] [http://download.wikimedia.org/ data dump]
|HTML
|
|
|
|}
=== Corporate body ===
{|style="font-size=50%" border="1"
! Name of service                               
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
|Körperschaftsnormdatei (GKD)
|More than 1 million records (German and foreign corporate bodies and conferences)
|[http://www.ddb.de/standardisierung/normdateien/gkd.htm Introduction]
|MAB2
|Z39.50
|see PND
|see PND
|}
=== Journal ===
{|style="font-size=50%" border="1"
! Name of service                               
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
|Zeitschriftendatenbank (ZDB)
|approx. 1.3 million records
|[http://www.zeitschriftendatenbank.de/datendienste/index.html Introduction]
|MAB2, UNIMARC, SUTRS
|Z39.50
|
|To be clarified with the GWDG whether a tailored version of the ZDB (listing only MPG-licensed journals) is available.
|-
|ISSN Register
|1,284,413 records (2006)
|http://www.issn.org
|MARC21, UNIMARC
|Z39.50
|[http://www.issn.org/files/active/0/Order%20form%202008%20and%20Licence%20Agreement.pdf costs]
|Access via the ISSN portal, via Z39.50, or via a combination of both
|}
=== Rights ===
{|style="font-size=50%" border="1"
! Name of service                               
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
|SHERPA/RoMEO
Publisher copyright policies and self-archiving
|340 publishers (July 2007)
|http://www.sherpa.ac.uk/romeo.php
|XML
|[http://www.sherpa.ac.uk/romeo/api.html Prototype API]
|[http://www.sherpa.ac.uk/romeoreuse.html Conditions of re-use]
|[http://www.sherpa.ac.uk/romeo/api.html Prototype API]
|-
|Directory of Open Access Journals (DOAJ)
|2,987 journals, 164,284 articles (5 December 2007)
|http://www.doaj.org
|XML
|OAI-PMH
|[http://www.doaj.org/doaj?func=loadTempl&templ=faq#restrictions Conditions of re-use]
|[http://www.doaj.org/doaj?func=loadTempl&templ=faq#metadata OAI-PMH]
|}

Revision as of 08:50, 16 April 2008


Information on this page is a work in progress or still needs to be discussed.

Prototype service for controlled named entities - journal names

(work in progress!)

Process

  1. Select an authority file (corporate bodies, journals, authors) and an available external source.
    Done: decision for journal names, using (enriched) edoc data
  2. Create (import) data locally into an authority file from the selected source.
  3. Check for possible providers of look-up services for <rights> and <subject> (DOAJ? RoMEO? EZB? Ulrichs?).
    Discussion: see below
  4. Implement referencing from the PubMan edit interface (initially, let the authority file grow automatically when an entered value cannot be referenced to an existing record).
  5. Create a very simple viewer/editor for the authority file data.
  6. Get feedback from potential pilot users.
  7. Modify/add functionality based on the functional and technical feedback.
  8. Extend the prototype with another authority file and repeat steps 2-5.
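
Step 4 (referencing with automatic growth of the authority file) could look roughly like the following. This is only a sketch for discussion: the class `AuthorityFile`, the method `reference`, the matching rule (case-insensitive, whitespace-collapsed titles) and the `journal:N` id format are all assumptions and not part of PubMan or eSciDoc.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class AuthorityFile:
    """In-memory stand-in for a journal authority file (hypothetical)."""
    records: Dict[str, str] = field(default_factory=dict)  # record id -> title as entered
    index: Dict[str, str] = field(default_factory=dict)    # normalized title -> record id

    @staticmethod
    def normalize(title: str) -> str:
        # Assumed matching rule: ignore case and collapse whitespace.
        return " ".join(title.lower().split())

    def reference(self, title: str) -> Tuple[str, bool]:
        """Return (record id, created) for a title typed into the edit form."""
        key = self.normalize(title)
        if key in self.index:
            return self.index[key], False
        # No matching record yet: let the authority file grow automatically.
        record_id = f"journal:{len(self.records) + 1}"
        self.records[record_id] = title.strip()
        self.index[key] = record_id
        return record_id, True


authority = AuthorityFile()
first_id, created = authority.reference("Journal of the ACM")
same_id, created_again = authority.reference("journal of the  ACM")
```

Records created this way would still need review in the viewer/editor from step 5, since automatic growth can introduce near-duplicates that the simple normalization does not catch.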

Descriptive Metadata

The selection of descriptive metadata focuses on the minimum level of information needed to disambiguate entities. The list of descriptive metadata elements can be extended with new elements.

Metadata elements:

  • Journal title [1]

The name of the journal (e.g. "Journal of the ACM")

  • Alternative title [0-n]

Any alternative name or abbreviation of the journal

Remark Inga: Tagging of abbreviations as such? Indicating the origin of abbreviation if known?
  • Publisher [0-n?]

The name of the institution that publishes the journal

  • Identifier [0-n]

Any external identifier (e.g. ISSN, EZB-ID, ZDB-ID)

Schema has to be indicated

  • Locator [0-1?]

Locator of the authority file source

Question Inga: Do we mean a URL pointing to the record?
  • Rights [0-n]

Statement on open access availability

Discussion: see below

  • Subject [0-n]

Subject/domain field of the journal

Possible Relations:

  • isSuccessorOf
  • isPredecessorOf
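
To make the element list and cardinalities above easier to discuss, here is a minimal sketch of a journal record as a data structure. The class and field names (`JournalRecord`, `Identifier`, `invalid_issns`) are assumptions, and no serialization format is implied. Because every identifier has to carry its scheme, ISSNs can be validated with the standard ISSN check-digit rule (weights 8 to 2 on the first seven digits, modulus 11).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


def issn_is_valid(issn: str) -> bool:
    """Verify the ISSN check digit (weights 8..2 on the first seven digits, modulus 11)."""
    if not re.fullmatch(r"[0-9]{4}-?[0-9]{3}[0-9Xx]", issn):
        return False
    digits = issn.replace("-", "").upper()
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))


@dataclass
class Identifier:
    scheme: str  # e.g. "ISSN", "EZB-ID", "ZDB-ID"
    value: str


@dataclass
class JournalRecord:
    title: str                                                    # [1]
    alternative_titles: List[str] = field(default_factory=list)   # [0-n]
    publishers: List[str] = field(default_factory=list)           # [0-n?]
    identifiers: List[Identifier] = field(default_factory=list)   # [0-n]
    locator: Optional[str] = None                                 # [0-1?]
    rights: List[str] = field(default_factory=list)               # [0-n]
    subjects: List[str] = field(default_factory=list)             # [0-n]
    is_successor_of: List[str] = field(default_factory=list)      # ids of related records
    is_predecessor_of: List[str] = field(default_factory=list)    # ids of related records

    def invalid_issns(self) -> List[str]:
        """Return the values of all ISSN identifiers that fail the check-digit test."""
        return [i.value for i in self.identifiers
                if i.scheme == "ISSN" and not issn_is_valid(i.value)]


jacm = JournalRecord(
    title="Journal of the ACM",
    alternative_titles=["JACM"],
    identifiers=[Identifier("ISSN", "0004-5411")],
)
```

Whether alternative titles should carry a type (abbreviation, origin) as Inga suggests is left open here; it would turn `alternative_titles` into a list of small records instead of plain strings.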

Rights statement for journals

Update on <rights>: as there is no requirement from Christoph/Anja for rights statements at journal level, we can choose whatever provider and whatever information we like. I would opt for DOAJ, as it at least gives a clear indication of which journals are OA, although it gives no information on "green road" publishers. Disadvantage of SHERPA/RoMEO: it reports at publisher level, not at journal level. --Ulla 16:16, 11 January 2008 (CET)

Requirement: The information collected under PubMan OA Statistics gives no clear picture of what kind of request the rights statements are needed for. Is the goal to find out whether specific articles are open access, or whether the journal supports OA publishing (for all articles? for some articles? via an author-pays model?)? We should probably avoid including rights information until we have a clearer picture. --Inga 17:38, 18 January 2008 (CET)

Values: How do we populate the element, i.e. what value does the rights metadata of the journal take if the journal is OA in accordance with DOAJ? (There are statements like: http://www.doaj.org/doaj?func=loadTempl&templ=faq#definition) --Natasa 15:41, 18 January 2008 (CET)

An overview of OA levels is provided in the Wikipedia article on open access journals:

{| border="1"
! Level
! Explanation
! Value
! Source
|-
| 1
| Journals entirely open access
| gold
| DOAJ
|-
| 2
| Journals with research articles open access
| ??
| no source
|-
| 3
| Journals with some research articles open access
| ??
| no source
|-
| 4
| Journals with some articles open access and the other delayed access
| ??
| no source
|-
| 5
| Journals with delayed open access
| ??
| no source
|-
| 6
| Journals permitting self-archiving of articles
| green
| SHERPA/RoMEO
|}

Some thoughts on DOAJ

  1. DOAJ is a directory of open access scientific and scholarly journals. Each month new journals are added and existing journals are deleted from the repository. Therefore, rights information from DOAJ needs to be updated regularly. Note: The OAI-PMH repository does not maintain information about deletions.
  2. By definition, DOAJ does not list journals which use embargo periods (e.g. many HighWire journals) or which provide only parts of their content under OA conditions (e.g. some BMC journals or backfiles with costs?).
  3. Therefore: DOAJ can be used to check whether a journal is "on the golden road to OA". According to the DOAJ definition, this information could be propagated to all articles published in the journal. To avoid continuous updates, the information could be fetched dynamically rather than physically stored in PubMan. If no information is available, this does not necessarily mean that the journal does not provide OA articles.
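
The three points above translate into two small rules, sketched here for discussion. Since the OAI-PMH repository does not report deletions, removed journals can only be found by comparing two complete harvests; and a journal missing from DOAJ yields "no statement", not "not OA". The function names and the ISSN values are made up for illustration.

```python
from typing import Iterable, Optional, Set


def removed_since_last_harvest(previous: Iterable[str], current: Iterable[str]) -> Set[str]:
    """ISSNs present in the previous full harvest but missing from the current one.

    Needed because deletions are not announced by the repository itself.
    """
    return set(previous) - set(current)


def rights_value(issn: str, doaj_issns: Set[str]) -> Optional[str]:
    """Return "gold" if DOAJ lists the journal, otherwise None.

    None means "no information available"; the journal may still
    provide OA articles (embargo, hybrid or self-archiving models).
    """
    return "gold" if issn in doaj_issns else None


# Illustrative ISSN-like strings, not real journals.
last_month = {"1111-1111", "2222-2222", "3333-3333"}
this_month = {"1111-1111", "3333-3333", "4444-4444"}

dropped = removed_since_last_harvest(last_month, this_month)
status_listed = rights_value("4444-4444", this_month)
status_unknown = rights_value("2222-2222", this_month)
```

If the lookup is done dynamically at display time, as suggested in point 3, only `rights_value` is needed and the snapshot comparison becomes unnecessary.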

Background information

RDF schema: http://schemas.library.nhs.uk/ApplicationProfile/Journal.rdf

This looks quite comprehensive, and we just need a small subset. After 10 minutes of analyzing the schema, I'm not sure how the identifiers are further encoded (ISSNURL?). My vote: too complex, reduce it to the minimum? --Inga 16:47, 29 November 2007 (CET)

NLM DTD: http://dtd.nlm.nih.gov/publishing/tag-library/2.3/n-z4u0.html

Other candidates

Other potential candidates for normalized metadata entries have to be discussed further and possibly defined together with pilot users.

Person

Potential metadata elements

Complete name: The complete name of a person, usually a concatenation of given names and family name

Remark Inga: I would assume that the complete name can be generated automatically from given name and family name. Therefore I wouldn't consider this an additional element. --Inga 12:19, 28 December 2007 (CET)

Given name: A given name of a person

Family name: The family name of a person

Alternative name: Any alternative name used for the person

Title: The title or peerage of a person in one string

Pseudonym: The pen or stage name of a person

Remark Sabine: Can pseudonym also be covered by alternative name?
Remark Ulla: Let's assume: yes
Remark Inga: But in this case we should again consider typing the alternative names --Inga 12:19, 28 December 2007 (CET)

Affiliation: The institution the person was affiliated with when creating the item

Remark Inga: The above information is only available in the context of one publication. Therefore, I would suggest pointing to the affiliation the person is currently working for. In addition, we might list all former (known) affiliations as well. --Inga 12:23, 28 December 2007 (CET)

Identifier: Identifier in the Personennormdatei, provided by the Deutsche Nationalbibliothek

Remark Sabine: IMO other identifiers should be allowed as well (e.g. the identifier of the Library of Congress Name Authority File)
Remark Ulla: Can be modified if needed
Remark Inga: Schema has to be indicated --Inga 12:19, 28 December 2007 (CET)

Email: Email address of the person (e.g. will allow users to send an email to the author asking for the fulltext in case it is not available)

Remark Sabine: I am not sure whether the email address is important information for all persons or for registered PubMan users only. The handling of this "private data" also has to be clarified.
Remark Traugott: Problem of regularly updating the email address
Remark Ulla: IMO updating controlled vocabularies is always a challenge, not only for emails...?
Remark Inga: I agree with Sabine, the email address is of special importance for eSciDoc users only. If we don't find a reliable source for importing this information regularly (PND?), we shouldn't try to maintain this information ourselves --Inga 12:19, 28 December 2007 (CET)

Homepage: The location of a personal homepage (e.g. in case fulltext is available via personal homepage)

Remark Sabine: same as for Email

Date of Birth

Place of Birth

Date of Death

Place of Death
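
The remarks above (complete name derived rather than stored, pseudonym folded into typed alternative names, identifiers with an explicit scheme, current plus former affiliations) can be summarized in a small sketch. All class and field names and the `kind` values are assumptions made for illustration, not an agreed schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AlternativeName:
    value: str
    kind: str = "other"  # assumed types, e.g. "pseudonym", "abbreviation", "other"


@dataclass
class PersonIdentifier:
    scheme: str  # e.g. "PND" or an LC Name Authority File scheme label
    value: str


@dataclass
class PersonRecord:
    given_name: str
    family_name: str
    alternative_names: List[AlternativeName] = field(default_factory=list)
    identifiers: List[PersonIdentifier] = field(default_factory=list)
    current_affiliation: Optional[str] = None
    former_affiliations: List[str] = field(default_factory=list)

    @property
    def complete_name(self) -> str:
        # Derived on demand instead of being stored as a separate element.
        return " ".join(part for part in (self.given_name, self.family_name) if part)

    def pseudonyms(self) -> List[str]:
        """Alternative names that are tagged as pseudonyms."""
        return [n.value for n in self.alternative_names if n.kind == "pseudonym"]


author = PersonRecord(
    given_name="Samuel",
    family_name="Clemens",
    alternative_names=[AlternativeName("Mark Twain", kind="pseudonym")],
)
```

Email and homepage are left out deliberately, since the discussion above has not settled whether they belong in the controlled record at all.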

Resources

MPG Units

Potential metadata elements

Complete name: In English and/or German?

Alternative name

Place

Address

Homepage

Resources

Conferences/events

Potential metadata elements

Title: The name of the event (e.g. Symposium on Theory of Computing)

Alternative title: Any alternative name of the event

Abbreviation: Abbreviated name of the event (e.g. STOC)

Start date: Start date of the event

End date: End date of the event

Place: Place where the event took place

Invitation status: Whether the creator was explicitly invited

Remark Sabine: Should this information be stored in the controlled metadata record?
Remark Ulla: No, not to my understanding
Remark Inga: To my understanding, the invitation status can only be specified for each talk individually and is therefore not generic metadata for the conference --Inga 12:19, 28 December 2007 (CET)

Keywords, classifications, thesauri

→ see cpt_pubman_classifications

Title of Source (e.g. Series titles)

Series titles may be handled via the journal authority file as well --Inga 12:19, 28 December 2007 (CET)

Potential external sources

The tables give an overview of potential sources of controlled named entities that are of interest. The information reflects the current situation and has to be updated from time to time. The tables are a work in progress, and other sources may be added.

Person

Person(s), i.e. full name of persons (authors, editors, referees, etc.)

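{| border="1"
! Name of service
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
| Library of Congress Name Authority Service
| To be evaluated in detail (likely not to cover too many MPG authors)
| [http://www.oclc.org/research/researchworks/authority/default.htm Introduction], [http://alcme.oclc.org/eprintsUK/services/NACOMatch?wsdl WSDL], http://authorities.loc.gov
| MARCXML
| SOAP, WSDL
| Records are free of charge[1]
| via [http://authorities.loc.gov web site]
|-
| Personennormdatei (PND)
| approx. 2.6 million names (1 million with individualized records). To be evaluated in detail (likely not to cover too many MPG authors)
| [http://www.ddb.de/standardisierung/normdateien/pnd.htm Introduction]
| MAB2, USMARC, SUTRS
| Z39.50
| PND, GKD and SWD are only available in combination ([http://www.ddb.de/service/pdf/normdaten_cd.pdf costs])
| CD-ROM (2 CDs) as cumulative new editions, published twice a year in January and July. PND is licensed in MPS; the database is available via the Aleph server[2]
|-
| Virtual International Authority File (VIAF)
| First prototype covers LC and DNB personal name authority and related bibliographic records
| [http://www.oclc.org/research/projects/viaf project web site]
| MARC21 (?)
|
|
| Prototype system available at: http://viaf.org
|-
| Computer Science Bibliography (DBLP)
| Computer Science
| http://dblp.uni-trier.de
| HTML, XML
|
|
|
|-
| Wikipedia Persondata
|
| [http://en.wikipedia.org/wiki/Wikipedia:Persondata info], [http://download.wikimedia.org/ data dump]
| HTML
|
|
|
|}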

Corporate body

{| border="1"
! Name of service
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
| Körperschaftsnormdatei (GKD)
| More than 1 million records (German and foreign corporate bodies and conferences)
| [http://www.ddb.de/standardisierung/normdateien/gkd.htm Introduction]
| MAB2
| Z39.50
| see PND
| see PND
|}

Journal

{| border="1"
! Name of service
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
| Zeitschriftendatenbank (ZDB)
| approx. 1.3 million records
| [http://www.zeitschriftendatenbank.de/datendienste/index.html Introduction]
| MAB2, UNIMARC, SUTRS
| Z39.50
|
| To be clarified with the GWDG whether a tailored version of the ZDB (listing only MPG-licensed journals) is available.
|-
| ISSN Register
| 1,284,413 records (2006)
| http://www.issn.org
| MARC21, UNIMARC
| Z39.50
| [http://www.issn.org/files/active/0/Order%20form%202008%20and%20Licence%20Agreement.pdf costs]
| Access via the ISSN portal, via Z39.50, or via a combination of both
|}

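Rights

{| border="1"
! Name of service
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
| SHERPA/RoMEO (publisher copyright policies and self-archiving)
| 340 publishers (July 2007)
| http://www.sherpa.ac.uk/romeo.php
| XML
| [http://www.sherpa.ac.uk/romeo/api.html Prototype API]
| [http://www.sherpa.ac.uk/romeoreuse.html Conditions of re-use]
| [http://www.sherpa.ac.uk/romeo/api.html Prototype API]
|-
| Directory of Open Access Journals (DOAJ)
| 2,987 journals, 164,284 articles (5 December 2007)
| http://www.doaj.org
| XML
| OAI-PMH
| [http://www.doaj.org/doaj?func=loadTempl&templ=faq#restrictions Conditions of re-use]
| [http://www.doaj.org/doaj?func=loadTempl&templ=faq#metadata OAI-PMH]
|}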
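
For sources that offer OAI-PMH (DOAJ in the table above), harvesting boils down to repeated `ListIdentifiers` or `ListRecords` requests that follow the `resumptionToken` until none is returned. The sketch below only builds request URLs and parses a response; it does not contact any server. The base URL, the sample identifiers and the function names are placeholders; the XML namespace and the request parameters are those of OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListIdentifiers request URL.

    A follow-up request carries only the verb and the resumptionToken.
    """
    if token is None:
        params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
    else:
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
    return base_url + "?" + urlencode(params)


def parse_list_identifiers(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (record identifiers, resumption token or None) from one response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text.strip()
        for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response; identifiers and token are placeholders.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier></header>
    <header><identifier>oai:example.org:2</identifier></header>
    <resumptionToken>page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>
"""

ids, next_token = parse_list_identifiers(SAMPLE_RESPONSE)
next_url = build_request("http://example.org/oai", next_token)
```

An empty or missing `resumptionToken` element ends the harvest. The Z39.50 sources in the tables would need a dedicated client library and are not covered by this sketch.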
  1. Information from web site: "users do not have to register or request permission to search, save, print, or email the LC authority records. The only limitation is that authority records may only be saved, printed or emailed one at a time."
  2. https://dev.livingreviews.org/projects/vlib/wiki/AuthFiles