Control of Named Entities

From MPDLMediaWiki


The aim of this page is to collect information on the control of named entities in the context of PubMan, the eSciDoc (Enhanced Scientific Documentation) solution for publication data management. The page contains general information about the envisioned PubMan services in this domain. Functional and technical specifications can be found on the page Service for Control of Named Entities.


Introduction

Controlling named entities is important for managing and retrieving high-quality metadata, keeping the data in the system consistent, and providing the basis for good search results. In library and information science, the creation and maintenance of controlled named entities (so-called authority files or authority records) is well established and follows dedicated guidelines and workflows. Such authority records are usually maintained as separate records and linked to other records. An authority record normally contains, among other things, the authorized name of e.g. a person, name variants, additional information (e.g. to distinguish the person from others with the same name), and information about relationships between this record and other records. Authority files establish and offer uniform access points, e.g. for persons, group together the various works of a person, and make consolidated indexes possible. Furthermore, additional information such as the name variants of a person has to be maintained and curated in only one record (the authority record), yet users can search under any name variant and retrieve all records of this person regardless of the variant used.
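The idea of one curated record serving as the access point for all name variants can be sketched as follows. This is only an illustration: the class and field names (`AuthorityRecord`, `variants`, `creator_id`) and the sample persons are assumptions made for this example, not the PubMan or CoNE data model.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """Hypothetical authority record; field names are illustrative only."""
    record_id: str
    authorized_name: str
    variants: list[str] = field(default_factory=list)
    note: str = ""  # e.g. disambiguating information


@dataclass
class Publication:
    title: str
    creator_id: str  # link to an authority record instead of a free-text name


def build_name_index(records):
    """Map every authorized name and variant (case-folded) to record ids."""
    index = {}
    for rec in records:
        for name in [rec.authorized_name, *rec.variants]:
            index.setdefault(name.casefold(), set()).add(rec.record_id)
    return index


def find_publications(name, index, publications):
    """Return titles of all publications linked to any record matching `name`."""
    ids = index.get(name.casefold(), set())
    return [p.title for p in publications if p.creator_id in ids]


records = [
    AuthorityRecord("p1", "Meier, Anna", ["Meier, A.", "Anna Meier"]),
    AuthorityRecord("p2", "Meier, Anton", ["Meier, A."], note="physicist"),
]
publications = [
    Publication("Paper One", "p1"),
    Publication("Paper Two", "p1"),
    Publication("Paper Three", "p2"),
]
index = build_name_index(records)

# A variant that belongs to one person retrieves all of her works.
print(find_publications("Anna Meier", index, publications))
# An ambiguous variant matches both persons; the note field helps tell them apart.
print(find_publications("meier, a.", index, publications))
```

Because publications store only the record id, adding a further name variant means editing one authority record rather than every publication item.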

Scope

For PubMan, several metadata elements are candidates for the control of named entities and the generation of controlled lists. The information maintained for each of these metadata elements depends on various factors:

  • the kind of information that is needed to identify the relevant metadata value (e.g. during submission),
  • the kind of information that is needed for searching and browsing,
  • the source used (i.e. what information is actually offered),
  • the usage in the system,
  • the quantity of information that should be filled in by the users and maintained and stored in the system,
  • and finally the kind and quantity of information that is relevant within the scope of PubMan.

Controlled metadata records should mainly contain descriptive metadata. As mentioned above, the selection of potential descriptive elements depends on various factors: the source of the controlled metadata values, which values have to be filled in manually and which can be filled in via batch operations, the usage of the data, the quantity of information that should be stored and maintained in PubMan, and the question of which elements have a predefined controlled value list and which take freely definable values. The list of descriptive metadata elements should be extensible.

The publication item stored and maintained in PubMan should contain all the important information needed to form a coherent item in itself. Publication items may be linked to the database of controlled named entities or to other (external) sources in order to enrich them with additional information (e.g. rights statements for journals).

Benefits

The benefits of controlled named entities in the context of PubMan are as follows:


Search/retrieval: controlled named entities yield accurate search results and allow them to be refined.

Consolidated indexes/browsing: controlled named entities allow the generation of consolidated indexes and enable users to browse by e.g. person or organizational unit.

Metadata entry: controlled named entities facilitate the metadata entry for publication items. During submission the user may select a controlled named entity from a controlled list for certain metadata fields.

List of references: controlled lists enable the generation of reference lists (e.g. for authors or institutes).

Consistency: controlled named entities foster consistent data on the system.

Data enrichment: controlled named entities allow data to be enriched with additional information (e.g. the ISSN, International Standard Serial Number). Data enrichment is also relevant for export functionalities.

Batch operations: controlled named entities facilitate the performance of batch operations (e.g. switch the access status of publication items containing a certain publisher from private to public).

Linking: controlled named entities make it possible to link from publication items to other sources (e.g. from a publication item to holding information via OpenURL).

Improved interoperability: controlled named entities enable better information integration, either within eSciDoc itself or via other service providers that have harvested the PubMan data or provide meta-searches.

Add-ons: the handling of controlled named entities is important for certain functionalities and services offered by PubMan (e.g. web service for organizational units, creation of researcher pages).



Application models

In theory, there are various models for creating and maintaining controlled named entities. As a general remark, an external authority record will most likely not be available for every metadata element that is a candidate for controlled named entities (e.g. person names). This means that

a) controlled and uncontrolled named entities will coexist and/or

b) internally controlled named entities have to be created and maintained.

Which application model is selected for the PubMan services on controlled named entities will be stated in the respective functional specification.


External sources (import of complete authority system)

Critical factors:

  • Data from external sources is constantly updated and extended, so there is a risk of inconsistencies between the external sources and local copies. This application model is therefore primarily practicable for data that is not subject to constant change (e.g. classifications).
  • Legal situation: it has to be clarified whether external data may be stored and maintained in PubMan.
  • Imported external authority file records should contain information about their source and version. This makes it possible to indicate the quality/reliability of the data during retrieval.


External controlled metadata values one at a time (import of values, not of complete authority system, e.g. via web services)

Critical factors:

  • It has to be clarified whether an appropriate web service is provided for all relevant external sources (e.g. is there a web service for the PND, Personennamendatei?).
  • Imported external authority file records should contain information about their source and version. This makes it possible to indicate the quality/reliability of the data during retrieval.


Build-up of controlled metadata values within PubMan/eSciDoc, with an internal quality assurance (QA) process

Critical factors:

  • A controlling/QA process has to be set up internally. It has to be clarified to what extent standardization efforts and national and international guidelines should be taken into account.
  • Creation and maintenance of controlled named entities is time-consuming, expensive and requires trained staff.
  • Licensing: it has to be clarified whether a special license is needed in Germany to build up databases of persons (e.g. controlled named entities for person names).


Referencing external sources (e.g. via ID)

Critical factors:

  • Controlled named entities are not stored and maintained in the system. Only a reference (e.g. an ID) links to the external source where the entities are maintained. If the external source is not available, the values are not accessible.
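The availability risk of the pure referencing model can be made concrete with a small sketch. Everything here is hypothetical: the `SourceUnavailable` exception, the two resolver functions, and the id `ext:42` stand in for a real external authority service, which would normally be reached via a web call.

```python
class SourceUnavailable(Exception):
    """Raised by the hypothetical resolver when the external source is down."""


def resolve_label(entity_id, resolver):
    """Return the label for `entity_id`, or None if the source cannot be reached."""
    try:
        return resolver(entity_id)
    except SourceUnavailable:
        return None


# Stand-in for the external authority source; not a real service.
EXTERNAL = {"ext:42": "Max Planck Society"}


def online_resolver(entity_id):
    return EXTERNAL[entity_id]


def offline_resolver(entity_id):
    raise SourceUnavailable(entity_id)


# The publication item stores only the reference, not the value itself.
item = {"title": "Annual Report", "publisher_ref": "ext:42"}

print(resolve_label(item["publisher_ref"], online_resolver))
print(resolve_label(item["publisher_ref"], offline_resolver))
```

With the source offline the item has no displayable publisher at all, which is the critical factor named above.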


Hybrid application models:


Initial import of external sources as start content. The data will be further maintained and extended in PubMan and merged with internally created controlled metadata values.

Critical factors:

  • It has to be clarified who is the rights holder of the data and whether storage and further editing are permitted.
  • It has to be decided whether data from external sources can/should be edited or not.
  • Imported external authority file records should contain information about their source and version. This makes it possible to indicate the quality/reliability of the data during retrieval.
  • Internally created controlled named entities should be marked to facilitate the further internal QA process.
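A minimal sketch of how source, version and an internal marker might be carried on each record is given below. The `ControlledEntity` class, its field names, and the source name `ExampleAuthorityFile` are assumptions chosen for illustration; they are not taken from any PubMan specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlledEntity:
    """Hypothetical record carrying provenance alongside the value."""
    entity_id: str
    name: str
    source: str                           # "internal" or the name of the external source
    source_version: Optional[str] = None  # version or date of the imported file

    @property
    def is_internal(self):
        return self.source == "internal"


def needs_internal_qa(entities):
    """Select the internally created entities for the internal QA process."""
    return [e for e in entities if e.is_internal]


def describe(entity):
    """Render the provenance so it can be shown during retrieval."""
    if entity.is_internal:
        return f"{entity.name} [internal]"
    return f"{entity.name} [{entity.source}, version {entity.source_version}]"


entities = [
    ControlledEntity("e1", "Journal of Examples", "ExampleAuthorityFile", "2008-06"),
    ControlledEntity("e2", "Meier, Anna", "internal"),
]
for entity in entities:
    print(describe(entity))
print([e.entity_id for e in needs_internal_qa(entities)])
```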


Shared/combined use of regularly harvested external sources and internally created controlled named entities. This can be done either by integrating harvested external sources into the internally built controlled named entities or by downloading and integrating external authority file records one at a time (via web services).

Critical factors:

  • It has to be clarified who is the rights holder of the data and whether storage, maintenance and further editing are permitted.
  • Duplicate detection has to be ensured.
  • The handling and procedure for updating/regularly scheduled harvesting of external sources have to be specified.
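One simple way to approach duplicate detection when harvested names are merged with internal ones is to compare normalized keys. The sketch below is an assumption-laden illustration, not a specified PubMan procedure: the normalization rules (dropping accents, case, punctuation and word order) are deliberately crude, and the function names and sample names are invented.

```python
import unicodedata


def normalize(name):
    """Reduce a name to a comparison key, ignoring accents, case, punctuation and word order."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in without_marks.casefold())
    return " ".join(sorted(cleaned.split()))


def merge_harvest(internal, harvested):
    """Split harvested names into new entries and likely duplicates of known ones."""
    known = {normalize(n): n for n in internal}
    new, duplicates = [], []
    for name in harvested:
        key = normalize(name)
        if key in known:
            duplicates.append((name, known[key]))
        else:
            new.append(name)
            known[key] = name
    return new, duplicates


internal = ["Müller, Jörg", "Schmidt, Eva"]
harvested = ["Jorg Muller", "Schmidt, Eva", "Weber, Tom"]
new, duplicates = merge_harvest(internal, harvested)
print(new)
print(duplicates)
```

A key match only signals a candidate duplicate. Two different persons can share a name, so in practice such matches would need review against the additional information held in the authority records.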

Methods of gathering external sources

  • Access to web service interface of external sources and integration of selected records.
  • Import of external sources (e.g. as start content) and their integration into the system.
  • Regularly scheduled harvest of external sources and their integration into the system.


Management of controlled named entities

The management of controlled named entities depends, among other things, on the chosen application model and especially on the kind of data (external or internally created) that has to be administered. It also depends on the user group and the respective usage scenarios. The management of controlled named entities will be specified in detail in the functional specification of each PubMan service for controlled named entities. The listing below gives only a rough overview of the basic functionalities that have to be supported by the system:

For the creation and/or administration of controlled named entities at least the following functionalities have to be supported (depending on the chosen application model):

  • creation (only valid for internally built controlled metadata records), editing, searching and deletion/deactivation of controlled named entities
  • exporting of controlled named entities (e.g. as an XML or CSV file)
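The export functionality could look roughly like the following sketch, which uses only the Python standard library. The record layout, the element names (`entities`, `entity`, `name`, `variant`) and the `|` separator for variants in the CSV are assumptions for this example, not a defined export format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical controlled named entities to be exported.
entities = [
    {"id": "p1", "name": "Meier, Anna", "variants": ["Meier, A.", "Anna Meier"]},
    {"id": "j1", "name": "Journal of Examples", "variants": []},
]


def to_csv(entities):
    """Serialize entities as CSV text, joining name variants with '|'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "name", "variants"])
    for e in entities:
        writer.writerow([e["id"], e["name"], "|".join(e["variants"])])
    return buffer.getvalue()


def to_xml(entities):
    """Serialize entities as an XML string with one <entity> element each."""
    root = ET.Element("entities")
    for e in entities:
        node = ET.SubElement(root, "entity", id=e["id"])
        ET.SubElement(node, "name").text = e["name"]
        for variant in e["variants"]:
            ET.SubElement(node, "variant").text = variant
    return ET.tostring(root, encoding="unicode")


print(to_csv(entities))
print(to_xml(entities))
```

The `csv` module quotes names that contain commas, so values such as "Meier, Anna" survive a round trip through the CSV export.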

For the user (e.g. depositor) of the system at least the following functionalities have to be provided:

  • searching, displaying, and selection of controlled named entities.

Customization/Usage

There are several metadata elements which are candidates for controlled named entities. The controlled list of named entities will depend not only on the respective metadata element but also on other factors such as the collection or user profile. The system therefore has to provide customization options on various levels.

Implementation

The MPDL (Max Planck Digital Library) developed an independent service for the control of named entities, CoNE (Control of Named Entities).

Implementation details and the corresponding functionalities can be found here: CoNE

Further Reading

  • IFLA (International Federation of Library Associations and Institutions) Working Group on Functional Requirements and Numbering of Authority Records (FRANAR): Functional Requirements for Authority Data: A Conceptual Model, Draft 2007-04-01 (PDF document). http://www.ifla.org/VII/d4/wg-franar.htm
  • IFLA UBCIM Working Group on Minimal Level Authority Records and ISADN: Mandatory Data Elements for Internationally Shared Resource Authority Records. http://www.ifla.org/VI/3/p1996-2/mlar.htm
  • Danskin, Dixon, Docherty, Hill, Moore: A review of the current landscape in relation to a proposed Name Authority Service for UK repositories of research outputs. Prepared for the JISC (Joint Information Systems Committee) UK Names Project. June 2008 (PDF).