Difference between revisions of "Talk:Control of Named Entities"

From MPDLMediaWiki


== Open Issues ==
'''Metadata'''
*Agree on a list of potential candidates for authority files. ''Note: If a generic mechanism like CDS Invenio's knowledge base were implemented, no such list would be needed in advance.''
*Define what kind of descriptive elements an authority record should contain. Descriptive elements may differ from authority file to authority file and should therefore be defined individually.
*Decide whether an IR-item should be self-contained or not. ''Question: What does self-contained mean? Even right now, IR items are not self-contained in the sense that they contain all relevant metadata values, because other repository objects like creators are only referenced.''
*Define how to map authority files to an MD element in a specific MDS. Note: for every MD element for which the system supports authority files, we probably need to specify a list of descriptive information available for the authority file (e.g. journal names: title, translation of title, title abbreviations, ISSN, eISSN, etc.; persons: last name, first name, etc.).
*Specify linking between IR items and authority records via ID.
*Specify linking of different authority files/databases (e.g. user database - personal name authority file - affiliation authority file).
*Describe selection of an authority record (By the Depositor during submission? Via a free-text field? Does the system suggest an authority record while the Depositor fills in information? May the Depositor search within the authority file database and select a record?).
'''Handling of authority files'''
*Specify assignment of items to an authority record and when it will take place (during submission by selecting an authority record from the selection list? During submission by accepting an authority record selected by the system? During FQA?).
*Describe administration and control of authority files (who is allowed to create, edit, delete, redirect, and authorize authority records? See proposal of new role AF-Editor).
*Define what will happen in case no appropriate authority record is available.
*Specify if different kinds of authority files require different handling.
*Clarify the dilemma between authority files and the Autopsie-Prinzip (transcribing data exactly as it appears on the item in hand). Scenario: a user selects an authority record, the system fills in certain fields automatically, and the user then edits one or more of the automatically filled fields. Proposal made by Inga: the entry in the IR item follows the Autopsie-Prinzip, but the browse tree is generated from the authority record and its standardized data. The form found on the original (Vorlage) should be integrated into the authority record as an alternative (e.g. alternative name).
*Specify duplicate checking for authority records. Duplicate checking should also compare e.g. name and alternative name.
*Specify users and their rights and privileges concerning authority files.
*Specify if a separate authority file workflow is required.
*Describe entry of multiple authors (via copy and paste).
'''Handling of new authority records'''
*Describe creation of new authority records (e.g. who creates a new record and when: the Depositor during submission? The Moderator during FQA? The AF-Editor in a separate workflow? Is it possible to use an existing entry as a template? Should the system send a message to the AF-Editor when a new authority record has been created?).
*Specify a set of rules (Regelwerk) for the creation of new authority records.
*Specify if an authority record has obligatory elements.
'''Import of external authority files'''
*Specify how external authority files can be provided (licensed by MPS? Available online? CD-ROM?) and which procedures are required (includes: harvesting, data conversion (format and character set), linking to IR items, update mechanism, maintenance).
*Describe import of external authority files or subsets of them.
*Where will imports of data sets such as name authority files (e.g. PND), user/person related information, imports from the MPG-IP-database hosted at GWDG, and other authority files (e.g. Zeitschriftendatenbank) be described and handled? They are not described in USC_ingestion.
'''Build-up of internal authority files'''
*Describe procedure of how to create incrementally built authority files.
*Clarify integration/interaction of internal and external authority files (initial import of external authority files or harvest of authority files scheduled on a regular basis?).
*Will it be possible to extend/modify authority records when authority files are loaded/synchronized from an external source? (Assumption: no, or only via customizable fields.)
'''Customization'''
*Clarify the criteria by which authority files are chosen (customization of authority files on collection, user, or user group level?).
*Describe setup of authority files on collection level.
'''Export'''
*Specify export of authority records/authority files.
*Define what kind of descriptive elements of an authority record should be exportable in case IR item is not self-contained or in case IR item should be enriched with additional information from authority record.
'''Ingestion'''
*Specify assignment of authority records for ingested data.
'''Searching'''
*Define which elements of authority records are searchable (simple, advanced and expert search).
*Specify searching in external authority files (provide interface for AF-Editor and Moderator to external authority files for inquiries and data transfer).
*Define generation of browse trees (proposal: browse trees should be generated from the standardized data of the authority record).
*Specify search in internal authority files.
*Describe basket functionality for authority records (e.g. important for re-direction in batch mode or re-use of data).
'''Migration of eDoc data'''
*Specify assignment of authority records to migrated eDoc items.
'''Others'''
*Is the SFX knowledge base an alternative to the ZDB?
*The favourite co-authors feature has to be implemented in accordance with the authority file concept (a dictionary function (Wörterbuchfunktion) could be an alternative to the favourite co-authors feature).
*An automatic re-linking process (Umverknüpfungsprozess) has to be specified. Privileges and rights of IR items have to be considered (i.e. how to handle the re-direction of items from other collections).
== Potential future projects ==



Revision as of 13:24, 8 December 2008

Potential future projects

Working group on authority files

  • set up a working group on authority files (drawn from the PubMan pilot group and other interested Max Planck Institutes). Possible tasks:
    • sample creation of controlled entries for MPG-related authors (perhaps from one institute) according to standard guidelines and in the Library of Congress Authority File format.

Further development of CoLab page

  • to clarify terminology:
    • create a general overview page (Übersichtsseite) on controlled vocabulary, with an introductory text about the different areas in the domain of controlled vocabularies (including thesauri, classifications, subject headings, ontologies, etc.)
    • create the following subpages:
      • rename current page on ControlledVocab into control of named entities - done--Sabine 09:53, 17 April 2008 (CEST)
      • (technical) service currently developed might be called something like service for control of named entities - done--Sabine 11:20, 16 April 2008 (CEST)


History

Under this heading we collect bits and pieces that have arisen during discussions etc. and that should be kept for the sake of completeness.

Naming

There was a discussion about what term we should use instead of authority files/authority records/etc. On 30 November it was agreed to use the term "control of named entities / controlled named entities". During the discussion the following alternative terms were proposed:

  • normalizing metadata/data entries
  • managing controlled vocabularies
  • harmonizing metadata/data entries
  • controlling metadata entries
  • terminology management
  • reference information service (can be split in: reference person service, reference affiliation service, reference journal service, etc.)
  • (proposal) master data management is another term that can be considered (though it does not have exactly the same meaning as in ERP and CRM systems)
  • (proposal) metadata value domains/metadata domain value
  • controlled metadata values
  • see also ISAAR(CPF): http://www.ica.org/en/node/30230

If I'm not mistaken, CDS Invenio (the software of the CERN document server) calls the concept a knowledge base. It's also worth mentioning how it functions: no normalization of data is performed on input, i.e. the data in the database will always be what was inserted by the metadata editor. Knowledge bases only come into play when outputting data. In this case, output formatting templates can associate certain fields with knowledge bases and thus force normalization of data. This concept is due to a requirement which should be familiar from eDoc: scientists want to be able to get their data out exactly as it was inserted, e.g. author names in all-caps. Obviously this approach has its own share of problems. Basically, all methods which investigate the data (searching, duplicate detection, etc.) must take knowledge bases into account, or will only work in idiosyncratic ways.
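
To make the trade-off concrete, here is a minimal sketch of the idea as described above. It is not CDS Invenio's actual API: every name in it (AUTHOR_KB, store, format_record, naive_search, kb_aware_search) and the sample author values are hypothetical and chosen only for illustration. It assumes a knowledge base can be modelled as a simple mapping from stored variants to a normalized form.

```python
# Hypothetical illustration only; none of these names come from CDS Invenio.

# Assumption: a knowledge base is a mapping from stored variants
# to one normalized form.
AUTHOR_KB = {
    "MUELLER, H.": "Müller, Hans",
    "Mueller, Hans": "Müller, Hans",
}


def store(record_db, record_id, metadata):
    """Store metadata exactly as entered; nothing is normalized on input."""
    record_db[record_id] = dict(metadata)


def format_record(record, kb_by_field=None):
    """Render a record for output.

    Fields associated with a knowledge base are normalized here,
    all other fields are returned as stored.
    """
    kb_by_field = kb_by_field or {}
    out = {}
    for field, value in record.items():
        kb = kb_by_field.get(field, {})
        out[field] = kb.get(value, value)
    return out


def naive_search(record_db, field, query):
    """Exact match on stored values; ignores knowledge bases."""
    return [rid for rid, rec in record_db.items() if rec.get(field) == query]


def kb_aware_search(record_db, field, query, kb):
    """Normalize both the query and the stored value before comparing."""
    target = kb.get(query, query)
    hits = []
    for rid, rec in record_db.items():
        stored = rec.get(field)
        if kb.get(stored, stored) == target:
            hits.append(rid)
    return hits


db = {}
store(db, 1, {"author": "MUELLER, H.", "title": "First paper"})
store(db, 2, {"author": "Mueller, Hans", "title": "Second paper"})

# Output without a knowledge base returns the data as inserted.
raw = format_record(db[1])
# Output with a knowledge base attached to the field is normalized.
normalized = format_record(db[1], {"author": AUTHOR_KB})
```

In this sketch the stored value stays "MUELLER, H." while the formatted output with the knowledge base gives "Müller, Hans". A search on the stored values for "Müller, Hans" finds nothing, whereas the knowledge-base-aware search finds both records, which is the kind of idiosyncrasy mentioned above.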

Remark from Traugott: CDS Invenio is based on a different usage and business model, and its features are therefore not really applicable to our scope.