Talk:Control of Named Entities
The information on this page is a work in progress or still needs to be discussed.
Prototyping
Prototype service for controlled named entities - journal names
(work in progress!)
Functional proposal:
Functional proposal for prototyping a first "controlled vocab service" for next release:
- start with a basic service, which allows the import of a controlled list (start content from edoc, done by vlad/nicole) and an update/edit of this list -> that might be a task for R3.
- In a later release, the service would be included in the edit mask/submission and in the search. --Ulla 15:28, 12 November 2007 (CET)
RDF schema:
RDF schema: http://schemas.library.nhs.uk/ApplicationProfile/Journal.rdf
- This looks quite comprehensive and we just need a small subset. After 10 minutes analyzing the schema, I'm not sure how the identifiers are further encoded (ISSNURL?). My vote: too complex, reduce it to a minimum? --Inga 16:47, 29 November 2007 (CET)
NLM DTD: http://dtd.nlm.nih.gov/publishing/tag-library/2.3/n-z4u0.html
Start content:
Inga will provide an import file (start content) after her holiday, see https://dev.livingreviews.org/projects/vlib/wiki/SFXJournalIssues#JournalListforeSciDoc --Inga 22:27, 13 November 2007 (CET)
Scope
(First ideas, needs to be discussed)
A list of potential descriptive elements is given below. The list should be considered as a first draft and has to be revised.
Person
- Complete name
The complete name of a person, usually a concatenation of given names and family name
- Given name
A given name of a person
- Family name
The family name of a person
- Alternative name
Any alternative name used for the person
- Title
The title or peerage of a person in one string
- Pseudonym
The pen or stage name of a person
- Remark Sabine: Can pseudonym also be covered by alternative name?
- Remark Ulla: Let's assume: yes
- Affiliation
The institution the person was affiliated to when creating the item
- Identifier
Identifier in the Personennormdatei, provided by the Deutsche Nationalbibliothek
- Remark Sabine: IMO other identifier should be allowed as well (e.g. Identifier of Library of Congress Name Authority File)
- Remark Ulla: Can be modified if needed
- Email
Email address of the person (e.g. will allow users to send an email to the author asking for the fulltext in case it is not available)
- Remark Sabine: I am not sure whether the email address is important information for all persons or for registered PubMan users only. The handling of this "private data" also has to be clarified.
- Remark Traugott: Problem of regularly updating the email address
- Remark Ulla: IMO updating controlled vocab. is always a challenge, not only for emails...?
- Homepage
The location of a personal homepage (e.g. in case fulltext is available via personal homepage)
- Remark Sabine: same as for Email
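The draft element list for a person can be pictured as a simple record type. The following is only a sketch under assumptions: the class name, field names, and the `public_view` helper are hypothetical, pseudonyms are folded into alternative names as assumed in the remarks above, and identifiers are keyed by scheme so that sources other than the PND can be added.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PersonRecord:
    """Hypothetical controlled-entity record for a person (draft element list)."""
    complete_name: str
    given_names: list[str] = field(default_factory=list)
    family_name: Optional[str] = None
    # Pseudonyms are stored here too, following the remarks above.
    alternative_names: list[str] = field(default_factory=list)
    title: Optional[str] = None
    affiliation: Optional[str] = None
    # Identifier scheme (e.g. "PND", "LCNAF") -> value; more than one is allowed.
    identifiers: dict[str, str] = field(default_factory=dict)
    email: Optional[str] = None      # personal data, handling still open
    homepage: Optional[str] = None   # personal data, handling still open

    def public_view(self) -> dict:
        """Return the record as a dict without the personal-data fields."""
        data = asdict(self)
        data.pop("email")
        data.pop("homepage")
        return data


# Sample values are placeholders, not real authority data.
person = PersonRecord(
    complete_name="Jane Example",
    given_names=["Jane"],
    family_name="Example",
    alternative_names=["J. Example"],
    identifiers={"PND": "placeholder-id"},
    email="jane@example.org",
)
print(person.public_view())
```

Keeping email and homepage as separate optional fields makes it easy to hide them from views where personal data must not be shown.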
Organization
- Name
The name of the organization, including translations. (Translations need respective language flag)
Cataloging rules are necessary, i.e. the full network path should be visible in the name of the orgunit.
- Alternative name
Any alternative name or abbreviation used for the organization
- City
The city where the organization is located
- Country
The country where the organization is located
- Type
Type of organization, i.e. institution, institute, department, group, research unit, project, sub-project, research school
- Time period
Indication of the time period in which the organisation was active.
- Relations
- actual hierarchies and network-relations, e.g. sub-units
Remark Traugott: hierarchical relations between org units might be kept sufficiently within the title/name element, with the sequence from higher to lower.
- historical dependencies, i.e. successor-predecessor
- Identifier
URI, eSciDoc Identifier for the organisation. In addition, other identifiers can be kept.
- MPS-Section(?)
Affiliation of the organisational unit to one of the three sections
to be checked: if needed only for statistical reports, we might consider keeping a separate source (section-org URI) and taking the data only when needed.
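The relation elements and the cataloging rule that the full path should be visible in the name can be illustrated together. This is a minimal sketch assuming a hypothetical `OrgUnit` type with a single parent link; the identifiers and names are made up, and real organizational networks may need more than one parent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrgUnit:
    """Hypothetical organization record; field names are illustrative only."""
    identifier: str                           # e.g. a URI
    name: str
    org_type: str = "institute"
    parent: Optional["OrgUnit"] = None        # current hierarchy
    predecessor: Optional["OrgUnit"] = None   # historical dependency
    alternative_names: list[str] = field(default_factory=list)

    def path_name(self, separator: str = ", ") -> str:
        """Build the display name from the highest unit down to this one."""
        parts = []
        unit = self
        while unit is not None:
            parts.append(unit.name)
            unit = unit.parent
        return separator.join(reversed(parts))


society = OrgUnit("org:1", "Example Society", org_type="institution")
institute = OrgUnit("org:2", "Institute for Examples", parent=society)
group = OrgUnit("org:3", "Metadata Group", org_type="group", parent=institute)
print(group.path_name())
```

Deriving the path from the relations, rather than typing it into the name, keeps the two from drifting apart when a unit is moved.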
Other candidates
Potential other candidates for normalized metadata entries have to be discussed further and possibly defined together with pilots.
- Conferences/events
Potential metadata elements:
Title: The name of the event (e.g. Symposium on Theory of Computing)
Alternative title: Any alternative name of the event
Abbreviation: Abbreviated name of the event (e.g. STOC)
Start date: Start date of the event
End date: End date of the event
Place: Place where the event took place
Invitation status: The information if the creator was explicitly invited
- Remark Sabine: Should this information be stored in controlled metadata record?
- Remark Ulla: No, not to my understanding
- Keywords, classifications, thesauri → see cpt_pubman_classifications
- Title of Source (e.g. Series titles)
Potential external sources
(work in progress)
The tables give an overview of potential sources of controlled named entities which are of interest. The information given in the tables reflects the current situation and has to be updated from time to time. The tables are in stage "work in progress" and other sources might be added.
Person(s)
Person(s), i.e. full name of persons (authors, editors, referees, etc.)
Name of service | Scope | Info | Formats supported | Interfaces | Costs | Access |
---|---|---|---|---|---|---|
Library of Congress Name Authority Service | To be evaluated in detail (likely not to cover too many MPG authors) | Introduction | MARCXML | SOAP, WSDL | Records are free of charge[1] | via web site |
Personennormdatei (PND) | ca. 2,6 mio names (1 mio with individualized records). To be evaluated in detail (likely not to cover too many MPG authors) | Introduction | MAB2 | Z39.50 | PND, GKD and SWD only in combination available | CD-ROM (2 CDs) as cumulative new editions. Published biyearly in January and July |
Virtual International Authority File (VIAF) | First prototype covers LC and DNB personal name authority and related bibliographic records | project web site | MARC21 (?) | Prototype system available at: | ||
Computer Science Bibliography (DBLP) | Computer Science | http://dblp.uni-trier.de | HTML, XML |
Corporate bodies
Name of service | Scope | Info | Formats supported | Interfaces | Costs | Access |
---|---|---|---|---|---|---|
Körperschaftsnormdatei (GKD) | More than 1 mio records (German and foreign corporate bodies and conferences) | Introduction | MAB2 | Z39.50 | see PND | see PND |
Journal(s)
Name of service | Scope | Info | Formats supported | Interfaces | Costs | Access |
---|---|---|---|---|---|---|
Zeitschriftendatenbank (ZDB) | ca. 1,3 mio records | Introduction | MAB2, UNIMARC, SUTRS | Z39.50 | It has to be clarified with the GWDG if a tailored version of the ZDB (only listing MPG licensed journals) is available. | |
ISSN Register | 1.284.413 records (2006) | http://www.issn.org | MARC21, UNIMARC | Z39.50 | costs | Access via the ISSN portal or Z39.50 or via a combined web access Z39.50 and ISSN portal |
Rights
Name of service | Scope | Info | Formats supported | Interfaces | Costs | Access |
---|---|---|---|---|---|---|
SHERPA/RoMEO (publishers' copyright policies & self-archiving) | 340 publishers (July 2007) | http://www.sherpa.ac.uk/romeo.php | XML | Prototype API | Conditions of re-use | Prototype API |
Directory of Open Access Journals (DOAJ) | 2.987 journals, 164.284 articles (5th of December 2007) | http://www.doaj.org | XML | OAI-PMH | Conditions of re-use | OAI-PMH |
Potential use cases
(First ideas, needs to be discussed)
- Create an authority record
- Use an authority record as template
- Display an authority record
- Edit an authority record
- Delete an authority record
- Link an authority record to an IR item
- Redirect an authority record
- Search an authority record
Description of potential use cases
(First ideas, needs to be discussed)
This section contains a first draft of potential use cases, described in a generic way; they have to be adapted to the respective type of authority file (e.g. journal, person, etc.). Whether the described use cases can also be applied to affiliations has to be evaluated further. The description is based on the assumption that the system supports the incremental build-up of internal authority files and that external sources are used and integrated as start content.
Create an authority record
Preconditions, assumptions
- IR item is self-contained
- Authority record is immediately visible and selectable (no status: pending, submitted, etc.)
- Depositor is not allowed to create, edit, delete or redirect an authority record
- Potential new role “AF-Editor” is not considered
- An authority record has no obligatory descriptive elements (=> no validation process is required)
- UC can be triggered independently or during FQA process as an extension of UC_PM_QA_XXX in case IR item has not been assigned to an authority record during submission process or has been assigned to the wrong authority record and the appropriate record is not yet available
- Nice to have: system provides interface to external authority files for inquiries and data transfer (not considered)
Actors
- Moderator, MD-Editor
Basic course of events (creation during FQA process)
- The user chooses to create an authority record
- The system creates a new authority record for the respective metadata field
- Continue with UC edit an authority record
- The system links the selected IR item with the authority record (via ID)
Alternative a (in case use case is triggered independently)
- The user chooses to create an authority record
- The system displays a list of all authority files for which the user has privileges
- The user selects an authority file and confirms the choice
- The system creates a new authority record
- Continue with UC edit an authority record
Alternative b (in case use case is triggered independently)
- If the user has rights for only one authority file, the authority file selection is automatically performed by the system
Use an authority record as template
Preconditions, assumptions
- IR item is self-contained
- Potential new role of “AF-Editor” is not considered
- UC can be triggered during FQA process or independently
- One authority record is selected
Actors
- Moderator, MD-Editor
Basic course of events
- The user chooses to use the selected authority record as template
- The system creates a new authority record and populates the new record with the metadata of the selected record
- Continue with UC edit authority record
Display an authority record
Preconditions, assumptions
- IR item is self-contained
- Potential new role of “AF-Editor” is not considered
- UC can be triggered during Submission, FQA process or independently
- One authority record is selected
Actors
- Moderator, MD-Editor, (Depositor) (authority record view for Depositor must not contain personal data, e.g. date of birth etc.)
Basic course of events
- User chooses to display the selected authority record
- The system displays the authority record
Edit an authority record
Preconditions, assumptions
- IR item is self-contained
- Depositor is not allowed to create, edit, delete or redirect an authority record
- Potential new role “AF-Editor” is not considered
- Check of correct assignment of authority record is performed during FQA process and no separate authority record quality assurance process is implemented => UC can be triggered during FQA process or independently
- UC is included by UC create an authority record and by UC use an authority record as template
- The user wants to change or provide data for an authority record
Actors
- Moderator, MD-Editor
Basic course of events
- The user chooses to edit the selected authority record
- The system displays an edit view for the selected authority record
- (Optional) the user adds new metadata values or modifies existing metadata values
- The user chooses to finalize the data
- The system stores the authority record and displays a success message
Delete an authority record
Preconditions, assumptions
- IR item is self-contained
- Deletion of authority records is important in case duplicates have been generated
- The newer authority record should always be deleted (=> the date of creation is important information and should be displayed somewhere in the authority record view)
- Only authority records with no IR items assigned can be deleted. If the authority record to be deleted is still linked to IR items, the links have to be changed manually beforehand (cf. UC redirect an authority record). An automatic re-linking process (“Umverknüpfungsprozess”) might be implemented at a later date.
- Depositor is not allowed to create, edit, delete or redirect an authority record
- Potential new role “AF-Editor” is not considered
- One authority record is selected
Actors
- Moderator
Basic course of events
- The user chooses to delete the selected authority record
- The system checks that no IR items are linked with the selected authority record
- No IR items are linked with the authority record
- The system prompts the user to confirm the deletion
- The user confirms to delete the authority record
- The system deletes the authority record and displays a success message
Alternative a
- The selected authority record is still linked to one or more IR items
- The deletion fails
Alternative b
- The user does not confirm to delete the authority record
- The selected authority record is unaffected
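The basic course and the two alternatives of the delete use case can be sketched as follows. This is an illustration only: `AuthorityFile`, its in-memory dictionaries, and the `confirm` callback are hypothetical stand-ins, not an eSciDoc or PubMan API.

```python
class AuthorityFile:
    """Hypothetical in-memory authority file used only for illustration."""

    def __init__(self):
        self.records = {}  # authority record ID -> metadata dict
        self.links = {}    # IR item ID -> authority record ID

    def delete(self, record_id, confirm):
        """Try to delete a record and return the outcome as a string."""
        if record_id in self.links.values():
            return "failed: still linked to IR items"   # alternative a
        if not confirm():
            return "cancelled: record unaffected"       # alternative b
        del self.records[record_id]
        return "deleted"                                # basic course


af = AuthorityFile()
af.records = {"ar-1": {"name": "Journal A"}, "ar-2": {"name": "Journal A (duplicate)"}}
af.links = {"item-7": "ar-1"}

print(af.delete("ar-1", confirm=lambda: True))   # linked, so the deletion fails
print(af.delete("ar-2", confirm=lambda: False))  # the user does not confirm
print(af.delete("ar-2", confirm=lambda: True))   # unlinked and confirmed
```

The link check runs before the confirmation prompt, matching the order of the steps above, so the user is never asked to confirm a deletion that cannot succeed.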
Link an authority record
Preconditions, assumptions
- IR item is self-contained
- After selecting an authority record (and establishing a link between authority record and IR item) the user is still allowed to edit the metadata field, but the established link is then removed. Incorrect links should be discovered and corrected during the FQA process
- Potential new role “AF-Editor” is not considered
- Use case is part of USC submission or USC FQA and should be integrated as an include association in UC_PM_SM_XXX
Actors
- Depositor, Moderator, MD-Editor
Basic course of events
- The user fills in the corresponding metadata field. While the user is typing, the system automatically suggests a list of potential authority records (dictionary function, “Wörterbuchfunktion”)
- The user selects an authority record
- The system links the item with the selected authority record via AR-ID. In case a link has already been established, the system overwrites the previous AR-ID (relevant in case UC is triggered during FQA process)
- (Optional) the user edits the corresponding metadata field. The link between the item and the authority record is removed and the item is marked as <not assigned to authority record>
Alternative
- No appropriate authority record is available. The user enters free text
- The item is marked as <not assigned to authority record>
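The linking steps above (suggestion while typing, linking via AR-ID, and dropping the link on a later edit) can be sketched as below. All names are hypothetical, the sample journal names are invented, and prefix matching is just one possible way to produce suggestions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical authority file: AR-ID -> journal name (invented sample data).
AUTHORITY_RECORDS = {
    "ar-1": "Journal of Examples",
    "ar-2": "Journal of Samples",
    "ar-3": "Annals of Testing",
}


def suggest(prefix, records=AUTHORITY_RECORDS):
    """Return (AR-ID, name) pairs whose name starts with the typed prefix."""
    needle = prefix.casefold()
    return [(ar_id, name) for ar_id, name in records.items()
            if name.casefold().startswith(needle)]


@dataclass
class MetadataField:
    value: str = ""
    ar_id: Optional[str] = None  # None means <not assigned to authority record>

    def select(self, ar_id, records=AUTHORITY_RECORDS):
        """Link to an authority record, overwriting any previous AR-ID."""
        self.ar_id = ar_id
        self.value = records[ar_id]

    def edit(self, new_value):
        """Free-text edit: the value is kept, the link is removed."""
        self.value = new_value
        self.ar_id = None


source = MetadataField()
print(suggest("journal of"))
source.select("ar-2")
print(source)
source.edit("J. Samples")
print(source)
```

Entering free text without selecting a suggestion corresponds to calling `edit` directly, which leaves the field unassigned.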
Redirect an authority record
Preconditions, assumptions
- IR item is self-contained
- Depositor is not allowed to create, edit, delete or redirect an authority record
- Potential new role “AF-Editor” is not considered
- UC is triggered in case duplicate has been detected or in case IR item has been assigned to the wrong authority record
Actors
- Moderator, System
Basic course of events
- The Moderator triggers the automatic re-linking process (Umverknüpfungsprozess)
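Since the re-linking process is not yet specified, the following is only a guess at its core step: every IR item pointing at one authority record is re-pointed at another. The function name and the dictionary of links are assumptions, and privileges across collections are not considered.

```python
def redirect(links, from_id, to_id):
    """Re-point every IR item linked to from_id at to_id.

    links maps IR item IDs to authority record IDs and is modified in place.
    Returns the list of IR item IDs that were changed.
    """
    if from_id == to_id:
        raise ValueError("source and target record must differ")
    changed = [item for item, ar_id in links.items() if ar_id == from_id]
    for item in changed:
        links[item] = to_id
    return changed


links = {"item-1": "ar-dup", "item-2": "ar-main", "item-3": "ar-dup"}
print(redirect(links, "ar-dup", "ar-main"))
print(links)
```

After such a redirect the duplicate record has no IR items assigned, so it would meet the precondition of the delete use case.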
Search an authority record
Preconditions, assumptions
- IR item is self-contained
- Potential new role “AF-Editor” is not considered
- UC can be triggered during FQA process or independently
Actors
- Moderator, MD-Editor
Basic course of events
- The user selects one or more authority files
- The system displays a simple search field
- The user enters a search string and chooses to start the search
- The system searches in all metadata fields of authority records
- The system displays the list of items of the search result
Alternative
- No item matched the search string
- The system displays a message
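A simple search over all metadata fields of the selected authority files might look like the sketch below. The data layout and function name are hypothetical, and case-insensitive substring matching is an assumption; the use case does not say how matching should work.

```python
def search(authority_files, selected, query):
    """Return records from the selected files where any field contains the query."""
    needle = query.casefold()
    hits = []
    for file_name in selected:
        for record in authority_files.get(file_name, []):
            if any(needle in str(value).casefold() for value in record.values()):
                hits.append(record)
    return hits


# Invented sample data: authority file name -> list of records.
authority_files = {
    "journals": [
        {"title": "Journal of Examples", "abbreviation": "J. Ex."},
        {"title": "Annals of Testing", "abbreviation": "Ann. Test."},
    ],
    "persons": [
        {"family_name": "Example", "given_name": "Jane"},
    ],
}

results = search(authority_files, ["journals", "persons"], "example")
print(results if results else "No authority record matched the search string.")
```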
Potential new role: AF-Editor
(First idea, needs to be discussed)
It has to be discussed further if a new role called AF-Editor has to be established. The idea is that the AF-Editor is responsible for providing and maintaining high data quality of authority records and for ensuring the consistency of the authority file databases. He/she is familiar with relevant cataloging and standardization rules and takes care of the standardization of selected data. The AF-Editor complements the area of responsibilities of the Moderator and the MD-Editor and has special privileges to authorize and to deactivate authority records. Once an authority record has been authorized, it is locked and can only be edited by the AF-Editor him-/herself.
Potential new use cases
- authorize an authority record
- send an authority record back for revision (in case e.g. Moderator wants to edit an authority record which has been already authorized)
- propose an authority record for deactivation
Potential new status
- After creation authority record is either in state pending or submitted.
Potential privileges/competencies
- AF-Editor is allowed to authorize and to deactivate authority records and beyond that has privileges to all other actions connected to authority files/authority records.
- During a separate AFQA process the authority record gets checked and authorized by the AF-Editor. A list of newly created authority records is displayed in the AF-Editor's workspace.
Open issues
- The separate AFQA process and its interaction with the FQA process have to be specified. We assume that the release process of IR items is not affected by the new AF workflow when IR items are self-contained and follow the Autopsie-Prinzip (description based on the item in hand).
Potential privileges/competencies
(First idea, needs to be discussed)
Depositor
- display an authority record
- link an authority record
Moderator
- create an authority record
- display an authority record
- edit an authority record
- deactivate an authority record
- link an authority record to an IR item
- redirect an authority record
- search an authority record
MD-Editor
- create an authority record
- display an authority record
- edit an authority record
- link an authority record to an IR item
- search an authority record
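The role lists above amount to a small permission matrix. The sketch below copies the listed actions as they stand (including "deactivate" for the Moderator); the dictionary layout and the `is_allowed` helper are hypothetical, and the potential AF-Editor role is left out because it is still under discussion.

```python
# Role -> allowed actions on authority records, as listed above.
PRIVILEGES = {
    "Depositor": {"display", "link"},
    "Moderator": {"create", "display", "edit", "deactivate", "link", "redirect", "search"},
    "MD-Editor": {"create", "display", "edit", "link", "search"},
}


def is_allowed(role, action):
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in PRIVILEGES.get(role, set())


print(is_allowed("Depositor", "edit"))      # Depositors may only display and link
print(is_allowed("Moderator", "redirect"))
print(sorted(PRIVILEGES["Moderator"] - PRIVILEGES["MD-Editor"]))
```

The last line shows what distinguishes the Moderator from the MD-Editor in this draft: deactivating and redirecting authority records.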
Potential web services
It has to be discussed whether part of the controlled metadata values which are stored and maintained in PubMan should be provided via web services (e.g. an interface/plugin for organizational units in order to re-use the data for instance when writing a scientific paper). The legal situation for metadata values from external sources has to be clarified in this context.
Potential future projects
- build a working group on authority files (out of PubMan pilot group and other interested Max Planck Institutes). Possible tasks:
- sample creation of controlled entries of MPG-related authors (maybe of one institute) according to standard guidelines and in Library of Congress Authority File format.
Open Issues
Metadata
- Agree on a list of potential candidates for authority files. Note: If a generic mechanism like CDS Invenio's knowledge base were implemented, no such list would be needed in advance.
- Define what kind of descriptive elements an authority record should contain. Descriptive elements may differ from authority file to authority file and should therefore be defined individually.
- Decide whether an IR-item should be self-contained or not. Question: What does self-contained mean? Even right now, IR items are not self-contained in the sense that they contain all relevant metadata values, because other repository objects like creators are only referenced.
- Define how to map authority files to a MD element in a specific MDS. Note: for every MD element the system supports authority files for, we probably need to specify a list of descriptive information available for the authority file (e.g. journal names: title, translation of title, title abbreviations, ISSN, eISSN, etc. persons: last name, first name, etc.)
- Specify linking between IR items and authority records via ID.
- Specify linking of different authority files/databases (e.g. user database - personal name authority file - affiliation authority file).
- Describe selection of authority record (Depositor during submission? Free-text field? System suggests authority record while Depositor fills in information? Depositor may search within the authority file database and select a record?).
Handling of authority files
- Specify assignment of items to an authority record and when it will take place (during submission by selecting an authority record from the selection list? During submission by accepting an authority record selected by the system? During FQA?).
- Describe administration and control of authority files (who is allowed to create, edit, delete, redirect, and authorize authority records? See proposal of new role AF-Editor).
- Define what will happen in case no appropriate authority record is available.
- Specify if different kinds of authority files require different handling.
- Clarify the dilemma between authority files and the Autopsie-Prinzip, i.e. description based on the item in hand (scenario: user selects an authority record. System fills in certain fields automatically. User edits one or more of the automatically filled fields afterwards). Proposal made by Inga: the entry in the IR item follows the Autopsie-Prinzip, but the browse tree will be generated from the authority record and standardized data. The notation of the original (Vorlage) should be integrated in the authority record as an alternative (e.g. alternative name).
- Specify duplicate checking for authority records. Duplicate checking should also compare e.g. name and alternative name.
- Specify users and their rights and privileges concerning authority files.
- Specify if a separate authority file workflow is required.
- Describe entry of multiple authors (via copy and paste).
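For the duplicate checking mentioned above, comparing names and alternative names could work roughly as follows. This is a sketch under assumptions: the record layout, the function names, and the normalization rules (case folding, dropping accents and punctuation) are illustrative choices, not an agreed specification.

```python
import unicodedata


def normalize(name):
    """Case-fold, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    kept = [c for c in decomposed if not unicodedata.combining(c)]
    cleaned = "".join(c if c.isalnum() else " " for c in kept)
    return " ".join(cleaned.casefold().split())


def name_forms(record):
    """Return the normalized name and alternative names of a record as a set."""
    names = [record["name"], *record.get("alternative_names", [])]
    return {normalize(n) for n in names}


def find_duplicates(candidate, existing):
    """Return existing records sharing at least one normalized name form."""
    forms = name_forms(candidate)
    return [record for record in existing if forms & name_forms(record)]


# Invented sample data.
existing = [
    {"name": "Zeitschrift für Beispiele", "alternative_names": ["Z. Beisp."]},
    {"name": "Annals of Testing"},
]
print(find_duplicates({"name": "z beisp"}, existing))
print(find_duplicates({"name": "Journal of Examples"}, existing))
```

Exact matching on normalized forms will miss typos and reordered names; a real check would probably need fuzzy matching as well.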
Handling of new authority records
- Describe creation of new authority records (e.g. when does the user create a new record: Depositor during submission? Moderator during FQA? AF-Editor in a separate workflow? Is it possible to use an existing entry as a template? Should the system send a message to the AF-Editor when a new authority record has been created?).
- Specify a rule set (“Regelwerk”) for the creation of new authority records.
- Specify if an authority record has obligatory elements.
Import of external authority files
- Specify how external authority files can be provided (licensed by MPS? Online available? CD-ROM?) and which procedures are required (includes: harvesting, data conversion (format and character set), linking to IR items, update mechanism, maintenance).
- Describe import of external authority files or subset of it.
- Where will imports of data sets such as name authority files (e.g. PND), user/person-related information, imports from the MPG-IP database hosted at GWDG, and other authority files (e.g. Zeitschriftendatenbank) be described and handled? They are not described in USC_ingestion.
Build-up of internal authority files
- Describe procedure of how to create incrementally built authority files.
- Clarify integration/interaction of internal and external authority files (initial import of external authority files or harvest of authority files scheduled on a regular basis?).
- Will it be possible to extend/modify authority records when loading/synchronizing authority files from an external source? (Assumption: no, or only via customizable fields.)
Customization
- Clarify on which criteria authority files are chosen (customization of authority files on collection, user, user group level?).
- Describe setup of authority files on collection level.
Export
- Specify export of authority records/authority files
- Define what kind of descriptive elements of an authority record should be exportable in case IR item is not self-contained or in case IR item should be enriched with additional information from authority record.
Ingestion
- Specify assignment of authority records for ingested data.
Searching
- Define which elements of authority records are searchable (simple, advanced and expert search).
- Specify searching in external authority files (provide interface for AF-Editor and Moderator to external authority files for inquiries and data transfer).
- Define generation of browse trees (proposal: browse trees should be generated from the standardized data of the authority record).
- Specify search in internal authority files.
- Describe basket functionality for authority records (e.g. important for re-direction in batch mode or re-use of data).
Migration of eDoc data
- Specify assignment of authority records to migrated eDoc items.
Others
- Is the SFX knowledge base an alternative to the ZDB?
- Favourite co-authors feature has to be implemented in accordance with the authority file concept (the dictionary function, “Wörterbuchfunktion”, could be an alternative to the favourite co-authors feature).
- The automatic re-linking process (“Umverknüpfungsprozess”) has to be specified. Privileges and rights of IR items have to be considered (i.e. how to handle the re-direction of items from other collections).
History
Under this heading, bits and pieces that have arisen during discussions etc. will be collected and kept for the sake of completeness.
Naming
There was a discussion about which term we should use instead of authority files/authority records/etc. On 30 November it was agreed to use the term "control of named entities / controlled named entities". During the discussion the following alternative terms were proposed:
- normalizing metadata/data entries
- managing controlled vocabularies
- harmonizing metadata/data entries
- controlling metadata entries
- terminology management
- reference information service (can be split in: reference person service, reference affiliation service, reference journal service, etc.)
- (proposal) master data management is another term that can be considered (though its meaning is not exactly the same as in ERP and CRM systems)
- (proposal) metadata value domains/metadata domain value
- controlled metadata values
- see also ISAAR(CPF): http://www.ica.org/en/node/30230
If I'm not mistaken, CDS Invenio (the software of the CERN document server) calls the concept a knowledge base. It is also worth mentioning how it functions: no normalization of data is performed on input, i.e. the data in the database will always be what was inserted by the metadata editor. Knowledge bases only come into play when outputting data. In this case, output formatting templates can associate certain fields with knowledge bases and thus force normalization of data. This concept is due to a requirement which should be familiar from eDoc: scientists want to be able to get their data out exactly as it was inserted, e.g. author names in all-caps. Obviously this approach has its own share of problems. Basically, all methods which investigate the data (searching, duplicate detection, etc.) must take knowledge bases into account, or will only work in idiosyncratic ways.
Remark from Traugott: CDS Invenio is based on a different usage and business model and therefore its features are not really applicable to our scope.
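The output-time normalization described above can be illustrated with a toy example. This is not CDS Invenio code or its API; the mapping, the function names, and the sample author strings are all invented to show the idea and the search problem it causes.

```python
# Hypothetical knowledge base: stored spelling -> normalized output form.
AUTHOR_KB = {
    "MUELLER, A.": "Mueller, A.",
    "Müller, A.": "Mueller, A.",
}

# The stored data stays exactly as the metadata editor entered it.
stored_records = [{"author": "MUELLER, A."}, {"author": "Müller, A."}]


def render(record, knowledge_base=None):
    """Return the author as stored, or normalized if a knowledge base is given."""
    value = record["author"]
    if knowledge_base is None:
        return value
    return knowledge_base.get(value, value)


def search(records, query, knowledge_base=None):
    """Match the query against the rendered form of each record."""
    return [r for r in records if render(r, knowledge_base) == query]


print([render(r) for r in stored_records])             # exactly as entered
print([render(r, AUTHOR_KB) for r in stored_records])  # normalized on output
print(len(search(stored_records, "Mueller, A.")))             # ignores the knowledge base
print(len(search(stored_records, "Mueller, A.", AUTHOR_KB)))  # takes it into account
```

The two search calls show the drawback named above: a search that ignores the knowledge base does not find records that the output would present under the normalized name.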
Footnotes & References
- ↑ Information from web site: "users do not have to register or request permission to search, save, print, or email the LC authority records. The only limitation is that authority records may only be saved, printed or emailed one at a time."