Managing CoNE entities - Persons

Scenarios, use cases and data structure for extending the eSciDoc CoNE (Control of Named Entities) service with data on persons, especially in the context of eSciDoc.PubMan.


Growing index

Status: in specification

Schedule: tbd

Whatever submission method is applied (manual submission, import, copy&paste of author information), every piece of creator information gets a ConeID. Optionally, matching algorithms can be implemented to match submitted/ingested creator information to already existing CoNE entities (e.g. based on given name, family name and affiliation). In subsequent quality assurance actions, privileged users can improve and manage the person entities in CoNE, and

  • define authorized entries
  • relate various ConeIDs to one authorized entry
  • edit the person information for a specific authorized entry (e.g. current/former/additional affiliations, additional naming variants, additional external IDs, such as local IDs, ResearcherID, KakenID etc.)

Based on this growing index, eSciDoc.PubMan will maintain 2 pools of person data:

Un-authorized person data

  • i.e. names, name variants and respective affiliations as provided during submission or import of data by users
  • the data pool will be created from the current name variants of publication items in PubMan and will grow continuously as new entries arrive (by submission or import)
  • The user who submits data manually will be supported by an autosuggest list containing all currently available authorized and un-authorized data, to reduce duplicate entries (see the scenarios below)
  • The person information stored with a publication item will not be altered, i.e. it can be either un-authorized or already authorized data. In case of un-authorized data, the un-authorized nameID can be related afterwards to an authorized PersonID in CoNE.
  • Therefore, the publication item will contain only one possible naming variant, with specific affiliations at a given point in time, which is the one provided during submission/import.
  • The entries in the un-authorized pool of data, i.e. including all possible naming variants of the same person, will have an internal ID, to be linked to CoNE IDs.
    • To be checked: whether, during the linking, the authorized CoNE persons will have the un-authorized name variants added as alternative names --Natasa 16:45, 13 January 2009 (UTC)

Authorized person data

  • contains main name entry, controlled alternative name variants, controlled affiliation (see data structure)
  • will be controlled by selected users by editing the CoNE service data
  • Only selected users can access the CoNE service to edit and maintain the controlled person IDs
  • Selected users can relate un-authorized person name IDs to permanent, controlled, authorized person IDs in CoNE.

An example of how to distinguish these two lists in the presentation can be found here.
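The two pools described above could be modelled roughly as follows. This is a minimal sketch, not the CoNE data structure: the class names, field names and the `relate` helper are assumptions made for illustration. Because the text leaves open whether linked variants are added as alternative names, that behaviour is an explicit switch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UnauthorizedName:
    """A name variant exactly as it was submitted or imported (hypothetical model)."""
    name_id: str
    given_name: str
    family_name: str
    affiliation: str
    authorized_person_id: Optional[str] = None  # set later by a privileged user


@dataclass
class AuthorizedPerson:
    """A controlled person entry (hypothetical model)."""
    person_id: str
    main_name: str
    alternative_names: List[str] = field(default_factory=list)
    affiliations: List[str] = field(default_factory=list)
    external_ids: Dict[str, str] = field(default_factory=dict)


def relate(name: UnauthorizedName, person: AuthorizedPerson,
           add_as_alternative: bool = False) -> None:
    """Link an un-authorized name to an authorized person.

    The submitted name itself is never altered. Copying the variant into the
    alternative names is optional because the text marks it as "to be checked".
    """
    name.authorized_person_id = person.person_id
    if add_as_alternative:
        variant = f"{name.family_name}, {name.given_name}"
        if variant != person.main_name and variant not in person.alternative_names:
            person.alternative_names.append(variant)


person = AuthorizedPerson("person-1", "Büchner, Georg")
variant = UnauthorizedName("name-7", "G.", "Buechner", "MPI Example")
relate(variant, person, add_as_alternative=True)
```

Keeping the link on the un-authorized side means the publication item can keep pointing at its original nameID while the authorized entry evolves.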


These scenarios, and any related use cases, have to be cross-checked against the preliminary functional specification. --Ulla 19:05, 24 August 2009 (UTC)


Status: in specification

Schedule: tbd

During submission (either easy or full submission), the user can enter any name variant. S/he can follow the "Autopsie-Prinzip" (autopsy principle) and copy the name variant exactly as it is printed on the original copy. Alternatively, to increase data quality, s/he can choose a name variant, including an affiliation, from the auto-suggest list for persons. The suggested values can be un-authorized or authorized person data.

  • The user can select a value from the autosuggest list and store the item. While selecting, the user is shown which of the suggested values are "un-authorized" and which are "authorized" person data (example: WorldCat). After a value is selected, the publication item contains the selected ID and its value.
  • The user can select an un-authorized name from the autosuggest list and then overwrite the provided value. A new un-authorized nameID is created, without relation to the previously selected nameID.
  • The user can select an authorized person ID. S/he is not allowed to overwrite the value, but can create a "potential candidate" for a new naming variant of the authorized person ID.
  • The user can ignore the autosuggest list and enter any value for the person. A new un-authorized nameID is created.
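The four cases above can be read as a small decision procedure. The sketch below is one possible reading under stated assumptions: `Suggestion`, `StoredCreator`, `new_name_id` and `resolve_creator` are hypothetical names, and the ID format is invented for the example.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Suggestion:
    entry_id: str      # nameID or authorized PersonID
    value: str         # name as shown in the autosuggest list
    authorized: bool


@dataclass(frozen=True)
class StoredCreator:
    entry_id: str                             # ID stored with the publication item
    value: str                                # name stored with the publication item
    candidate_variant: Optional[str] = None   # proposed variant for an authorized entry


_ids = count(1)


def new_name_id() -> str:
    """Hypothetical generator for new un-authorized nameIDs."""
    return f"name-{next(_ids)}"


def resolve_creator(entered: str, selected: Optional[Suggestion]) -> StoredCreator:
    # Autosuggest ignored: free text creates a new un-authorized name.
    if selected is None:
        return StoredCreator(new_name_id(), entered)
    # Suggestion accepted unchanged: keep the selected ID and its value.
    if entered == selected.value:
        return StoredCreator(selected.entry_id, selected.value)
    # Authorized value may not be overwritten; the typed text only becomes
    # a "potential candidate" for a new naming variant.
    if selected.authorized:
        return StoredCreator(selected.entry_id, selected.value, candidate_variant=entered)
    # Un-authorized value overwritten: new nameID, no relation to the old one.
    return StoredCreator(new_name_id(), entered)


authorized = Suggestion("person-1", "Büchner, Georg", True)
unauthorized = Suggestion("name-0", "Buchner, G.", False)

kept = resolve_creator("Büchner, Georg", authorized)
overwritten = resolve_creator("Buchner, Georg", unauthorized)
candidate = resolve_creator("Buechner, Georg", authorized)
free_text = resolve_creator("Schmidt, Anna", None)
```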


Status: in specification

Schedule: tbd

During import of publication data (single references or multiple references) and during copy&paste of person data, the user can enter any name variant provided by the external source.

  • Any person data created automatically receives a new ConeID (see Growing index)
  • Optionally, a matching algorithm is provided to match the imported person data with already existing person data. The user can decide whether to search for matches on ID, given name, family name and affiliation, and whether to match the imported data with the existing data. (Handling of possible matches might be similar to Duplicate Check Handling.)
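A minimal sketch of such an optional matching step is shown below, assuming person records are plain dictionaries and that the user picks the fields to compare. The function names and the normalization (case-insensitive, whitespace-collapsed exact comparison) are assumptions; the text does not specify the matching algorithm.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def normalize(value: str) -> str:
    """Compare names case-insensitively and ignore extra whitespace."""
    return " ".join(value.casefold().split())


def find_matches(imported: Record, existing: Iterable[Record],
                 fields: Iterable[str]) -> List[Record]:
    """Return existing records that agree with the imported one on all chosen fields."""
    wanted = {f: normalize(imported.get(f, "")) for f in fields}
    # If no field is chosen, or a chosen field is empty, report no match
    # rather than matching everything.
    if not wanted or not all(wanted.values()):
        return []
    return [candidate for candidate in existing
            if all(normalize(candidate.get(f, "")) == v for f, v in wanted.items())]


existing = [
    {"id": "cone-1", "given_name": "Georg", "family_name": "Büchner", "affiliation": "MPI A"},
    {"id": "cone-2", "given_name": "Georg", "family_name": "Büchner", "affiliation": "MPI B"},
    {"id": "cone-3", "given_name": "Anna", "family_name": "Schmidt", "affiliation": "MPI A"},
]
imported = {"given_name": "georg", "family_name": "BÜCHNER ", "affiliation": "MPI B"}

by_name = find_matches(imported, existing, ["given_name", "family_name"])
by_name_and_affiliation = find_matches(
    imported, existing, ["given_name", "family_name", "affiliation"])
```

Returning all candidates instead of picking one leaves the final decision to the user, in line with the comparison to Duplicate Check Handling.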

Import person data eDoc2PubMan

Status: in specification

Schedule: tbd

If person data are available in a structured format, we can provide a batch operation for assigning authoritative entries to person entities and load the person data into CoNE separately, before the institute starts using PubMan productively.


Status: in specification

Schedule: tbd

By default, any search triggered via Quick search, Advanced search or the Search&Export service will search in both un-authorized nameIDs and authorized CoNE IDs. In addition, the user can specify whether to search for an exact match.

On the PubMan GUI for search results, the user should be shown the final query (exact match or including variants) in order to understand the result list.

  • Optionally: If the user searches for an exact match, s/he is informed that naming variants exist ("Did you mean ..." feature)
  • Optionally: The user is shown the number of records available for the exact match and for the related variants
  • Optionally: The user can specify whether to search for variants based on different handling of German umlauts. In theory, variants like "Buechner" and "Buchner" for the entity "Büchner" should be modelled as name variants. Still, we can assume that this will not always be the case, as external sources such as WOS generally strip German umlauts, and cleaning up will depend on user effort. Therefore, a feature that lets the user specify "Did you mean / search also for Buchner and Buechner" might help.
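The umlaut handling in the last option can be illustrated with a small sketch that generates the transliterated and stripped spellings of a name, so that they can be offered as additional search terms. The mapping table and the function name `umlaut_variants` are assumptions for illustration, not part of PubMan.

```python
from itertools import product
from typing import List

# For each umlaut: the transliterated form and the stripped form.
UMLAUT_FORMS = {
    "ä": ("ae", "a"), "ö": ("oe", "o"), "ü": ("ue", "u"),
    "Ä": ("Ae", "A"), "Ö": ("Oe", "O"), "Ü": ("Ue", "U"),
}


def umlaut_variants(name: str) -> List[str]:
    """Return all alternative spellings of `name`, excluding `name` itself."""
    # Every character keeps itself as an option; umlauts get two more.
    options = [(ch,) + UMLAUT_FORMS.get(ch, ()) for ch in name]
    spellings = {"".join(combo) for combo in product(*options)}
    spellings.discard(name)
    return sorted(spellings)


suggestions = umlaut_variants("Büchner")
```

For "Büchner" this yields the two spellings named in the text, "Buchner" and "Buechner"; a name without umlauts yields no additional terms.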

View researcher portfolio/profile

Status: implemented

Schedule: R4.1

The search triggered by "View researcher profile" searches exclusively on CoNE PersonIDs, as the service "view researcher portfolio" is bound to an authorized CoNE PersonID. Check the related use case for details.

Edit researcher portfolio

Status: in specification

Schedule: tbd

  • The User can edit his researcher portfolio

Check related scenarios here

Export researcher portfolio

Status: in specification

Schedule: tbd

The user can export his researcher portfolio in RDF (as a FOAF profile). Check related scenarios here.
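To give an idea of what such an export could look like, the sketch below builds a very small FOAF profile as RDF/XML with the Python standard library. It uses only generic FOAF terms (`foaf:Person`, `foaf:name`, `foaf:givenName`, `foaf:familyName`); the actual CoNE vocabulary is not reproduced here, and the function name and the person URI are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def foaf_profile(person_uri: str, given_name: str, family_name: str) -> str:
    """Serialize a minimal FOAF description of one person as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person", {f"{{{RDF}}}about": person_uri})
    ET.SubElement(person, f"{{{FOAF}}}name").text = f"{given_name} {family_name}"
    ET.SubElement(person, f"{{{FOAF}}}givenName").text = given_name
    ET.SubElement(person, f"{{{FOAF}}}familyName").text = family_name
    return ET.tostring(root, encoding="unicode")


# The URI below is a placeholder, not a real CoNE identifier.
profile_xml = foaf_profile("http://example.org/persons/person-1", "Georg", "Büchner")
```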

Edit CoNE

Status: in specification

Schedule: tbd

Only privileged users can define which of the un-authorized names/naming variants relate to one specific entity. They should be able to

  • create an authorized entry
  • relate naming variants to this authorized entry
  • edit the personal data related to a PersonID
  • define which of these personal data should be visible on the researcher portfolio
  • add new naming variants to a PersonID (e.g. "potential candidates", cf. submission)
  • look up person data in external authority files (ResearcherID, WorldCat, Kaken, PND)
  • run batch operations for cleaning up
  1. search/browse for a person name as string
  2. get a report of all released publication items which contain this string
  3. get information on all ConeIDs within these publications
  4. get information of other ConeIDs for this string

Open questions

  • In case of import of references (BibTeX, EndNote, fetch metadata), can we combine it with an alternative to "autosuggest", i.e. to avoid duplicate entries?
    • One could think of checking the given name for matches in both pools and then giving a message to the user about possible controlled names, which he can alter in the edit item mask.--Kleinfercher 08:50, 19 December 2008 (UTC)
  • Does it make sense to provide an additional extension on the view item page, to search for "all publications of this author"? This is actually the same scenario as search, but in addition to starting a search in quick search, the user would have the option to trigger the search right from the view item details (i.e. the name of the person)
Comment Nicole: I think it makes sense to offer a link "all publications of this author", as we also had this in eDoc and I think it was used quite often. The only question would then be whether this results in an exact search or not. --Nicole 08:32, 18 December 2008 (UTC)

Possible solution: Linked person names in view item trigger an exact search for this person name and provide PubMan results (similar to eDoc). In addition, icons for researcher portfolios are provided for those persons with a CoNE ID.--Ulla 14:36, 18 December 2008 (UTC)

Data for CoNE Person

The current namespaces and terms used to describe a CoNE person can be found here

External Resources

The following sources can be considered as potential external resources for persons, i.e. full names of persons (authors, editors, referees, etc.):

Library of Congress Name Authority Service

  • Scope: to be evaluated in detail (likely not to cover too many MPG authors)
  • Formats supported: MARCXML
  • Interfaces: SOAP, WSDL
  • Costs: records are free of charge[1]
  • Access: via web site

Personennormdatei (PND)

  • Scope: ca. 2.6 million names (1 million with individualized records); to be evaluated in detail (likely not to cover too many MPG authors)
  • Info: Introduction
  • Formats supported: MAB2
  • Interfaces: Z39.50
  • Costs: PND, GKD and SWD are available only in combination; CD-ROM (2 CDs) as cumulative new editions, published twice a year in January and July
  • Access: PND is licensed in the MPS; the database is available via the Aleph server (more info)

Virtual International Authority File (VIAF)

  • Scope: first prototype covers LC and DNB personal name authority and related bibliographic records
  • Info: project web site
  • Formats supported: MARC21 (?)
  • Access: prototype system available at:

Computer Science Bibliography (DBLP)

  • Scope: Computer Science
  • Formats supported: HTML, XML

Wikipedia Persondata

  • Info: info, data dump
  • Formats supported: HTML

Related links

good overview on standards in use (international standards, library-derived systems, commercial systems)