Difference between revisions of "Managing CoNE entities - Persons"

From MPDLMediaWiki
(trac URLs changed)
 
(15 intermediate revisions by 3 users not shown)
Line 1: Line 1:
Scenarios, use cases and data structure for extending the eSciDoc CONE service with data on persons, especially in the context of [[Portal:PubMan |eSciDoc.PubMan]].  
Scenarios, use cases and data structure for extending the eSciDoc CONE service with data on persons, especially in the context of [[Portal:PubMan |eSciDoc.PubMan]].  


==Data pools==
==Concepts==
[[Portal:PubMan |eSciDoc.PubMan]] will maintain 2 pools of person data:
===Growing index===
'''Status: in specification'''
 
'''Schedule: tbd'''
 
Whatever submission method is applied (manual submission, import, copy&paste author information), every creator information gets a ConeID. Optionally, matching algorithms can be implemented, to match submitted/ingested creator information to already existing Cone entities (e.g. based on given name, family name and affiliation)
In subsequent quality assurance actions, privileged users can improve/manage the person entities in Cone, and
*define authorized entries
*relate various ConeIDs to one authorized entry
*edit the person information for a specific authorized entry (e.g. current/former/additional affiliations, additional naming variants, additional external IDs, such as local IDs, ResearcherID, KakenID etc.)
 
Based on this growing index, [[Portal:PubMan |eSciDoc.PubMan]] will maintain 2 pools of person data:
===Un-authorized person data===
===Un-authorized person data===
*i.e. names, name variants and respective affiliations as provided during submission or import of data by users
*i.e. names, name variants and respective affiliations as provided during submission or import of data by users
*the data pool will be created by using the current name variants of publication items on pubman and will be continuously growing by new entries (by submission or import)
*the data pool will be created by using the current name variants of publication items on pubman and will be continuously growing by new entries (by submission or import)
*The user who submits data manually, will be supported by an autosuggest list containing all currently available un-authorized data, to reduce duplicate entries (see below Scenarios)
*The user who submits data manually, will be supported by an autosuggest list containing all currently available authorized and un-authorized data, to reduce duplicate entries (see below Scenarios)
*The person information stored with a publication item will not be altered, i.e. the person information stored with a publication item can be un-authorized data, can be already authorized data. In case of un-authorized data, the unauthorized nameID can be related afterwards to an authorized PersonID in CONE.
*The person information stored with a publication item will not be altered, i.e. the person information stored with a publication item can be un-authorized data, can be already authorized data. In case of un-authorized data, the unauthorized nameID can be related afterwards to an authorized PersonID in CONE.
*Therefore, the publication item will contain only one possible naming variant, with specific affiliations at a given point in time, which is the one provided during submission/import.
*Therefore, the publication item will contain only one possible naming variant, with specific affiliations at a given point in time, which is the one provided during submission/import.
Line 23: Line 34:
==Scenarios==
==Scenarios==
These scenarios, and any related use case has to be crosschecked with [[Talk:Service_for_Control_of_Named_Entities |preliminary functional specification]]!!--[[User:Uat|Ulla]] 19:05, 24 August 2009 (UTC)
These scenarios, and any related use case has to be crosschecked with [[Talk:Service_for_Control_of_Named_Entities |preliminary functional specification]]!!--[[User:Uat|Ulla]] 19:05, 24 August 2009 (UTC)


===Submission===
===Submission===
'''Status: in specification'''  
'''Status: in specification'''  


'''Schedule: R4.1'''
'''Schedule: tbd'''


During Submission (either easy or full submission), user can enter any name variant. Either s/he follows the "Autopsie Prinzip" and copies the name variant directly from and strictly following the typing on the original copy. Alternatively, to increase data quality, s/he can choose a name variant, including an affiliation, from the auto-suggest list for persons. These values can be un-authorized or authorized person data.  
During Submission (either easy or full submission), user can enter any name variant. Either s/he follows the "Autopsie Prinzip" and copies the name variant directly from and strictly following the typing on the original copy. Alternatively, to increase data quality, s/he can choose a name variant, including an affiliation, from the auto-suggest list for persons. These values can be un-authorized or authorized person data.  
Line 38: Line 50:


*The user can ignore the autosuggest list and enter whatever value for the person. A new un-authorized nameId is created.
*The user can ignore the autosuggest list and enter whatever value for the person. A new un-authorized nameId is created.
===Import/Copy&paste===
'''Status: in specification'''
'''Schedule: tbd'''
During import of publication data (single references or multiple references) and during copy&paste of person data, user can enter any name variant provided by the external source.
*Any person data created is automatically created with new ConeID (Growing index)
*Optionally, a matching algorithm is provided, to match the imported person data with already existing person data. User can decide if he wants to search for matches on ID, given name, family name, affiliation and match the imported data with the existing data. (Handling of possible matches might be similar to Duplicate Check Handling)
===Import person data eDoc2PubMan===
'''Status: in specification'''
'''Schedule: tbd'''
If person data are available in a structured format, we can provide a batch operation for assignment of authoritative entries for person entities, and load the person data separately to CoNe, before the institute starts using PubMan productively.


===Search===
===Search===
'''Status: in specification'''  
'''Status: in specification'''  


'''Schedule: R4.1'''
'''Schedule: tbd'''
 
By default, any search triggered via Quick search, Advanced search or Search&Export service will search in both un-authorized nameIDs and authorized CONE-IDs.
In addition, user can specify, if he/she wants to do a search for exact match.


Any search triggered via Quick search, Advanced search or Search&Export service will search in both un-authorized nameIDs and authorized CONE-IDs.
On the PubMan GUI for search results, user should be indicated, what was the final query (exact match or including variants) to understand the result list.  


On the PubMan GUI for search results, user should be indicated, that other possible naming variants exist.
*Optionally: If user searches for exact match, he gets the information that naming variants exists ("Did you mean...."-feature)
*Optionally: User gets indicated the number of records available for the exact match and the related variants
*Optionally: User can specify if he wants to search for variants based on different handling of german Umlaute. Theoretically, variants like "Buechner" and "Buchner" for entity "Büchner" should be modeled as name variants. Still, we can assume, that this will not always be the case, as external sources such as WOS in general cut out the german Umlaute, and it will depend on user efforts, to "clean up". Therefore, the feature to allow the user to specify "Did you mean /search also for Buchner and Buechner" might help.


::'''Comment Nicole:''' I think it would be good if the user would get the following information: number of records found with a CONE ID, number of records found with unauthorized person ID, variants. The user should then be able to specify if s/he only wants to see a specific set of records or all. --[[User:Nicole|Nicole]] 08:30, 18 December 2008 (UTC)


::'''Questions Nicole:''' How/where shall we search if the user wants to perform an exact search? --[[User:Nicole|Nicole]] 08:30, 18 December 2008 (UTC)


===View researcher portfolio/profile===
===View researcher portfolio/profile===
'''Status: in design'''  
'''Status: implemented'''  


'''Schedule: R4.1'''
'''Schedule: R4.1'''


The search triggered by [[Researcher_Portfolio#UC_View_researcher_portfolio.2Fprofile |"View researcher profile"]] is searching exclusively on CONE PersonIDs, as the service "view researcher portfolio" is bound to an authorized CONE PersonID. Check related use case for details. http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#UC_View_researcher_portfolio.2Fprofile
The search triggered by [[Researcher_Portfolio#UC_View_researcher_portfolio.2Fprofile |"View researcher profile"]] is searching exclusively on CONE PersonIDs, as the service "view researcher portfolio" is bound to an authorized CONE PersonID. Check related use case for details. http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#UC_View_researcher_portfolio.2Fprofile
===Edit researcher portfolio===
'''Status: in specification'''
'''Schedule: tbd'''
*The User can edit his researcher portfolio
Check related scenarios here http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#Future_development
===Export researcher portfolio===
'''Status: in specification'''
'''Schedule: tbd'''
The user can export his researcher portfolio in RDF (as FOAF profile)
Check related scenarios here http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#Future_development


===Edit CONE===
===Edit CONE===
'''Status: in specification'''  
'''Status: in specification'''  


'''Schedule: R4.1/R4.2'''
'''Schedule: tbd'''


Only selected users can define, which of the un-authorized names/naming variants relate to one specific person. They should be able to
Only privileged users can define, which of the un-authorized names/naming variants relate to one specific entity. They should be able to
*create and release a PersonId in CONE (start content might be ingested?)
*create an authorized entry
*relate naming variants to this authorized entry
*edit the personal data related to a PersonID
*edit the personal data related to a PersonID
*relate one or many un-authorized nameIDs to one PersonID
*define which of this personal data should be visible on the researcher portfolio
*add new naming variants to a Person ID (e.g. "potential candidates", cf submission)
*add new naming variants to a Person ID (e.g. "potential candidates", cf submission)
*look-up Persons in external authority files ([http://www.researcherid.com/ Researcher ID], [http://www.worldcat.org WorldCat], Kaken, [http://www.d-nb.de/standardisierung/normdateien/pnd.htm PND])
*look-up Person data in external authority files ([http://www.researcherid.com/ Researcher ID], [http://www.worldcat.org WorldCat], Kaken, [http://www.d-nb.de/standardisierung/normdateien/pnd.htm PND])
*batch operations for cleaning up
#search/Browse for a person name as string
#get a report of all released publication items which contain this string
#get information on all ConeIDs within these publications
#get information of other ConeIDs for this string


===Open questions===
===Open questions===
Line 84: Line 140:




=== Comments ===
 
*The presentation of the researcher portfolio will be in English, an extra page in CONE service will be needed to provide a translation of the researcher portfolio in e.g. Japanese letters.
*NIMS wants it to be visible on the researcher profile who last modified it and when.


=External Resources=
=External Resources=
Line 122: Line 176:
|PND, GKD and SWD only in combination available
|PND, GKD and SWD only in combination available
[http://www.ddb.de/service/pdf/normdaten_cd.pdf costs]
[http://www.ddb.de/service/pdf/normdaten_cd.pdf costs]
|CD-ROM (2 CDs) as cumulative new editions. Published biyearly in January and July <br/> PND is  licensed in MPS, database is available via the Aleph server<ref>https://dev.livingreviews.org/projects/vlib/wiki/AuthFiles</ref>
|CD-ROM (2 CDs) as cumulative new editions. Published biyearly in January and July <br/> PND is  licensed in MPS, database is available via the Aleph server, [https://devtools.mpdl.mpg.de/projects/vlib/wiki/AuthFiles more info]
|-
|-
|Virtual International Authority File (VIAF)
|Virtual International Authority File (VIAF)
Line 150: Line 204:
|}
|}
==Related links==
==Related links==
*[https://dev.livingreviews.org/projects/vlib/wiki/AuthFiles  Summary and overview on authority files in the context of MPG]
*[https://devtools.mpdl.mpg.de/projects/vlib/wiki/AuthFiles  Summary and overview on authority files in the context of MPG]


*[http://names.mimas.ac.uk/documents/LandscapeReport26Jun2008.pdf  A review of the current landscape in relation to a proposed Name Authority Service for UK repositories of research outputs]  
*[http://names.mimas.ac.uk/documents/LandscapeReport26Jun2008.pdf  A review of the current landscape in relation to a proposed Name Authority Service for UK repositories of research outputs]  


good overview on standards in use (international standards, library-derived systems, commercial systems)
good overview on standards in use (international standards, library-derived systems, commercial systems)
[[Category:ESciDoc|Control of named entities]]
 
[[Category:ServiceOrientedArchitecture]]
 
[[Category:PubMan_Development|Control of Named Entities]]
[[Category:CoNE]]
[[Category:Metadata]]

Latest revision as of 09:28, 14 August 2012

Scenarios, use cases and data structure for extending the eSciDoc CONE service with data on persons, especially in the context of eSciDoc.PubMan.

Concepts

Growing index

Status: in specification

Schedule: tbd

Whatever submission method is applied (manual submission, import, copy&paste of author information), every creator entry gets a ConeID. Optionally, matching algorithms can be implemented to match submitted/ingested creator information to already existing Cone entities (e.g. based on given name, family name and affiliation). In subsequent quality assurance actions, privileged users can improve and manage the person entities in Cone, and

  • define authorized entries
  • relate various ConeIDs to one authorized entry
  • edit the person information for a specific authorized entry (e.g. current/former/additional affiliations, additional naming variants, additional external IDs, such as local IDs, ResearcherID, KakenID etc.)

Based on this growing index, eSciDoc.PubMan will maintain 2 pools of person data:

Un-authorized person data

  • i.e. names, name variants and respective affiliations as provided during submission or import of data by users
  • The data pool will be created from the current name variants of publication items in PubMan and will grow continuously with new entries (from submission or import).
  • A user who submits data manually will be supported by an autosuggest list containing all currently available authorized and un-authorized data, to reduce duplicate entries (see Scenarios below).
  • The person information stored with a publication item will not be altered; it can be un-authorized data or already authorized data. In the case of un-authorized data, the un-authorized nameID can be related afterwards to an authorized PersonID in CONE.
  • Therefore, the publication item will contain only one possible naming variant, with specific affiliations at a given point in time, which is the one provided during submission/import.
  • The entries in the un-authorized pool of data, i.e. including all possible naming variants of the same person, will have an internal ID, to be linked to CONE IDs.
    • To be checked: whether, during linking, the un-authorized name variants will be added to the Cone authorized persons as alternative names. --Natasa 16:45, 13 January 2009 (UTC)

Authorized person data

  • contains the main name entry, controlled alternative name variants and controlled affiliation (see data structure)
  • will be controlled by selected users by editing the CONE service data
  • Only selected users can access the CONE service to edit and maintain the controlled person IDs.
  • Selected users can relate un-authorized person name IDs to permanent, controlled, authorized person IDs in CONE.


An example of how to distinguish these two lists in the presentation can be found here.

Scenarios

These scenarios, and any related use cases, have to be cross-checked with the preliminary functional specification! --Ulla 19:05, 24 August 2009 (UTC)


Submission

Status: in specification

Schedule: tbd

During submission (either easy or full submission), the user can enter any name variant. Either s/he follows the "Autopsie Prinzip" (autopsy principle) and copies the name variant exactly as it is printed on the original copy, or, to increase data quality, s/he chooses a name variant, including an affiliation, from the autosuggest list for persons. These values can be un-authorized or authorized person data.

  • The user can select a value from the autosuggest list and store the item. While selecting from the autosuggest list, s/he is shown which of the suggested values are "un-authorized" and which are "authorized" person data (example: WorldCat). After a value has been selected, the publication item contains the ID and the value of the selected entry.
  • The user can select an un-authorized name from the autosuggest list but overwrite the provided value. A new un-authorized nameID is created, without relation to the previously selected nameID.
  • The user can select an authorized person ID. S/he is not allowed to overwrite the value, but can create a "potential candidate" for a new naming variant of the authorized person ID.
  • The user can ignore the autosuggest list and enter any value for the person. A new un-authorized nameID is created.
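
The four outcomes above can be read as a small decision procedure. The following Python sketch is only an illustration under assumptions: the `Suggestion` type, the `resolve_submission` function and the action labels are hypothetical names and are not part of PubMan or CoNE. In particular, what exactly is stored in the item when a "potential candidate" is proposed is not specified here, so the sketch simply links it to the authorized ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Suggestion:
    """One autosuggest entry (hypothetical shape, not the CoNE data model)."""
    entry_id: str
    name: str
    authorized: bool


def resolve_submission(entered_name: str, selected: Optional[Suggestion]) -> dict:
    """Return what would be recorded for one creator, following the four cases above."""
    if selected is None:
        # Autosuggest list ignored: a new un-authorized nameID is created.
        return {"action": "create_unauthorized", "name": entered_name, "linked_id": None}
    if entered_name == selected.name:
        # Value taken over unchanged: the item stores the selected ID and value.
        return {"action": "use_existing", "name": selected.name, "linked_id": selected.entry_id}
    if selected.authorized:
        # Authorized values may not be overwritten; the new spelling becomes a
        # "potential candidate" variant of the authorized person ID (assumption).
        return {"action": "propose_variant", "name": entered_name, "linked_id": selected.entry_id}
    # Un-authorized value overwritten: new nameID, unrelated to the selected one.
    return {"action": "create_unauthorized", "name": entered_name, "linked_id": None}
```

Keeping the result as a plain dictionary makes it easy to see that only the unchanged selection and the proposed variant keep a link to an existing ID.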

Import/Copy&paste

Status: in specification

Schedule: tbd

During import of publication data (single references or multiple references) and during copy&paste of person data, the user can enter any name variant provided by the external source.

  • Any person data created automatically receives a new ConeID (see Growing index).
  • Optionally, a matching algorithm is provided to match the imported person data with already existing person data. The user can decide whether to search for matches on ID, given name, family name and affiliation, and whether to match the imported data with the existing data. (Handling of possible matches might be similar to Duplicate Check Handling.)
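
A minimal sketch of such an optional matching step, in Python, might look as follows. It assumes exact comparison after simple normalization (case and whitespace); the `Person` record, its field names and `find_matches` are hypothetical and do not reflect the actual CoNE data model or any implemented algorithm. The user-selected match fields are passed in as a list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical person record; field names are illustrative only."""
    cone_id: Optional[str]
    given_name: str
    family_name: str
    affiliation: str


def _norm(value: Optional[str]) -> str:
    # Compare case-insensitively and ignore leading, trailing and repeated whitespace.
    return " ".join((value or "").split()).casefold()


def _agrees(a: Person, b: Person, field: str) -> bool:
    # An empty value never counts as a match, so missing IDs do not match each other.
    left, right = _norm(getattr(a, field)), _norm(getattr(b, field))
    return bool(left) and left == right


def find_matches(imported: Person, existing: List[Person], fields: List[str]) -> List[Person]:
    """Return the existing persons that agree with the imported one on every chosen field."""
    if not fields:
        return []
    return [p for p in existing if all(_agrees(imported, p, f) for f in fields)]
```

The returned candidates could then be presented to the user in the same way as duplicate-check results; a real implementation would probably need fuzzier comparison than this exact match.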

Import person data eDoc2PubMan

Status: in specification

Schedule: tbd

If person data are available in a structured format, we can provide a batch operation for assignment of authoritative entries for person entities, and load the person data separately to CoNe, before the institute starts using PubMan productively.

Search

Status: in specification

Schedule: tbd

By default, any search triggered via Quick search, Advanced search or the Search&Export service will search in both un-authorized nameIDs and authorized CONE-IDs. In addition, the user can specify whether s/he wants to search for an exact match.

On the PubMan GUI for search results, the user should be shown the final query (exact match or including variants) in order to understand the result list.

  • Optionally: If the user searches for an exact match, s/he is informed that naming variants exist ("Did you mean..." feature).
  • Optionally: The user is shown the number of records available for the exact match and for the related variants.
  • Optionally: The user can specify whether s/he wants to search for variants based on different handling of German umlauts. In theory, variants like "Buechner" and "Buchner" for the entity "Büchner" should be modeled as name variants. Still, we can assume that this will not always be the case, as external sources such as WOS generally strip German umlauts, and cleaning up will depend on user effort. Therefore, a feature that lets the user specify "Did you mean / search also for Buchner and Buechner" might help.
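
To illustrate the umlaut handling, the following Python sketch expands a name that contains umlauts into the spellings mentioned above (transliterated and stripped). The function name `umlaut_variants` and the replacement table are assumptions made for this example, not an existing PubMan or CoNE feature.

```python
import unicodedata
from itertools import product
from typing import Set

# For each umlaut: the original letter, the transliteration ("ue") and the
# stripped base letter ("u"), as produced by sources that drop umlauts.
_UMLAUT_FORMS = {
    "ä": ("ä", "ae", "a"),
    "ö": ("ö", "oe", "o"),
    "ü": ("ü", "ue", "u"),
    "Ä": ("Ä", "Ae", "A"),
    "Ö": ("Ö", "Oe", "O"),
    "Ü": ("Ü", "Ue", "U"),
}


def umlaut_variants(name: str) -> Set[str]:
    """Return the name plus every spelling obtained by replacing its umlauts."""
    # NFC composes "u" + combining diaeresis into the single character "ü".
    composed = unicodedata.normalize("NFC", name)
    choices = [_UMLAUT_FORMS.get(ch, (ch,)) for ch in composed]
    return {"".join(parts) for parts in product(*choices)}
```

For "Büchner" this yields "Büchner", "Buechner" and "Buchner". The expansion only works from the umlaut form outwards; finding "Büchner" from a query for "Buchner" would still depend on the variants being stored or indexed.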



View researcher portfolio/profile

Status: implemented

Schedule: R4.1

The search triggered by "View researcher profile" searches exclusively on CONE PersonIDs, as the service "view researcher portfolio" is bound to an authorized CONE PersonID. See the related use case for details: http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#UC_View_researcher_portfolio.2Fprofile

Edit researcher portfolio

Status: in specification

Schedule: tbd

  • The User can edit his researcher portfolio

Check related scenarios here http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#Future_development

Export researcher portfolio

Status: in specification

Schedule: tbd

The user can export his researcher portfolio in RDF (as a FOAF profile).

Check related scenarios here: http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#Future_development
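
As a rough idea of what a FOAF export could contain, the following Python sketch serializes a person as RDF/XML using the standard FOAF terms `foaf:Person`, `foaf:name`, `foaf:givenName` and `foaf:familyName`. The function `foaf_profile`, its parameters and the example URI are assumptions for illustration; which CoNE fields would actually be exported is not specified here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def foaf_profile(person_uri: str, given_name: str, family_name: str) -> str:
    """Serialize a minimal FOAF description of one person as RDF/XML."""
    # Use readable prefixes instead of the generated ns0/ns1.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("foaf", FOAF_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(
        root, f"{{{FOAF_NS}}}Person", {f"{{{RDF_NS}}}about": person_uri}
    )
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = f"{given_name} {family_name}"
    ET.SubElement(person, f"{{{FOAF_NS}}}givenName").text = given_name
    ET.SubElement(person, f"{{{FOAF_NS}}}familyName").text = family_name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The URI below is a placeholder, not a real CoNE identifier.
    print(foaf_profile("http://example.org/persons/1", "Georg", "Büchner"))
```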

Edit CONE

Status: in specification

Schedule: tbd

Only privileged users can define which of the un-authorized names/naming variants relate to one specific entity. They should be able to

  • create an authorized entry
  • relate naming variants to this authorized entry
  • edit the personal data related to a PersonID
  • define which of these personal data should be visible on the researcher portfolio
  • add new naming variants to a PersonID (e.g. "potential candidates", cf. Submission)
  • look up person data in external authority files (Researcher ID, WorldCat, Kaken, PND)
  • run batch operations for cleaning up:
  1. search/browse for a person name as a string
  2. get a report of all released publication items which contain this string
  3. get information on all ConeIDs within these publications
  4. get information on other ConeIDs for this string
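
The numbered steps above could be sketched in Python as a single report function. The `Creator` and `Publication` types and the `cleanup_report` function are hypothetical, and step 4 is interpreted here as "all ConeIDs whose name contains the string, in released and unreleased items alike", which is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Creator:
    """Hypothetical creator entry of a publication item."""
    name: str
    cone_id: str


@dataclass(frozen=True)
class Publication:
    """Hypothetical publication item with its release state."""
    item_id: str
    released: bool
    creators: List[Creator]


def cleanup_report(name: str, publications: List[Publication]) -> dict:
    """Build the clean-up report for one person name given as a string (step 1)."""
    needle = name.casefold()

    def carries_name(creator: Creator) -> bool:
        return needle in creator.name.casefold()

    # Step 2: released items with at least one creator name containing the string.
    hits = [p for p in publications if p.released and any(map(carries_name, p.creators))]
    # Step 3: every ConeID that occurs within those items.
    ids_in_items = {c.cone_id for p in hits for c in p.creators}
    # Step 4 (interpretation): ConeIDs attached to the string anywhere in the data.
    ids_for_name = {c.cone_id for p in publications for c in p.creators if carries_name(c)}
    return {
        "items": [p.item_id for p in hits],
        "cone_ids_in_items": sorted(ids_in_items),
        "cone_ids_for_name": sorted(ids_for_name),
    }
```

A privileged user could use such a report to decide which ConeIDs should be related to one authorized entry.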

Open questions

  • In case of import of references (bibtex, endnote, fetch md), can we combine it with an alternative to "autosuggest", i.e. to avoid duplicate entries?
    • One could think of checking the given name for matches in both pools and then giving the user a message about possible controlled names, which he can alter in the edit item mask. --Kleinfercher 08:50, 19 December 2008 (UTC)
  • Does it make sense to provide an additional extension on the view item page to search for "all publications of this author"? This is actually the same scenario as search, but in addition to starting the search in quick search, the user would have the option to trigger the search right from the view item details (i.e. the name of the person).
Comment Nicole: I think it makes sense to offer a link "all publications of this author", as we also had this in eDoc and I think it was used quite often. The only question would then be whether this will result in an exact search or not. --Nicole 08:32, 18 December 2008 (UTC)

Possible solution: Linked person names in view item trigger an exact search for this person name and provide PubMan results (similar to eDoc). In addition, icons for researcher portfolios are provided for those persons with a CONE ID. --Ulla 14:36, 18 December 2008 (UTC)

Data for CoNE Person

The current namespaces and terms used to describe a CoNe person can be found here




External Resources

The following sources can be considered as potential external resources for persons, i.e. full names of persons (authors, editors, referees, etc.):

Name of service Scope Info Formats supported Interfaces Costs Access
Library of Congress Name Authority Service To be evaluated in detail

(likely not to cover too many MPG authors)

Introduction

WSDL http://authorities.loc.gov

MARCXML SOAP

WSDL

Records are free of charge[1] via web site
Personennormdatei (PND) ca. 2,6 mio names (1 mio with individualized records)

To be evaluated in detail (likely not to cover too many MPG authors)

Introduction MAB2

USMARC SUTRS

Z39.50 PND, GKD and SWD only in combination available

costs

CD-ROM (2 CDs) as cumulative new editions. Published biyearly in January and July
PND is licensed in MPS, database is available via the Aleph server, more info
Virtual International Authority File (VIAF) First prototype covers LC and DNB personal name authority and related bibliographic records project web site MARC21 (?) Prototype system available at:

http://viaf.org

Computer Science Bibliography (DBLP) Computer Science http://dblp.uni-trier.de HTML, XML
Wikipedia Persondata info data dump HTML

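Related links

  • Summary and overview on authority files in the context of MPG: https://devtools.mpdl.mpg.de/projects/vlib/wiki/AuthFiles
  • A review of the current landscape in relation to a proposed Name Authority Service for UK repositories of research outputs: http://names.mimas.ac.uk/documents/LandscapeReport26Jun2008.pdf (good overview on standards in use: international standards, library-derived systems, commercial systems)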

  1. Information from web site: "users do not have to register or request permission to search, save, print, or email the LC authority records. The only limitation is that authority records may only be saved, printed or emailed one at a time."