Managing CoNE entities - Persons
Latest revision as of 09:28, 14 August 2012
Scenarios, use cases and data structure for extending the eSciDoc CONE service with data on persons, especially in the context of eSciDoc.PubMan.
Concepts
Growing index
Status: in specification
Schedule: tbd
Whatever submission method is applied (manual submission, import, copy&paste of author information), every piece of creator information gets a ConeID. Optionally, matching algorithms can be implemented to match submitted/ingested creator information to already existing CoNE entities (e.g. based on given name, family name and affiliation).
In subsequent quality assurance actions, privileged users can improve and manage the person entities in CoNE, i.e. they can:
- define authorized entries
- relate various ConeIDs to one authorized entry
- edit the person information for a specific authorized entry (e.g. current/former/additional affiliations, additional naming variants, additional external IDs, such as local IDs, ResearcherID, KakenID etc.)
Based on this growing index, eSciDoc.PubMan will maintain 2 pools of person data:
Un-authorized person data
- i.e. names, name variants and respective affiliations as provided during submission or import of data by users
- the data pool will be created from the current name variants of publication items in PubMan and will grow continuously with new entries (by submission or import)
- The user who submits data manually will be supported by an autosuggest list containing all currently available authorized and un-authorized data, to reduce duplicate entries (see Scenarios below)
- The person information stored with a publication item will not be altered, i.e. the person information stored with a publication item can be either un-authorized or already authorized data. In case of un-authorized data, the un-authorized nameID can be related afterwards to an authorized PersonID in CONE.
- Therefore, the publication item will contain only one naming variant, with specific affiliations at a given point in time, namely the one provided during submission/import.
- The entries in the un-authorized pool of data, i.e. including all possible naming variants of the same person, will have an internal ID, to be linked to CONE IDs.
- To be checked whether, during the linking, the authorized CoNE persons will have the un-authorized name variants added as alternative names --Natasa 16:45, 13 January 2009 (UTC)
Authorized person data
- contains the main name entry, controlled alternative name variants and controlled affiliations (see data structure)
- will be controlled by selected users by editing the CONE service data
- Only selected users can access the CONE service to edit and maintain the controlled person IDs
- Selected users can relate un-authorized person name IDs to permanent, controlled, authorized person IDs in CONE.
An example of how to distinguish these two lists in the presentation can be found here.
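A minimal in-memory sketch of the two pools, assuming that each submitted or imported creator name becomes its own un-authorized entry (growing index) and that privileged users link such entries to an authorized PersonID. All names (`NameEntry`, `AuthorizedPerson`, `PersonPools`, `link`) and the ID formats are hypothetical and not the CoNE data model; copying linked variants into the alternative names follows the open point noted above and is an assumption.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class NameEntry:
    """Un-authorized name variant as submitted or imported (hypothetical model)."""
    name_id: str
    family_name: str
    given_name: str
    affiliation: str = ""
    person_id: Optional[str] = None  # set once linked to an authorized PersonID


@dataclass
class AuthorizedPerson:
    """Controlled entry maintained by privileged users (hypothetical model)."""
    person_id: str
    main_name: str
    alternative_names: List[str] = field(default_factory=list)


class PersonPools:
    def __init__(self) -> None:
        self.names: Dict[str, NameEntry] = {}
        self.persons: Dict[str, AuthorizedPerson] = {}
        self._name_seq = count(1)
        self._person_seq = count(1)

    def add_name(self, family: str, given: str, affiliation: str = "") -> NameEntry:
        # Growing index: every creator gets a new ID; no deduplication here.
        entry = NameEntry(f"name:{next(self._name_seq)}", family, given, affiliation)
        self.names[entry.name_id] = entry
        return entry

    def create_person(self, main_name: str) -> AuthorizedPerson:
        person = AuthorizedPerson(f"person:{next(self._person_seq)}", main_name)
        self.persons[person.person_id] = person
        return person

    def link(self, name_id: str, person_id: str) -> None:
        # Only the pool entry is updated; publication items keep their stored name.
        entry = self.names[name_id]
        person = self.persons[person_id]
        entry.person_id = person_id
        variant = f"{entry.family_name}, {entry.given_name}"
        if variant != person.main_name and variant not in person.alternative_names:
            person.alternative_names.append(variant)


pools = PersonPools()
a = pools.add_name("Büchner", "K.")
b = pools.add_name("Buechner", "K.")
p = pools.create_person("Büchner, K.")
pools.link(a.name_id, p.person_id)
pools.link(b.name_id, p.person_id)
print(p.alternative_names)  # ['Buechner, K.']
```

Keeping the link on the pool entry, rather than rewriting the publication item, mirrors the rule that person information stored with an item is never altered.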
Scenarios
These scenarios and any related use cases have to be cross-checked with the preliminary functional specification!!--Ulla 19:05, 24 August 2009 (UTC)
Submission
Status: in specification
Schedule: tbd
During submission (either easy or full submission), the user can enter any name variant. Either s/he follows the "Autopsie Prinzip" (autopsy principle) and copies the name variant directly from the original copy, strictly following its spelling, or, to increase data quality, s/he chooses a name variant, including an affiliation, from the autosuggest list for persons. These values can be un-authorized or authorized person data.
- The user can select a value from the autosuggest list and store the item. When selecting from the autosuggest list, he is shown which of the suggested values are "un-authorized" and which are "authorized" person data (example: WorldCat). After a value is selected, the publication item contains the ID and the value of the selected entry.
- The user can select an un-authorized name from the autosuggest list but then overwrite the provided value. A new un-authorized nameID is created, without relation to the previously selected nameID.
- The user can select an authorized person ID. He is not allowed to overwrite the value, but he can create a "potential candidate" for a new naming variant of the authorized person ID.
- The user can ignore the autosuggest list and enter any value for the person. A new un-authorized nameID is created.
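The four submission cases above can be condensed into one decision function. This is a sketch under assumptions: `Suggestion`, `resolve_creator`, the returned keys and the `name:new-N` ID format are hypothetical, and storing the authorized value while recording the user's edit as a "potential candidate" is only one possible reading of the third case.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_new_ids = count(1)


def new_name_id() -> str:
    # Stand-in for minting a new un-authorized name ID (hypothetical format).
    return f"name:new-{next(_new_ids)}"


@dataclass(frozen=True)
class Suggestion:
    id: str
    value: str
    authorized: bool  # shown to the user so both kinds can be told apart


def resolve_creator(entered: str, selected: Optional[Suggestion] = None) -> dict:
    """Return what is stored with the publication item for one creator."""
    if selected is None:
        # Autosuggest ignored: free text creates a new un-authorized name ID.
        return {"id": new_name_id(), "value": entered, "candidate_for": None}
    if entered == selected.value:
        # Suggestion taken over unchanged: store its ID and value.
        return {"id": selected.id, "value": selected.value, "candidate_for": None}
    if not selected.authorized:
        # Un-authorized value overwritten: new name ID, no relation to the old one.
        return {"id": new_name_id(), "value": entered, "candidate_for": None}
    # Authorized values must not be overwritten; the edit is recorded as a
    # "potential candidate" naming variant for that PersonID instead.
    return {"id": selected.id, "value": selected.value,
            "candidate_for": (selected.id, entered)}
```

Candidates recorded this way could later be reviewed by privileged users, who decide whether to add them as naming variants.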
Import/Copy&paste
Status: in specification
Schedule: tbd
During import of publication data (single or multiple references) and during copy&paste of person data, the user can enter any name variant provided by the external source.
- Any person data created is automatically assigned a new ConeID (growing index).
- Optionally, a matching algorithm is provided to match the imported person data with already existing person data. The user can decide whether he wants to search for matches on ID, given name, family name and affiliation, and match the imported data with the existing data. (Handling of possible matches might be similar to Duplicate Check Handling.)
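One possible shape for the optional matching step, assuming a simple rule-based score: an identical external ID is treated as a match outright; otherwise agreement on family name, given name (or initial) and affiliation adds points, with case and diacritics ignored. The point values, the threshold and all names (`Candidate`, `match_score`, `find_matches`) are illustrative assumptions rather than a specified algorithm; the result is only a list of possible matches for the user to confirm, as in duplicate checking.

```python
import unicodedata
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    cone_id: str
    family_name: str
    given_name: str
    affiliation: str = ""
    external_id: Optional[str] = None  # e.g. a ResearcherID, if known


def _norm(text: str) -> str:
    # Case-fold and drop diacritics, so "Büchner" and "Buchner" compare equal.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def match_score(imported: Candidate, existing: Candidate) -> int:
    """Points out of 100 (illustrative weights)."""
    if imported.external_id and imported.external_id == existing.external_id:
        return 100
    score = 0
    if _norm(imported.family_name) == _norm(existing.family_name):
        score += 50
    gi, ge = _norm(imported.given_name), _norm(existing.given_name)
    if gi and ge:
        if gi == ge:
            score += 30
        elif gi[0] == ge[0]:
            score += 15  # same initial only, e.g. "K." vs "Klaus"
    if imported.affiliation and _norm(imported.affiliation) == _norm(existing.affiliation):
        score += 20
    return score


def find_matches(imported: Candidate, pool: List[Candidate],
                 threshold: int = 65) -> List[Tuple[str, int]]:
    """Possible matches for the user to confirm, best first."""
    scored = [(c.cone_id, match_score(imported, c)) for c in pool]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])
```

Integer points are used instead of fractional weights so that threshold comparisons are not affected by floating-point rounding.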
Import person data eDoc2PubMan
Status: in specification
Schedule: tbd
If person data are available in a structured format, we can provide a batch operation for assigning authoritative entries to person entities, and load the person data separately into CoNE before the institute starts using PubMan productively.
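A sketch of such a batch load, assuming a CSV export with one row per name variant and a column that groups the variants of the same person. The column names (`person_key`, `family_name`, `given_name`, `affiliation`, `preferred`) and the function name are hypothetical, since the structured format is not specified here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def load_authorized_persons(csv_text: str) -> Dict[str, dict]:
    """Group rows by person_key into one authorized entry each.

    The row marked preferred=yes gives the main name; all other rows
    become alternative name variants.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["person_key"]].append(row)

    persons = {}
    for key, rows in groups.items():
        names = [f'{r["family_name"]}, {r["given_name"]}' for r in rows]
        preferred = [n for n, r in zip(names, rows)
                     if r["preferred"].strip().lower() == "yes"]
        main = preferred[0] if preferred else names[0]  # fall back to first row
        persons[key] = {
            "main_name": main,
            "alternative_names": sorted(set(names) - {main}),
            "affiliations": sorted({r["affiliation"] for r in rows if r["affiliation"]}),
        }
    return persons
```

Loading the grouped entries before productive use means later submissions can pick authorized values from the autosuggest list right away.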
Search
Status: in specification
Schedule: tbd
By default, any search triggered via Quick search, Advanced search or the Search&Export service will search in both un-authorized nameIDs and authorized CONE IDs. In addition, the user can specify whether he/she wants to search for an exact match.
The PubMan GUI for search results should indicate to the user which final query was executed (exact match or including variants), so that the result list can be understood.
- Optionally: if the user searches for an exact match, he is informed that naming variants exist ("Did you mean..." feature)
- Optionally: the user is shown the number of records available for the exact match and for the related variants
- Optionally: the user can specify whether he wants to search for variants based on different handling of German umlauts. In theory, variants like "Buechner" and "Buchner" for the entity "Büchner" should be modeled as name variants. However, we can assume that this will not always be the case, as external sources such as WOS generally strip German umlauts, and it will depend on user efforts to "clean up". Therefore, a feature allowing the user to choose "Did you mean / search also for Buchner and Buechner" might help.
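The umlaut option can be sketched as a query expansion step. This assumes the usual German transliteration (ä to ae, ö to oe, ü to ue) plus the stripped form that external sources such as WOS tend to produce; `name_variants` and `search_names` are hypothetical helpers, and the list of stored names stands in for the combined un-authorized and authorized pools.

```python
from typing import Iterable, List, Set, Tuple

# German umlauts with (transliterated, stripped) replacements.
_UMLAUTS = {
    "ä": ("ae", "a"), "ö": ("oe", "o"), "ü": ("ue", "u"),
    "Ä": ("Ae", "A"), "Ö": ("Oe", "O"), "Ü": ("Ue", "U"),
}


def name_variants(name: str) -> Set[str]:
    """Return `name` plus its transliterated and stripped spellings.

    Each umlaut letter is replaced at all its occurrences at once, so a name
    containing the same umlaut twice does not get mixed spellings.
    """
    variants = {name}
    for umlaut, (translit, stripped) in _UMLAUTS.items():
        expanded = set()
        for v in variants:
            expanded.update({v, v.replace(umlaut, translit), v.replace(umlaut, stripped)})
        variants = expanded
    return variants


def search_names(query: str, stored: Iterable[str],
                 include_variants: bool = False) -> Tuple[List[str], List[str]]:
    """Return (query spellings actually used, matching stored names).

    Returning the spellings lets the result page show which query was run.
    """
    used = name_variants(query) if include_variants else {query}
    hits = [s for s in stored
            if (name_variants(s) if include_variants else {s}) & used]
    return sorted(used), hits
```

Expanding both the query and the stored names means a search for "Buechner" also finds a stored "Büchner", not only the other way round.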
View researcher portfolio/profile
Status: implemented
Schedule: R4.1
The search triggered by "View researcher profile" searches exclusively on CONE PersonIDs, as the "view researcher portfolio" service is bound to an authorized CONE PersonID. Check the related use case for details: http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#UC_View_researcher_portfolio.2Fprofile
Edit researcher portfolio
Status: in specification
Schedule: tbd
- The user can edit his/her researcher portfolio
Check related scenarios here: http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#Future_development
Export researcher portfolio
Status: in specification
Schedule: tbd
The user can export his/her researcher portfolio in RDF (as a FOAF profile).
Check related scenarios here: http://colab.mpdl.mpg.de/mediawiki/Researcher_Portfolio#Future_development
Edit CONE
Status: in specification
Schedule: tbd
Only privileged users can define which of the un-authorized names/naming variants relate to one specific entity. They should be able to
- create an authorized entry
- relate naming variants to this authorized entry
- edit the personal data related to a PersonID
- define which of this personal data should be visible on the researcher portfolio
- add new naming variants to a PersonID (e.g. "potential candidates", cf. Submission)
- look up person data in external authority files (Researcher ID, WorldCat, Kaken, PND)
- run batch operations for cleaning up:
  1. search/browse for a person name as a string
  2. get a report of all released publication items that contain this string
  3. get information on all ConeIDs within these publications
  4. get information on other ConeIDs for this string
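The four clean-up steps above can be sketched as one report function over released items. The item structure (a `released` flag and a list of `(name, ConeID)` creators), the name-to-ConeID index and the function name `cleanup_report` are hypothetical; the fourth step is read here as "ConeIDs registered for the same name string that do not occur in these publications".

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Item:
    item_id: str
    released: bool
    creators: List[Tuple[str, str]]  # (name as stored with the item, ConeID)


def cleanup_report(name: str, items: List[Item],
                   cone_index: Dict[str, Set[str]]) -> dict:
    """Run the four clean-up steps for one name string (case-insensitive)."""
    needle = name.casefold()
    # Steps 1-2: released items with a creator name containing the string.
    hits = [it for it in items
            if it.released and any(needle in n.casefold() for n, _ in it.creators)]
    # Step 3: ConeIDs of the matching creators within these items.
    ids_in_items = {cid for it in hits for n, cid in it.creators
                    if needle in n.casefold()}
    # Step 4: other ConeIDs indexed under a matching name but unused in these items.
    other_ids = {cid for n, ids in cone_index.items()
                 if needle in n.casefold() for cid in ids} - ids_in_items
    return {
        "items": [it.item_id for it in hits],
        "cone_ids_in_items": sorted(ids_in_items),
        "other_cone_ids": sorted(other_ids),
    }
```

The report only collects candidates; relating them to an authorized entry remains a manual decision by a privileged user.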
Open questions
- In case of import of references (BibTeX, EndNote, fetch md), can we combine it with an alternative to "autosuggest", i.e. to avoid duplicate entries?
- One could think of checking the given name for matches in both pools and then giving the user a message about possible controlled names, which he can apply in the edit item mask.--Kleinfercher 08:50, 19 December 2008 (UTC)
- Does it make sense to provide an additional extension on the view item page to search for "all publications of this author"? This is essentially the same scenario as search, but in addition to starting a search in quick search, the user would have the option to trigger the search right from the view item details (i.e. the name of the person).
- Comment Nicole: I think it makes sense to offer a link "all publications of this author", as we also had this in eDoc and I think it was used quite often. The only question would then be whether this results in an exact search or not. --Nicole 08:32, 18 December 2008 (UTC)
Possible solution: linked person names in view item trigger an exact search for this person name and provide PubMan results (similar to eDoc). In addition, icons for researcher portfolios are provided for those persons with a CONE ID.--Ulla 14:36, 18 December 2008 (UTC)
Data for CoNE Person
The current namespaces and terms used to describe a CoNE person can be found here
External Resources
The following sources can be considered as potential external resources for persons, i.e. full names of persons (authors, editors, referees, etc.):
Name of service | Scope | Info | Formats supported | Interfaces | Costs | Access |
---|---|---|---|---|---|---|
Library of Congress Name Authority Service | To be evaluated in detail (likely not to cover too many MPG authors) | Introduction: http://www.oclc.org/research/researchworks/authority/default.htm; WSDL: http://alcme.oclc.org/eprintsUK/services/NACOMatch?wsdl; http://authorities.loc.gov | MARCXML | SOAP, WSDL | Records are free of charge[1] | via web site (http://authorities.loc.gov) |
Personennormdatei (PND) | ca. 2.6 million names (1 million with individualized records); to be evaluated in detail (likely not to cover too many MPG authors) | Introduction: http://www.ddb.de/standardisierung/normdateien/pnd.htm | MAB2, USMARC, SUTRS | Z39.50 | PND, GKD and SWD only available in combination (costs: http://www.ddb.de/service/pdf/normdaten_cd.pdf) | CD-ROM (2 CDs) as cumulative new editions, published twice a year in January and July. PND is licensed in MPS; the database is available via the Aleph server (more info: https://devtools.mpdl.mpg.de/projects/vlib/wiki/AuthFiles) |
Virtual International Authority File (VIAF) | First prototype covers LC and DNB personal name authority and related bibliographic records | project web site: http://www.oclc.org/research/projects/viaf | MARC21 (?) |  |  | Prototype system available at http://viaf.org |
Computer Science Bibliography (DBLP) | Computer Science | http://dblp.uni-trier.de | HTML, XML |  |  |  |
Wikipedia Persondata |  | info: http://en.wikipedia.org/wiki/Wikipedia:Persondata; data dump: http://download.wikimedia.org/ | HTML |  |  |  |
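Related links
- Summary and overview on authority files in the context of MPG: https://devtools.mpdl.mpg.de/projects/vlib/wiki/AuthFiles
- A review of the current landscape in relation to a proposed Name Authority Service for UK repositories of research outputs: http://names.mimas.ac.uk/documents/LandscapeReport26Jun2008.pdf
- Good overview of standards in use (international standards, library-derived systems, commercial systems)

[1] Information from web site: "users do not have to register or request permission to search, save, print, or email the LC authority records. The only limitation is that authority records may only be saved, printed or emailed one at a time."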