Difference between revisions of "Talk:Service for Control of Named Entities"

Line 117: Line 117:
'''Status''': The current implementation does not assume manual additions of journal records directly via CONE, but an automatic ingestion from the SFX knowledge base only. Therefore, the SFX-ID has been used as primary identifier by CONE so far, but this has proven unreliable, e.g. because SFX-IDs are maintained by the ExLibris team only (-> not all journals will have an SFX-ID yet, because they are too new, etc.)


: An ID other than SFX should be used as unique ID, as it is not possible to find out SFX-IDs for journal names, which one wants to add. Maybe we can use own IDs like for persons or the ZDB-ID? --[[User:Nicole|Nicole]] and [[User:Karin|Karin]]
'''Discussion''':
An ID other than SFX should be used as unique ID, as it is not possible to find out SFX-IDs for journal names, which one wants to add. Maybe we can use own IDs like for persons or the ZDB-ID? --[[User:Nicole|Nicole]] and [[User:Karin|Karin]]


:: I agree, SFX-ID should not be used. We chose SFX because in our initial datasets this was the identifier that was best populated, compared to ISSN. But what else? If we took our own ID we would not have "Control of Named Entities" anymore, just "Named Entities". --[[User:MFranke|MFranke]]
: I agree, SFX-ID should not be used. We chose SFX because in our initial datasets this was the identifier that was best populated, compared to ISSN. But what else? If we took our own ID we would not have "Control of Named Entities" anymore, just "Named Entities". --[[User:MFranke|MFranke]]
::: Please note that the [[Service_for_Control_of_Named_Entities#Create_an_authority_record|use cases below]] assume that the "authority" is represented by a special set of pubman users - which could be an argument for following Natasa's proposal below (local identifiers, records extended by authoritative data) --[[User:Inga|Inga]] 16:16, 14 April 2009 (UTC)
:: Please note that the [[Service_for_Control_of_Named_Entities#Create_an_authority_record|use cases below]] assume that the "authority" is represented by a special set of pubman users - which could be an argument for following Natasa's proposal below (local identifiers, records extended by authoritative data) --[[User:Inga|Inga]] 16:16, 14 April 2009 (UTC)


:: please consider that NIMS has high interest in contributing to quality to the journal data, but they do not have any SFX service they could use. I am therefore not sure, if it is good idea to bind the "quality management", i.e the authorisation of entries, to SFX. Still, to offer SFX as optional "quality check" might be of good use at least in MPG. --[[User:Uat|Ulla]]
: Please consider that NIMS has high interest in contributing to quality to the journal data, but they do not have any SFX service they could use. I am therefore not sure, if it is good idea to bind the "quality management", i.e the authorisation of entries, to SFX. Still, to offer SFX as optional "quality check" might be of good use at least in MPG. --[[User:Uat|Ulla]]


:: Can we use ZDB-ID instead maybe? This is at least easier to find out than SFX :-). Inga, is ZDB-ID unique? We only have to think about what to do if there is not ZDB-ID. --[[User:Nicole|Nicole]] 15:41, 14 April 2009 (UTC)
: Can we use ZDB-ID instead maybe? This is at least easier to find out than SFX :-). Inga, is ZDB-ID unique? We only have to think about what to do if there is not ZDB-ID. --[[User:Nicole|Nicole]] 15:41, 14 April 2009 (UTC)


::: I guess you may run into similar problems with the ZDB-ID as well, i.e. the registry is controlled by an external authority and new journals may not necessarily be available in the moment they are required. I'm not sure how well Japanese titles are covered as well. But, the ZDB-ID is available in a range of systems (incl. EZB) - which may be a big pro.<br>Additional note: If it's only a "usability problem" with retrieving the SFX-ID: I could add it to the sfx menu as an intermediate solution, e.g. [http://sfx.mpg.de/sfxtst3?ctx_enc=info%3Aofi%2Fenc%3AUTF-8&ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fsfxit.com%3Acitation&rft.genre=journal&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&sfx.title_search=exact&url_ctx_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Actx&url_ver=Z39.88-2004&rft.issn=0070-4342 example] - or a bit more hidden ;) --[[User:Inga|Inga]] 17:08, 14 April 2009 (UTC)
:: I guess you may run into similar problems with the ZDB-ID as well, i.e. the registry is controlled by an external authority and new journals may not necessarily be available in the moment they are required. I'm not sure how well Japanese titles are covered as well. But, the ZDB-ID is available in a range of systems (incl. EZB) - which may be a big pro.<br>Additional note: If it's only a "usability problem" with retrieving the SFX-ID: I could add it to the sfx menu as an intermediate solution, e.g. [http://sfx.mpg.de/sfxtst3?ctx_enc=info%3Aofi%2Fenc%3AUTF-8&ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fsfxit.com%3Acitation&rft.genre=journal&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&sfx.title_search=exact&url_ctx_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Actx&url_ver=Z39.88-2004&rft.issn=0070-4342 example] - or a bit more hidden ;) --[[User:Inga|Inga]] 17:08, 14 April 2009 (UTC)


==== Plans after R5 ====

Revision as of 17:18, 14 April 2009

Information on this page is in stage „work in progress“ or needs to be discussed.

Prototype service for controlled named entities - journal names

(work in progress!)

Process

  1. select an authority file (corporate bodies, journals, authors) and available external source
    Done: decision for journal names, use (enriched) edoc data
  2. create (import) data locally into an authority file from a selected source
  3. check for possible providers for look-up services for <rights> and <subject> (DOAJ? romeo? EZB?, Ulrichs?).
    Discussion: see below
  4. implement the referencing from the PubMan edit interface (to start with, let the authority file grow automatically whenever an entered value is not linked to an existing record)
  5. create very simple viewer/editor for the authority file data
  6. get feedback from potential pilot users
  7. modify/add functionalities based on the functional and technical feedback
  8. extend the prototype with another authority file and repeat the steps 2-5

Descriptive Metadata

For the selection of the descriptive metadata the main focus has been set on the minimum level of information that is needed to disambiguate entities. The list of descriptive metadata elements is extendable by new elements.

Metadata elements:

  • Journal title [1]

The name of the journal (e.g. "Journal of the ACM")

  • Alternative title [0-n]

Any alternative name or abbreviation of the journal

Remark Inga: Tagging of abbreviations as such? Indicating the origin of abbreviation if known? Discussion: see below
  • Publisher [0-n?]

The name of the institution that publishes the journal

  • Identifier [0-n]

Any external identifier (e.g. ISSN, EZB-ID, ZDB-ID)

Schema has to be indicated

  • Locator [0-1?]

Locator of the authority file source

Question Inga: Do we mean an URL pointing to the record?
  • Rights [0-n]

Statement on open access availability

Discussion: see below

  • Subject [0-n]

Subject/domain field of the journal

Possible Relations:

  • isSuccessorOf
  • isPredecessorOf
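
The element list above could be sketched as a small data structure. This is only an illustration, not the CONE data model: the class and field names are assumptions, the cardinalities are copied from the list, and the ISSN value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Identifier:
    scheme: str  # e.g. "ISSN", "EZB-ID", "ZDB-ID"; the schema has to be indicated
    value: str


@dataclass
class JournalRecord:
    title: str  # Journal title [1]
    alternative_titles: List[str] = field(default_factory=list)  # [0-n]
    publishers: List[str] = field(default_factory=list)  # [0-n?]
    identifiers: List[Identifier] = field(default_factory=list)  # [0-n]
    locator: Optional[str] = None  # [0-1?]
    rights: List[str] = field(default_factory=list)  # [0-n]
    subjects: List[str] = field(default_factory=list)  # [0-n]
    # Relations, stored here as titles of the related records for simplicity
    is_successor_of: List[str] = field(default_factory=list)
    is_predecessor_of: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The journal title is the only mandatory element in the list.
        if not self.title.strip():
            raise ValueError("journal title is mandatory")


jacm = JournalRecord(
    title="Journal of the ACM",
    alternative_titles=["J. ACM"],
    identifiers=[Identifier("ISSN", "0000-0000")],  # placeholder value
)
```

Whether abbreviations should be tagged separately from other alternative titles is left open here, as in the discussion further down.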

Rights statement for journals

Update on <rights>: as there is no requirement from Christoph/Anja for rights statements on journal level, we can choose whatever provider and whatever information we like. I would opt for DOAJ, as it gives at least a clear indication of which journals are OA, although it has no information on "Green" road publishers. Disadvantage of RoMEO/SHERPA: it reports on publisher level, but not on journal level. --Ulla 16:16, 11 January 2008 (CET)

Requirement: The information collected under PubMan OA Statistics gives no clear picture of what kind of request the rights statements are required for. Is the goal to find out whether specific articles are open access, or whether the journal supports OA publishing (for all articles? for some articles? via an author-pays model?)? We should probably avoid including rights information until we have a clearer picture. --Inga 17:38, 18 January 2008 (CET)

Values: How do we populate the rights metadata, i.e. what value does it have if the journal is OA in accordance with DOAJ? (There are statements like: http://www.doaj.org/doaj?func=loadTempl&templ=faq#definition) --Natasa 15:41, 18 January 2008 (CET)

An overview of OA levels is provided in the Wikipedia article on Open access journals:

Level | Explanation | Value | Source
1 | Journals entirely open access | gold | DOAJ
2 | Journals with research articles open access | ?? | no source
3 | Journals with some research articles open access | ?? | no source
4 | Journals with some articles open access and the other delayed access | ?? | no source
5 | Journals with delayed open access | ?? | no source
6 | Journals permitting self-archiving of articles | green | sherpa/RoMEO

Some thoughts on DOAJ

  1. DOAJ is a directory of open access scientific and scholarly journals. Each month new journals are added and existing journals are deleted from the repository. Therefore, rights information from DOAJ needs to be updated regularly. Note: The OAI-PMH repository does not maintain information about deletions.
  2. By definition, DOAJ does not list journals which use embargo periods (e.g. many Highwire journals) or which only provide parts of their content under OA conditions (e.g. some BMC journals or backfiles with costs?).
  3. Therefore: DOAJ can be used to check if a journal is "on the golden road to OA". According to the DOAJ definition, this information could be applied to all articles published in the journal. To avoid continuous updates, the information should rather be fetched dynamically than physically stored in PubMan. If no information is available, this does not necessarily mean that the journal does not provide OA articles.
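
A minimal sketch of point 3, assuming the set of ISSNs currently listed in DOAJ has already been fetched by some other means (the function name and the ISSN values are made up for illustration). The important detail is that a missing entry yields "no information", not "not open access".

```python
from typing import Optional, Set


def rights_value(issn: str, doaj_issns: Set[str]) -> Optional[str]:
    """Return "gold" if the journal is listed in DOAJ, otherwise None.

    None means "no information available"; it does not mean that the
    journal provides no open access articles.
    """
    return "gold" if issn in doaj_issns else None


# Hypothetical snapshot of ISSNs listed in DOAJ; placeholder values only.
doaj_snapshot = {"1111-1111", "2222-2222"}

listed = rights_value("1111-1111", doaj_snapshot)
unknown = rights_value("3333-3333", doaj_snapshot)
```

Because DOAJ adds and removes journals every month, such a lookup would have to run against a fresh snapshot rather than a value stored once in the journal record.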

Discussion on Journal Abbreviations

Sabine and Traugott:

We do not see a need for tagging the acronyms of e.g. journal titles. The functionality that a depositor fills in the acronym of a journal title and the system then fills in the full title automatically can, in our opinion, also be provided if the acronym of the journal is stored in the title or alternative title element. The only scenario in which tagging might be required is if we would like to generate lists of e.g. journal title acronyms, but this can also be done by using dictionaries etc.

Comment Natasa: Additional vocabularies are again another level of complexity to my understanding (if I understood your message right :) ). My proposal is to see what e.g. ZDB offers (if they offer abbreviations clearly separated from journal names, then we already have sources to do it in our system and can clearly "tag" what is an abbreviation and what is an alternative name). [Comment Inga:] I would like to support this idea: Let's get/fetch/use the information if it's available.

[Comment Natasa:] Yes, will be done.

Comment Inga: Another use case is the generation of reference lists following a citation style which "expects" journal abbreviation, e.g. the ACS citation style.
Comment Traugott: this seems very ambitious to me, yet another piece of information we would need to control in an authority list maintained by escidoc, most probably via storing all the data locally. As with all authority lists, creating, de-duplicating, including disambiguation information, validating and maintaining it is a huge job. In my experience, acronyms are much worse than e.g. journal titles or place names. Sometimes they are even part of the official title of the journal.

In case we wouldn't hold a very rich list of different disciplines and journal communities acronyms, quite often several for the same journal, entering an acronym would result in only a wrong full title being displayed or the wrong acronym added to the citation information. For reference lists and citations (acc. to different styles), authors would very much like to see their (community's) own acronym, I assume. In many cases, they might not know which this is, even if we would be able to display several alternatives.

The main problem we see is that most likely more than one acronym is in use for one journal title and that one acronym might be used for more than one journal, hence acronyms are not unambiguous. [Comment Inga:] ... and unfortunately journal titles are not unambiguous either! This has to be taken into consideration when offering the functionality that the depositor fills in an acronym and the system provides the full title. So the depositor has to check if the right full title has been filled in or he/she has to select the full title out of a list of potential "right" full titles.

Comment Natasa: To my understanding of course this should be the case. The metadata in the item are populated with the journal name and not with the journal abbreviation. When the user enters part of a journal abbreviation or journal name, the system should offer a list of journal names (i.e. a list of journals which have this exact or a similar abbreviation or part of the journal name). Only from this list does the user select the appropriate journal name, and the metadata are filled in correctly.

Would that be fine?

Comment Inga: Yes, this is required! If a lookup returns more than one journal object, the user needs to explicitly select one journal. In this case, the publisher information is probably a necessary piece of information. [Comment Natasa]: OK, agreed then!
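
The lookup behaviour agreed on above might look roughly like this. It is a sketch under assumptions: the `Journal` class, the function names and the sample journals (apart from "Journal of the ACM") are invented, and matching is a plain case-insensitive substring test rather than whatever "similar" ends up meaning.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Journal:
    title: str
    publisher: str = ""
    alternative_titles: List[str] = field(default_factory=list)


def suggest(fragment: str, journals: List[Journal]) -> List[Journal]:
    """Return every journal whose title or alternative title contains the fragment."""
    needle = fragment.casefold()
    return [
        j for j in journals
        if any(needle in name.casefold() for name in [j.title, *j.alternative_titles])
    ]


def resolve(
    fragment: str,
    journals: List[Journal],
    choose: Callable[[List[Journal]], Journal],
) -> Optional[Journal]:
    """Fill in a journal; with several candidates the user has to choose explicitly."""
    matches = suggest(fragment, journals)
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    return choose(matches)  # e.g. a selection list showing title and publisher


# Made-up sample data: two journals share the acronym "EJP".
journals = [
    Journal("Example Journal of Physics", "Publisher A", ["EJP"]),
    Journal("Example Journal of Philosophy", "Publisher B", ["EJP"]),
    Journal("Journal of the ACM", "", ["J. ACM"]),
]

candidates = suggest("ejp", journals)
picked = resolve("ejp", journals, choose=lambda found: found[1])
```

The stored metadata value is always the full title of the chosen journal, never the abbreviation that was typed.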

Background information

RDF schema: http://schemas.library.nhs.uk/ApplicationProfile/Journal.rdf

This looks quite comprehensive and we just need a small subset. After 10 minutes analyzing the schema, I'm not sure how the identifiers are further encoded (ISSNURL?). My vote: too complex, reduce it to a minimum? --Inga 16:47, 29 November 2007 (CET)

NLM DTD: http://dtd.nlm.nih.gov/publishing/tag-library/2.3/n-z4u0.html

Identifier used by CONE

Status: The current implementation does not assume manual additions of journal records directly via CONE, but an automatic ingestion from the SFX knowledge base only. Therefore, the SFX-ID has been used as primary identifier by CONE so far, but this has proven unreliable, e.g. because SFX-IDs are maintained by the ExLibris team only (-> not all journals will have an SFX-ID yet, because they are too new, etc.)

Discussion: An ID other than SFX should be used as unique ID, as it is not possible to find out SFX-IDs for journal names which one wants to add. Maybe we can use our own IDs, as for persons, or the ZDB-ID? --Nicole and Karin

I agree, SFX-ID should not be used. We chose SFX because in our initial datasets this was the identifier that was best populated, compared to ISSN. But what else? If we took our own ID we would not have "Control of Named Entities" anymore, just "Named Entities". --MFranke
Please note that the use cases below assume that the "authority" is represented by a special set of pubman users - which could be an argument for following Natasa's proposal below (local identifiers, records extended by authoritative data) --Inga 16:16, 14 April 2009 (UTC)
Please consider that NIMS has a high interest in contributing to the quality of the journal data, but they do not have any SFX service they could use. I am therefore not sure if it is a good idea to bind the "quality management", i.e. the authorisation of entries, to SFX. Still, offering SFX as an optional "quality check" might be of good use, at least in MPG. --Ulla
Can we use ZDB-ID instead maybe? This is at least easier to find out than SFX :-). Inga, is ZDB-ID unique? We only have to think about what to do if there is no ZDB-ID. --Nicole 15:41, 14 April 2009 (UTC)
I guess you may run into similar problems with the ZDB-ID as well, i.e. the registry is controlled by an external authority and new journals may not necessarily be available at the moment they are required. I'm also not sure how well Japanese titles are covered. But the ZDB-ID is available in a range of systems (incl. EZB), which may be a big pro.
Additional note: If it's only a "usability problem" with retrieving the SFX-ID: I could add it to the SFX menu as an intermediate solution, e.g. as in this example: http://sfx.mpg.de/sfxtst3?ctx_enc=info%3Aofi%2Fenc%3AUTF-8&ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fsfxit.com%3Acitation&rft.genre=journal&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&sfx.title_search=exact&url_ctx_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Actx&url_ver=Z39.88-2004&rft.issn=0070-4342 - or a bit more hidden ;) --Inga 17:08, 14 April 2009 (UTC)

Plans after R5

Proposal by Natasa

  1. each new journal entry that is not selected from a list gets an incrementing local ID
  2. authorizing it may mean some integration with SFX/ZDB, e.g. by obtaining an SFX-ID via the SFX API or the ZDB Z39.50 interface
    • authorizing it may actually even mean relating it to an existing SFX-ID in CONE (e.g. name alternatives)?
  3. the identifier metadata on the publication will not be changed; some interface changes would be required for the CONE service, though.
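
Natasa's proposal could be sketched as follows. Everything here is hypothetical (class names, the in-memory registry, the idea that "authorized" simply means "has at least one external identifier"); the point is only that the local ID stays stable while external identifiers are attached later.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict


@dataclass
class JournalEntry:
    local_id: int
    title: str
    external_ids: Dict[str, str] = field(default_factory=dict)  # scheme -> value

    @property
    def authorized(self) -> bool:
        # Assumption: an entry counts as authorized once it carries an external ID.
        return bool(self.external_ids)


class JournalRegistry:
    def __init__(self) -> None:
        self._next_id = count(1)  # incrementing local IDs: 1, 2, 3, ...
        self.entries: Dict[int, JournalEntry] = {}

    def add(self, title: str) -> JournalEntry:
        entry = JournalEntry(local_id=next(self._next_id), title=title)
        self.entries[entry.local_id] = entry
        return entry

    def authorize(self, local_id: int, scheme: str, value: str) -> None:
        # The local ID is untouched, so publication metadata need not change.
        self.entries[local_id].external_ids[scheme] = value


registry = JournalRegistry()
first = registry.add("A newly entered journal")
second = registry.add("Another new journal")
registry.authorize(first.local_id, "SFX-ID", "954925427230")
```

How the external identifier is actually obtained (SFX API, ZDB Z39.50) is outside this sketch.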

Another alternative, which would be a bit more heavy would be to:

  1. allow only authorized journal entries in the metadata
  2. have "journal entry user" who will take requests for new journal entries
  3. somehow enter the journal in SFX to obtain an SFX-ID (that is, if any SFX integration is possible)
  4. Only afterwards modify the submission (or enable some automated utility for update of journal identifiers)

Uniqueness of SFX-ID

SFX-ID is not unique as you can see here

Yes, I do remember that Natasa informed me about these occurrences a long time ago, but I never found the time to follow up in detail. After checking the incident today, I believe that my lookup script may have been confused by SFX-IDs of "relatedObjects" (a feature which was introduced to the SFX KB at the beginning of 2008). For the example above, the following SFX-IDs would be correct
... and both objects are related to "Verhandlungen der deutschen Zoologen" with SFX-ID 110975506069213. Sorry! --Inga 15:10, 14 April 2009 (UTC)

How to retrieve an SFX-ID for a journal?

Technically it is already possible to add new entries. To retrieve the SFX-ID you may either

  • search for the journal title via MPG/SFX citation linker. The source code of the corresponding SFX menu includes the object-id, e.g.
'rft.object_id' => '954925427230'
  • request the SFX API. The context object hash includes the element
<item key="rft.object_id">954925578060</item>

Important Notice: several object identifiers may be listed for one request due to "relatedObject" feature (see examples above)
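
For the API route, pulling the object identifiers out of the response could look like the sketch below. Only the `<item key="rft.object_id">` element comes from the text above; the wrapping `<ctx_obj>` element, the second identifier and the function name are assumptions made so that the example is self-contained.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_object_ids(xml_text: str) -> List[str]:
    """Collect the values of all <item key="rft.object_id"> elements."""
    root = ET.fromstring(xml_text.strip())
    return [
        (item.text or "").strip()
        for item in root.iter("item")
        if item.get("key") == "rft.object_id"
    ]


# Hypothetical response fragment; the second object ID is a placeholder.
sample = """
<ctx_obj>
  <item key="rft.genre">journal</item>
  <item key="rft.object_id">954925578060</item>
  <item key="rft.object_id">111111111111</item>
</ctx_obj>
"""

ids = extract_object_ids(sample)
needs_manual_check = len(ids) > 1
```

Returning a list rather than a single value keeps the "relatedObject" case visible: with more than one identifier, somebody has to decide which object is the right one.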

Other candidates

Potential other candidates for normalized metadata entries have to be discussed further and possibly defined together with pilot users.

Person

Potential metadata elements

Complete name: The complete name of a person, usually a concatenation of given names and family name

Remark Inga: I would assume that the complete name can be generated automatically from given name and family name. Therefore I wouldn't consider this as an additional element --Inga 12:19, 28 December 2007 (CET)

Given name: A given name of a person

Family name: The family name of a person

Alternative name: Any alternative name used for the person

Title: The title or peerage of a person in one string

Pseudonym: The pen or stage name of a person

Remark Sabine: Can pseudonym also be covered by alternative name?
Remark Ulla: Let's assume: yes
Remark Inga: But in this case we should again consider typing the alternative names --Inga 12:19, 28 December 2007 (CET)

Affiliation: The institution the person was affiliated to when creating the item

Remark Inga: The above information is only available in the context of one publication. Therefore, I would suggest pointing to the affiliation the person is currently working for. In addition, we might list all former (known) affiliations as well. --Inga 12:23, 28 December 2007 (CET)

Identifier: Identifier in the Personennormdatei, provided by the Deutsche Nationalbibliothek

Remark Sabine: IMO other identifiers should be allowed as well (e.g. identifier of the Library of Congress Name Authority File)
Remark Ulla: Can be modified if needed
Remark Inga: Schema has to be indicated --Inga 12:19, 28 December 2007 (CET)

Email: Email address of the person (e.g. will allow users to send an email to the author asking for the fulltext in case it is not available)

Remark Sabine: I am not sure whether the email address is important information for all persons or for registered PubMan users only. The handling of this "private data" also has to be clarified.
Remark Traugott: Problem of regularly updating the email address
Remark Ulla: IMO updating controlled vocab. is always a challenge, not only for emails...?
Remark Inga: I agree with Sabine, email address is of special importance for eSciDoc users only. If we don't find a reliable source for importing this information regularly (PND?), we shouldn't try to maintain this information ourselves --Inga 12:19, 28 December 2007 (CET)

Homepage: The location of a personal homepage (e.g. in case fulltext is available via personal homepage)

Remark Sabine: same as for Email

Date of Birth

Place of Birth

Date of Death

Place of Death

Resources

MPG Units

Potential metadata elements

Complete name: In English and/or German?

Alternative name

Place

Address

Homepage

Resources

Conferences/events

Potential metadata elements

Title: The name of the event (e.g. Symposium on Theory of Computing)

Alternative title: Any alternative name of the event

Abbreviation: Abbreviated name of the event (e.g. STOC)

Start date: Start date of the event

End date: End date of the event

Place: Place where the event took place

Invitation status: The information if the creator was explicitly invited

Remark Sabine: Should this information be stored in controlled metadata record?
Remark Ulla: No, not to my understanding
To my understanding, the invitation status can only be specified for each talk individually and is therefore no generic metadata for the conference --Inga 12:19, 28 December 2007 (CET)

Keywords, classifications, thesauri

→ see cpt_pubman_classifications

Title of Source (e.g. Series titles)

Series titles may be handled via the journal authority file as well --Inga 12:19, 28 December 2007 (CET)

Potential external sources

The tables give an overview of potential sources of controlled named entities which are of interest. The information given in the tables reflects the current situation and has to be updated from time to time. The tables are in stage "work in progress" and other sources might be added.

Person

Person(s), i.e. full name of persons (authors, editors, referees, etc.)

Name of service | Scope | Info | Formats supported | Interfaces | Costs | Access
Library of Congress Name Authority Service | To be evaluated in detail (likely not to cover too many MPG authors) | Introduction, WSDL, http://authorities.loc.gov | MARCXML | SOAP, WSDL | Records are free of charge[1] | via web site
Personennormdatei (PND) | ca. 2.6 million names (1 million with individualized records). To be evaluated in detail (likely not to cover too many MPG authors) | Introduction | MAB2, USMARC, SUTRS | Z39.50 | PND, GKD and SWD are available only in combination (costs) | CD-ROM (2 CDs) as cumulative new editions, published twice a year in January and July. PND is licensed in MPS, database is available via the Aleph server[2]
Virtual International Authority File (VIAF) | First prototype covers LC and DNB personal name authority and related bibliographic records | project web site | MARC21 (?) | | | Prototype system available at: http://viaf.org
Computer Science Bibliography (DBLP) | Computer Science | http://dblp.uni-trier.de | HTML, XML | | |
Wikipedia Persondata | | info, data dump | HTML | | |

Corporate body

Name of service | Scope | Info | Formats supported | Interfaces | Costs | Access
Körperschaftsnormdatei (GKD) | More than 1 million records (German & foreign corporate bodies and conferences) | Introduction | MAB2 | Z39.50 | see PND | see PND

Journal

Name of service | Scope | Info | Formats supported | Interfaces | Costs | Access
Zeitschriftendatenbank (ZDB) | ca. 1.3 million records | Introduction | MAB2, UNIMARC, SUTRS | Z39.50 | | It has to be clarified with the GWDG if a tailored version of the ZDB (only listing MPG licensed journals) is available.
ISSN Register | 1,284,413 records (2006) | http://www.issn.org | MARC21, UNIMARC | Z39.50 | costs | Access via the ISSN portal or Z39.50 or via a combined web access Z39.50 and ISSN portal

Rights

Name of service | Scope | Info | Formats supported | Interfaces | Costs | Access
SHERPA/RoMEO | Publishers' copyright policies & self-archiving; 340 publishers (July 2007) | http://www.sherpa.ac.uk/romeo.php | XML | Prototype API | Conditions of re-use | Prototype API
Directory of Open Access Journals (DOAJ) | 2,987 journals, 164,284 articles (5th of December 2007) | http://www.doaj.org | XML | OAI-PMH | Conditions of re-use | OAI-PMH

Potential use cases

(First ideas, needs to be discussed)

This section contains a first draft of potential use cases, described in a generic way; they have to be adapted to the respective type of authority file (e.g. journal, person, etc.):

  • Create an authority record
  • Use an authority record as template
  • Display an authority record
  • Edit an authority record
  • Delete an authority record
  • Link an authority record to an IR item
  • Redirect an authority record
  • Search an authority record

Whether the described use cases can also be applied to affiliations has to be evaluated further. The description of potential use cases is based on the assumption that the system supports the incremental build-up of internal authority files and that external sources are used and integrated as initial content.

Please note: in the usage scenarios/use cases, the deletion of a record should be treated as an exception. Deactivation of a record will happen more often.

Create an authority record

Preconditions, assumptions

  • IR item is self-contained
  • Authority record is immediately visible and selectable (no status: pending, submitted, etc.)
  • Depositor is not allowed to create, edit, delete or redirect an authority record
  • Potential new role “AF-Editor” is not considered
  • An authority record has no obligatory descriptive elements (=> no validation process is required)
  • UC can be triggered independently or during FQA process as an extension of UC_PM_QA_XXX in case IR item has not been assigned to an authority record during submission process or has been assigned to the wrong authority record and the appropriate record is not yet available
  • Nice to have: system provides interface to external authority files for inquiries and data transfer (not considered)

Actors

  • Moderator, MD-Editor

Basic course of events (creation during FQA process)

  • The user chooses to create an authority record
  • The system creates a new authority record for the respective metadata field
  • Continue with UC edit an authority record
  • The system links the selected IR item with the authority record (via ID)

Alternative a (in case use case is triggered independently)

  • The user chooses to create an authority record
  • The system displays a list of all authority files for which the user has privileges
  • The user selects an authority file and confirms the choice
  • The system creates a new authority record
  • Continue with UC edit an authority record

Alternative b (in case use case is triggered independently)

  • If the user has rights for only one authority file, the authority file selection is automatically performed by the system

Use an authority record as template

Preconditions, assumptions

  • IR item is self-contained
  • Potential new role of “AF-Editor” is not considered
  • UC can be triggered during FQA process or independently
  • One authority record is selected

Actors

  • Moderator, MD-Editor

Basic course of events

  • The user chooses to use the selected authority record as template
  • The system creates a new authority record and populates the new record with the metadata of the selected record
  • Continue with UC edit authority record
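
As an illustration of the template step, the new record could be a deep copy of the selected record's metadata under a fresh ID. The `AuthorityRecord` class, the ID counter and the sample metadata keys are assumptions for this sketch only.

```python
import copy
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List

_ids = count(1)  # hypothetical source of new record IDs


@dataclass
class AuthorityRecord:
    metadata: Dict[str, List[str]] = field(default_factory=dict)
    record_id: int = field(default_factory=lambda: next(_ids))


def create_from_template(template: AuthorityRecord) -> AuthorityRecord:
    """Create a new record populated with a copy of the template's metadata."""
    # Deep copy, so that editing the new record never changes the template.
    return AuthorityRecord(metadata=copy.deepcopy(template.metadata))


template = AuthorityRecord(
    metadata={"title": ["Symposium on Theory of Computing"], "abbreviation": ["STOC"]}
)
new_record = create_from_template(template)
new_record.metadata["place"] = ["(to be filled in during editing)"]
```

The new record is then handed over to the edit use case, as described in the last step.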

Display an authority record

Preconditions, assumptions

  • IR item is self-contained
  • Potential new role of “AF-Editor” is not considered
  • UC can be triggered during Submission, FQA process or independently
  • One authority record is selected

Actors

  • Moderator, MD-Editor, (Depositor) (authority record view for Depositor must not contain personal data, e.g. date of birth etc.)

Basic course of events

  • User chooses to display the selected authority record
  • The system displays the authority record

Edit an authority record

Preconditions, assumptions

  • IR item is self-contained
  • Depositor is not allowed to create, edit, delete or redirect an authority record
  • Potential new role “AF-Editor” is not considered
  • Check of correct assignment of authority record is performed during FQA process and no separate authority record quality assurance process is implemented => UC can be triggered during FQA process or independently
  • UC is included by UC create an authority record and by UC use an authority record as template
  • The user wants to change or provide data for an authority record

Actors

  • Moderator, MD-Editor

Basic course of events

  • The user chooses to edit the selected authority record
  • The system displays an edit view for the selected authority record
  • (Optional) the user adds new metadata values or modifies existing metadata values
  • The user chooses to finalize the data
  • The system stores the authority record and displays a success message

Delete an authority record

Preconditions, assumptions

  • IR item is self-contained
  • Deletion of authority records is important in case duplicates have been generated
  • The newer authority record should always be the one that is deleted (=> the date of creation is important information and should be displayed somewhere in the authority record view)
  • Only authority records with no IR items assigned can be deleted. In case the authority record to be deleted is still linked to IR items, the links have to be changed manually beforehand (cf. UC redirect an authority record). Maybe an automatic relinking process should be implemented at a later date.
  • Depositor is not allowed to create, edit, delete or redirect an authority record
  • Potential new role “AF-Editor” is not considered
  • One authority record is selected

Actors

  • Moderator

Basic course of events

  • The user chooses to delete the selected authority record
  • The system checks that no IR items are linked with the selected authority record
  • No IR items are linked with the authority record
  • The system prompts the user to confirm the deletion
  • The user confirms to delete the authority record
  • The system deletes the authority record and displays a success message

Alternative a

  • The selected authority record is still linked to one or more IR items
  • The deletion fails

Alternative b

  • The user does not confirm to delete the authority record
  • The selected authority record is unaffected
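
The basic course and both alternatives can be put into a few lines. This is a sketch with invented names: records and links are plain dictionaries, and the confirmation prompt is passed in as a callback.

```python
from typing import Callable, Dict, Set


class DeletionError(Exception):
    """Raised when the record is still linked to IR items (alternative a)."""


def delete_record(
    record_id: str,
    records: Dict[str, dict],
    links: Dict[str, Set[str]],  # authority record ID -> IDs of linked IR items
    confirm: Callable[[], bool],
) -> bool:
    # Step 1: the system checks that no IR items are linked with the record.
    if links.get(record_id):
        raise DeletionError(f"{record_id} is still linked to IR items")
    # Step 2: the user has to confirm; otherwise nothing changes (alternative b).
    if not confirm():
        return False
    # Step 3: the record is deleted.
    del records[record_id]
    links.pop(record_id, None)
    return True


records = {"AR-1": {"title": "kept"}, "AR-2": {"title": "duplicate"}}
links = {"AR-1": {"item-7"}, "AR-2": set()}

cancelled = delete_record("AR-2", records, links, confirm=lambda: False)
deleted = delete_record("AR-2", records, links, confirm=lambda: True)
```

A caller would catch `DeletionError` and point the moderator to the redirect use case first.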

Link an authority record[edit]

Preconditions, assumptions

  • IR item is self-contained
  • After selecting an authority record (and establishing a link between authority record and IR item) the user is still allowed to edit the medatata field but the established link will not remain. Incorrect links should be discovered and corrected during FQA process
  • Potential new role “AF-Editor” is not considered
  • Use case is part of USC submission or USC FQA and should be integrated as an include association in UC_PM_SM_XXX

Actors

  • Depositor, Moderator, MD-Editor

Basic course of events

  • The user fills in the corresponding metadata field. While the user types, the system automatically suggests a list of potential authority records (dictionary function, “Wörterbuchfunktion”)
  • The user selects an authority record
  • The system links the item with the selected authority record via AR-ID. In case a link has already been established, the system overwrites the previous AR-ID (relevant in case UC is triggered during FQA process)
  • (Optional) the user edits the corresponding metadata field. The link between the item and the authority record is removed and the item is marked as <not assigned to authority record>

Alternative

  • No appropriate authority record is available. The user enters free text
  • The item is marked as <not assigned to authority record>
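
The linking behaviour described above can be sketched like this. The `MetadataField` class and the `suggest` function are hypothetical names, and two assumptions are made that the use case leaves open: suggestions are matched by case-insensitive prefix, and selecting a suggestion also copies its label into the field.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MetadataField:
    """One controlled metadata field of an IR item (hypothetical model)."""
    value: str = ""
    ar_id: Optional[str] = None  # None stands for <not assigned to authority record>

    def select_record(self, ar_id: str, label: str) -> None:
        # Selecting a suggestion sets the AR-ID; an earlier AR-ID is overwritten.
        # Copying the label into the field is an assumption of this sketch.
        self.value = label
        self.ar_id = ar_id

    def edit(self, new_value: str) -> None:
        # Manual editing (or free-text entry) removes any existing link.
        self.value = new_value
        self.ar_id = None


def suggest(prefix: str, records: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (AR-ID, label) pairs whose label starts with the typed prefix."""
    typed = prefix.casefold()
    if not typed:
        return []
    return sorted(
        (ar_id, label)
        for ar_id, label in records.items()
        if label.casefold().startswith(typed)
    )
```

Because `ar_id` is reset on every manual edit, an item can never keep a link to an authority record whose label no longer matches the field, which is the behaviour the optional last step of the basic course asks for.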

Redirect an authority record

Preconditions, assumptions

  • IR item is self-contained
  • Depositor is not allowed to create, edit, delete or redirect an authority record
  • Potential new role “AF-Editor” is not considered
  • UC is triggered when a duplicate has been detected or when an IR item has been assigned to the wrong authority record

Actors

  • Moderator, System

Basic course of events

  • The Moderator triggers the automatic re-linking process (“Umverknüpfungsprozess”)

Search an authority record

Preconditions, assumptions

  • IR item is self-contained
  • Potential new role “AF-Editor” is not considered
  • UC can be triggered during FQA process or independently

Actors

  • Moderator, MD-Editor

Basic course of events

  • The user selects one or more authority files
  • The system displays a simple search field
  • The user enters a search string and chooses to start the search
  • The system searches in all metadata fields of authority records
  • The system displays the search results as a list

Alternative

  • No item matched the search string
  • The system displays a message
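
A minimal sketch of the simple search described above, assuming that authority records are plain dictionaries and that "searching in all metadata fields" means a case-insensitive substring match. The function names, the message text and the sample data layout are illustrative only.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def search_authority_records(
    query: str,
    authority_files: Dict[str, List[Record]],
    selected: List[str],
) -> List[Tuple[str, Record]]:
    """Search every metadata field of the records in the selected authority files."""
    needle = query.casefold()
    hits = []
    for file_name in selected:
        for record in authority_files.get(file_name, []):
            # A record matches if any of its field values contains the search string.
            if any(needle in str(value).casefold() for value in record.values()):
                hits.append((file_name, record))
    return hits


def result_message(hits: List[Tuple[str, Record]]) -> str:
    """Text shown to the user; covers the alternative with no matches."""
    if not hits:
        return "No authority record matched the search string."
    return f"{len(hits)} authority record(s) found."
```

For example, searching for "nature" in a selection of a journal file and a person file returns hits from both, each tagged with the authority file it came from.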

Potential new role: AF-Editor

(First idea, needs to be discussed)

Whether a new role called AF-Editor should be established has to be discussed further. The idea is that the AF-Editor is responsible for providing and maintaining high data quality of authority records and for ensuring the consistency of the authority file databases. He/she is familiar with the relevant cataloging and standardization rules and takes care of the standardization of selected data. The AF-Editor complements the responsibilities of the Moderator and the MD-Editor and has special privileges to authorize and to deactivate authority records. Once an authority record has been authorized, it is locked and can only be edited by the AF-Editor.

Potential new use cases

  • authorize an authority record
  • send an authority record back for revision (e.g. when a Moderator wants to edit an authority record which has already been authorized)
  • propose an authority record for deactivation

Potential new status

  • After creation, an authority record is in either state pending or state submitted.

Potential privileges/competencies

  • The AF-Editor is allowed to authorize and to deactivate authority records and, beyond that, has privileges for all other actions connected to authority files/authority records.
  • During a separate AFQA process, the authority record is checked and authorized by the AF-Editor. A list of newly created authority records is displayed in the AF-Editor's workspace.

Open issues

  • The separate AFQA process and its interaction with the FQA process have to be specified. We assume that the release process of IR items is not affected by the new AF workflow as long as IR items are self-contained and follow the autopsy principle (“Autopsie-Prinzip”).

Potential privileges/competencies

(First idea, needs to be discussed)

Depositor

  • display an authority record
  • link an authority record

Moderator

  • create an authority record
  • display an authority record
  • edit an authority record
  • deactivate an authority record
  • link an authority record to an IR item
  • redirect an authority record
  • search an authority record

MD-Editor

  • create an authority record
  • display an authority record
  • edit an authority record
  • link an authority record to an IR item
  • search an authority record
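
The proposed privileges above could be captured in a simple lookup table, sketched here under the assumption that each use case is reduced to a short action name. `PRIVILEGES` and `is_allowed` are illustrative names, and the table only mirrors the first-idea lists above, not an agreed role model.

```python
# Role -> allowed actions, copied from the proposal lists above.
PRIVILEGES = {
    "Depositor": {"display", "link"},
    "Moderator": {"create", "display", "edit", "deactivate", "link", "redirect", "search"},
    "MD-Editor": {"create", "display", "edit", "link", "search"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in PRIVILEGES.get(role, set())
```

Keeping the matrix in one place would make it easy to add the potential AF-Editor role later without touching the checks themselves.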

Potential web services

It has to be discussed whether part of the controlled metadata values which are stored and maintained in PubMan should be provided via web services (e.g. an interface/plugin for organizational units in order to re-use the data for instance when writing a scientific paper). The legal situation for metadata values from external sources has to be clarified in this context.

Open Issues

Metadata

  • Agree on a list of potential candidates for authority files. Note: If a generic mechanism like CDS Invenio's knowledge base were implemented, no such list would be needed in advance.
  • Define what kind of descriptive elements an authority record should contain. Descriptive elements may differ from authority file to authority file and should therefore be defined individually.
  • Decide whether an IR item should be self-contained or not. Question: What does self-contained mean? Even right now, IR items are not self-contained in the sense of containing all relevant metadata values, because other repository objects like creators are only referenced.
  • Define how to map authority files to a MD element in a specific MDS. Note: for every MD element the system supports authority files for, we probably need to specify a list of descriptive information available for the authority file (e.g. journal names: title, translation of title, title abbreviations, ISSN, eISSN, etc. persons: last name, first name, etc.)
  • Specify linking between IR items and authority records via ID.
  • Specify linking of different authority files/databases (e.g. user database - personal name authority file - affiliation authority file).
  • Describe selection of authority record (Depositor during submission? Free-text field? System suggests authority record while Depositor fills in information? Depositor searches within the authority file database and selects a record?).

Handling of authority files

  • Specify assignment of items to an authority record and when it will take place (during submission by selecting an authority record from the selection list? During submission by accepting an authority record selected by the system? During FQA?).
  • Describe administration and control of authority files (who is allowed to create, edit, delete, redirect, and authorize authority records? See proposal of new role AF-Editor).
  • Define what will happen in case no appropriate authority record is available.
  • Specify if different kinds of authority files require different handling.
  • Clarify dilemma between authority files and the autopsy principle (“Autopsie-Prinzip”). Scenario: user selects an authority record, system fills in certain fields automatically, user edits one or more of the automatically filled fields afterwards. Proposal made by Inga: the entry in the IR item follows the autopsy principle, but the browse tree will be generated from the authority record and standardized data. The notation of the original source (“Vorlage”) should be integrated in the authority record as an alternative (e.g. alternative name).
  • Specify duplicate checking for authority records. Duplicate checking should also compare e.g. name and alternative name.
  • Specify users and their rights and privileges concerning authority files.
  • Specify if a separate authority file workflow is required.
  • Describe entry of multiple authors (via copy and paste).

Handling of new authority records

  • Describe creation of new authority records (e.g. when does a user create a new record? Depositor during submission? Moderator during FQA? AF-Editor in a separate workflow? Is it possible to use an existing entry as template? Should the system generate a message to the AF-Editor when a new authority record has been created?).
  • Specify a set of rules (“Regelwerk”) for the creation of new authority records.
  • Specify if an authority record has obligatory elements.

Import of external authority files

  • Specify how external authority files can be provided (licensed by MPS? Available online? CD-ROM?) and which procedures are required (includes: harvesting, data conversion (format and character set), linking to IR items, update mechanism, maintenance).
  • Describe import of external authority files or subsets of them.
  • Where will imports of data sets such as name authority files (e.g. PND), user/person related information, imports from the MPG-IP-database hosted at GWDG, and other authority files (e.g. Zeitschriftendatenbank) be described and handled? They are not described in USC_ingestion.

Build-up of internal authority files

  • Describe procedure of how to create incrementally built authority files.
  • Clarify integration/interaction of internal and external authority files (initial import of external authority files or harvest of authority files scheduled on a regular basis?).
  • Will it be possible to extend/modify authority records when authority files are loaded/synchronized from an external source? (Assumption: no, or only via customizable fields.)

Customization

  • Clarify the criteria by which authority files are chosen (customization of authority files at collection, user, or user group level?).
  • Describe setup of authority files at collection level.

Export

  • Specify export of authority records/authority files
  • Define which descriptive elements of an authority record should be exportable when an IR item is not self-contained or when an IR item should be enriched with additional information from the authority record.

Ingestion

  • Specify assignment of authority records for ingested data.

Searching

  • Define which elements of authority records are searchable (simple, advanced and expert search).
  • Specify searching in external authority files (provide interface for AF-Editor and Moderator to external authority files for inquiries and data transfer).
  • Define generation of browse trees (proposal: browse trees should be generated from the standardized data of authority records).
  • Specify search in internal authority files.
  • Describe basket functionality for authority records (e.g. important for re-direction in batch mode or re-use of data).

Migration of eDoc data

  • Specify assignment of authority records to migrated eDoc items.

Others

  • Is the SFX knowledge base an alternative to the ZDB?
  • The favourite co-authors feature has to be implemented in accordance with the authority file concept (the dictionary function, “Wörterbuchfunktion”, could be an alternative to the favourite co-authors feature).
  • The automatic re-linking process (“Umverknüpfungsprozess”) has to be specified. Privileges and rights of IR items have to be considered (i.e. how to handle the re-direction of items from other collections).

Footnotes & References

  1. Information from web site: "users do not have to register or request permission to search, save, print, or email the LC authority records. The only limitation is that authority records may only be saved, printed or emailed one at a time."
  2. https://dev.livingreviews.org/projects/vlib/wiki/AuthFiles


Questions for service implementation and data

Michael, Natasa--Natasa 16:40, 27 August 2008 (UTC)

  • AUTOSUGGESTION
    • would it be OK to autosuggest only journals that have an SFX-ID?
    • suggested list to contain SFX-ID and edoc journal name (as it is normalized)
    • TEST TO TRY:
      1) compound Lucene index built from the searchable fields
      2) add a special triple for autosuggest results for each journal
      3) each journal contains the regular metadata as defined

  • JOURNAL MAINTENANCE
    • if new journals are added must / can they always acquire SFX-ID? If needed, HOW? e.g. send first to SFX, get back jouID, then assign real ID
  • DDC
    • please note that the current notation of DDC values in the CONE service (e.g. "100.110.116 - Change") may be confusing because DDC uses the dot to separate a category from its extension, e.g.
 641 Food and drink
 641.84 Sandwiches
Aha, we were not aware of this, so we took the hierarchy of the three levels above as value and as identifier. So does it mean only the third level should have been actually stated as a value (in this case identifiers would still have the full path?)--Natasa 08:29, 11 March 2009 (UTC)
I haven't understood for what the "full path" is required - there is only one path to 112 in any case (= 100.110.112). I believe that the actual DDC value is the number only, e.g.
 <dc:subject scheme="DDC">
   062
 </dc:subject>
see http://dublincore.org/documents/2002/04/14/dc-xml-guidelines/. In addition, DDC codes/notations have been translated into terms in a couple of natural languages, including German. --Inga 19:45, 11 March 2009 (UTC)
It is explicitly decided for values to use only English names (and to serve only English names in the autosuggested list). The whole concept of data internationalization is not yet completely clear. --Natasa 08:47, 12 March 2009 (UTC)
Not considering internationalization is fine with me - even NIMS will bring the issue on the table very soon. In terms of interoperability, you should probably consider storing the notation individually and using only this in the pubman metadata. But it looks like this is already done ;) - I got confused because the json output merges various sub elements into "value" --Inga 12:20, 12 March 2009 (UTC)
With respect to identifiers, I must admit I had probably not fully understood them. I was using the following resources: http://dc2008.de/wp-content/uploads/2008/09/panzer.pdf, http://ddc.typepad.com/025431/2007/09/designing-ident.html - and misunderstood the class identifier (I was not paying attention to the level). Sure, we only need the last 3 digits in each identifier and identifier value. Another thing that misled me was the fact that we only use the first three levels of DDC :) --Natasa 08:45, 12 March 2009 (UTC)
Yes! ... and this is definitely not the implementation recommended by Traugott, because the first three levels do not provide a very "fine-grained" classification by subject, see Talk:ESciDoc_Application_Profile_Publication#Subject --Inga 12:28, 12 March 2009 (UTC)
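
To illustrate the notation question discussed above, here is a minimal sketch of how a path-style value such as "100.110.116 - Change" could be reduced to the plain three-digit notation. The function name `ddc_notation` is hypothetical, and the sketch assumes that the dots in the value separate hierarchy levels, which holds only as long as just the first three DDC levels are used.

```python
import re
from typing import Tuple


def ddc_notation(path_value: str) -> Tuple[str, str]:
    """Split 'level1.level2.level3 - Label' into (notation, label).

    Assumes the dots separate hierarchy levels, not a DDC category from its
    extension, so the last component must be a three-digit class number.
    """
    path, _, label = path_value.partition(" - ")
    notation = path.strip().split(".")[-1]
    if not re.fullmatch(r"\d{3}", notation):
        raise ValueError(f"not a three-digit DDC class: {path_value!r}")
    return notation, label.strip()
```

A genuine DDC number with an extension, such as "641.84", is rejected by this function, which shows why the path-style value becomes ambiguous as soon as more than the first three levels are used.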