Difference between revisions of "Service for Control of Named Entities"

From MPDLMediaWiki
(internal link corrected)
 
(75 intermediate revisions by 10 users not shown)
<accesscontrol>MPDL,,PubMan,, eSciDoc</accesscontrol>
{{ESciDoc Services}}
{{ESciDoc Solutions}}
[[Category:PubMan]]


The aim of this page is to collect information concerning control of named entities in the context of PubMan/eSciDoc. The page contains functional and technical specifications of envisioned PubMan services in this domain.


'''Note for users maintaining this page:


Issues that are still work in progress or need to be discussed should be put on the discussion page first. As soon as something has been agreed, it will be moved to the main article page.'''
The purpose of this service is to provide methods to deal with controlled lists of named entities to assure data quality and facilitate data access and data entry. Read more on the background [[Control_of_Named_Entities|here]].


== First services ==
==Introduction==
=== Core service "organizational unit handler" ===
The eSciDoc system provides a core service, the "organizational unit handler", which manages the organizational units for the eSciDoc system. In the future, this core service might be extended by an additional named entity control service for organizational units, so that more of the information needed for organizational units can be tracked and managed.


The basic descriptive elements which will be covered by the core service are:
The Control of Named Entities (CoNE) service currently supports the named entity types [[Service_for_Control_of_Named_Entities#Supported_Named_Entity_Types |listed below]]. It is integrated with [[PubMan|PubMan]] in the publication metadata editing forms via an "auto suggest" component for the respective metadata edit fields (journal name, language of the publication).


*'''Name'''
==Operations==
The name of the organization, including translations. (Translations need respective language flag)
The CoNE service supports the following operations:


Cataloging rules are necessary, i.e. the full network path should be visible in the name of the organizational unit.
*searching (based on the metadata of the named entity, e.g. in the case of journals: journal name, publisher name, place of publication)
*retrieving details of a selected named entity, e.g. retrieving the metadata of a selected journal based on the journal identifier


*'''Alternative name'''
{| style=" height:340px" border="2"
Any alternative name or abbreviation used for the organization
|- bgcolor="#ccccff"
! width="150" |'''Operation'''
! width="150" |'''Status'''
!width="200"|'''Input''' 
! width="150" |'''Output'''
!width="300"|'''Description'''   
|- style="height:33px"
|query||implemented||(opt)'''q''':String - the query (see below)<br/>(opt)'''[predicate]''': String - a value for a certain field (see below)<br/>(opt)'''lang''': String - the language as ISO 639-1 code<br/>(opt)'''n''': int - the maximum number of results that should be returned||String||Scope:'''Public''' <br/>Returns a list of resources/ids in the given language (default 'en') matching the given query. The maximum number of hits returned is configurable (default 50).
|-
|details||implemented||'''id''': String<br/>(opt)'''lang''': String||String||Scope:'''Public''' <br/>Returns all available information on the resource in the given language (default 'en') identified by the given id.
|-
|all||implemented||(opt)'''lang''': String||String||Scope:'''Public''' <br/>Gives back a list of all resources/ids in the given language (default 'en').
|}


*'''City'''
The query parameter '''q''' may contain a single term (e.g. ''q=nature'') or multiple terms (e.g. ''q=psychology therapy''). CoNE returns those entries that contain all terms in one or more of the searchable fields. Putting the query in quotes, e.g. ''q=&quot;John Doe&quot;'', causes CoNE to search for exact matches. Wildcards (*) are allowed.
The city where the organization is located
As an alternative to searching all fields, it is also possible to search specific fields (predicates) by specifying the predicate name and a query in the syntax described above (e.g. ''foaf:family_name=Miller'' or ''http:&#47;&#47;purl.org/escidoc/metadata/terms/0.1/suffix=&quot;*x&quot;''). The namespace of the predicate can be given in full or as a prefix; the prefixes are predefined [https://subversion.mpdl.mpg.de/repos/common/trunk/common_services/cone/src/main/resources/models.xml here].


*'''Country'''
==Interfaces==
The country where the organization is located
The CoNE service comes with a flexible interface definition that defines the input/output formats. The following formats are available:


*'''Type'''
{| style=" height:340px" border="2"
Type of organization, i.e. institution, institute, department, group, research unit, project, sub-project, research school
|- bgcolor="#ccccff"
! width="150" |'''Interface'''
! width="150" |'''Status'''  
!width="200"|'''Input''' 
! width="150" |'''Output'''
!width="300"|'''Description'''   
|- style="height:33px"
|jquery||implemented||HTTP Get-Request||jquery proprietary list, JSON||HTTPServlet used by jquery Javascript components for autosuggest features
|-
|options||implemented||HTTP Get-Request||Format that can easily be read by Java to generate JSF options||HTTPServlet usable by Java a.o.
|-
|json||implemented||HTTP Get-Request||pure JSON||HTTPServlet usable by Javascript/AJAX components
|-
|html||implemented||HTTP Get-Request||(X)HTML||HTTPServlet that provides a human readable overview on the data (planned to be used for author pages)
|-
|rdf||implemented||HTTP Get-Request||RDF/XML||HTTPServlet that provides a RDF view on the data
|-
|unapi||planned||HTTP Get-Request||varying||HTTPServlet that conforms to the unAPI interface specification
|-
|vcard||planned||HTTP Get-Request||ASCII(UTF-8 for single values possible using BASE64 encoding?)||HTTPServlet that conforms to the vCard specification ([http://tools.ietf.org/html/rfc2425 RFC2425])
|-
|}


*'''Time period'''
==Supported Named Entity Types==
Indicates the time period in which the organization was active.
Currently, we support the following vocabularies:
===Journals===
*url to the journal service interface (to be done)
*[[CoNE_Journal| Metadata Description for CoNE Journal]]
*Functional Specification for [[Managing_CoNE_entities_-_Journals |managing CoNE journals]]
*example urls to the service (to be done)
*url to the test-jsp (to be done)


*'''Relations'''
===Languages===
**actual hierarchies and network-relations, e.g. sub-units
*url to the language service interface (to be done)
**: Remark Traugott: hierarchical relations between org units might be kept sufficiently within the title/name element, with the sequence from higher to lower.
*metadata description (to be done)
**historical dependencies, i.e. successor-predecessor
*example urls to the service (to be done)
*Functional Specification to handle the entity (to be done)


*'''Identifier'''
===Persons===
URI, eSciDoc Identifier for the organisation. In addition, other identifiers can be kept.
*Metadata Profile under [[CoNE_Person|CoNE_Person]]
*Functional Specification for [[Managing_CoNE_entities_-_Persons |Managing CoNE persons]]


Ongoing work on the core service definition, see [[Talk:PubMan_Func_Spec_Organizational_Unit_Management#Metadata | Discussion page on functional specification for org unit management]]
===DDC===
*Metadata description (to be done)
*Functional Specification for handling entity (to be done)
*Currently implemented: [[Dewey Decimal Classification]]


===Future Development===
===Mimetypes===
The IANA mimetype list plus additional mimetypes needed for PubMan (currently used for validation).
*Metadata description (to be done)
*Functional description for handling the entity (to be done)


====Classification/Tagging====
===eSciDoc mimetypes===
*'''MPS-Section(?)'''
A subset of the mimetypes list above. Planned to be used for PubMan in the future.
Affiliation of the organisational unit to one of the three sections
*Metadata description (to be done)
*Functional description for handling the entity (to be done)


===Prototype service for controlled named entities - journal names===
==Candidates for Named Entity Types==


To better understand the issues of controlled named entities for a certain application, we decided to start with a prototype service for PubMan on controlled named entities.
Potential further candidates for normalized metadata entries:


'''Stages of prototyping:'''
=== MPG Units ===


# select an authority file (corporate bodies, journals, authors) and available external source 
==== Potential metadata elements ====
# create (import) data locally into an authority file from a selected source
'''Complete name'''
# implement referencing from the PubMan edit interface (to start with, let the authority file grow automatically when an entered value has no reference yet)
In English and/or German?
# create very simple viewer/editor for the authority file data
# get feedback from potential pilot users
# extend the prototype with another authority file and repeat the steps 2-5
# modify/add functionalities based on the functional and technical feedback


Please see work in progress on [[Talk:Service_for_Control_of_Named_Entities#Prototype_service_for_controlled_named_entities_-_journal_names|Talk:Service_for_Control_of_Named_Entities]]
'''Alternative name'''
 
'''Place'''
 
'''Address'''
 
'''Homepage'''
 
==== Resources ====
* http://www.mpg.de/english/institutesProjectsFacilities/index.html - MPG website, the authoritative source?
* http://wiki.mpg.de/index.php/Alumni:ListeInstitute - list of MPG units (existing and former) provided by maxnet wiki
* http://www.biochem.mpg.de/iv/strategy_tab.html#mpg - Search queries for MPG units provided by IVS-BM
* http://de.wikipedia.org/wiki/Max-Planck-Gesellschaft#Forschungseinrichtungen_der_Max-Planck-Gesellschaft
* Corporate body (not MPG-exclusive)
 
{|style="font-size:50%" border="1"
! Name of service                               
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
|Körperschaftsnormdatei (GKD)
|More than 1 million records (German and foreign corporate bodies and conferences)
|[http://www.ddb.de/standardisierung/normdateien/gkd.htm Introduction]
|MAB2
|Z39.50
|see PND
|see PND
|}
* http://www.niso.org/apps/group_public/download.php/2773/NISO_I2_IR_Survey_Final_Report.pdf A survey by NISO on usage of institutional identifiers in repositories, incl. types of identifiers and metadata in use
* [[ESciDoc_User_Roles#CoNE_roles]]: CoNE roles in the eSciDoc framework
 
=== Conferences/events ===
 
==== Potential metadata elements ====
 
'''Title'''
The name of the event (e.g. Symposium on Theory of Computing)
 
'''Alternative title'''
Any alternative name of the event
 
'''Abbreviation'''
Abbreviated name of the event (e.g. STOC)
 
'''Start date'''
Start date of the event
 
'''End date'''
End date of the event
 
'''Place'''
Place where the event took place
 
'''Invitation status'''
Whether the creator was explicitly invited
 
: Remark Sabine: Should this information be stored in a controlled metadata record?
 
:: Remark Ulla: No, not to my understanding
 
::: To my understanding, the invitation status can only be specified for each talk individually and is therefore no generic metadata for the conference --[[User:Inga|Inga]] 12:19, 28 December 2007 (CET)
 
 
[[Category:CoNE]]

Latest revision as of 14:28, 25 April 2012


The purpose of this service is to provide methods to deal with controlled lists of named entities to assure data quality and facilitate data access and data entry. Read more on the background here.

Introduction

The Control of Named Entities (CoNE) service currently supports the named entity types listed below. It is integrated with PubMan in the publication metadata editing forms via an "auto suggest" component for the respective metadata edit fields (journal name, language of the publication).

Operations

The CoNE service supports the following operations:

  • searching (based on the metadata of the named entity, e.g. in the case of journals: journal name, publisher name, place of publication)
  • retrieving details of a selected named entity, e.g. retrieving the metadata of a selected journal based on the journal identifier
{| border="2"
|- bgcolor="#ccccff"
! Operation
! Status
! Input
! Output
! Description
|-
|query||implemented||(opt) q: String - the query (see below)<br/>(opt) [predicate]: String - a value for a certain field (see below)<br/>(opt) lang: String - the language as ISO 639-1 code<br/>(opt) n: int - the maximum number of results that should be returned||String||Scope: Public<br/>Returns a list of resources/ids in the given language (default 'en') matching the given query. The maximum number of hits returned is configurable (default 50).
|-
|details||implemented||id: String<br/>(opt) lang: String||String||Scope: Public<br/>Returns all available information on the resource identified by the given id, in the given language (default 'en').
|-
|all||implemented||(opt) lang: String||String||Scope: Public<br/>Returns a list of all resources/ids in the given language (default 'en').
|}
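As an illustration of the query operation's optional parameters, the following minimal sketch assembles them into a URL-encoded query string. The function name build_query_string is hypothetical, and the sketch deliberately stops at the query string: this page does not document the service's base URL or path layout, so none is assumed here.

```python
from urllib.parse import urlencode


def build_query_string(q=None, predicates=None, lang=None, n=None):
    """Assemble the optional parameters of the query operation.

    Hypothetical helper, not part of CoNE. Parameters that are left out
    are simply omitted so that the service can apply its own defaults.
    """
    params = []
    if q is not None:
        params.append(("q", q))
    # Each predicate is sent as its own name=value pair.
    for name, value in (predicates or {}).items():
        params.append((name, value))
    if lang is not None:
        params.append(("lang", lang))
    if n is not None:
        params.append(("n", str(n)))
    return urlencode(params)


print(build_query_string(q="psychology therapy", lang="en", n=10))
# prints: q=psychology+therapy&lang=en&n=10
```

Leaving lang and n unset relies on the defaults stated in the table ('en' and a configurable maximum of 50 hits).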

The query parameter q may contain a single term (e.g. q=nature) or multiple terms (e.g. q=psychology therapy). CoNE returns those entries that contain all terms in one or more of the searchable fields. Putting the query in quotes, e.g. q="John Doe", causes CoNE to search for exact matches. Wildcards (*) are allowed.

As an alternative to searching all fields, it is also possible to search specific fields (predicates) by specifying the predicate name and a query in the syntax described above (e.g. foaf:family_name=Miller or http://purl.org/escidoc/metadata/terms/0.1/suffix="*x"). The namespace of the predicate can be given in full or as a prefix; the prefixes are predefined in https://subversion.mpdl.mpg.de/repos/common/trunk/common_services/cone/src/main/resources/models.xml.
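The matching rules described above can be modelled in a few lines. This is only a sketch of one possible reading, not the CoNE implementation. It assumes that matching is case-insensitive, that every unquoted term must occur in at least one searchable field, and that a quoted query must match a whole field value. All function names are hypothetical.

```python
import re

# A token is either a quoted phrase or a run of non-whitespace characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_query(query):
    """Split a query into (text, is_exact) pairs; quoted parts are exact."""
    tokens = []
    for quoted, bare in TOKEN.findall(query):
        if quoted:
            tokens.append((quoted.lower(), True))
        elif bare:
            tokens.append((bare.lower(), False))
    return tokens


def to_regex(text):
    # Escape everything, then turn the escaped * back into "any characters".
    return re.escape(text).replace(r"\*", ".*")


def term_matches(text, is_exact, value):
    pattern = to_regex(text)
    value = value.lower()
    if is_exact:
        # Assumption: "exact" means the whole field value must match.
        return re.fullmatch(pattern, value) is not None
    # Unquoted term: may occur anywhere inside the field value.
    return re.search(pattern, value) is not None


def entry_matches(query, fields):
    """True if every term is found in at least one searchable field."""
    return all(
        any(term_matches(text, is_exact, value) for value in fields)
        for text, is_exact in parse_query(query)
    )


print(entry_matches("psychology therapy", ["Journal of Psychology", "Therapy Press"]))
print(entry_matches('"*x"', ["Felix"]))
```

Under these assumptions both calls print True, while entry_matches('"John Doe"', ["John Doe Jr."]) is False because the quoted query has to cover the entire value.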

Interfaces

The CoNE service comes with a flexible interface definition that defines the input/output formats. The following formats are available:

{| border="2"
|- bgcolor="#ccccff"
! Interface
! Status
! Input
! Output
! Description
|-
|jquery||implemented||HTTP Get-Request||jquery proprietary list, JSON||HTTPServlet used by jquery Javascript components for autosuggest features
|-
|options||implemented||HTTP Get-Request||Format that can easily be read by Java to generate JSF options||HTTPServlet usable by Java a.o.
|-
|json||implemented||HTTP Get-Request||pure JSON||HTTPServlet usable by Javascript/AJAX components
|-
|html||implemented||HTTP Get-Request||(X)HTML||HTTPServlet that provides a human readable overview on the data (planned to be used for author pages)
|-
|rdf||implemented||HTTP Get-Request||RDF/XML||HTTPServlet that provides a RDF view on the data
|-
|unapi||planned||HTTP Get-Request||varying||HTTPServlet that conforms to the unAPI interface specification
|-
|vcard||planned||HTTP Get-Request||ASCII(UTF-8 for single values possible using BASE64 encoding?)||HTTPServlet that conforms to the vCard specification (RFC2425)
|}
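A client that wants to pick an output format could keep the interface list above as a small lookup structure. The sketch below only restates the names and statuses from the table. The Interface class and the implemented_interfaces helper are hypothetical and say nothing about how CoNE itself registers its servlets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str    # interface name as listed in the table
    status: str  # "implemented" or "planned"
    output: str  # output format as described in the table


# Restates the table above; nothing here is read from the service.
INTERFACES = [
    Interface("jquery", "implemented", "jquery proprietary list, JSON"),
    Interface("options", "implemented", "format readable by Java for JSF options"),
    Interface("json", "implemented", "pure JSON"),
    Interface("html", "implemented", "(X)HTML"),
    Interface("rdf", "implemented", "RDF/XML"),
    Interface("unapi", "planned", "varying"),
    Interface("vcard", "planned", "ASCII"),
]


def implemented_interfaces(interfaces=INTERFACES):
    """Return the names of the interfaces that are marked as implemented."""
    return [i.name for i in interfaces if i.status == "implemented"]


print(implemented_interfaces())
# prints: ['jquery', 'options', 'json', 'html', 'rdf']
```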

Supported Named Entity Types

Currently, we support the following vocabularies:

Journals

Languages

  • url to the language service interface (to be done)
  • metadata description (to be done)
  • example urls to the service (to be done)
  • Functional Specification to handle the entity (to be done)

Persons

DDC

  • Metadata description (to be done)
  • Functional Specification for handling entity (to be done)
  • Currently implemented: Dewey Decimal Classification

Mimetypes

The IANA mimetype list plus additional mimetypes needed for PubMan (currently used for validation).

  • Metadata description (to be done)
  • Functional description for handling the entity (to be done)

eSciDoc mimetypes

A subset of the mimetypes list above. Planned to be used for PubMan in the future.

  • Metadata description (to be done)
  • Functional description for handling the entity (to be done)

Candidates for Named Entity Types

Potential further candidates for normalized metadata entries:

MPG Units

Potential metadata elements

Complete name: In English and/or German?

Alternative name

Place

Address

Homepage

Resources

{| border="1"
! Name of service
! Scope
! Info
! Formats supported
! Interfaces
! Costs
! Access
|-
|Körperschaftsnormdatei (GKD)
|More than 1 million records (German and foreign corporate bodies and conferences)
|Introduction
|MAB2
|Z39.50
|see PND
|see PND
|}

Conferences/events

Potential metadata elements

Title: The name of the event (e.g. Symposium on Theory of Computing)

Alternative title: Any alternative name of the event

Abbreviation: Abbreviated name of the event (e.g. STOC)

Start date: Start date of the event

End date: End date of the event

Place: Place where the event took place

Invitation status: Whether the creator was explicitly invited

Remark Sabine: Should this information be stored in a controlled metadata record?
Remark Ulla: No, not to my understanding
To my understanding, the invitation status can only be specified for each talk individually and is therefore no generic metadata for the conference --Inga 12:19, 28 December 2007 (CET)
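The potential metadata elements for conferences/events can be sketched as a simple record type. This is an illustration only, not an agreed metadata profile. The class and field names are hypothetical, all elements except the title are assumed to be optional, and the invitation status is left out because the remarks above conclude that it belongs to the individual talk rather than to the event.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ConferenceEvent:
    """Hypothetical record for a controlled conference/event entry."""

    title: str
    alternative_titles: List[str] = field(default_factory=list)
    abbreviation: Optional[str] = None
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    place: Optional[str] = None

    def __post_init__(self):
        # An event cannot end before it starts.
        if self.start_date and self.end_date and self.end_date < self.start_date:
            raise ValueError("end_date must not be before start_date")


event = ConferenceEvent(
    title="Symposium on Theory of Computing",
    abbreviation="STOC",
)
print(event.title, event.abbreviation)
```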