ESciDoc Search Index Names

From MPDLMediaWiki

Introduction

This article describes the rules for creating search indexes over the properties and metadata of eSciDoc content resources.
Note: at present, the specification covers the indexing rules for item and container resources. The rules for automatically generated index names may, however, also be used for other resources; this should be specified separately if needed.

The rules define top-level indexing contexts derived from the XML representation of the eSciDoc resource. The names of these contexts should be used as the prefix of the index names of indexed elements. The following top-level contexts are defined:

  • property
  • metadata (name of the context may change, depending on the metadata profile name)
    • Important: only metadata records labeled "escidoc" should be indexed, unless otherwise specified
  • component
  • struct-map

Automatic index names

This section describes the rules for the indexes that should always be created, derived from the rules of the respective top-level context.

Indexing of properties

  • All properties are indexed (represented as elements or attributes in escidoc-resource.xml)
  • Index names of regular properties are
 property.<path-of-elements-and-attributes-under-properties-element>
  • Index names of content-model-specific properties (cmsprop) are
 property.content-model-specific.<path-of-elements-and-attributes-under-cmsprop-element>


    • Exception 1: even though objid is not explicitly stated in the properties section of the resource XML, the index name should be
                *property.objid
    • Exception 2: property indexes that also include data of the referenced resource
                *property.created-by.objid
                *property.created-by.name
    • Example 1: standard property indexes
                *property.version.status
                *property.version.number
                *property.public-status
                *property.public-status-comment
                *property.context.objid
                *property.content-model.objid
                *property.lock-status
                *property.pid
                *property.creation-date
    • Example 2: content-model-specific property indexes
                *property.content-model-specific.local-tags.local-tag
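The property rules above can be sketched in Python as a walk over the children of the properties element. This is a minimal illustration under stated assumptions, not the eSciDoc indexer: the sample XML, its placeholder namespace URIs and the helper names (local_name, property_index_names) are hypothetical, and Exception 2 (property.created-by.name) is not produced because it needs the referenced user resource.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' part ElementTree adds, so no alias reaches the name."""
    return tag.rsplit("}", 1)[-1]


def _walk(elem: ET.Element, prefix: str, names: list[str]) -> None:
    """Append index names for elem, its attributes and its descendants."""
    path = f"{prefix}.{local_name(elem.tag)}"
    for attr in elem.attrib:
        names.append(f"{path}.{local_name(attr)}")
    children = list(elem)
    if children:
        for child in children:
            _walk(child, path, names)
    elif elem.text and elem.text.strip():
        names.append(path)


def property_index_names(resource_xml: str) -> list[str]:
    root = ET.fromstring(resource_xml.strip())
    # Exception 1: objid lives on the resource root, not under <properties>.
    names = ["property.objid"]
    properties = next(c for c in root if local_name(c.tag) == "properties")
    for prop in properties:
        # content-model-specific is just another child, so the generic walk
        # yields property.content-model-specific.<path> as the rule requires.
        # Exception 2 (property.created-by.name) is NOT produced here.
        _walk(prop, "property", names)
    return names


# Hypothetical, heavily simplified item; namespace URIs are placeholders.
SAMPLE_ITEM = """
<item:item xmlns:item="urn:example:item" xmlns:prop="urn:example:prop"
           xmlns:srel="urn:example:srel" objid="escidoc:123">
  <item:properties>
    <prop:creation-date>2009-02-11T10:58:00Z</prop:creation-date>
    <srel:created-by objid="escidoc:user42"/>
    <srel:context objid="escidoc:ctx1"/>
    <prop:public-status>released</prop:public-status>
    <prop:version>
      <prop:number>3</prop:number>
      <prop:status>released</prop:status>
    </prop:version>
    <prop:content-model-specific>
      <local-tags><local-tag>draft</local-tag></local-tags>
    </prop:content-model-specific>
  </item:properties>
</item:item>
"""

if __name__ == "__main__":
    for name in property_index_names(SAMPLE_ITEM):
        print(name)
```

Because names are built from local names only, the namespace aliases (prop, srel) never appear in the index names.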

Indexing of metadata

  • The top-level context of metadata index names is prefixed with the name of the top-level enclosing metadata element. At present, MPDL solutions have the following contexts (Note: these are not fixed and may be extended with new top-level metadata indexing contexts, depending on the application profile in use):
    • publication
    • face-item
    • virr-toc
    • virr-element
    • FacesAlbum (Note: should be renamed to faces-album)
  • Index name should be <metadata-record-enclosing-element-name>.<path-of-elements-and-attributes-under-metadata-element>
  • Index names should not use namespace aliases (e.g. dc, dcterms, escidoc)
    • Examples for publication: indexes generated from the qualified path of all metadata elements
         * publication.title
         * publication.alternativeTitle
         * publication.genre
         * publication.source.title
         * publication.source.genre
         * publication.event.title
         * publication.source.source.genre
         * publication.source.source.title
    • Examples for face-item: indexes generated from the qualified path of all metadata elements
         * face-item.emotion
         * face-item.picture-group
         * face-item.identifier
         * face-item.age
         * face-item.age-group
         * face-item.gender
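A similar sketch for the metadata rules, under the same assumptions: it indexes only the md-record labeled "escidoc", uses the enclosing element name as the prefix, and drops namespace aliases by using local names. The simplified publication record and its element names are hypothetical; they were chosen so that the output matches the examples above, not the real PubMan schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip '{namespace-uri}' so aliases such as dc or escidoc never appear."""
    return tag.rsplit("}", 1)[-1]


def _walk(elem: ET.Element, prefix: str, names: list[str]) -> None:
    path = f"{prefix}.{local_name(elem.tag)}"
    for attr in elem.attrib:
        names.append(f"{path}.{local_name(attr)}")
    children = list(elem)
    if children:
        for child in children:
            _walk(child, path, names)
    elif elem.text and elem.text.strip():
        names.append(path)


def metadata_index_names(resource_xml: str, record_name: str = "escidoc") -> list[str]:
    root = ET.fromstring(resource_xml.strip())
    names: list[str] = []
    for record in root.iter():
        # Only the md-record labeled "escidoc" is indexed by default.
        if local_name(record.tag) != "md-record" or record.get("name") != record_name:
            continue
        for top in record:
            # The enclosing element (e.g. <publication:publication>) is the context.
            context = local_name(top.tag)
            for attr in top.attrib:
                names.append(f"{context}.{local_name(attr)}")
            for child in top:
                _walk(child, context, names)
    # Drop repeats (e.g. several creators) while keeping first-seen order.
    return list(dict.fromkeys(names))


# Hypothetical record; element names mirror the wiki examples, URIs are placeholders.
SAMPLE_ITEM = """
<item xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:escidoc="urn:example:escidoc"
      xmlns:publication="urn:example:publication">
  <md-records>
    <md-record name="escidoc">
      <publication:publication>
        <dc:title>Main title</dc:title>
        <escidoc:alternativeTitle>Short title</escidoc:alternativeTitle>
        <escidoc:genre>article</escidoc:genre>
        <escidoc:source>
          <dc:title>Journal name</dc:title>
          <escidoc:genre>journal</escidoc:genre>
        </escidoc:source>
      </publication:publication>
    </md-record>
    <md-record name="other">
      <dc:title>Not indexed by default</dc:title>
    </md-record>
  </md-records>
</item>
"""
```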

Indexing of components

  • The top-level indexing context name is "component".
  • The same rules as for indexing of properties and indexing of metadata apply.
    • Example 1: index names of component properties
        *component.file-name
        *component.description
        *component.visibility
    • Example 2: index names of component metadata for "file metadata profile"
        *component.file.title
        *component.file.description
        *component.file.format
        *component.file.extent
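For components, the property and metadata rules can be reused with "component" as the only prefix, which yields names such as component.file-name and component.file.title. The sketch below assumes a simplified component with a properties element and an md-records element; that structure, the namespace URIs and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def _walk(elem: ET.Element, prefix: str, names: list[str]) -> None:
    path = f"{prefix}.{local_name(elem.tag)}"
    for attr in elem.attrib:
        names.append(f"{path}.{local_name(attr)}")
    children = list(elem)
    if children:
        for child in children:
            _walk(child, path, names)
    elif elem.text and elem.text.strip():
        names.append(path)


def component_index_names(component_xml: str, record_name: str = "escidoc") -> list[str]:
    component = ET.fromstring(component_xml.strip())
    names: list[str] = []
    for section in component:
        kind = local_name(section.tag)
        if kind == "properties":
            # Property rule, but "component" replaces the "property" prefix.
            for prop in section:
                _walk(prop, "component", names)
        elif kind == "md-records":
            for record in section:
                if record.get("name") == record_name:
                    # Metadata rule: component.<enclosing-element>.<path>
                    for top in record:
                        _walk(top, "component", names)
    return list(dict.fromkeys(names))


# Hypothetical component for a "file metadata profile"; URIs are placeholders.
SAMPLE_COMPONENT = """
<component xmlns:prop="urn:example:prop" xmlns:file="urn:example:file"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:dcterms="http://purl.org/dc/terms/">
  <properties>
    <prop:file-name>paper.pdf</prop:file-name>
    <prop:description>Full text</prop:description>
    <prop:visibility>public</prop:visibility>
  </properties>
  <md-records>
    <md-record name="escidoc">
      <file:file>
        <dc:title>paper.pdf</dc:title>
        <dcterms:extent>1024</dcterms:extent>
      </file:file>
    </md-record>
  </md-records>
</component>
"""
```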

Indexing of struct-map

  • The top-level indexing context name is "struct-map".
  • The same rules as for indexing of properties and indexing of metadata apply.

Compound/Derived indexes

Compound index names depend on the metadata set and are created on request. We need to make sure that we use correct index names when requesting a new index (in the future, we can expect to register XSLT transformations for the derivation at the content-model level):

        * publication.compound.publication-titles
           ** index of all titles of the publication on publication level (title, alternativeTitle)
IMO it is enough to name it publication.compound.title --Friederike 10:58, 11 February 2009 (UTC)
        * publication.compound.any-titles
           ** index of all titles of the publication on any level (publication.title, publication.alternativeTitle, 
              source.title, source.source.title, source.alternativeTitle, event.title, event.alternativeTitle)
Add a new option ANY (compound | any | void): * publication.any.title
To make it a rule, one could say that compound refers to all objects on the same level, and any refers to all objects on the same level and below --Friederike 10:58, 11 February 2009 (UTC)
        * publication.compound.source-titles
           **index of all titles of the publication on source level (source.title, source.alternativeTitle; optionally also 
              source.source.title, source.source.alternativeTitle, not necessary now)
Respectively: *publication.source.compound.title --Friederike 10:58, 11 February 2009 (UTC)

The index publication.source.any.title would also deliver the (not yet possible) titles of a source's source.


The option compound or any always refers to the object in front of it.

Therefore, publication.compound.title gives title and alternativeTitle, and publication.source.compound.title gives source.title and source.alternativeTitle.


These are to be created based on our requirements.
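The compound/any semantics proposed above (compound: the same level only; any: the same level and below; the option applies to the object in front of it) can be sketched over the values produced by the automatic indexes. The DERIVED_FIELDS table, the FIELDS sample and the function name are assumptions for illustration; the "void" option (plain index names) is not covered.

```python
# Hypothetical: the leaf names that a derived "title" index collects.
DERIVED_FIELDS = {"title": ("title", "alternativeTitle")}


def derived_index(fields: dict[str, list[str]], level: str, option: str,
                  field: str) -> tuple[str, list[str]]:
    """Build <level>.<option>.<field>, e.g. publication.source.compound.title.

    option "compound": members of `field` directly on `level`.
    option "any":      members of `field` on `level` and on every level below it.
    """
    if option not in ("compound", "any"):
        raise ValueError(f"unknown option: {option}")
    members = DERIVED_FIELDS[field]
    values: list[str] = []
    for name, vals in fields.items():
        parent, _, leaf = name.rpartition(".")
        if leaf not in members:
            continue
        same_level = parent == level
        below = parent.startswith(level + ".")
        if same_level or (option == "any" and below):
            values.extend(vals)
    return f"{level}.{option}.{field}", values


# Hypothetical values per automatically generated index name.
FIELDS = {
    "publication.title": ["Main title"],
    "publication.alternativeTitle": ["Short title"],
    "publication.genre": ["article"],
    "publication.source.title": ["Journal name"],
    "publication.source.alternativeTitle": ["J. Name"],
    "publication.event.title": ["Conference 2009"],
}
```

With this data, publication.compound.title collects only the two publication-level titles, while publication.any.title also picks up the source and event titles.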

Person indexes

See also PubMan_Indexing#Persons_.28escidoc.any-persons.29, which states: Creator.Person.CompleteName with Creator.CreatorType = "Person"

We need compound indexes such as:

      *publication.compound.creator.publication-person
         **This will index all creator persons on publication level
      *publication.compound.creator.any-person
         **This will index all creator persons on any level (publication, source, source.source)
      *publication.compound.creator.source-person
         **This will index all creator persons on source level only (optionally also on source.source level)
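A sketch of the three person scopes, assuming creators are already available with their level, their CreatorType and their CompleteName. The Creator class, the level strings and the sample data are hypothetical; only the resulting index names come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Creator:
    level: str          # hypothetical: "publication", "publication.source", ...
    creator_type: str   # cf. Creator.CreatorType: "Person" or "Organization"
    complete_name: str  # cf. Creator.Person.CompleteName


# Scope name -> which creator levels it covers (the optional source.source
# level is left out of source-person here).
SCOPES = {
    "publication-person": lambda level: level == "publication",
    "any-person": lambda level: True,
    "source-person": lambda level: level == "publication.source",
}


def person_index(creators: list[Creator], scope: str) -> tuple[str, list[str]]:
    in_scope = SCOPES[scope]
    names = [c.complete_name for c in creators
             if c.creator_type == "Person" and in_scope(c.level)]
    return f"publication.compound.creator.{scope}", names


# Hypothetical sample: organizations are skipped by every person index.
CREATORS = [
    Creator("publication", "Person", "Doe, Jane"),
    Creator("publication", "Organization", "Max Planck Digital Library"),
    Creator("publication.source", "Person", "Roe, Richard"),
    Creator("publication.source.source", "Person", "Poe, Edgar"),
]
```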


Organization indexes

See also PubMan_Indexing#Organization. The logic is the same; the proposed index name changes are:

    * escidoc.organization-name => automatically becomes publication.creator.organization.organization-name
    * escidoc.any-organization-pids => becomes publication.compound.publication-organization-pids
           ** if only the organizations of publication creators are indexed
    * escidoc.any-organization-pids => becomes publication.compound.any-organization-pids
           ** if the affiliated organizations of all publication and source creators are indexed
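A sketch of how the two organization PID indexes differ only in scope, assuming each creator is given as a (level, affiliated organization PIDs) pair. The input shape, the sample PIDs and the function name are hypothetical; duplicates are removed while keeping first-seen order.

```python
def organization_pid_indexes(creators: list[tuple[str, list[str]]]) -> dict[str, list[str]]:
    """creators: (level, organization PIDs of that creator's affiliations) pairs."""
    publication_pids: list[str] = []
    any_pids: list[str] = []
    for level, pids in creators:
        any_pids.extend(pids)
        if level == "publication":
            publication_pids.extend(pids)
    return {
        # replaces escidoc.any-organization-pids when only publication creators count
        "publication.compound.publication-organization-pids": list(dict.fromkeys(publication_pids)),
        # replaces escidoc.any-organization-pids when source creators count too
        "publication.compound.any-organization-pids": list(dict.fromkeys(any_pids)),
    }


# Hypothetical sample: one publication creator and one source creator.
CREATORS = [
    ("publication", ["escidoc:ou1", "escidoc:ou2"]),
    ("publication.source", ["escidoc:ou2", "escidoc:ou3"]),
]
```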


Creator indexes

We need a compound index for searching and sorting by any type of creator.

       * publication.compound.publication-creator-names

This indexes all names of publication creators, regardless of whether the creator is a person or an organization (the respective sort keys are also needed).

NOTE: This is a requirement in any case, as we sort by creator names, which can be organization or person names. There is currently no index that does this only for publication creators; the existing index also covers source creators.

@TODO: check whether we also need a similar compound index for PIDs, e.g. publication.compound.publication-creator-pids
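A sketch of publication.compound.publication-creator-names with matching sort keys: the creator type is deliberately ignored and only publication-level creators are kept. The (level, creator_type, name) input shape is hypothetical, and the normalization in sort_key (accent stripping, casefolding, whitespace collapsing) is an assumption, not a specified eSciDoc sort-key rule.

```python
import unicodedata

INDEX_NAME = "publication.compound.publication-creator-names"


def sort_key(name: str) -> str:
    """Hypothetical normalization: drop accents, casefold, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(plain.casefold().split())


def publication_creator_names(creators: list[tuple[str, str, str]]) -> tuple[list[str], list[str]]:
    """creators: (level, creator_type, name); creator_type is ignored on purpose."""
    names = [name for level, _creator_type, name in creators if level == "publication"]
    return names, [sort_key(n) for n in names]


# Hypothetical sample: the source-level editor must not end up in the index.
CREATORS = [
    ("publication", "Organization", "Max Planck Digital Library"),
    ("publication", "Person", "Müller, Jürgen"),
    ("publication.source", "Person", "Editor, Ed"),
]
```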

Discussion/Comments

Faces

  • In my opinion, this is not relevant for Faces, because in Faces we only have very clear indexes (one per attribute). --Kristina 10:17, 6 February 2009 (UTC)
OK, but even in this case we have a clear naming convention for face indexes, such as:
     *faces.emotion instead of escidoc.emotion
     *faces.age instead of escidoc.age


ViRR

Section moved to VIRR Development Page

General

  • Perhaps it makes sense to add escidoc to the beginning of the index names, like:
*escidoc.publication.compound.creator
*escidoc.faces.album.creator

because then we could also query, e.g.:

*escidoc.compound.creator

Possible use case: find all eSciDoc items of which Mr. X was a creator.

*escidoc.compound.title delivers all titles on the first level
 **(publication.title, publication.alternativeTitle, virr-element title, etc.)
*escidoc.any.title
 **delivers all title elements in all eSciDoc items
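The escidoc prefix proposal can be sketched as query expansion: an escidoc.<rest> index that does not name a context is expanded to one escidoc.<context>.<rest> index per top-level context. The context list is taken from the metadata section (with FacesAlbum already renamed to faces-album); the CQL-style output and the function names are assumptions, not an existing eSciDoc search API.

```python
# Top-level metadata contexts from the metadata section (FacesAlbum renamed).
CONTEXTS = ("publication", "face-item", "virr-toc", "virr-element", "faces-album")


def expand(index_name: str) -> list[str]:
    """escidoc.<rest> -> one escidoc.<context>.<rest> per context; others unchanged."""
    parts = index_name.split(".")
    if parts[0] != "escidoc" or len(parts) < 2 or parts[1] in CONTEXTS:
        return [index_name]
    rest = ".".join(parts[1:])
    return [f"escidoc.{ctx}.{rest}" for ctx in CONTEXTS]


def to_cql(index_name: str, term: str) -> str:
    """OR together one clause per expanded index (CQL-style, assumed syntax)."""
    quoted = '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return " or ".join(f"{name}={quoted}" for name in expand(index_name))
```

A query on escidoc.compound.creator thus becomes five OR-ed clauses, one per context, while a fully qualified name such as escidoc.publication.compound.creator is passed through unchanged.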