ESciDoc Search Index Names

Introduction

This article describes the rules for creating search indexes on the resource properties and metadata of eSciDoc content resources.

Note: the specification is at present defined only for indexing of item and container resources.

The rules define top-level indexing contexts derived from the XML representation of the eSciDoc resource. The name of each context should be used as the prefix for the index names of the elements indexed within it. The following top-level contexts are defined:

  • property
  • metadata
  • component
  • struct-map

Automatic index names

This section describes the indexes that should always be created, following the rules of the respective top-level context.

Indexing of properties

  • All properties are indexed
  • Index names of regular properties are
       property.<path-of-elements-and-attributes-under-properties-element>
  • Index names of content-model-specific-properties are
       property.content-model-specific.<path-of-elements-and-attributes-under-content-model-specific-properties-element>


    • Exception 1: even though property.objid is not explicitly stated in the property section of the resource XML, the index name should be
                *property.objid
    • Exception 2: property indexes that include additionally referenced data
                *property.created-by.objid
                *property.created-by.name
    • Example 1: standard property indexes
                *property.version.status
                *property.version.number
                *property.public-status
                *property.public-status-comment
                *property.context.objid
                *property.content-model.objid
                *property.lock-status
                *property.pid
                *property.creation-date
    • Example 2: content-model-specific-property indexes
                *content-model-specific.local-tags.local-tag
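
The naming rule for regular properties can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name property_index_names and the sample fragment are hypothetical, the fragment is simplified and namespace-free, and elements that carry only attributes are assumed to contribute their attribute paths only. The objid exception and content-model-specific properties are not covered.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def property_index_names(properties):
    """Return sorted index names for the elements and attributes under <properties>."""
    names = set()

    def walk(element, path):
        # Every attribute on the path becomes an index of its own.
        for attr in element.attrib:
            names.add(".".join(path + [local_name(attr)]))
        children = list(element)
        for child in children:
            walk(child, path + [local_name(child.tag)])
        # Assumption of this sketch: only leaf elements without attributes
        # are indexed under their own path.
        if not children and not element.attrib and len(path) > 1:
            names.add(".".join(path))

    walk(properties, ["property"])
    return sorted(names)


# Hypothetical, simplified fragment (real eSciDoc XML is namespaced).
SAMPLE = """
<properties>
  <public-status>released</public-status>
  <version>
    <number>3</number>
    <status>released</status>
  </version>
  <context objid="escidoc:ctx1"/>
</properties>
"""

print(property_index_names(ET.fromstring(SAMPLE)))
```

For the sample fragment this yields property.context.objid, property.public-status, property.version.number and property.version.status, which matches the form of the standard property indexes listed above.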

Indexing of metadata

  • The top-level context for metadata index names is a prefix based on the top-level enclosing metadata element. At present, MPDL solutions have the following contexts (Note: these are not fixed and may be extended with new top-level metadata indexing contexts, depending on the application profile in use):
    • publication
    • face-item
    • virr-toc
    • virr-element
    • FacesAlbum (Note: should be renamed to faces-album)
  • Index names start with <metadata-record-enclosing-element-name>.

All metadata elements and attributes are indexed under their qualified path, such as:

         * publication.title
         * publication.alternativeTitle
         * publication.genre
         * publication.source.title
         * publication.source.genre
         * publication.event.title

The metadata does not yet support the following structure, but if a source were specified within a source, the index names would be:

         * publication.source.source.genre
         * publication.source.source.title
  • metadata
    • always use the top-level element name in the metadata record as prefix e.g.


 <escidocMetadataProfile:publication xmlns:escidocMetadataProfile="http://escidoc.mpg.de/metadataprofile/schema/0.1/" type="book">
    <publication:creator xmlns:publication="http://escidoc.mpg.de/metadataprofile/schema/0.1/publication" role="author">
       <escidoc:person xmlns:escidoc="http://escidoc.mpg.de/metadataprofile/schema/0.1/types">
          <escidoc:complete-name></escidoc:complete-name>
          <escidoc:family-name>Kastens</escidoc:family-name>
          <escidoc:given-name>Karin</escidoc:given-name>
          <escidoc:organization>
             <escidoc:organization-name>Library,MPI for Psychlinguistics</escidoc:organization-name>
             <escidoc:address></escidoc:address>
             <escidoc:identifier>showPermanent</escidoc:identifier>
          </escidoc:organization>
          <escidoc:identifier>urn:cone:persons100</escidoc:identifier>
       </escidoc:person>
    </publication:creator>
    <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">NEK Test Book</dc:title>
    <dcterms:alternative xmlns:dcterms="http://purl.org/dc/terms/">APA guide to preparing manuscripts for journal publication</dcterms:alternative>
    <publication:publishing-info xmlns:publication="http://escidoc.mpg.de/metadataprofile/schema/0.1/publication">
       <dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">American Psychological Association</dc:publisher>
       <escidoc:place xmlns:escidoc="http://escidoc.mpg.de/metadataprofile/schema/0.1/types">Washington, D.C.</escidoc:place>
       <escidoc:edition xmlns:escidoc="http://escidoc.mpg.de/metadataprofile/schema/0.1/types">   </escidoc:edition>
    </publication:publishing-info>
    <dcterms:created xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dcterms:W3CDTF"></dcterms:created>
    <dcterms:modified xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dcterms:W3CDTF"></dcterms:modified>
    <dcterms:dateSubmitted xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dcterms:W3CDTF"></dcterms:dateSubmitted>
    <dcterms:dateAccepted xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dcterms:W3CDTF"></dcterms:dateAccepted>
    <publication:published-online xmlns:dcterms="http://purl.org/dc/terms/" xmlns:publication="http://escidoc.mpg.de/metadataprofile/schema/0.1/publication" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dcterms:W3CDTF">   </publication:published-online>
    <dcterms:issued xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dcterms:W3CDTF">1991</dcterms:issued>
    <publication:total-number-of-pages xmlns:publication="http://escidoc.mpg.de/metadataprofile/schema/0.1/publication"></publication:total-number-of-pages>
    <dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/"></dc:subject>
    <dcterms:DDC xmlns:dcterms="http://purl.org/dc/terms/"></dcterms:DDC>
    <dcterms:tableOfContents xmlns:dcterms="http://purl.org/dc/terms/"></dcterms:tableOfContents>
 </escidocMetadataProfile:publication>


    • possible prefixes: (publication | virrelement | faces)
  • composition (derivation)
  • clear name of which metadata is used for indexing
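
Deriving the prefix from the top-level element of a metadata record can be sketched as follows. This is a minimal illustration, not the actual indexer: the function names metadata_prefix and metadata_index_name are hypothetical, and the record is an abbreviated version of the one shown above. Note that the path segments after the prefix (title, alternativeTitle, source and so on) are not simply the XML local names, so they are passed in explicitly here.

```python
import xml.etree.ElementTree as ET


def metadata_prefix(record_xml):
    """Return the local name of the record's top-level element (the index prefix)."""
    root = ET.fromstring(record_xml)
    # ElementTree writes namespaced tags as '{namespace}local-name'.
    return root.tag.rsplit("}", 1)[-1]


def metadata_index_name(prefix, *path):
    """Join the prefix and a qualified element path with dots."""
    return ".".join((prefix,) + path)


# Abbreviated version of the sample record.
RECORD = """
<escidocMetadataProfile:publication
    xmlns:escidocMetadataProfile="http://escidoc.mpg.de/metadataprofile/schema/0.1/"
    type="book">
  <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">NEK Test Book</dc:title>
</escidocMetadataProfile:publication>
"""

prefix = metadata_prefix(RECORD)
print(prefix)
print(metadata_index_name(prefix, "source", "title"))
```

Here the prefix is publication, so the second call produces publication.source.title.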

Indexing of components

Here, "qualified" means that for item-level structures part of the path is ignored (open question: is this possible, and does it make sense?), e.g.:

  • components element is ignored
    • therefore instead of having index of components.component.file-name we have index such as component.file-name
  • for content-model-specific properties the full path is used, as we cannot know their structure in advance, e.g.:
    • content-model-specific.local-tags.local-tag
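
A minimal sketch of this qualification rule, assuming the element path is already available as a list of element names. The function name qualified_index_name and the set IGNORED_WRAPPERS are hypothetical names chosen for illustration.

```python
# Wrapper elements that are dropped from the index name (assumption: only
# the 'components' wrapper is ignored, as described above).
IGNORED_WRAPPERS = {"components"}


def qualified_index_name(path):
    """Build an index name from an element path, skipping ignored wrapper elements."""
    return ".".join(segment for segment in path if segment not in IGNORED_WRAPPERS)


print(qualified_index_name(["components", "component", "file-name"]))
print(qualified_index_name(["content-model-specific", "local-tags", "local-tag"]))
```

The first call drops the components wrapper and gives component.file-name; the second keeps the full path and gives content-model-specific.local-tags.local-tag.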

Compound/Derived indexes

Compound index names depend on the metadata set and are created on request. When requesting a new index, we need to make sure that it gets a correct index name (in the future we can expect to register XSLT transformations for derivation at content-model level):

        * publication.compound.publication-titles
           ** index of all titles of the publication on publication level (title, alternativeTitle)
IMO it is enough to name it publication.compound.title --Friederike 10:58, 11 February 2009 (UTC)
        * publication.compound.any-titles
           ** index of all titles of the publication on any level (publication.title, publication.alternativeTitle, 
              source.title, source.source.title, source.alternativeTitle, event.title, event.alternativeTitle)
Add a new option ANY (compound | any | void): * publication.any.title
To make it a rule, one could say that compound refers to all objects on the same level, and any refers to all objects on the same level and below. --Friederike 10:58, 11 February 2009 (UTC)
        * publication.compound.source-titles
           ** index of all titles of the publication on source level (source.title, source.alternativeTitle, if wished also 
              source.source.title, source.source.alternativeTitle - not necessary now)
Respectively: *publication.source.compound.title --Friederike 10:58, 11 February 2009 (UTC)

The index publication.source.any.title would also deliver the (not yet possible) titles of a source's source.


The option compound or any always refers to the object in front of it.

Therefore publication.compound.title gives title and alternativeTitle, and publication.source.compound.title gives source.title and source.alternativeTitle.


These are to be created based on our requirements.
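
The difference between compound and any, as proposed in the comments above, can be sketched as follows. This is an illustration only: the nested dictionary is a hypothetical stand-in for a metadata record, the function names are made up, and the source within a source is included even though the metadata does not support it yet.

```python
TITLE_FIELDS = ("title", "alternativeTitle")


def compound_titles(obj):
    """Titles on the object's own level only (title and alternativeTitle)."""
    return [obj[field] for field in TITLE_FIELDS if field in obj]


def any_titles(obj):
    """Titles on the object's own level and on every nested level below it."""
    titles = compound_titles(obj)
    for value in obj.values():
        if isinstance(value, dict):
            titles.extend(any_titles(value))
    return titles


# Hypothetical record; the nested source is not yet possible in the metadata.
publication = {
    "title": "Main title",
    "alternativeTitle": "Other title",
    "source": {
        "title": "Journal",
        "source": {"title": "Series"},
    },
    "event": {"title": "Conference"},
}

print(compound_titles(publication))            # publication.compound.title
print(any_titles(publication))                 # publication.any.title
print(compound_titles(publication["source"]))  # publication.source.compound.title
print(any_titles(publication["source"]))       # publication.source.any.title
```

In both functions the option applies to the object it is called on, which mirrors the rule that compound or any always refers to the object in front of it.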

Person indexes

See also PubMan_Indexing#Persons_.28escidoc.any-persons.29, where this is stated as: Creator.Person.CompleteName with Creator.CreatorType = "Person"

We need compound indexes such as:

      *publication.compound.creator.publication-person
         **This will index all creator persons on publication level
      *publication.compound.creator.any-person
         **This will index all creator persons on any level (publication, source, source.source)
      *publication.compound.creator.source-person
         **This will index all creator persons on source level only (if wished also source.source level)


Organization indexes

See also PubMan_Indexing#Organization. The logic is the same; the proposed index name changes are:

    * escidoc.organization-name => automatically becomes  publication.creator.organization.organization-name
    * escidoc.any-organization-pids  =>   becomes publication.compound.publication-organization-pids
           ** if indexing only the organizations of publication creators
    * escidoc.any-organization-pids  => becomes publication.compound.any-organization-pids 
           ** if indexing the affiliated organizations of all publication and source creators


Creator indexes

We need a compound index for searching and sorting by any type of creator:

       * publication.compound.publication-creator-names

This indexes all names of publication creators, regardless of whether the creator is a person or an organization. The respective sort keys are also needed.

NOTE: This is a requirement anyway, as we sort by creator names, and these can be organization or person names. There is no index that does this for publication creators only; the existing index also covers source creators.

@TODO: check whether we also need a similar compound index for PIDs, e.g. publication.compound.publication-creator-pids
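
Collecting creator names regardless of creator type could look roughly like the sketch below. Everything here is an assumption made for illustration: the dictionary layout, the "family-name, given-name" display form for persons, and the case-folded first name as sort key are not defined by this specification.

```python
def creator_name(creator):
    """Return the display name of a creator, whether person or organization."""
    if creator["type"] == "person":
        # Assumed display form for persons: "family-name, given-name".
        return f'{creator["family-name"]}, {creator["given-name"]}'
    return creator["organization-name"]


def publication_creator_names(creators):
    """Names of all publication-level creators, independent of creator type."""
    return [creator_name(creator) for creator in creators]


def creator_sort_key(names):
    """Assumed sort key: the first creator name, case-folded."""
    return names[0].casefold() if names else ""


# Hypothetical publication-level creators (source creators are left out).
creators = [
    {"type": "person", "family-name": "Kastens", "given-name": "Karin"},
    {"type": "organization", "organization-name": "Example Institute"},
]

names = publication_creator_names(creators)
print(names)
print(creator_sort_key(names))
```

Because only publication-level creators are passed in, the result corresponds to publication.compound.publication-creator-names rather than to the existing index that also covers source creators.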

Discussion/Comments

Faces

  • In my opinion, this is not relevant for Faces, because in Faces we only have very clear indexes (one per attribute).--Kristina 10:17, 6 February 2009 (UTC)
OK, but even in this case we have a clear naming convention for Faces indexes, such as:
     *faces.emotion instead of escidoc.emotion
     *faces.age instead of escidoc.age


ViRR

Section moved to VIRR Development Page

General

  • Perhaps it makes sense to add escidoc to the beginning of index names, like:
*escidoc.publication.compound.creator
*escidoc.faces.album.creator

because then we could also query like

*escidoc.compound.creator

Possible use case: find all eSciDoc items of which Mr. X was a creator.

*escidoc.compound.title delivers all titles on the first level
 **(publication.title, publication.alternativetitle, virrelement title, etc.)
*escidoc.any.title 
 **delivers all title elements in all eSciDoc items