Talk:PubMan Indexing

Discussion about search requirements for framework release 1.0

Questions from Michael, which arrived at the MPDL (Max Planck Digital Library) team on 8 February and were further discussed on this wiki page

Rules for sorting the search result

You wanted to send me the rules for sorting the search result

  1. The functional specification regarding sorting is available on PubMan Sorting --Inga 17:56, 19 February 2008 (CET)
  2. For the technical implementation, please check: http://colab.mpdl.mpg.de/mediawiki/images/b/bf/PubItemVOComparator.java --Natasa 15:40, 13 February 2008 (CET)

Discussion:

PubItemVOComparator.java uses String.compareToIgnoreCase, which sorts special characters like ä, ö, ü at the end (after z). I tried the java.text.Collator class, which sorts these characters correctly, and I propose that we use it. --Michael.hoppe 13:04, 14 February 2008 (CET)
Thanks! The handling of special characters is actually part of the functional specification. --Inga 20:09, 19 February 2008 (CET)
OK, Michael, if you think the Collator class provides better results, that is fine for us. But :) one question: does this class actually compare based on the "locale" value for the language? We have a mixture of English, German and probably other languages (the next candidate is French). How will it behave in this case, and how does it affect the sorting when we use the fuzzy search (probably the most common case, as people rarely specify the language in which they search)? --Natasa 11:29, 14 February 2008 (CET)
I am not sure what the locale is used for in the Collator class. I tried sorting German umlauts, French apostrophes etc. with locale German and locale English and couldn't see any differences. --Michael.hoppe 13:04, 14 February 2008 (CET)
Other remark: when using a custom comparator class with Lucene, the custom class doesn't need untokenized index fields for sorting. We therefore no longer need separate indexes for sorting and can delete the sort.-fields (e.g. sort.title). You would have to use the escidoc.-fields (e.g. escidoc.title) as sort criterion. Example of CQL: escidoc.metadata=escidoc* sortKeys=escidoc.title --Michael.hoppe 07:36, 14 February 2008 (CET)
Using the index field names as sortKeys seems to be easier and more comprehensible; it will actually not be a big problem for us to change the sorting logic. --Natasa 11:29, 14 February 2008 (CET)
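
Illustration of the two sort orders discussed above. This is only a minimal sketch and not the code of PubItemVOComparator.java; the class name, the method names and the sample titles are made up for the example. It sorts the same list once with String.compareToIgnoreCase and once with a java.text.Collator for the German locale.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; not part of PubMan or eSciDoc.
final class CollatorSortSketch {

    // Sorts by String.compareToIgnoreCase, i.e. by char value after case folding.
    static List<String> sortByCodeUnit(List<String> titles) {
        List<String> copy = new ArrayList<>(titles);
        copy.sort(String::compareToIgnoreCase);
        return copy;
    }

    // Sorts with a locale-aware Collator; SECONDARY strength ignores case differences.
    static List<String> sortByCollator(List<String> titles, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY);
        List<String> copy = new ArrayList<>(titles);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        // "\u00C4rger" is "Ärger", written as an escape to stay encoding-independent.
        List<String> titles = List.of("Zebra", "Apfel", "\u00C4rger", "Banane");
        System.out.println(sortByCodeUnit(titles));
        System.out.println(sortByCollator(titles, Locale.GERMAN));
    }
}
```

With this sample list the first variant places "Ärger" after "Zebra", while the Collator variant places it directly after "Apfel". Regarding the locale question: collation rules are defined per locale, and some alphabets (Swedish, for example) place ä after z, so the choice of locale can matter for a mixed-language result list.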

Indexes

You have a page where you describe the indexes you want to have for search (PubMan Indexing). Is this page complete? Can I delete all the other fields that we currently index separately (properties of item/container, attributes of elements ...)?

To my understanding, this page is complete. --Natasa 15:55, 13 February 2008 (CET)
Please note that this page only describes the search options currently provided by the PubMan (Publication Management) user interface. In addition, we have the requirement to support "expert searches", which could be implemented by specific CQL (Common Query Language) statements, e.g. to restrict the search to publication.title. Therefore, I would like to keep separate indexes for metadata elements and selected item properties. Should I provide a list somewhere? --Inga 21:00, 19 February 2008 (CET)
Yes, please provide a list. Currently we index all metadata elements as their own indexes, but no properties! --Michael.hoppe 09:20, 20 February 2008 (CET)

Index escidoc.metadata

We then have the index escidoc.any-field instead of escidoc.metadata

The name escidoc.metadata is fine for the index; it is the logic of what gets indexed in it that is a bit wrong. Please do not change the name of the index, as we use it in our business logic for queries. The colab page is the functional specification, and you provided the index names. --Natasa 15:55, 13 February 2008 (CET)
OK, I will change the logic of what is indexed in the escidoc.metadata index. --Michael.hoppe 07:57, 14 February 2008 (CET)
Regarding index names: could we at least point from this page to the names of the indexes built to provide a certain functionality? I started to add the index names, but was not 100% successful. Michael, could you complete the list? --Inga 21:00, 19 February 2008 (CET)
OK, I completed the list --Michael.hoppe 09:22, 20 February 2008 (CET)
There is a difference between the particular index fields on this page and the any-field index on this page: the any-field index still contains ONLY the descriptive metadata of item, container and component (i.e. what is in the respective metadata records) plus the respective IDs and PIDs, and NOT other PROPERTIES (e.g. the context identifier; if needed, these will be separate index fields such as escidoc.item.context). --Natasa 15:55, 13 February 2008 (CET)
OK, I will implement it according to the page. --Michael.hoppe 07:57, 14 February 2008 (CET)

Index any.*

Can we remove the any- prefix from the index names (escidoc.any-genre becomes escidoc.genre)?

Please see my answer on renaming index names above. Why would you now rename the indexes? You have given the names for the indexes, and it was previously not clear to us what the logic was; does it have something to do with repeatable metadata? For a start, not renaming the indexes would be fine. --Natasa 16:48, 13 February 2008 (CET)
OK, I will leave the names as they are. --Michael.hoppe 07:57, 14 February 2008 (CET)
Hm! The SRU (Search/Retrieval via URL) interface currently lists many indexes which are hard to map to the respective elements because we decided not to keep the complete path. What do you think about a dual strategy: keep the names for all indexes already used (see index names in the article) and provide new (longer) names for additional indexes (see my remark above)? --Inga 21:00, 19 February 2008 (CET)
In the article you only refer to the any-indexes, which are customized for you. You don't refer to any of the indexes that are named after the element names. The current exception is escidoc.organization-name; you can use escidoc.any-organizations instead. Then I could change the index names to longer names (e.g. escidoc.publication.creator.person.complete-name instead of escidoc.complete-name) if you want. --Michael.hoppe 09:29, 20 February 2008 (CET)

Index organization.name

You want to search e.g. all Organization.Name for each language separately and for all languages at once

  • If you only want to search English organization-names, use the escidoc_en database
  • If you want to search all language organization-names, use the escidoc_all database
At the last developer workshop we agreed to have one search database which is aware of language-specific stemming if the metadata has xml:lang associated with it, and to change the logic to "fuzzy" search. Are the two points above still relevant then? --Natasa 16:48, 13 February 2008 (CET)
We cannot do stemming for one index in more than one language. Therefore we decided at the last workshop to do fuzzy search instead. As agreed in the workshop, I added stopwords to the escidoc_all database. You can do fuzzy search in CQL, e.g.: escidoc.metadata=/fuzzy dokument --Michael.hoppe 07:57, 14 February 2008 (CET)
Does that mean that eSciDoc (Enhanced Scientific Documentation) will continue to have several databases, so that something like (escidoc_all.metadata=plankton OR escidoc_de.title=fischfutter) won't be possible? The user may only want to specify the language for a specific part of the search request, e.g. to limit the result. I would suggest forgetting about the language-specific databases. --Inga 21:00, 19 February 2008 (CET)
I agree, the concept of different language-specific databases did not prove itself. I suggest using only the language-independent database and doing fuzzy search. --Michael.hoppe 09:34, 20 February 2008 (CET)

Index any-organization-pids

Can we remove the any-organization-pids index?

Absolutely not! This index is very important for browsing. For example, searching with the organization PID of the MPG (Max-Planck-Gesellschaft) should return items that are related directly to a child organizational unit of the MPG, even if the item is not related directly to the MPG in the metadata! This is an important issue; the index should provide the path-list of IDs. --Natasa 16:48, 13 February 2008 (CET)
Then maybe you should add it to the http://colab.mpdl.mpg.de/mediawiki/PubMan_Indexing page --Michael.hoppe 07:57, 14 February 2008 (CET)
Added. Hopefully it is clearer now. Please let us know if there are other questions regarding this issue. --Natasa 11:12, 14 February 2008 (CET)
Split into two elements suggested, see PubMan_Indexing#Organization --Inga 21:00, 19 February 2008 (CET)
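
To make the path-list idea concrete, here is a minimal sketch of how the values for such an index could be collected. It assumes that every organizational unit has at most one parent and that the parent relation is available as a simple map; the class name, the method name and the PIDs are hypothetical and not taken from the framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the real indexer works on eSciDoc organizational unit objects.
final class OrgPathListSketch {

    // Returns the unit's own PID followed by the PIDs of all its ancestors.
    static List<String> pathList(String pid, Map<String, String> parentOf) {
        List<String> path = new ArrayList<>();
        String current = pid;
        // Stop at the root; the contains() check guards against accidental cycles.
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = parentOf.get(current);
        }
        return path;
    }

    public static void main(String[] args) {
        // Assumed sample data: the MPDL unit is a child of the MPG unit.
        Map<String, String> parentOf = Map.of("ou:mpdl", "ou:mpg");
        System.out.println(pathList("ou:mpdl", parentOf));
    }
}
```

If all PIDs of the returned list are written into any-organization-pids for an item affiliated with the child unit, a search for the parent PID also finds that item.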


Indexes escidoc.component.*

We create new indexes for component-properties with name escidoc.component.<property-name>

For all metadata in the metadata record of the component. As agreed with your colleagues (please check with Rozita), this record will be provided by us with the component --Natasa 16:48, 13 February 2008 (CET)
OK. --Michael.hoppe 07:58, 14 February 2008 (CET)
I have no clue what the metadata record of a component is. Does this have something to do with containers? Should we document this index somewhere? --Inga 21:00, 19 February 2008 (CET)
I also have no clue, but I don't really mind. It is up to you which metadata records you put into the component. The indexer will index all elements of all md-records, for now as escidoc.component.<elementname>, but if you want I can change that to a longer name (escidoc.component.<path-to-element>). --Michael.hoppe 09:37, 20 February 2008 (CET)
The metadata record of a component is the metadata of the file itself (a component is a file associated with an item). That means the metadata record of a component contains file name, file size, content-category and anything else relevant for the specific file. --Natasa 09:29, 25 February 2008 (CET)

Index identifier

Do we write all PIDs and IDs (as described on the PubMan_Indexing page) into the identifier index, and additionally the PID of the file as escidoc.component.pid?

Split into two elements suggested, see PubMan Indexing Internal Identifier --Inga 21:00, 19 February 2008 (CET)

Index escidoc.objecttype

Proposal: I think we should have an index escidoc.objecttype where we can distinguish between item and container

Sounds reasonable with respect to reusability --Tom 16:49, 13 February 2008 (CET)
I would like to ask here: is this index selective enough? (All objects are items or containers.) --Natasa 17:15, 13 February 2008 (CET)
What do you mean? You can restrict your search to containers only, e.g. escidoc.metadata=whatever and escidoc.objecttype=container. --Michael.hoppe 08:02, 14 February 2008 (CET)
Another issue: can we create an index for content-model-name (and not only for content-model-id)? Does it make sense? --Natasa 17:15, 13 February 2008 (CET)
Would make sense. It slows down the index generation a bit. --Michael.hoppe 08:13, 14 February 2008 (CET)
Does it slow it down dramatically? I thought that in item.xml all referenced objects are present with the ID and the title of the object (I think this is named externaltitle or something similar), so one does not need an extra query for the content model. Is this actually the case? --Natasa 11:20, 14 February 2008 (CET)
No, this is only partly the case. When retrieving the object via REST (Representational State Transfer), we get an attribute xlink:title containing the name. But when requesting it via SOAP (Simple Object Access Protocol), we don't get the attribute xlink:title. And the indexer works with SOAP! So the indexer would have to retrieve the content-model.xml to get the name. I have to test how long this request takes. Hopefully below one second ;-) --Michael.hoppe 16:26, 14 February 2008 (CET)
Should we document this index somewhere? --Inga 21:00, 19 February 2008 (CET)

Searching for ISSN/ISBN

How do you put the ISSN (International Standard Serial Number) in the metadata?

Option a: As dc.identifier like

 <identifier>
   <catalog>URI</catalog> 
   <entry>urn:ISSN:1361-3200</entry> 
 </identifier> ?

Then I could just put it in the index like that and you can search for it

 (urn:ISSN* or urn:ISSN:13*)

Option b: Or do you put the ISSN like that:

 <issn>1361-3200</issn>

Then I could write it in the index as ISSN:1361-3200 and you can search for it

 (ISSN* or ISSN:13*)

We put it in the metadata in the following manner --Natasa 17:15, 13 February 2008 (CET)

 <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xmlns:eidt="http://escidoc.mpg.de/metadataprofile/schema/0.1/idtypes" 
   xsi:type="eidt:ISSN">0028-0836</dc:identifier>
 <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:eidt="http://escidoc.mpg.de/metadataprofile/schema/0.1/idtypes" 
   xsi:type="eidt:ISBN">978-3-499-13467-8</dc:identifier>
OK, then I could take the xsi:type together with the value and index it as ISBN:asdasdasd. Is this OK for you? --Michael.hoppe 08:13, 14 February 2008 (CET)
The problem is that if we index it as e.g. ISBN:asdasdasd, we will not be able to find "asdasdasd" alone. That is why we wanted to index it as two word tokens such as "ISBN asdasdasd". Note: we are aware of the consequences for the exactness of results, but users sometimes search only for an identifier value and do not know the exact type of the identifier.
I checked what happens if we index it as two tokens: we would then have to search for it as a phrase (in double quotes). Unfortunately, Lucene doesn't support wildcards in phrase queries, so we wouldn't be able to search for "ISSN a*". What if we index two tokens: one with just the value of the identifier (asdasdasd) and the other as <type>:<value> (ISSN:asdasdasd)? --Michael.hoppe 16:18, 14 February 2008 (CET)
When we deal with the any-field index, we need something like "ISSN" and "asdasd" to be able to search for e.g. ISSN or ISBN* or ISBN 1234* in the any-field index. When we deal with the any-identifier index, we need the variants to search for ISSN:12* or ISSN:123123123 or ISSN*. Does this clarify things a bit? I am not certain I can provide an answer in real implementation terms :) --Natasa 17:51, 14 February 2008 (CET)
Regarding wildcards in phrase search: we should probably document this limitation somewhere. The current specification "Rest_api_doc_SB_Search.pdf" seems only to refer to CQL, which lists this use case, see -> masking. Do we automatically right-truncate phrases? --Inga 21:19, 19 February 2008 (CET)
Yes, we definitely should document this limitation. No, we do not do automatic right-truncation of search words. --Michael.hoppe 09:46, 20 February 2008 (CET)
Please note that we are not talking about a "phrase search" when we talk about the identifier index. If the requirement were for a phrase search, that would have been explicitly stated. I used "ISSN" "asdasd" or "ISSN asdasd" in the examples not as phrases, but to delimit the search criteria from other text. Maybe this caused the confusion. --Natasa 09:32, 25 February 2008 (CET)
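
For illustration, the step "take the xsi:type together with the value" could look roughly like the sketch below. It uses only the JDK DOM parser; the class and method names are hypothetical, and the actual indexer extracts its data with a stylesheet, so this is not the framework's implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper, shown only to illustrate the <type>:<value> form.
final class IdentifierTypeSketch {

    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // Reads one dc:identifier element and returns "<type>:<value>".
    static String typedIdentifier(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getAttributeNS
            Element element = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            String qualifiedType = element.getAttributeNS(XSI_NS, "type"); // e.g. "eidt:ISSN"
            // Drop the namespace prefix; without a colon the whole string is kept.
            String type = qualifiedType.substring(qualifiedType.indexOf(':') + 1);
            return type + ":" + element.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse identifier element", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<dc:identifier xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " xmlns:eidt=\"http://escidoc.mpg.de/metadataprofile/schema/0.1/idtypes\""
                + " xsi:type=\"eidt:ISSN\">0028-0836</dc:identifier>";
        System.out.println(typedIdentifier(xml));
    }
}
```

For the ISSN element quoted above, the method returns ISSN:0028-0836.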

Normalization of ISSN/ISBN

ISSNs and ISBNs may be specified with or without hyphens or blanks. Therefore, normalization would be required, e.g. by deleting all hyphens and blanks before the string is indexed. In addition, we might consider ISBN-10 to ISBN-13 conversion --Inga 16:50, 19 February 2008 (CET)

Question: Is it also possible to translate a CQL query on the fly? It would be appreciated if the CQL statement "escidoc.issn=0028-0836" returned the same result as "escidoc.issn=00280836". --Inga 16:50, 19 February 2008 (CET)

Normalization should be done not only while indexing but also while searching. So the best place to do this is the analyzer, which is applied both while indexing and while searching. But we may only do this normalization for identifiers, so the analyzer has to decide on it depending on the index name. So we could do this for the index any-identifier but not for the index escidoc.metadata, which also contains the identifiers. And then all identifiers would get normalized, not only ISSN and ISBN. --Michael.hoppe 09:58, 20 February 2008 (CET)
Different behavior in escidoc:metadata and escidoc:identifier would probably be confusing --Inga 12:32, 21 February 2008 (CET)
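
A minimal sketch of the proposed normalization, independent of Lucene: it strips hyphens and blanks and converts an ISBN-10 to ISBN-13 using the standard rule (prefix 978, keep the first nine digits, recompute the check digit). The class and method names are hypothetical; in the real system this logic would sit in the analyzer so that it is applied to both indexed values and search terms.

```java
// Hypothetical helper; not part of the framework.
final class IdentifierNormalizerSketch {

    // Removes hyphens and whitespace, e.g. "0028-0836" becomes "00280836".
    static String normalize(String identifier) {
        return identifier.replaceAll("[-\\s]", "");
    }

    // Converts an ISBN-10 to ISBN-13; the old check digit is discarded.
    static String isbn10To13(String isbn10) {
        String digits = normalize(isbn10);
        if (digits.length() != 10) {
            throw new IllegalArgumentException("not an ISBN-10: " + isbn10);
        }
        String body = "978" + digits.substring(0, 9);
        int sum = 0;
        for (int i = 0; i < body.length(); i++) {
            int digit = Character.digit(body.charAt(i), 10);
            if (digit < 0) {
                throw new IllegalArgumentException("not an ISBN-10: " + isbn10);
            }
            // ISBN-13 weights alternate 1, 3, 1, 3, ... from the left.
            sum += (i % 2 == 0) ? digit : 3 * digit;
        }
        int check = (10 - sum % 10) % 10;
        return body + check;
    }

    public static void main(String[] args) {
        System.out.println(normalize("0028-0836"));
        System.out.println(isbn10To13("3-499-13467-5"));
    }
}
```

With this, "0028-0836" and "00280836" map to the same index term, and the ISBN-10 form 3-499-13467-5 maps to 9783499134678, the normalized form of the ISBN-13 quoted in the section above.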
I am not sure why you want identifiers to behave differently in the any-index and the any-identifier index. Since we can't use wildcards in phrase queries, we should always index identifiers as ISSN:dadads and not as ISSN ddads: we cannot search for ISSN da* as a phrase, and a non-phrase search might then find objects that have ISSN=32143 and ISBN=dader. But we could search for ISSN:da*! We could additionally index the value (dadads); then you would also be able to find dadads alone.
Your CQL queries would then be:
  • escidoc.any-identifier=ISSN:123-234
  • escidoc.any-identifier=ISSN:12*
  • escidoc.any-identifier=ISSN:*
  • escidoc.any-identifier=ISSN
  • escidoc.any-identifier=123-234 --Michael.hoppe 10:34, 20 February 2008 (CET)
My ambition was to harmonize the article, which used "ISSN 123-234" for the any-index and "ISSN:123-234" for any-identifier before I started the revision. I agree with your argument in favor of colons, but could you please explain why the 4th example would work? Anyway, I would suggest changing the article accordingly --Inga 12:32, 21 February 2008 (CET)
Inga, you are right, the 4th example wouldn't work. So we have to index ISSN:zwtzrz and ISSN and zwtzrz; then all examples will work --Michael.hoppe 10:05, 22 February 2008 (CET)
That was actually the requirement. Hopefully it is clearer now? --Natasa 09:34, 25 February 2008 (CETCentral European Time)
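
The outcome of this thread (index <type>:<value>, <type> and <value> as three separate tokens) can be checked against the five example queries with a small sketch. Plain string matching stands in for Lucene here, a trailing * is treated as right truncation, and all names are hypothetical.

```java
import java.util.List;

// Hypothetical stand-in for the index; it is not Lucene and ignores analyzers.
final class IdentifierTokenSketch {

    // Tokens as agreed in the thread: "<type>:<value>", "<type>" and "<value>".
    static List<String> tokens(String type, String value) {
        return List.of(type + ":" + value, type, value);
    }

    // True if any token matches the term; a trailing '*' means right truncation.
    static boolean matches(List<String> tokens, String term) {
        if (term.endsWith("*")) {
            String prefix = term.substring(0, term.length() - 1);
            return tokens.stream().anyMatch(token -> token.startsWith(prefix));
        }
        return tokens.contains(term);
    }

    public static void main(String[] args) {
        List<String> indexed = tokens("ISSN", "123-234");
        for (String term : List.of("ISSN:123-234", "ISSN:12*", "ISSN:*", "ISSN", "123-234")) {
            System.out.println(term + " -> " + matches(indexed, term));
        }
    }
}
```

All five queries from the list above match with three tokens. With only ISSN:123-234 and 123-234 indexed, the fourth query (ISSN alone) would not match, which is the gap Inga pointed out.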

Scope of indexing

All metadata records or only escidoc metadata records?

Do you want me to provide a generic Lucene index that contains data from all MD (metadata) records that you have in the items/containers, and not only from the internal one with attribute name=escidoc?

I think there is no longer an "internal" metadata record. Depending on the content model we use different metadata profiles. The current indexes only deal with the PubMan metadata profile (unfortunately named escidoc index). How do you know which metadata record is internal? --Natasa 17:15, 13 February 2008 (CET)
For me, the 'internal' md-record is the md-record with attribute name = 'escidoc'. The profile of this md-record doesn't really matter to me, as long as I can find the elements that have to be indexed; e.g. for the index any-title the md-record has to have the elements title and alternative. --Michael.hoppe 08:13, 14 February 2008 (CET)
We have not tried putting more than a single metadata record into an item/container (that is something we will do next month). How will indexing be done then? We probably need to focus on a single metadata record for indexing (in whatever profile it is), as long as you have the information that this is the "default" one. --Natasa 17:15, 13 February 2008 (CET)
See above: the default md-record is the md-record with attribute name='escidoc' --Michael.hoppe 08:13, 14 February 2008 (CET)

All elements from all records?

Should this Lucene-Index contain the data from all md-elements of all md-records?

I would dare to state for now: NO! We will not maintain many metadata records at the same time. The most probable use case for the existence of several metadata records would be: we ingest some data and keep the original metadata as a metadata record, but from then on we work only on the "solution-supported-metadata-profile", which can differ from the originally ingested one. Therefore we would probably not index the original one for searching. --Natasa 17:18, 13 February 2008 (CET)

Naming of this index? Creation of new indexes?

How should we name the indexes?

I think the names are not the problem. We may need to talk a bit about how we can easily create a new index that is not only a single-metadata index but sometimes a compound one (as in the case of the any-title index). --Natasa 17:21, 13 February 2008 (CET)
For now, we do not have an easy method. We have to change the stylesheet that extracts the data out of the item/container.xml and writes the index-information-document. Then we have to recreate the index database. (I am currently developing an 'admin-tool' that can do the recreation.) --Michael.hoppe 08:17, 14 February 2008 (CET)
"I am currently developing an 'admin-tool' that can do the recreation" - Michael, that is great news! --Natasa 11:20, 14 February 2008 (CET)

Clarification on Organization index

On Indexing of organization PIDS

It is important to understand some constraints and limitations regarding the indexing of the organizational unit PID (Persistent Identifier) path-list with the PubItem

  • Q: What happens when the organizational structure is changed, i.e. when the organizational unit is assigned a new parent? Should all corresponding PubItem indexes be updated?
  • A: According to the organizational unit life-cycle, units have a status "new", "opened" or "closed".
    • status "new": the organizational unit cannot be associated with a PubItem, because it has not yet been made "official". In this stage any re-parenting (i.e. assigning new parents, removing old parents) can take place
    • status "opened": the organizational unit can be associated with a PubItem; it is "official". In this stage re-parenting cannot take place
    • status "closed": the organizational unit can be associated with a PubItem, but cannot be associated with new children or parents, i.e. no changes are allowed (it can, however, be associated with other OrgUnits via "successor" and "predecessor" relations).
      According to these settings, there will be no need to re-index PubItem indexes in case of re-parenting. The background of the idea: even if an org unit is in status "opened" and someone needs to re-parent it, this should not be allowed, because it reflects a real change of the organizational structure. The result is simply a new organizational unit (even if it has the same name), and the old unit should be related to the newly created one via "successor" or "predecessor". Logically, the old organizational unit has changed and no longer exists "officially" in its previous context.
  • Q: What happens with successor/predecessor relations when searching?
  • A: The system should be made "smart" enough to inform a user searching for a specific organizational unit that the unit has a "successor" or a "predecessor", and to ask whether she would also like to search in that organizational unit. This substantially reduces the scalability problem: we will not re-index already existing PubItems, and "successors", "predecessors" and changes of the organizational unit structure are not frequent.
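
The life-cycle rules listed above can be written down as a small enum. This is only a sketch of the rules as stated in this section; the type and method names are hypothetical and do not come from the framework.

```java
// Hypothetical model of the organizational unit life-cycle described above.
enum OrgUnitStatus {
    NEW(false, true),      // not yet "official": no PubItems, re-parenting allowed
    OPENED(true, false),   // "official": PubItems allowed, no re-parenting
    CLOSED(true, false);   // PubItems allowed, no structural changes at all

    private final boolean associable;
    private final boolean reparentable;

    OrgUnitStatus(boolean associable, boolean reparentable) {
        this.associable = associable;
        this.reparentable = reparentable;
    }

    // Whether a PubItem may reference a unit in this status.
    boolean canBeAssociatedWithPubItem() {
        return associable;
    }

    // Whether parents may still be assigned or removed in this status.
    boolean canBeReparented() {
        return reparentable;
    }

    public static void main(String[] args) {
        for (OrgUnitStatus status : values()) {
            System.out.println(status + ": associable=" + status.canBeAssociatedWithPubItem()
                    + ", reparentable=" + status.canBeReparented());
        }
    }
}
```

No status allows both association with a PubItem and re-parenting, which is why an indexed path-list never has to be rebuilt under these rules.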

Question & Remark: The fact that an open organizational unit cannot be re-parented is not in sync with the former usage scenarios, is it? Please note that I support any measures which simplify OU (Organizational Unit) management. But disabling re-parenting only shifts the problem, because it will force administrative users to create additional OUs with the relation "isSuccessorOf" to an existing OU.

The real issue here is why re-parenting should be allowed at all when the organizational unit is already "opened". Re-parenting means something has changed in the organizational structure, so from that moment onwards it is actually a new organizational unit. This simplifies a lot. --Natasa 09:39, 25 February 2008 (CET)

To my understanding, indexing of the "Parent-path-list" is only a workaround solution (right?). What we probably require is a fast relation service which returns all objects related [in a specific type] to an object and can be called recursively. This would give solutions the option to retrieve the IDs of all predecessors/successors/children/parents and build the search query on the fly, if the user chooses so. --Inga 19:04, 19 February 2008 (CET)

Indexing of the "Parent-path-list" is not necessarily a workaround solution. It is becoming important for fast retrieval of results. I doubt that there is a better alternative than making long queries with "OR/AND". (Or we should be more modest with our requirements, i.e. when searching for "MPDL" one gets only results for "MPDL" and not also for the MPDL children; avoiding that was exactly the reason for the extra parent-path-list index!) --Natasa 09:39, 25 February 2008 (CET)

Search index names&logic synchronization

In Progress

Setting-up the rules

Shall talk about the

  • indexing context (publication | virrelement | faces)
  • composition (derivation)
  • clear name of which metadata is used for indexing

Automatic index names

As an improvement for this requirement, we would like to set-up a rule for indexes. The rule should be the following:

1) Make automatic index of the qualified path of all metadata/attributes such as:

         * publication.title
         * publication.alternativeTitle
         * publication.genre
         * publication.source.title
         * publication.source.genre
         * publication.event.title


Here "qualified" means that for item-level structures part of the path is ignored (possible? makes sense?), e.g.:

  • components element is ignored
    • therefore instead of having index of components.component.file-name we have index such as component.file-name
  • for content-model-specific properties full path is to be taken as we can not know in advance the structure in it e.g.:
    • content-model-specific.local-tags.local-tag

The logic below is not yet present in the metadata, but if we specify a source within a source it would look like:

         * publication.source.source.genre
         * publication.source.source.title
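
A minimal sketch of the naming rule described in this section, assuming the element path is available as a list of names. The set of ignored wrapper elements contains only the one example given here (components); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for deriving automatic index names from element paths.
final class IndexNameSketch {

    // Wrapper elements dropped from the path; only "components" is named in the proposal.
    static final Set<String> IGNORED_WRAPPERS = Set.of("components");

    static String indexName(List<String> path) {
        // For content-model-specific properties the full path is kept,
        // because their structure is not known in advance.
        if (!path.isEmpty() && path.get(0).equals("content-model-specific")) {
            return String.join(".", path);
        }
        return path.stream()
                .filter(step -> !IGNORED_WRAPPERS.contains(step))
                .collect(Collectors.joining("."));
    }

    public static void main(String[] args) {
        System.out.println(indexName(List.of("components", "component", "file-name")));
        System.out.println(indexName(List.of("publication", "source", "source", "genre")));
        System.out.println(indexName(List.of("content-model-specific", "local-tags", "local-tag")));
    }
}
```

The three calls yield component.file-name, publication.source.source.genre and content-model-specific.local-tags.local-tag, matching the examples in this section.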

Compound/Derived indexes

Compound index names depend on the metadata set and are created on request. We need to make sure that we have correct index names when requesting a new index (in future we can expect to register XSLT transformations for derivation at content-model level):

        * publication.compound.publication-titles
           ** index of all titles of the publication on publication level (title, alternativeTitle)
IMO it is enough to name it publication.compound.title --Friederike 10:58, 11 February 2009 (UTC)
        * publication.compound.any-titles
           ** index of all titles of the publication on any level (publication.title, publication.alternativeTitle, 
              source.title, source.source.title, source.alternativeTitle, event.title, event.alternativeTitle)
Add new option ANY (compound | any | void) * publication.any.title
To make it a rule, one could say that compound refers to all objects on the same level, and any refers to all objects on the same level and below --Friederike 10:58, 11 February 2009 (UTC)
        * publication.compound.source-titles
           ** index of all titles of the publication on source level (source.title, source.alternativeTitle, if wished also 
              source.source.title, source.source.alternativeTitle - not necessary now)
Respectively: *publication.source.compound.title --Friederike 10:58, 11 February 2009 (UTC)

The index publication.source.any.title would also deliver the (not yet possible) titles of a source's source.


The option compound or any always refers to the object in front of it.

Therefore publication.compound.title gives title & alternativeTitle, and publication.source.compound.title gives source.title & source.alternativeTitle.
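
The proposed difference between compound and any can be sketched as follows. The Node class is a hypothetical stand-in for a metadata object (publication, source, event); only the two collection rules follow the proposal above, everything else is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "compound" versus "any" rule.
final class TitleIndexSketch {

    // A metadata object with its own titles and its nested objects.
    static final class Node {
        final List<String> titles = new ArrayList<>();   // title and alternativeTitle values
        final List<Node> children = new ArrayList<>();   // e.g. sources of a publication

        Node title(String value) { titles.add(value); return this; }
        Node child(Node node) { children.add(node); return this; }
    }

    // "compound": titles on the node's own level only.
    static List<String> compoundTitles(Node node) {
        return new ArrayList<>(node.titles);
    }

    // "any": titles on the node's own level and on every level below it.
    static List<String> anyTitles(Node node) {
        List<String> result = new ArrayList<>(node.titles);
        for (Node child : node.children) {
            result.addAll(anyTitles(child));
        }
        return result;
    }

    public static void main(String[] args) {
        Node source = new Node().title("Journal title");
        Node publication = new Node().title("Main title").title("Alternative title").child(source);
        System.out.println(compoundTitles(publication)); // publication.compound.title
        System.out.println(anyTitles(publication));      // publication.any.title
        System.out.println(compoundTitles(source));      // publication.source.compound.title
    }
}
```

Because anyTitles recurses, it would also pick up the titles of a source's source once such nesting exists in the metadata.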


These are to be created based on our requirements

Person indexes

See also PubMan_Indexing#Persons_.28escidoc.any-persons.29 Stated: Creator.Person.CompleteName with Creator.CreatorType = "Person"

we need compound indexes such as

      *publication.compound.creator.publication-person
         **This will index all creator persons on publication level
      *publication.compound.creator.any-person
         **This will index all creator persons on any level (publication, source, source.source)
      *publication.compound.creator.source-person
         **This will index all creator persons on source level only (if wished also source.source level)


Organization indexes

See also PubMan_Indexing#Organization Logic is the same, index names proposal change:

    * escidoc.organization-name => automatically becomes  publication.creator.organization.organization-name
    * escidoc.any-organization-pids  =>   becomes publication.compound.publication-organization-pids
           **if indexing only publication creators organizations
    * escidoc.any-organization-pids  => becomes publication.compound.any-organization-pids 
           ** if indexing all publication and source creators affiliations organizations


Creator indexes

We need compound index for searching any type of creator, and sorting by any type of creator

       * publication.compound.publication-creator-names

This indexes all names of publication creators, regardless of whether the creator is a person or an organization (respective sort keys are also needed).

NOTE: This is a requirement anyway, as we sort by creator names, and these can be organization or person names. There is no index that does this only for publication creators; the existing index also covers source creators.

@TODO: check whether we also need a similar compound index for PIDs, e.g. publication.compound.publication-creator-pids

Discussion/Comments

Faces

  • In my opinion, this is not relevant for Faces, because in Faces we only have very clear indexes (one per attribute). --Kristina 10:17, 6 February 2009 (UTC)
OK, but even in this case we have a clear naming convention for Faces indexes, such as:
     *faces.emotion instead of escidoc.emotion
     *faces.age instead of escidoc.age


ViRR (Virtueller Raum Reichsrecht)

Section moved to VIRR Development Page

General

  • Perhaps it makes sense to add escidoc to beginning of indexes, like:
*escidoc.publication.compound.creator
*escidoc.faces.album.creator

because then we could also query like

*escidoc.compound.creator

Possible use case: find all escidoc items of which Mr. X was a creator.

*escidoc.compound.title delivers all titles on the first level 
 **(publication.title, publication.alternativetitle, virrelement title, etc.)
*escidoc.any.title 
 **delivers all title elements in all escidoc items