Latest revision as of 09:50, 19 February 2010
The information on this page is deprecated. Please check page Publication metadata context for more accurate information.
This article lists the search indexes required by the PubMan solution, focusing on the external-repository view as described in the use case PubMan Func Spec Search. Please note that additional indexes are needed to fulfill administrative tasks on the repository.
Initially, the content of this page was copied from the file Systemspecification_Pubman.doc (FE_PM_04 Indexing item data), but the requirements will only be maintained on this page. Please use the discussion page to comment on the existing requirements or to submit new ones.
Required Indexes
All released items have to be retrievable via the search engine. Lucene should always index the last released version of each item. The search database should provide the following indexes:
Any Field (escidoc.metadata)
- All elements defined in the functional Metadata set specification are indexed for the search.
- In addition, the following properties are indexed: <Id of item>, <PID of item>, <PID of file>
- Files of any type are indexed, except those of content category "correspondence" and "copyright transfer agreement".
Special requirements for identifiers
In the "any-field" index, identifiers should be indexed as two separate tokens, "identifier-type" and "identifier-value", e.g.
ISSN
ISBN 232323184738
URI http://mpdl.mpg.de/123
Please note remarks on normalization
Example Queries:
escidoc.metadata=ISSN
- to return all records which have an identifier of type ISSN
escidoc.metadata="ISBN 1234*"
- to return all records which have an identifier of type ISBN starting with 1234
escidoc.metadata="URI http://mpdl.mpg.de"
- to return all records which have an identifier of type URI with the exact value "http://mpdl.mpg.de"
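The two-token requirement can be illustrated with a minimal Python sketch. The function names and the list-of-pairs input format are assumptions made for illustration only; they are not part of the eSciDoc or Lucene configuration.

```python
def identifier_tokens(id_type, id_value):
    """Return the two tokens for one identifier: its type, then its value."""
    return [id_type.strip(), id_value.strip()]


def any_field_tokens(identifiers):
    """Flatten (type, value) pairs into one token stream for the any-field index."""
    tokens = []
    for id_type, id_value in identifiers:
        tokens.extend(identifier_tokens(id_type, id_value))
    return tokens


# Hypothetical record with two identifiers taken from the examples above.
record_identifiers = [("ISBN", "232323184738"), ("URI", "http://mpdl.mpg.de/123")]
print(any_field_tokens(record_identifiers))
```

Keeping type and value as adjacent tokens is what allows a query for the type alone as well as a phrase query for type plus value.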
Genre (escidoc.any-genre)
- Publication.Genre
Persons (escidoc.any-persons)
- all Creator.Person.CompleteName with Creator.CreatorType = "Person"
Organization
by name (escidoc.organization-name)
- all Organization.Name for each language separately and for all languages at once
by identifier (escidoc.any-organization-pids)
- Organization.PID and the PIDs of all organizations in the organizational path hierarchy up to the top-level organizations (i.e. the PIDs of the authors' affiliations and the PIDs of all parent organizations).
Example 1:
- A PubItem "Title1" has authors:
- Udo Müller, affiliated to Max-Planck Institute for Psycholinguistics (PID2)
- Johanna Müllerin, affiliated to Department1 (PID4) of Max-Planck Institute for Plasma Physics
- If the organizational unit structure is as follows:
MaxPlanckSociety (PID1)
|---Max-Planck-Institute for Psycholinguistics (PID2)
|---Max-Planck Institute for Plasma Physics (PID5)
    |__Department 1 (PID4)
PlasmaphysicsSociety (PID6)
|---Max-Planck Institute for Plasma Physics (PID5)
    |__Department 1 (PID4)
- Outcome: The index on Organization.PID should contain the following values for the PubItem "Title1":
PID2, PID1, PID5, PID4, PID6
This holds even though the author affiliations in the descriptive metadata are directly related only to PID2 and PID4.
Example 2:
- A PubItem "Title2" has authors:
- Organization: Max-Planck Institute for Plasma Physics (PID5)
- Organization: Max-Planck Institute for Psycholinguistics (PID2)
- If the organizational unit structure is as follows:
MaxPlanckSociety (PID1)
|---Max-Planck-Institute for Psycholinguistics (PID2)
|---Max-Planck Institute for Plasma Physics (PID5)
    |__Department 1 (PID4)
PlasmaphysicsSociety (PID6)
|---Max-Planck Institute for Plasma Physics (PID5)
    |__Department 1 (PID4)
- Outcome: The index on Organization.PID should contain the following values for the PubItem "Title2":
PID2, PID5, PID1, PID6
This holds even though the organizations in the creator descriptive metadata are directly related only to PID2 and PID5.
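The expansion used in both examples can be sketched as an upward walk through the organizational structure. This is a minimal Python illustration, assuming the hierarchy is available as a hypothetical child-to-parents map; how the framework actually stores organizational units is not specified on this page.

```python
# Hypothetical parent map for the structure above: child PID -> direct parent PIDs.
# PID5 has two parents because the institute appears under both societies.
PARENTS = {
    "PID2": ["PID1"],
    "PID5": ["PID1", "PID6"],
    "PID4": ["PID5"],
}


def organization_pids(affiliation_pids):
    """Return the affiliation PIDs plus the PIDs of all their ancestors."""
    seen = set()
    stack = list(affiliation_pids)
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue  # already visited via another path
        seen.add(pid)
        stack.extend(PARENTS.get(pid, []))
    return seen


print(sorted(organization_pids(["PID2", "PID4"])))  # Example 1 ("Title1")
print(sorted(organization_pids(["PID5", "PID2"])))  # Example 2 ("Title2")
```

Using a set means an organization reachable through several paths is indexed only once, and child units such as PID4 are not added unless an affiliation points at them directly.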
Please note: clarification
Title (escidoc.any-title)
The following elements are indexed for each language separately and for all languages at once:
publication.dc:title
publication.dcterms:alternative
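Indexing "for each language separately and for all languages at once" could look roughly like the following Python sketch. The per-language key suffix (for example `escidoc.any-title.de`) is purely an assumption for illustration; this page does not define how language-specific indexes are named.

```python
from collections import defaultdict


def index_titles(titles):
    """Build index entries from (language_code, text) pairs.

    Every text goes into the combined index and into one per-language index.
    """
    index = defaultdict(list)
    for lang, text in titles:
        index["escidoc.any-title"].append(text)          # all languages at once
        index[f"escidoc.any-title.{lang}"].append(text)  # hypothetical per-language key
    return dict(index)


entries = index_titles([("en", "Speech"), ("de", "Sprache")])
print(entries)
```

The same pattern would apply to the other sections that carry this requirement, such as organization names, topics, events and sources.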
Topic (escidoc.any-topic)
The following elements are indexed for each language separately and for all languages at once:
publication.dc:title
publication.dcterms:alternative
publication.dcterms:tableOfContents
publication.dcterms:abstract
publication.dc:subject
Dates
escidoc.any-dates
All publication dates, including the following elements:
publication.dcterms:created
publication.dcterms:modified
publication.dcterms:dateSubmitted
publication.dcterms:dateAccepted
publication.published-online
publication.dcterms:issued
Search index: escidoc.created
- indexed field: publication.dcterms:created
Search index: escidoc.modified
- indexed field: publication.dcterms:modified
Search index: escidoc.dateSubmitted
- indexed field: publication.dcterms:dateSubmitted
Search index: escidoc.dateAccepted
- indexed field: publication.dcterms:dateAccepted
Search index: escidoc.published-online
- indexed field: publication.dcterms:published-online
Search index: escidoc.issued
- indexed field: publication.dcterms:issued
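The relation between the per-date indexes and escidoc.any-dates can be summarized in a small Python sketch. The mapping copies the index and field names listed above; the `index_dates` function and the flat dictionary used for the metadata are assumptions for illustration.

```python
# Index name -> indexed metadata field, as listed above.
DATE_INDEXES = {
    "escidoc.created": "publication.dcterms:created",
    "escidoc.modified": "publication.dcterms:modified",
    "escidoc.dateSubmitted": "publication.dcterms:dateSubmitted",
    "escidoc.dateAccepted": "publication.dcterms:dateAccepted",
    "escidoc.published-online": "publication.dcterms:published-online",
    "escidoc.issued": "publication.dcterms:issued",
}


def index_dates(metadata):
    """Fill each specific date index and collect every date in escidoc.any-dates."""
    index = {"escidoc.any-dates": []}
    for index_name, field in DATE_INDEXES.items():
        value = metadata.get(field)
        if value is None:
            continue  # the record does not carry this date
        index[index_name] = [value]
        index["escidoc.any-dates"].append(value)
    return index


# Hypothetical record with two of the six dates set.
print(index_dates({
    "publication.dcterms:created": "2009-01-05",
    "publication.dcterms:issued": "2010-02-19",
}))
```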
Event (escidoc.any-event)
The following elements are indexed for each language separately and for all languages at once:
publication.event.dc:title
publication.event.dcterms:alternative
publication.event.place
Internal Identifier (escidoc.any-identifier)
Internal identifiers available for an item and its components:
- ID and PID of item, PID of files
External Identifier (escidoc.any-identifier)
All additional identifiers specified within the metadata record, including:
publication.dc:identifier
publication.source.dc:identifier
publication.source.source.dc:identifier
Special requirements for indexing external identifiers
Identifier type and identifier value should be indexed together, so that users are able to find an item with identifier type ISSN and identifier value 123-234 with each of the following "search by identifier" criteria:
escidoc.any-identifier="ISSN 123-234"
escidoc.any-identifier="ISSN 12*"
escidoc.any-identifier="ISSN *"
escidoc.any-identifier="ISSN"
escidoc.any-identifier="123-234"
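One way to satisfy all five criteria is to index three terms per identifier: the type, the value, and the combined "type value" string. The Python sketch below illustrates this with shell-style wildcard matching from the standard library; it is an assumption about a possible approach, not a description of how Lucene or the eSciDoc search handler evaluates these queries.

```python
from fnmatch import fnmatchcase


def identifier_terms(id_type, id_value):
    """Terms indexed for one identifier: type, value, and both together."""
    return [id_type, id_value, f"{id_type} {id_value}"]


def matches(query, id_type, id_value):
    """True if the query (with optional * wildcard) matches any indexed term."""
    return any(fnmatchcase(term, query) for term in identifier_terms(id_type, id_value))


for query in ["ISSN 123-234", "ISSN 12*", "ISSN *", "ISSN", "123-234"]:
    print(query, matches(query, "ISSN", "123-234"))
```

Note that `fnmatchcase` also treats `?` and `[...]` as wildcards, so a real implementation would need to escape those characters in identifier values.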
Please note remarks on normalization
Source (escidoc.any-source)
The following elements are indexed for each language separately and for all languages at once:
publication.source.dc:title
publication.source.dcterms:alternative
publication.source.source.dc:title
publication.source.source.dcterms:alternative
Components Properties (escidoc.component.<elementname>)
Additional indexes are required for the following component properties:
component.properties.content-category
component.properties.visibility
component.properties.file-name
component.properties.pid
The best option may be to index all component properties by default.
Future Development
release/submission comment
Via the administrative search it should be possible to search for the release/submission comment.
CompleteName Index
It should be possible to run an exact search for creators, such as "Kondic, Nicole". With the new framework it will probably be possible to create such indexes. We should then think about an index syntax for persons.
complete name index tokens: "persons.lastname, person.firstname"
complete name index tokens: "persons.lastname person.firstname"
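The two proposed token forms can be illustrated with a short Python sketch. The function names are hypothetical, and the exact-match check is only a stand-in for whatever untokenized index the new framework might offer.

```python
def complete_name_tokens(last_name, first_name):
    """Return both proposed complete-name tokens for one person."""
    return [f"{last_name}, {first_name}", f"{last_name} {first_name}"]


def exact_match(query, last_name, first_name):
    """True only if the query equals one of the complete-name tokens."""
    return query in complete_name_tokens(last_name, first_name)


print(complete_name_tokens("Kondic", "Nicole"))
print(exact_match("Kondic, Nicole", "Kondic", "Nicole"))
```

Because each token is stored whole, a query for the last name alone would not match, which is the intended behaviour of an exact search.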