eSciDoc Administrative Search

From MPDLMediaWiki
Revision as of 07:02, 26 March 2010 by Mih

Input FIZ

What are the requirements for the administrative search when it is based on Lucene?

Currently, the admin search uses the filter methods, which access the db-cache containing all Fedora objects.

  • The db-cache provides search-capabilities for all properties and all metadata of an object.
  • Only the last version of an object is searchable.
  • It is possible to sort by all searchable fields.
  • Additionally, the special filter criteria "user" and "role" can be applied; they restrict the list of retrieved objects to those the given user may access with the given role.
  • If no "user" and "role" filter is applied, the list of retrieved objects is filtered by the access rights of the current user with all of his granted roles.
  • The new administrative search will use Lucene as underlying search-database.
  • The Lucene administrative indexes will contain additional fields for access-rights filtering.

Access rights are filtered at search time by expanding the search query with an access-rights filter.
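
The query expansion can be sketched as follows. This is a minimal sketch, not the eSciDoc implementation: it assumes Lucene classic query syntax and reuses the permission-filter field names listed further down this page, while the role names ("depositor", "moderator"), the `Grant` type, the role-to-clause rules and the "released objects are visible to everyone" default are hypothetical. The optional `role` argument mirrors the "user"/"role" filter; without it, all granted roles of the user contribute.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: released objects are visible to every user.
RELEASED = 'permissions-filter.public-status:"released"'


@dataclass(frozen=True)
class Grant:
    """Hypothetical representation of a role granted to a user."""
    role: str
    scope: Optional[str] = None  # e.g. the id of a context the role is scoped on


def quote(value: str) -> str:
    """Wrap a value in quotes so characters such as ':' in PIDs are not parsed as syntax."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def role_clause(user_id: str, grant: Grant) -> Optional[str]:
    """Hypothetical mapping from one grant to a permission-filter clause."""
    if grant.role == "depositor":
        return f"permissions-filter.created-by:{quote(user_id)}"
    if grant.role == "moderator" and grant.scope is not None:
        return f"permissions-filter.context-id:{quote(grant.scope)}"
    return None  # roles not covered here add no extra visibility


def expand_query(user_query: str, user_id: str, grants: List[Grant],
                 role: Optional[str] = None) -> str:
    """Require both the user's query and at least one permitted filter clause."""
    clauses = [RELEASED]
    for grant in grants:
        if role is not None and grant.role != role:
            continue  # "role" filter: only grants of the given role contribute
        clause = role_clause(user_id, grant)
        if clause is not None and clause not in clauses:
            clauses.append(clause)
    return f"+({user_query}) +({' OR '.join(clauses)})"


grants = [Grant("depositor"), Grant("moderator", "escidoc:ctx1")]
print(expand_query('title:"lucene"', "escidoc:user42", grants, role="moderator"))
# +(title:"lucene") +(permissions-filter.public-status:"released" OR permissions-filter.context-id:"escidoc:ctx1")
```

Both the user's query and the filter are required clauses (`+`), so a hit must match the query and at least one clause allowed by the user's grants.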

  • Do the requirements stated above also apply to an administrative search based on a Lucene index?
  • Are there additional requirements?
    • full-text search?
    • search older versions?
    • custom search-result schemas?
  • Index design: one Lucene index containing all object types, or one Lucene index per object type?
  • Indexing performance (requirement: synchronous indexing)
    • reindexing of complete trees when members are added to or removed from a container
  • Proposal:
    • Use the Lucene administrative index only for Fedora objects (items, containers, contexts, org-units, content-relations, content-models).
    • Keep the old filter methods for objects stored in the internal database (user-accounts, user-groups, grants, roles, statistics).
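
The proposal amounts to routing each administrative search by object type. A minimal sketch: the type identifiers are assumptions derived from the lists above, and the two backend callables are hypothetical placeholders, not eSciDoc APIs.

```python
from enum import Enum
from typing import Callable, List

# Object types named in the proposal; the exact identifiers are assumptions.
LUCENE_TYPES = frozenset({"item", "container", "context", "org-unit",
                          "content-relation", "content-model"})
DB_TYPES = frozenset({"user-account", "user-group", "grant", "role", "statistics"})


class Backend(Enum):
    LUCENE_ADMIN_INDEX = "lucene-admin-index"
    DB_FILTER = "db-filter"


def backend_for(object_type: str) -> Backend:
    """Fedora objects go to the Lucene admin index; internal-DB objects keep the old filters."""
    if object_type in LUCENE_TYPES:
        return Backend.LUCENE_ADMIN_INDEX
    if object_type in DB_TYPES:
        return Backend.DB_FILTER
    raise ValueError(f"unknown object type: {object_type!r}")


# (object_type, query) -> list of result identifiers
SearchFn = Callable[[str, str], List[str]]


def admin_search(object_type: str, query: str,
                 lucene_search: SearchFn, db_filter: SearchFn) -> List[str]:
    """Dispatch to one of two hypothetical backends."""
    if backend_for(object_type) is Backend.LUCENE_ADMIN_INDEX:
        return lucene_search(object_type, query)
    return db_filter(object_type, query)
```

Raising on unknown types, rather than silently defaulting to one backend, keeps a newly added object type from being searched without the intended permission filtering.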

Input MPDL

On functionalities:

  • The db-cache allows filtering by exact value.
  • The administrative search should allow searching in the same way as the regular search (including wildcards).
  • Additional requirements
    • full-text search:
      • the administrative search shall also allow searching in full texts, depending on user privileges
        • if this is solved for the administrative search, it would be good to understand the implications for extending the normal search with respect to privileges on full texts
    • custom search-result schemas: not clear
  • searching older versions
    • so far there has been no special requirement to search older versions of a resource
    • proposal: stick with this rule; however, the admin search must be able to search both the latest versions and the latest releases
  • index design: not certain about the implications; first impressions:
    • items/containers: a single index
    • OUs, contexts, content-models: separate indexes
    • content-relations: start with a separate index and see. We currently have no solution under development that is based on content relations, so we need to check what the real scenarios are (e.g. "get me all resources created after 2010 and tagged as 'my publications'"); this kind of query would require a more complex indexing strategy.
  • reindexing of complete trees when members are added to or removed from a container: not clear why; more explanation is needed here
    • related: see the collaborator-role descriptions in JIRA
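
The difference between the db-cache's exact-value filters and the wildcard support requested above can be illustrated with a small sketch. It assumes Lucene's wildcard semantics (`*` matches any sequence of characters, including none; `?` matches exactly one character) and deliberately ignores escaping, analyzers and case folding, which a real Lucene field would apply. The function names are hypothetical.

```python
import re


def exact_match(value: str, pattern: str) -> bool:
    """db-cache style: the filter value must equal the stored value."""
    return value == pattern


def wildcard_to_regex(pattern: str) -> "re.Pattern":
    """Translate Lucene-style wildcards into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(value: str, pattern: str) -> bool:
    """Regular-search style: the whole value must match the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None
```

For example, `exact_match("Max Planck", "Max*")` is false while `wildcard_match("Max Planck", "Max*")` is true.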

Outcome of initial Discussion

  • The administrative search will be realized with Lucene indexes containing additional fields for permission filtering.
    • Additional fields are:
      • permissions-filter.objecttype
      • permissions-filter.context-id
      • permissions-filter.PID
      • permissions-filter.parent-id
      • permissions-filter.component-id
      • permissions-filter.created-by
      • permissions-filter.version-status
      • permissions-filter.public-status
  • Permission filtering is done by expanding the query with a subquery that restricts the search result to the objects the current user may see.
  • the subquery is generated at search time
    • the AA-Service is asked to generate the subquery based on the roles granted to the current user
  • Besides the fields needed for permission filtering, the admin indexes will contain a field for each property and each metadata element and, for items, fields for the full texts.
  • When searching in full texts, the search result is still the whole item XML, not the content of the full texts. Therefore, the permission filter does not need to check the rights to see component content.
  • Search will behave just like the normal search (wildcards, etc.).
  • Whenever an object changes (create/update), it is updated in the admin index.
  • The admin index will always contain the latest version of an object and, if the object has been released, also its latest release.
  • Depending on the rights of the user, the search result contains either the latest version or the latest release of the object.
  • The admin indexes are written asynchronously.
    • There is a delay of a few seconds between the end of an update operation and the availability of the changed object in the index.
  • search results contain the full eSciDoc XML representation
  • there will be 5 different admin indexes, containing the following objects:
    • items/containers
    • organizational-units
    • contexts
    • content-models
    • content-relations
  • Allow an additional filter "role"
    • in the search query, or as an additional parameter? It is not clear how this fits into an SRW request.
    • the subquery is generated only for the given role and user
    • Example: ???? (Someone please provide an example)
  • Sorting?
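
One way to read the latest-version/latest-release rule: an object may appear in the admin index twice, and after the permission filter each PID is collapsed to the entry the user is allowed to see. In this sketch the `IndexEntry` type, the `is_latest_release` marker and the visibility rule in `may_see_latest_version` (creator, or moderator of the object's context) are hypothetical; only `created_by` and `context_id` correspond to permission-filter fields listed above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class IndexEntry:
    pid: str
    is_latest_release: bool  # hypothetical: True for the latest-release copy
    created_by: str          # cf. permissions-filter.created-by
    context_id: str          # cf. permissions-filter.context-id


def may_see_latest_version(entry: IndexEntry, user_id: str,
                           moderated_contexts: Set[str]) -> bool:
    """Hypothetical rule: creators and context moderators see the latest version."""
    return entry.created_by == user_id or entry.context_id in moderated_contexts


def select_visible(hits: Iterable[IndexEntry], user_id: str,
                   moderated_contexts: Set[str]) -> List[IndexEntry]:
    """Return one entry per PID: the latest version if permitted, else the latest release."""
    chosen: Dict[str, IndexEntry] = {}
    for entry in hits:
        if entry.is_latest_release:
            # Releases are visible to everyone but never replace a chosen latest version.
            chosen.setdefault(entry.pid, entry)
        elif may_see_latest_version(entry, user_id, moderated_contexts):
            chosen[entry.pid] = entry  # prefer the latest version over the release
    return list(chosen.values())
```

An unreleased object whose latest version the user may not see drops out of the result entirely, which matches filtering rather than hiding individual fields.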


  • For some roles, object hierarchies have to be resolved
    • e.g. a Collaborator with scope on a container may see all objects in the child hierarchy of the container.
    • the permission-filter fields of each object below the container must contain the parent hierarchy tree.
    • whenever a member is added to or removed from a container, all objects in the child hierarchy of this member have to be reindexed.
    • --> Evaluate parallel searching further (one index containing the property/metadata fields, another index containing the permission-filter fields), as this could avoid reindexing properties, metadata and full texts.
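
The hierarchy problem can be sketched as a membership graph: the ancestors of an object are what its permission-filter fields would have to store, and a membership change invalidates the changed member plus everything below it. The `Hierarchy` class is hypothetical and assumes an object may be a member of several containers; it only computes which PIDs to reindex and does not touch any index.

```python
from collections import defaultdict, deque
from typing import Dict, Set


class Hierarchy:
    """Container membership graph (hypothetical); an object may have several parents."""

    def __init__(self) -> None:
        self.members: Dict[str, Set[str]] = defaultdict(set)
        self.parents: Dict[str, Set[str]] = defaultdict(set)

    def _walk(self, start: str, edges: Dict[str, Set[str]]) -> Set[str]:
        """Breadth-first traversal; the visited set guards against cycles."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def ancestors(self, pid: str) -> Set[str]:
        """All containers above pid: the parent hierarchy tree to store for filtering."""
        return self._walk(pid, self.parents)

    def subtree(self, pid: str) -> Set[str]:
        """pid plus everything below it: the objects to reindex after a change."""
        return self._walk(pid, self.members) | {pid}

    def add_member(self, container: str, member: str) -> Set[str]:
        self.members[container].add(member)
        self.parents[member].add(container)
        return self.subtree(member)  # PIDs whose ancestor fields changed

    def remove_member(self, container: str, member: str) -> Set[str]:
        self.members[container].discard(member)
        self.parents[member].discard(container)
        return self.subtree(member)
```

The returned sets show why the cost grows with the size of the moved subtree, which is what the parallel-index idea above would try to limit to the permission-filter fields.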


  • search all statuses