Talk:Surrogate Items

From MPDLMediaWiki


The concept of a surrogate item allows the re-use of existing scientific content in an eSciDoc repository across contexts. This is achieved by adding an implicit relation referring to the original item. The relation is expressed by an additional item property holding the reference to the original item resource.

The representation of a surrogate item consists of the components, content-streams and the metadata records which are inherited from the original item and the properties, resources, and relations of the surrogate object itself. Optionally, the surrogate item may be enriched with additional metadata records. Due to issues with the different views on the original item depending on access rights, a surrogate object may only point to a released object (see below for details).

Referencing the Original Item

The optional property “origin” in the properties section of the item representation contains a reference to the original item, which can be either a fixed or a floating reference. The original item itself should be a regular item, not a surrogate item.

A fixed reference contains a version number suffix in the item ID and points to the specified released version of the original item that the surrogate item represents. A fixed reference to a non-released version is not allowed. A floating reference does not contain a version number suffix in the item ID and points to the latest released version of the original item. Note that with a floating reference the surrogate item representation changes whenever a new release of the original item is created.

The inherited parts of the original item are explicitly tagged in the surrogate item representation by the attribute “inherited”, set to the value “true”. The XML schema of Item is therefore extended to allow this additional attribute in the elements md-record, content-stream and component. It has the default value “false”, which avoids the need to change existing items in the repository. The representation of regular items is not affected by this schema change.

Metadata

If a surrogate item has no metadata records of its own, all metadata records of the original item are inherited, and therefore all “md-record” elements in the surrogate item representation carry the attribute “inherited” with the value “true”. Consequently, the implicit DC metadata record generated from the mandatory metadata record “escidoc” is also inherited from the original item.

If a surrogate item has its own metadata records, they appear in the surrogate item representation. As a result, the surrogate item representation may include metadata records from the original item and metadata records from the surrogate item. As metadata records are identified by their name, this may lead to conflicts when the same name is used in both the original and the surrogate item. The conflicts are automatically resolved by an overlay mechanism: a metadata record of the surrogate item hides the metadata record of the original item with the same name. Thus, the user can decide whether to complement the metadata of the original item with own metadata records or to replace them locally.

It is always clear whether a metadata record is local (to the surrogate item) or inherited from the original item by checking the “inherited” attribute of the element. If a metadata record owned by the surrogate item, which hides a metadata record of the original item with the same name, gets deleted, the original metadata record re-appears, clearly indicated by the “inherited” attribute.

This rule also applies to the mandatory metadata record: if a surrogate item has its own mandatory metadata record, this record hides the original mandatory metadata record and thus appears in the surrogate item representation (without the attribute “inherited”). Accordingly, a DC metadata record is generated from it according to the XSLT prescribed by the Content Model of the surrogate item (which might be different from the XSLT of the original item!).
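The overlay rule above can be illustrated with a minimal sketch. It is not eSciDoc code: the `MdRecord` class and the `overlay` function are hypothetical names chosen for illustration, and metadata content is reduced to a plain string.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MdRecord:
    # Hypothetical stand-in for an "md-record" element.
    name: str
    content: str
    inherited: bool = False  # mirrors the schema default "false"


def overlay(original, own):
    """Build the md-records of a surrogate item representation.

    Records owned by the surrogate hide same-named records of the
    original; all remaining original records are tagged as inherited.
    """
    own_names = {record.name for record in own}
    merged = [replace(record, inherited=False) for record in own]
    merged += [
        replace(record, inherited=True)
        for record in original
        if record.name not in own_names
    ]
    return merged


original = [MdRecord("escidoc", "original description"),
            MdRecord("technical", "original technical data")]
own = [MdRecord("escidoc", "local description")]

with_own = overlay(original, own)   # local "escidoc" hides the original one
without_own = overlay(original, [])  # after deleting it, the original re-appears
```

In `with_own` the “escidoc” record is the local one and only “technical” is tagged as inherited; in `without_own` both records come from the original item and both are tagged as inherited.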

Components, Content-streams, Resources, Relations and Properties

All components and content streams of a surrogate item are always inherited from the original item and therefore always have the attribute “inherited” with a value “true” in the surrogate item representation.

A surrogate item provides its own virtual resources, relations and properties, not the ones of the original item. Therefore, the “resources”, “relations” and “properties” elements never have the attribute “inherited”.

Can a surrogate item have also own components?--Natasa 07:31, 31 July 2009 (UTC)

Behavior of Surrogate Items

Users can execute all CRUD operations and perform state changes on surrogate item resources using the standard ItemHandler API methods, just as with regular items.

All elements of the surrogate item representation tagged with the attribute inherited=“true”, as well as the inherited elements “components” and “content-streams”, are always ignored on create and update operations. Because properties and resources cannot be modified either, only metadata records without the attribute “inherited” and content relations are stored during create and update (content relations never have the attribute “inherited”, as the surrogate item representation doesn’t include the content relations of the original item).
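As a rough illustration of which parts of a submitted representation would actually be stored, the following sketch uses a simplified, namespace-free XML layout. The element nesting, the `name` and `target` attributes, and the function `writable_parts` are assumptions made for this example only; the real Item schema is namespaced and more detailed.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical surrogate item representation.
SURROGATE_XML = """
<item>
  <md-records>
    <md-record name="escidoc" inherited="true"/>
    <md-record name="local-notes"/>
  </md-records>
  <components>
    <component inherited="true"/>
  </components>
  <relations>
    <relation target="escidoc:200"/>
  </relations>
</item>
"""


def writable_parts(item_xml):
    """Return (own md-record names, relation targets) that would be stored.

    Inherited md-records, components and content-streams are ignored;
    properties and resources are not modifiable and are skipped as well.
    """
    root = ET.fromstring(item_xml)
    own_md = [
        md.get("name")
        for md in root.findall("./md-records/md-record")
        if md.get("inherited", "false") != "true"
    ]
    relations = [rel.get("target") for rel in root.findall("./relations/relation")]
    return own_md, relations


own_md, relations = writable_parts(SURROGATE_XML)
```

For the sample above only the “local-notes” record and the single content relation would be stored; the inherited “escidoc” record and the component are dropped.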

The lifecycles of a surrogate item and of the original item are independent of each other. The version history of a surrogate item therefore refers to its own history, not the one of the original item.

A user cannot create a surrogate item pointing to a withdrawn item. However, the representation of an existing surrogate item is not affected if its original item is withdrawn, unless user privileges to access the original item change as a result of the status change.

Users can filter and search for surrogate items based on the surrogate item representation, which is a combination of surrogate item and the original item.

Impact on behaviour of the method ItemHandler.release()

The business logic of the method ItemHandler.release() checks whether the item it is working on is the target of a floating reference from any surrogate item. In that case the representations of these surrogate items are updated in the internal cache and the search index.

this is indeed a very critical operation. Imagine one has 20 surrogate items for an original item what would happen. --Natasa 07:31, 31 July 2009 (UTC)
in general i think this kind of denormalization of the data should not be taken lightly. it may be easier to only denormalize in the lucene index.--Robert 08:33, 31 July 2009 (UTC)
would think the same goes also for searching, one may only imagine that there could be 20 surrogate items for an original that need to be re-indexed.--Natasa 09:56, 31 July 2009 (UTC)
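To make the cost discussed in this thread more concrete, here is a minimal sketch of how the set of surrogate items affected by a release could be determined. The ID format is an assumption: the proposal only says that a fixed reference carries a version number suffix, so the sketch assumes IDs of the form "escidoc:100" (floating) and "escidoc:100:2" (fixed). `OriginRef`, `parse_origin` and `surrogates_to_refresh` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OriginRef:
    item_id: str
    version: Optional[int] = None  # None means a floating reference

    @property
    def floating(self):
        return self.version is None


def parse_origin(ref):
    # Assumed format: "prefix:number" is floating, "prefix:number:version" is fixed.
    parts = ref.split(":")
    if len(parts) == 3:
        return OriginRef(":".join(parts[:2]), int(parts[2]))
    return OriginRef(ref)


def surrogates_to_refresh(released_item_id, origins):
    """Return IDs of surrogates whose floating origin is the released item.

    `origins` maps a surrogate item ID to its "origin" reference string.
    """
    affected = []
    for surrogate_id, ref in origins.items():
        origin = parse_origin(ref)
        if origin.floating and origin.item_id == released_item_id:
            affected.append(surrogate_id)
    return sorted(affected)


origins = {
    "escidoc:501": "escidoc:100",    # floating reference to item 100
    "escidoc:502": "escidoc:100:2",  # fixed reference, unaffected by a release
    "escidoc:503": "escidoc:101",    # floating reference to another item
}
affected = surrogates_to_refresh("escidoc:100", origins)
```

Only surrogates with a floating reference to the released item need their cached representation and index entry refreshed; with 20 such surrogates, each release of the original triggers 20 updates.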

Access rights

To access a surrogate item representation, a user needs both access rights on the surrogate item and read access rights on the referenced version of the original item. A user who has privileges to access the surrogate item, but no privileges to access the referenced version of the original item, can still access the sub-resources “properties”, “relations”, “md-records” and “resources” owned by the surrogate item. In this case an attempt to access the whole surrogate item representation results in an AuthorizationException whose message states the reason for the failure and points out the possibility of accessing the sub-resources.

the latest is problematic for processing. Maybe the attempt to access the whole surrogate item representation in this case needs to result in delivery of the surrogate item resources. Inherited metadata records, content streams and components should be represented by reference in the delivered representation. When user accesses these references s/he will get authorization exception for particular subresource of the original item. --Natasa 09:03, 31 July 2009 (UTC)
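The retrieval rule as currently proposed (not the alternative suggested in the comment above) can be summarized in a small decision sketch. `retrieve`, `AuthorizationError` and the boolean flags are hypothetical; the real check would be done by the authorization component of the framework.

```python
OWN_SUB_RESOURCES = ("properties", "relations", "md-records", "resources")


class AuthorizationError(Exception):
    """Stand-in for the AuthorizationException named in the proposal."""


def retrieve(target, can_access_surrogate, can_read_origin):
    """Return the name of the delivered resource or raise AuthorizationError.

    `target` is either "item" (the whole representation) or the name of a
    sub-resource of the surrogate item.
    """
    if not can_access_surrogate:
        raise AuthorizationError("no access to the surrogate item")
    if can_read_origin or target in OWN_SUB_RESOURCES:
        return target
    raise AuthorizationError(
        "referenced version of the original item is not readable; "
        "accessible sub-resources: " + ", ".join(OWN_SUB_RESOURCES)
    )


def is_denied(target, can_access_surrogate, can_read_origin):
    # Helper that shows which calls end in an AuthorizationError.
    try:
        retrieve(target, can_access_surrogate, can_read_origin)
    except AuthorizationError:
        return True
    return False
```

With access to the surrogate but not to the original, `retrieve("md-records", True, False)` succeeds while `retrieve("item", True, False)` raises, which is exactly the case a client would have to handle by falling back to the sub-resources.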

The create method of the ItemHandler throws an AuthorizationException if the creator of a surrogate item has no privileges on the referenced version of the original item. If the creator has lost the privileges to access the original item in the meantime, an attempt to update the surrogate item via the update method of the ItemHandler throws an AuthorizationException whose message points out the possibility of updating the data owned by the surrogate item via the update methods on the sub-resources.

--Natasa 09:03, 31 July 2009 (UTC) again, same issue as with the retrieval, it is very problematic:
  • What if a creator has privileges for original item as a whole, but not for the components? Will in this case create method fail?
  • if creator only wants to modify the metadata record of the surrogate item and does not touch the inherited at all, the update operation will fail asking the user to make the very same operation but only on the metadata record. This is a bit heavy and complex to process in fact, as the client must always first retrieve the complete surrogate item, and then check what are the sub-resource privileges by actually trying to perform the operation (this is very same issue also for retrieval). --Natasa 09:03, 31 July 2009 (UTC)

Reason to reference a released version of the original item.

The requirement that the referenced version of the original item has to be released avoids the confusion that would otherwise arise with floating references to the original item. Without it, in the case of a floating reference the business logic would deliver either the latest release or the latest version of the original item with a surrogate item representation, depending on the user's access rights on the original item. If the creator of a surrogate item has unrestricted read access privileges on the original, the content of the metadata records owned by the surrogate item could be based on the content of the latest version of the original. A user who has restricted access on the original item would get the surrogate item representation based on the latest release of the original item. Such a user may be confused because the surrogate item's own metadata does not match the inherited content.



• The DB Cache will contain the surrogate item properties and all metadata records the user chose to expose in the surrogate item representation. This allows for filter operations within the context of the surrogate objects with metadata from the original object.

I see this also as disadvantage, as with update of original item many records would be changed. DB cache is relational database (at least for now) and this fact should be used. But in this case the problem is that the DB cache contains only the latest version. Maybe would be an idea to first resolve the db-cache and then approach surrogates? --Natasa 10:21, 31 July 2009 (UTC)

• The Search Service retrieves the surrogate item representation and the content from the original object, thus allowing for mixed queries of both surrogate and original metadata records combined with the original full-text within the context of the surrogate items.


  • automatic refresh of inherited surrogate item content is problematic in case of floating references for the following reasons:
    • the owner of the surrogate item may differ from the owner of the original item (implicitly in this case, the owner of the original item is changing the content of many surrogate items)
  • as a surrogate item is same as any other item, why would one prevent creating "surrogate" of a "surrogate"?
    • inheritance problem
  • maybe indeed the concept of surrogate items has to be approached in a more lightweight way
    • mixing of metadata records, cache and indices is a bit tricky part
    • above especially, due to the fact that authorization is dynamically changed from the original item, which makes the surrogate item indeed not a standard item, but something different
  • if surrogate items as derived from original items contain only references to sub-resources of original items (or to the original item itself) one could still provide additional methods such as:
      • create surrogate item (user could specify which parts to copy to the target surrogate item or none)
      • refresh surrogate item(to another version or to the latest version of origin item)
      • de-reference (with or without copying during the dereferencing process)
      • information in the surrogate item representation on the latest modification date, referenced release and latest release of original item
    • additional method could try to attempt to retrieve the "merged" representation of a surrogate item, and in this case the retrieval restriction described above could be useful, but the client would have the choice which method to start with (retrieveItem, or e.g. retrieveSurrogateItem)
  • maybe we should think of this as indeed separate service and not as part of core? --Natasa 10:42, 31 July 2009 (UTC)