ESciDoc Item List

Revision as of 11:41, 1 October 2007

This page is the result of the eSciDoc workshop held in September 2007. It serves to collect ideas, questions and constraints discovered while evaluating the filtering of items based on a "mini item" information object. Please feel free to use the page

Basic Idea

Base the filter methods on a "mini item" = a specific data stream with minimal metadata. This metadata would be generated in a "content model" specific way whenever an item is created or updated.

The idea of "list metadata" is derived from the necessity to generate Dublin Core metadata for each Fedora object (and therefore for each eSciDoc object). This generated subset of the solution specific metadata may be used for filtering/searching eSciDoc objects. The entries in the Dublin Core metadata datastream named "DC" of a Fedora object are automatically written to the resource index. The assumption is that the core metadata entries of an object, which are written to its DC datastream, are sufficient for searching/filtering.

If this leads to DC metadata being used as a container for search specific properties, this approach has of course failed.
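As a rough illustration of the idea above, the following sketch derives an unqualified DC record from solution specific metadata. The field names in `FIELD_TO_DC` and the function `build_dc` are assumptions made for this example only; they are not part of eSciDoc or Fedora. Only the two namespace URIs are the standard `oai_dc` and `dc` namespaces.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from solution specific fields to unqualified DC elements.
FIELD_TO_DC = {
    "title": "title",
    "creators": "creator",
    "date_issued": "date",
    "genre": "type",
}


def build_dc(item_metadata):
    """Return an oai_dc:dc element holding only the mapped fields."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, dc_name in FIELD_TO_DC.items():
        value = item_metadata.get(field)
        if value is None:
            continue  # field absent in this item
        values = value if isinstance(value, list) else [value]
        for v in values:
            child = ET.SubElement(root, f"{{{DC}}}{dc_name}")
            child.text = str(v)
    return root


example = build_dc({
    "title": "A sample item",
    "creators": ["Author One", "Author Two"],
    "abstract": "Not mapped, so it does not reach the DC record",
})
print(ET.tostring(example, encoding="unicode"))
```

Anything without an entry in the mapping (here `abstract`) is silently dropped, which is exactly the limitation the assumption above has to live with.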

Prototype

Boundaries for prototype by FIZ:

with AA
without paging
list format RSS in RDF

Requirements (delivered by MPDL):

item list should be sortable by:
item list currently includes the following metadata:
- list here
transformation rules are available:

Format / Representation

OpenSearch

One idea is to use OpenSearch to select items and represent a list of selected items.

OpenSearch defines a format to describe search engines and four search metadata elements ("totalResults", "startIndex", "itemsPerPage" and "Query") to be included in XML response documents.

An OpenSearch description document describes a search interface in the form of a URL. The suggested kinds of search requests are HTTP GET and HTTP POST. For this purpose a description document contains a request template. A request template contains parameters that are replaced by specific values before a request is executed. OpenSearch defines some basic search parameters and allows the use of custom parameters. A description document as well as an OpenSearch response may contain detailed descriptions of the capabilities of the search engine (e.g. which custom parameters may be used).
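The template mechanism described above can be sketched as follows. This is a minimal illustration, not a complete OpenSearch client: it assumes the OpenSearch 1.1 template syntax (`{name}` for required and `{name?}` for optional parameters). The host `example.org` and the custom parameter `escidoc:created-by` are hypothetical.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the trailing "?" marks an optional parameter.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def fill_template(template, values):
    """Replace template parameters with URL-encoded values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters without a value become empty
        raise KeyError(f"missing required parameter: {name}")
    return _PARAM.sub(substitute, template)


template = ("http://example.org/search?q={searchTerms}"
            "&start={startIndex?}&creator={escidoc:created-by?}")
url = fill_template(template, {"searchTerms": "solar physics",
                               "escidoc:created-by": "frank"})
print(url)
```

A custom parameter such as `escidoc:created-by` would have to be declared by the solution that publishes the description document, which is one reason the questions below ask whether such parameters are solution specific.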

If OpenSearch should be used for eSciDoc - e.g. between PubMan and the ObjectManager - some questions arise:

Where should a description document be located?
Should a search include custom parameters like 'created-by', 'context' etc. and aren't such parameters solution specific?
Is OpenSearch a solution task at all? The Framework offers search capabilities with SearchAndBrowse and the ObjectManager's filter methods, and a solution may describe the search and generate customized responses in a solution specific way (e.g. to generate a PubMan-Search-Firefox-Plugin).

OpenSearch defines elements that enable autodiscovery of search engines. This implies the use of generic search clients. Firefox is able to automatically discover search engines through the pertinent elements in websites and to generate a search field in the browser's toolbar, including autocompletion if the search interface provides it. The situation between PubMan (or other solutions) and "the framework" does not fit such features. The filter methods of the ObjectManager are described as part of the API documentation and are used by solution developers. Autodiscovery seems to be something that a solution may offer to its users.

As an example, have a look at http://solarphysics.livingreviews.org/refdb/search which links to its OpenSearch description at http://solarphysics.livingreviews.org/refdb/search?format=osd and returns search results such as http://solarphysics.livingreviews.org/refdb/search?journal=lrsp&a=Abart%2C+R.&tM=substring&js=lrsp&s=yearDesc&l=10&_action_search=Search&format=rss

Note: OpenSearch does not lead to a specific result format. To talk about OpenSearch is to talk about search engine discovery and generic query generation etc. Frank 13:37, 1 October 2007 (CEST)

RSS 1.0

RSS in version 1.0 supports extension with custom-made elements because it is based on RDF. Conversely, it would be possible to enrich an RDF response from the resource index (triplestore) with RSS 1.0 predicates. RSS 1.0 may be interpreted both in a semantic-web context and in a newsfeed context.
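To make the extension point concrete, here is a small sketch of an RSS 1.0 item list that carries one custom element per item. The RDF and RSS 1.0 namespace URIs are the standard ones. The `ex` namespace, its `context` element and the function `build_item_list` are assumptions for illustration and do not reflect an agreed eSciDoc format.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
EX = "http://example.org/escidoc-list#"  # hypothetical extension namespace

for _prefix, _uri in (("rdf", RDF), ("rss", RSS), ("ex", EX)):
    ET.register_namespace(_prefix, _uri)


def build_item_list(channel_uri, title, items):
    """Build an rdf:RDF element with one channel and one rss:item per entry."""
    root = ET.Element(f"{{{RDF}}}RDF")
    channel = ET.SubElement(root, f"{{{RSS}}}channel",
                            {f"{{{RDF}}}about": channel_uri})
    ET.SubElement(channel, f"{{{RSS}}}title").text = title
    ET.SubElement(channel, f"{{{RSS}}}link").text = channel_uri
    ET.SubElement(channel, f"{{{RSS}}}description").text = title
    # The channel lists its items in an rdf:Seq to preserve their order.
    seq = ET.SubElement(ET.SubElement(channel, f"{{{RSS}}}items"),
                        f"{{{RDF}}}Seq")
    for item in items:
        ET.SubElement(seq, f"{{{RDF}}}li", {f"{{{RDF}}}resource": item["uri"]})
        node = ET.SubElement(root, f"{{{RSS}}}item",
                             {f"{{{RDF}}}about": item["uri"]})
        ET.SubElement(node, f"{{{RSS}}}title").text = item["title"]
        ET.SubElement(node, f"{{{RSS}}}link").text = item["uri"]
        # Custom element from the hypothetical extension namespace.
        ET.SubElement(node, f"{{{EX}}}context").text = item["context"]
    return root


feed = build_item_list(
    "http://example.org/list", "Item list",
    [{"uri": "http://example.org/item/1", "title": "One", "context": "ctx-1"}],
)
print(ET.tostring(feed, encoding="unicode"))
```

A newsfeed reader would ignore the `ex:context` element, while an RDF-aware client could read it as an additional statement about the item.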

See EScidoc_Container_Toc.

Questions & Discussion

How should the changes be versioned if a change in the transformation requires a change of all items of a specific content type?

The proposed solution can be treated as a "temporary workaround" and as a test for eventually placing all metadata into proper storage. At present, the "list metadata" are part of the Fedora system datastream (DC metadata) that is "pushed" into the relational database. Note: only non-qualified DC metadata is allowed. --Natasa 12:32, 26 September 2007 (CEST)

To Do: think of "moving" the "list metadata" (or all descriptive metadata) into an RDF storage (this will take substantial rework, but will also give substantial benefits for interoperability, discovery and, of course, performance etc.) --Natasa 12:32, 26 September 2007 (CEST)

The reason for using the DC datastream is that its entries are automatically written to the triplestore, isn't it? It would be nice to write all entries from the solution specific metadata to the triplestore, but then they must be transferred into a flat, well-known structure. The DC mapping is one approach to do that. -- Frank

The entries are automatically extracted from the DC datastream of a Fedora object and inserted into the Fedora resource index. And yes, that is actually the problem with the Fedora DC support. Our metadata set is much richer, so if we only extract DC we have basic (not qualified) DC metadata. But our metadata schemas are not flat. --Natasa 10:36, 28 September 2007 (CEST)

Trying to summarise: It would be nice to use DC metadata to enrich list views because it contains the core metadata of a resource and is written to the resource index. Unfortunately not all entries from mpdl-escidoc-metadata can be mapped to DC. Another idea is to write the mpdl-escidoc-metadata to the resource index, however we do that.

Question: Is it possible to bring mpdl-escidoc-metadata to a flat structure in order to write the entries to the resource index or is that nearly the same problem as to map it to DC? Frank 13:30, 1 October 2007 (CEST)
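One way to think about this question is to flatten a nested record into path/value pairs, as in the sketch below. The nested structure and the dotted path names are invented for illustration and do not reproduce the actual mpdl-escidoc-metadata schema.

```python
def flatten(metadata, prefix=""):
    """Flatten nested dicts/lists into a list of (path, value) pairs."""
    pairs = []
    if isinstance(metadata, dict):
        for key, value in metadata.items():
            path = f"{prefix}.{key}" if prefix else key
            pairs.extend(flatten(value, path))
    elif isinstance(metadata, list):
        # Repeated elements share the same path; their grouping is lost.
        for value in metadata:
            pairs.extend(flatten(value, prefix))
    else:
        pairs.append((prefix, metadata))
    return pairs


# Hypothetical nested record, not the real mpdl-escidoc-metadata schema.
nested = {
    "title": "T",
    "creator": [
        {"name": "A", "organization": {"name": "MPDL"}},
        {"name": "B"},
    ],
}
flat = flatten(nested)
for path, value in flat:
    print(path, "=", value)
```

The flat pairs could be written to a resource index as predicate/value entries, but they no longer say which organization belongs to which creator. Preserving that grouping would need intermediate nodes or compound predicates, so flattening avoids the fixed DC element set but not the structural problem.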

To check: Are standard DC data sufficient to create the information required to display item lists, i.e. are all data required for sorting and displaying available in DC core?
- If not: To change the functional specification or to "disemploy" the standard?
  - or think of something else like keeping the standard but moving to metadata RDF storage? --Natasa 10:36, 28 September 2007 (CEST)

Should a change of the "mini item" data stream
1. create a new logical version?
2. create no new logical version?

To consider for a start: If the metadata presented in the "list-metadata" are not sufficient and a new transformation is needed, this does not (from the aspect of content versioning) change the version of the content item - it just provides a new "enriched view" on the content item. In this case, if really necessary, a utility should be developed that modifies the content items (but does not create new versions of them), as the "list-metadata" are redundant information derived from the actual metadata of the item. --Natasa 12:32, 26 September 2007 (CEST)