ESciDoc Developer Workshop 2007-12-19

Date: December 19-20, 2007
Location: FIZ, Karlsruhe
Participants (MPDL): Michael, Robert, Wilhelm, Natasa
Start time: 12.00 19.12.2007
End time: 17.00 20.12.2007

Update of last workshop results

 * Update of last workshop, see also ESciDoc Workshop 2007-09-24, ESciDoc Developer Workshop 2007-08-07
 * eSciDoc Brainstorming results
 * TOC in Container: format, restrictions, occurrence (see also ESciDoc_Container_Toc)
 * Authority Files, see also (Control_of_Named_Entities, Talk:Service_for_Control_of_Named_Entities)
 * Workflow Manager, see also ESciDoc_Workflows
 * Item lists - prototype, results, see also ESciDoc_Item_List
 * Item lists prototype, mandatory metadata item properties to be in the minimal item list for publication items are defined at ESciDoc_Item_List
 * Item lists - sorting order, see requirements for Search Topic below.
 * Open search - check the possibility for enabling open search as described in ESciDoc_Item_List

XML Schemas
Consolidation of schemas for framework objects: the common elements in the properties of framework resources should really be common, i.e. in the same namespace across resources; no more item:created-by, ou:created-by, ... but only escidoc:created-by. I also have a feeling that if this does not happen soon, it never will. -Robert 11:41, 14 November 2007 (CET)

outcome
 * create a namespace and schema for the common properties element of all resources --Natasa 13:07, 13 December 2007 (CET)
 * 1) MPDL agrees with this change.
 * 2) will be done with V1.0 eSciDoc Core
 * 3) MPDL wants a schedule; this should be done in January. Update: according to FIZ, the update will be delivered on 15.02.2008 --Natasa 13:25, 13 February 2008 (CET)
 * 4) communicated via "Verteilerliste"

Release procedures and Data migration
outcome
 * Release procedures (releases, release tests)
 * decide on strategy for data migration (when XSDs are changed)
 * migrate each object on retrieve
 * migrate all objects of the repository at the time the new release is installed via XSLT
 * create a transformation service for the migration of objects (on demand or for the complete repository)
 * 1) needed
 * 2) MPDL Pilot repository some 1000
 * 3) no version change
 * 4) MPDL: don't use "migrate each object on retrieve"
 * 5) there might be a need for all three methods
 * 6) start with "migrate all objects in one step"
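
The "migrate all objects in one step" strategy can be sketched as follows. This is only a minimal illustration, not the agreed implementation: the workshop mentions XSLT, whereas the sketch uses Python's standard xml.etree.ElementTree as a stand-in. The namespace URIs, the list of common elements and the function names are all assumptions made for the example. As sample migration it uses the schema change discussed above (moving created-by into one common namespace).

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the real eSciDoc schema locations are not given in these notes.
OLD_NAMESPACES = {
    "http://example.org/schema/item/0.1",
    "http://example.org/schema/organizational-unit/0.1",
}
COMMON_NS = "http://example.org/schema/common-properties/1.0"
# Hypothetical list of elements that count as "common properties".
COMMON_ELEMENTS = {"created-by", "creation-date"}


def migrate_object(xml_text: str) -> str:
    """Move common property elements from resource-specific namespaces into the common one."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if not element.tag.startswith("{"):
            continue  # element without a namespace, nothing to do
        namespace, local_name = element.tag[1:].split("}", 1)
        if namespace in OLD_NAMESPACES and local_name in COMMON_ELEMENTS:
            element.tag = f"{{{COMMON_NS}}}{local_name}"
    return ET.tostring(root, encoding="unicode")


def migrate_repository(objects: dict[str, str]) -> dict[str, str]:
    """Migrate every stored object in one pass, e.g. when a new release is installed."""
    return {object_id: migrate_object(xml_text) for object_id, xml_text in objects.items()}
```

The same migrate_object function could also back the other two strategies (on retrieve, or as an on-demand transformation service), which is one reason to keep the per-object step separate from the repository loop.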

Authorization
outcome
 * Authorization, see also ESciDoc_Authorization_Authentication and related pages
 * concept how to map solution actions to core services actions
 * how to enable trust if the solution already authorized the action of a user
 * how to easily create new policies/modify existing policies
 * 1) FIZ will start with a test, using triple store for AA
 * 2) what are the consequences for "defining Rules"?

No input provided until 13.02.2008--Natasa 13:28, 13 February 2008 (CET)

OrganizationalUnits handler
Note: clarifications moved to Talk:PubMan_Func_Spec_Organizational_Unit_Management to avoid overloading of the current agenda page and as proposed by Harald. --Natasa 16:15, 10 December 2007 (CET) See also: PubMan_Organizational_Unit_Management, PubMan_Func_Spec_Organizational_Unit_Management, Talk:PubMan_Collections_and_Organizational_Units, Requirements for R3

outcome
 * 1) Changes in part, may be Metadata, will be free to the application
 * 2) there will be a default schema
 * 3) mapping to properties
 * 4) FIZ will set up a proposal (see First proposal agreed with Mfr/Nbu/FSch--Natasa 13:30, 13 February 2008 (CET))

Context handler
see description on page Context handler talks
 * additional methods, improvement of existing methods (admin descriptors, possibility to add new context types e.g. CitationStyles, Validation) --Natasa 17:08, 16 November 2007 (CET) Clarified partly: context types are not limited.
 * revisit Admin descriptor (what is relevant and what not in the current admin descriptor structure)
 * member lists, see also Talk:ESciDoc_Services_ContextHandler

outcome
 * 1) AdminDescriptor
 * 2) not mandatory
 * 3) AdminDescriptor is part of the object
 * 4) multiple AdminDescriptors should be possible
 * 5) Content will be handled by the application only not by the core-service
 * 6) no mapping from AdminDescriptor to properties is needed right now
 * 7) stick with   

PID
Proposal to move the topic to the next eSciDoc workshop Jan/Feb 2008--Natasa 10:55, 7 December 2007 (CET)
 * In my opinion PID handling is very important, especially because we have an (inconsistent) implementation. Consequently, I would propose to remove the PID assignment methods until we have an implementation of an agreed concept. Frank 12:09, 7 December 2007 (CET)


 * finalize the concept for the PID implementation in eSciDoc
 * There is a talk page, Talk:PubMan PID, related to that topic, but no concept.
 * A concept for PIDs already exists and was agreed between us previously. This concept paper was discussed before. On the Talk page there is also a proposal for how to deal with PIDs, so please take a look at the discussion about Issue 316 and Object PIDs and Version PIDs

outcome (decided)

objectPID
 * 1) no released items without an objectPID
 * 2) the release method will check if there is an objectPID
 * 3) create and update allow an objectPID to be assigned before release
 * 4) release will assign an objectPID if none was assigned before; a URL has to be delivered (will be implemented later)
 * 5) assign an objectPID at any time before release
 * 6) the user who can change the object is allowed to update the objectPID before release
 * 7) is there a need to tag whether the PID was assigned by the assign method or not?

VersionPID
 * 1) assign a VersionPID at any time after an objectPID is assigned and before the release
 * 2) the release will check if there is a versionPID for that version; if not, the release fails.
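
The decided rules can be read as a small set of checks around the release method. The following is a minimal sketch under that reading, not the eSciDoc-Core API: the ItemVersion model, the function names and the mint_object_pid callback are hypothetical and only illustrate the order of the checks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ReleaseError(Exception):
    """Raised when a PID rule is violated."""


@dataclass
class ItemVersion:
    # Hypothetical minimal model; field names are illustrative only.
    object_pid: Optional[str] = None
    version_pid: Optional[str] = None
    released: bool = False


def assign_object_pid(item: ItemVersion, pid: str) -> None:
    """objectPID may be assigned or updated at any time before release."""
    if item.released:
        raise ReleaseError("objectPID can only be assigned before release")
    item.object_pid = pid


def assign_version_pid(item: ItemVersion, pid: str) -> None:
    """versionPID may be assigned after an objectPID exists and before release."""
    if item.released:
        raise ReleaseError("versionPID can only be assigned before release")
    if item.object_pid is None:
        raise ReleaseError("a versionPID requires an objectPID")
    item.version_pid = pid


def release(item: ItemVersion, mint_object_pid: Callable[[], str]) -> None:
    """Release a version: ensure an objectPID exists, then require a versionPID."""
    if item.object_pid is None:
        # Rule 4: release assigns an objectPID if none was assigned before.
        item.object_pid = mint_object_pid()
    if item.version_pid is None:
        # versionPID rule 2: without a versionPID the release fails.
        raise ReleaseError("release fails: no versionPID for this version")
    item.released = True
```

Taken literally, the recorded rules mean that a version reaching release without an objectPID also has no versionPID, so in the sketch such a release mints the objectPID and then fails. How versionPIDs are assigned is among the open questions listed below.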

There are still open questions to VersionPIDs:
 * How are versionPIDs assigned?
 * PID for component (binary file) is needed (for PubMan)
 * component-PID without versioning
 * clarify if HANDLE can handle subresources?
 * VersionPIDs are only assigned for released versions
 * VersionPID will be assigned by eSciDoc-Core in release-method of a version ??
 * How will the URL for a VersionPID be set?

Steffen will set up a document

Statistics

 * Status of Statistics
 * possible to gather the current statistics also with additional info on logged-in (anonymous) users?
 * no need for concept. http://www.escidoc-project.de/issueManagement/show_bug.cgi?id=347 in Bugzilla is fixed and will not be reopened
 * the requirement is to "rework" the definitions and the reports so that they provide statistics both for all users and for registered (i.e. logged-in) users only

outcome
 * 1) only one requirement: user-Id is needed, (no additional data of the user),
 * 2) set up a new report definition (Michael) Issue 347 comment6, Nr1

Searching service

 * mixture of language-specific metadata indexes in one search database
 * this requirement needs to be discussed together. The main issue here is that we have search examples such as the following:
 * search all items where the title in German is "wissenschaft" and the abstract in English contains "science information" - the results should be found with proper stemming options - which means that an exact "phrase" search through the current escidoc_all database will not return a correct result set.

outcome
 * 1) introduce Fuzzy-search in language-combined index, do some stopword removal


 * sorting order (p and P are sorted together, i.e. case-insensitively; German umlauts seem to be handled as if they were ue, oe, ae; characters like [ are listed before A/a).
 * this requirement simply states that the sorting order is case insensitive and that German umlauts have to be treated as follows when indexing for sorting: ä/Ä as ae, ö/Ö as oe, etc.

outcome
 * 1) MPDL will support FIZ with the rules (set-up and put in colab, see also http://colab.mpdl.mpg.de/mediawiki/images/b/bf/PubItemVOComparator.java --Natasa 13:41, 13 February 2008 (CET))
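
The sorting rule described above can be illustrated with a small sort key. This is a sketch only and not the PubItemVOComparator referenced above: the function name is made up, and the handling of ß (as ss) is an assumption, since the notes only say "etc." after ä and ö.

```python
import unicodedata

# Umlaut replacements as stated in the requirement; "ß" -> "ss" is an assumption.
REPLACEMENTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def sort_key(text: str) -> str:
    """Case-insensitive sort key that treats German umlauts as ae/oe/ue."""
    # Normalize so that decomposed umlauts (a + combining diaeresis) are matched as well.
    lowered = unicodedata.normalize("NFC", text).lower()
    return "".join(REPLACEMENTS.get(char, char) for char in lowered)


titles = ["Zebra", "pear", "Äpfel", "Paper", "[draft]", "apple", "Ozean", "Öl"]
sorted_titles = sorted(titles, key=sort_key)
```

Because the key is lowercased rather than uppercased, a character such as [ compares lower than every letter and is therefore listed before A/a, which matches the behaviour observed in the requirement.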


 * 1) MPDL: is fine with that
 * FIZ: result should be in the same format as used in lists


 * search results - how to get items and containers within same search result (after better checking the schemas this is possible as complete item.xsd or container.xsd are in the search result --Natasa 15:12, 17 December 2007 (CET))


 * search indexes development
 * administrative searches (should index properties of an item, should index also non-released items)
 * end user searches (should not index properties of an item, should index released items only)
 * special requirements for identifiers (see Indexing requirements for "search-by-identifier" search, Indexing requirements for identifiers in "any-field" search)
 * added after the workshop (--Natasa 16:46, 23 January 2008 (CET)): Indexes for components, see also Bugzilla issue 424

outcome
 * 1) not all metadata needs to be indexed (see Indexing requirements for identifiers in "any-field" search)
 * 2) search for "ISSN 23424-2342-3" should be possible (any-identifier index)
 * 3) MPDL and FIZ will re-think this issue
 * 4) more flexible would be: use specific metadata search fields in the query instead of creating specific index fields for combinations of metadata fields
 * 5) sorting of combined fields is needed
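
One possible reading of the "any-identifier" index from point 2 is sketched below: each identifier is indexed both under its bare value and under "type value", so that a query such as "ISSN 23424-2342-3" matches. The data layout and function names are assumptions for illustration; the notes do not say how the real index is built.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Lowercase the term and collapse whitespace so that queries match index entries."""
    return " ".join(term.lower().split())


def build_any_identifier_index(items: dict[str, list[tuple[str, str]]]) -> dict[str, set[str]]:
    """Build the index from a mapping of item id -> list of (identifier type, value) pairs."""
    index: dict[str, set[str]] = defaultdict(set)
    for item_id, identifiers in items.items():
        for id_type, value in identifiers:
            index[normalize(value)].add(item_id)                 # bare value
            index[normalize(f"{id_type} {value}")].add(item_id)  # "ISSN <value>"
    return index


def search_by_identifier(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the ids of all items carrying the queried identifier."""
    return sorted(index.get(normalize(query), set()))
```

A query then works with or without the identifier type, for example search_by_identifier(index, "ISSN 23424-2342-3") or search_by_identifier(index, "23424-2342-3").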

Content Model

 * Content Models, in progress (a probably not very successful attempt to make it clearer, lacking diagrams for better readability and an extra revision of terminology --Natasa 18:33, 13 December 2007 (CET))

not discussed
 * 1) some requirements are needed soon

Item handler
outcome
 * extend filters to "last-modified-since", "context", "related-to" (not only with id of the item, but also with a set of relation types)
 * 1) FIZ will check if filters to "last-modified-since" can be handled by tripleStore
 * 2) filter "related-to" will have to be clarified
 * 3) filter "context" can be added
 * 4) paging is important
 * 5) sorting is important, more than one sort field should be possible
 * 6) MPDL will check if the requirements can be reduced, since the existing requirements can't be fulfilled with the "short-list" as done today
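
How the requested filters, paging and multi-field sorting could combine is sketched below. This is an in-memory illustration under assumed names (filter_items, plain dicts with "context", "title" and "last_modified" keys), not the Item handler interface. It also assumes that "last-modified-since" includes the given timestamp itself.

```python
from datetime import datetime
from operator import itemgetter


def filter_items(items, context=None, last_modified_since=None,
                 sort_fields=(("title", False),), offset=0, limit=10):
    """Filter items, sort by one or more (field, descending) pairs, then return one page."""
    selected = [
        item for item in items
        if (context is None or item["context"] == context)
        and (last_modified_since is None or item["last_modified"] >= last_modified_since)
    ]
    # list.sort is stable, so sorting by the least significant field first
    # yields a correct multi-field ordering.
    for field, descending in reversed(sort_fields):
        selected.sort(key=itemgetter(field), reverse=descending)
    return selected[offset:offset + limit]


items = [
    {"id": "1", "context": "c1", "title": "B", "last_modified": datetime(2007, 12, 1)},
    {"id": "2", "context": "c1", "title": "A", "last_modified": datetime(2007, 12, 19)},
    {"id": "3", "context": "c2", "title": "A", "last_modified": datetime(2007, 12, 20)},
    {"id": "4", "context": "c1", "title": "A", "last_modified": datetime(2007, 12, 20)},
]
page = filter_items(items, context="c1",
                    sort_fields=(("title", False), ("last_modified", True)))
```

Paging is applied after filtering and sorting, so offset and limit always refer to positions in the fully ordered result. The "related-to" filter is left out because it still has to be clarified.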


 * separation Filter/Search - Frank
 * to clarify this topic better: to check the possibility that current filter methods are actually available via standard search interface i.e. the possibility to have separate indexes for all items to enable searching by item properties (or item-lists), having limit/offset and order-by clause--Natasa 18:44, 13 December 2007 (CET)

There should be a description for each new/changed method, its filters and their meanings see page Item Handler

outcome
 * 1) FIZ re-think "filtered-Lists" and "Search" again to find a "better" solution
 * 2) see PubItemVOComparator.java for the metadata comparator in PubMan.

Container handler

 * define usage of Admin descriptor in Container, see also Admin descriptor of Context
 * container member list enrichment, see also Talk:ESciDoc_Services_ContextHandler
 * List of members?

outcome
 * 1) no AdminDescriptor needed any more, will be removed
 * 2) replace in retrieve of a container or context the existing "member-list" with the "short-list" check with Talk above

New service Ingestion
outcome
 * posting of named graphs of objects, pushing/pulling functionality?
 * see initial requirements and use case specification at Upload file in structured format
 * 1) eSciDoc-Core will provide a bulk-ingest service for objects in eSciDoc format
 * 2) FIZ will describe this new service
 * 3) the service needs to be configurable for each ingest
 * 4) MPDL will deliver additional information to the configure task

not discussed
 * for more advanced ingestion requirements that will come up in future see Named graph posting example

Relations

 * Providing content relations together with the item or better standalone and use addContentRelations, removeContentRelations methods.
 * create, retrieve, modify, update relation objects, register new relation types
 * see concept on page Content relations

outcome
 * 1) the existing relation handling is tied to the Fedora way of handling relations, so we should look for another solution (not related to Fedora)

or
 * 1) MPDL will check the existing relation concept and update it
 * 2) FIZ will check if and how the new concept can be realized

or
 * 1) FIZ checks the first relation implementation if it can be enriched

Service repository

 * Service repository (for all, not only for core services)
 * Service repository of eSciDoc
 * Service repositories in general

eSciDoc Infrastructure Road map

 * Release 1.0
 * Release 1.1