ESciDoc Developer Workshop 2008-10-14 15


Date: 14-15.10.2008, start time: 12:00 on 14.10.2008

Location: München (eSciDoc Team meeting)

Participants MPDL:

Participants FIZ: Matthias Razum, Rozita Fridman, Frank Schwichtenberg, Steffen Wagner, André Schenk, Michael Hoppe, Harald Kappus

=Agenda=

Last Video conference outcome

 * available at ESciDoc Developer Workshop 2008-10-07

(pre)Release 1.0 critical issues (14.10)

 * thread-safe code
 * component-level visibility and authorization
 * stability of pre-release 1.0: the core works fine for 1-2 days, then slows down; work can only continue after a restart

Outcome

 * thread-safe code: resolved; it still needs to be tested and will be available in the next development build
 * component-level visibility: not possible for release 1.0
 * only the group handler itself can be delivered, without integration into the cache (due to vacations, integration is not possible before January 2009)
 * for MPDL, the group handler makes no sense without component-level visibility
 * stability: this issue was newly reported by MPDL. MPDL will monitor the system more closely and run further tests to determine in more detail the conditions under which it fails. FIZ is working on improving garbage collection. MPDL should file this issue as a ticket in Bugzilla if it happens again.

Control of named entities/Authority records

MPDL needs to establish authority records for:

 * persons
 * subjects
 * journal names
 * languages

MPDL is developing a separate service for these. Journal names and languages are implemented at present; subjects and persons will follow.

Export service extension

 * export of metadata
 * export of components + license (defined on ctx/ctr level)

Containers and their members (14.10)

 * Discussion on types of containers and types of membership
 * Is the solution with surrogate items straightforward?
 * Would it be good to relate this to the CModel (content modelling) discussion?
 * Adding members
 * E.g., should it be possible to create a Container with members?

Outcome

 * Surrogate items will be used to assign items to a container when they are in a different context
 * to use surrogate items, we need some additional utility methods, such as "create surrogate item" for a given item/container (once the requirements are specified more precisely, we will know whether these methods should be core service methods or be developed at the MPDL intermediate service level)
 * the role of Album editor is missing
 * MPDL will have to specify this role more precisely and FIZ will implement it
 * TODO in future: it would be very useful to have a simple API for creating roles (i.e. associating actions and conditions with roles) and granting a role for a context.
 * FIZ agrees and has already started some planning on this topic, as there are other requirements as well
 * XACML as a whole again seems too complex and not usable by the complete team
 * FIZ has to check how to simplify it (if possible) without losing the existing functionality, e.g. component-level definition of access rights
 * To solve the FACES problems:
 * albums will be created as containers
 * users will not add items as members of the container, but as relations from this container to the items (which are in another context). This resolves the context issue and avoids surrogate items, leaving only references. All these items will then be related to the album only via content relations, not via structural relations.
 * MPDL will make a test and see if this is acceptable for the FACES solution
 * TODO FIZ, MPDL: check how a search for "items matching selected criteria that belong to this album" can be performed
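The FACES album approach above (albums as containers, items linked by content relations instead of structural membership) and the open search question can be illustrated with a minimal Python sketch. All names (`Item`, `Album`, `relate`, `search_album`), the example ids, and the metadata key are hypothetical and not part of the eSciDoc API; the in-memory `repository` dict stands in for whatever search index would actually be queried.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical minimal item: identifier, owning context, metadata fields.
    obj_id: str
    context: str
    metadata: dict


@dataclass
class Album:
    # An album is modelled as a container. Items are NOT added as structural
    # members; the album only stores content relations (target item ids),
    # so each item stays in its own, possibly different, context.
    obj_id: str
    context: str
    related_ids: list = field(default_factory=list)

    def relate(self, item: Item) -> None:
        # Record a content relation once; no surrogate item is created.
        if item.obj_id not in self.related_ids:
            self.related_ids.append(item.obj_id)


def search_album(album: Album, repository: dict, predicate) -> list:
    """Return items related to `album` that satisfy `predicate`.

    `repository` maps object ids to items (a stand-in for a search index).
    Relations whose target cannot be found are skipped.
    """
    result = []
    for obj_id in album.related_ids:
        item = repository.get(obj_id)
        if item is not None and predicate(item):
            result.append(item)
    return result
```

In practice the filter would presumably become a search-index query restricted to the related ids; whether the framework can express that query is exactly what the TODO asks both teams to check.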

CModel and CModel handler

 * Agreement on the first sketch of CModel components (what it should define, how it should be defined, etc.)

Added Colab Page with reworked proposal Content Model Object. Frank 13:56, 13 October 2008 (UTC)

Outcome

 * The discussion is postponed again, but a workshop is scheduled in Karlsruhe to resolve this issue. The workshop will take place on 24-25.11.08 (MPDL: Natasa, Willy, Michael; start on 24.11 at 12:00).

Batch operations and optimistic locking strategy

 * does it make sense to provide a list of pairs (object-id, object last-modification-date) when working with batch operations?
 * if we provide a list of object-ids and a single last-modification-date, the core logic would have to change: instead of checking the last-modification-date for equality, it would check that each object's last-modification-date is less than or equal to the provided date parameter.
 * how would the solution determine a candidate for this date parameter, if not by scanning the last modification dates of all resources, in which case it could just as well send them to the framework? I would keep the batch operations API as close as possible to simply chaining single operations. Robert 14:20, 7 October 2008 (UTC)
 * chaining single operations also requires the change in the last-modification-date validation logic described above --Natasa 10:01, 9 October 2008 (UTC)
 * I think this change would mean departing from the idea that multiple solutions may manipulate the same items. The exact timestamp is actually a sort of password: you can't just send a date you come up with, like "now", and be sure your operation will succeed. Robert 10:38, 9 October 2008 (UTC)
 * locking of items before batch operations?
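The two last-modification-date checks debated above can be contrasted in a minimal Python sketch. `Resource`, `update`, `batch_update`, `exact_match` and `not_after` are hypothetical names, not the eSciDoc API; the sketch assumes one timestamp per object and models a batch as a plain chain of single updates, as Robert suggests.

```python
from dataclasses import dataclass
from datetime import datetime


class OptimisticLockError(Exception):
    """Raised when the caller's date does not pass the lock check."""


@dataclass
class Resource:
    obj_id: str
    last_modified: datetime


def exact_match(stored: datetime, provided: datetime) -> bool:
    # Current rule: the caller must send back the exact
    # last-modification-date it last saw (the "password").
    return stored == provided


def not_after(stored: datetime, provided: datetime) -> bool:
    # Proposed relaxed rule for batches: accept any provided date
    # that is greater than or equal to the stored one.
    return stored <= provided


def update(res: Resource, provided: datetime, check, now: datetime) -> datetime:
    if not check(res.last_modified, provided):
        raise OptimisticLockError(res.obj_id)
    res.last_modified = now  # a successful update bumps the timestamp
    return res.last_modified


def batch_update(pairs, check, now: datetime) -> list:
    # Batch as a plain chain of single updates over (resource, date) pairs.
    # No rollback: if a later pair fails, earlier updates stay applied.
    return [update(res, provided, check, now) for res, provided in pairs]
```

With `exact_match`, a caller that still holds an older date is rejected once another solution has updated the object. With `not_after`, any caller that sends "now" passes, which illustrates Robert's concern that the relaxed rule would let a stale client overwrite changes made by another solution.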

Outcome

 * not discussed

Support for Japanese characters (14.10)

 * full-text searches
 * searches
 * mixture of Japanese and English language in same object

Outcome

 * for full-text indexing there are two options: have the fonts used embedded in the PDF (unlikely) or add more fonts to the text extractor
 * The problem of metadata in different languages (English, Japanese) within the same metadata record was communicated, and FIZ understands it. MPDL points out that this is a very important feature for February 2009.
 * FIZ will check whether fuzzy search is possible for Japanese (including Lucene support for it).

Relation Handler

 * see proposal on Content relations and handler interface description proposal
 * compare to current implementation
 * "tagging" of items

Outcome

 * for simplicity, we will start by using the Item Handler to create new objects that are related to each other. The relation objects will be Tags and/or Relations, but there will not be a separate Relation Handler.
 * The definition of relation ontologies is still an issue and not completely clarified. It needs to be checked again together with the content model discussion.

TOC Handler (14.10)

 * revisiting
 * types of structural relations

Outcome

 * MPDL was stating the following requirements:
 * the TOC object is a specialized item, but it should not be retrievable via the item handler.
 * the TOC object must be related to a container already at creation time (as a TOC object cannot exist without a container, and we do not want to risk "floating" objects with the TOC content model).
 * TOC relations to a container should be distinguished from the standard member-of-container structural relation, because a TOC object can itself be a member of another container (not of the container that it describes).
 * this would probably require a change to the member-list.xsd schema
 * FIZ understood the requirements; they cannot be delivered before 1.0.
 * FIZ and MPDL will work on a concrete definition of the TOC interfaces and logic until the end of the year. It has to be in sync with the CModel concept definition.
 * FIZ questioned whether the TOC Handler should simply be a separate handler or whether method extensions of the Container Handler are needed. MPDL is fine with either solution as long as the above requirements are implemented.
 * FIZ argued that a TOC object can exist separately, with no relation to an item, and that this was the reason for having a TOC Handler
 * MPDL does not see it this way: a TOC object that is not related to a container can simply be an Item (it can be named TOC, but that is just naming in this case).
 * At the moment MPDL can use the current implementation of the TOC Handler, but not as a permanent solution.

PubMan questions

 * component-pid seems to fail: wrong last-modification-date with assignComponentPid
 * JiBX transformation fails on "foreign" objects
 * released items only via search?
 * list of mime-types allowed by the validation service and the infrastructure
 * logout
 * update before submit?
 * user creates an item for/in another OU
 * submit comment action: it is just hard to find the submit button again

Outcome

 * Assignment of the component pid seems to use a wrong last-modification-date. MPDL will check this issue.
 * JiBX transformation: should report an error when no transformation is defined for lists
 * FIZ proposed using dom4j + JiBX
 * PubMan workspaces should also offer the option to filter for "all" in the status drop-down list (NOTE: this was previously not possible because of an inadequate filter solution; now it is time to change it on the MPDL side).
 * the lists of mime-types allowed by the core service and by PubMan should be synchronized (MPDL will check what is possible; in fact, both the core and the validation service can use the same allowed-formats config file).
 * logout from PubMan works properly in the latest version; previously there were problems with this function in the core
 * update before submit: only possible from the edit form. If a user starts "submit" from the edit form, PubMan therefore assumes that she has changed the item, saves the changes first, and then submits.
 * action menus on the submit forms (accept, withdraw, etc.) are not user-friendly (MPDL will already check this for R4)
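The idea of letting the core service and the PubMan validation service read the same allowed-formats config file can be sketched in Python. The file format (one MIME type per line, `#` for comments) and the names `load_allowed_formats` and `is_allowed` are hypothetical assumptions; the actual eSciDoc config format is not described in these minutes.

```python
# Hypothetical content of a shared allowed-formats config file:
# one MIME type per line, '#' starts a comment.
ALLOWED_FORMATS_CONFIG = """
# allowed-formats (hypothetical example content)
application/pdf
text/plain
image/jpeg
"""


def load_allowed_formats(text: str) -> frozenset:
    # Parse the shared config once; both services would call this on the
    # same file, so their lists of allowed mime-types cannot drift apart.
    formats = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            formats.add(entry)
    return frozenset(formats)


def is_allowed(mime_type: str, allowed: frozenset) -> bool:
    # Ignore parameters such as "; charset=utf-8" and compare
    # case-insensitively, since MIME type names are case-insensitive.
    base = mime_type.split(";", 1)[0].strip().lower()
    return base in allowed
```

The design choice is a single source of truth: instead of synchronizing two lists by hand, both validators load the same file.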

Migration of data between stable releases

 * discuss the FIZ approach: Migration of Escidoc Data (FOXMLs)

Outcome

 * FIZ explained the migration concept for data between stable releases.
 * Migrating data from a non-stable (i.e. dev) release to a stable release may cause problems for data migration in the dev release.

Group handler

 * status and issues

Outcome

 * Group handler may be delivered with 1.0 but will not be integrated fully with AA
 * AA support for group-based access levels cannot be delivered before approx. January 2009
 * both teams agreed that focusing on the group handler alone makes no sense until both aspects are ready and implemented
 * MPDL will be able to use/test it only after its delivery (January 2009)

Miscellaneous
Outcomes are written below each topic
 * White space removal in XML representation
 * MPDL sees no issues; the reason for the proposal is a saving of about 40% in memory consumption
 * unify SOAP/REST representation
 * Both teams agreed it is better to do this already in release 1.0
 * (Update 28.10.2008: will be delivered not with 1.0 but with 1.1)
 * eSciDoc metadata records delivery restriction in retrieval of items, containers, components
 * The reason for the requirement to restrict delivery to metadata records labeled "escidoc" was strongly questioned
 * mostly relevant for big item lists (traffic load), but not essential
 * MPDL will have to justify it if necessary
 * Instant eSciDoc
 * Needs to be approached, but not before end of the year
 * Digilib
 * improved integration
 * MPDL has to test
 * can statistics be generated from the data after they are defined (e.g. various retrieval statistics)?
 * statistics can be generated from data, but the raw data are deleted after 6 months
 * need to find solution for raw data longevity
 * JHove in item handler
 * it is still strongly questioned whether the item handler level is the right place for it
 * Event logs (Updates, creation with comments)
 * see also PubMan Logging, View item event log
 * was questioned and discarded for the time being, as FIZ sees no reason to enable end users to provide comments during updates
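The white space removal proposed for the XML representation (first item above) can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under the assumption that whitespace-only text between elements (indentation) is insignificant; it is not the eSciDoc implementation, it would also strip whitespace-only content that is meant to be preserved (e.g. under `xml:space="preserve"`), and it does not reproduce the roughly 40% saving quoted above.

```python
import xml.etree.ElementTree as ET


def strip_insignificant_whitespace(elem: ET.Element) -> ET.Element:
    # Drop whitespace-only text and tail strings, i.e. the indentation
    # between elements. Text containing any non-whitespace is untouched.
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return elem


def compact(xml_text: str) -> str:
    # Parse, strip, and serialize back to a string without a declaration.
    root = ET.fromstring(xml_text)
    return ET.tostring(strip_insignificant_whitespace(root), encoding="unicode")
```

Note that ElementTree rewrites namespace prefixes on output (e.g. to `ns0`), so a production version for namespaced eSciDoc documents would need prefix registration or a different XML library.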

Interoperability (14.10)

 * enabling of other formats for searches/item lists
 * RSS
 * JSON
 * OAI-PMH


 * see also PubMan Web syndication feeds

Ingestion of data

 * METS ingestion
 * EndNote ingestion
 * Duplicate checking during ingestion

Work plan until end of 2008/beginning of 2009

 * the goals of the next major Infrastructure release