MPDL NIMS Developer Meeting 2009-01-26

Preliminary agenda proposal for 26th and 27th of January (visit of Mikiko and Masao@MPDL)

Monday, 26th of January

 * 09:30-10:30: Welcome, Updates
 * Malte, Mikiko
 * Update agenda for the next 2 days
 * Planning the next visit to Japan (March 6-8)

Session 1

 * 10:30-12:00: Working session "User profile"
 * Melanie, Nicole, Mikiko, Masao, Natasa, Rupert, Friederike et al.
 * Researcher profile - Sample input for the researcher portfolio of 6 NIMS scientists
 * WordPress - Building procedures for blogging or other facilities using PubMan
 * Statistics/Visualisation - Discussion on implementation of further visualization and service description (e.g. filtering based on timeline, referrer and IP address)

Outcome session 1

 * By end of February 2009
 * PubMan: MPDL needs to provide the following functionality:
 * PubMan 4.1, at a minimum, delivered on the QA server
 * PubMan links to the Cone researcher page available for Dr. Todoroki and 5-6 other scientists
 * Cone data available for Dr. Todoroki and 5-6 other scientists
 * Cone researcher page implemented and available for the above-mentioned people
 * WordPress blog instance: Setting up one instance for Dr. Todoroki
 * The WordPress blog will be linked from the Cone researcher page (based on the personal URL data of the Person cone entry)


 * Cone - Persons
 * Researcher portfolio shall have language tags. For NIMS, researcher pages can be created in English, Japanese and Chinese
 * More than one personal URL will be available
 * Modification of the data model: date-from and date-to shall be related to the position; therefore "Current position" shall no longer be a separate field but shall be moved to the affiliations data block
 * Personal URLs shall have a free-text description
 * Need to maintain multiple languages for person data
 * Need to maintain properties of person entries (AB: discuss and check the possibility of moving persons to eSciDoc items with RDF metadata; probably another reason to make them eSciDoc items?). NIMS wants a field in the researcher portfolio that shows who last modified the page and when (future versioning)
 * Probably not possible by end of February (to check in addition)


 * Statistics-Visualization
 * IP Tracking:
 * NIMS would like to track complete IPs
 * German law does not allow tracking complete IPs; the last digits of the IP must be hidden
 * MPDL will configure this as a property (for the running PubMan instance, or per Context? To be checked)
 * New statistics
 * New: Accumulate statistics per person identifier
 * Examples: Number of item retrievals of all items of this person, number of file downloads of all items of this person, also other statistics possible to be accumulated for a single person ID
 * New: a session ID needs to be added to the statistical records (so that statistics and an overview of user behaviour within a session can be derived)
 * IP-referrer data
 * MPDL can track these data, but at the moment cannot provide a detailed analysis of the URLs
 * First step: NIMS will get the access.log files and produce some visualizations on their own
 * Please make sure to only send anonymized access log files.--Robert 11:55, 29 January 2009 (UTC)
 * Hm, that would be tricky, as we would have to filter on our own (what comes from NIMS, what comes from us); NIMS wanted non-anonymized data to make their own visualization.--Natasa 12:58, 29 January 2009 (UTC)
 * It's not like we can decide which IP addresses to keep and which to anonymize. I guess the best would be if NIMS got its own PubMan install running.--Robert 13:08, 29 January 2009 (UTC)
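
The IP masking and per-person accumulation discussed above could look roughly like the following. This is only a minimal sketch, not the PubMan statistics implementation: the record fields (person_id, event, session_id, ip), the event names and the masking rule (zeroing the last octet of an IPv4 address, keeping the first 48 bits of an IPv6 address) are assumptions made for illustration.

```python
import ipaddress
from collections import Counter, defaultdict


def mask_ip(ip: str) -> str:
    """Hide the trailing part of an IP address.

    Assumed policy: keep the first 24 bits of IPv4 and the first 48 bits
    of IPv6, and set the remaining bits to zero.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def anonymize(record: dict) -> dict:
    """Return a copy of a statistical record with its IP masked."""
    masked = dict(record)
    masked["ip"] = mask_ip(record["ip"])
    return masked


def accumulate_per_person(records):
    """Count events (e.g. item retrievals, file downloads) per person ID."""
    totals = defaultdict(Counter)
    for rec in records:
        totals[rec["person_id"]][rec["event"]] += 1
    return totals


def sessions_per_person(records):
    """Collect the distinct session IDs seen for each person ID."""
    sessions = defaultdict(set)
    for rec in records:
        sessions[rec["person_id"]].add(rec["session_id"])
    return sessions


if __name__ == "__main__":
    # Hypothetical records; all values are made up for the example.
    sample = [
        {"person_id": "p1", "event": "item_retrieval", "session_id": "s1", "ip": "192.0.2.77"},
        {"person_id": "p1", "event": "file_download", "session_id": "s1", "ip": "192.0.2.77"},
        {"person_id": "p2", "event": "item_retrieval", "session_id": "s2", "ip": "2001:db8:1:2::1"},
    ]
    cleaned = [anonymize(r) for r in sample]
    print(cleaned)
    print(dict(accumulate_per_person(cleaned)))
```

Masking before accumulation keeps the per-person and per-session counts intact, since those rely on the person and session identifiers rather than on the IP.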

Session 2

 * 13:00-14:30: Working session "Work flow"
 * Masao, Mikiko, Nicole, Melanie, Natasa, Rupert et al.
 * Work flow of the NIMS case: Depositor: Easy/Full submission -> Save (pending) -> Release (released)
 * GUI aspects in Release 4.0: NIMS comments on version 4; more guide text for depositors?
 * 14:30-15:30: Video conference (Mikiko, Mehlhorn, Malte)

Outcome session 2

 * After March 2009
 * Current PubMan workflow is too strict
 * Need to introduce another workflow, the Modification workflow, next to the Simple publication workflow
 * Super-Depositor: is able to create, edit, release and modify own items, without an intermediate QA action
 * QA action: optional in this new workflow; done by the Librarian on informal request from Super-Depositors, if any (no need for a special "submitting" workflow in this case)
 * Provide functionality to restore a previous version state (into the last item version)
 * This is also a requirement for other solutions (VIRR), therefore it should probably be provided at the common service level
 * Statistics: Better visualization of other statistics, re-check Piwik adoption
 * GUI Issues
 * PubMan messages (validation): Regardless of the number of messages, they should always be displayed in a message box
 * File upload: currently not clear; all properties shall be shown immediately. They will be greyed out and input will be disabled as long as the file is not uploaded. Also check locator behaviour in this respect.
 * Enum lists: shall have one fixed order (genres, review type, author roles etc.), as at present we cannot establish sorting rules (for different languages)
 * Pull-down list shall contain the Japanese language (in original language name/script)
 * Extra skin for NIMS, button-like visualization of the headers
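
The proposed Modification workflow and the restore function from the outcome above can be illustrated with a small permission model. This is a sketch under stated assumptions, not the eSciDoc or PubMan workflow code: the Item class, the role strings and the function names are hypothetical, and "restore" is interpreted here as copying an earlier version's content into a new latest version.

```python
from dataclasses import dataclass, field

# Actions a Super-Depositor may perform on own items (from the outcome list).
SUPER_DEPOSITOR_ACTIONS = {"create", "edit", "release", "modify"}


@dataclass
class Item:
    owner: str
    state: str = "pending"  # "pending" or "released", as in the NIMS case
    versions: list = field(default_factory=list)  # version contents, oldest first


def may_perform(user: str, role: str, action: str, item: Item) -> bool:
    """Check whether a user with the given role may perform an action."""
    if role == "super-depositor":
        # Only own items, and no intermediate QA step is required.
        return user == item.owner and action in SUPER_DEPOSITOR_ACTIONS
    if role == "librarian":
        # QA is optional and happens on informal request.
        return action == "qa"
    return False


def release(user: str, role: str, item: Item) -> None:
    """Move an item from pending to released if the user is allowed to."""
    if not may_perform(user, role, "release", item):
        raise PermissionError(f"{user} ({role}) may not release this item")
    item.state = "released"


def restore(item: Item, version_index: int) -> None:
    """Restore an earlier version by appending its content as the latest version."""
    item.versions.append(item.versions[version_index])


if __name__ == "__main__":
    item = Item(owner="alice", versions=["draft 1", "draft 2"])
    release("alice", "super-depositor", item)
    restore(item, 0)
    print(item)
```

Appending the restored content instead of deleting later versions keeps the full version history available, which fits the idea of offering restore at a common service level.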

Tuesday, 27th of January
Michael Hoppe @MPDL

Meetings in Room 148

Session 1

 * 10:00-12:30: Working session "Access control"
 * Nicole, Melanie, Natasa, Mikiko, Masao, Michael Hoppe et al.
 * Organizational control is essential; control at the organization level (e.g. the NIMS level) is high priority

Outcome Session 1

 * February 2009
 * Institutional access: MPDL proposes a solution with a privileged viewer
 * During the discussion we agreed that NIMS will have 2 collections: one for strictly private (non-shared) fulltexts, and one where private fulltexts can be shared with users who have privileged view
 * Multilingual collection names
 * Collection names will be created in English and in Japanese (in a single metadata field), so that both names are visible to the NIMS audience in the current implementation
 * Collection descriptions will be created in separate fields. All descriptions should be available on the collection description page
 * Switch the label "closed" to "inactive"


 * From March 2009
 * The institutional access, group handler and collaborator roles are implemented by FIZ, but not fully tested, and no release date is known yet from the FIZ side. This means that MPDL cannot provide this functionality before March 2009
 * GUI modification: GUI team must check on the scenarios for assigning access levels to files
 * Affiliation history
 * Check GUI issues for affiliation history more closely
 * Copyright metadata (enable links to license and license icons in the display for files, images etc.)

Session 2

 * 13:30-15:00: Working session "Searching, Japanese characters et al."
 * Masao, Natasa, Michael Hoppe et al.
 * Searching, indexing Japanese characters
 * Other topics where Michael Hoppe might contribute?

Outcome Session 2

 * Discussed the problems with searching
 * The Japanese search is not yet implemented and not properly tested, as it was not clear what to use for indexing:
 * Main goal: a Java tool that can be easily integrated
 * PdfBox, iText, AdobeViewerBean, JPedal did not work
 * XPDF was discarded earlier; it is now being considered again
 * Problem: it is C code and needs recompilation every time the configuration changes
 * Considered: running it as a command-line tool, i.e. changing the configuration of the indexer to invoke it via the command line
 * MPDL proposed to try using the Java Native Interface to invoke the C code directly from Java classes, thus avoiding the extra command-line invocation
 * FIZ also needs to test the Japanese indexing with the XPDF workaround on a bigger data set to check the performance
 * XPDF extraction will be used only if the standard text extraction from a PDF file fails. Otherwise, only the standard extraction module (PDFBox) will be used.
 * Other indexing issues to be clarified (NIMS will send fulltext examples such as):
 * PDFs that contain images (i.e. OCR-ed text)
 * PDFs that contain images and layers of text
 * PDFs that contain mathematical formulas
 * At present, the search component cannot distinguish between these different PDF contents, as they have the same MIME type. Need to run tests to check what is possible (not before March 2009)
 * Cover page: assign a cover page to items, which will contain some information and will be displayed as a thumbnail in item lists and item views
 * NIMS to send examples; MPDL can introduce it, but not before March 2009
 * Support for indexing other fulltext formats, such as .PS and .PPT, is needed
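
The "XPDF only as fallback" rule from the list above can be sketched as follows. This is an illustration, not the FIZ indexer configuration: the function names are hypothetical, the primary extractor is passed in as a stand-in for the standard module (PDFBox), and treating an exception or empty output as "extraction failed" is an assumption. The fallback calls Xpdf's pdftotext command-line tool, which is assumed to be installed and on the PATH.

```python
import subprocess
from typing import Callable


def pdftotext_extract(path: str) -> str:
    """Extract text by invoking Xpdf's pdftotext on the command line.

    "-enc UTF-8" selects the output encoding and "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def extract_with_fallback(
    path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str] = pdftotext_extract,
) -> str:
    """Use the standard extractor first and fall back only if it fails.

    Assumption: "failure" means the primary extractor raised an exception
    or returned no text at all.
    """
    try:
        text = primary(path)
    except Exception:
        text = ""
    if text.strip():
        return text
    return fallback(path)
```

Passing both extractors as callables keeps the decision logic testable without a PDF library or the pdftotext binary, and the same shape would apply if the fallback were invoked through the Java Native Interface instead of the command line.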


 * Applying JapaneseAnalyzer for tokenizing Japanese fulltext
 * FIZ will send the definition of Japanese characters to NIMS. NIMS will crosscheck it with FIZ.
 * Search results and sorting criteria:
 * Sorting by relevance (score) plus additional sorting-criteria - (FIZ)
 * Search results shall deliver relevance (score) (FIZ)
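
Sorting by relevance plus an additional criterion, as requested from FIZ above, amounts to a two-level sort key. The sketch below only illustrates the idea on plain dictionaries; the hit fields (score, title) and the function name are assumptions and do not reflect the actual search service response.

```python
def sort_hits(hits, secondary_key="title"):
    """Sort hits by descending score, breaking ties with a second field."""
    return sorted(hits, key=lambda hit: (-hit["score"], hit[secondary_key]))


if __name__ == "__main__":
    # Hypothetical hits; each hit carries its relevance score.
    hits = [
        {"title": "Diamond surfaces", "score": 0.8},
        {"title": "Alloy design", "score": 0.8},
        {"title": "STM imaging", "score": 0.9},
    ]
    for hit in sort_hits(hits):
        print(hit["score"], hit["title"])
```

Keeping the score in each hit serves both requirements: the client can display the relevance and can also re-sort locally by another criterion.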

Session 3

 * 15:00-16:00: Working session "Re-use FACES"
 * Mikiko, Masao, Kristina, Ulla, Natasa, Bastien, Rupert, Friederike et al.
 * Metadata schema for experimental files, such as STM, for demonstration
 * Discussion on the metadata spec for pictures provided by a NIMS researcher
 * Discussion of concrete plans for launching such an application (first target will be Diamond pictures)

Outcome Session 3

 * see outcome at Faces_Diamonds

Additional agenda (Masao)
(high priority; by the end of February)
 * Introducing Japanese resource bundles (with dev)
 * => agreed to use trunk directly.--Masao 16:18, 28 January 2009 (UTC)
 * Schedule for WordPress & Researcher portfolio
 * tbd.
 * Obtaining the raw access_log file for demonstrating an extension of the Statistics service
 * => via scp (next week) / getting files directly.--Masao 16:18, 28 January 2009 (UTC)
 * Data i18n on OrgUnit&Context (with SvM?)
 * Examples of PDF files that contain images and overlaid text in Japanese (NIMS)

(low priority; maybe after April)
 * Setting up a development environment at NIMS (with dev)
 * dev-server at NIMS
 * Shibboleth/LDAP integration
 * testing how-to
 * New fetching function with CrossRef (with dev?)
 * JuNii2 mapping (with SvM?)


 * Plan for next meeting (Video conference)