Talk:EDoc to PubMan migration/MPIPL

From MPDLMediaWiki

Answered Questions

  • eDoc users won't be migrated to PubMan
  • Karin will be the owner of the migrated eDoc items
  • mapping of eDoc Collections to PubMan Org. Units see: EDoc_to_PubMan_migration/MPIPL#Migration_procedure
  • "special usage" of fields have been checked and have been taken into account
  • only released eDoc records will be moved to PubMan
  • duplicate problem was solved by MPIPL, they deleted all duplicate records on eDoc

Questions regarding OU for MPIPL

We are doubting whether we made the right decision in February regarding the org units.

Background: Nicole presented the ppt with 3 options:

  • Projects and Departments listed as separate org units without any relations between them
  • Departments as org units, Project information in metadata
  • Projects as subcategories of departments incorporated in one org unit

We chose Option 3!

The advantages, as we saw them then, were:

  • predefined structure (controlled list)
  • relations between departments and projects are visible

BUT, real data don't fit into these rigid rules.

It's already difficult to establish a reliable current structure of which projects belong to which groups, let alone to maintain this scheme in the future. Some units don't belong to a department at all, e.g. Junior Research Groups, which are considered projects but have no institute department above them.

  • We are seriously thinking of switching to Option 1!

Advantages of Option 1

  • department and project can be chosen individually
  • no need for maintaining matrix stating which project belongs to which department
  • possibility to host projects under the institute's level
  • no need to duplicate projects under department
  • easier for migration: project = eDoc collection (so there is a 1:1 mapping), and authors have one department (which can also be mapped 1:1). In our present migration procedure I would have to assign up to 8 org units for some users and we would have to delete 7 after migrating.
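A minimal sketch of what the two 1:1 mappings under Option 1 could look like. All collection, project, department and author names below are hypothetical placeholders; the real values would come from the eDoc export and the PubMan org unit list.

```python
# Hypothetical lookup tables; none of these names come from real eDoc or PubMan data.
COLLECTION_TO_PROJECT = {
    "Project A (eDoc collection)": "Project A",
    "Project B (eDoc collection)": "Project B",
}
AUTHOR_TO_DEPARTMENT = {
    "Author X": "Department 1",
    "Author Y": "Department 2",
}


def org_units_option1(collection, author):
    """Return the (department, project) pair for one migrated record.

    Both lookups are 1:1, so at most two org units are assigned and nothing
    has to be deleted after migrating.
    """
    department = AUTHOR_TO_DEPARTMENT.get(author)    # None: no department known
    project = COLLECTION_TO_PROJECT.get(collection)  # None: unmapped collection
    return department, project


print(org_units_option1("Project A (eDoc collection)", "Author X"))
```

A project hosted directly under the institute (no department) simply yields `None` for the department instead of forcing a duplicate entry.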

Disadvantages: two org units have to be filled in by the user, with the possibility of forgetting one or making implausible choices (must be checked with workflow)


  • What does this mean in terms of metadata? I guess the XML output will change? Are there different fields for org units as departments and org units as projects?
  • How can a selection on department or project be made? Or do we simply call all departments A Group, B Group, etc. and all projects A Project, B Project, etc.?
  • Are there any future OrgUnit developments regarding metadata, structure or relations that we should take into account?
  • Are we really not missing something essential? It's kind of weird that we were so certain in February.
  • It will probably mean we have to ask Zest to alter their script.

--Karin 10:53, 25 September 2008 (UTC)


  • YB workflow
  • full text migration, ask if user with privileged view is needed to see full texts with private access level
    • user with privileged view is needed --Nicole 14:36, 9 September 2008 (UTC) outcome of telco with Karin
    • we will migrate all full texts, set the visibility of all full texts except the public ones to "private", create users with privileged view, and set locators to eDoc (for non-public full texts). The eDoc file visibility will also be put into the component MD in PubMan.

Open Issues with QA migration


Indexing apparently doesn't work well: we can't find all publications, which makes testing very tedious. For example, in eDoc the MPIPL publication year 2008 has 96 entries, but only 23 are found with search.

should be solved with next migration --Nicole 09:02, 13 January 2009 (UTC)

Virtual collections from eDoc have been migrated, which wasn't intended, e.g. from the collection endnote_import_2004_06. We haven't yet checked items from other virtual eDoc collections.

JIRA tickets created--Natasa 12:05, 20 November 2008 (UTC)
should be fine now --Nicole 09:02, 13 January 2009 (UTC)

The eDoc collection 'Categories and Concepts across Language and Cognition' hasn't been migrated due to spelling differences (capital 'Concepts' on PubMan, lowercase 'concepts' on eDoc). This should be fixed so that the capital 'C' is used on PubMan.

JIRA tickets created --Natasa 12:05, 20 November 2008 (UTC)
should be fine now --Nicole 09:02, 13 January 2009 (UTC)
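One way to make the collection matching robust against this kind of spelling difference is to compare names case-insensitively. The following is only a sketch under that assumption; it is not the migration script that was actually used, and the function names are made up for illustration.

```python
def normalise(name):
    """Build a comparison key: collapse whitespace and ignore case."""
    return " ".join(name.split()).casefold()


def build_lookup(pubman_org_units):
    """Index PubMan org unit names by their normalised key."""
    return {normalise(name): name for name in pubman_org_units}


def match_collection(edoc_collection, lookup):
    """Return the PubMan org unit name for an eDoc collection, or None."""
    return lookup.get(normalise(edoc_collection))


lookup = build_lookup(["Categories and Concepts across Language and Cognition"])
print(match_collection("Categories and concepts across Language and Cognition", lookup))
```

The returned value is always the PubMan spelling, so the capital 'C' is kept on the PubMan side.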

Mapping issues specific to MPIPL

Specific routines for MPIPL

  • all full texts which aren't public should have content-category publisher version
  • PhD thesis: the date of approval is also the date published in print
  • all journal articles should have review status 'peer review'

can you do this?--Karin 22:35, 4 December 2008 (UTC)

will be implemented --Nicole 07:45, 15 January 2009 (UTC)
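The three MPIPL-specific routines listed above could be expressed roughly as follows. This is a sketch on plain dictionaries; the keys (`genre`, `files`, `visibility`, `date_approved` and so on) and the genre values are hypothetical and do not reproduce the real PubMan item structure.

```python
def apply_mpipl_rules(item):
    """Return a copy of a record with the three MPIPL-specific routines applied."""
    result = dict(item)
    result["files"] = [dict(f) for f in item.get("files", [])]

    # 1. Full texts that are not public get content category "publisher version".
    for f in result["files"]:
        if f.get("visibility") != "public":
            f["content_category"] = "publisher version"

    # 2. PhD thesis: the date of approval is also the date published in print.
    if result.get("genre") == "thesis" and "date_approved" in result:
        result["date_published_in_print"] = result["date_approved"]

    # 3. Journal articles always get review status "peer review".
    if result.get("genre") == "article":
        result["review_status"] = "peer review"

    return result


thesis = {"genre": "thesis", "date_approved": "2008-05-01",
          "files": [{"visibility": "private"}]}
print(apply_mpipl_rules(thesis))
```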

Advanced Search

Search for genre=book gives you book chapters as well with the results

According to the search specification, the search for genres uses the index "any-genres". Therefore a search for genre "book" also returns items of genre "book-chapter" (as they have a source with genre "book"). Issue reported in JIRA. We will ask FIZ for an extra index. PubMan will be changed only after R4.0, if this is indeed the requirement. --Natasa 12:00, 20 November 2008 (UTC)
No news currently on that issue. --Nicole 09:02, 13 January 2009 (UTC)
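To illustrate why a combined genre index behaves this way, here is a toy model (not the real eSciDoc index): the combined index holds the item's own genre plus the genres of its sources, while a dedicated index would hold the item's own genre only. The two sample items are invented.

```python
def any_genre_terms(item):
    """Terms in a combined index: the item's genre plus the genres of its sources."""
    return {item["genre"]} | {s["genre"] for s in item.get("sources", [])}


def search_any_genre(items, genre):
    """Search the combined index; also matches on source genre."""
    return [i["title"] for i in items if genre in any_genre_terms(i)]


def search_item_genre(items, genre):
    """Search an index that holds the item's own genre only."""
    return [i["title"] for i in items if i["genre"] == genre]


ITEMS = [
    {"title": "A monograph", "genre": "book"},
    {"title": "A chapter", "genre": "book-chapter", "sources": [{"genre": "book"}]},
]
print(search_any_genre(ITEMS, "book"))   # both items
print(search_item_genre(ITEMS, "book"))  # the monograph only
```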

The same of course also applies to examples like proceedings and proceedings paper. Another issue is that a search on genre searches in the GENRE of the publication PLUS the genre of the source. See e.g. this query: Your exact query was: ( escidoc.any-genre="proceedings" ) and ( escidoc.content-model.objid="escidoc:persistent4" ). It also gives me a publication with genre conference report, because the SOURCE GENRE is proceedings. I tried to refine with NOT source genre is proceedings, but the query didn't accept the NOT from the pull-down menu in the GUI. Your exact query was: ( escidoc.any-genre="proceedings" ) and ( escidoc.any-source="proceedings" ) and ( escidoc.content-model.objid="escidoc:persistent4" ). This is not what I specified in the query!! I specified NOT. I was curious about this refinement because Natasa's statement sounded as if a distinct search for a genre might not be possible, so at least the refinement should work! --Karin 12:46, 20 November 2008 (UTC)

Search for creator role. I wanted to know all publications by author Brown. With this query: Your exact query was: ( escidoc.any-persons="brown" ) and ( escidoc.creator.role="AUTHOR" ) and ( escidoc.content-model.objid="escidoc:persistent4" ) I also got items where Brown is editor. Do I have to specify more? If I specify that the creator role is editor, the search results are limited to the editor role: ( escidoc.any-persons="brown" ) and ( escidoc.creator.role="EDITOR" ) and ( escidoc.any-organizations="psycholinguistics" ). --Karin 22:01, 4 December 2008 (UTC)

tested on dev. PubMan. Seems to work fine there with newly indexed items. Hope, it will be fine with new PubMan release. --Nicole 09:25, 13 January 2009 (UTC)
  • Just to be sure: I hope dev. PubMan is not the same release as on QA, because even with newly indexed items from Dec 17th, an Advanced Search for author=levinson and organization=psycholinguistics also retrieves an item where Levinson is an editor of a special issue. --Karin 12:31, 14 January 2009 (UTC)

      • Advanced searching on genre and author is still not possible with release 4.1 on March 6th. At some point we really need to know how many publications a person has as an author, how many journal articles are being produced by the institute, how many book chapters, etc. Still not possible! ULTIMATE DEADLINE: IN TIME BEFORE the next Fachbeirat, WHICH IS IN autumn 2009!!! --Karin 15:20, 6 March 2009 (UTC)
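For reference, the queries quoted in this thread can be assembled programmatically. The sketch below only builds the query string in the notation shown above; the index names are the ones quoted on this page, while the use of the CQL boolean `not` for the exclusion is an assumption, since the thread reports that the GUI did not pass NOT through.

```python
def clause(index, value):
    """Format one search clause the way the queries on this page are written."""
    escaped = value.replace('"', '\\"')
    return f'( {index}="{escaped}" )'


def build_query(required, excluded=()):
    """Join required clauses with `and`; append excluded clauses with `not`."""
    if not required:
        raise ValueError("at least one required clause is needed")
    query = " and ".join(clause(index, value) for index, value in required)
    for index, value in excluded:
        query += " not " + clause(index, value)
    return query


# All publications where Brown is an author (query from the thread above).
print(build_query([
    ("escidoc.any-persons", "brown"),
    ("escidoc.creator.role", "AUTHOR"),
    ("escidoc.content-model.objid", "escidoc:persistent4"),
]))

# Genre proceedings, excluding items whose source is proceedings (assumed syntax).
print(build_query(
    [("escidoc.any-genre", "proceedings")],
    excluded=[("escidoc.any-source", "proceedings")],
))
```

Whether the two role and person clauses are evaluated against the same creator is a property of the index, not of the query string, so this sketch cannot fix the author/editor mix-up reported above.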

Genre types Migration 1 of 12 November 2008

--Natasa 12:04, 20 November 2008 (UTC) JIRA tickets created

Journal article: fine, APA export works fine

Book: data are not fully migrated, e.g. pages are missing. APA export: authors/editors, publishing place and publisher's name are missing

PublishingInfo.Publisher, PublishingInfo.Place and TotalNumberOfPages have not been created in PubMan. Checked mapping. It is described there correctly :-) --Nicole 13:43, 19 November 2008 (UTC)
checked item on qa pubman. Missing fields are in now. --Nicole 09:28, 13 January 2009 (UTC)

Thesis: missing data

  • name and place of university
  • pages
  • MPI Series in Psycholinguistics (in eDoc: part of ...)
  • no locator

Citation style: APA shows (in press) instead of the year -> separate issue, have to look into this

PublishingInfo.Publisher, PublishingInfo.Place and TotalNumberOfPages have not been created in PubMan. Checked mapping. Also the creation of the source failed. What is written in the eDoc field relType=ispartof should be written into source.title and the source.genre should be set to "Series". Also the identifier (URL in eDoc record) has not been created like in the mapping: Identifier.IdType.Other and value in Identifier.Id. --Nicole 13:58, 19 November 2008 (UTC)


Talk missing data -conference details (name, place, date) example

Event.Title, Event.Place, Event.StartDate have not been created as specified in the mapping. --Nicole 14:24, 19 November 2008 (UTC)


Missing data: - conference details (name,place, date) - Publisher details (name, place, pages) - APA: only year and title are displayed (NOT editor, event) example: edoc id: 291041

PublishingInfo.Publisher, PublishingInfo.Place, Event.Title, Event.Place, Event.StartDate and Event.EndDate have not been created as specified in the mapping. --Nicole 15:24, 19 November 2008 (UTC)

POSTER Missing data: - conference details - publishing details: year Example: EDOC id: 305652

Event.Title, Event.Place, Event.StartDate and Event.EndDate have not been created as specified in the mapping. --Nicole 15:24, 19 November 2008 (UTC)

Working paper: (= Paper in Edoc) missing data: - no year - no locator example: EDOC id: 127485

TotalNumberOfPages has not been created as specified in the mapping. --Nicole 15:24, 19 November 2008 (UTC)

Proceedings paper: missing data - conference details (name, place, date) - publisher details (name, place) - Physical Description (DVD) example versus --Karin 16:32, 19 November 2008 (UTC)

Special issue missing data - place of publication, publisher, total number of pages - APA: no editors ! example versus --Karin 16:32, 19 November 2008 (UTC)

Karin, Meggie, and Annemieke

--Karin 10:57, 15 November 2008 (UTC)

Genre types Migration 2 of 28 November 2008


  • all data migrated
  • question about the mapping of 'Date of approval', which is migrated to 'Date accepted': at least for our institute, that is also the date published in print.
don't understand the feedback :-( --Nicole 09:52, 13 January 2009 (UTC)
  • Hello Nicole, it looks as if our dissertations were 'in press', because on eDoc the only date field for the document type Thesis is 'Date of Approval'. Our dissertations, however, are printed publications in a series, i.e. the Date of Approval could really be a Date published, so that the reference also comes out right. I can't quickly find the mapping for the document type Thesis, but I fear it won't help much. What now? --Karin 12:44, 14 January 2009 (UTC)

Hello Karin, I would suggest that in this case we map the date of approval to date published in print, or add an exception for the APA citation. Which would be more practical from your point of view? --Nicole 07:47, 15 January 2009 (UTC)


  • all data migrated
  • mapping issue: if the event only lasted one day, we only entered the start date in eDoc; in PubMan a '-' now appears. Is the eDoc input incorrect, or is the PubMan display just not very nice?

entered in JIRA as improvement --Nicole 07:47, 15 January 2009 (UTC)
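The display improvement asked for here could look like the sketch below: the separator is only shown when there is a distinct end date. The function name is made up, and this is not the PubMan display code.

```python
def format_event_dates(start, end=None):
    """Format an event's date range; omit the dash for one-day events."""
    if not start:
        return ""
    if not end or end == start:
        return start
    return f"{start} - {end}"


print(format_event_dates("2007-08-31", "2007-09-03"))
print(format_event_dates("2007-08-31"))
```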

Proceedings papers

  • Proceedings papers -> if published with metadata event, source, publishing date, like APA is fine.

don't understand the feedback :-( --Nicole 09:52, 13 January 2009 (UTC)
  • the pubman items are deleted, so we have to see in the new migration. [Feedback by Karin]

Special issue

APA editors missing in export

will be part of APA revision. --Nicole 09:52, 13 January 2009 (UTC)



Proceedings paper

  • year has to be at the beginning of the apa export, with month if possible
will be part of APA revision. --Nicole 09:52, 13 January 2009 (UTC)

book chapter

  • strange behavior of the names of the source editors if only ONE source editor is present: if only ONE editor in the source is listed in eDoc as "surname, first name" or "surname, initial first name", then the name appears in PubMan as two names with first name 'null'. This only seems to happen IF the first name is written out; it doesn't seem to happen if the first name is entered as an initial. PS: this is a publication with eDoc status submitted (!), as opposed to items with one editor with an initial, or with full names BUT more editors! --Karin 15:55, 8 December 2008 (UTC)

will ask dev. team to check what is wrong here --Nicole 09:52, 13 January 2009 (UTC)
is already done --Kurt 10:25, 13 January 2009 (UTC)
good!--Karin 12:48, 14 January 2009 (UTC)
  • book editors are duplicated: a book is published in a series, and the names of the editors of the book are duplicated into the fields of the editors of the series ALTHOUGH no editor names were entered into eDoc. Of course this gives strange citation styles. --Karin 15:55, 8 December 2008 (UTC)

will ask dev. team to check what is wrong here --Nicole 09:52, 13 January 2009 (UTC)
is already done --Kurt 10:25, 13 January 2009 (UTC)
fine!--Karin 12:48, 14 January 2009 (UTC)

Migration 3 of 23 December 2008

Test migration where only 3 OUs were migrated, i.e. 'Categories and Concepts across Language and Cognition', 'Decoding Continous Speech', 'Language in Action' = 253 documents.


  • Firefox Version isn't working well in edit mode. Input fields for Source and Organization aren't visible. Firefox 3.0.5 seems to work well. The system should be compatible with at least Firefox 2.x versions.
sent bug report to dev. team --Nicole 07:48, 16 January 2009 (UTC)

Rupert: This is already fixed and will be in place after next installation. --Rupert 08:16, 16 January 2009 (UTC)

  • REST interface isn't working, it does not return results.--Karin 14:57, 15 January 2009 (UTC)
problem with jibx, Michael is working on it. --Nicole 07:48, 16 January 2009 (UTC)
  • Sorting of results doesn't work, neither with browse nor with advanced search. Only sorting by date seems to work. I also tried sorting by creator, genre and title, but it didn't sort correctly. --Karin 09:52, 20 January 2009 (UTC)
Entered as bug. --Nicole 08:46, 23 January 2009 (UTC)

APA export

  • Journal article: The title of the Journal is in bold if I choose the PDF option (not with html export!), that shouldn't be the case.--Karin 15:06, 21 January 2009 (UTC)
Asked Despoina to clarify with Vlad. --Nicole 08:46, 23 January 2009 (UTC)

Testing of the migrated data on Jan 21 2009 Please check if mapped according to specification. --Karin 14:51, 21 January 2009 (UTC)

  • EDOC field: Comment of the author/creator is not visible in the PubMan-item, see PubMan item: Example: EDOC --> PubMan

Comment of the author will not be migrated to PubMan. See mapping. --Nicole 09:04, 23 January 2009 (UTC)
  • EDOC field " Relations: Is part of----" is only partially displayed in PubMan. The comment ( in which we give the series number) is not visible anymore, as a consequence the series sequence number is lost in PubMan. All our MPI dissertations are published as part of the series "MPI series in Psycholinguistics" and have a separate series number accordingly. So it's important this number is visible.

Example: -->PubMan and vs

Extended mapping. Until now only the value was mapped and not the comment. --Nicole 09:04, 23 January 2009 (UTC)
  • Book chapter (of a book that has been published as part of a series). The series number appears twice in PubMan: first as a volume number of source 1 (this is incorrect, because source 1 is the book title) and then as a volume/issue number of source 2 (this is correct, because source 2 is the title of the series). vs

Mapping is okay, implementation has to be changed. Sent mail to Julia to correct. --Nicole 09:23, 23 January 2009 (UTC)
  • Proceedings paper (of Proceedings which have been published as part of a Series). The title of the series and the volume number do not appear in the PubMan item.

Example: vs -->

Functional changes in edit mode

  • with locator the content category 'supplementary material' is predefined, which makes sense, but then we should really make the purpose of that field more obvious--Karin 16:05, 15 January 2009 (UTC)
Rupert, please check. --Nicole 07:49, 16 January 2009 (UTC)

Rupert: This is not the case on QA: instead, it is similar to file upload and can get any content type. --Rupert 08:16, 16 January 2009 (UTC)

Testmigration 4 of 28 January 2009

General remarks


- sorting doesn't work in workspace and search results --Karin 16:59, 2 February 2009 (UTC)

MPI specific issues

The specific issues aren't implemented, meaning PDFs don't have content-category 'publisher's pdf' AND journal articles don't have review status 'peer review' as had been specified. --Karin 16:59, 2 February 2009 (UTC)

The issue with the date of a thesis has been implemented

GUI interface

There is still something wrong. If I want to modify this item, I only see the item up to source 2 and special issue, but no creator etc. If I want to save the item, the message "System Messages and Warnings: * The role of a creator of a source is not provided." comes up and I can't save the item. --Karin 16:59, 2 February 2009 (UTC)

Mapping of OrgUnit

Something went wrong with the mapping/indexing of OU 'Event Representation (closed)'. The items of the eDoc collection are migrated to PubMan but no index for the collection has been built. If you browse to the collection, no items are found, and with the affiliations it lists: escidoc:persistent22, see e.g. 20:58, 3 February 2009 (UTC)

Thank you for reindexing the Event Representation collection today!! --Karin 12:16, 5 February 2009 (UTC)

Discussion of APA revised

  • Journal article
  • bugs:

- omit the comma after the LAST author listed, also if, as in this case, only one author is listed. Examples:

looks like this: Brown P., (2008). Up, down, and across the land: Landscape terms and place names in Tzeltal. Language Sciences, 30151-181.

looks like: Chen A., Gussenhoven C., & Rietveld T., (2004). Language specificity in perception of paralinguistic intonational meaning. Language and Speech, 47(4)311-349.

Correct APA in Abstract I inserted correct APA citation in abstract: 16:59, 2 February 2009 (UTC)

- volume number and pages are pasted together, probable cause: no issue is listed in data. --Karin 16:59, 2 February 2009 (UTC)

This reference should read: Brown P. (2008). Up, down, and across the land: Landscape terms and place names in Tzeltal. Language Sciences, 30, 151-181. Title of journal should be italic.
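The two fixes requested in this section (no comma after the last author, and a separator between volume and pages even when no issue is present) can be sketched as below. This follows the reference form given on this page, not the full APA manual and not the actual PubMan citation style code; italics for the journal title are not represented in plain text.

```python
def format_authors(authors):
    """Join author names without a trailing comma after the last one."""
    if not authors:
        return ""
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + ", & " + authors[-1]


def format_article(authors, year, title, journal, volume, pages, issue=None):
    """Format a journal article; volume and pages are always comma-separated."""
    vol = f"{volume}({issue})" if issue else str(volume)
    return f"{format_authors(authors)} ({year}). {title}. {journal}, {vol}, {pages}."


print(format_article(
    ["Brown P."], 2008,
    "Up, down, and across the land: Landscape terms and place names in Tzeltal",
    "Language Sciences", 30, "151-181",
))
```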

Testmigration 5 of 05 February 2009

Test of migration 5 of February 5th, 2009. (testing already began during import)

First of ALL: All of our special requests have been taken care of, so thank you very much for the quick correcting after the migration last week. So far, we have found only some minor issues, see below --Karin 14:47, 5 February 2009 (UTC)

Mapping issues

  • genre book chapter, editor name torn apart

The bug where the editor name was torn apart into 'Lastname, Null' and 'Firstname, Null' as two names (as reported with the migration from 28 Nov 2008 under the heading book chapter) has been solved UNLESS the name of the editor is a 'strange' one, e.g. Gaskell, M. Gareth. 14:47, 5 February 2009 (UTC)

-- MFranke 15:34, 5 February 2009 (UTC) I realized that there are a few more issues with names:

   * The format "Gaskell, M. Gareth" will be added to our AuthorDecoder tool
   * The format "Grønn, Atle" will be added to our AuthorDecoder tool
   * The nulls in the givennames will be removed when empty
   * We won't fix for the following author strings because they have a scrambled format: "Schiller, Niels O.; Ferreira, Victor S.; Alario F.-Xavier", "Cornips, L.; Doetjes J."
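To make the distinction between parseable and scrambled author strings concrete, here is a small sketch. It is not the AuthorDecoder tool; it only assumes that names are separated by ';' and written as 'Family, Given', and it sets aside any entry without a comma instead of guessing.

```python
def parse_author_string(raw):
    """Split a free-text name list into (family, given) tuples.

    Returns (parsed, rejected): entries without a comma go to `rejected`.
    """
    parsed, rejected = [], []
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        family, sep, given = entry.partition(",")
        if not sep:
            rejected.append(entry)  # scrambled format, e.g. "Doetjes J."
            continue
        # An empty given name stays an empty string rather than becoming "null".
        parsed.append((family.strip(), given.strip()))
    return parsed, rejected


print(parse_author_string("Gaskell, M. Gareth"))
print(parse_author_string("Cornips, L.; Doetjes J."))
```

Splitting only on the first comma keeps multi-part given names such as "M. Gareth" together.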

We found these typos in edoc. They were editors of special issues and that is a field where one has to type 'free' text. So we corrected the above mentioned authors. Were those the only ones?--Karin 10:21, 6 February 2009 (UTC)

It was edocid: 328404 and 225793. --Nicole 11:37, 6 February 2009 (UTC)
  • Issue number of Journal not migrated if Title of issue has been populated

this happens with Genre Journal Article publ in a special issue, number of issue not migrated to pubman

Also with the genre Proceedings Papers: 14:47, 5 February 2009 (UTC)

  • physical description in edoc we entered 'CD-Rom' in the field physical description

We discussed this previously with Nicole, but we do not remember what the outcome was, most probably we did put it in the wrong edoc field and/or the field won't be migrated. If it isn't in the edoc to pubman mapping can we get a list of edoc numbers where we wrote in the field physical description--Karin 14:47, 5 February 2009 (UTC)

GUI issue

Proceedings start and end-date Label is: Start-/End Date, but field value is presented as: 2007-08-31 - 2007-09-03, All the dashes seem a bit confusing, but that is maybe a matter of taste.--Karin 14:47, 5 February 2009 (UTC)

Testmigration 6 of 23 February 2009 with Release 4.1

REST interface

  • queries with no limitation of max records or records above 1000 give an error, please provide correct query for our whole institute --Karin 16:50, 23 February 2009 (UTC)
correct query should be: "escidoc:22007" and escidoc.content-model.objid=escidoc:persistent4&exportFormat=APA&outputFormat=snippet&language=all&sortKeys=&sortOrder=ascending
The problem results from an inconsistency in the data structure. Michael is informed and currently working on a solution for the problem. --Nicole 10:44, 24 February 2009 (UTC)
  • Home page abbreviated when logged in. When I'm logged in, I don't see the whole homepage, I e.g. do not see the part about the Pubman interface. I am using Firefox 3.0.6. When logged out, I see the whole homepage of PUbMan.--Karin 16:50, 23 February 2009 (UTC)
Please press F5. Then it should work. The problem is, that the old CSS information is still saved in the browser. --Nicole 10:44, 24 February 2009 (UTC)
I heard that solution before, BUT it doesn't work. I now tried to demonstrate it via a wink file (the file is actually too big for colab, ..): login behavior. Can you reproduce this problem? --Karin 11:50, 24 February 2009 (UTC)
Problem is solved now. I expected the PubMan homepage to be the same for a logged-in user as for a logged-out user, which isn't the case. So I think it is fine now. --Karin 14:08, 26 February 2009 (UTC)
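Returning to the REST query limit reported at the top of this section: until larger result sets are supported, one workaround would be to fetch the institute's records in windows of at most 1000. The sketch below only computes the windows and encodes the parameters. `exportFormat`, `outputFormat`, `language`, `sortKeys` and `sortOrder` are taken from the query quoted above; `cqlQuery`, `startRecord` and `maximumRecords` are hypothetical parameter names and would have to be checked against the actual interface.

```python
from urllib.parse import urlencode

MAX_RECORDS = 1000  # the limit above which queries currently fail


def batches(total, limit=MAX_RECORDS):
    """Yield (start, count) windows covering `total` records; start is 1-based."""
    if limit < 1:
        raise ValueError("limit must be positive")
    start = 1
    while start <= total:
        count = min(limit, total - start + 1)
        yield start, count
        start += count


def export_params(cql_query, start, count):
    """Encode the query-string parameters for one window."""
    return urlencode({
        "cqlQuery": cql_query,     # hypothetical parameter name
        "exportFormat": "APA",
        "outputFormat": "snippet",
        "language": "all",
        "sortKeys": "",
        "sortOrder": "ascending",
        "startRecord": start,      # hypothetical parameter name
        "maximumRecords": count,   # hypothetical parameter name
    })


for start, count in batches(2500):
    print(export_params("escidoc.content-model.objid=escidoc:persistent4", start, count))
```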

Browsing and Searching

  • while browsing the Organization Tree, there are no items attached to 'Max Planck Institute for Psycholinguistics', 'Acquisition Group' and 'Comprehension Group'
results from an inconsistency in the data structure. Michael is informed and currently working on a solution for the problem. --Nicole 10:44, 24 February 2009 (UTC)
works now. --Nicole 08:16, 26 February 2009 (UTC)
  • Search on all fields in the search box and/or in advanced search for the word 'acquisition' gives no results, whereas there are a lot of items with 'Acquisition Group' and/or the word acquisition in the title
results from an inconsistency in the data structure. Michael is informed and currently working on a solution for the problem. --Nicole 10:44, 24 February 2009 (UTC)
works now. --Nicole 08:16, 26 February 2009 (UTC)
  • Advanced Search doesn't seem to work at all --Karin 16:50, 23 February 2009 (UTC)
results from an inconsistency in the data structure. Michael is informed and currently working on a solution for the problem. --Nicole 10:44, 24 February 2009 (UTC)
works now. --Nicole 08:15, 26 February 2009 (UTC)

Local Tags

Editing local tags gives Error: type Exception report

message description The server encountered an internal error () that prevented it from fulfilling this request. exception javax.servlet.ServletException: #{EditItem.acceptLocalTags}: java.lang.NullPointerException--Karin 16:50, 23 February 2009 (UTC)

entered bug in our bug tracking tool. Will be solved as soon as possible. --Nicole 10:44, 24 February 2009 (UTC)
should work after next release on QA, which will be this Friday or next Monday. --Nicole 08:18, 26 February 2009 (UTC)

APA errors

  • --Makarenko 10:54, 25 February 2009 (UTC) Fixed. The problem was with exactly 2 source editors.
  • editors of the series (source 2) have been added to the editors of the book (maybe we should just delete the editors of the series from eDoc?)

  • --Makarenko 10:14, 25 February 2009 (UTC) We have no handling of two sources in APA citation style. Should I stick on the 1st source? Yes, stick to the first source, I hope that occurrences with multiple sources don't happen too often. --Karin 11:58, 25 February 2009 (UTC)
  • --Makarenko 15:18, 25 February 2009 (UTC) Fixed.
  • book: no publishing information --Karin 12:10, 24 February 2009 (UTC)

  • --Makarenko 11:54, 25 February 2009 (UTC) Fixed.