Talk:PubMan Func Spec eSciDoc To eDoc Mapping

From MPDLMediaWiki
Revision as of 17:56, 22 November 2010 by Makarenko (talk | contribs) (→‎eDoc Export method)

Open Issues

Other

  • Could you check whether we filled in a source2 on PubMan for any item with publication year 2009? If so, we want to check the mapping.


Resolved Issues

Article

Book

InBook

Conference paper

  • Free keywords are mapped to thesaurus instead of free keywords (might be a general error not only this genre - but I can't find the other example at the moment)
    • dcterms:subject element of pubman item is the list of the keywords, thus it can be directly mapped to edoc keywords field
    • dc:subject elements are list of disciplines taken from controlled vocabulary. Can be mapped to the eDoc element discipline --Makarenko 18:35, 9 February 2010 (UTC)
      • agreed --Karin 14:34, 10 February 2010 (UTC)
  • Date of publication not mapped
    • I need a more precise mapping for dates as a whole. Please revise. --Makarenko 18:35, 9 February 2010 (UTC)
      • I think the 'problem' lies with the online dates. I checked again with the new upload. Dates (which are mandatory on eDoc) are mapped if a date published in print (<dcterms:issued xsi:type="dcterms:W3CDTF">2009</dcterms:issued>) is available. However, if no date in print is available, no date was mapped to eDoc. In that case, please take the date published online, like <eterms:published-online xsi:type="dcterms:W3CDTF">2009</eterms:published-online>. These proceedings will only be published online. Is this enough specs? --Karin 15:30, 10 February 2010 (UTC)
      • --Makarenko 17:29, 10 February 2010 (UTC): Done.
  • Separator between editors has to be '; '
    • Done --Makarenko 18:35, 9 February 2010 (UTC)
  • URI not mapped (see: http://edoc4.gwdg.de/display.epl?mode=doc&id=408193 http://pubman.mpdl.mpg.de/pubman/item/escidoc:101988:16 )
    • @Karin, do you mean the missing URI for the source? This info is actually not needed for the Yearbook, therefore we ignored it. --Ulla 14:56, 10 February 2010 (UTC)
      • Yes, I meant that URI. It's ok, since it's not needed for the yearbook.
  • Place of publication and publisher not mapped
    • Done. --Makarenko 18:43, 9 February 2010 (UTC)
      • No, it's not yet done. Is this the same issue as with the InBook publishing info data? --Karin 11:00, 11 February 2010 (UTC)
      • Shall we handle this case as we handle InBook now? (map publishing info of source to publishing info of publication).--Friederike 12:35, 15 February 2010 (UTC)
      • As discussed with Karin, please handle Conference Paper like InBook for this issue.--Friederike 13:41, 15 February 2010 (UTC)
        • --Makarenko 14:56, 15 February 2010 (UTC): Fixed.
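The date fallback agreed above for conference papers (use the date published in print when it exists, otherwise the date published online) could be sketched as follows. This is only an illustration in Perl, not the transformation's actual code: the helper name pick_publication_date and the flat hash keyed by element name are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the publication date to send to eDoc.
# $dates is a hashref keyed by the PubMan element names quoted in the thread.
sub pick_publication_date {
    my ($dates) = @_;

    # Preference order: print date first, online date as the fallback.
    for my $key ('dcterms:issued', 'eterms:published-online') {
        my $value = $dates->{$key};
        return $value if defined $value && $value =~ /\S/;
    }
    return;    # neither date is available
}

# An online-only proceedings item still gets a date.
print pick_publication_date({ 'eterms:published-online' => '2009' }), "\n";
```

Keeping the preference order in a single list makes it easy to add further fallbacks later if other genres need them.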

Issue

Thesis

  • thesis has to be mapped to GENRE PHD-Thesis (this is what we used before on eDoc)
    • --Makarenko 19:02, 9 February 2010 (UTC): Done.
  • date published in print should be mapped to date accepted on eDoc
    • --Makarenko 19:02, 9 February 2010 (UTC): See my comment above.
  • title of source is not mapped
    • Not relevant for the genre. (see: http://edoc4.gwdg.de/display.epl?mode=doc&id=408446 http://pubman.mpdl.mpg.de/pubman/item/escidoc:107934 ) --Makarenko 19:02, 9 February 2010 (UTC)
      • That may be true, but it is our one and only series... We can live with it and enter it by hand... --Karin 15:32, 10 February 2010 (UTC)
        • --Makarenko 16:15, 10 February 2010 (UTC): No need to do it manually, just define the mapping :)
        • Is this not mapped anyway? Please check: Mapping: "If source@type = series: zim_transfer.record.publication.source.inseries.titleofseries " this should be the case here I think. The source.title mapping only separates by source genre, not by publication genre. --Friederike 16:20, 10 February 2010 (UTC)
        • Clarified with Vlad: A thesis cannot have a series (or a source in general) in eDoc; therefore he will not map it. --Friederike 13:27, 12 February 2010 (UTC)
        • Ok, that's true we had a workaround on eDoc. --Karin 12:27, 15 February 2010 (UTC)

Mapping of Genres

  • Thesis
    • PubMan genre type thesis has to be mapped to GENRE PHD-Thesis (this is what we used before on eDoc) or accordingly to degree type on PubMan. In the 2009 data for the yearbook we only used PhdThesis.--Karin 17:59, 9 February 2010 (UTC)
      • --Makarenko 11:33, 10 February 2010 (UTC): Done.
        • The PubMan 'date published in print' should be mapped to 'date of approval'. This still has to be done. At the moment there is no date with the PhD thesis on the eDoc test upload. --Karin 15:35, 16 February 2010 (UTC)
          • --Makarenko 16:54, 16 February 2010 (UTC): Done.
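As a rough illustration of the two thesis rules settled above (genre thesis becomes PhD-Thesis, and the date published in print becomes the date accepted), a minimal Perl sketch might look like this. The function name map_thesis_fields and the input keys genre and date_published_in_print are hypothetical, and the sketch ignores the alternative of mapping by degree type.

```perl
use strict;
use warnings;

# Hypothetical input keys: genre, date_published_in_print.
# Returns a hashref of eDoc fields, or nothing for non-thesis items.
sub map_thesis_fields {
    my ($item) = @_;
    return unless defined $item->{genre} && $item->{genre} eq 'thesis';

    my %edoc = (genre => 'PhD-Thesis');

    # The print date becomes the eDoc acceptance date.
    $edoc{dateaccepted} = $item->{date_published_in_print}
        if defined $item->{date_published_in_print};

    return \%edoc;
}

my $edoc = map_thesis_fields({ genre => 'thesis', date_published_in_print => '2009' });
print "$edoc->{genre} $edoc->{dateaccepted}\n";
```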

Other

General remarks for Mapping MPIPL--Karin 17:44, 9 February 2010 (UTC)

  • These first mapping specs are based on the Yearbook data for 2009. Therefore not all genres and metadata elements that have been used on PubMan are specified yet.
    • --Ulla 14:59, 10 February 2010 (UTC) Agreed. We focused on the genre types and elements needed for the Yearbook (Pflichtfeldertabelle)
  • External affiliations: we do not want them on eDoc. All these external affiliations, which can't be assigned to one author, sometimes look very chaotic. It could make sense to show just the external affiliations of the MPI authors of a publication, but not those of the other authors. But that's not a priority.
    • --Makarenko 19:13, 9 February 2010 (UTC): Removed.
  • The dates appear as they are on PubMan, either as yyyy, yyyy-mm, or yyyy-mm-dd. Could we just have the yyyy on eDoc?
    • --Makarenko 19:13, 9 February 2010 (UTC): Fixed.
      • I forgot to specify one exception. The start and end dates of an event should be migrated to edoc as they appear on pubman. --Karin 10:21, 23 February 2010 (UTC)
        • --Makarenko 11:17, 23 February 2010 (UTC): Done.
  • Free keywords seem to have been mapped to the field thesaurus
    • --Makarenko 19:13, 9 February 2010 (UTC): See my comment above.


  • With some genres not all links (URLs etc) have been migrated
    • See my comment above. --Makarenko 19:13, 9 February 2010 (UTC)
    • In addition to the PubMan item Id, which will be imported as "localid" to eDoc, the fulltext (components and locators) will be also available on edoc, as URL.
      Karin, please decide:
      • a) We import all components (fulltexts & locators) to eDoc, independent of their visibility level on PubMan. If a component (fulltext or locator) has a visibility level other than public, the user cannot access it without logging in to PubMan (or the external system).
      • b) We import only components (fulltexts & locators) to eDoc which have visibility level "public" on PubMan.
      • If you want, we can always provide these imported components as "clickable" links
      • If you want, we can set a default comment for imported components (e.g. "Fulltext available via PubMan http://pubman.mpdl.mpg.de")

--Ulla 15:05, 10 February 2010 (UTC)

  • After talking to Karin: it is not necessary to link all fulltexts, as we already have a link to the whole publication in PubMan. The comment for this link on eDoc should be: 'More information or fulltext available via PubMan: URL'. --Friederike 14:08, 15 February 2010 (UTC)
    • --Makarenko 14:32, 15 February 2010 (UTC): Unfortunately, the file comment cannot be passed during the upload; that can only be done with an eDoc SQL query.
      • Decided: After final transformation Vlad will set the description value. (not possible during transformation).--Friederike 14:51, 15 February 2010 (UTC)
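The date rule agreed earlier in this section (reduce dates to yyyy on eDoc, except the start and end dates of an event, which are migrated as they appear on PubMan) can be sketched in Perl as below. The function name normalize_date is an assumption for illustration; the field names dateofevent and enddateofevent are borrowed from the eDoc export schema, and the real transformation may identify event dates differently.

```perl
use strict;
use warnings;

# Event dates are passed through unchanged; every other date keeps the year only.
my %KEEP_FULL = map { $_ => 1 } qw(dateofevent enddateofevent);

sub normalize_date {
    my ($field, $value) = @_;
    return $value if $KEEP_FULL{$field};

    # Accept yyyy, yyyy-mm or yyyy-mm-dd and keep only the year.
    return $1 if $value =~ /^(\d{4})(?:-\d{2}){0,2}$/;

    return $value;    # unrecognised format: leave untouched
}

print normalize_date('datepublished', '2009-05-17'), "\n";
print normalize_date('dateofevent',   '2009-05-17'), "\n";
```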

Mapping of MPIPL docaffiliations

Name | eSciDoc (prod) | eDoc
MPI für Psycholinguistik | escidoc:55201 | 11683
Adaptive Listening | escidoc:55207 |
Categories across Language and Cognition | escidoc:55211 |
Comparative Cognitive Anthropology Group | escidoc:55209 | 19528
Communication Before Language Group | escidoc:55208 | 19526
Evolutionary Processes in Language and Culture | escidoc:55210 |
Individual Differences in Language Processing Group | escidoc:102879 |
Information Structure in Language Acquisition | escidoc:55212 |
Language Acquisition Group | escidoc:55202 |
Language and Cognition Group | escidoc:55204 | 19524
Language and Genetics | escidoc:55213 |
Language Comprehension Group | escidoc:55203 |
Language in Action | escidoc:55214 |
Library | escidoc:55221 |
Multimodal Interaction | escidoc:55216 |
Neurobiology of Language Group | escidoc:102880 |
Other Research | escidoc:55217 |
Syntax, Typology, and Information Structure | escidoc:63282 |
Technical Group | escidoc:55220 |
The Dynamics of Multilingual Processing | escidoc:55218 |
Unification | escidoc:55219 |
Decoding Continuous Speech | escidoc:55222 |
Event Representation | escidoc:55223 |
Gesture | escidoc:55224 |
Language Production Group | escidoc:55205 |
Language Production Group Levelt | escidoc:55206 |
Neurocognition of Language Processing | escidoc:55225 |
Phonological Learning for Speech Perception | escidoc:55227 |
Pioneers of Island Melanesia | escidoc:55226 |
Sign Language Typology | escidoc:55228 |
Space | escidoc:55229 |
The Comparative Study of L2 Acquisition | escidoc:55230 |
The Dynamics of Learner Varieties | escidoc:55231 |
The Neurobiology of Language | escidoc:55232 |
The Role of Finiteness | escidoc:55233 |
Utterance Encoding | escidoc:55234 |
Mechanisms and Representations in Comprehending Speech | escidoc:55215 |
  • Affiliations in edoc:
MPI für Psycholinguistik (edoc:11683) - mpgunit
Comprehension Group (edoc:19523) - mpgsunit
Language and Cognition Group (edoc:19524) - mpgsunit
Acquisition Group (edoc:18865) - mpgsunit
Production Group (edoc:19525) - mpgsunit
Adaptive Listening (edoc:19527) - mpgsunit
Communication Before Language Group (edoc:19526) - mpgsunit
Comparative Cognitive Anthropology Group (edoc:19528) - mpgsunit
Evolutionary Models of Language Change Group (edoc:19529) - mpgsunit
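One way to apply the affiliation mapping during transformation is a plain lookup table from eSciDoc organisational unit id to eDoc affiliation id. The Perl sketch below contains only the four rows of the table that list both ids; the names %ESCIDOC_TO_EDOC and edoc_affiliation_id are illustrative assumptions, and how unmapped units should be handled is left open here.

```perl
use strict;
use warnings;

# Only rows of the mapping table that list both an eSciDoc and an eDoc id.
my %ESCIDOC_TO_EDOC = (
    'escidoc:55201' => 11683,    # MPI für Psycholinguistik
    'escidoc:55209' => 19528,    # Comparative Cognitive Anthropology Group
    'escidoc:55208' => 19526,    # Communication Before Language Group
    'escidoc:55204' => 19524,    # Language and Cognition Group
);

# Returns the eDoc id, or undef when the unit has no eDoc counterpart yet.
sub edoc_affiliation_id {
    my ($escidoc_id) = @_;
    return $ESCIDOC_TO_EDOC{$escidoc_id};
}

for my $id ('escidoc:55204', 'escidoc:55207') {
    my $edoc = edoc_affiliation_id($id);
    print "$id => ", (defined $edoc ? $edoc : 'no eDoc id'), "\n";
}
```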


eDoc Export method

my @amap = ('genre','corporatebody','title','bundletitle','titlealt','markuptype',
'markuptitle','language','publisher','publisheradd','datepublished',
'datemodified','dateaccepted','datesubmitted','spage','epage',
'artnum','journaltitle','journalabbreviation','issuetitle','issuenr',
'issuecontributorfn','issuecorporatebody','volume','invitationstatus',
'nameofevent','placeofevent','dateofevent','enddateofevent','booktitle',
'titleofproceedings','proceedingscontributorfn','bookcreatorfn',
'bookcontributorfn','bookcorporatebody','editiondescription',
'titleofseries','seriescontributorfn','seriescorporatebody','os',
'osversion','platform','instremarks','abstract','markupabstract',
'authorcomment','versioncomment','discipline','keywords',
'educationalpurpose','enduser','phydesc','numberofwords','toc',
'pubstatus','refereed'
);

my %map = (
    title => 'title',
    titlealt => 'alttitle',
    genre => 'type',
    language => 'language',
    publisher => 'publisher',
    publisheradd => 'publisheradd',
    corporatebody => 'corporatebody',
    dateaccepted => 'fdateaccepted',
    datepublished => 'fyear',
    datesubmitted => 'fdatesubmitted',
    datemodified => 'fdatemodified',
    artnum => 'artnum',
    issuenr => 'issue',
    volume => 'vol',
    spage => 'spage',
    epage => 'epage',
    journaltitle => 'journal',
    journalabbreviation => 'journalabbr',
    editiondescription => 'editiondes',
    booktitle => 'booktitle',
    bookcreatorfn => 'bookcreatorfn',
    bookcontributorfn => 'bookcontributorfn',
    bookcorporatebody => 'bookcb',
    issuecontributorfn => 'issuecontributorfn',
    issuecorporatebody => 'issuecb',
    titleofproceedings => 'titleofproceedings',
    proceedingscontributorfn => 'proceedingscfn',
    seriescontributorfn => 'seriescontributorfn',
    seriescorporatebody => 'seriescontributorcb',
    issuetitle => 'issuetitle',
    bundletitle => 'bundletitle',
    titleofseries => 'titleofseries',
    abstract => 'abstract',
    authorcomment => 'comment',
    versioncomment => 'vercom',
    discipline => 'discipline',
    educationalpurpose => 'educationalpurpose',
    enduser => 'enduser',
    instremarks => 'instremarks',
    keywords => 'freekeywords',
    invitationstatus => 'invitationforevent',
    numberofwords => 'noofwords',
    os => 'os',
    osversion => 'osversion',
    nameofevent => 'nameofevent',
    placeofevent => 'placeofevent',
    dateofevent => 'fdateofevent',
    enddateofevent => 'fenddateofevent',
    platform => 'platform',
    pubstatus => 'pubstatus',
    phydesc => 'phydes',
    refereed => 'refereed',
    toc => 'toc',
    markuptype => 'markuptype',
    markuptitle => 'markuptitle',
    markupabstract => 'markupabstract'

);

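sub getInExportSchema {
    my ($self,$flag,$mywrt) = @_;

    # Check the argument count first, so $flag is never compared
    # before we know it was actually passed.
    if(@_<2){
        croak "method signature failure";
    }
    # $flag: 0 = OAI export, 1 or 2 = plain record export
    unless($flag==0 || $flag==1 || $flag==2){
        croak "method signature failure";
    }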

    my $res = $self->{_res};

    if(!$mywrt){
        $mywrt=$self->{wrt};
    }

    my ($tmp,$mm);

    if($flag){
        # 1 Start the Document Top Container
        $mywrt->startTag('record', "id"=>"$self->{_id}");
    }else{
    # For OAI Export
        $mywrt->startTag('record');

        # Header (attribute status TODO:)
        $mywrt->startTag('header');

        # Identifier
        $mywrt->startTag('identifier');
        $mywrt->characters('oai:mpg.edoc:'.$$res{id});
        $mywrt->endTag('identifier');
        # DateStamp
        $mywrt->startTag('datestamp');
        $mywrt->characters($$res{'fts'});
        $mywrt->endTag('datestamp');
        # setSpec TODO:

        $mywrt->endTag('header');

        $mywrt->startTag('metadata'); # HACK TODO:

        $mywrt->startTag('record',"xmlns"=>"http://edoc.mpg.de/doc/schema/export/",
            "xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance",
            "xsi:schemaLocation"=>"http://edoc.mpg.de/doc/schema/export/
          http://edoc.mpg.de/doc/schema/zim_export_temp.xsd"
        );

    }



    # 1.1 Start the metadata container
    $mywrt->startTag('metadata');

    # 1.1.1 Start the basic container
    $mywrt->startTag('basic');

    foreach $mm (@amap){
        # Dirty hack: for genres 14, 15, 16 emit dateaccepted in place of the year.
        # \Q...\E keeps regex metacharacters in the genre name from being interpreted.
        if($mm eq "datepublished"&&grep(/^\Q$$res{type}\E$/,('Thesis','PhD-Thesis','Habilitation'))&&$self->_ck($$res{fdateaccept})){
            $mywrt->startTag("dateaccepted");
            $mywrt->characters($$res{fdateaccept});
            $mywrt->endTag("dateaccepted");
        }elsif($self->_ck($$res{$map{$mm}})){
            $mywrt->startTag($mm);
            $mywrt->characters($$res{$map{$mm}});
            $mywrt->endTag($mm);
        }
    }
    # ftext
    if(defined($self->{_fturl}) && @{$self->{_fturl}} > 0){
        foreach $tmp (@{$self->{_fturl}}){
            $mywrt->startTag('fturl',"viewftext"=>$$tmp{viewftext},"filename"=>$$tmp{filename}, "size"=>$$tmp{size});
            $mywrt->characters(uri_unescape("§BASE§/get.epl?fid=$$tmp{fileid}&did=".
            "$$tmp{docid}&ver=$$tmp{ver}"));
            $mywrt->endTag('fturl');
        }
    }

    # 1.1.1 Close the basic container
    $mywrt->endTag('basic');


    if(defined($self->{_pep}) && @{$self->{_pep}} > 0){

        # 1.1.2 Start the creators container
        $mywrt->startTag('creators');

        foreach $tmp (@{$self->{_pep}}){
            my $ie = ($$tmp{mpgpeople} == 1 ? 'mpg':'unknown') ;
            my $myrole = lcfirst($$tmp{type});
            my $creatorType = ($$tmp{isgroup}==0 ? 'individual':'group');

            $mywrt->startTag('creator',"internextern"=>"$ie","role"=>"$myrole",
            "creatorType"=>"$creatorType");
            $mywrt->startTag('creatorini');
            $mywrt->characters($$tmp{'initials'});
            $mywrt->endTag('creatorini');

            $mywrt->startTag('creatornfamily');
                $mywrt->characters($$tmp{'name'});
            $mywrt->endTag('creatornfamily');
            if($$tmp{fname}!~/^\s*$/){
                $mywrt->startTag('creatorngiven');
                    $mywrt->characters($$tmp{'fname'});
                $mywrt->endTag('creatorngiven');
            }

            $mywrt->endTag('creator');
        }

        # 1.1.2 Close the creators container
        $mywrt->endTag('creators');
    }

    # Relations Container
    # 1.1.3 Start the relations container
    $mywrt->startTag('relations');

    if(defined($self->{_rels}) && @{$self->{_rels}} > 0){
        foreach $tmp (@{$self->{_rels}}){
            my $type = lc($$tmp{'type'});
            my $reltype = lc($$tmp{'reltype'});

            $mywrt->startTag('relation',"type"=>"$type",
            "reltype"=>"$reltype");
            $mywrt->characters($$tmp{'identifier'});
            $mywrt->endTag('relation');

        }
    }
    # 1.1.3 Close the relations container
    $mywrt->endTag('relations');

    # Identifiers Container
    # 1.1.4 Start the identifiers container
    $mywrt->startTag('identifiers');
    if(defined($self->{_idents}) && @{$self->{_idents}} > 0){
        foreach $tmp (@{$self->{_idents}}){
            my $type = lc($$tmp{'type'});

            $mywrt->startTag('identifier',"type"=>"$type");
            $mywrt->characters($$tmp{'identifier'});
            $mywrt->endTag('identifier');

        }
    }
    # 1.1.4 Close the identifiers container
    $mywrt->endTag('identifiers');


    # 1.1 Close the metadata container
    $mywrt->endTag('metadata');



    # 1.2 Start the docaff container
    $mywrt->startTag('docaff');

    if($self->_ck($$res{'externalaff'})){
        $mywrt->startTag('docaff_external');
        $mywrt->characters($$res{'externalaff'});
        $mywrt->endTag('docaff_external');
    }

    if($self->_ck($$res{'project'})){
        $mywrt->startTag('docaff_researchcontext');
        $mywrt->characters($$res{'project'});
        $mywrt->endTag('docaff_researchcontext');
    }

    # Document Affiliations
    if(defined($self->{_daf}) && @{$self->{_daf}} > 0){
        foreach $tmp (@{$self->{_daf}}){
            $mywrt->startTag('affiliation');

            $mywrt->startTag('mpgunit', "id"=>"$$tmp{'aff_unit'}");
            $mywrt->characters($$tmp{'aff_unitdesc'});
            $mywrt->endTag('mpgunit');
            if($$tmp{'aff_subunitdesc'} &&
                $$tmp{'aff_subunitdesc'}!~/^\s*$/){
                $mywrt->startTag('mpgsunit', "id"=>"$$tmp{'aff_subunit'}");
                $mywrt->characters($$tmp{'aff_subunitdesc'});
                $mywrt->endTag('mpgsunit');
            }
            if($$tmp{'aff_subsubunitdesc'} &&
                $$tmp{'aff_subsubunitdesc'}!~/^\s*$/){
                $mywrt->startTag('mpgssunit', "id"=>"$$tmp{'aff_subsubunit'}");
                $mywrt->characters($$tmp{'aff_subsubunitdesc'});
                $mywrt->endTag('mpgssunit');
            }

            $mywrt->endTag('affiliation');
        }
    }



    # 1.2 Close the docaff container
    $mywrt->endTag('docaff');

    # 1.2a Start of MPG yearbook status...

    my $inyb = $yb->inYB( $res );
    if ( $inyb != 0
        &&  (
                ( $$res{'edoc_status'} eq 'Submitted' &&  grep /^\Q$$res{'grp'}\E$/, @WC ) ||
                ( ($$res{'edoc_status'} eq 'Submitted' || $$res{'edoc_status'} eq 'Released') &&  grep /^\Q$$res{'grp'}\E$/, @AC )
            )  ) {
        $mywrt->startTag('MPGyearbook', status=>$$res{'edoc_pubstatus'}==1?'Recommended':'Released');
        $mywrt->characters($inyb);
        $mywrt->endTag('MPGyearbook');
    }

    # 1.2a End of MPG yearbook status...


    # 1.3 Start the metametadata container
    $mywrt->startTag('metametadata');

    $mywrt->startTag('lastmodified');
    $mywrt->characters($$res{'metalastmodified'});
    $mywrt->endTag('lastmodified');


    if( !($flag && $flag==2)  && $self->_ck($$res{'userid'})){
        my $cuser=§MODPRE§::CORE::User->new(userid=>$$res{'userid'});
        $mywrt->startTag('owner');
            if($self->_ck($cuser->getFullname())){
                $mywrt->startTag('fullname');
                $mywrt->characters($cuser->getFullname());
                $mywrt->endTag('fullname');
            }
            if($self->_ck($cuser->getemail())){
                $mywrt->startTag('email');
                $mywrt->characters($cuser->getemail());
                $mywrt->endTag('email');
            }
            if($self->_ck($cuser->getmof())){
                $mywrt->startTag('insid');
                $mywrt->characters($cuser->getmof());
                $mywrt->endTag('insid');
            }
        $mywrt->endTag('owner');
    }


    # 1.3 Close the metametadata container
    $mywrt->endTag('metametadata');



    # 1.4 Start the rights container
    if ($self->_ck($$res{'copyright'})){
        $mywrt->startTag('rights');
            $mywrt->startTag('copyright');
            $mywrt->characters($$res{'copyright'});
            $mywrt->endTag('copyright');
        $mywrt->endTag('rights');
    }

    unless($flag){ # HACK TODO:
        $mywrt->endTag('record');
        $mywrt->endTag('metadata');
    }

    # 1 End the top container
    $mywrt->endTag('record');


}
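
The core of the 'basic' container above is the loop over @amap: the array fixes the element order in the export, and %map translates each element name into the key under which the value is stored in the record. The self-contained Perl sketch below replays that loop on a three-field subset without the XML writer object. The helpers has_value, escape_xml and basic_elements are stand-ins invented for this illustration; in particular, has_value only assumes that _ck means "defined and non-blank", which the listing does not confirm.

```perl
use strict;
use warnings;

# Subset of the full lists from the export method.
my @amap = ('genre', 'title', 'datepublished');
my %map  = (genre => 'type', title => 'title', datepublished => 'fyear');

# Assumption: stand-in for the _ck method, read here as "defined and non-blank".
sub has_value {
    my ($value) = @_;
    return defined $value && $value =~ /\S/ ? 1 : 0;
}

# Minimal escaping for element content; the real writer does this itself.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Emit one element per non-empty field, in @amap order.
sub basic_elements {
    my ($res) = @_;
    my $xml = '';
    for my $element (@amap) {
        my $value = $res->{ $map{$element} };
        next unless has_value($value);
        $xml .= "<$element>" . escape_xml($value) . "</$element>";
    }
    return $xml;
}

# fyear is empty, so no datepublished element is written.
print basic_elements({ title => 'A & B', type => 'Article', fyear => '' }), "\n";
```

Because the order comes from @amap rather than from the hash, the output order stays stable even though Perl hashes are unordered.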