Difference between revisions of "Talk:EDoc to PubMan migration"

From MPDLMediaWiki

Revision as of 08:05, 8 September 2008

Journal vocab: Documentation of Despoina's work

FIRST STAGE: PROCEDURE for cleaning NON-SFX JOURNAL NAMES.
Filter text: coalesce(sfxid,'')='' and rm=0 (i.e. entries without an SFX id that have not been classified yet).
Sorting: edoctitle ascending

  1. Run the filter and check each row (search for the entry in ZDB, on the Web, etc.). If the ZDB check is not successful, check the eDoc record and google the eDoc record title.
    1. For entries that are not journals: set rm=1
    2. For entries that appear to be a series (but not a journal): set rm=2
    3. For entries found neither via ZDB nor via a Google search for the eDoc title: set rm=3
    4. For entries that could not be found in eDoc either: set rm=5
  2. Replace journal abbreviations with the full journal name. Tip: if you are not certain, or if the journal title is itself an abbreviation, cross-check the abbreviation in ZDB.


(PLEASE NOTE: Titles starting with the word "Proceedings" were left out in this phase; they should be dealt with separately.)
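The first-stage worklist can be sketched in SQL as below. This is a minimal sketch, not the query actually used: the table name journals and the id column are hypothetical (only sfxid, rm and edoctitle come from the procedure), the filter is read as coalesce(sfxid,'')='' (sfxid null or empty), and the "Proceedings" exclusion from the note above is added as an assumption.

```sql
-- Hypothetical stand-in for the journal vocabulary table; only the columns
-- sfxid, rm and edoctitle are taken from the procedure, id is assumed.
CREATE TEMP TABLE journals (
    id        integer PRIMARY KEY,
    edoctitle text,
    sfxid     text,
    rm        integer NOT NULL DEFAULT 0
);

-- Worklist: entries without an SFX id (NULL or empty) that are still
-- unclassified (rm = 0), skipping "Proceedings..." titles for this phase.
SELECT id, edoctitle
FROM journals
WHERE coalesce(sfxid, '') = ''
  AND rm = 0
  AND edoctitle NOT LIKE 'Proceedings%'
ORDER BY edoctitle ASC;

-- Recording the outcome of a manual check (42 is a placeholder id):
-- 1 = not a journal, 2 = series, 3 = not found via ZDB or Google,
-- 5 = not found in eDoc.
UPDATE journals SET rm = 3 WHERE id = 42;
```

Because every checked row gets a non-zero rm, rerunning the same SELECT shows only the entries still to be checked.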

SECOND STAGE: PROCEDURE for merging existing journal entries
THIRD STAGE: Define "Ansetzungsregeln" (rules for the preferred form of entry)


Queries to match names

  • Placed on colab so as not to lose them
    • The query returns the possible authors (one row per surname part of uml_idx, up to and including the comma) and the number of entries for each.

Authors with more entries may have several variants of their first name.

select substr(p2.uml_idx,1,position(',' in p2.uml_idx)), count(*)
from people p2
where p2.col=73 and p2.rm is null and archivalgrp(p2.grp)=1
group by substr(p2.uml_idx,1,position(',' in p2.uml_idx))
order by 2 desc
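The grouping key substr(uml_idx,1,position(',' in uml_idx)) keeps everything up to and including the first comma. The sketch below runs that expression on a few made-up literals to show its behaviour; it assumes uml_idx looks like "surname,first name", which the page does not state.

```sql
-- Illustrative values only; the "surname,first name" shape of uml_idx
-- is an assumption.
SELECT uml_idx,
       substr(uml_idx, 1, position(',' in uml_idx)) AS surname_key
FROM (VALUES ('mueller,hans'),
             ('mueller,h.'),
             ('schmidt,anna'),
             ('nocomma')) AS t(uml_idx);
-- position() returns 0 when there is no comma, so substr(..., 1, 0)
-- yields '' and every comma-less value falls into one empty-key group.
```

If such values occur in people, adding position(',' in p2.uml_idx) > 0 to the where clause keeps them out of the counts instead of merging them into one large group.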


    • The query returns all MPG authors whose uml_idx surname part matches the criteria above, restricted to MPG people only.

select distinct p1.name, p1.fname, substr(p1.uml_idx,1,position(',' in p1.uml_idx))
from people p1
where substr(p1.uml_idx,1,position(',' in p1.uml_idx)) in (
    select substr(p2.uml_idx,1,position(',' in p2.uml_idx))
    from people p2
    where p2.col=73 and p2.rm is null and archivalgrp(p2.grp)=1
      and p2.mpgpeople=1
    group by substr(p2.uml_idx,1,position(',' in p2.uml_idx)) )
and p1.mpgpeople=1
and p1.rm is null
and p1.col=73 and archivalgrp(p1.grp)=1
order by 3, 1, 2 asc
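In the MPG-author query, the subquery applies the same filters as the outer query, so every row that passes the outer filters also contributes its own key to the IN list. As far as the query text shows (and assuming archivalgrp() returns the same value for the same grp), the IN test only removes rows whose uml_idx is NULL. The sketch below uses a hypothetical table people_sample with made-up rows. It shows that simplified form and adds a separate query, not part of the original page, that lists surname keys occurring with more than one first name. The archivalgrp(grp)=1 condition is omitted because that function exists only in the eDoc database.

```sql
-- Hypothetical sample table with made-up rows; only the column names
-- come from the original query.
CREATE TEMP TABLE people_sample (
    name      text,
    fname     text,
    uml_idx   text,
    col       integer,
    rm        integer,
    mpgpeople integer
);

INSERT INTO people_sample VALUES
    ('Mueller', 'Hans', 'mueller,hans', 73, NULL, 1),
    ('Mueller', 'H.',   'mueller,h.',   73, NULL, 1),
    ('Schmidt', 'Anna', 'schmidt,anna', 73, NULL, 0),  -- not MPG: excluded
    ('Weber',   'Karl', 'weber,karl',   73, 1,    1);  -- rm set: excluded

-- Simplified MPG-author list: the IN (...) subquery is dropped and
-- uml_idx IS NOT NULL reproduces the rows it filtered out.
-- Against the real people table, also add: AND archivalgrp(grp) = 1
SELECT DISTINCT name, fname,
       substr(uml_idx, 1, position(',' in uml_idx)) AS surname_key
FROM people_sample
WHERE col = 73
  AND rm IS NULL
  AND mpgpeople = 1
  AND uml_idx IS NOT NULL
ORDER BY 3, 1, 2;

-- Candidate name variants: surname keys with more than one first name.
SELECT substr(uml_idx, 1, position(',' in uml_idx)) AS surname_key,
       count(DISTINCT fname) AS fname_variants
FROM people_sample
WHERE col = 73 AND rm IS NULL AND mpgpeople = 1 AND uml_idx IS NOT NULL
GROUP BY 1
HAVING count(DISTINCT fname) > 1
ORDER BY 2 DESC;
```

With the sample rows, both Mueller entries share the key "mueller," and would be reviewed together as possible variants of one author.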