Talk:Control of Named Entities/eDoc HowTo
Latest revision as of 12:19, 30 September 2008
Journal vocab: Documentation of Despoina's work
FIRST STAGE: Procedure for cleaning non-SFX journal names
Filter text: coalesce(sfxid,'')='' and rm=0
Sorting: edoctitle ascending
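For readers less familiar with `coalesce`, the filter and sort order above can be mirrored in Python on in-memory rows. This is a minimal sketch, assuming each row is a dict with the columns `sfxid`, `rm` and `edoctitle`; the `JournalRow` type and the `non_sfx_candidates` function are hypothetical names, not part of the eDoc database.

```python
from typing import List, Optional, TypedDict


class JournalRow(TypedDict):
    # Hypothetical in-memory shape of a journal vocabulary row; only the
    # column names edoctitle, sfxid and rm come from the filter text.
    edoctitle: str
    sfxid: Optional[str]
    rm: int


def non_sfx_candidates(rows: List[JournalRow]) -> List[JournalRow]:
    """Rows matching coalesce(sfxid,'')='' and rm=0, sorted by edoctitle ascending."""
    selected = [
        row
        for row in rows
        # coalesce(sfxid,'') = '' is true both for NULL (None) and for empty SFX ids
        if (row["sfxid"] if row["sfxid"] is not None else "") == "" and row["rm"] == 0
    ]
    # Python compares code points; the database collation may order
    # upper/lower case and accented titles differently.
    return sorted(selected, key=lambda row: row["edoctitle"])
```

The point of `coalesce` here is that a missing SFX id may be stored either as NULL or as an empty string, and both should be treated as "no SFX match".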
- Run the filter and check each row (search for the entry in ZDB, on the web, etc.). If the ZDB check is not successful, check the eDoc record and search Google for the eDoc record title.
- For entries which are not journals: set rm=1
- For entries which appear to be a series (but not a journal): set rm=2
- For entries which were found neither via ZDB nor via a Google search for the eDoc title: set rm=3
- For entries which could not be found in eDoc either: set rm=5
- Replace journal abbreviations with the full journal name. TIP (synchronization): if you are not certain, or if the journal title is an abbreviation, check the journal abbreviation in ZDB.
(PLEASE NOTE: Titles starting with the word "Proceedings" were left out in this phase; they should be dealt with at a later stage.)
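The rm codes used in this stage can be summarized as a small lookup. This is a hypothetical sketch: the `RmCode` names and the `assign_rm` helper are illustrative, and the order in which the checks are applied is an assumption, since the procedure above only lists the codes and not their precedence.

```python
from enum import IntEnum


class RmCode(IntEnum):
    """rm values used in the first stage (member names are illustrative)."""
    KEEP = 0                     # still a candidate journal; the filter selects rm=0
    NOT_A_JOURNAL = 1
    SERIES = 2                   # appears to be a series, but not a journal
    NOT_FOUND_ZDB_OR_GOOGLE = 3
    NOT_FOUND_IN_EDOC = 5


def assign_rm(found_via_zdb_or_google: bool, found_in_edoc: bool,
              is_journal: bool = True, is_series: bool = False) -> RmCode:
    """Map the outcome of the manual checks to an rm value (assumed precedence)."""
    if not found_via_zdb_or_google:
        # Assumed reading: rm=5 if the entry is not found in eDoc either, else rm=3
        if not found_in_edoc:
            return RmCode.NOT_FOUND_IN_EDOC
        return RmCode.NOT_FOUND_ZDB_OR_GOOGLE
    if is_series:
        # Checked before is_journal so that series get rm=2 rather than rm=1
        return RmCode.SERIES
    if not is_journal:
        return RmCode.NOT_A_JOURNAL
    return RmCode.KEEP
```

Because `IntEnum` members compare equal to plain integers, `assign_rm(False, True) == 3` holds, so the values can be written straight back to the `rm` column.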
SECOND STAGE: Procedure for merging existing journal entries
THIRD STAGE: Define "Ansetzungsregeln" (rules for the form of entry)
- We will check the ZDB, EZB and ISSN Registry for their "Ansetzungsregeln".
ISSN Registry
As the ISSN Registry uses MARC21/UNIMARC, I suppose it also follows AACR, so we should probably check AACR as well.
ZDB cataloging rules
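- Hauptsachtitel (title proper): http://support.d-nb.de/iltis/katricht/zdb/4000.pdf
- Ansetzungssachtitel (uniform title): http://support.d-nb.de/iltis/katricht/zdb/3220.pdf
- Titel von Unterreihen fortlaufender Sammelwerke (titles of subseries of serials): http://support.d-nb.de/iltis/katricht/zdb/4005.pdf
- Nebentitel (other titles): http://support.d-nb.de/iltis/katricht/zdb/4212.pdf
- Haupt- und Nebensachtiteln und den Zusätzen (main and other titles and their additions): http://support.d-nb.de/iltis/katricht/zdb/4213.pdf
- Sachtitelformen für Nebeneintragungen (title forms for added entries): http://support.d-nb.de/iltis/katricht/zdb/3260.pdf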
Queries to match names
- The queries are placed here on colab so that they don't get lost.
- The following query returns the possible authors (grouped by the surname part of uml_idx, i.e. everything up to and including the first comma) and the number of entries for each.
Those with more entries may have different name variants (for first names).
select substr(p2.uml_idx, 1, position(',' in p2.uml_idx)),  -- surname part, up to the first comma
       count(*)
from people p2
where p2.col = 73
  and p2.rm is null
  and archivalgrp(p2.grp) = 1
group by substr(p2.uml_idx, 1, position(',' in p2.uml_idx))
order by 2 desc  -- most entries first
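The grouping logic of this query can be mirrored in Python to see exactly what the surname key is. This is a sketch under assumptions: the `people` rows are hypothetical in-memory dicts, and `is_archival` is a stand-in for the database function `archivalgrp(grp) = 1`, whose definition is not given here.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple, TypedDict


class Person(TypedDict):
    # Hypothetical in-memory copy of the people columns the query uses
    uml_idx: str
    col: int
    rm: Optional[int]
    grp: int


def surname_key(uml_idx: str) -> str:
    """Mirror of substr(uml_idx, 1, position(',' in uml_idx)).

    position() is 1-based and returns 0 when there is no comma, so the key is
    everything up to and including the first comma, or '' if there is none.
    """
    return uml_idx[: uml_idx.find(",") + 1]


def count_by_surname(people: Iterable[Person],
                     is_archival: Callable[[int], bool]) -> List[Tuple[str, int]]:
    # where col = 73 and rm is null and archivalgrp(grp) = 1, grouped by surname key
    counts = Counter(
        surname_key(p["uml_idx"])
        for p in people
        if p["col"] == 73 and p["rm"] is None and is_archival(p["grp"])
    )
    # order by 2 desc: surnames with the most entries first
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

Since the key keeps the trailing comma, "mueller,hans" and "mueller,h" both fall under "mueller,", which is exactly why a high count hints at first-name variants.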
- The following query returns all MPG authors whose surname part of uml_idx matches the criteria above, restricted to MPG people only (mpgpeople=1).
select distinct p1.name, p1.fname,
       substr(p1.uml_idx, 1, position(',' in p1.uml_idx))
from people p1
where substr(p1.uml_idx, 1, position(',' in p1.uml_idx)) in (
        select substr(p2.uml_idx, 1, position(',' in p2.uml_idx))
        from people p2
        where p2.col = 73
          and p2.rm is null
          and archivalgrp(p2.grp) = 1
          and p2.mpgpeople = 1
        group by substr(p2.uml_idx, 1, position(',' in p2.uml_idx))
      )
  and p1.mpgpeople = 1
  and p1.rm is null
  and p1.col = 73
  and archivalgrp(p1.grp) = 1
order by 3, 1, 2 asc
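The same kind of Python mirror helps to check what this second query returns. Again a sketch under assumptions: the rows and the `Person` type are hypothetical, `is_archival` stands in for `archivalgrp(grp) = 1`, and NULL is modelled as `None`.

```python
from typing import Callable, List, Optional, Set, Tuple, TypedDict


class Person(TypedDict):
    # Hypothetical in-memory copy of the people columns the query uses
    name: str
    fname: str
    uml_idx: Optional[str]
    col: int
    rm: Optional[int]
    grp: int
    mpgpeople: int


def surname_key(uml_idx: Optional[str]) -> Optional[str]:
    """substr(uml_idx, 1, position(',' in uml_idx)); NULL stays NULL."""
    if uml_idx is None:
        return None
    return uml_idx[: uml_idx.find(",") + 1]


def mpg_authors_by_surname(people: List[Person],
                           is_archival: Callable[[int], bool]) -> List[Tuple[str, str, str]]:
    def eligible(p: Person) -> bool:
        # Filters shared by the outer query and the subquery
        return (p["col"] == 73 and p["rm"] is None
                and is_archival(p["grp"]) and p["mpgpeople"] == 1)

    # Subquery: surname keys of eligible MPG people (a NULL key never matches IN)
    keys: Set[str] = set()
    for p in people:
        key = surname_key(p["uml_idx"])
        if eligible(p) and key is not None:
            keys.add(key)

    # Outer query: select distinct name, fname, key ... where key in (subquery)
    rows: Set[Tuple[str, str, str]] = set()
    for p in people:
        key = surname_key(p["uml_idx"])
        if eligible(p) and key is not None and key in keys:
            rows.add((p["name"], p["fname"], key))

    # order by 3, 1, 2 asc
    return sorted(rows, key=lambda r: (r[2], r[0], r[1]))
```

Written out this way, it becomes visible that the subquery applies the same filters as the outer query, so as written the IN condition only removes rows whose uml_idx is NULL. If the intent was to list only surnames with several entries, the subquery would presumably need an additional `having count(*) > 1`; that is an assumption about intent, not something stated above.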