PubMan OA Statistics

This page collects the OA Team's requirements for statistical data from eDoc and PubMan.

=Requirements for the current eDoc Statistics=
The aim is to measure the Open Access performance of the MPS in relation to the eDoc institutional repository.

Items to be considered:

 * With status "Released"
 * With status "Yearbook"

Dimensions
The statistics should allow breakdowns along the following dimensions:
 * Time
   * System (Released)
   * First Published (Online or Print)
 * Organization
   * MPS
   * Institutes
   * Section within MPS

Report format
For the monthly statistics report sent by eDoc, the following files should be generated in CSV format:
 * 1) One file for released items per month
 * 2) One file for released items per calendar year accumulated
 * 3) One file for yearbook items per month
 * 4) One file for yearbook items per calendar year accumulated

Released items per month

 * Month_for_statistic;MPG;System;Published;MetaData_Only;One_file_attached;One_file_OA_attached
 * Month_for_statistic;MPI_ID1;MPI_NAME1;System;Published;MetaData_Only;One_file_attached;One_file_OA_attached
 * Month_for_statistic;MPI_IDn;MPI_NAMEn;System;Published;MetaData_Only;One_file_attached;One_file_OA_attached
 * Month_for_statistic;MPI_IDn;MPI_NAMEn;System;Published;MetaData_Only;One_file_attached;One_file_OA_attached

Released items per calendar year accumulated

 * Year_for_statistic;MPG;System;Published;MetaData_Only;One_file_attached;One_file_OA_attached
 * Year_for_statistic;MPI_ID1;MPI_NAME1;System;Published;MetaData_Only;One_file_attached;One_file_OA_attached
 * Year_for_statistic;MPI_IDn;MPI_NAMEn;System;Published;MetaData_Only;One_file_attached;One_file_OA_attached
 * Year_for_statistic;MPI_IDn;MPI_NAMEn;System;Published;MetaData_Only;One_file_attached;One_file_OA_attached

Yearbook items per month

 * Month_for_statistic;MPG;Yearbook Year;MetaData_Only;One_file_attached;One_file_OA_attached
 * Month_for_statistic;MPI_ID1;MPI_NAME1;Yearbook Year;MetaData_Only;One_file_attached;One_file_OA_attached
 * Month_for_statistic;MPI_IDn;MPI_NAMEn;Yearbook Year;MetaData_Only;One_file_attached;One_file_OA_attached
 * Month_for_statistic;MPI_IDn;MPI_NAMEn;Yearbook Year;MetaData_Only;One_file_attached;One_file_OA_attached

Yearbook items per calendar year accumulated

 * Year_for_statistic;MPG;Yearbook Year;MetaData_Only;One_file_attached;One_file_OA_attached
 * Year_for_statistic;MPI_ID1;MPI_NAME1;Yearbook Year;MetaData_Only;One_file_attached;One_file_OA_attached
 * Year_for_statistic;MPI_IDn;MPI_NAMEn;Yearbook Year;MetaData_Only;One_file_attached;One_file_OA_attached
 * Year_for_statistic;MPI_IDn;MPI_NAMEn;Yearbook Year;MetaData_Only;One_file_attached;One_file_OA_attached

(";" used as separator here)
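A minimal sketch of how the "released items per month" layout above could be written with ";" as the separator. The `ItemCounts` record and the `released_per_month_csv` function are hypothetical names, and the sketch assumes that System and Published are counts by system date and by first-published date; this page does not define how those counts are derived, so they are taken as precomputed input.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ItemCounts:
    """Precomputed counts for one organisation and one month (hypothetical record)."""
    system: int                # assumed: items counted by system (release) date
    published: int             # assumed: items counted by first-published date
    metadata_only: int
    one_file_attached: int
    one_file_oa_attached: int


def released_per_month_csv(month, mpg, institutes):
    """Return the report as ';'-separated text.

    mpg is the society-wide ItemCounts; institutes maps
    (mpi_id, mpi_name) -> ItemCounts.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")

    def tail(c):
        return [c.system, c.published, c.metadata_only,
                c.one_file_attached, c.one_file_oa_attached]

    # First row: society-wide totals, labelled "MPG".
    writer.writerow([month, "MPG"] + tail(mpg))
    # Following rows: one per institute, with ID and name.
    for (mpi_id, mpi_name), counts in institutes.items():
        writer.writerow([month, mpi_id, mpi_name] + tail(counts))
    return buf.getvalue()
```

The yearly and yearbook files would differ only in the leading column and in the columns between the organisation and `MetaData_Only`.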

Additionally, the number of item views and fulltext views per month and per year shall be provided.

=Further Requirements=
The current specification of the statistics for PubMan can be found under PubMan statistics.

Hit list of OA Authors

 * The number of metadata records, fulltext files and open access files per author is needed. From this information a ranking list should be created.
 * Like:
 * Malte is an author of 20 articles, i.e. 0.8% of the records within the IR which is 15% of the MPI
 * Nicole is an author of 5 articles, i.e. 0.15% of the records within the IR which is 8% of the MPI
 * Anja is an author of 10 articles, i.e. 0.5% of the records within the IR which is 10% of the MPI
 * or:
 * Ulla is an author of 11 articles, i.e. 0.55% of the full text files in the IR which is 12% of the MPI
 * Inga is an author of 9 articles, i.e. 0.45% of the full text files in the IR which is 9% of the MPI
 * or:
 * Inga is an author of 7 articles, i.e. 0.23% of the OA full text files in the IR which is 5% of the MPI
 * Kristina is an author of 18 articles, i.e. 0.76% of the OA full text files in the IR which is 14% of the MPI
 * or:
 * 30% of Inga's publications are full text files
 * 20% of Inga's publications are OA full text files
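The record-based variant of the ranking above could be computed roughly as follows. This is only a sketch: `author_ranking` and the `institute_of` lookup are hypothetical, each author is assumed to belong to exactly one institute, and author names are assumed to be unique already.

```python
from collections import Counter


def author_ranking(records, institute_of):
    """Rank authors by number of records and report their shares.

    records: list of author-name lists, one list per item in the IR.
    institute_of: maps author name -> institute name (hypothetical lookup).
    Returns (author, count, pct_of_ir, pct_of_institute) tuples,
    highest count first, ties ordered by name.
    """
    per_author = Counter()
    per_institute = Counter()
    for authors in records:
        unique = set(authors)
        per_author.update(unique)
        # An item counts once per institute, even with several authors from it.
        per_institute.update({institute_of[a] for a in unique})

    total = len(records)
    ranking = []
    for author, n in sorted(per_author.items(), key=lambda kv: (-kv[1], kv[0])):
        inst_total = per_institute[institute_of[author]]
        ranking.append((author, n,
                        round(100 * n / total, 2),
                        round(100 * n / inst_total, 2)))
    return ranking
```

Because a co-authored item counts once for each of its authors, the percentages in such a ranking need not add up to 100. The full text and OA full text variants would use the same counting on the corresponding subset of items.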

Conditions for this need

 * unique person names
 * Status: PubMan supports unique person names via CoNE. eDoc does not uniquely identify creators.--Ulla 11:15, 12 August 2009 (UTC)
 * ToDo for eDoc
 * Go through all eDoc creators and identify duplicate entries, which will be treated as one author. For this, every creator in eDoc will get a unique ID, so that the IDs belonging to the same person can be combined in a query. This might also help with the migration from eDoc to PubMan.
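As an illustration of the ToDo above, candidate duplicates could be grouped by a normalized name key, with one ID per group. The normalization rule below (strip accents, lowercase, drop periods, collapse whitespace) and the `creator-N` ID scheme are assumptions made for this sketch, not an eDoc rule; such a heuristic can merge different persons with similar names, so its output would still need manual review.

```python
import unicodedata


def normalize_name(name):
    """Reduce a creator name to a comparison key (assumed heuristic)."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks so that "ü" and "u" compare equal.
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, remove periods, collapse runs of whitespace.
    return " ".join(no_accents.lower().replace(".", "").split())


def assign_creator_ids(names):
    """Map every name variant to one ID per distinct normalized key."""
    ids_by_key = {}
    id_of_name = {}
    for name in names:
        key = normalize_name(name)
        if key not in ids_by_key:
            ids_by_key[key] = f"creator-{len(ids_by_key) + 1}"
        id_of_name[name] = ids_by_key[key]
    return id_of_name
```

A query for one person would then select all items whose creator maps to the same ID.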

Questions

 * Is there any difference between authors and other types of creators?
 * Is it a problem that the percentages won't sum to 100?
 * Is the institute/collection relevant for these requests, e.g. Malte is an author of 8% of the MPDL records within the IR? Answer from Anja: Yes, we are interested (see additions to Nicole's question list above).

Usage of Journals per institute

 * There is a need for an overview of the journals in which a given institute frequently publishes.
 * Like:
 * MPDL has 5 articles in Bibliotheksdienst
 * MPDL has 3 articles in B.I.T. online
 * Or:
 * Christoph has 5 articles in Bibliotheksdienst
 * Nicole has 2 articles in B.I.T. online
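The per-institute overview above amounts to a grouped count. A minimal sketch, assuming each article is available as an (institute, journal) pair with journals already uniquely identified; `journals_per_institute` is a hypothetical name:

```python
from collections import Counter, defaultdict


def journals_per_institute(items):
    """Count articles per journal for each institute.

    items: iterable of (institute, journal_id) pairs, one per article.
    Returns {institute: [(journal_id, count), ...]}, most used journal first.
    """
    usage = defaultdict(Counter)
    for institute, journal in items:
        usage[institute][journal] += 1
    return {inst: counts.most_common() for inst, counts in usage.items()}
```

Passing (author, journal) pairs instead gives the per-person variant.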

Conditions for this need

 * controlled list of journals
 * Status: In PubMan, journal names are part of CoNE, i.e. a controlled list. eDoc currently does not uniquely identify journals.--Ulla 11:20, 12 August 2009 (UTC)
 * ToDo for eDoc
 * Go through all journal titles in eDoc and identify duplicate entries, which will be treated as one journal. For this, every journal in eDoc will get a unique ID, so that the IDs belonging to the same journal can be combined in a query. This might also help with the migration from eDoc to PubMan.

Question
Is there also a need for statistics on journals and their financial model, i.e. complete OA, open choice, subscription only, ...?

If not, what is the reason for requesting a rights statement to be added to the journal file, see Talk:ControlledVocab (step 3)?
 * --Ulla 11:20, 12 August 2009 (UTC): What information is needed? Whether specific articles are open access? Whether the journal supports OA publishing? Or whether the publisher supports OA publishing (for all articles? for some articles? via an author-pays model, i.e. MPG specifics)?

RSS feeds
Use personalized queries to feed RSS (e.g. for personal statistics)
 * --Ulla 11:21, 12 August 2009 (UTC): RSS feeds are available. We plan to provide an RSS feed for a specific person via the Researcher Portfolio ("Subscribe to RSS for this person").

IP-based statistics
IP-based statistical analysis

OA Golden Way
Add "OA Golden" to the license string of each item, in order to determine the number of items that are OA via the Golden Way.
 * This can be done with local tags. Problem: does the depositor/moderator know whether a given article is "OA golden"? And what is the statistical relevance of knowing that xx items are OA golden, given that we still do not know how many items could potentially be OA golden?--Ulla 11:24, 12 August 2009 (UTC)