PubMan Func Spec Statistics

Scenarios

Basic item statistics

The user wants to see the download numbers of a specific publication as part of the item's detailed description. He wants to distinguish between downloads by users who are not logged in (anonymous users) and downloads by logged-in users.

Interpretations/Analysis

  • Google/Google Scholar

The user wants to understand the visibility of his research. He wants to know how often his article is accessed via a hit in a Google/Google Scholar search.

  • geographical distribution

The user wants to understand the geographical distribution of research interest.

  • domain statistics

The user wants to understand the background of the users accessing his research and would like to know the domain they are coming from, such as .com, .edu, or .gov.

  • institutional statistics

The user wants to know whether his publication is accessed by colleagues from the same or neighbouring departments of his institution.
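The Google/Google Scholar and domain scenarios above could be approximated from access-log data. The following is a minimal sketch only: it assumes that the referrer URL and a resolved client host name are available for each access, and the function names `referrer_source`, `top_level_domain` and `domain_statistics` are hypothetical. Matching on a "google" host label is a simple heuristic, not a reliable classification.

```python
from collections import Counter
from urllib.parse import urlparse


def referrer_source(referrer_url):
    """Classify a referrer URL as 'google scholar', 'google' or 'other'.

    Heuristic (assumption): any host with a 'google' label counts as Google,
    and a leading 'scholar' label marks Google Scholar.
    """
    host = (urlparse(referrer_url).hostname or "").lower()
    labels = host.split(".")
    if "google" not in labels:
        return "other"
    return "google scholar" if labels[0] == "scholar" else "google"


def top_level_domain(hostname):
    """Return the top-level domain of a host name, e.g. '.edu'."""
    return "." + hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def domain_statistics(hostnames):
    """Count accesses per top-level domain."""
    return Counter(top_level_domain(h) for h in hostnames)
```

For example, `domain_statistics(["cs.example.edu", "lib.example.edu", "agency.example.gov"])` counts two accesses for `.edu` and one for `.gov`.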

Visualisations

The user wants statistical data and its analyses/interpretations to be adequately visualised by graphs, timelines, diagrams and geographical maps.

E.g. Google Chart API

Reports

The user wants to generate statistical reports, such as:

Current coverage in the repository

  • The author wants to have an overview of all his records deposited and released
  • The institution wants to have an overview of all records of a specific department deposited and released

Open Access

  • The author wants to have an overview of all his records with at least one OA component
  • The author wants to have an overview of the number of OA components
  • The institution wants to have an overview of the number of OA components per department
  • see Requirements for OA Statistics

Functional Specification

Basic item/component statistics

Status: implemented (see Use case view item statistics)

Schedule: R3

  • Numbers of retrievals for a specific item from the framework by users (anonymous/all).
    • Visibility: public
  • Numbers of file downloads for a specific item by users (anonymous/all).
    • Visibility: public

Downloads of files with content type “copyright transfer agreement” and “correspondence” are not counted.

  • Numbers of downloads for a specific file by users (anonymous/all).
    • Visibility: public
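The counting rule above can be illustrated with a minimal sketch. It assumes a hypothetical `DownloadEvent` record with a content type and a logged-in flag; the actual data model of the statistics service is not described here, so all names are illustrative.

```python
from dataclasses import dataclass

# Content types whose downloads are not counted, per the rule above.
EXCLUDED_CONTENT_TYPES = {"copyright transfer agreement", "correspondence"}


@dataclass(frozen=True)
class DownloadEvent:
    """Hypothetical record of one file download (assumed layout)."""
    item_id: str
    file_id: str
    content_type: str
    logged_in: bool


def count_file_downloads(events):
    """Return {file_id: {"anonymous": n, "all": m}} for counted downloads."""
    totals = {}
    for ev in events:
        if ev.content_type.lower() in EXCLUDED_CONTENT_TYPES:
            continue  # excluded content types never contribute to the numbers
        counts = totals.setdefault(ev.file_id, {"anonymous": 0, "all": 0})
        counts["all"] += 1
        if not ev.logged_in:
            counts["anonymous"] += 1
    return totals
```

Item-level download numbers could be derived the same way by keying on `item_id` instead of `file_id`.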

Reports by Advanced Search

Status: in design (See Use case Advanced Search)

Schedule: R4.0/tbd

Statistical reports can be generated by running an advanced search (on demand or saved queries). Results are delivered as item lists.

Search results can be either displayed or exported as PDF, XML, or layout format (current Standard Export).

For an upcoming release (needs scheduling), CSV will be added to Standard Export and Search&Export. The CSV columns still have to be defined, e.g. for multiple authors and for org units.
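One possible way to handle multi-valued fields in a CSV export is sketched below. Since the columns are not yet defined, the column names, the `"; "` separator and the item dictionary layout are all assumptions made purely for illustration.

```python
import csv
import io

# Hypothetical column set; the actual CSV columns are still to be defined.
CSV_COLUMNS = ["item_id", "title", "authors", "org_units"]
# Assumed separator for multi-valued cells (multiple authors, org units).
MULTI_VALUE_SEPARATOR = "; "


def items_to_csv(items):
    """Serialise an item list to CSV text, one row per item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow({
            "item_id": item["item_id"],
            "title": item["title"],
            # Multi-valued fields are joined into a single cell.
            "authors": MULTI_VALUE_SEPARATOR.join(item.get("authors", [])),
            "org_units": MULTI_VALUE_SEPARATOR.join(item.get("org_units", [])),
        })
    return buffer.getvalue()
```

Joining values into one cell keeps one row per item, which matches the item-list nature of the results. An alternative would be one row per author, at the cost of repeating the item data.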

See the use case for details on indexing and operators for defining the query.

Reports by Administrative Search

Status: in specification (See Use Case Administrative Search)

Schedule: tbd

Statistical reports can be generated by running an administrative search (on demand or saved queries). Results are delivered as item lists.

Search results can be either displayed or exported as PDF, CSV, XML, or layout format. (Standard Export and Search&Export have to be extended with CSV. The CSV columns have to be defined, e.g. for multiple authors and for org units.)

See the use case for details on indexing and operators for defining the query.

Reports by filtering workspace

Status: implemented

Schedule: R3.8

  • only visible to the Depositor
  • all of the user's items that are pending/submitted/released/withdrawn/in rework
  • exports possible (incl. XML with R4.0)

Reports by extending statistical service

Status: in design

Schedule: tbd

  • track events and data per event:
    • events: every retrieval (i.e. retrieval of the item's detailed page), every submission, every release, every withdrawal of an item, every "fetch Metadata" (incl. source), the number of items exported in a specific context, every retrieval of a researcher portfolio
    • relevant for Faces: see Faces Statistics
    • data per event: who (IP address, needed to determine the origin; the last digits are erased), context of item, itemID, affiliations (and all parents) of itemID, solutionID, status, date of event, logged-in/not logged-in user (including their OU/institute), item has OA component/not.
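The per-event data listed above, including the truncated IP address, might be captured along the following lines. This is a sketch under stated assumptions: "erase last digits" is interpreted here as zeroing the last octet of an IPv4 address (a /24 prefix) and keeping a /48 prefix for IPv6, and the function names and dictionary keys are hypothetical.

```python
import ipaddress
from datetime import datetime, timezone


def anonymize_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host part of an IP address (prefix lengths are assumptions)."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def build_event(event_type, ip, item_id, context, affiliations, solution_id,
                status, logged_in, user_ou, has_oa_component, when=None):
    """Assemble one event record; the key names are illustrative only."""
    when = when or datetime.now(timezone.utc)
    return {
        "type": event_type,                  # e.g. retrieval, submission, release
        "who": anonymize_ip(ip),             # origin only, last digits erased
        "item_id": item_id,
        "context": context,
        "affiliations": list(affiliations),  # org units incl. all parents
        "solution_id": solution_id,
        "status": status,
        "date": when.isoformat(),
        "logged_in": logged_in,
        "user_ou": user_ou,                  # OU/institute of a logged-in user
        "has_oa_component": has_oa_component,
    }
```

Truncating the address at write time means the full IP is never stored, while the remaining prefix is usually still sufficient to derive a rough origin.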


  • Visualization of time lines
    • item submissions/releases of org unit over time
    • access to specific item over time
    • growth of OA components in org unit over time
    • downloads of files per OU and per context
    • access to files by non-MPS users, by MPS users (which institute) and by users from a specific institute --Nicole 10:37, 18 December 2008 (UTC)
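The time lines listed above all reduce to counting tracked events per time bucket. A minimal sketch, assuming hypothetical event dictionaries with `type`, `date` and `affiliations` keys and a monthly granularity:

```python
from collections import Counter
from datetime import date


def monthly_counts(events, event_type, org_unit):
    """Count events of one type per month for an org unit.

    An event belongs to an org unit if the unit appears in the event's
    affiliations (which are assumed to include all parent units).
    """
    counts = Counter()
    for ev in events:
        if ev["type"] == event_type and org_unit in ev["affiliations"]:
            counts[ev["date"].strftime("%Y-%m")] += 1
    # Sort by month so the result can be plotted directly as a time line.
    return dict(sorted(counts.items()))
```

Because affiliations include all parents, the same function would serve both a single department and a whole institute.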

Web statistics/web log

Status: in specification

Schedule: tbd

  • To be checked whether AWStats or Piwik is suitable.
  • Piwik constraints: no authors, contexts, dates. Server needed.


Future developments

  • Co-authoring
    • The author wants to understand how many of his papers are co-authored.
    • The author wants to understand how many of his papers are co-authored with members of Institution X.
  • Cross-repository
    • The author wants to know the number of items related to his name across the collection/contexts in the repository.
  • Harvesting statistics
    • The public user wants to get an overview of the reception of a certain preprint, which is not limited to the local repository. He would like to see download statistics for the preprint, obtained by summing the download numbers of all copies located in distributed repositories.
  • Citation metrics/Research evaluation
  • Private statistics
    • Access to some statistical information might be restricted to the author himself or to administrative staff.
  • OA Statistics
    • The OA policy department would like to prepare a statistical report on the increase of OA publications in the last 2 years, with details at monthly level, for the MPS. (See more scenarios under OA statistics)
  • Admin User statistics
    • how often the easy, advanced and admin searches were used
    • how many logins per day/week/month
    • frequently used search terms