MPDL IT Infrastructure/Planning

MPDL

=Estimates=

PubMan

 * Number of Users Estimate in 12/24 months
 * Number of Items Estimate in 12/24 months
 * Number of Institutes in 12/24 months
 * Size of Publications in 12/24 months
 * Size of Supp-Material in 12/24 months

Total estimate based on migration data plus possible growth of 20% per year: ca. 3 TB over the next 3 years; * 2 (backup) = 6 TB
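The growth arithmetic above can be sketched as a small helper; the 20% annual rate, 3-year horizon, and single backup copy come from the figures on this page, while the function name and the sample base sizes are illustrative:

```python
def storage_estimate_tb(base_tb: float, annual_growth: float, years: int,
                        backup_copies: int = 1) -> float:
    """Projected storage: compound annual growth, then add full backup copies."""
    grown_tb = base_tb * (1 + annual_growth) ** years
    return grown_tb * (1 + backup_copies)

# The 3 TB figure above already folds growth into the total,
# so only the backup copy doubles it:
print(storage_estimate_tb(3.0, 0.0, 3))  # 6.0
```

Starting from a smaller migration base instead, 20% compound growth over 3 years multiplies the base by 1.2^3 ≈ 1.73 before doubling for backup.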

VIRR

 * Number of Images in 12/24 months
 * Size of Image Collection in 12/24 months

Numbers from the project proposal.
 * Known limitation: high-resolution scans will not be managed in the repository.
 * MPIeR
 ** 17,000 pages VIRR
 ** 500,000 pages legal journals
 ** 17,000 pages Discursos
 * KHI Florenz
 ** 52,500 pages Giglio project
 ** 40,000 pages Translatio Nummorum
 ** 7,500 volumes Rara (no. of pages? standard 200 pages/book) => 1,500,000 pages
 ** 32,000 pages art journals
 * Bibliotheca Hertziana
 ** 7,000 pages Lineamenta (up to 750 MB high-resolution scans)
 ** 386 maps CIPRO - calculated as 386 pages (?)
 ** 274 other digitized resources (Romguiden, Romveduten, Rom: Kirchen, Rom: Profanbauten, Antike Kunst, Sonstige Tafelwerke) (no. of pages: standard 200 pages/resource) => 54,800 pages
 * MPIB
 ** 41,000 pages institute series
 ** 180,000 pages ZFN

Original size to calculate: 2 MB per page for all resolutions (new info).
 * Sum of known pages: 886,500 => 200 KB web resolution, 500 MB (?), 30 KB thumbnail => 170 GB; 0.16 TB
 * + ca. 422 TB (with a 500 MB average size for all listed objects; see the Lineamenta remark above)
 * Unknown numbers: 7,500 volumes Rara, 386 maps CIPRO, 274 other digitized resources

Taken for calculation: ca. 200 scans per book (volume)

Sum of known/estimated no. of pages: 886,500 + 1,500,000 + 54,800 + 386 = 2,441,686 pages. Storage required for pages: 2,441,686 * 2 MB ≈ 4.7 TB; * 2 (backup) ≈ 9.3 TB.
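As a sanity check, the page sum and storage can be recomputed directly (274 resources at the standard 200 pages each gives 54,800 pages; sizes in MB, binary units with 1 TB = 1024² MB):

```python
TB = 1024 ** 2  # MB per TB (binary units)

pages = {
    "known pages listed above": 886_500,
    "Rara volumes (7,500 * 200 pages)": 7_500 * 200,
    "other digitized resources (274 * 200 pages)": 274 * 200,  # = 54,800
    "CIPRO maps (1 page each)": 386,
}
total_pages = sum(pages.values())
storage_tb = total_pages * 2 / TB   # 2 MB per page, all resolutions
print(total_pages)                  # 2441686
print(round(storage_tb, 1))         # 4.7
print(round(storage_tb * 2, 1))     # 9.3 incl. backup
```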

IMEJI

 * Number of Images in 12/24 months
 * Size of Images in 12/24 months

Highly unpredictable numbers:
 * setting to 500,000 images for the estimate
 * average size calculated for originals: 1 MB -> 500 GB
 * average size calculated for web resolution + thumbnails: 330 KB -> 170 GB
 * estimate in total: max 1 TB * 2 (backup) = 2 TB
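The IMEJI figures reduce to straight multiplication (decimal units here, matching the round numbers above; the exact 165 GB derived-format result is rounded up to 170 GB in the list):

```python
images = 500_000
original_gb = images * 1 / 1000          # 1 MB per original image
derived_gb = images * 330 / 1_000_000    # 330 KB web resolution + thumbnail
print(original_gb)               # 500.0
print(derived_gb)                # 165.0 (listed as ~170 GB above)
print(original_gb + derived_gb)  # 665.0 -> budget max 1 TB, 2 TB with backup
```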

DARIAH

 * Not known
 * start with 500GB storage

AWOB

 * Number of projects in 24 months
 * Size of Objects in 24 months

Highly unpredictable numbers:
 * set to 5 projects
 * set to 200 GB per project
 * first run: 1 TB in total + backup = 2 TB

List of Services
=Additional Information=

Capacity planning

 * in a virtualized environment, the most important factor is (computing) processing power
 * from my experience with web applications (under moderate load), CPU is almost never an issue; memory and I/O, on the other hand, may be. Robert 08:24, 10 November 2010 (UTC)


 * how much is required by applications
 * how much we currently have available
 * how to distribute the load in a virtual environment
 * allocation of "room" for additional compute requirements or workloads


 * current practice is heavy reliance on the instincts of the IT staff
 * there are tools to estimate this - however, most of them are expensive

Must do

 * Inventory
 ** processing
 *** workload per application (for each application) - additionally identify peaks
 *** memory (how much is needed)
 *** local storage (how much is needed)
 *** network I/O bandwidth
 ** physical
 *** number of processors
 *** number of cores
 *** processor speed
 *** physical memory available
 *** I/O capabilities of the server
 *** number of network interface cards
 *** number of storage interface cards
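For illustration, the inventory checklist above could be captured as one record per physical server; the field names and sample values below are assumptions for the sketch, not an existing MPDL schema:

```python
from dataclasses import dataclass, field

@dataclass
class ServerInventory:
    """One inventory record per physical server (illustrative field names)."""
    name: str
    processors: int               # number of physical processors
    cores_per_processor: int
    processor_ghz: float          # processor speed
    memory_gb: int                # physical memory available
    local_storage_gb: int
    nic_count: int                # network interface cards
    storage_hba_count: int        # storage interface cards
    # per-application workload notes, including observed peaks
    workloads: dict = field(default_factory=dict)

srv = ServerInventory("app-host-01", processors=2, cores_per_processor=8,
                      processor_ghz=2.6, memory_gb=128, local_storage_gb=600,
                      nic_count=4, storage_hba_count=2,
                      workloads={"PubMan": "moderate; peaks during bulk imports"})
print(srv.processors * srv.cores_per_processor)  # 16 total cores
```

Keeping such records in one place also answers the "how much do we currently have available" question above by simple aggregation.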


 * Ownership and service requirements
 ** server ownership
 ** SLA - agree on server utilization, disaster recovery plans, etc.


 * Virtualization
 ** what can run in a virtualized environment and what cannot
 ** when planning a migration from a physical to a virtual environment - make sure there are enough resources for both the migration and the target system itself


 * Disaster recovery (DR)
 ** define for which servers to consider DR
 ** define conditions for DR (immediate, after some time, etc.)


 * Storage
 ** Local
 *** total = used + available
 ** SAN
 *** total = used + available

Plan
We have to plan for growth. Therefore we need to predict how capacity growth can be supported with respect to:


 * Application inventory
 ** current state
 ** upcoming
 ** SLA - define time to extension


 * Users & environments
 ** Number of development servers, number of developer users (dev, test, QA). Understand how this is to be distributed in future:
 *** per solution
 *** per core service
 *** per project
 *** a combination of the above
 ** Special environments, e.g.:
 *** scalability/performance testing
 *** cloud-technology research?
 *** other?
 ** MPG users - institutes
 *** institutes and institute users per solution
 *** number of institute-specific productive environments (e.g. calculate with current solutions and a percentage of the institute count, say 10%, for institute-specific solutions to be brought quickly into the overall environment?)


 * external users
 ** SLA/policies - do we do something for them?
 ** shall the capacity plan involve them or not?
 * Timing
 ** define a short-term activity plan
 ** define a long-term activity plan


 * Monitoring
 ** establish regular monitoring of the key factors for the plan

Planning scenarios

 * use current state and envision X% growth - only
 * agree on an SLA with the institutes in the sense:
 ** we can provide resources up to X
 ** for all other types of extensions, agree on the time to realize an extension of the infrastructure
 ** important: based on monitoring results, estimate properly how much is used on average at the moment (per institute, for example)
 * use current state + X% growth and add Y% for unknown/unpredictable cases
 * use current state + X% growth - add Y% selectively for institutes
 ** e.g. for 10 of the institutes we will need big storage and computing capacities (X storage, Y processors, Z memory)
 ** calculate this immediately as an addition to the "regularly" envisioned growth
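The scenarios above reduce to one formula - current capacity plus X% growth, optionally plus Y% for unknown cases, optionally plus a selective addition for heavy institutes. A sketch with placeholder numbers (10 TB current, X = 20%, Y = 10%, 2 TB selective extra are all assumed for illustration):

```python
def scenario_capacity(current_tb: float, growth_pct: float,
                      buffer_pct: float = 0.0, selective_tb: float = 0.0) -> float:
    """Capacity target: current + X% growth, + Y% for unknowns, + selective extra."""
    return current_tb * (1 + growth_pct / 100) * (1 + buffer_pct / 100) + selective_tb

print(scenario_capacity(10.0, 20))                 # 12.0 (growth only)
print(round(scenario_capacity(10.0, 20, 10), 2))   # 13.2 (+ Y% for unknowns)
print(scenario_capacity(10.0, 20, 0, 2.0))         # 14.0 (+ extra for heavy institutes)
```

Feeding monitored per-institute usage into `current_tb` is what makes the X and Y percentages meaningful, which is why the plan ties the scenarios to regular monitoring.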