Christmas Server Monitoring

MPDL

=Introduction=


 * In case of a major problem, send an email to all users.

The following servers should be monitored (server URL in parentheses).
The regular domain name should also work (for example r-coreservice.mpdl.mpg.de).
 * Pubman (srv03.mpdl.mpg.de)
 * Live-Coreservice (srv02.mpdl.mpg.de)
 * Live-Fedora (srv01.mpdl.mpg.de)
 * Faces (vm04.mpdl.mpg.de)
 * ViRR (vm13.mpdl.mpg.de)
 * Diamonds (vm29.mpdl.mpg.de)
 * Live-r-Coreservice (vm31.mpdl.mpg.de)
 * Live-r-Fedora (vm33.mpdl.mpg.de)
 * Peer-Coreservice (vm17.mpdl.mpg.de)
 * Peer-Pubman (vm18.mpdl.mpg.de)

What should be tested for availability?

 * Check the homepage of the solution
 * Run a search in the solution
 * Open a result
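
The first check above (homepage reachable) could be sketched roughly as follows. This is only a minimal sketch: it assumes the homepage is served over plain HTTP at `/` on the default port, and the function names `is_up_status` and `check_homepage` are hypothetical. The search and result checks depend on solution-specific URLs that are not listed here, so they are left out.

```shell
#!/bin/sh
# Hypothetical helper: treat 2xx and 3xx HTTP status codes as "up".
is_up_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch the homepage of a host and report whether it answered.
# Assumption: plain HTTP, default port, "/" as the homepage path.
check_homepage() {
    host="$1"
    # -s: silent, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the status code (000 if no response),
    # --max-time 20: give up after 20 seconds.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 20 "http://$host/")
    if is_up_status "$code"; then
        echo "OK   $host ($code)"
    else
        echo "FAIL $host ($code)"
        return 1
    fi
}
```

For example, `check_homepage srv03.mpdl.mpg.de` would print one OK or FAIL line for the Pubman server.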

Standard way to resolve the problem?

 * If a solution or Coreservice isn't available
   * rcjboss stop
   * rcjboss start


 * If Fedora isn't available
   * rcjboss stop
   * /opt/fedora/tomcat/bin/shutdown.sh
   * /opt/fedora/tomcat/bin/startup.sh
   * rcjboss start


 * If we have problems with the Postgres DB
   * rcjboss stop
   * /opt/fedora/tomcat/bin/shutdown.sh
   * rcpostgresql stop
   * rcpostgresql start
   * /opt/fedora/tomcat/bin/startup.sh
   * rcjboss start
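
The three restart sequences above can be collected in one small helper that only prints the commands in order, so they can be reviewed before being run by hand on the affected server. This is a sketch, not an established script: the function name `restart_steps` and the labels `solution`, `fedora` and `postgres` are hypothetical, and it assumes the final step is `rcjboss start`, matching the `rcjboss stop` used at the beginning of each sequence.

```shell
#!/bin/sh
# Print the restart commands for one problem type, in the order to run them.
# The labels (solution, fedora, postgres) are hypothetical names for the
# three cases described in this document.
restart_steps() {
    case "$1" in
        solution)
            # A solution or the Coreservice is not available.
            echo "rcjboss stop"
            echo "rcjboss start"
            ;;
        fedora)
            # Fedora is not available: JBoss goes down first and comes up last.
            echo "rcjboss stop"
            echo "/opt/fedora/tomcat/bin/shutdown.sh"
            echo "/opt/fedora/tomcat/bin/startup.sh"
            echo "rcjboss start"
            ;;
        postgres)
            # Database problems: stop everything above the DB, restart the DB,
            # then bring Fedora and JBoss back up.
            echo "rcjboss stop"
            echo "/opt/fedora/tomcat/bin/shutdown.sh"
            echo "rcpostgresql stop"
            echo "rcpostgresql start"
            echo "/opt/fedora/tomcat/bin/startup.sh"
            echo "rcjboss start"
            ;;
        *)
            echo "unknown problem type: $1" >&2
            return 1
            ;;
    esac
}
```

Calling `restart_steps postgres` lists the six commands for the database case. Nothing is executed by the helper itself.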

All other problems should be resolved during business hours.

Where do you find OpenNMS?
http://monitor.mpdl.mpg.de:8080/opennms/login.jsp

What's important in OpenNMS?
Two options in OpenNMS are particularly useful:


 * Outages
   * Click on Outages
   * Now click on Current Outages

Here you can see all the problems on the servers. You can ignore vm34.mpdl.mpg.de and srv06.mpdl.mpg.de in this list.


 * Resource Graphs
   * Click on Node list
   * Select the server whose stats you want to see
   * Click on Resource Graphs
   * Drag and drop Node-level Performance Data
   * Click on Graph selection
   * Look at Load Average and System Memory Stats
   * If you click on a graph, you can focus the stats on one category
   * You can select the period of time for the stats

If a value in the stats is very high and the system is using a lot of swap, restart the Coreservice.
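
If OpenNMS itself is not reachable, swap usage can also be checked directly on the server. The following is a minimal sketch, assuming a Linux host that provides `/proc/meminfo`. The function name `swap_used_percent` is hypothetical, and what counts as "a lot of swap" is not defined in this document, so no threshold is built in.

```shell
#!/bin/sh
# Read /proc/meminfo-style text on stdin and print swap usage as a whole
# percentage. Prints 0 if the host has no swap configured.
swap_used_percent() {
    awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { free = $2 }
        END {
            if (total > 0) printf "%d\n", (total - free) * 100 / total
            else print 0
        }
    '
}
```

On the server this would be called as `swap_used_percent < /proc/meminfo`. The load average can be read with `cat /proc/loadavg`.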