Real-world configuration examples

Here are some real-world configuration examples for monit. They show how a service is run, where it puts its pidfile, how to call the start and stop methods for a service, and so on. You are welcome to cut & paste configuration into your own monitrc control file. NB: please check and edit as needed. The IP addresses and paths used here will likely differ on your system.

You may also want to check out Christian Hopp's configuration example. He demonstrates how to set up a clean, structured configuration using the monit include statement (which he was responsible for adding). Each check for a particular service is written in its own file and included in the top-level monit control file. Many of the service files are also well worth studying in their own right. The example is for Debian Linux, but it is general enough to be of interest on any platform.

  * System Services
      + Cron (program timer)
      + Gdm (gnome desktop manager)
      + Inetd (internet service manager)
      + Syslogd (system logfile daemon)
      + Xfs (X font server)
      + YPBind (Yellow Pages bind daemon)
      + Net-SNMP (SNMP agent)
      + NTP (time server)
      + Nscd (name service caching daemon)
  * Name Services
      + Bind (chrooted)
  * FTP Services
      + Proftpd
  * Login Services
      + SSHD
  * WWW Services
      + Apache (web server)
      + Zope (application server)
      + Squid (http/ftp proxy)
      + Privoxy (spamfilter proxy)
  * Mail Services
      + Postfix (mail server)
      + Sendmail (mail server)
      + Qpopper (pop3 server)
      + Spamassassin daemon (spam scan daemon)
      + Amavis-new (mail virus scanner)
  * Virus Scanner
      + Sophie (virus scan daemon)
      + Trophie (virus scan daemon)
      + Clamavd (virus scan daemon)
  * Printing Services
      + LPRng (printer daemon)
  * Database Services
      + MySQL Server
      + OpenLDAP Server
  * File Services
      + Samba (windows file/domain server)
  * Sun ONE Services
      + iPlanetDirectoryServer (Sun ONE)
      + iPlanetMessagingServer processes (Sun ONE)
      + iPlanetCalendarServer processes (Sun ONE)
  * Misc Services
      + Apcupsd (APC ups daemon)
      + Webmin (remote admin service)
      + STunnel (SSL tunnel)
  * Misc Usage
      + Watch and analyze crashdumps (Solaris)
      + Watch and analyze crashdumps (Linux)

System Services

Cron (program timer)

On Solaris, the cron init.d script needs a modification. According to the Monit FAQ, add the following line right after cron is started:

  /usr/bin/pgrep -x -u 0 -P 1 cron > /var/run/cron.pid

  check process cron with pidfile /var/run/cron.pid
    group system
    start program = "/etc/init.d/cron start"
    stop program = "/etc/init.d/cron stop"
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on cron_rc

  check file cron_rc with path /etc/init.d/cron
    group system
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Gdm (gnome desktop manager)

  check process gdm with pidfile /var/run/gdm.pid
    start program = "/etc/init.d/gdm start"
    stop program = "/etc/init.d/gdm stop"
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

Inetd (internet service manager)

  check process inetd with pidfile /var/run/inetd.pid
    start program = "/etc/init.d/inetd start"
    stop program = "/etc/init.d/inetd stop"
    if failed host 192.168.1.1 port 25 protocol smtp then restart  # e.g. exim
    if failed host 192.168.1.1 port 515 then restart  # e.g. cups-lpd
    if failed host 192.168.1.1 port 113 then restart  # e.g. ident
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

Syslogd (system logfile daemon)

  check process syslogd with pidfile /var/run/syslogd.pid
    start program = "/etc/init.d/sysklogd start"
    stop program = "/etc/init.d/sysklogd stop"
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

  check file syslogd_file with path /var/log/syslog
    if timestamp > 65 minutes then alert  # Have you seen "-- MARK --"?

Xfs (X font server)

  check process xfs with pidfile /var/run/xfs.pid
    start program = "/etc/init.d/xfs start"
    stop program = "/etc/init.d/xfs stop"
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

YPBind (Yellow Pages bind daemon)

  check process ypbind with pidfile /var/run/ypbind.pid
    start program = "/etc/init.d/nis start"
    stop program = "/etc/init.d/nis stop"
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

Net-SNMP (SNMP agent)

  check process snmpd with pidfile /var/run/snmpd
    start program = "/etc/init.d/snmpd start"
    stop program = "/etc/init.d/snmpd stop"
    if failed host 192.168.1.1 port 161 type udp then restart
    if failed host 192.168.1.1 port 199 type tcp then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

NTP (time server)

  check process ntpd with pidfile /var/run/ntpd.pid
    start program = "/etc/init.d/ntpd start"
    stop program = "/etc/init.d/ntpd stop"
    if failed host 127.0.0.1 port 123 type udp then alert
    if 5 restarts within 5 cycles then timeout

Nscd (name service caching daemon)

  check process nscd with pidfile /var/run/nscd/nscd.pid
    start program = "/etc/init.d/nscd start"
    stop program = "/etc/init.d/nscd stop"
    if 5 restarts within 5 cycles then timeout

Name Services

Bind (chrooted)

  check process named with pidfile /var/named/chroot/var/run/named/named.pid
    start program = "/etc/init.d/named start"
    stop program = "/etc/init.d/named stop"
    if failed host 127.0.0.1 port 53 type tcp protocol dns then alert
    if failed host 127.0.0.1 port 53 type udp protocol dns then alert
    if 5 restarts within 5 cycles then timeout

FTP Services

Proftpd

  check process proftpd with pidfile /var/run/proftpd.pid
    start program = "/etc/init.d/proftpd start"
    stop program = "/etc/init.d/proftpd stop"
    if failed port 21 protocol ftp then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

Login Services

SSHD

  check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/sshd start"
    stop program = "/etc/init.d/sshd stop"
    if failed port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

WWW Services

Apache (web server)

Hint: Use a "token" file (an empty file) as the URL monit requests. This makes it easy to filter all of monit's requests out of the httpd access log. Marco Ermini shared the following trick. Place these lines in httpd.conf to stop Apache from logging any requests made by monit:

  SetEnvIf Request_URI "^\/monit\/token$" dontlog
  CustomLog logs/access.log common env=!dontlog

On some systems, e.g. Debian Linux, apache and apache-ssl have separate init scripts.
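Creating the token file is a one-off step. Below is a minimal sketch. It assumes the token lives under your Apache DocumentRoot so that Apache serves it as /monit/token. The function name create_monit_token is hypothetical, and so is the document root path in the usage line.

```shell
#!/bin/sh
# Hypothetical helper: create the empty "token" file that monit requests
# as /monit/token. The argument is your Apache DocumentRoot (an assumption;
# check the DocumentRoot directive in your httpd.conf).
create_monit_token() {
  docroot="$1"
  mkdir -p "$docroot/monit" || return 1
  # An empty file keeps each monit request cheap to serve
  : > "$docroot/monit/token" || return 1
  # World-readable so the httpd user can serve it
  chmod 644 "$docroot/monit/token"
}
```

For example, `create_monit_token /opt/apache/htdocs` (a hypothetical path). Afterwards, `curl -I http://192.168.1.1/monit/token` should report a 200 status before you point monit at the URL.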
  check process apache with pidfile /opt/apache_misc/logs/httpd.pid
    group www
    start program = "/etc/init.d/apache start"
    stop program = "/etc/init.d/apache stop"
    if failed host 192.168.1.1 port 80 protocol HTTP
       request /monit/token then restart
    if failed host 192.168.1.1 port 443 type TCPSSL
       certmd5 12-34-56-78-90-AB-CD-EF-12-34-56-78-90-AB-CD-EF
       protocol HTTP request /monit/token then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on apache_bin
    depends on apache_rc

  check file apache_bin with path /opt/apache/bin/httpd
    group www
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file apache_rc with path /etc/init.d/apache
    group www
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Zope (application server)

  check process zope with pidfile /opt/Zope/var/zProcessManager.pid
    start program = "/etc/init.d/zope start"
    stop program = "/etc/init.d/zope stop"
    group www
    if failed host 192.168.1.1 port 8080 protocol HTTP then restart
    if 5 restarts within 5 cycles then timeout
    every 5
    alert foo@bar
    alert 123456@sms on { timeout }

Squid (http/ftp proxy)

  check process squid with pidfile /opt/squid/logs/squid.pid
    group www
    start program = "/etc/init.d/squid start"
    stop program = "/etc/init.d/squid stop"
    if failed host 192.168.1.1 port 3128 then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on squid_bin
    depends on squid_rc

  check file squid_bin with path /opt/squid/bin/squid
    group www
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file squid_rc with path /etc/init.d/squid
    group www
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Privoxy (spamfilter proxy)

  check process privoxy with pidfile /opt/privoxy/var/privoxy.pid
    group www
    start program = "/etc/init.d/privoxy start"
    stop program = "/etc/init.d/privoxy stop"
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.1 port 8118 then restart
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on privoxy_bin
    depends on privoxy_rc

  check file privoxy_bin with path /opt/privoxy/sbin/privoxy
    group www
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file privoxy_rc with path /etc/init.d/privoxy
    group www
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Mail Services

Postfix (mail server)

  check process postfix with pidfile /var/spool/postfix/pid/master.pid
    group mail
    start program = "/etc/init.d/postfix start"
    stop program = "/etc/init.d/postfix stop"
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on postfix_rc

  check file postfix_rc with path /etc/init.d/postfix
    group mail
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Sendmail (mail server)

  check process sendmail with pidfile /var/run/sendmail.pid
    group mail
    start program = "/etc/init.d/sendmail start"
    stop program = "/etc/init.d/sendmail stop"
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on sendmail_bin
    depends on sendmail_rc

  check file sendmail_bin with path /usr/lib/sendmail
    group mail
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file sendmail_rc with path /etc/init.d/sendmail
    group mail
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Qpopper (pop3 server)

  check process qpopper with pidfile /var/run/popper.pid
    group mail
    start program = "/etc/init.d/qpopper start"
    stop program = "/etc/init.d/qpopper stop"
    if 5 restarts within 5 cycles then timeout
    if failed port 110 type TCP protocol POP then restart
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on qpopper_bin
    depends on qpopper_rc

  check file qpopper_bin with path /opt/sbin/popper
    group mail
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file qpopper_rc with path /etc/init.d/qpopper
    group mail
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Spamassassin daemon (spam scan daemon)

  check process spamd with pidfile /var/run/spamd.pid
    group mail
    start program = "/etc/init.d/spamd start"
    stop program = "/etc/init.d/spamd stop"
    if 5 restarts within 5 cycles then timeout
    if cpu usage > 99% for 5 cycles then alert
    if mem usage > 99% for 5 cycles then alert
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on spamd_bin
    depends on spamd_rc

  check file spamd_bin with path /usr/local/bin/spamd
    group mail
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file spamd_rc with path /etc/init.d/spamd
    group mail
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Amavis-new (mail virus scanner)

  check process amavisd with pidfile /opt/virus/amavis-new/var/run/amavisd.pid
    group mail
    start program = "/etc/init.d/amavis-new start"
    stop program = "/etc/init.d/amavis-new stop"
    if failed port 10024 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on amavisd_bin
    depends on amavisd_rc

  check file amavisd_bin with path /opt/virus/amavis-new/bin/amavisd
    group mail
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file amavisd_rc with path /etc/init.d/amavis-new
    group mail
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Virus Scanner

Sophie (virus scan daemon)

  check process sophie with pidfile /var/run/sophie.pid
    group virus
    start program = "/etc/init.d/sophie start"
    stop program = "/etc/init.d/sophie stop"
    if failed unixsocket /var/run/sophie then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on sophie_bin
    depends on sophie_rc

  check file sophie_bin with path /opt/virus/sophie/sophie
    group virus
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file sophie_rc with path /etc/init.d/sophie
    group virus
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Trophie (virus scan daemon)

  check process trophie with pidfile /var/run/trophie.pid
    group virus
    start program = "/etc/init.d/trophie start"
    stop program = "/etc/init.d/trophie stop"
    if failed unixsocket /var/run/trophie then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on trophie_bin
    depends on trophie_rc

  check file trophie_bin with path /opt/virus/trophie/trophie
    group virus
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file trophie_rc with path /etc/init.d/trophie
    group virus
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Clamav (virus scan daemon)

  check process clamavd with pidfile /var/run/clamd.pid
    group virus
    start program = "/etc/init.d/clamavd start"
    stop program = "/etc/init.d/clamavd stop"
    if failed unixsocket /var/run/clamd then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on clamavd_bin
    depends on clamavd_rc

  check file clamavd_bin with path /opt/virus/clamavd/clamavd
    group virus
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file clamavd_rc with path /etc/init.d/clamavd
    group virus
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Database Services

MySQL Server

The pidfile is usually named after the host's fully qualified domain name, with .pid as the extension.
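If you are unsure which name to expect, the small sketch below builds the path from the host name. This follows the naming rule stated above. The function name mysql_pidfile is hypothetical, and the default data directory /opt/mysql/data is taken from the example that follows. On your system mysqld may use the short host name instead, so verify the result with `ls`.

```shell
#!/bin/sh
# Hypothetical helper: print the pidfile path mysqld is expected to use,
# i.e. <datadir>/<fully qualified host name>.pid.
# Arguments: datadir (default /opt/mysql/data), host name (default: hostname -f).
mysql_pidfile() {
  datadir="${1:-/opt/mysql/data}"
  # ${2:-...} only runs hostname -f when no host name is given
  host="${2:-$(hostname -f)}"
  printf '%s/%s.pid\n' "$datadir" "$host"
}
```

For example, `ls -l "$(mysql_pidfile)"` confirms the file exists before you put the path in monitrc. You can avoid the guesswork by setting the pid-file option in my.cnf explicitly.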
  check process mysql with pidfile /opt/mysql/data/myserver.mydomain.pid
    group database
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if failed host 192.168.1.1 port 3306 protocol mysql then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on mysql_bin
    depends on mysql_rc

  check file mysql_bin with path /opt/mysql/bin/mysqld
    group database
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file mysql_rc with path /etc/init.d/mysql
    group database
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

OpenLDAP slapd (Debian package)

  check process slapd with pidfile /var/run/slapd.pid
    group database
    start program = "/etc/init.d/slapd start"
    stop program = "/etc/init.d/slapd stop"
    if failed host 192.168.1.1 port 389 protocol ldap3 then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on slapd_bin
    depends on slapd_rc

  check file slapd_bin with path /usr/sbin/slapd
    group database
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file slapd_rc with path /etc/init.d/slapd
    group database
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

File Services

Samba (windows file/domain server)

Hint: For finer control over the service, split the samba init script into two separate scripts, one for smbd (the file service) and one for nmbd (the name service). monit can then start, stop and restart each daemon independently.
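Below is a minimal sketch of such a split. It assumes the Samba 2.2 layout under /opt/samba2.2 used in the checks below. The function samba_ctl and the SAMBA_SBIN and SAMBA_PIDDIR variables are hypothetical. /etc/init.d/smbd would call `samba_ctl smbd "$1"`, and /etc/init.d/nmbd would call `samba_ctl nmbd "$1"`.

```shell
#!/bin/sh
# Hypothetical shared helper for separate smbd and nmbd init scripts.
# Paths match the example layout below; adjust to your installation.
SAMBA_SBIN="${SAMBA_SBIN:-/opt/samba2.2/sbin}"
SAMBA_PIDDIR="${SAMBA_PIDDIR:-/opt/samba2.2/var/locks}"

# Usage: samba_ctl {smbd|nmbd} {start|stop}
samba_ctl() {
  daemon="$1"
  action="$2"
  case "$daemon" in
    smbd|nmbd) ;;
    *) echo "samba_ctl: unknown daemon '$daemon'" >&2; return 1 ;;
  esac
  pidfile="$SAMBA_PIDDIR/$daemon.pid"
  case "$action" in
    start)
      # -D makes the daemon run in the background
      "$SAMBA_SBIN/$daemon" -D
      ;;
    stop)
      # Stop only this daemon; the other one keeps running
      if [ -f "$pidfile" ]; then
        kill "$(cat "$pidfile")"
      fi
      ;;
    *)
      echo "usage: samba_ctl {smbd|nmbd} {start|stop}" >&2
      return 1
      ;;
  esac
}
```

Because each daemon now has its own start and stop program, a failed nmbd port test restarts only nmbd, and file service through smbd continues undisturbed.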
  check process smbd with pidfile /opt/samba2.2/var/locks/smbd.pid
    group samba
    start program = "/etc/init.d/smbd start"
    stop program = "/etc/init.d/smbd stop"
    if failed host 192.168.1.1 port 139 type TCP then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on smbd_bin

  check file smbd_bin with path /opt/samba2.2/sbin/smbd
    group samba
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check process nmbd with pidfile /opt/samba2.2/var/locks/nmbd.pid
    group samba
    start program = "/etc/init.d/nmbd start"
    stop program = "/etc/init.d/nmbd stop"
    if failed host 192.168.1.1 port 138 type UDP then restart
    if failed host 192.168.1.1 port 137 type UDP then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on nmbd_bin

  check file nmbd_bin with path /opt/samba2.2/sbin/nmbd
    group samba
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Printing Services

LPRng (printer daemon)

  check process lprng with pidfile /var/run/lpd.515
    group printer
    start program = "/etc/init.d/lprng start"
    stop program = "/etc/init.d/lprng stop"
    if failed host 192.168.1.1 port 515 type TCP then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on lprng_bin
    depends on lprng_rc

  check file lprng_bin with path /opt/lprng/sbin/lpd
    group printer
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file lprng_rc with path /etc/init.d/lprng
    group printer
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Sun ONE Services
iPlanetDirectoryServer slapd

  check process ldap-master with pidfile /usr/iplanet/ldapmaster/slapd-master-1/logs/pid
    start program = "/usr/iplanet/ldapmaster/slapd-master-1/start-slapd"
    stop program = "/usr/iplanet/ldapmaster/slapd-master-1/stop-slapd"
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.1 port 389 protocol ldap3 then restart
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetMessagingServer MTA dispatcher

  check process mta-dispatcher with pidfile /usr/iplanet/msg-ims-1/config/pidfile.imta_dispatch
    start program = "/usr/iplanet/msg-ims-1/imsimta start dispatcher"
    stop program = "/usr/iplanet/msg-ims-1/imsimta stop dispatcher"
    group messaging
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.1 port 25 protocol smtp then restart
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetMessagingServer MTA job controller

  check process mta-job_controller with pidfile /usr/iplanet/msg-ims-1/config/pidfile.imta_jbc
    start program = "/usr/iplanet/msg-ims-1/imsimta start job_controller"
    stop program = "/usr/iplanet/msg-ims-1/imsimta stop job_controller"
    group messaging
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.1 port 28442 then restart
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetMessagingServer stored

  check process store with pidfile /usr/iplanet/msg-ims-1/config/pidfile.store
    start program = "/usr/iplanet/msg-ims-1/start-msg store"
    stop program = "/usr/iplanet/msg-ims-1/stop-msg store"
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timestamp timeout }
    group messaging

  check file stored.ckp with path /usr/iplanet/msg-ims-1/config/stored.ckp
    if timestamp > 10 minutes then alert
    group messaging

  check file stored.lcu with path /usr/iplanet/msg-ims-1/config/stored.lcu
    if timestamp > 15 minutes then alert
    group messaging

  check file stored.per with path /usr/iplanet/msg-ims-1/config/stored.per
    if timestamp > 70 minutes then alert
    group messaging

iPlanetMessagingServer mshttpd

  check process webmail with pidfile /usr/iplanet/msg-ims-1/config/pidfile.http
    start program = "/usr/iplanet/msg-ims-1/start-msg http"
    stop program = "/usr/iplanet/msg-ims-1/stop-msg http"
    group messaging
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.1 port 80 protocol http then restart
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetMessagingServer popd

  check process pop3 with pidfile /usr/iplanet/msg-ims-1/config/pidfile.pop
    start program = "/usr/iplanet/msg-ims-1/start-msg pop"
    stop program = "/usr/iplanet/msg-ims-1/stop-msg pop"
    group messaging
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.1 port 110 protocol pop then restart
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetMessagingServer imapd

  check process imap4 with pidfile /usr/iplanet/msg-ims-1/config/pidfile.imap
    start program = "/usr/iplanet/msg-ims-1/start-msg imap"
    stop program = "/usr/iplanet/msg-ims-1/stop-msg imap"
    group messaging
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.1 port 143 protocol imap then restart
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetMessagingServer madmand (SNMP subagent)

  check process snmp-subagent with pidfile /usr/iplanet/msg-ims-1/config/pidfile.snmp
    start program = "/usr/iplanet/msg-ims-1/start-msg snmp"
    stop program = "/usr/iplanet/msg-ims-1/stop-msg snmp"
    group messaging
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetMessagingServer MMP (POP3/IMAP4/SMTP proxy)

  check process mmp with pidfile /usr/iplanet/mmp-ims2/pidfile
    start program = "/usr/iplanet/mmp-ims2/AService.rc start"
    stop program = "/usr/iplanet/mmp-ims2/AService.rc stop"
    group messaging
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.2 port 110 protocol pop then restart
    if failed host 192.168.1.2 port 143 protocol imap then restart
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetCalendarServer csadmind

  check process calendar-admin with pidfile /usr/iplanet/SUNWics5/cal/bin/config/pidfile.admin
    start program = "/usr/iplanet/SUNWics5/cal/bin/csstart service admin"
    stop program = "/usr/iplanet/SUNWics5/cal/bin/csstop service admin"
    group calendar
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetCalendarServer cshttpd

  check process calendar-http with pidfile /usr/iplanet/SUNWics5/cal/bin/config/pidfile.http
    start program = "/usr/iplanet/SUNWics5/cal/bin/csstart service http"
    stop program = "/usr/iplanet/SUNWics5/cal/bin/csstop service http"
    group calendar
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.3 port 80 protocol http then restart
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetCalendarServer csdwpd (database wire protocol)

  check process calendar-dwp with pidfile /usr/iplanet/SUNWics5/cal/bin/config/pidfile.dwp
    start program = "/usr/iplanet/SUNWics5/cal/bin/csstart service dwp"
    stop program = "/usr/iplanet/SUNWics5/cal/bin/csstop service dwp"
    group calendar
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.3 port 9779 protocol dwp then restart
    if cpu usage > 2% for 5 cycles then restart  # There's a leak in csdwpd
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetCalendarServer csnotifyd

  check process calendar-notify with pidfile /usr/iplanet/SUNWics5/cal/bin/config/pidfile.notify
    start program = "/usr/iplanet/SUNWics5/cal/bin/csstart service notify"
    stop program = "/usr/iplanet/SUNWics5/cal/bin/csstop service notify"
    group calendar
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

iPlanetCalendarServer enpd (event notification service broker)

  check process calendar-ens with pidfile /usr/iplanet/SUNWics5/cal/bin/config/pidfile.ens
    start program = "/usr/iplanet/SUNWics5/cal/bin/csstart service ens"
    stop program = "/usr/iplanet/SUNWics5/cal/bin/csstop service ens"
    group calendar
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.3 port 7997 then restart
    alert foo@bar
    alert 123456@sms on { timeout }

Misc Services

Apcupsd (APC ups daemon)

  check process apcupsd with pidfile /var/run/apcupsd.pid
    group ups
    start program = "/etc/init.d/apcupsd start"
    stop program = "/etc/init.d/apcupsd stop"
    if 5 restarts within 5 cycles then timeout
    if failed host 192.168.1.3 port 7000 type TCP then restart
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on apcupsd_bin
    depends on apcupsd_rc

  check file apcupsd_bin with path /opt/apcupsd/sbin/apcupsd
    group ups
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file apcupsd_rc with path /etc/init.d/apcupsd
    group ups
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Webmin (remote admin service)

  check process webmin with pidfile /var/webmin/miniserv.pid
    group webmin
    start program = "/etc/init.d/webmin start"
    stop program = "/etc/init.d/webmin stop"
    if failed host 192.168.1.3 port 10000 then restart
    if 5 restarts within 5 cycles then timeout
    alert foo@bar
    alert 123456@sms on { timeout }

  check file webmin_rc with path /etc/init.d/webmin
    group webmin
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

STunnel (SSL tunnel)

  check process stunnel_pop3 with pidfile /opt/var/stunnel/stunnel.110.pid
    start program = "/etc/init.d/stunnel start_pop3"
    stop program = "/etc/init.d/stunnel stop_pop3"
    if failed host 192.168.1.1 port 143 type TCPSSL protocol POP then restart
    group stunnel
    alert foo@bar
    depends on stunnel_rc
    depends on stunnel_bin

  check process stunnel_swat with pidfile /opt/var/stunnel/stunnel.901.pid
    start program = "/etc/init.d/stunnel start_swat"
    stop program = "/etc/init.d/stunnel stop_swat"
    if failed host 192.168.1.1 port 995 type TCPSSL then restart
    group stunnel
    alert foo@bar
    alert 123456@sms on { timeout }
    depends on stunnel_bin
    depends on stunnel_rc

  check file stunnel_bin with path /opt/sbin/stunnel
    group stunnel
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

  check file stunnel_rc with path /etc/init.d/stunnel
    group stunnel
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    alert foo@bar

Misc Usage

Watch and analyze httpd crashdumps (Solaris)

Allow setuid processes to dump core:

  coreadm -e proc-setid

Configure monit to watch the core file's timestamp and mail the backtrace when it changes:

  check file httpd_core with path /usr/apache/core
    if changed timestamp then
       exec "/bin/bash -c '/usr/bin/pstack /usr/apache/core | mailx -s httpd_crash foo@bar'"

Watch and analyze httpd crashdumps (Linux)

Prepare a central coredump directory:

  mkdir -p /var/crash/core
  chmod 1777 /var/crash/core
  sysctl -w kernel.core_pattern=/var/crash/core/core.%e.%t.%p
  sysctl -w kernel.core_setuid_ok=0
  sysctl -w kernel.core_uses_pid=1
  printf 'bt\nquit\n' > /etc/gdb.batch
  echo "ulimit -c unlimited" >> /etc/sysconfig/httpd
  echo "CoreDumpDirectory /var/crash/core" > /etc/httpd/conf.d/core.conf

Age out old core files with a crontab entry:

  10 1 * * * /usr/bin/find /var/crash/core/ -type f -mtime +1 -exec rm -f {} \;

Configure monit to watch the directory's timestamp and mail a backtrace of the latest core when it changes:

  check directory httpd_core with path /var/crash/core
    # The "x" prefixes keep the test valid when the state file is missing or empty
    if changed timestamp then
       exec "/bin/bash -c 'if [ x`/bin/cat /tmp/monit_httpd_core.tmp 2>/dev/null | head -1` != x`/bin/ls /var/crash/core/core.httpd* | tail -1` ]; then /usr/bin/gdb -x /etc/gdb.batch /usr/sbin/httpd `/bin/ls /var/crash/core/core.httpd* | tail -1 | tee /tmp/monit_httpd_core.tmp` | mail -s httpd_crash admin@foo.bar webmaster@foo.bar; fi'"
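The inline exec one-liner above is hard to read and maintain. Below is a minimal sketch of the same logic as a standalone script. It assumes a hypothetical install path, /usr/local/bin/httpd_core_report.sh. The function names and environment variables are illustrative, and their defaults mirror the paths used above.

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/httpd_core_report.sh: mail a gdb backtrace
# of the newest httpd core file, but only once per core file.
# Defaults mirror the paths above; override them via the environment.
CORE_DIR="${CORE_DIR:-/var/crash/core}"
STATE_FILE="${STATE_FILE:-/tmp/monit_httpd_core.tmp}"
HTTPD_BIN="${HTTPD_BIN:-/usr/sbin/httpd}"
GDB_BATCH="${GDB_BATCH:-/etc/gdb.batch}"
MAIL_TO="${MAIL_TO:-admin@foo.bar webmaster@foo.bar}"

# Print the newest core.httpd* file if it differs from the one recorded
# in STATE_FILE, and record it. Prints nothing if there is no new core.
new_httpd_core() {
  # Glob results are sorted; names end in %t.%p (epoch seconds), so the
  # lexically last name is normally the newest core
  latest=$(ls -d "$CORE_DIR"/core.httpd* 2>/dev/null | tail -n 1)
  [ -n "$latest" ] || return 0
  previous=$(head -n 1 "$STATE_FILE" 2>/dev/null)
  if [ "$latest" != "$previous" ]; then
    printf '%s\n' "$latest" > "$STATE_FILE"
    printf '%s\n' "$latest"
  fi
}

# Run gdb with the batch file (bt, quit) on the new core and mail the output.
report_httpd_core() {
  core=$(new_httpd_core)
  [ -n "$core" ] || return 0
  # MAIL_TO is left unquoted on purpose so it splits into several recipients
  gdb -x "$GDB_BATCH" "$HTTPD_BIN" "$core" 2>&1 | mail -s httpd_crash $MAIL_TO
}

# Only act when called with "run", so the functions can be sourced and tested
if [ "${1:-}" = run ]; then
  report_httpd_core
fi
```

In monitrc, the exec line would then shrink to something like `if changed timestamp then exec "/usr/local/bin/httpd_core_report.sh run"`. As with the original tee, the state file is updated before gdb runs. As a result, a core is reported at most once, even if the mail step fails.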