Listerine is a simple functional monitoring framework that enables you to quickly create customizable functional monitors with full alerting capability.
Unlike God, Listerine is not a process monitoring framework; it is intended for functional monitoring of applications.
This project was originally created at Appboy as an in-house replacement for more expensive services. Read the blog post introducing it.
Listerine enables you to monitor all levels of your web applications and services. Some common examples include:
- Ensure that your caching layer is functioning properly
- Make sure that you have X available Resque workers
- Make sure that your Resque Scheduler agent is online
- Check that your hosted database is online (e.g., if you use MongoHQ or RedisToGo)
- POST to your API and verify return values
Install the gem:

```
gem install listerine
```
Listerine lets you define simple script monitors that each contain an assertion. When the assertion evaluates to true, the monitor has succeeded. When it evaluates to false, the monitor is marked as failed, sends a notification, and can optionally run custom code on failure.
All monitors must define both a `name` and an `assert` block. Unhandled exceptions are caught and treated as failures, with the exception text and backtrace included in the notification. Here's an example:
```ruby
require 'listerine'
require 'mysql2'

# Global configuration settings
Listerine::Monitor.configure do
  # Configure the email address from which the alerts will be sent
  from "[email protected]"

  # Notify [email protected] on all failures
  notify "[email protected]"
end

# Define the monitor
Listerine::Monitor.new do
  name "Database online"
  description "This monitor ensures that the database is online."

  assert do
    # Connect to MySQL. If the host is down, this raises a Mysql2::Error exception, which is automatically
    # caught by the Listerine::Runner handler and treated as a failure; the exception text and backtrace
    # are sent in the notification.
    Mysql2::Client.new(:host => "host", :username => "test_user")
    true
  end
end

# Run all the monitors -- declare all monitors before this line
Listerine::Runner.instance.run
```
Running this file produces output like this:

```
* Database online PASS
```
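To make the pass/fail rules concrete, here is a minimal sketch of how an assertion block could be evaluated: a truthy result passes, while a falsy result or an unhandled exception fails, and the exception text and backtrace are kept for the notification. `Result`, `evaluate` and `report` are hypothetical names for illustration only; this is not Listerine's actual runner code.

```ruby
# Hypothetical stand-in, NOT Listerine's implementation: evaluates an
# assertion block following the rules described above.
Result = Struct.new(:name, :passed, :details)

def evaluate(name, &assertion)
  if assertion.call
    Result.new(name, true, nil)
  else
    Result.new(name, false, "Assertion returned a falsy value")
  end
rescue StandardError => e
  # Unhandled exceptions count as failures; keep the message and backtrace
  Result.new(name, false, ([e.message] + Array(e.backtrace)).join("\n"))
end

# Mimics the "* <name> PASS" output line shown above
def report(result)
  puts "* #{result.name} #{result.passed ? 'PASS' : 'FAIL'}"
end

report(evaluate("Database online") { true })
report(evaluate("Broken check") { raise "boom" })
```

Rescuing `StandardError` (rather than `Exception`) in the sketch lets signals such as `Interrupt` still stop the process.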
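To run the monitors regularly, schedule a cron job. For example, this entry runs them every 2 minutes (the file must be executable and start with a Ruby shebang line such as `#!/usr/bin/env ruby`):

```
*/2 * * * * /path/to/monitor/file.rb
```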
The same monitor can be run in multiple environments. To do so, pass a list of environments to `environments` when defining the monitor. Inside the `assert` block, the `current_environment` method returns the environment in which the monitor is running. When no environment is specified, the monitor runs in the `:default` environment.
```ruby
require 'listerine'
require 'resque'
require 'uri'

Listerine::Monitor.new do
  name "Resque workers"
  description "This monitor makes sure that there is at least 1 Resque worker."
  environments :staging, :production

  assert do
    if current_environment == :staging
      url = "redis://staging:[email protected]:6379"
    else
      url = "redis://production:[email protected]:6379"
    end
    redis_info = URI.parse(url)
    Resque.redis = Redis.new(:host => redis_info.host, :port => redis_info.port, :password => redis_info.password)
    Resque.workers.length > 0
  end
end

# Run all the monitors
Listerine::Runner.instance.run
```
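As a rough illustration of the environment mechanism, the sketch below runs one assertion per environment and uses `instance_exec` so that `current_environment` is callable inside the block, falling back to `:default` when no environments are given. `EnvironmentRun`, `run_in_environments` and the `example.com` hosts are hypothetical; Listerine may implement this differently.

```ruby
# Hypothetical sketch, not Listerine's code: run one assertion per
# environment and expose the environment currently being checked.
class EnvironmentRun
  attr_reader :current_environment

  def initialize(environment)
    @current_environment = environment
  end
end

def run_in_environments(environments, &assertion)
  envs = environments.empty? ? [:default] : environments
  envs.to_h do |env|
    # instance_exec makes `current_environment` callable inside the block
    [env, EnvironmentRun.new(env).instance_exec(&assertion) ? true : false]
  end
end

# Hypothetical hosts; a real monitor would connect to Redis here
REDIS_URLS = {
  staging: "redis://staging.example.com:6379",
  production: "redis://production.example.com:6379"
}.freeze

results = run_in_environments([:staging, :production]) do
  # Pick a connection string per environment, as in the Resque example
  REDIS_URLS.fetch(current_environment).include?(current_environment.to_s)
end
p results
```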
When using multiple environments, you can also define different recipients for failure notifications. Set a criticality level on the monitor with `is`, and when defining recipients in the `configure` block, indicate which recipients receive which criticality level. Criticality levels are arbitrary symbols that you define. This example uses `:critical`, but it could be anything you want.
```ruby
require 'listerine'
require 'resque'
require 'uri'

Listerine::Monitor.configure do
  from "[email protected]"

  # This is the default recipient
  notify "[email protected]"

  # When a monitor with criticality level :critical fails, notify [email protected]
  notify "[email protected]", :when => :critical
end

Listerine::Monitor.new do
  name "Resque workers"
  description "This monitor makes sure that there is at least 1 Resque worker."
  environments :staging, :production

  # This monitor is critical when running in the production environment
  is :critical, :in => :production

  # Criticality levels can be any symbol. The level defaults to :default, which notifies the default recipient.
  is :foobar, :in => :staging

  assert do
    # Set up a different connection based on the current environment
    if current_environment == :staging
      url = "redis://staging:[email protected]:6379"
    else
      url = "redis://production:[email protected]:6379"
    end
    redis_info = URI.parse(url)
    Resque.redis = Redis.new(:host => redis_info.host, :port => redis_info.port, :password => redis_info.password)
    Resque.workers.length > 0
  end
end

# Run all the monitors
Listerine::Runner.instance.run
```
If no recipient is declared for a criticality level, Listerine uses the default recipient. Criticality levels can also be set globally in the `configure` block, so you can, for example, make all production monitors `:critical`.
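The lookup described above can be sketched as two steps: find the monitor's criticality for the current environment, then find the recipients for that level, falling back to the default recipients when none are declared. The hashes, method names and `example.com` addresses are hypothetical illustrations, not Listerine's internals.

```ruby
# Hypothetical sketch of the recipient lookup; names are illustrative.
RECIPIENTS = {
  default: ["team@example.com"],    # like `notify "..."`
  critical: ["oncall@example.com"]  # like `notify "...", :when => :critical`
}.freeze

# Per-environment criticality, like `is :critical, :in => :production`
CRITICALITY = { production: :critical, staging: :foobar }.freeze

def criticality_for(environment, levels, fallback = :default)
  levels.fetch(environment, fallback)
end

def recipients_for(level, recipients)
  # Undeclared levels (e.g. :foobar) fall back to the default recipients
  recipients.fetch(level) { recipients.fetch(:default) }
end

recipients_for(criticality_for(:production, CRITICALITY), RECIPIENTS) # => ["oncall@example.com"]
recipients_for(criticality_for(:staging, CRITICALITY), RECIPIENTS)    # => ["team@example.com"]
```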
Note: You don't need to use multiple environments to set criticality levels. These are perfectly valid monitors:
```ruby
Listerine::Monitor.configure do
  notify "[email protected]"
  notify "[email protected]", :when => :critical
end

# Emails [email protected] when it fails
Listerine::Monitor.new do
  name "Site online"
  is :critical
  assert do
    # Some code to check that the site is online
  end
end

# Emails [email protected] when it fails
Listerine::Monitor.new do
  name "Internal wiki online"
  assert do
    # ...
  end
end
```
You might not want to be notified the first time a monitor fails. When defining a monitor, you can use `notify_after` to wait for some number of consecutive failures before notifying, and, once a notification has been sent, `then_notify_every` to send further notifications only every x failures after that.
These options can be set on an individual monitor or globally in the `configure` block. Both default to 1.
```ruby
Listerine::Monitor.new do
  name "Cache online"
  description "This monitor connects to the cache and tries to set and then get a key."

  # Don't notify until there have been 2 consecutive failures
  notify_after 2

  # After 2 failures, only send a new notification every 3 failures
  then_notify_every 3

  assert do
    # Connect to cache
    cache = ...

    # Set key to value, then ensure that you can pull it from the cache
    cache.set("key", "value")
    cache.get("key") == "value"
  end
end
```
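One plausible reading of these two options is: stay quiet until the consecutive failure count reaches `notify_after`, then notify on that failure and on every `then_notify_every`-th failure after it. The sketch below encodes that reading as a hypothetical `notify?` helper; the exact counting rule is an assumption and is not taken from Listerine's source. It also assumes the count resets after a successful run.

```ruby
# Hypothetical sketch with assumed semantics, not Listerine's code:
# should a notification be sent on this consecutive failure?
def notify?(failure_count, notify_after: 1, then_notify_every: 1)
  return false if failure_count < notify_after

  ((failure_count - notify_after) % then_notify_every).zero?
end

# With notify_after 2 and then_notify_every 3, as in the cache example
(1..9).select { |n| notify?(n, notify_after: 2, then_notify_every: 3) } # => [2, 5, 8]

# With the defaults (both 1), every failure triggers a notification
(1..3).select { |n| notify?(n) } # => [1, 2, 3]
```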
When a monitor fails, you might want to take custom action, such as rebooting a machine after 5 consecutive failures. To do that, pass a block to `if_failing`; it is called with the current consecutive failure count.
Listerine also provides a wrapper for sending mail, `Listerine::Mailer.mail(to, subject, body)`, which you can use for custom notifications.
```ruby
Listerine::Monitor.new do
  name "Cache online"
  description "This monitor connects to the cache and tries to set and then get a key."

  assert do
    # ...
  end

  # Reboot the cache instance if there are 5 consecutive failures
  if_failing do |failure_count|
    if failure_count == 5
      Listerine::Mailer.mail("[email protected]", "Rebooting the cache", "Cache failed 5 times, rebooting")
      system("ec2-reboot-instances i-1234567")
    end
  end
end
```
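For intuition about what "consecutive failure count" means, here is a hypothetical `FailureTracker` that increments a counter on each failed run, resets it on success, and yields the count to an `if_failing`-style callback. It is an illustrative sketch, not Listerine's implementation, and the reboot is replaced by recording a string.

```ruby
# Hypothetical sketch, not Listerine's implementation: track consecutive
# failures and yield the count to an `if_failing`-style callback.
class FailureTracker
  attr_reader :consecutive_failures

  def initialize(&on_failing)
    @on_failing = on_failing
    @consecutive_failures = 0
  end

  def record(passed)
    if passed
      @consecutive_failures = 0 # a success breaks the streak
    else
      @consecutive_failures += 1
      @on_failing&.call(@consecutive_failures)
    end
  end
end

actions = []
tracker = FailureTracker.new do |failure_count|
  # Stand-in for the mail + reboot in the example above
  actions << "reboot after #{failure_count} failures" if failure_count == 5
end

# Two failures, a success (reset), then five failures in a row
[false, false, true, false, false, false, false, false].each { |ok| tracker.record(ok) }
p actions
```

Because the counter resets on success, the two early failures do not count toward the threshold of 5.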
The following global options can be set in the `configure` block:

- `from` - The email address from which notifications are sent
- `notify`, with an optional `:in => :environment` - The recipients of notifications

You can also set the following options globally in the `configure` block to avoid having to redefine them on each monitor:

- `is` - defaults to `:default`
- `notify_after` - defaults to `1`
- `then_notify_every` - defaults to `1`
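One way to picture how these defaults combine is a layered merge: built-in defaults, overridden by values from `configure`, overridden by values on an individual monitor. The precedence order and the `effective_options` helper below are assumptions for illustration, not confirmed Listerine behavior.

```ruby
# Hypothetical sketch of option precedence (assumed): monitor-level values
# override `configure` values, which override the documented defaults.
DEFAULTS = { is: :default, notify_after: 1, then_notify_every: 1 }.freeze

def effective_options(global = {}, monitor = {})
  DEFAULTS.merge(global).merge(monitor)
end

global  = { notify_after: 3 }                     # set in `configure`
monitor = { is: :critical, then_notify_every: 5 } # set on one monitor

# is: :critical, notify_after: 3, then_notify_every: 5
p effective_options(global, monitor)
```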
Because checking whether a website is online is a common use case, there is a simple helper, `assert_online`, which creates an `assert` block that ensures a website returns a 200 status code.
```ruby
Listerine::Monitor.new do
  name "Site online"
  is :critical
  description "Makes sure that the site is online."
  assert_online "http://www.example.com"
end
```
We use CloudFlare, which can be somewhat flaky, so you can pass `:ignore_502 => true` to `assert_online` to ignore 502 errors.
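The check performed by `assert_online` can be approximated with Ruby's standard `Net::HTTP`. In the sketch below, `online_status?` and `site_online?` are hypothetical helpers, not Listerine's code, and treating an ignored 502 as a pass is an assumption about what "ignore" means. The status decision is kept in its own method so it can be tested without network access.

```ruby
require "net/http"
require "uri"

# Hypothetical sketch, not Listerine's `assert_online`: decide whether a
# status code counts as "online", optionally tolerating 502s (assumed
# to mean "treat a 502 as passing").
def online_status?(code, ignore_502: false)
  code == 200 || (ignore_502 && code == 502)
end

# Performs a real GET request. Redirects are not followed, so a 301/302
# counts as offline in this sketch.
def site_online?(url, ignore_502: false)
  response = Net::HTTP.get_response(URI(url))
  online_status?(response.code.to_i, ignore_502: ignore_502)
end
```

Usage: `site_online?("http://www.example.com", ignore_502: true)` returns `true` for a 200 or 502 response.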
The `name` field must be unique across all defined monitors.
Monitor history is currently persisted in a SQLite3 database, stored by default at `ENV["HOME"]/listerine-default.db`. You can customize the path to this database in the `configure` block:
```ruby
Listerine::Monitor.configure do
  persistence :sqlite, :path => "/data/monitors/functional.db"
end
```
Other persistence backends can be created by implementing the `Listerine::Persistence::PersistenceLayer` interface. Then pass your new persistence layer in via the `:persistence` option. Listerine automatically finds the class whose name is the `#capitalize`d version of the option, so `:persistence => :mysql` instantiates `Listerine::Persistence::Mysql`.
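The naming convention can be illustrated with `String#capitalize` and `Module#const_get`. The `DemoPersistence` module and its empty classes below are hypothetical stand-ins, not Listerine's real backends, and `persistence_class` is an illustrative helper.

```ruby
# Hypothetical sketch of the naming convention; these classes are empty
# stand-ins, not Listerine's real persistence backends.
module DemoPersistence
  class Sqlite; end
  class Mysql; end
end

def persistence_class(namespace, option)
  # :mysql -> "Mysql" -> DemoPersistence::Mysql
  namespace.const_get(option.to_s.capitalize)
end

backend = persistence_class(DemoPersistence, :mysql).new
p backend.class
```

Because `capitalize` upcases only the first character and downcases the rest, a class named `MySQL` or `My_sql` would not match this lookup; the class name must look like `Mysql`.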
Listerine comes with a Sinatra-based front end, `Listerine::Server`, for checking the latest status of your monitors and enabling or disabling them on a per-environment basis.
Check `examples/config.ru` for a functional example (including HTTP basic auth).
To deploy with Phusion Passenger, see Phusion's guide:

- Nginx: http://www.modrails.com/documentation/Users%20guide%20Nginx.html#deploying_a_rack_app
If you want to load Listerine on a subpath, possibly alongside other apps, it's easy to do with Rack's `URLMap`.
I don't really recommend this: with a SQLite backend, your application server must also run the monitors. But you can do it if you want.
```ruby
# config.ru
require 'listerine/server'

run Rack::URLMap.new \
  "/"         => Your::App.new,
  "/monitors" => Listerine::Server.new
```
You can also mount Listerine on a subpath in an existing Rails 3 app by adding `require 'listerine/server'` to the top of your routes file (or to an initializer) and then adding this to `routes.rb`:

```ruby
mount Listerine::Server.new, :at => "/monitors"
```
The Listerine server shows every monitor that has any run history, so a deleted monitor will still show up. For now, simply delete your SQLite database; it will be recreated the next time the monitors run.
- Fork listerine
- Create a topic branch: `git checkout -b my_branch`
- Push to your branch: `git push origin my_branch`
- Create a Pull Request from your branch
- That's it!