Some of the most important work we do is keeping our clients’ sites running smoothly: preventing crashes and spotting bugs. Website monitoring helps make these invisible problems visible.
Whilst preventing sites from crashing might seem obvious, there are hundreds of other potential issues that are hard to spot without monitoring and that can have a heavy business impact when they go wrong.
Website monitoring not only makes these problems visible, it also helps you understand how key events, such as marketing campaigns or major news events, are affecting your organisation and your site.
In this article, I’ll talk through how monitoring works and the tools we recommend for covering all the bases.
Different types of monitoring
There are three main types of monitoring: application monitoring, performance monitoring (APM) and real user monitoring (RUM).
Application monitoring
Application monitoring measures fundamental metrics. It doesn’t attempt to diagnose any issues; it simply reports on failures, so you’ll be alerted if anything goes down.
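As a minimal sketch of the kind of up/down check described here, the Python below requests a page and reports only whether it responded. The function name `check_site`, the five-second timeout and the rule that any 2xx or 3xx status counts as "up" are assumptions for illustration, not a description of any particular tool.

```python
import urllib.request


def check_site(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return ("UP" or "DOWN", detail) for one URL. It reports; it does not diagnose."""
    try:
        # urlopen raises HTTPError for 4xx/5xx responses and URLError for
        # connection problems; both are subclasses of OSError.
        with opener(url, timeout=timeout) as response:
            status = response.status
    except OSError as exc:
        return "DOWN", f"request failed: {exc}"
    if 200 <= status < 400:
        return "UP", f"HTTP {status}"
    return "DOWN", f"HTTP {status}"
```

The `opener` parameter is there only so the check can be exercised without touching the network; in normal use you would call `check_site("https://example.com/")` and pass the result on to whatever sends your alerts.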
Performance monitoring
Performance monitoring shows you how different systems and applications are performing. Are they running out of memory? Are they being flooded with requests or under attack? As these alerts are more granular, they make it quicker to diagnose issues.
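To make those questions concrete, here is a small sketch of threshold-based alerting on a single reading from a server. The `Sample` fields and the default limits (90% memory, 500 requests per second) are hypothetical values chosen for illustration; real limits depend on the system being watched.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading from a server (all fields are hypothetical)."""
    host: str
    memory_used_pct: float      # share of RAM in use, 0-100
    requests_per_second: float  # current request rate


def evaluate(sample, memory_limit_pct=90.0, request_limit=500.0):
    """Return a list of alert messages; an empty list means nothing crossed a limit."""
    alerts = []
    if sample.memory_used_pct >= memory_limit_pct:
        alerts.append(
            f"{sample.host}: memory at {sample.memory_used_pct:.0f}% "
            f"(limit {memory_limit_pct:.0f}%)"
        )
    if sample.requests_per_second >= request_limit:
        alerts.append(
            f"{sample.host}: {sample.requests_per_second:.0f} requests/s "
            f"(limit {request_limit:.0f})"
        )
    return alerts
```

Because each alert names the host and the metric that crossed its limit, whoever picks it up already knows where to start looking.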
Real user monitoring (RUM)
RUM is a really useful form of monitoring because it shows what users are actually experiencing, from the moment they hit the page to when the last byte is served.
In some instances, your site might be hit with heavy traffic but still be working well enough not to trigger an alert, even though slow load times are giving your users a terrible experience.
RUM helps you to identify:
- Preferred user environments: device, operating system and version, browser and version, and location
- Latency issues
- User concentrations
- Performance by location
- Load time as well as network, frontend, and backend durations
All of these help you optimise performance and give you insight you can use to improve the online experience.
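As a rough illustration of how the items above fit together, the sketch below takes a handful of timing records sent back from users' browsers and works out the median load time per location. The record layout (`location`, `network_ms`, `backend_ms`, `frontend_ms`) and the sample values are assumptions made up for this example; a real RUM tool collects and reports far more.

```python
from collections import defaultdict
from statistics import median


def load_time_ms(beacon):
    """Total load time is the sum of the network, backend and frontend durations."""
    return beacon["network_ms"] + beacon["backend_ms"] + beacon["frontend_ms"]


def median_load_by_location(beacons):
    """Group load times by user location and return the median for each."""
    by_location = defaultdict(list)
    for beacon in beacons:
        by_location[beacon["location"]].append(load_time_ms(beacon))
    return {location: median(times) for location, times in by_location.items()}


# Hypothetical records, as a browser might report them.
beacons = [
    {"location": "London", "network_ms": 100, "backend_ms": 300, "frontend_ms": 600},
    {"location": "London", "network_ms": 200, "backend_ms": 800, "frontend_ms": 2000},
    {"location": "Sydney", "network_ms": 900, "backend_ms": 1100, "frontend_ms": 2000},
]
print(median_load_by_location(beacons))
```

The median is used rather than the average so that one very slow visit does not distort the picture for a whole location.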
We use a tool called 24x7 for monitoring and PagerDuty for 24/7 support notifications.
The best tools for website monitoring
Every site is different and you need to work with the best monitoring tools for your environment.
That said, these are the tools we use every day for almost every client we work with:
- Prometheus
- Nagios
- Sentry
- Cloud-based monitoring solutions such as CloudWatch
Prometheus
Prometheus is well known in the monitoring world and for good reason. It's not just great at what it does, it’s free and open-source.
The caveat is that you do need to hire a developer who understands the infrastructure of your systems to work with it properly.
Its sole purpose is to aggregate data across a wide set of servers, systems and applications.
However, that’s all it does.
To make it useful for senior management and stakeholders, we use Grafana as a reporting tool to pull out the data and present it in easy-to-understand dashboards.
It helps us to visualise data in graphs showing trends, such as high application usage between certain hours in the day, or corresponding to key events you want to monitor.
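Prometheus gathers its data by regularly fetching a plain-text metrics page from each system it watches. As a sketch of what one counter looks like in that text format, the function below renders a metric by hand. The metric name `http_requests_total`, the labels and the values are hypothetical, and in practice you would normally use an official client library rather than build the text yourself.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    `samples` is a list of (labels, value) pairs, where labels is a dict.
    Label values are not escaped here, so this is only safe for simple strings.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts from two web servers.
print(render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [({"host": "web1", "status": "200"}, 1027),
     ({"host": "web2", "status": "500"}, 3)],
))
```

The labels are what let a dashboard tool such as Grafana break the same metric down by server or status code.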
Nagios
Nagios is another tool we use to take care of fundamental alerts.
This runs simple checks and lets us know if a system is up or down.
What we love about Nagios is that it allows us to write plugins to ascertain if hard-to-access (known as obfuscated) aspects of a system are functioning correctly.
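A Nagios plugin is a standalone program that prints one line of status and signals the result through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The sketch below follows that convention to check how recently a background job last touched a "heartbeat" file. The heartbeat idea, the function names and the 60 and 300 second thresholds are assumptions for illustration, not one of our actual plugins.

```python
import os
import time

# Exit codes defined by the Nagios plugin conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_age(age_seconds, warn_seconds, crit_seconds):
    """Map a heartbeat age onto a Nagios state and one line of output."""
    # Text after the "|" is performance data in the form label=value;warn;crit
    perfdata = f"age={age_seconds:.0f}s;{warn_seconds};{crit_seconds}"
    summary = f"last update {age_seconds:.0f}s ago | {perfdata}"
    if age_seconds >= crit_seconds:
        return CRITICAL, f"HEARTBEAT CRITICAL - {summary}"
    if age_seconds >= warn_seconds:
        return WARNING, f"HEARTBEAT WARNING - {summary}"
    return OK, f"HEARTBEAT OK - {summary}"


def check_heartbeat(path, warn_seconds=60, crit_seconds=300):
    """Check how long ago the file at `path` was last modified."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError as exc:
        # If the file cannot be read, the state is unknown rather than failed.
        return UNKNOWN, f"HEARTBEAT UNKNOWN - cannot read {path}: {exc}"
    return classify_age(age, warn_seconds, crit_seconds)
```

To run this as a plugin, a small wrapper would call `check_heartbeat` with a path of your choosing, print the returned message and pass the returned code to `sys.exit`.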
While Prometheus and Nagios are very similar, we’ve found a place for both technologies across the different sites we manage.
Sentry
Sentry is another vital tool we use daily across multiple client sites to give us an extra level of granularity.
It logs and aggregates errors, allowing us to go in, diagnose and then fix specific issues.
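To show what aggregating errors means in practice, the sketch below groups exceptions by their type and the function they were raised in, then counts each group. This is a simplified illustration of the idea only: it does not use the Sentry SDK, and the grouping rule and the `ErrorAggregator` class are assumptions, not how Sentry itself groups events.

```python
import traceback
from collections import Counter


def fingerprint(exc):
    """Build a grouping key from the exception type and the innermost function."""
    frames = traceback.extract_tb(exc.__traceback__)
    where = frames[-1].name if frames else "unknown"
    return f"{type(exc).__name__}@{where}"


class ErrorAggregator:
    """Count errors per fingerprint and keep the latest message for each."""

    def __init__(self):
        self.counts = Counter()
        self.latest_message = {}

    def capture(self, exc):
        key = fingerprint(exc)
        self.counts[key] += 1
        self.latest_message[key] = str(exc)
        return key

    def top(self, n=5):
        """Return the n most frequent error groups as (fingerprint, count) pairs."""
        return self.counts.most_common(n)
```

Grouping in this way turns a thousand identical log lines into a single entry with a count, which is what makes it practical to decide which issue to fix first.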
This is the one tool we deploy across every environment.
Cloud-based monitoring solutions
Cloud-based monitoring solutions such as Amazon’s CloudWatch, Google Cloud’s operations suite and Microsoft’s Azure Monitor offer managed services to monitor infrastructure out of the box.
These are all great tools, particularly if you’re already using a particular cloud-based platform. The drawback is that they tend to be pricey, whereas Prometheus is free.
No one tool to rule them all
There's no right or wrong answer as to which tool you use. It's all about whatever fits your setup best.
Sentry is a must-have for us. Then we choose between Nagios, Prometheus or a cloud-based monitoring solution, depending on the client’s platform and preference. Or sometimes we use all of these tools for end-to-end monitoring.
Sleeping better at night
Bugs and errors are going to happen, such as the time Amazon Web Services ‘broke the internet’ thanks to a typo.
Monitoring can prevent the worst of these, minimising the risks and helping you spot potential threats as they arise. Website performance is too important to compromise.
Knowing you have 24/7 applications on the job, checking thousands of tasks, means everyone can sleep at night.