Uptime Checks is a service of Cloud Monitoring. You configure the service to check your system's health by sending requests to your applications, services, or URLs from various locations around the world. You can use the results of the checks as conditions in your alert policies, so you will be notified if system health is degraded.
An Alert Policy is a set of rules that determine whether your resources or groups are operating normally. The rules are logical conditions involving metric thresholds and uptime checks. For example, you can create a rule that your web site's average response latency must not exceed five seconds over a period of two minutes.
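To make the latency example concrete, here is a minimal Python sketch of such a rule. It is only an illustration of the idea, not how Cloud Monitoring evaluates conditions internally; the function name `latency_rule_violated` and the sample data are hypothetical.

```python
from statistics import mean


def latency_rule_violated(samples, now, window_s=120.0, threshold_s=5.0):
    """Return True if the mean latency inside the window exceeds the threshold.

    samples: iterable of (timestamp_seconds, latency_seconds) pairs.
    now:     the evaluation time, in the same units as the timestamps.
    """
    # Keep only the samples that fall inside the last `window_s` seconds.
    recent = [latency for ts, latency in samples if now - window_s <= ts <= now]
    if not recent:
        # Assumption of this sketch: no data in the window is not a violation.
        return False
    return mean(recent) > threshold_s


# Hypothetical latency samples collected every 30 seconds.
samples = [(0, 1.0), (30, 6.0), (60, 7.0), (90, 8.0), (120, 6.5)]
print(latency_rule_violated(samples, now=120))
```

The two-minute window and five-second threshold are the defaults here only because they match the example in the text; a real policy would set them per condition.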
An alert occurs when an alert policy's conditions are met, causing an Incident to appear in the Incidents section of the Cloud Monitoring Console. Incidents remain open until the alert policy rules are no longer in violation or until the incident is manually closed.
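The incident lifecycle described above can be pictured as a small state machine. The sketch below is a toy model under our own assumptions (one incident at a time, and a hypothetical `IncidentTracker` class); it is not the Cloud Monitoring implementation.

```python
class IncidentTracker:
    """Toy model of one policy's incident: open while the condition holds."""

    def __init__(self):
        self.open = False
        self.history = []  # events, in order: "opened", "resolved", "closed manually"

    def evaluate(self, condition_met):
        """Feed in the latest result of evaluating the policy's condition."""
        if condition_met and not self.open:
            self.open = True
            self.history.append("opened")
        elif not condition_met and self.open:
            self.open = False
            self.history.append("resolved")

    def close_manually(self):
        """Close the incident by hand, regardless of the condition."""
        if self.open:
            self.open = False
            self.history.append("closed manually")


tracker = IncidentTracker()
for condition_met in [False, True, True, False]:
    tracker.evaluate(condition_met)
print(tracker.history)
```

In this toy model, an incident that is closed manually while the condition still holds is reopened on the next evaluation; that behavior is a simplification chosen for the sketch.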
You can associate notifications with alert policies. For example, alerts can send email or SMS notifications to people or services.
In this codelab, you'll learn how to create an Uptime Check on a Compute Engine instance and attach an alerting policy to it, so that an incident is created and you are notified when the machine goes down.
The instructor will share temporary accounts with existing projects that are already set up, so you do not need to worry about enabling billing or any cost associated with running this codelab. Note that all these accounts will be disabled soon after the codelab is over.
Once you have received a temporary username and password from the instructor, log in to the Google Cloud Console: https://console.cloud.google.com/.
Here's what you should see once logged in:
Note the project ID you were assigned ("codelab-test003" in the screenshot above). It will be referred to later in this codelab as
Before we can enable monitoring, we will need some kind of infrastructure within this Google Cloud Platform project to actually monitor, so let us create that now.
We will create a Compute Engine instance with NGINX through the GCP Marketplace, so that we have a URL we can hit with an HTTP request to see if our resource is up and running.
Note: The first time you access Compute Engine, it will need to be enabled. This can take a minute or two, so please be patient.
To create the virtual machine:
We now have a resource that we can monitor!
Before we can use Stackdriver Monitoring, it must first be enabled for your project.
To use Stackdriver Monitoring with one of your projects, do the following:
You are now looking at the Stackdriver Monitoring Console. The information shown will vary depending on the Google (and AWS) services you are using and the monitoring features you have set up.
Now that monitoring is enabled, we want to create an Uptime Check. An uptime check periodically sends a request to a given resource to verify that it is up and responding. Uptime checks can be made over several protocols, including HTTP, HTTPS, UDP and TCP.
For the purposes of this codelab, we will create an HTTP uptime check to monitor our recently created NGINX web server.
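Conceptually, a single HTTP uptime probe amounts to sending a request and looking at the response. The sketch below shows that idea using only the Python standard library. It is not what the monitoring service runs; the function name `http_check` is hypothetical, and treating only status 200 as "up" is a simplification.

```python
import time
import urllib.error
import urllib.request


def http_check(url, timeout_s=10.0):
    """Send one GET request; return (is_up, status_code_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        status = err.code
    except (urllib.error.URLError, OSError):
        # No answer at all: connection refused, DNS failure, or timeout.
        status = None
    elapsed = time.monotonic() - start
    return status == 200, status, elapsed


# Example usage (replace EXTERNAL_IP with your instance's address):
# is_up, status, elapsed = http_check("http://EXTERNAL_IP/")
# print(f"up={is_up} status={status} elapsed={elapsed:.3f}s")
```

A real uptime check repeats a probe like this on a schedule and from several locations, which is what lets it tell a local network problem apart from a server that is actually down.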
To create the Uptime Check, click Uptime Check > Uptime Checks Overview in the left bar. Then click the Add Uptime Check button at the top right.
From there, select the following options:
Click Test to make sure that your Uptime Check works correctly. You should get back a message with "Responded with 200 (OK) in ...".
Click Save to save your Uptime Check.
Click No Thanks on the Alerting Policy question - we will do this in the next section.
Congratulations, you have now successfully created an Uptime Check!
Creating an Uptime Check is only half the battle. You will need something to notify you when an Uptime Check fails. This is where an Alerting Policy comes into effect.
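For reference, the same kind of check can also be described as data. The sketch below builds a request body shaped like the `UptimeCheckConfig` resource of the Cloud Monitoring API v3, as far as we understand it; please verify the field names against the API reference before relying on them. The helper `uptime_check_body` and every value passed to it (display name, project ID, instance ID, zone) are hypothetical placeholders, not the options selected in this codelab.

```python
import json


def uptime_check_body(display_name, project_id, instance_id, zone,
                      path="/", port=80, period_s=60, timeout_s=10):
    """Build a dict shaped like an UptimeCheckConfig for an HTTP check on a VM."""
    return {
        "displayName": display_name,
        # Assumption: the check targets a Compute Engine instance directly.
        "monitoredResource": {
            "type": "gce_instance",
            "labels": {
                "project_id": project_id,
                "instance_id": instance_id,
                "zone": zone,
            },
        },
        "httpCheck": {"path": path, "port": port},
        # Durations are written as strings with an "s" suffix, e.g. "60s".
        "period": f"{period_s}s",
        "timeout": f"{timeout_s}s",
    }


body = uptime_check_body(
    display_name="nginx-uptime-check",   # hypothetical name
    project_id="my-project-id",          # hypothetical project ID
    instance_id="1234567890123456789",   # hypothetical instance ID
    zone="us-central1-a",                # hypothetical zone
)
print(json.dumps(body, indent=2))
```

Nothing here calls the API; the sketch only prints the JSON so you can compare it with what you entered in the console.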
There are multiple ways to create an Alerting Policy (as we saw earlier), but to create an Alerting Policy directly from your Uptime Check:
Under Target, select the following:
Under Configuration, select the following:
Now we need to configure how we want to be notified. There are many options, including PagerDuty, SMS, Slack, and HipChat, but the easiest option for now is Email, so let's configure that:
Under Notification, select Email from the drop-down, and enter an email address at which you are happy to receive notifications.
It is often useful to include documentation with your alerts, outlining what the alert is for, and possible fixes or troubleshooting steps. For this codelab we will not add any documentation, but it is something you should consider for production systems.
This gives a convenient name to the Alerting Policy, so it is recognisable when it creates an Incident.
You now have a Compute Engine instance whose uptime is monitored by an Uptime Check and an Alerting Policy.