Integrity monitoring collects measurements from Shielded VM instances and surfaces them in Stackdriver Logging. If integrity measurements change across boots of a Shielded VM instance, integrity validation fails. This failure is captured as a logged event, and is also raised in Stackdriver Monitoring.
Sometimes, Shielded VM integrity measurements change for a legitimate reason. For example, a system update might cause expected changes to the operating system kernel. Because of this, integrity monitoring lets you prompt a Shielded VM instance to learn a new integrity policy baseline in the case of an expected integrity validation failure.
You can create a simple automated system that shuts down Shielded VM instances that fail integrity validation. You can expand the system so that it prompts Shielded VM instances that fail integrity validation to learn the new baseline if it matches a known good measurement, or to shut down otherwise.
To use this system, you need the following:

- A project that uses Cloud Firestore in Native mode as its database service, to store the known good baseline measurements. You make this selection when you create the project, and it can't be changed. If your project doesn't use Cloud Firestore in Native mode, you will see the message "This project uses another database service" when you open the Cloud Firestore console.
- A Shielded VM instance that has been restarted at least once.
- The gcloud command-line tool installed.

Use Logging to export all integrity monitoring log entries generated by Shielded VM instances to a Cloud Pub/Sub topic. You use this topic as a data source for a Cloud Functions trigger to automate responses to integrity monitoring events.

In Logging, create an export that sends log entries matching the following advanced filter to a new Cloud Pub/Sub topic named integrity-monitoring:
resource.type="gce_instance" AND logName: "projects/YOUR_PROJECT_ID/logs/compute.googleapis.com%2Fshielded_vm_integrity"
replacing YOUR_PROJECT_ID with the ID of your project. Note that there are two spaces after "logName:".
Create a Cloud Functions trigger that reads the data in the Cloud Pub/Sub topic and that stops any Shielded VM instance that fails integrity validation.
Create a file named main.py, and then copy the following code into it:

import base64
import json

import googleapiclient.discovery


def shutdown_vm(data, context):
    """A Cloud Function that shuts down a VM on failed integrity check."""
    log_entry = json.loads(base64.b64decode(data['data']).decode('utf-8'))
    payload = log_entry.get('jsonPayload', {})
    entry_type = payload.get('@type')
    if entry_type != 'type.googleapis.com/cloud_integrity.IntegrityEvent':
        raise TypeError("Unexpected log entry type: %s" % entry_type)

    report_event = (payload.get('earlyBootReportEvent')
                    or payload.get('lateBootReportEvent'))
    if report_event is None:
        # We received a different event type, ignore.
        return

    policy_passed = report_event['policyEvaluationPassed']
    if not policy_passed:
        print('Integrity evaluation failed: %s' % report_event)
        print('Shutting down the VM')

        instance_id = log_entry['resource']['labels']['instance_id']
        project_id = log_entry['resource']['labels']['project_id']
        zone = log_entry['resource']['labels']['zone']

        # Shut down the instance.
        compute = googleapiclient.discovery.build(
            'compute', 'v1', cache_discovery=False)

        # Get the instance name from the instance id.
        list_result = compute.instances().list(
            project=project_id,
            zone=zone,
            filter='id eq %s' % instance_id).execute()
        if len(list_result['items']) != 1:
            raise KeyError('unexpected number of items: %d'
                           % len(list_result['items']))
        instance_name = list_result['items'][0]['name']

        result = compute.instances().stop(project=project_id,
                                          zone=zone,
                                          instance=instance_name).execute()
        print('Instance %s in project %s has been scheduled for shut down.'
              % (instance_name, project_id))
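If you want to sanity-check the function locally before deploying it, you can call it with a synthetic event. This sketch exercises the passing-event path, so no Compute Engine API call is made; it assumes it is run from the directory containing main.py:

import base64
import json

from main import shutdown_vm

# A hypothetical passing event: policy evaluation succeeded, so the
# function returns without contacting the Compute Engine API.
event = {
    'jsonPayload': {
        '@type': 'type.googleapis.com/cloud_integrity.IntegrityEvent',
        'lateBootReportEvent': {'policyEvaluationPassed': True},
    },
}
data = {'data': base64.b64encode(json.dumps(event).encode('utf-8'))}
shutdown_vm(data, context=None)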
In the same directory as main.py, create a file named requirements.txt and copy in the following dependencies:

google-api-python-client==1.6.6
google-auth==1.4.1
google-auth-httplib2==0.0.3
From the directory that contains main.py and requirements.txt, use the gcloud beta functions deploy command to deploy the trigger:

gcloud beta functions deploy shutdown_vm --project YOUR_PROJECT_ID \
    --runtime python37 --trigger-resource integrity-monitoring \
    --trigger-event google.pubsub.topic.publish

replacing YOUR_PROJECT_ID with the ID of your project.
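To exercise the deployed trigger end to end, you can publish a synthetic failing event to the topic yourself. This sketch uses the google-cloud-pubsub client library (install it separately; it isn't in requirements.txt), and the instance labels are placeholders that must point at a real VM for the shutdown call to succeed:

import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('YOUR_PROJECT_ID', 'integrity-monitoring')

# A hypothetical failing event, shaped like the integrity log entries.
entry = {
    'resource': {'labels': {'instance_id': 'INSTANCE_ID',  # placeholder
                            'project_id': 'YOUR_PROJECT_ID',
                            'zone': 'us-central1-a'}},
    'jsonPayload': {
        '@type': 'type.googleapis.com/cloud_integrity.IntegrityEvent',
        'lateBootReportEvent': {'policyEvaluationPassed': False},
    },
}
future = publisher.publish(topic_path, data=json.dumps(entry).encode('utf-8'))
print('Published message %s' % future.result())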
Create a Cloud Firestore database to provide a source of known good integrity policy baseline measurements. You must manually add baseline measurements to keep this database up to date.
To add a baseline measurement to the database:

1. In Logging, find the most recent integrity validation event for the instance and open its lateBootReportEvent log entry. Expand jsonPayload > lateBootReportEvent > policyMeasurements.
2. Note the values of each element listed under lateBootReportEvent > policyMeasurements.
3. In the Cloud Firestore console, create a collection named known_good_measurements, and create a first document in it.
4. In that document, create a map field named after the first element's PCR (for example, PCR_0), with subfields hashAlgo, pcrNum, and value. Set the subfield values to those of element 0 in lateBootReportEvent > policyMeasurements.
5. Create one additional map field for each remaining element in lateBootReportEvent > policyMeasurements. Give them the same subfields as the first map field. The values for those subfields should map to those in each of the additional elements. If you are using a Windows VM, you will see more measurements: a Linux VM reports three (PCR_0, PCR_4, and PCR_7), while a Windows VM also reports PCR_11, PCR_13, and PCR_14, so its document contains six map fields. A scripted alternative follows this list.
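If you prefer to script this step instead of using the console, the following sketch writes one baseline document from a copied policyMeasurements array using the google-cloud-firestore client library; the document name measurement_1 and the measurement values are illustrative:

from google.cloud import firestore

# Paste the policyMeasurements elements from the lateBootReportEvent log
# entry here; these values are placeholders.
policy_measurements = [
    {'pcrNum': 'PCR_0', 'hashAlgo': 'SHA1', 'value': '...'},
    {'pcrNum': 'PCR_4', 'hashAlgo': 'SHA1', 'value': '...'},
    {'pcrNum': 'PCR_7', 'hashAlgo': 'SHA1', 'value': '...'},
]

db = firestore.Client(project='YOUR_PROJECT_ID')
# Key each map field by its PCR number, as the relearn function expects.
doc = {m['pcrNum']: m for m in policy_measurements}
db.collection('known_good_measurements').document('measurement_1').set(doc)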
Create a second Cloud Functions trigger that reads the data in the Cloud Pub/Sub topic, prompts a Shielded VM instance that fails integrity validation to learn the new baseline if it matches a known good measurement, and shuts the instance down otherwise. Create a file named main.py, and then copy the following code into it:

import base64
import json

import googleapiclient.discovery

import firebase_admin
from firebase_admin import credentials
from firebase_admin import firestore

PROJECT_ID = 'YOUR_PROJECT_ID'

firebase_admin.initialize_app(credentials.ApplicationDefault(), {
    'projectId': PROJECT_ID,
})
def pcr_values_to_dict(pcr_values):
    """Converts a list of PCR values to a dict, keyed by PCR num."""
    result = {}
    for value in pcr_values:
        result[value['pcrNum']] = value
    return result


def instance_id_to_instance_name(compute, zone, project_id, instance_id):
    """Looks up an instance's name from its numeric id."""
    list_result = compute.instances().list(
        project=project_id,
        zone=zone,
        filter='id eq %s' % instance_id).execute()
    if len(list_result['items']) != 1:
        raise KeyError('unexpected number of items: %d'
                       % len(list_result['items']))
    return list_result['items'][0]['name']


def relearn_if_known_good(data, context):
    """A Cloud Function that prompts a VM to relearn its integrity policy
    baseline if the new measurements are known good, and shuts it down
    otherwise.
    """
    log_entry = json.loads(base64.b64decode(data['data']).decode('utf-8'))
    payload = log_entry.get('jsonPayload', {})
    entry_type = payload.get('@type')
    if entry_type != 'type.googleapis.com/cloud_integrity.IntegrityEvent':
        raise TypeError("Unexpected log entry type: %s" % entry_type)

    # We only send the relearn signal upon receiving a late boot report event:
    # if early boot measurements are in the known good database, but late boot
    # measurements aren't, and we send the relearn signal upon receiving an
    # early boot report event, the VM will also relearn the late boot policy
    # baseline, which we don't want, because those measurements aren't known
    # good.
    report_event = payload.get('lateBootReportEvent')
    if report_event is None:
        return

    evaluation_passed = report_event['policyEvaluationPassed']
    if evaluation_passed:
        # Policy evaluation passed, nothing to do.
        return

    # See if the new measurement is known good, and if it is, relearn.
    measurements = pcr_values_to_dict(report_event['actualMeasurements'])
    db = firestore.Client()
    kg_ref = db.collection('known_good_measurements')

    # Check current measurements against the known good database.
    relearn = False
    for kg in kg_ref.get():
        kg_map = kg.to_dict()

        # Check the PCR values in the lateBootReportEvent measurements against
        # the known good measurements stored in the Firestore collection.
        if ('PCR_0' in kg_map and kg_map['PCR_0'] == measurements['PCR_0'] and
            'PCR_4' in kg_map and kg_map['PCR_4'] == measurements['PCR_4'] and
            'PCR_7' in kg_map and kg_map['PCR_7'] == measurements['PCR_7']):

            # Linux VM (3 measurements): only the 3 measurements above need to
            # match.
            if len(kg_map) == 3:
                relearn = True
            # Windows VM (6 measurements): 3 additional measurements need to
            # match.
            elif len(kg_map) == 6:
                if ('PCR_11' in kg_map and
                        kg_map['PCR_11'] == measurements['PCR_11'] and
                        'PCR_13' in kg_map and
                        kg_map['PCR_13'] == measurements['PCR_13'] and
                        'PCR_14' in kg_map and
                        kg_map['PCR_14'] == measurements['PCR_14']):
                    relearn = True

    compute = googleapiclient.discovery.build('compute', 'beta',
                                              cache_discovery=False)

    instance_id = log_entry['resource']['labels']['instance_id']
    project_id = log_entry['resource']['labels']['project_id']
    zone = log_entry['resource']['labels']['zone']
    instance_name = instance_id_to_instance_name(compute, zone,
                                                 project_id, instance_id)

    if not relearn:
        # Issue the shutdown API call.
        print('New measurement is not known good. Shutting down the VM.')

        result = compute.instances().stop(project=project_id,
                                          zone=zone,
                                          instance=instance_name).execute()

        print('Instance %s in project %s has been scheduled for shut down.'
              % (instance_name, project_id))
    else:
        # Issue the relearn API call.
        print('New measurement is known good. Relearning...')

        result = compute.instances().setShieldedInstanceIntegrityPolicy(
            project=project_id,
            zone=zone,
            instance=instance_name,
            body={'updateAutoLearnPolicy': True}).execute()

        print('Instance %s in project %s has been scheduled for relearning.'
              % (instance_name, project_id))
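Note that the known good check relies on Python dict equality: a stored map field counts as a match only if every subfield (hashAlgo, pcrNum, and value) equals the reported measurement. A small illustration with placeholder values:

known_good = {'pcrNum': 'PCR_0', 'hashAlgo': 'SHA1', 'value': 'abc123'}
reported = {'pcrNum': 'PCR_0', 'hashAlgo': 'SHA1', 'value': 'abc123'}
print(known_good == reported)  # True: every subfield matches

reported['value'] = 'def456'  # simulate a changed measurement
print(known_good == reported)  # False: any differing subfield fails the check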
In the same directory as main.py, create a file named requirements.txt and copy in the following dependencies:

google-api-python-client==1.6.6
google-auth==1.4.1
google-auth-httplib2==0.0.3
google-cloud-firestore==0.29.0
firebase-admin==2.13.0
From the directory that contains main.py and requirements.txt, use the gcloud beta functions deploy command to deploy the trigger:

gcloud beta functions deploy relearn_if_known_good --project YOUR_PROJECT_ID \
    --runtime python37 --trigger-resource integrity-monitoring \
    --trigger-event google.pubsub.topic.publish

replacing YOUR_PROJECT_ID with the ID of your project.
Because both functions are triggered by the same integrity-monitoring topic, the shutdown_vm function would stop the instance even when its new measurements are known good. Delete it before testing: find the shutdown_vm function in the Cloud Functions console, select it, and click Delete.

To test the baseline update, connect to the instance and check the kernel it is currently running:

uname -sr

You should see something like Linux 4.15.0-1028-gcp.
Download the packages for a different kernel version to the instance, and then install the new kernel:

sudo dpkg -i *.deb

If the new kernel image isn't signed, you also need to stop the instance, turn off the Secure Boot option, and then start the instance again so that it can boot into the new kernel.
After the instance boots with the new kernel, it fails integrity validation, and the relearn_if_known_good function shuts it down because the new measurements aren't yet in the known good database. Add the new measurements from the latest lateBootReportEvent to the known good measurement Firebase table. (Remember there are two things being changed: 1. the Secure Boot option, and 2. the kernel image.)

Restart the instance and check its latest lateBootReportEvent. policyEvaluationPassed still shows false, but the machine should now boot successfully, because the Cloud Function trusted and relearned the new measurement. You can verify this by checking the Stackdriver logs for the Cloud Function. Finally, run uname -sr again to confirm that the instance is running the new kernel.
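If you want to check the function's logs from a script instead of the console, the following sketch uses the google-cloud-logging client library; the filter values are the defaults used in this topic:

from google.cloud import logging

client = logging.Client(project='YOUR_PROJECT_ID')
log_filter = ('resource.type="cloud_function" AND '
              'resource.labels.function_name="relearn_if_known_good"')
for entry in client.list_entries(filter_=log_filter, page_size=20):
    print(entry.timestamp, entry.payload)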