This Codelab covers streaming analytics on events coming from an app in Firebase, with the use of several services such as Cloud Firestore, Cloud Functions, Cloud Pub/Sub, Cloud Dataflow and BigQuery.

What you'll learn:

What you'll need:

Codelab-at-a-conference setup

If you see a "request account button" at the top of the main Codelabs window, click it to obtain a temporary account. Otherwise ask one of the staff for a coupon with username/password.

These temporary accounts have existing projects that are set up with billing, so there are no costs for you to run this codelab.

Note that all these accounts will be disabled soon after the codelab is over.

Use these credentials to log into the machine or to open a new Google Cloud Console window. Accept the new account's Terms of Service and any updates to the Terms of Service.

Here's what you should see once logged in:

When presented with this console landing page, please select the only project available. Alternatively, from the console home page, click on "Select a Project" :

IAM Permission Needed - Owner

This lab requires the provisioning and enablement of multiple Google Cloud Platform and Firebase services. You will need Owner permission on the project being used for this lab.

To confirm you have the Owner permission on the project, in the Google Cloud Platform console, navigate to "IAM & admin" > "IAM", and confirm that the "Owner" role appears next to your email address (you may see other service accounts in your project):

Cloud Shell

Activate Google Cloud Shell

From the GCP Console click the Cloud Shell icon on the top right toolbar:

Then click "Start Cloud Shell":

It should only take a few moments to provision and connect to the environment:

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs on Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this lab can be done with just a browser or a Google Chromebook.

Once connected to the cloud shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID.

Run the following command in the cloud shell to confirm that you are authenticated:

gcloud auth list

Command output

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)
gcloud config list project

Command output

project = <PROJECT_ID>

If the project is not set correctly, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

In your Cloud Shell, execute the following to enable the Dataflow API (if not enabled already):

gcloud services enable dataflow.googleapis.com

Enabling the Dataflow API also enables a couple of other APIs. You can check which ones have been enabled by running the following:

gcloud services list --format 'value(config.name)' | sort

This command should display the list of APIs enabled in your project, including the key ones we are using in this lab, such as Dataflow, Pub/Sub, and BigQuery.

Google Cloud Storage allows object storage and retrieval in various regions. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.

Buckets are the basic containers of Google Cloud Storage that hold your data. Everything that you store in Cloud Storage must be contained in a bucket. In this lab, the bucket will be used in various ways:

Create the GCS Bucket using gsutil under Cloud Shell

Choose a region for your bucket. This region will also be used for Dataflow later.

For this lab, let's use us-central1. Copy and paste this into Cloud Shell:

REGION=us-central1

Choose a name for your bucket (without brackets or the gs:// prefix). You are welcome to choose any name for the lab, although the bucket name must be unique across all of Cloud Storage. If you choose an obvious name, such as "test", you will probably find that someone else has already created a bucket with that name, and you will receive an error.

For the purpose of this lab, we will use the following: the project ID with "-mdp-lab-gcs" as the suffix.

GCS_BUCKET=$(gcloud config get-value project)-mdp-lab-gcs

There are also some rules about which characters are allowed in bucket names. If you start and end your bucket name with a letter or number, and only use dashes in the middle, then you'll be fine. If you use special characters, or start or end your bucket name with something other than a letter or number, the error message will remind you of the rules.
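The naming rules above can be sanity-checked locally before calling gsutil. A minimal sketch; the regex below only approximates the documented rules (it allows lowercase letters, digits, and dashes) and is not the authoritative validator:

```shell
# Approximate GCS bucket-name check: 3-63 characters, lowercase letters,
# digits, and dashes, starting and ending with a letter or digit.
is_valid_bucket_name() {
  echo "$1" | grep -Eq '^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$'
}

is_valid_bucket_name "my-project-mdp-lab-gcs" && echo "ok"
is_valid_bucket_name "-bad-name-" || echo "rejected"
```

Real bucket names may also contain dots and underscores under extra conditions; for this lab the simple letters-digits-dashes pattern is all you need.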

Run this command to create it.

gsutil mb -c regional -l ${REGION} gs://${GCS_BUCKET}

If you encounter an error saying "ServiceException: 409 Bucket test already exists", you will need to go back to the GCS_BUCKET command, pick another name and try the "gsutil mb" command again.
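One way to avoid name collisions is to derive the bucket name from something already unique, such as the project ID, optionally with a timestamp appended. A hypothetical sketch; PROJECT_ID is a stand-in value here, while in Cloud Shell you would read it from gcloud config:

```shell
# PROJECT_ID is a placeholder; in Cloud Shell you would instead run:
#   PROJECT_ID=$(gcloud config get-value project)
PROJECT_ID="my-sample-project"

# Append a Unix timestamp so retries generate a fresh, likely-unique name.
GCS_BUCKET="${PROJECT_ID}-mdp-lab-gcs-$(date +%s)"
echo "${GCS_BUCKET}"
```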

You can confirm the bucket creation in the GCP Console by going to Storage -> Browser; it should display the newly created bucket:

Firebase lets you build more powerful, secure and scalable apps. We are using several Firebase components in this lab:

Create a Firebase project

Navigate to the Firebase console and click "Add Project".

Select the project name used in the previous section. Accept the terms and choose "Continue". In this lab, you are NOT required to enable "Use the default settings for sharing Google Analytics for Firebase data".

You can leave the next set of settings unchecked, and click "Add Firebase".

Confirm the Firebase billing plan, which is "Pay as you go". The charges will be made against the credits in your GCP account.

Enable Google Auth

To allow users to sign in to the web app, we'll use Google authentication, which needs to be enabled.

In the Firebase console, open the DEVELOP section > Authentication > SIGN-IN METHOD tab. Enable the Google sign-in provider and click SAVE. This will allow users to sign in to the web app with their Google accounts.

Retrieve the code

Under Cloud Shell, download the sample Firebase code:

cd ~
gsutil cp gs://mdp-next18-lab/quizgame.tar.gz .
tar zxvf quizgame.tar.gz

Enter the newly extracted 📁 quizgame directory. This directory contains the code for the fully functional Firebase web app.

cd quizgame

Configure the Firebase Command Line Interface

Cloud Shell should come with the Firebase command-line interface (CLI) already installed. Make sure you are in the ~/quizgame directory, then set up the Firebase CLI to use your Firebase project:

firebase use --add

Then select your Project ID and follow the instructions. When prompted for an alias for this project, you can enter a value like "staging".

Substitute configuration variables

You will need to replace some configuration variables in the source code. We will use the command line to retrieve and replace them.

Run this command to examine the variables. These are variables that are placed in the client-side app to enable connectivity to the Firebase services.

firebase setup:web

We will use some quick shell commands to extract each value into a variable:

FIREBASE_APIKEY=$(firebase setup:web | grep '^  "apiKey' | cut -d'"' -f4)
FIREBASE_DATABASEURL=$(firebase setup:web | grep '^  "databaseURL' | cut -d'"' -f4)
FIREBASE_STORAGEBUCKET=$(firebase setup:web | grep '^  "storageBucket' | cut -d'"' -f4)
FIREBASE_AUTHDOMAIN=$(firebase setup:web | grep '^  "authDomain' | cut -d'"' -f4)
FIREBASE_MESSAGINGSENDERID=$(firebase setup:web | grep '^  "messagingSenderId' | cut -d'"' -f4)
FIREBASE_PROJECTID=$(firebase setup:web | grep '^  "projectId' | cut -d'"' -f4)
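To see what the grep/cut pair is doing, here is a self-contained rehearsal against a made-up snippet shaped like the firebase setup:web output (the key value is invented for illustration):

```shell
# Hypothetical fragment resembling what `firebase setup:web` prints.
SAMPLE_CONFIG='  "apiKey": "AIzaFAKEKEY123",
  "databaseURL": "https://example-project.firebaseio.com",'

# Same pattern as the commands above: match the line, then take the
# fourth double-quote-delimited field, which is the value itself.
FIREBASE_APIKEY=$(echo "$SAMPLE_CONFIG" | grep '^  "apiKey' | cut -d'"' -f4)
echo "$FIREBASE_APIKEY"
```

Splitting on the double-quote character yields the leading spaces, the key name, the ": " separator, and then the value as field 4, which is why `-f4` works for every key.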

The variables are to be substituted into the file src/main.js.

sed -i "s~^  apiKey:.*$~  apiKey: '${FIREBASE_APIKEY}',~g" src/main.js
sed -i "s~^  databaseURL:.*$~  databaseURL: '${FIREBASE_DATABASEURL}',~g" src/main.js
sed -i "s~^  storageBucket:.*$~  storageBucket: '${FIREBASE_STORAGEBUCKET}',~g" src/main.js
sed -i "s~^  authDomain:.*$~  authDomain: '${FIREBASE_AUTHDOMAIN}',~g" src/main.js
sed -i "s~^  messagingSenderId:.*$~  messagingSenderId: '${FIREBASE_MESSAGINGSENDERID}',~g" src/main.js
sed -i "s~^  projectId:.*$~  projectId: '${FIREBASE_PROJECTID}'~g" src/main.js
echo "Completed substitution"
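If you want to rehearse the substitution safely first, here is a self-contained sketch that runs the same sed pattern against a throwaway file (the file path, fragment, and key value are all made up for illustration):

```shell
# Create a throwaway file shaped like the relevant part of src/main.js.
cat > /tmp/main-demo.js <<'EOF'
firebase.initializeApp({
  apiKey: 'REPLACE_ME',
})
EOF

FIREBASE_APIKEY="AIzaFAKEKEY123"
# Same sed pattern as above: replace the entire apiKey line in place.
sed -i "s~^  apiKey:.*$~  apiKey: '${FIREBASE_APIKEY}',~g" /tmp/main-demo.js

# Print the initializeApp block to confirm the substitution.
sed -n '/^firebase.initializeApp/,/})/p' /tmp/main-demo.js
```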

Confirm the successful substitution by looking at src/main.js again.

sed -n '/^firebase.initializeApp/,/})/p' src/main.js

You should see something like this, with values specific to your project:

We will run a second pipeline to write the same data from Pub/Sub to a GCS (Google Cloud Storage) bucket in Avro format.

Dataflow can write to multiple destinations within a single pipeline. In this lab, however, we create two separate pipelines with separate subscriptions to the same topic, so that each pipeline pulls messages independently. This is a useful pattern when subscribers may consume messages at different paces (for example, during downtime in a particular downstream application) and should not slow down each other's message consumption.

Execute the Template from the Console

Again, go to the GCP Console. On the left navigation bar, choose "Pub/Sub" -> "Topics", then click the topic name ending with "triviagameevents". You should see a screen like the following. Click "Export To":

Choose "Cloud Storage Avro file" at the popup.

Click "Continue" at the prompt.

You should be brought to the "Create job from template" screen. Certain variables should have been pre-populated:

You will need to enter a few parameters:

Click the "Run Job" button. It should display a screen like this:

Examine the Cloud Pub/Sub subscription

In the GCP console, you can go to Pub/Sub -> Subscriptions, and verify another subscription has been automatically created by the Dataflow pipeline.

Play the game!

Go back to your Firebase web app UI. Its URL should be in the form https://<hosting-id>.firebaseapp.com

Click the "Sign in with Google", and you should be redirected to choose a Google account for signing in.

After signing in, you should be able to see the trivia questions! Answer a few questions to generate data for the data pipelines to process. (The first question may take slightly longer to be processed.)


Open the BigQuery UI in the Google Cloud Platform Console. You may get an authentication prompt if this is your first time here.

On the left rail, you should be able to see your project name with the "raw" dataset underneath. Expand it and you should see the "events" table.

Click "events", which should bring up the schema on the right hand side. Click "Query Table".

You should see a query editor.

Click "Compose Query" and copy-and-paste the following query into the New Query box.

SELECT substr(userId,-6,6) user, count(isAnswerCorrect) as score
from raw.events
where isAnswerCorrect=true
group by user
order by score desc

You should be able to see each user (as shown in the top right-hand corner of the Firebase web app) with the count of correct answers next to it. Feel free to open another browser session and sign in with a different Google account to generate more data.

Go back and answer more questions on the app, and observe how the data gets updated in BigQuery in near real-time.

Avro on GCS

Navigate to the GCS Storage Browser. Click on the bucket created earlier (ending with -mdp-lab-gcs); you should see an "avro" subdirectory with the Avro files created underneath. Archiving the messages as separate copies like this is a useful pattern for potential re-processing in the future.

You can clean up your resources from the Google Cloud Platform Console.

Stopping Dataflow pipelines

On the Dataflow screen, click into each of the pipelines, then "Stop job" on the console.

You will be prompted to "Cancel" or "Drain". In this case you can choose "Cancel". (To learn about the differences, click "Read more about stopping Dataflow jobs" for details.)

Perform these steps for each of the Dataflow pipelines you have started in the lab. The jobs should say "Canceled" in the Status column.

Delete the BigQuery Dataset

Under the BigQuery UI, select the dropdown on the "raw" dataset, and choose Delete dataset.

Removing the Cloud Storage Bucket if needed

If you have newly created the Cloud Storage bucket and no longer need it, you can navigate to the Cloud Storage browser screen, and remove the newly created bucket.

Delete the Cloud Pub/Sub Subscription

Under Pub/Sub -> Subscriptions, check the boxes next to the subscriptions created in this lab and click "Delete" to delete them.

Delete the Cloud Functions

Under Cloud Functions, highlight both the "answerSubmit" and "publishMessageToTopic" functions, and delete them.

Delete the Firebase Hosting Resources

Under Cloud Shell, run the following command to stop the Firebase hosting:

cd ~/quizgame && firebase hosting:disable

Then, in the Firebase console, select the project used in this lab and choose "Hosting". You should see a Disabled status and a Deployed status. Hover over the right edge of the "Deployed" row and delete the resources.

Delete the Firebase Cloud Firestore Data

In the Firebase console, under the Database tab, you should be able to see the data for the questions and users. To delete the data, highlight "questions" and then choose "Delete all documents" in the 2nd column. Repeat this for "users".

Revert the Firebase Authentication Settings

In the Firebase console, under the Authentication tab, you can delete the users by highlighting each row and choosing "Delete Account".

Under the "Sign-in method" tab, you can revert the Google sign-in provider status to "Disabled".

You learned how to build modern data pipelines on Google Cloud Platform, with the use of several services such as Cloud Firestore, Cloud Functions, Cloud Pub/Sub, Cloud Dataflow and BigQuery.