The Google Cloud Vision API allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.

In this codelab you will focus on using the Vision API with Ruby. You will learn how to perform text detection, landmark detection, and face detection!

What you'll learn

What you'll need

Codelab-at-a-conference setup

If you are using a kiosk at Google I/O, a test project has already been created for you and can be accessed by going to: https://console.cloud.google.com/.

These temporary accounts have existing projects that are set up with billing, so there are no costs for you associated with running this codelab.

Note that all these accounts will be disabled soon after the codelab is over.

Use these credentials to log into the machine or to open a new Google Cloud Console window at https://console.cloud.google.com/. Accept the new account's Terms of Service and any updates to the Terms of Service.

When presented with the console landing page, please select the only project available. Alternatively, from the console home page, click on "Select a Project":

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

Activate Google Cloud Shell

From the GCP Console click the Cloud Shell icon on the top right toolbar:

Then click "Start Cloud Shell":

It should only take a few moments to provision and connect to the environment:

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on the Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this lab can be done with simply a browser or your Google Chromebook.

Once connected to the cloud shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID.

Run the following command in the cloud shell to confirm that you are authenticated:

gcloud auth list

Command output

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)

Run the following command in the cloud shell to confirm that the gcloud command knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If the project is not set, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

Before you can begin using the Vision API you must enable the API. Using the Cloud Shell you can enable the API by using the following command:

gcloud services enable vision.googleapis.com

In order to make requests to the Vision API, you need to use a Service Account. A Service Account is an account, belonging to your project, that is used by the Google Cloud Ruby client library to make Vision API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the gcloud tool to create a service account and then create the credentials you will need to authenticate as the service account.

First you will set an environment variable with your PROJECT_ID which you will use throughout this codelab:

export GOOGLE_CLOUD_PROJECT="<PROJECT_ID>"

Next, you will create a new service account to access the Vision API by using:

gcloud iam service-accounts create my-vision-sa \
  --display-name "my vision service account"

Next, you will create credentials that your Ruby code will use to log in as your new service account. Create these credentials and save them as a JSON file, "~/key.json", by using the following command:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account my-vision-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is used by the Vision API Ruby gem, covered in the next step, to find your credentials. The environment variable should be set to the full path of the credentials JSON file you created, by using:

export GOOGLE_APPLICATION_CREDENTIALS="/home/${USER}/key.json"

You can read more about authenticating the Google Cloud Vision API.
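
If you prefer not to rely on the GOOGLE_APPLICATION_CREDENTIALS environment variable, the client can also be given the credentials explicitly when it is constructed. The following is a minimal sketch, assuming the project and keyfile keyword arguments of the 0.27.x gem; other gem versions may use different names.

require "google/cloud/vision"

# A sketch of explicit configuration: pass the project ID and the path to the
# service account key instead of relying on environment-based authentication.
# The `project:` and `keyfile:` keywords are assumed from the 0.27.x gem.
vision = Google::Cloud::Vision.new(
  project: ENV["GOOGLE_CLOUD_PROJECT"],
  keyfile: "/home/#{ENV['USER']}/key.json"
)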

You can use the command line to install the Google Cloud Vision API Ruby gem.

gem install google-cloud-vision -v 0.27.0

You can read more about the set of Google Cloud service Ruby gems available for different APIs here.
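
If you manage your dependencies with Bundler instead of installing gems globally, you could add the gem to a Gemfile and run bundle install. This is just an alternative sketch; for this codelab the gem install command above is all you need.

# Gemfile
source "https://rubygems.org"

# Pin the same version used in this codelab.
gem "google-cloud-vision", "0.27.0"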

Next you will clone the Ruby sample repository that contains example images you can use to follow along.

Clone the Ruby sample repository:

git clone https://github.com/GoogleCloudPlatform/ruby-docs-samples.git
cd ruby-docs-samples
git checkout "a902f30dd449ce82469cc315610c8a3d4888ff5a"

Change directory into `ruby-docs-samples/vision`:

cd vision

Now that you have installed the required gem, start the Interactive Ruby tool by running irb:

irb --noecho

IRB runs the Ruby interpreter in a Read-Eval-Print Loop (REPL) session.

Text Detection performs Optical Character Recognition. It detects and extracts text within an image with support for a broad range of languages. It also features automatic language identification.

In this example, you will perform text detection on an image of an Otter Crossing.

Copy the following Ruby code into your IRB session:

require "google/cloud/vision"

vision = Google::Cloud::Vision.new
image  = vision.image "images/otter_crossing.jpg"

puts image.text

You should see the following output:

CAUTION
Otters crossing
for next 6 miles

Summary

In this step, you were able to perform text detection on an image of an Otter Crossing and print recognized text from the image. Read more about Text Detection.
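
If you want to dig a little deeper, the text annotation also exposes the individual words that were detected, each with its bounding polygon. The sketch below assumes the words, text, and bounds accessors (with x/y vertices) of the 0.27.x gem; verify them against your installed version.

require "google/cloud/vision"

vision = Google::Cloud::Vision.new
image  = vision.image "images/otter_crossing.jpg"

# Print each detected word together with the vertices of its bounding polygon.
# Accessor names (`words`, `text`, `bounds`, `x`, `y`) are assumed from the
# 0.27.x gem.
image.text.words.each do |word|
  vertices = word.bounds.map { |vertex| "(#{vertex.x}, #{vertex.y})" }
  puts "#{word.text} #{vertices.join(' ')}"
end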

Landmark Detection detects popular natural and man-made structures within an image.

In this example, you will perform landmark detection on an image of the Eiffel Tower.

To perform landmark detection, copy the following Ruby code into your IRB session.

require "google/cloud/vision"

vision = Google::Cloud::Vision.new
image  = vision.image "images/eiffel_tower.jpg"

image.landmarks.each do |landmark|
  puts landmark.description

  landmark.locations.each do |location|
    puts "#{location.latitude}, #{location.longitude}"
  end
end

You should see the following output:

Eiffel Tower
48.858461, 2.294351

Summary

In this step, you were able to perform landmark detection on an image of the Eiffel Tower. Read more about Landmark Detection.
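
As an optional extra, each landmark annotation also carries a confidence score and a Knowledge Graph entity ID. The sketch below assumes the score and mid accessors of the 0.27.x gem.

require "google/cloud/vision"

vision = Google::Cloud::Vision.new
image  = vision.image "images/eiffel_tower.jpg"

image.landmarks.each do |landmark|
  # `score` is the API's confidence in the match and `mid` is the Knowledge
  # Graph entity ID; both accessors are assumed from the 0.27.x gem.
  puts "#{landmark.description} (score: #{landmark.score}, mid: #{landmark.mid})"
end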

Face Detection detects multiple faces within an image, along with the associated key facial attributes such as emotional state or whether the person is wearing headwear.

In this example, you will detect the likelihood of four different emotional states: joy, anger, sorrow, and surprise.

To perform emotional face detection, copy the following Ruby code into your IRB session:

require "google/cloud/vision"

vision = Google::Cloud::Vision.new
image  = vision.image "images/face_no_surprise.jpg"

image.faces.each do |face|
  puts "Joy:      #{face.likelihood.joy?}"
  puts "Anger:    #{face.likelihood.anger?}"
  puts "Sorrow:   #{face.likelihood.sorrow?}"
  puts "Surprise: #{face.likelihood.surprise?}"
end

You should see the following output for the example image:

Joy:      true
Anger:    false
Sorrow:   false
Surprise: false

Summary

In this step, you were able to perform emotional face detection. Read more about Face Detection.
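
If you would like to go one step further, each face annotation also includes the bounding polygon of the detected face. The sketch below assumes the bounds.face accessor returning x/y vertices, as in the 0.27.x gem.

require "google/cloud/vision"

vision = Google::Cloud::Vision.new
image  = vision.image "images/face_no_surprise.jpg"

puts "Found #{image.faces.count} face(s)"

image.faces.each do |face|
  # `bounds.face` returns the vertices of the face's bounding polygon; the
  # accessor names are assumed from the 0.27.x gem.
  vertices = face.bounds.face.map { |vertex| "(#{vertex.x}, #{vertex.y})" }
  puts "Face bounds: #{vertices.join(' ')}"
end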

You learned how to use the Vision API with Ruby to perform different types of detection on images!

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this codelab:

Learn More

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.