The Cloud Speech API lets you perform speech-to-text transcription of audio files in over 80 languages.

In this lab, we will record an audio file and send it to the Cloud Speech API for transcription.

What you'll learn

What you'll need


Codelab-at-a-conference setup

The instructor will share temporary accounts with you. Their projects are already set up, so you do not need to worry about enabling billing or about any cost associated with running this codelab. Note that all these accounts will be disabled soon after the codelab is over.

Once the instructor has given you a temporary username and password, log in to the Google Cloud Console: https://console.cloud.google.com/.

Here's what you should see once logged in:

Click on the menu icon in the top left of the screen.

Select API Manager from the drop-down menu.

Click on Enable API.

Then, search for "speech" in the search box. Click on Google Cloud Speech API:

Click Enable to enable the Cloud Speech API:

Wait for a few seconds for it to enable. You will see this once it's enabled:

Google Cloud Shell is a command line environment running in the Cloud. This Debian-based virtual machine is loaded with all the development tools you'll need (gcloud, bq, git and others) and offers a persistent 5GB home directory. We'll use Cloud Shell to create our request to the Speech API.

To get started with Cloud Shell, click the "Activate Google Cloud Shell" icon in the top right-hand corner of the header bar.

A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt. Wait until the user@project:~$ prompt appears.

Since we'll be using curl to send a request to the Speech API, we'll need to generate an API key to pass in our request URL. To create an API key, navigate to the API Manager section of your project dashboard:

Then, navigate to the Credentials tab and click Create credentials:

In the drop-down menu, select API key:

Next, copy the key you just generated.

Now that you have an API key, save it to an environment variable so you don't have to insert the key's value in each request. You can do this in Cloud Shell. Be sure to replace <YOUR_API_KEY> with the key you just copied.

export API_KEY=<YOUR_API_KEY>
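Because the later curl command reads the key from ${API_KEY}, it can help to confirm the variable is actually set before sending a request. The following is a minimal sketch; check_api_key is a hypothetical helper name, not part of the codelab or of any Google tool.

```shell
# check_api_key is a hypothetical helper (an assumption, not part of the codelab).
# It fails early when API_KEY is empty or unset, and otherwise reports
# only the key's length so the key itself is not printed to the terminal.
check_api_key() {
  if [ -z "${API_KEY:-}" ]; then
    echo "API_KEY is not set; run: export API_KEY=<YOUR_API_KEY>" >&2
    return 1
  fi
  echo "API_KEY is set (${#API_KEY} characters)"
}
```

You could run check_api_key on its own after the export, or chain it in front of a request so the request is skipped when the key is missing.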

You can build your request to the Speech API in a request.json file. First create this file in Cloud Shell:

touch request.json

Open it using your preferred command-line editor (nano, vim, emacs). Add the following to your request.json file, replacing the uri value with the URI of your own audio file if you have one:

request.json

{
  "config": {
      "encoding":"FLAC",
      "sample_rate": 16000,
      "language_code": "en-US"
  },
  "audio": {
      "uri":"gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}

The request body has a config object and an audio object. In config, we tell the Speech API how to process the request. The encoding parameter tells the API which type of audio encoding you're using for the audio file you're sending to the API. FLAC is the encoding type for .flac files such as the sample used here (see the documentation for encoding type for more details). sample_rate is the rate in Hertz of the audio data you're sending to the API. There are other parameters you can add to your config object, but encoding and sample_rate are the only required ones.

In the audio object, you pass the API the URI of the audio file in Cloud Storage. Now you're ready to call the Speech API!
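If you would rather generate request.json from the command line than edit it by hand, a shell heredoc is one option. This is only a sketch: write_request is a hypothetical helper name, and it assumes the same FLAC encoding and 16000 Hz sample rate as the sample file above.

```shell
# write_request is a hypothetical helper (an assumption, not part of the codelab).
# Usage: write_request <gcs_uri> [language_code] [output_file]
# It writes the same request body shown above, with the uri and
# language_code filled in from the arguments.
write_request() {
  local uri="$1"
  local language="${2:-en-US}"
  local out="${3:-request.json}"
  cat > "$out" <<EOF
{
  "config": {
    "encoding": "FLAC",
    "sample_rate": 16000,
    "language_code": "${language}"
  },
  "audio": {
    "uri": "${uri}"
  }
}
EOF
}

# Recreate the request for the sample file and print it for inspection.
write_request "gs://cloud-samples-tests/speech/brooklyn.flac"
cat request.json
```

Passing a second argument changes the language_code value, which keeps the config and audio objects in one place when you try other audio files.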

You can now pass your request body, along with the API key environment variable you saved earlier, to the Speech API with the following curl command (all in one single command line):

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json \
"https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=${API_KEY}"

Your response should look something like the following:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}

The transcript value contains the Speech API's text transcription of your audio file, and the confidence value indicates how sure the API is that it has accurately transcribed your audio.
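To pull just the transcript and confidence values out of the response, you could filter the output with sed. This is a rough sketch: extract_transcript and extract_confidence are hypothetical helper names, and the patterns assume pretty-printed JSON with one field per line, as in the sample response above. A real JSON parser would be more robust.

```shell
# Hypothetical helpers (assumptions, not part of the Speech API or the codelab).
# Both read a response on standard input and assume one JSON field per line.
extract_transcript() {
  sed -n 's/.*"transcript": *"\(.*\)".*/\1/p'
}

extract_confidence() {
  sed -n 's/.*"confidence": *\([0-9.]*\).*/\1/p'
}

# Sample response copied from the text above, used instead of a live API call.
sample_response='{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}'

printf '%s\n' "$sample_response" | extract_transcript
printf '%s\n' "$sample_response" | extract_confidence
```

Against a live request, the same helpers can be placed after the curl command with a pipe, so that only the transcript text is printed.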

You'll notice that we called the syncrecognize method in our request above. The Speech API supports both synchronous and asynchronous speech-to-text transcription. In this example we sent it a complete audio file, but the API also supports streaming speech-to-text transcription while the user is still speaking, through a separate streaming method rather than syncrecognize.

Are you multilingual? The Speech API supports speech-to-text transcription in over 80 languages! You can change the language_code parameter in request.json. You can find a list of supported languages here.

For example, if you have a Spanish audio file, you can set the language_code attribute in the request.json file like this:

request.json

{
  "config": {
      "encoding":"FLAC",
      "sample_rate": 16000,
      "language_code": "es-ES"
  },
  "audio": {
      "uri":"gs://.../..."
  }
}

You've learned how to perform speech to text transcription with the Speech API. In this example you passed the API the Google Cloud Storage URI of your audio file. Alternatively, you can pass a base64 encoded string of your audio content.
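As a sketch of the base64 alternative mentioned above, the request body can carry the audio inline in a content field in place of uri. write_inline_request is a hypothetical helper name, and the config values assume a FLAC file sampled at 16000 Hz like the sample used earlier.

```shell
# write_inline_request is a hypothetical helper (an assumption, not part of the codelab).
# Usage: write_inline_request <local_audio_file> [output_file]
# It base64-encodes a local audio file and places the result in the
# "content" field of the audio object instead of a Cloud Storage uri.
write_inline_request() {
  local audio_file="$1"
  local out="${2:-request.json}"
  local content
  # Strip newlines so the encoded audio forms a single JSON string.
  content="$(base64 < "$audio_file" | tr -d '\n')"
  cat > "$out" <<EOF
{
  "config": {
    "encoding": "FLAC",
    "sample_rate": 16000,
    "language_code": "en-US"
  },
  "audio": {
    "content": "${content}"
  }
}
EOF
}
```

After running it against a local .flac file, the resulting request.json can be sent with the same curl command used earlier.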

What we've covered

Next Steps