Project 15: Hear It with IBM's Text to Speech

July 12, 2015

If you’ve ever listened to a young reader read aloud, you may have noticed the arduous task that little brain is working through. English is a complex language, with many different pronunciations of the same word.

Techniques like sounding out words help some young readers learn new words. But some words don’t conform to those rules, and they appear in nearly every piece of content. They are called sight words. Recognizing these words quickly makes reading much easier and faster for beginning readers.

For the final project of my 15 projects in 30 days challenge, I’m going to use IBM’s Bluemix Text to Speech API to read a word aloud. The player clicks on the box with the matching word and scores a point.

IBM Bluemix

IBM Bluemix has quite a number of APIs and capabilities, but I’m going to use only the Text to Speech API in this project. Sign up for a Bluemix account.

Add the Text to Speech service.

In the left-hand column, click on Service Credentials. Copy the username and password for the next step.

Setup

There are four files for this project. The Node.js app, app.js, communicates with the Text to Speech API to get an audio representation of the words. The AngularJS app, index.html and hearit.js, displays the game. The fourth file, package.json, lists the Node.js dependencies.

// Filename: app.js
var BLUEMIX_USERNAME = '';
var BLUEMIX_PASSWORD = '';
var PORT = 8080;

var express = require('express');
var watson = require('watson-developer-cloud');
var url = require('url');

var app = express();

app.get('/api/speak', function(req, res) {
  var query = url.parse(req.url, true).query;

  var text_to_speech = watson.text_to_speech({
    username: BLUEMIX_USERNAME,
    password: BLUEMIX_PASSWORD,
    version: 'v1',
    url: 'https://stream.watsonplatform.net/text-to-speech/api'
  });

  var params = {
    text: query.text,
    voice: 'en-US_AllisonVoice', // Optional voice
    accept: 'audio/wav'
  };

  text_to_speech.synthesize(params).pipe(res);  
});

app.use(express.static(__dirname + '/public'));
app.listen(PORT);

console.log('Application listening on port '+PORT);
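
The two credential constants at the top of app.js are left blank so you can paste in the values from Service Credentials. As an alternative, here is a minimal sketch of reading them from the environment so they stay out of the source file. The loadCredentials helper and the environment variable names are assumptions made for this example, not something Bluemix defines.

```javascript
// Sketch: read Bluemix credentials from the environment instead of
// hardcoding them. The names BLUEMIX_USERNAME and BLUEMIX_PASSWORD are
// assumptions for this example; pick whatever names suit your setup.
function loadCredentials(env) {
  var username = env.BLUEMIX_USERNAME || '';
  var password = env.BLUEMIX_PASSWORD || '';

  // Fail early with a readable message rather than on the first API call.
  if (!username || !password) {
    throw new Error('Set BLUEMIX_USERNAME and BLUEMIX_PASSWORD before starting the app.');
  }

  return { username: username, password: password };
}

// Possible usage in app.js, replacing the two hardcoded constants:
//   var credentials = loadCredentials(process.env);
//   var BLUEMIX_USERNAME = credentials.username;
//   var BLUEMIX_PASSWORD = credentials.password;
```

Passing the environment in as an argument keeps the helper easy to try out with a plain object.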

<!-- Filename: public/index.html -->
<html ng-app="HearItApp">
  <head>
    <title>Hear It</title>
    <script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.3.16/angular.min.js"></script>
    <script src="hearit.js"></script>
  </head>
  <body ng-controller="HearItCtrl" style="font-family:Arial">
    <div style="float:right">
      Score: {{score}}
      <audio controls preload="auto" id="audio">
        <source id="wavsource" type="audio/wav">
      </audio>
    </div>

    <h2>Hear It</h2>

    <div style="clear:both">
      <div style="float:left; width:250px; height:75px; border:1px solid black; text-align:center;font-size:36pt" ng-repeat="word in wordSet" ng-click="checkAnswer(word)">{{word}}</div>
    </div>
  </body>
</html>

// Filename: public/hearit.js

angular.module('HearItApp', [])
.controller('HearItCtrl', ['$scope', function($scope) {
  var wordList = ['a', 'and', 'away', 'big', 'blue', 'can', 'come', 'down', 'find', 'for', 'funny', 'go', 'help', 'here', 'I', 'in', 'is', 'it', 'jump', 'little', 'look', 'make', 'me', 'my', 'not', 'one', 'play', 'red', 'run', 'said', 'see', 'the', 'three', 'to', 'two', 'up', 'we', 'where', 'yellow', 'you'];
  var audio = document.getElementById('audio');
  var wavsource = document.getElementById('wavsource');

  $scope.score = 0;
  $scope.attempt = 0;

  $scope.loadSet = function() {
    // Shuffle the word list in place (a Fisher-Yates shuffle written as a single for statement).
    for(var j, x, i = wordList.length; i; j = Math.floor(Math.random() * i), x = wordList[--i], wordList[i] = wordList[j], wordList[j] = x);

    $scope.wordSet = wordList.slice(0,4);
    $scope.selectedWord = $scope.wordSet[Math.floor(Math.random()*$scope.wordSet.length)];
    $scope.attempt = 0;

    wavsource.src = '/api/speak?text=Click+on+the+word+'+$scope.selectedWord;

    audio.load();
    audio.play(); 
  }

  $scope.loadSet();

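  $scope.checkAnswer = function(word) {
    if(word === $scope.selectedWord) {
      // Award a point only when the first click of the round is correct.
      if($scope.attempt === 0) {
        $scope.score++;
      }

      $scope.loadSet();
    } else {
      // Wrong word: repeat the prompt and count the miss.
      audio.play();
      $scope.attempt++;
    }
  };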
}]);

// Filename: package.json
{
  "name": "hear-it",
  "description": "Hear It game for Node.js",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "express": "*",
    "url": "*",
    "watson-developer-cloud": "*"
  }
}

To install the Node.js dependencies, run the command:

npm install

And to start the Node.js app, run the command:

nodejs app.js

Hear It

Hear It is pretty simple to play. With the Node.js app running, open http://localhost:8080/ in the browser (app.js serves index.html from the public folder on port 8080). Four random words from the wordList array are displayed. One of the words is spoken. The objective is to click on the correct word to score a point.
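
The word selection can be tried outside of AngularJS. Below is a rough sketch of the same idea as loadSet in hearit.js: shuffle the list, keep the first four words, and pick one of them to speak. The function names shuffle and newRound are made up for this example, and unlike the original this version shuffles a copy instead of the list itself.

```javascript
// Fisher-Yates shuffle: returns a shuffled copy and leaves the input untouched.
function shuffle(words) {
  var copy = words.slice();
  for (var i = copy.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1)); // random index in 0..i
    var tmp = copy[i];
    copy[i] = copy[j];
    copy[j] = tmp;
  }
  return copy;
}

// Pick `count` distinct words to display and one of them to speak.
function newRound(words, count) {
  var wordSet = shuffle(words).slice(0, count);
  var selectedWord = wordSet[Math.floor(Math.random() * wordSet.length)];
  return { wordSet: wordSet, selectedWord: selectedWord };
}

var round = newRound(['a', 'and', 'away', 'big', 'blue', 'can'], 4);
console.log(round.wordSet, round.selectedWord);
```

Because the four words come from the front of a shuffled list, they are always distinct, so the spoken word matches exactly one box.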

The next set of words is displayed when you click on the correct word. If you don’t get the correct answer on the first try, no points are awarded for that turn.
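
The scoring rule is small enough to sketch as plain functions. This mirrors what checkAnswer does with $scope.score and $scope.attempt, but createGame and guess are hypothetical names used only for this illustration.

```javascript
// State for one game: total score plus wrong guesses in the current round.
function createGame() {
  return { score: 0, attempt: 0 };
}

// Apply one click. Returns true when the round is over (correct word clicked).
function guess(game, clickedWord, selectedWord) {
  if (clickedWord === selectedWord) {
    if (game.attempt === 0) {
      game.score++;      // a point only for a first-try answer
    }
    game.attempt = 0;    // the next round starts fresh
    return true;
  }
  game.attempt++;        // wrong guess: this round can no longer score
  return false;
}

var game = createGame();
guess(game, 'blue', 'blue'); // correct on the first try
guess(game, 'red', 'jump');  // wrong
guess(game, 'jump', 'jump'); // correct, but on the second try
console.log(game.score);     // 1
```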

The audio is played with the HTML5 audio element, which loads a WAV file generated by IBM’s Text to Speech API.
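
The URL that the source element points at is built by string concatenation in hearit.js, which works because every sight word is a plain lowercase word. Here is a small sketch of a slightly more defensive version that escapes the prompt with encodeURIComponent. The speakUrl helper is a name invented for this example.

```javascript
// Build the URL for the <source> element. encodeURIComponent escapes spaces
// and any other reserved characters in the spoken prompt.
function speakUrl(word) {
  return '/api/speak?text=' + encodeURIComponent('Click on the word ' + word);
}

console.log(speakUrl('yellow'));
// /api/speak?text=Click%20on%20the%20word%20yellow

// Possible usage inside loadSet in hearit.js:
//   wavsource.src = speakUrl($scope.selectedWord);
```

The server side needs no change for this, since the query string is decoded before app.js reads query.text.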

That’s it for this project. Here are some ways this project can be expanded:

Add full sentence examples to provide context to how the word can be used.
Add multiplayer capability (using Firebase, Project 14) where multiple players compete to click on the correct answer first.
Provide feedback in a report to a parent/teacher about which words the player struggles with.
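
As a starting point for the report idea, here is one possible sketch of counting wrong guesses per word. Nothing like this exists in the project yet; createTracker, recordMiss, and report are all hypothetical names, and where the data would be stored or shown is left open.

```javascript
// Hypothetical tracker: counts wrong guesses per target word.
function createTracker() {
  // Object.create(null) avoids clashes with inherited property names.
  return { misses: Object.create(null) };
}

// Call this whenever the player clicks a wrong box while `selectedWord` is the target.
function recordMiss(tracker, selectedWord) {
  tracker.misses[selectedWord] = (tracker.misses[selectedWord] || 0) + 1;
}

// Words sorted from most-missed to least-missed.
function report(tracker) {
  return Object.keys(tracker.misses)
    .map(function (word) {
      return { word: word, misses: tracker.misses[word] };
    })
    .sort(function (a, b) {
      return b.misses - a.misses;
    });
}

var tracker = createTracker();
recordMiss(tracker, 'where');
recordMiss(tracker, 'said');
recordMiss(tracker, 'where');
console.log(report(tracker));
```

In hearit.js, recordMiss could be called from the else branch of checkAnswer, next to $scope.attempt++.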

Source Code

You can find the repo on GitHub.

Post Mortem

The Text to Speech API was easy to use and offered several voices. Text to Speech lets an app present content in a way that even users who cannot read can understand. While this project was simple, it showed how audio can present an experience that works for many different audiences.

15 Projects in 30 Days Challenge

This blog post is part of my 15 projects in 30 days challenge. I’m hacking together 15 projects with different APIs, services, and technologies that I’ve had little to no exposure to. If my code isn’t completely efficient or accurate, please understand it isn’t meant to be complete and bulletproof. When something is left out, I try to mention it. Reach out to me and kindly teach me if I go towards the dark side.

This challenge serves a couple of purposes. First, I’ve always enjoyed hacking new things together and using APIs. And I haven’t had the chance (more like a reason) to dive in head first with things like AngularJS, Node.js, and IBM’s Bluemix. This project demonstrated AngularJS, Node.js, and IBM’s Text to Speech API.