How to Build A Simple AI Chatbot With Web Speech API and Node.js

In this tutorial, you’ll learn how to create an AI-powered voice chat interface in the browser. You’ll use the Web Speech API to listen to the user’s voice and reply with a synthetic voice. Tomomi Imura of Smashing Magazine walks you through every step.

The Web Speech API lets developers integrate voice into their web apps so users can issue voice commands. Unfortunately, only a handful of browsers support the API at the moment, including Chrome 25+ and Opera 27+, so keep that in mind when testing your chatbot.
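
Because support is uneven, it can help to check for the API before wiring up the interface. The following is a minimal sketch, not part of the original tutorial: detectSpeechSupport is a hypothetical helper that takes a window-like object and reports which halves of the Web Speech API it exposes.

```javascript
// Hypothetical helper: reports which parts of the Web Speech API a
// window-like object exposes. Pass `window` in the browser.
function detectSpeechSupport(win) {
  const Recognition = win.SpeechRecognition || win.webkitSpeechRecognition;
  return {
    recognition: typeof Recognition === 'function',
    synthesis:
      typeof win.speechSynthesis === 'object' &&
      win.speechSynthesis !== null &&
      typeof win.SpeechSynthesisUtterance === 'function',
  };
}

// Browser-only usage; skipped when no `window` exists (for example in Node.js).
if (typeof window !== 'undefined') {
  const support = detectSpeechSupport(window);
  if (!support.recognition) {
    console.warn('Speech recognition is not available in this browser.');
  }
}
```

Taking the window object as a parameter keeps the check easy to exercise with a plain object outside the browser.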

You’ll take three big steps in developing your app. First, you’ll use the Web Speech API’s SpeechRecognition interface to listen to the user’s voice. Second, you’ll send the transcribed speech to API.AI as a text string to understand the text. Lastly, you’ll fetch the response and use the SpeechSynthesis interface to read it out in a synthetic voice. 

Setup

Start by setting up a web app framework with Node.js. You’ll want an app directory with the following structure:

├── index.js
├── public
│   ├── css
│   │   └── style.css
│   └── js
│       └── script.js
└── views
    └── index.html

Then run ‘npm init -f’ in the command line to initialize your app. Next, install your dependencies by running ‘npm install express socket.io apiai --save’. This installs Express, the web app framework you’ll be using; Socket.IO, for real-time bidirectional communication over the WebSocket protocol; and the API.AI SDK, for the API component. Now that you have your dependencies, create an index.js file to instantiate Express and start the server:

const express = require('express');
const app = express();

app.use(express.static(__dirname + '/views')); // html
app.use(express.static(__dirname + '/public')); // js, css, images

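const server = app.listen(5000); // serve the app on port 5000
app.get('/', (req, res) => {
  // sendFile needs an absolute path or a root option
  res.sendFile('index.html', { root: __dirname + '/views' });
});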

Creating the User Interface

You can now get down to business and start building the app. The user interface will be a simple affair, just a button to trigger voice recognition. Add a little content to index.html that loads Socket.IO and your script.js file.

<html lang="en">
  <head>…</head>
  <body>
    …
    <script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/2.0.1/socket.io.js"></script>
    <script src="js/script.js"></script>
  </body>
</html> 

Don’t forget to add the button inside the body, above the script tags, so it exists by the time script.js looks for it:

<button>Talk</button>

Styling is up to you; put your rules in style.css.

Capturing Voice With JS

Now we want to start using the Web Speech API in script.js. So create an instance of SpeechRecognition, the interface for voice recognition:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

You’ll notice that the code uses ECMAScript 6 syntax, such as const and arrow functions, since the browsers that support the speech API also support it.

Feel free to customize various properties, such as:

recognition.lang = 'en-US';
recognition.interimResults = false;

Next, you’ll want to identify the button and listen for click events on it.

document.querySelector('button').addEventListener('click', () => {
  recognition.start();
});

This will trigger the start of speech recognition. You can then fetch the text being produced.

recognition.addEventListener('result', (e) => {
  let last = e.results.length - 1;
  let text = e.results[last][0].transcript;

  console.log('Confidence: ' + e.results[last][0].confidence);

  // We will use the Socket.IO here later…
});

The result event carries a SpeechRecognitionResultList. Each entry in the list holds one or more alternatives, and each alternative exposes the transcribed text along with the confidence of the transcription.
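
To make that shape concrete, here is a small sketch of reading the newest transcript out of such a list. latestTranscript is a hypothetical helper, not part of the Web Speech API; it only assumes an indexable object with a length whose entries contain alternatives with transcript and confidence properties.

```javascript
// Hypothetical helper: pulls the newest transcript and its confidence out of
// a SpeechRecognitionResultList-like object (anything indexable with a length).
function latestTranscript(results) {
  if (!results || results.length === 0) {
    return null; // nothing has been recognized yet
  }
  const last = results[results.length - 1]; // most recent result
  const first = last[0];                    // first alternative of that result
  return { text: first.transcript, confidence: first.confidence };
}
```

Inside the result listener it could be called as latestTranscript(e.results), which returns null when the list is empty.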

Real-Time Communication with Socket.IO

Now you need to set up the connection between the server and the browser with Socket.IO. Instantiate Socket.IO in script.js, then send the text to the server from inside the result listener, where the text variable is in scope.

const socket = io();
socket.emit('chat message', text);
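
One way to combine the two pieces is sketched below. forwardSpeechToSocket is a hypothetical function name; it assumes only that the first argument has addEventListener and the second has emit, which holds for the recognition and socket objects used in this tutorial.

```javascript
// Hypothetical wiring function: forwards each recognized transcript to the
// server. `recognition` needs addEventListener, `socket` needs emit.
function forwardSpeechToSocket(recognition, socket) {
  recognition.addEventListener('result', (e) => {
    const last = e.results.length - 1;
    const text = e.results[last][0].transcript;
    socket.emit('chat message', text); // same event name the server listens for
  });
}
```

In script.js this might be called once, after both objects exist, as forwardSpeechToSocket(recognition, socket).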

Getting a Reply from the API

You’re going to use API.AI on the server to get a response to the transcribed text. Create a free API.AI account and create a ‘Small Talk’ agent through their interface with whatever customization you want. Then go to ‘General Settings’ and get your API key.

Hooking up Node.js with API.AI

Now you’ll link the Node.js app with API.AI through its own Node SDK. Go to your index.js file again and initialize API.AI with your API key:

const apiai = require('apiai')(APIAI_TOKEN);
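
The snippet above leaves open where APIAI_TOKEN and the APIAI_SESSION_ID used later come from. One common option is environment variables; the sketch below assumes that approach. readApiaiConfig and the variable names are hypothetical choices, not something the API.AI SDK defines.

```javascript
// Hypothetical helper: reads the API.AI credentials from an environment-like
// object. The variable names are assumptions, not part of the SDK.
function readApiaiConfig(env) {
  const token = env.APIAI_TOKEN;
  const sessionId = env.APIAI_SESSION_ID;
  if (!token || !sessionId) {
    throw new Error('Set APIAI_TOKEN and APIAI_SESSION_ID before starting the server.');
  }
  return { token, sessionId };
}
```

In index.js it could be called as readApiaiConfig(process.env) before the SDK is initialized, which keeps the key out of source control.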

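Now you’ll use server-side Socket.IO to receive the text from the browser and then fetch the result with API.AI. Socket.IO first has to be attached to the HTTP server created earlier:

const io = require('socket.io')(server);
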
io.on('connection', function(socket) {
  socket.on('chat message', (text) => {

    // Get a reply from API.AI

    let apiaiReq = apiai.textRequest(text, {
      sessionId: APIAI_SESSION_ID
    });

    apiaiReq.on('response', (response) => {
      let aiText = response.result.fulfillment.speech;
      socket.emit('bot reply', aiText); // Send the result back to the browser!
    });

    apiaiReq.on('error', (error) => {
      console.log(error);
    });

    apiaiReq.end();

  });
});

Once the response arrives, Socket.IO’s emit method sends the reply text back to the browser as a ‘bot reply’ event.
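
The response handler reads response.result.fulfillment.speech directly, which fails if any of those fields is missing and sends an empty string if the agent has no answer. A defensive variant is sketched below; extractReply and its fallback sentence are hypothetical additions, not part of the original code.

```javascript
// Hypothetical helper: returns the reply text from an API.AI response, or a
// fallback string when the expected fields are missing or empty.
function extractReply(response, fallback = 'Sorry, I did not catch that.') {
  const speech =
    response && response.result && response.result.fulfillment
      ? response.result.fulfillment.speech
      : '';
  return typeof speech === 'string' && speech.trim() !== '' ? speech : fallback;
}
```

In the response handler, the emit call could then send extractReply(response) instead of the raw field.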

Giving the AI a Voice

Finally, you need to use the SpeechSynthesis interface to read out the response provided by API.AI. So, go back to script.js and create a function to generate a synthetic voice.

function synthVoice(text) {
  const synth = window.speechSynthesis;
  const utterance = new SpeechSynthesisUtterance();
  utterance.text = text;
  synth.speak(utterance);
}

This code will create an instance of SpeechSynthesisUtterance, set the response text as a property and then read it out loud. All you need to do now is receive the text from Socket.IO and call your new function in the callback.

socket.on('bot reply', function(replyText) {
  synthVoice(replyText);
});
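
The synthVoice function above assumes speech synthesis is available. As a hedged sketch, the variant below takes a window-like object and a language tag, and returns false instead of throwing when synthesis is missing. speakReply and its parameters are hypothetical; text and lang are standard SpeechSynthesisUtterance properties.

```javascript
// Hypothetical variant of synthVoice that takes a window-like object so it
// can be exercised outside the browser. Returns false when speech synthesis
// is unavailable, true after handing the utterance to the synthesizer.
function speakReply(text, win, lang = 'en-US') {
  if (!win || !win.speechSynthesis || typeof win.SpeechSynthesisUtterance !== 'function') {
    return false;
  }
  const utterance = new win.SpeechSynthesisUtterance();
  utterance.text = text;
  utterance.lang = lang; // match the language used for recognition
  win.speechSynthesis.speak(utterance);
  return true;
}
```

In the browser, the ‘bot reply’ callback could call speakReply(replyText, window).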

And you’re done! You can now test it out yourself. Remember, though, that you’ll have to give the browser access to your microphone before it can record your speech.

Original Article

Building A Simple AI Chatbot With Web Speech API And Node.js

Seamus Holland
 
