How to Get Started With Amazon’s Alexa Skills Kit

We are at an intersection where the devices that we work with will not necessarily have a visual interface or buttons to click. One of the natural ways that we can interact with these devices is via voice, and one of the winners in this segment of the consumer device market is the Amazon Echo. Since its release in mid-2015, it has found its way into homes, where it performs a variety of tasks (called skills) prompted by users’ commands and speaks back an appropriate response.

[Image: Amazon Echo]

The Amazon Echo device is not just a device that magically translates our voice into commands that it can understand and execute. Amazon has created an entire ecosystem that comprises the following:

  1. A range of hardware devices like Echo, Echo Dot, etc.
  2. The engine behind these devices, the Alexa Voice Service, which is made available to manufacturers who would like to build voice-enabled capabilities into their devices.
  3. A software development kit called the Alexa Skills Kit, which developers can use to build custom skills.
  4. An Alexa Skills Marketplace, where you can publish your skills; once approved, a skill is available to anyone who owns an Amazon Echo device.
  5. An Alexa Skills Fund, which can help fund your ideas around the Alexa Voice Service.
  6. Last but not least, AWS Lambda, one of the fastest-growing AWS services, which is the easiest way to host and publish your custom Alexa skill.

In this article, we are going to focus on the Alexa Skills Kit (ASK) and write our first custom skill. It assumes that you are familiar with the capabilities of the Amazon Echo device. Even if you do not own an Amazon Echo device, you can use the Amazon Echo simulator service to help test out your custom skill.

What Will We Build?

We are going to build an Alexa Skill for ProgrammableWeb. The skill will be kept simple and it will provide information on a Featured API and the top three popular API Categories. So we can interact with the Amazon Echo device by asking it sample queries as follows:

  • Alexa, ask ProgrammableWeb to give me the featured API
  • Alexa, ask ProgrammableWeb to list the top API categories
  • Alexa, ask ProgrammableWeb what are the top categories

The above are just sample voice commands that we can give to the Amazon Echo. There can be multiple variations of them, and we shall look at that in detail as we build out the Alexa Skill.

Note: The Skill will use dummy data and we will not actually connect to any live API to retrieve the answer to our queries.

Alexa Skills Sample Projects

Technically, you can write Alexa Skills in any programming language, as long as you can host the code on a server that the Alexa Service can communicate with. Amazon provides sample projects in Node.js and Java that you can look at to get started with writing your skills. The sample projects cover a variety of conversational styles. For example, the style that we will use in this tutorial is the one-off style, where we ask a question and get a reply back. There are other conversational styles where you can continue a conversation. Those are more complex interactions, worth looking into once you understand the basic process of publishing an Alexa Skill.

The sample projects for Node.js are available here. In particular, we have taken the helloworld project and customized it in this post for the skill that we wish to write. You can similarly use the other samples as a solid base on which to build your own skills.

User Interaction Flow

The first step in writing an Alexa Skill is to understand how the device interacts with your custom skill, and the flow from the spoken request to the voice response that the device speaks once it receives the response from your service.

To understand that, take a look at the diagram shown below that has not just the user interaction flow but also gives you a workflow for the entire development, deployment and eventual invocation of your custom Alexa Skill.

[Figure: Alexa user interaction flow]

Let us go through the steps that the diagram has highlighted. These steps are high level and we will see each of these steps as we go along in the tutorial.

Step 1: Develop your code and deploy it on a hosting environment

This step requires that you first design how the user will interact with your service and then write the code to interpret those commands. That means choosing an invocation name for your service, selecting your sample utterances, and then writing your code. It also involves deploying your code to a hosting environment. We shall be using AWS Lambda for our tutorial.

Step 2: Configure your Alexa Custom Skill

Once you have deployed your code, the next thing is to configure your skill. This requires that you have an Amazon Developer Account. You can configure your Alexa skill via the Amazon Developer Portal and can even test it out over there.

The next steps indicate how the user will interact with your service.

Step 3: Interpret your Voice Command

When the user speaks a command, for example “Alexa, ask ProgrammableWeb to give me the featured API”, the Alexa Voice Service converts that voice to text and interprets it. It extracts the invocation name (ProgrammableWeb) and then works around its standard connecting words to extract the rest of the command, i.e. “give me the featured API”. If the configuration is done correctly, it maps that command to an intent of your hosted service.

Step 4: Invoke your Application Service that hosts the skill

The Alexa Service will then invoke your hosted service and receive the response.

Step 5: Return the response to multiple devices

The response is then converted to both voice and an appropriate format for the companion mobile application.
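To make Steps 4 and 5 more concrete, here is a minimal sketch of what the hosted service might do with a request from the Alexa Service. It assumes the custom skill JSON envelope, in which the request carries an intent name and the response carries both speech text and a card for the companion app. The helper names (buildResponse, handleRequest) and the dummy data values are hypothetical and used only for illustration; the intent names are the ones defined later in this article.

```javascript
// Hypothetical dummy data: placeholder values, not real ProgrammableWeb results.
const FEATURED_API = "Example Weather API";
const TOP_CATEGORIES = ["Mapping", "Social", "Payments"];

// Build a reply in the JSON shape a custom skill returns to the Alexa Service.
function buildResponse(title, text) {
  return {
    version: "1.0",
    response: {
      // Spoken aloud by the device.
      outputSpeech: { type: "PlainText", text: text },
      // Shown in the companion mobile application.
      card: { type: "Simple", title: title, content: text },
      // One-off style: end the session after a single reply.
      shouldEndSession: true
    }
  };
}

// Route an incoming request to a reply based on its intent name.
function handleRequest(event) {
  const request = event.request;
  if (request.type === "IntentRequest") {
    switch (request.intent.name) {
      case "FeaturedAPIIntent":
        return buildResponse(
          "Featured API",
          "The featured API is " + FEATURED_API + "."
        );
      case "TopAPICategoriesIntent":
        return buildResponse(
          "Top API Categories",
          "The top categories are " + TOP_CATEGORIES.join(", ") + "."
        );
    }
  }
  // Anything else (unknown intent or request type) gets a fallback reply.
  return buildResponse("ProgrammableWeb", "Sorry, I did not understand that request.");
}
```

On AWS Lambda, a function like handleRequest would be called from the function's exported handler, with the incoming event passed straight through and the returned object sent back as the result.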

Let us now move on to how we can implement the above steps.

Voice Interface Design, Intents and Sample Utterances

When you design the custom Alexa skill, you need to spend some time thinking about how the user is going to interact with your service via voice. This means a few things like:

Invocation Name

Identify the name that will be used to trigger your skill. The Amazon Echo itself is woken up by the word Alexa, Amazon or Echo, so the typical interaction will go like:

Alexa, ask <invocationname> to <Intent>

In our case, we are going to use ProgrammableWeb as the invocation name.
The Alexa Voice Service is designed in a way that allows you to be quite flexible in terms of the commands and the words/phrases around it.

For example, we could interact with the same service in multiple ways, as shown below:

  • Alexa, ask <invocationname> to <Intent>
  • Alexa, tell <invocationname> to <Intent>
  • Alexa, ask <invocationname> <Intent>
  • Alexa, ask <invocationname> for <Intent>
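The four patterns above can be illustrated with a small text parser. This is only a sketch for building intuition: the real Alexa Voice Service works on speech and is far more flexible than a fixed pattern, and the function names here (parseCommand, escapeRegExp) are hypothetical.

```javascript
// Wake words mentioned in this article.
const WAKE_WORDS = ["alexa", "amazon", "echo"];

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split a transcribed command into wake word, verb, and the remaining utterance.
// Returns null when the text does not follow one of the four patterns.
function parseCommand(text, invocationName) {
  const pattern = new RegExp(
    "^(" + WAKE_WORDS.join("|") + "),?\\s+" +   // wake word, optional comma
    "(ask|tell)\\s+" +                          // ask or tell
    escapeRegExp(invocationName) + "\\s+" +     // invocation name
    "(?:(?:to|for)\\s+)?" +                     // optional connecting word
    "(.+)$",                                    // the rest is the utterance
    "i"
  );
  const match = text.trim().match(pattern);
  if (!match) {
    return null;
  }
  return {
    wakeWord: match[1],
    verb: match[2].toLowerCase(),
    utterance: match[3]
  };
}

// Example: the remaining utterance is "give me the featured API".
const parsed = parseCommand(
  "Alexa, ask ProgrammableWeb to give me the featured API",
  "ProgrammableWeb"
);
console.log(parsed.utterance);
```

The optional connecting word is what lets “ask ProgrammableWeb to …”, “ask ProgrammableWeb for …” and “ask ProgrammableWeb …” all reduce to the same utterance.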

For more information on how you can design the Voice Interaction, check out the Voice Design Handbook.

Intents

The next thing to focus on is the Intent. If you are familiar with Android programming, then you will understand an Intent immediately. An Intent is the request that you make of your Alexa Skill, which results in the invocation of your service. Your Skill can support one or more Intents, which are usually defined in a JSON file that is aptly titled IntentSchema.json.

As described earlier, we are going to write our skill that will give responses to two kinds of general requests: featured API and top API categories. So the IntentSchema.json file is where you define these two Intents. The file contents are shown below:

{
 "intents": [
   {
     "intent": "FeaturedAPIIntent"
   },
   {
     "intent": "TopAPICategoriesIntent"
   }
 ]
}

As you can see, each intent is defined here only by its name. This is the simplest possible way to define an intent, since we are only going to invoke the command and are not passing any variables or values while giving the command.
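Before configuring the skill, it can help to sanity-check the schema locally. The following sketch parses the same JSON shown above and lists the intent names it declares. The intentNames function is a hypothetical helper, not part of the Alexa Skills Kit.

```javascript
// The IntentSchema.json contents from this article, inlined as a string.
const schemaText = `{
 "intents": [
   { "intent": "FeaturedAPIIntent" },
   { "intent": "TopAPICategoriesIntent" }
 ]
}`;

// Parse an intent schema and return the declared intent names,
// rejecting missing, empty, or duplicate names.
function intentNames(schemaJson) {
  const schema = JSON.parse(schemaJson);
  if (!Array.isArray(schema.intents)) {
    throw new Error("The schema must contain an 'intents' array");
  }
  const names = [];
  for (const entry of schema.intents) {
    if (typeof entry.intent !== "string" || entry.intent.length === 0) {
      throw new Error("Every entry needs a non-empty 'intent' name");
    }
    if (names.includes(entry.intent)) {
      throw new Error("Duplicate intent name: " + entry.intent);
    }
    names.push(entry.intent);
  }
  return names;
}

console.log(intentNames(schemaText));
```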

The Intent Schema is very flexible and can also take in parameters, which are defined in slots. For more discussion on slots, refer to the Intent Schema guide.

Utterances

So far we have finalized our Invocation Name (ProgrammableWeb) and our Intents. We now need to specify the various voice commands that would be spoken as part of each Intent. You can map an Intent to more than one utterance to make the voice command flexible.

Here is the Utterances file that we have for our two intents:

FeaturedAPIIntent featured API
FeaturedAPIIntent give me the featured API
FeaturedAPIIntent tell me the featured API

TopAPICategoriesIntent top categories
TopAPICategoriesIntent tell me the top categories
TopAPICategoriesIntent list the top categories
TopAPICategoriesIntent list top categories
TopAPICategoriesIntent give me the top categories
TopAPICategoriesIntent what are the top categories

It should be clear how we have mapped a single intent to multiple utterances. You can now speak any of these statements and the Alexa Voice Service will try to map it to the right Intent.
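The structure of the utterances file (an intent name, a space, then the phrase) can be sketched as a simple lookup table. The helper names below (parseUtterances, findIntent) are hypothetical, and the exact-match lookup is a deliberate simplification: the Alexa Voice Service treats the sample utterances as examples to learn from rather than as a fixed list.

```javascript
// The sample utterances from this article, inlined as a string.
const SAMPLE_UTTERANCES = `FeaturedAPIIntent featured API
FeaturedAPIIntent give me the featured API
FeaturedAPIIntent tell me the featured API

TopAPICategoriesIntent top categories
TopAPICategoriesIntent tell me the top categories
TopAPICategoriesIntent list the top categories
TopAPICategoriesIntent list top categories
TopAPICategoriesIntent give me the top categories
TopAPICategoriesIntent what are the top categories`;

// Group the phrases by intent name: the first word of each line is the intent.
function parseUtterances(text) {
  const byIntent = new Map();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    const space = trimmed.indexOf(" ");
    if (trimmed === "" || space === -1) {
      continue; // skip blank lines and lines without a phrase
    }
    const intent = trimmed.slice(0, space);
    const phrase = trimmed.slice(space + 1).trim();
    if (!byIntent.has(intent)) {
      byIntent.set(intent, []);
    }
    byIntent.get(intent).push(phrase);
  }
  return byIntent;
}

// Simplified lookup: case-insensitive exact match against the sample phrases.
function findIntent(byIntent, spoken) {
  const wanted = spoken.trim().toLowerCase();
  for (const [intent, phrases] of byIntent) {
    if (phrases.some((phrase) => phrase.toLowerCase() === wanted)) {
      return intent;
    }
  }
  return null;
}

const utterances = parseUtterances(SAMPLE_UTTERANCES);
console.log(findIntent(utterances, "list the top categories"));
```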

Romin Irani Romin loves learning about new technologies and teaching it to others. His passion is to help developers succeed.
