After playing around with sentiment analysis APIs and Alexa skills, I discovered the ParallelDots (now Komprehend.io) text analysis APIs, which include emotion analysis. While my sentiment analysis skills took the form of a game/challenge, I took a different tack with emotion analysis, making it more free-form and open-ended, and focusing on reporting results as the user provides the narrative.
Operation
In Emotional Radar the user is prompted to “speak to Alexa, a sentence at a time, about anything you want. A story, your feelings, your plans, whatever…” Then, “At the end of each sentence, Emotional Radar will report on how it interprets your feelings. If I’m not sure what you said, I’ll ask you to say ‘Redo’, and you can say it again with a shorter sentence. If you need a prompt, say ‘Prompt’. After you say ‘The End’, Emotional Radar will give a summary.”
Notes
Freeform Utterances
The user is the hero here, with a story to tell as best she can within the limits of Alexa’s ability to understand and process what she says. Alexa intents are normally built around slot-filling relative to the task at hand, but there are no real slots in a freeform story. There are ways, though they come with caveats, of getting freeform user utterances passed to a skill back end. The method I use is based on defining a custom slot type with lots of example freeform utterances, and creating an intent whose only sample utterance is that slot.
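For illustration, here’s a minimal sketch of what that looks like in the interaction model; the intent name, slot type, and sample values below are placeholders, not the skill’s actual ones:

{
  "name": "FreeformIntent",
  "slots": [
    { "name": "freeform", "type": "FREEFORM_TEXT" }
  ],
  "samples": [
    "{freeform}"
  ]
}

The custom FREEFORM_TEXT slot type is then populated with a long list of example sentences (“I went to the beach with my family”, “my boss yelled at me today”, and so on), which nudges Alexa toward passing whole sentences through that single slot.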
Too Much? Not Enough?
If the user doesn’t respond to an Alexa prompt within a short period of time (~8 seconds), your skill delivers a reprompt message. However, if the user speaks for too long (how long is indeterminate: more than 8 seconds? 20 seconds?), Alexa behaves as though nothing was heard and triggers the same reprompt. The skill is given no flag to distinguish between the two cases, so it becomes a UX challenge. I punt a bit. After prompting the user to say something, my reprompt is:
let reSay = "Sorry. Please say, redo. ";
I then have an intent triggered by “redo” (and its alternatives “retake” and “restart”) that prompts the user with:
var say =
"I didn't catch the last sentence you said. If it was long, make it shorter. Please try again. ";
Not a perfect solution, but an approximation.
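For context, the redo path is just an ordinary intent handler; here’s a minimal sketch, assuming the ASK SDK v2 for Node.js (the handler and intent names are mine):

const RedoIntentHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === "IntentRequest" && request.intent.name === "RedoIntent";
  },
  handle(handlerInput) {
    const say =
      "I didn't catch the last sentence you said. If it was long, make it shorter. Please try again. ";
    return handlerInput.responseBuilder
      .speak(say)
      .reprompt(say)
      .getResponse();
  },
};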
Prompts
I realized that users might not be spontaneously eloquent in their utterances, so as a UX enhancement I let them ask for a prompt to get started. Here’s the current starter set:
const prompts = [
  "Tell me a story about your childhood.",
  "Tell me about a recent dream you had.",
  "Tell me about a memorable vacation.",
  "Recount a recent conversation you had.",
  "Tell me about a recent experience with customer service.",
  "Talk about one of your pets.",
];
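When the user says “Prompt”, the handler just picks one of these at random; a sketch of the handle() body (the handler shell is the usual canHandle/handle pair shown earlier):

const starter = prompts[Math.floor(Math.random() * prompts.length)];
return handlerInput.responseBuilder
  .speak(starter)
  .reprompt(starter)
  .getResponse();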
The ParallelDots API
Data
The ParallelDots Emotion API returns:
- emotion: the type of emotion present in the text, i.e. angry, bored, fear, sad, excited, or happy
- probabilities: the confidence score for each emotion, ranging from 0 to 1; a higher score indicates higher confidence in that output
{
  "emotion": {
    "Bored": 0.0768371562,
    "Angry": 0.2231197248,
    "Sad": 0.2348101367,
    "Fear": 0.2431143526,
    "Happy": 0.1176188243,
    "Excited": 0.1044998055
  }
}
Because the API reports back scores for all six emotions on each utterance, I report only the predominant emotion in the audio response and, if a display is supported, graphically show the probability for each emotion. I also accumulate the scores over all the utterances in a session, so I can summarize when the user quits the skill.
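The bookkeeping for that is straightforward; here’s a sketch (the function and variable names are mine):

// Find the predominant emotion for this utterance and add each score to the
// running session totals used for the end-of-session summary.
function tallyEmotions(apiResult, sessionTotals) {
  const scores = apiResult.emotion; // e.g. { Bored: 0.08, Angry: 0.22, ... }
  let predominant = null;
  for (const [emotion, score] of Object.entries(scores)) {
    sessionTotals[emotion] = (sessionTotals[emotion] || 0) + score;
    if (predominant === null || score > scores[predominant]) {
      predominant = emotion;
    }
  }
  return predominant;
}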
Code
My Lambda back end is written in Node.js, and I use the ParallelDots Node.js API wrapper to communicate with the service. I stash my API key in an environment variable, and when my freeform utterance intent grabs the entire utterance from the request envelope, I call this function:
function getPDResults(query) {
  const pd = require("paralleldots");
  pd.apiKey = process.env.APIKEY;
  return new Promise((resolve, reject) => {
    pd.emotion(query, "en")
      .then((response) => {
        console.log(response);
        return resolve(response);
      })
      .catch((error) => {
        console.log(error, error.stack); // an error occurred
        return reject("bad call");
      });
  });
}
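The calling side looks roughly like this (the slot and handler names are assumptions):

// Pull the freeform slot value out of the request envelope and hand it to
// getPDResults.
async function handleFreeformUtterance(handlerInput) {
  const utterance =
    handlerInput.requestEnvelope.request.intent.slots.freeform.value;
  const results = await getPDResults(utterance);
  // ...pick the predominant emotion, update session totals, build the response
  return results;
}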
Pricing
I am using the free tier of the ParallelDots API:
- Usage limit: 1,000 calls/day
- Rate limit: 20 calls/minute
This is holding for now, but it could become a problem if the skill ever gets more traffic. The next tier is $79/month, and pricing goes up quickly from there. So in the meantime, I detect a 429 error (per their docs: “429: Too Many Requests — You’re requesting too many kittens! Slow down!”) and report: “The radar is closed for the day. Please come back tomorrow.” (Though I don’t know whether the 429 is returned for both the usage limit and the rate limit, so maybe I should say “The radar is busy. Come back in a minute, and if that still doesn’t work, come back tomorrow.”)
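Here’s a sketch of that check; I’m assuming the response (or caught error) exposes the HTTP status as a code property, so adjust to however the wrapper actually surfaces it:

// Bail out with a friendly message when the API says we've hit a limit.
function checkRateLimited(response) {
  if (response && response.code === 429) {
    return "The radar is closed for the day. Please come back tomorrow. ";
  }
  return null; // not rate limited; carry on with the emotion scores
}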
The Graphical Radar
When I wanted to display the resulting values for the six emotions, my first thought was a radar chart. If I were an APL maven, I would have done the whole thing in APL, but no. I have done simple bar charts in native APL, but this would have been a little hairier. Instead, I followed Frank Börncke’s lead and made yet another external call, this time to quickchart.io, which has a built-in radar chart generator. I tweaked its parameters to suit, and included a parameterized URL, fed by data from the intent handler, as the source of an image component in my APL doc:
{
  "type": "Image",
  "width": "100%",
  "height": "100%",
  "source": "https://quickchart.io/chart?bkg=white&c={'type':'radar','data':{'labels':['Angry','Sad','Excited','Bored','Happy','Fear'],'datasets':[{'backgroundColor':'${payload.emotionData.properties.fills}','borderColor':'${payload.emotionData.properties.lines}','data':[${payload.emotionData.properties.scores[0]},${payload.emotionData.properties.scores[1]},${payload.emotionData.properties.scores[2]},${payload.emotionData.properties.scores[3]},${payload.emotionData.properties.scores[4]},${payload.emotionData.properties.scores[5]}],'label':'${payload.emotionData.properties.query}'}]},'options':{'maintainAspectRatio':true,'spanGaps':false,'elements':{'line':{'tension':0.000001}},'plugins':{'filler':{'propagate':false},'samples-filler-analyser':{'target':'chart-analyser'}},'legend':{'labels':{'fontColor':'black','boxWidth':0}},'scale':{'pointLabels':{'fontColor':'black'},'ticks':{'fontColor':'black'}}}}"
}
Et voilà:
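On the back-end side, those ${payload.emotionData.properties...} bindings are filled in by the datasource sent with the RenderDocument directive; a rough sketch, with example values:

// The datasource that feeds the APL doc's bindings. Property names match the
// doc above; the values shown are just examples.
const emotionDataSource = {
  emotionData: {
    properties: {
      query: "Mostly Fear",
      fills: "rgba(133, 195, 224, 0.4)",
      lines: "rgb(60, 120, 216)",
      // Order must match the chart's labels: Angry, Sad, Excited, Bored, Happy, Fear
      scores: [0.22, 0.23, 0.1, 0.08, 0.12, 0.24],
    },
  },
};

// Sent along with the APL document, roughly:
// handlerInput.responseBuilder.addDirective({
//   type: "Alexa.Presentation.APL.RenderDocument",
//   document: radarDoc, // the APL doc containing the Image component above
//   datasources: emotionDataSource,
// });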
Emoting with SSML
If I’m placing the burden on the user to emote, the least I can do is try to express the response with a little emotion. Alexa’s SSML does have built-in speaking styles for two emotions: excited and disappointed. So I fiddled with SSML attributes to see if I could approximate some feeling in the responses that report the predominant emotion:
const emotionOutput = [
  "<prosody volume='loud' pitch='low' rate='slow'>Mostly</prosody><break time='.25s'/><prosody volume='x-loud' pitch='x-low' rate='slow'> Angry. </prosody>",
  "<amazon:emotion name='disappointed' intensity='high'>Mostly Sad. </amazon:emotion>",
  "<amazon:emotion name='excited' intensity='high'>Mostly Excited. </amazon:emotion>",
  "<prosody rate='x-slow' pitch='x-low'>Mostly Bored. </prosody>",
  "Mostly<break time='.25s'/><prosody volume='x-loud' pitch='high' rate='medium'> Happy. </prosody>",
  "<prosody volume='loud' pitch='high' rate='slow'>Mostly </prosody><prosody volume='loud' pitch='x-high' rate='slow'>Fear. </prosody>",
];
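Wiring the predominant emotion to the matching line is just an index lookup; the map below is mine and has to mirror the array’s order:

// Map the API's emotion label to its slot in emotionOutput above.
const emotionIndex = { Angry: 0, Sad: 1, Excited: 2, Bored: 3, Happy: 4, Fear: 5 };

function emotionSSML(predominantEmotion) {
  return emotionOutput[emotionIndex[predominantEmotion]];
}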
Check them out in the developer console Voice & Tone simulator and let me know what you think, or see if you can improve on them!