I created Typical Topical for the Devpost “Sentiment & Opinion Mining Natural Language API Hackathon”. It was the fourth hackathon I have entered that was built around text-analysis APIs, and the second using the Expert.ai natural language APIs.
Get the code for Typical Topical on GitHub
Here are some of the situations I ran into, and the lessons I learned, while coding the skill:
Unstructured Utterances
As with a number of my skills, Typical Topical responds to unstructured utterances spoken by the user. Capturing free-form speech isn’t something the Alexa Skills Kit supports directly, so I used one of the two workarounds I know of. Amazon deprecated the AMAZON.Literal slot type, which scooped up everything the user said, and replaced it with AMAZON.SearchQuery, which captures everything the user says after an anchor word or phrase, as in the sample utterance “search for {query}”, where {query} is a slot of type AMAZON.SearchQuery. The alternative is to define a “catchall” custom slot type, train the interaction model on a wide variety of random phrases as its values, and create an intent whose only sample utterance is the bare slot: “{catchall}”. This is the method I use in Typical Topical, and it returns the utterance, modulo any misunderstandings (which do happen). As this isn’t a mission-critical skill, I don’t confirm or correct misunderstood utterances.
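As a rough sketch of the catchall approach, here is a JavaScript object mirroring the languageModel section of an interaction model, plus a helper that reads the captured text out of the raw Alexa request JSON. The intent name CatchAllIntent, the slot type CATCHALL, the sample values, and the helper getCatchAllUtterance are hypothetical, not taken from the Typical Topical code; a real catchall type would need many more, and more varied, values.

```javascript
// Hypothetical "languageModel" section of an interaction model: a custom slot
// type filled with varied random phrases, and an intent whose only sample
// utterance is the bare slot, so almost anything the user says matches it.
const languageModel = {
  invocationName: "typical topical",
  intents: [
    {
      name: "CatchAllIntent", // hypothetical intent name
      slots: [{ name: "catchall", type: "CATCHALL" }],
      samples: ["{catchall}"],
    },
  ],
  types: [
    {
      name: "CATCHALL", // hypothetical custom slot type
      // In practice this list would hold many more varied phrases.
      values: [
        "the weather was lovely today",
        "i cannot stand waiting in line",
        "my team finally won the game",
      ].map((value) => ({ name: { value } })),
    },
  ],
};

// Pull the captured utterance out of a raw Alexa request envelope (no SDK).
// Returns null for other request types, other intents, or an empty slot.
function getCatchAllUtterance(requestEnvelope) {
  const request = requestEnvelope.request;
  if (request.type !== "IntentRequest" || request.intent.name !== "CatchAllIntent") {
    return null;
  }
  const slot = request.intent.slots && request.intent.slots.catchall;
  return slot && slot.value ? slot.value : null;
}
```

Because the sample utterance is nothing but the slot, whatever Alexa transcribes lands in the slot value; the trade-off is that transcription errors go undetected unless the skill confirms the text.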
Time Limits
An Alexa skill prompts the user, then opens the mic for input. If no input is detected after a short period of time, the skill can reprompt the user with a new message to get them talking. However, if the user speaks for too long (and the exact limit is indeterminate), no input is detected either, and the user is reprompted. The skill code can’t tell the difference between an empty utterance and an overly long one, so it can’t tailor the reprompt to “Say something!” versus “Don’t say so much!”. In Typical Topical, I use a single fallback reprompt that covers both cases: “I didn’t catch that. Say it again more concisely, or say, repeat, to hear the targets again.”
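For illustration, the raw JSON response a skill returns to prompt the user and supply that fallback reprompt looks roughly like the object built below. The function name buildPromptResponse is hypothetical; with the ask-sdk-core library you would normally produce the same speech and reprompt via handlerInput.responseBuilder.speak(...).reprompt(...).getResponse().

```javascript
// Fallback reprompt text from the article.
const FALLBACK_REPROMPT =
  "I didn't catch that. Say it again more concisely, or say, repeat, to hear the targets again.";

// Hypothetical helper: build a raw Alexa response that speaks a prompt, keeps
// the session (and mic) open, and supplies the reprompt Alexa plays when no
// input is detected, whether the user was silent or spoke for too long.
function buildPromptResponse(promptText, repromptText = FALLBACK_REPROMPT) {
  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: promptText },
      reprompt: { outputSpeech: { type: "PlainText", text: repromptText } },
      shouldEndSession: false,
    },
  };
}
```

Since the same reprompt fires in both situations, its wording has to make sense whether the user said nothing or said too much.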
Because each turn has to hit that sweet spot between saying nothing and saying too much, the amount of text I can feed Expert.ai for analysis is limited. I suppose I could collect several utterances before making the API calls, but I wanted to give more intermediate, turn-by-turn feedback. I’m not sure what effect short input has on Expert.ai’s ability to detect behavior, emotion, and topic, compared with longer text. On the one hand, more text may give the elves more to chew on; on the other hand, it could raise the potential for multiple conflicting signals, or return longer arrays of results rather than a single clear detection.
It also isn’t necessarily clear to the user what constitutes a “behavior” versus an “emotion”. My reading of the Expert.ai docs is that a behavior is what the utterance describes about its subject, whereas an emotion describes the feelings of the speaker. I found more utterances in which no behavior could be detected, so behavior may be the more nuanced measurement.
In my explorations of Alexa’s limits, I look for ways to mitigate them through interaction design. For instance, given the eight-second limit for a user to respond, adding a “repeat” command that replays the targets gives the user a little more time to consider their utterance before making it. Offering “easy”, “medium”, and “hard” levels (one, two, or three targets), as well as letting users repeat targets after an utterance has already been scored (with no points for the repeat), gives users more practice at delivering an utterance within the time limit.
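A minimal sketch of how a difficulty level might decide which targets are in play. The TARGETS_PER_LEVEL mapping and the pickTargets function are hypothetical; the returned flag names mirror hasTopic, hasEmotion and hasBehavior from the Expert.ai calls in the Parallel Calls section, but the article doesn’t say which targets appear at the easy and medium levels, so this sketch simply picks them at random.

```javascript
// Hypothetical mapping of difficulty level to number of targets dealt.
const TARGETS_PER_LEVEL = { easy: 1, medium: 2, hard: 3 };
const TARGET_KINDS = ["topic", "emotion", "behavior"];

// Choose which target kinds are in play for a turn. The selection rule is an
// assumption: a random subset of the requested size. rng is injectable so
// tests can make the choice deterministic.
function pickTargets(level, rng = Math.random) {
  const count = TARGETS_PER_LEVEL[level];
  if (count === undefined) throw new Error(`Unknown level: ${level}`);

  // Fisher-Yates shuffle of a copy, then keep the first `count` kinds.
  const kinds = [...TARGET_KINDS];
  for (let i = kinds.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [kinds[i], kinds[j]] = [kinds[j], kinds[i]];
  }
  const chosen = new Set(kinds.slice(0, count));

  // Flags in the shape used to decide which Expert.ai APIs to call.
  return {
    hasTopic: chosen.has("topic"),
    hasEmotion: chosen.has("emotion"),
    hasBehavior: chosen.has("behavior"),
  };
}
```

Returning flags rather than a list keeps the downstream code simple: each API call is made only when its flag is set.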
Parallel Calls
Each turn makes up to three calls to separate Expert.ai APIs. Because the skill has to respond quickly, before the Lambda function times out, I make these calls in parallel. To do this, I use the Promise.allSettled() method:
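```javascript
// Declared here so the results are in scope after the try block.
let analyze, categorize, behavioral;
try {
  // Start all three Expert.ai requests at once; targets not in play resolve via getNull().
  [analyze, categorize, behavioral] = await Promise.allSettled([
    hasTopic ? getAnalyze(handlerInput, query) : getNull(),
    hasEmotion ? getCategorize(handlerInput, query) : getNull(),
    hasBehavior ? getBehavioral(handlerInput, query) : getNull(),
  ]);
} catch (err) {
  // Promise.allSettled() never rejects, so this only catches a synchronous
  // throw while the calls are being started.
  console.log("Bad return from Expert.ai! ", err);
}
```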
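All three requests start as soon as the array is built, so they run concurrently, and Promise.allSettled() waits until every one of them has settled. Unlike Promise.all(), it doesn’t reject as soon as one call fails: each result is an object with a status of “fulfilled” (plus a value) or “rejected” (plus a reason), so one failed API doesn’t throw away the results of the other two.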
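To see the concurrency and the failure handling in isolation, here is a self-contained sketch that swaps the real Expert.ai requests for a hypothetical fakeExpertAiCall() timer and a stand-in getNull() (the article’s own getNull() isn’t shown, so this version is an assumption). With all three calls running at once, the turn takes roughly as long as the slowest call, and a failed call shows up as a “rejected” result instead of throwing.

```javascript
// Hypothetical stand-in for an Expert.ai request: resolves (or rejects) after a delay.
function fakeExpertAiCall(name, ms, shouldFail = false) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (shouldFail) reject(new Error(`${name} failed`));
      else resolve({ api: name });
    }, ms);
  });
}

// Assumed behavior of getNull(): an already-resolved placeholder for a target not in play.
function getNull() {
  return Promise.resolve(null);
}

// Run up to three calls concurrently and unpack the settled results.
async function runTurn({ hasTopic, hasEmotion, hasBehavior }, failBehavioral = false) {
  const started = Date.now();
  const [analyze, categorize, behavioral] = await Promise.allSettled([
    hasTopic ? fakeExpertAiCall("analyze", 300) : getNull(),
    hasEmotion ? fakeExpertAiCall("categorize", 300) : getNull(),
    hasBehavior ? fakeExpertAiCall("behavioral", 300, failBehavioral) : getNull(),
  ]);

  // Each entry is { status: "fulfilled", value } or { status: "rejected", reason }.
  const unwrap = (result) => (result.status === "fulfilled" ? result.value : null);
  return {
    elapsedMs: Date.now() - started,
    topic: unwrap(analyze),
    emotion: unwrap(categorize),
    behavior: unwrap(behavioral),
    failures: [analyze, categorize, behavioral].filter((r) => r.status === "rejected").length,
  };
}

// One failing call: the other two results still come back, and the elapsed
// time is about one call's latency (~300 ms) rather than the sum of all three.
runTurn({ hasTopic: true, hasEmotion: true, hasBehavior: true }, true).then((result) => {
  console.log(result);
});
```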
Tokens
The Expert.ai APIs require an authorization token, which expires after a period of time. I request a fresh token from a separate API on every turn. I could instead stash the token in persistent storage and call the token API only when an API call fails with the stashed token, then reissue the failed call and stash the new token, but I chose to keep things simple with the extra call each turn.
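The caching alternative described above might look something like the following sketch. It is hypothetical throughout: fetchToken and callApi are injected stand-ins for the real token request and Expert.ai call, the cache lives in memory rather than in persistent storage, and it assumes an expired token is reported as an HTTP 401.

```javascript
// Hypothetical token cache: reuse the stashed token, and fetch a new one only
// when an API call reports that the current token is no longer valid.
function createTokenCache(fetchToken) {
  let token = null; // in-memory stand-in for the persistent store mentioned in the text

  async function getToken(forceRefresh = false) {
    if (!token || forceRefresh) {
      token = await fetchToken();
    }
    return token;
  }

  // callApi(token) is assumed to resolve to an HTTP-like { status, body } object.
  async function callWithToken(callApi) {
    let response = await callApi(await getToken());
    if (response.status === 401) {
      // Token rejected (e.g. expired): fetch a fresh one and retry once.
      response = await callApi(await getToken(true));
    }
    return response;
  }

  return { getToken, callWithToken };
}
```

In Lambda, module-level state can survive between warm invocations but isn’t guaranteed to, which is one reason a persistent store would be the sturdier home for the token. Also, with three calls in parallel, each could notice the expiry and trigger its own refresh at the same moment; fetching one token per turn up front sidesteps that.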
Alexa Presentation Language (APL)
For graphical appeal, I originally envisioned a slot machine delivering the target topic, behavior and emotion on three spinning reels that would stop on the targets. I then considered three separate spinning wheels (having found example spinning-wheel code), but settled on dealing three cards and turning them face up to reveal the targets. I’m not really an APL whiz, so thankfully I found some great code by Alexander Martin on apl.ninja that, with some mods (and a good deal of learning on my part!), did the trick.
Scoring
The scoring was originally designed to award a fixed number of points depending on whether or not the target was met, but I modified it to factor in two things. First, while a topic is simply met or not met, emotion is a two-level hierarchy of emotion and emotion group (e.g. “Anxiety”, “Fear”, “Stress” and “Worry” all belong to the emotion group “Apprehension”), and behavior is a three-level hierarchy of behavior, behavior group, and intensity (low, fair, or high). So for emotion and behavior, a lesser score is awarded for matching the target’s group, or its group and intensity, without matching the exact target. Second, I halve the score if the user says the literal target word in their utterance. The user can also repeat a target for zero points, just to get practice.
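Here is a sketch of the tiered scoring described above. The point values, the object shapes for targets and detections, and the function names are all hypothetical (they are not Expert.ai’s response format or the skill’s actual code); only the rules come from the description: exact versus group matches, group-and-intensity matches for behavior, halving for saying the target word, and zero for repeats.

```javascript
// Hypothetical point values; the article doesn't give the actual numbers.
const POINTS = { exact: 100, groupAndIntensity: 60, group: 30, none: 0 };

// Topic is all-or-nothing: either it was detected or it wasn't.
function scoreTopic(targetTopic, detectedTopics) {
  return detectedTopics.includes(targetTopic) ? POINTS.exact : POINTS.none;
}

// Emotion has two levels: an exact emotion beats a match on emotion group only.
function scoreEmotion(target, detected) {
  if (!detected) return POINTS.none;
  if (detected.emotion === target.emotion) return POINTS.exact;
  if (detected.group === target.group) return POINTS.group;
  return POINTS.none;
}

// Behavior has three levels: behavior, behavior group, and intensity.
function scoreBehavior(target, detected) {
  if (!detected) return POINTS.none;
  if (detected.behavior === target.behavior) return POINTS.exact;
  if (detected.group === target.group) {
    return detected.intensity === target.intensity ? POINTS.groupAndIntensity : POINTS.group;
  }
  return POINTS.none;
}

// Repeats score zero; saying the target word itself halves the score
// (rounded down). This is a simple whole-word check, so "worried" does not
// count as saying "Worry", and multiword targets would need a phrase match.
function adjustScore(points, targetWord, utterance, isRepeat) {
  if (isRepeat) return 0;
  const words = utterance.toLowerCase().split(/\W+/);
  return words.includes(targetWord.toLowerCase()) ? Math.floor(points / 2) : points;
}
```

For example, with a target emotion of “Worry”, a detected “Fear” shares the “Apprehension” group and earns POINTS.group in this sketch.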
Skill Name
The original name of the skill was “Expert Texpert”, a nod to the Beatles. However, that name didn’t pass certification: “The skill’s invocation name is unlikely to have high accuracy for launching the skill. Users may experience challenges invoking the invocation name due to inclusion of the term, ‘texpert.’ We advise selecting a different term to ensure users are able to invoke the skill.” Oh well, so it’s “Typical Topical”, alluding to the behavior and emotion types and the topics.