A little over a year ago I spoke with Bryan Catanzaro of Nvidia about some of the interesting technology the company was developing in the areas of graphical AI, voice synthesis, and conversational/speech AI.
Bryan shared a vision of what machine learning and deep learning could do to change the way we experience the world around us. And while applications like AI-created art, music, and human-sounding voices get a lot of attention, there are more practical examples of AI already being used to create better customer experiences when we need help with a product or service.
With a year having gone by, I was curious to hear how things are progressing in these areas, and I was fortunate to speak via LinkedIn Live with Erik Pounds, Sr. Director of Enterprise Computing and Data Science at Nvidia, about the direction conversational and speech AI have moved in since I last spoke with Bryan. Below is an edited transcript of our conversation. Click on the embedded SoundCloud player to hear the full conversation.
Brent Leary: What are we dealing with when it comes to speech AI and conversational AI today?
Erik Pounds: When you think of speech AI, think of functions like automatic speech recognition, where the AI is running in the background and can immediately recognize what you’re saying. It can transcribe what’s being said. It can then act in real time on that information. And you can provide a lot of helpful things by doing that. Imagine a customer service agent on the back end of a phone conversation. A lot of us on the other end, on the consumer side, what do we really want? Well, one, we like talking to humans, and the other is we want to get help quickly, right?
Now imagine, on the back end of it, on the agent side: if I’m talking to an agent trying to get some help and I’m asking a bunch of questions, imagine the AI running in the background, pulling up knowledge-base articles, finding information, finding helpful tools, and helping answer my question.
Then the agent has all this information right at their fingertips to help me solve my problem. It’s almost like having a superpower sitting right next to you, to help someone have a great experience and solve their challenges, right? When we think about AI, especially in that context, it’s not about replacing the human with a robot that you’ll talk to. There are incremental steps that are going to be able to help businesses that provide a service to their clients for literally decades to come.
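To make that agent-assist loop concrete, here is a minimal Python sketch: audio comes in, gets transcribed, and matching knowledge-base articles are surfaced to the agent. Every function and the tiny keyword-lookup knowledge base below are hypothetical stand-ins, not NVIDIA’s actual APIs; a production system would use a real streaming ASR service and semantic retrieval.

```python
# Minimal agent-assist sketch: take audio chunks, transcribe them, and
# surface relevant knowledge-base articles for the human agent.
# Every function here is an illustrative stub, not any vendor's real API.

from dataclasses import dataclass

@dataclass
class Article:
    title: str
    url: str

# Toy "knowledge base" keyed by keywords the transcript might contain.
KNOWLEDGE_BASE = {
    "refund": Article("How to process a refund", "https://example.com/kb/refunds"),
    "password": Article("Resetting a customer password", "https://example.com/kb/password"),
}

def transcribe_chunk(audio_chunk: bytes) -> str:
    """Stand-in for a streaming ASR call."""
    return "I never received my refund"  # canned output for the sketch

def suggest_articles(transcript: str) -> list:
    """Naive keyword lookup; a real system would use semantic retrieval."""
    return [a for k, a in KNOWLEDGE_BASE.items() if k in transcript.lower()]

def assist_agent(audio_stream):
    for chunk in audio_stream:
        transcript = transcribe_chunk(chunk)
        for article in suggest_articles(transcript):
            print(f"Suggested for agent: {article.title} -> {article.url}")

assist_agent([b"fake-audio-bytes"])
# -> Suggested for agent: How to process a refund -> https://example.com/kb/refunds
```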
Data is foundational, empathy adds needed human element
Brent Leary: When people think of AI, they have this narrow definition and a narrow view of what it can actually impact. But when it comes to the customer experience, when people need help, it’s not just the AI. The feeling that you’re communicating with a human, or at least a human-sounding thing, somebody who has some sort of human empathy, is just as important as having the right data at their disposal.
Erik Pounds: Absolutely. Data is the foundational element of all of this. If we transcribe a call, that produces data in real time. But there’s also other data already in existence, often sitting at rest inside a business, that can be leveraged. And I think one of the best strategies any business can take is figuring out, “All right. What is the valuable data that I already possess? And how can I leverage that to provide better customer experiences?” Some of it can be just general data.
For example, every time a customer transaction or engagement occurs, that produces data. You can gain a lot of information from that with regard to trends and patterns, and that could help future customers, right? Often a lot of these calls and interactions are transcribed and stored. We all hear that disclosure at the beginning of any call: “This call may be monitored. If you proceed, this is what’s going to happen.” Think of that as almost like crowdsourcing information. You can really leverage that information to your best benefit. So I think a lot of it starts with the foundation of how you leverage and utilize data.
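As a simple illustration of mining those stored call transcripts for trends, here is a minimal sketch; the transcripts, keywords, and issue categories are all invented, and a real system would use a trained classifier rather than keyword matching.

```python
# Sketch: mining stored call transcripts ("this call may be monitored")
# for recurring issues. Transcripts and categories are made up.

from collections import Counter

TRANSCRIPTS = [
    "my order arrived late and the box was damaged",
    "shipping took two weeks, way too late",
    "I was double charged on my card",
]

ISSUE_KEYWORDS = {"late": "shipping delay", "damaged": "damaged goods",
                  "charged": "billing"}

def tally_issues(transcripts):
    """Count how often each issue category shows up across calls."""
    counts = Counter()
    for t in transcripts:
        for keyword, issue in ISSUE_KEYWORDS.items():
            if keyword in t:
                counts[issue] += 1
    return counts

print(tally_issues(TRANSCRIPTS).most_common())
# -> [('shipping delay', 2), ('damaged goods', 1), ('billing', 1)]
```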
Connecting context
Brent Leary: Can you talk a little bit about the component of this where we’re not just able to have great natural language transcription and understanding, but also the sentiment component, the ability to leverage empathy along with the speech AI? Because part of it is solving the challenge or helping, but the other part is how it happens: the feeling people get not only from getting the thing corrected, but the manner in which it was corrected, the manner in which they were engaged, the empathy going back and forth. Can you talk a little bit about where we are with that?
Erik Pounds: Often when I say one thing, and then you respond, and then I say another thing, that following sentence is tied to the first sentence. When you look at how algorithms have traditionally worked, they often don’t understand that context. They’re not processing it or taking it into consideration. That is possible now. For example, at our conference just last month, NVIDIA GTC, we put out a demo.
It’s a customer service demo using an AI framework we call NVIDIA Tokkio that shows exactly how this works: providing an interaction that is lifelike, that understands what I’m saying and what I’m asking for, and that can do it in the natural flow of a human conversation. And that is critical. As we automate more of the complete process, that’s absolutely critical. Because like you said, we want to interact with humans, right? Someone calls in, they want to hear a human voice, they want someone that is friendly, that understands them, that appreciates what they’re saying.
If the AI is built to that level, it needs to be able to do that. Otherwise, the experience isn’t going to be good. I think this is important when we’re talking about AI technology. When it comes to speech AI or conversational AI, there are a lot of technicalities, like, “All right. Well, what percentage of the words you’re saying do I understand? Am I able to understand your words in a noisy environment? Am I able to do all this stuff?” And that’s how the technology works.
But what really matters is, is it a great experience or is it not a great experience? You can apply amazing technology to this challenge and still not provide a great customer experience. And that’s the most important thing, right? So we’ve taken the approach with our technology that one of the most important things that we can help our customers do is take the AI, take these pre-trained models, and be able to customize them for their own domain and their own environments.
If you’re running a call center where most of the discussions are around botany, well, I can’t remember the names of the plants I’ve swapped in and out of my front yard over the years, right? But if that’s the case, you need to make sure that this AI understands certain terminologies, phrases, and context around that domain. Or if it’s a medical devices company, you can imagine there are a lot of things that will be discussed in that conversation that are not in the normal conversations an AI model would be trained on.
So customization is super important, as is lingo, right? Based on the areas of the world your customers live in or call in from, you want to be able to understand dialects, lingo, things like this, and handle them properly. You can’t just take a stock AI model, deploy it in an environment, and have it provide a great experience everywhere. Customization is going to be very important.
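One lightweight way to picture the domain customization Erik describes is vocabulary boosting: re-ranking the speech model’s candidate transcriptions so that known domain terms win out over sound-alike words. This sketch is a generic illustration with invented terms and scores, not NVIDIA’s implementation.

```python
# Illustrative vocabulary boosting: re-rank ASR hypotheses so that ones
# containing known domain terms (botany, medical devices, local lingo)
# are preferred. Terms and scores are invented for the sketch.

DOMAIN_TERMS = {"stent", "catheter", "hosta", "hydrangea"}
BOOST = 2.0  # extra score added per matched domain term

def boosted_score(hypothesis: str, score: float) -> float:
    """Raise the score of a hypothesis for each domain term it contains."""
    matches = sum(1 for word in hypothesis.lower().split() if word in DOMAIN_TERMS)
    return score + BOOST * matches

def pick_best(hypotheses):
    """Choose the hypothesis with the highest boosted score."""
    return max(hypotheses, key=lambda h: boosted_score(h[0], h[1]))[0]

# The generic model slightly prefers the sound-alike word; boosting fixes it.
candidates = [
    ("the patient needs a stent replaced", 4.1),
    ("the patient needs a scent replaced", 4.3),
]
print(pick_best(candidates))  # -> "the patient needs a stent replaced"
```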
Don’t overlook the data right in front of you
Brent Leary: What are some of the things companies are still trying to get their heads around in terms of moving forward with this?
Erik Pounds: In the context of this conversation, like you mentioned, you have good relationships with a bunch of companies that build these CRM platforms used by many different enterprises and organizations. Often an enterprise has its existing service stack or tech stack, and then they want to do something new. Sometimes where they are today has some limitations.
So that often adds some complexities, because part of it is, “Well, I can build this on my own and plug it into my existing platform.” Or sometimes you’ve got to go back to your ISV and make a feature request like, “Hey, we really want to do this. What are your ideas?”
I think most importantly, as you get those conversations going, understand the data that’s at your fingertips. Understand what you can do on your own, what your ISVs are capable of doing, and what you could possibly do with just a little bit of consulting help. Having that full understanding lets you take positive steps forward.
Most first AI projects inside enterprises are how companies cut their teeth, right? They’re not always successful. This is a new technology. So I would say being as prepared as possible, so you have the greatest chance of success in your first project, is super important right now.
Brent Leary: From a CRM application perspective, particularly if you’re a salesperson, you hate using CRM. Salespeople don’t like putting in stuff. They didn’t sign up to type or swipe or click. They really want to go out and build relationships and sell things. And my fantasy is, wouldn’t it be cool if you could just talk to your enterprise application, whether it’s CRM or ERP or whatever acronym you want to throw out there, the way we’re talking right now, and get your stuff done? Is that mere fantasy? Or do you see a day when we actually could have that kind of conversation with our apps?
Erik Pounds: No, it shouldn’t be, especially nowadays. You mentioned, “Okay. I’ve got to go back into Salesforce and update this record after I have this conversation with this customer or prospect.” And we all know a lot of times these records aren’t that well updated, and then the business doesn’t have the intelligence it needs to move forward, right? The pipeline’s not up to date. You’re not able to learn from that. A lot of these conversations now are like the one we’re having, right? They’re remote. They’re not in a conference room in some building. Or even if they are in a conference room, there’s often somebody who is remote. And so there is a system listening to the conversation.
Just being able to transcribe that conversation for, in this case, the account manager or whoever’s involved would be great. And that’s all possible today. Just like this conversation: it’s transcribed, you’re using some ASR function to transcribe it, then you’re applying some NLU or NLP function to understand the context of what the heck we’re talking about. And then you could pretty easily go and update a lot of those standard fields. This is all repetitive stuff, and the more repetitive an activity is, the easier it should be to apply AI.
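Here is a minimal sketch of that repetitive loop, assuming a transcript already produced by ASR: a toy NLU step pulls out a couple of standard fields, and a placeholder function stands in for the CRM update call. The regex-based “NLU” and update_crm_record() are hypothetical; a real system would use an entity-extraction model and the CRM vendor’s API.

```python
# Sketch: take a call transcript, extract a couple of standard CRM fields
# with a toy NLU step, and hand them to a placeholder CRM update call.

import re

def extract_fields(transcript: str) -> dict:
    """Toy NLU: a real system would use an intent/entity model."""
    fields = {}
    amount = re.search(r"\$([\d,]+)", transcript)
    if amount:
        fields["deal_size"] = amount.group(1)
    stage = re.search(r"\b(demo|proposal|contract)\b", transcript.lower())
    if stage:
        fields["stage"] = stage.group(1)
    return fields

def update_crm_record(account_id: str, fields: dict) -> None:
    """Placeholder for a CRM API call (e.g., updating an opportunity record)."""
    print(f"Updating {account_id}: {fields}")

transcript = "They want a proposal next week, and the budget is around $50,000."
update_crm_record("acct-001", extract_fields(transcript))
# -> Updating acct-001: {'deal_size': '50,000', 'stage': 'proposal'}
```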
This is part of the One-on-One Interview series with thought leaders. The transcript has been edited for publication. If it's an audio or video interview, click on the embedded player above, or subscribe via iTunes or via Stitcher.