This episode is brought to you by the Cloud Wars Expo. This in-person event will be held June 28th to 30th at the Moscone Center in San Francisco, California.
Highlights
00:55 – Meta AI introduced the Generative Spoken Language Model (GSLM), a language model that works from speech rather than written text. In essence, it is textless NLP.
01:20 – Text messages often get misconstrued because the intent behind a message doesn’t always carry over to the recipient.
02:10 – The goal of GSLM is to capture human expression from speech (audio) and video inputs. It also analyzes body language, a distinctly human form of communication that enhances speech.
03:00 – The context of the words, how they are spoken, and the body language of the speaker are all inputs that sit behind the text.
03:15 – Meta AI’s approach was speech emotion conversion. The textless NLP model takes the speech input and considers four parts while processing it to formulate an output:
- Phonetic content
- Speaking rate and duration
- Identity of the speaker
- Emotion
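As a rough illustration of the four parts listed above, the sketch below treats an utterance as a record with one field per part and changes only the emotion and the timing. Everything here is hypothetical: the class, the function, the field names, and the emotion labels are assumptions made for illustration, not Meta AI's actual model or API.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class SpeechRepresentation:
    """Hypothetical four-part view of one utterance (illustrative only)."""
    phonetic_units: Tuple[int, ...]  # what was said, as discrete unit IDs
    durations: Tuple[float, ...]     # seconds per unit (speaking rate and duration)
    speaker_id: str                  # identity of the speaker
    emotion: str                     # emotion label, e.g. "neutral"


def convert_emotion(rep: SpeechRepresentation,
                    target_emotion: str,
                    rate_scale: float = 1.0) -> SpeechRepresentation:
    """Return a copy with a new emotion label and rescaled durations.

    The phonetic content and the speaker identity are left untouched.
    """
    if rate_scale <= 0:
        raise ValueError("rate_scale must be positive")
    new_durations = tuple(d * rate_scale for d in rep.durations)
    return replace(rep, durations=new_durations, emotion=target_emotion)


# Usage: same words, same speaker, different emotion and pacing.
neutral = SpeechRepresentation(
    phonetic_units=(12, 7, 31),
    durations=(0.5, 0.25, 0.5),
    speaker_id="speaker_01",
    emotion="neutral",
)
amused = convert_emotion(neutral, "amused", rate_scale=0.5)
```

Keeping the four parts in separate fields is what makes it possible to change one of them, such as the emotion, while the words and the speaker stay the same.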
04:00 – The outcome was to identify non-verbal cues. This model can signal the intent of the speech or the emotion behind it.
04:31 – The wider aim of textless NLP is to understand the richness of human communication.