Scribe: Building an AI-powered meeting notetaker (Part 1)
Johann Verghese
While there's no shortage of services providing ASR/Speech-To-Text, it still takes a herculean effort to go from transcripts to building a production-ready AI app like an automated notetaker. In part 1 of this blog series, we drill into the challenges and pieces we need in place to flesh out a robust AI app.
Below is a demo of the app that we're going to build as part of this blog series:
We'll now discuss the challenges and high-level steps for building this app:
- Choosing an ASR provider
- Speaker Diarization and Assignment
- NLP and NLU processes
- Transcript Viewer/Editor and other UI Components
- Intelligent Warehouse for Transcripts and Assets
1. Choosing an ASR provider
The first step is determining which ASR provider to go with. Here's a list of providers we maintain. You can choose a provider based on your needs, such as:
- Transcript Quality (what works best with your data)
- Pricing
- Speed
- Live/Streaming or Offline/Batch support
- Language and Accent Support
- Reliability
- Privacy and Data Locality
- Feature Support (such as filler words)
With our unified API, you can choose a provider without falling into analysis paralysis: you can always switch providers, or even mix and match them, without vendor lock-in. We'll be building our notetaker using this unified API, as it simplifies this step.
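One way to keep the choice manageable is to treat some criteria as hard requirements and rank what remains on a single dimension. The sketch below is only an illustration of that idea: the `Provider` class, the `pick_provider` helper, the provider names, and every number in the catalogue are made up, and none of it is part of the unified API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider profile; every value used below is invented."""
    name: str
    price_per_hour: float       # USD per audio hour (illustrative only)
    supports_streaming: bool
    languages: frozenset


def pick_provider(providers, language, need_streaming):
    """Return the cheapest provider meeting the hard requirements, or None."""
    eligible = [
        p for p in providers
        if language in p.languages and (p.supports_streaming or not need_streaming)
    ]
    # min() with default=None avoids a ValueError when nothing is eligible.
    return min(eligible, key=lambda p: p.price_per_hour, default=None)


# Placeholder catalogue: names and prices are assumptions, not real quotes.
CATALOGUE = [
    Provider("provider_a", 0.90, True, frozenset({"en", "de"})),
    Provider("provider_b", 0.60, False, frozenset({"en"})),
    Provider("provider_c", 1.20, True, frozenset({"en", "hi"})),
]

choice = pick_provider(CATALOGUE, language="en", need_streaming=True)
print(choice.name if choice else "no match")
```

Transcript quality is deliberately left out of this sketch, because it can only be judged by running candidate providers on your own recordings.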
2. Speaker Diarization and Assignment
The speaker diarization process yields speaker labels, but we will still need to map them to actual identities. This step can be done using our transcript warehouse and by calling the speakers API.
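Conceptually, the assignment step is a lookup from diarization labels to known identities. The following is a minimal sketch of that mapping, assuming transcript segments are dictionaries with a `speaker` key; the segment shape, the `SPEAKER_n` labels, and the `assign_speakers` helper are assumptions for illustration and do not represent the speakers API.

```python
def assign_speakers(segments, identities):
    """Return new segments with diarization labels replaced by known names.

    Labels without a known identity are kept unchanged so they can be
    resolved later. The input list is not modified.
    """
    return [
        {**seg, "speaker": identities.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


# Hypothetical diarized output: labels and text are made up.
segments = [
    {"speaker": "SPEAKER_0", "start": 0.0, "end": 4.2, "text": "Shall we begin?"},
    {"speaker": "SPEAKER_1", "start": 4.2, "end": 9.0, "text": "Yes, the agenda is short."},
    {"speaker": "SPEAKER_2", "start": 9.0, "end": 11.5, "text": "Sounds good."},
]
identities = {"SPEAKER_0": "Alice", "SPEAKER_1": "Bob"}

labelled = assign_speakers(segments, identities)
for seg in labelled:
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Leaving unknown labels in place, rather than dropping those segments, keeps the transcript complete while the remaining speakers are identified.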
3. NLP and NLU processes
You can pair base transcripts with NLP and NLU processes. For our automated notetaker, we'll use the subset of tasks that makes sense for our use case. These include:
- Sentiment
- Action Items / Follow-ups
- Summarization (so you don't have to read the entire transcript)
- Topic Modelling
- Q&A
Through our unified API, a single call can request ASR from AssemblyAI and also specify an NLP task, such as sentiment analysis, to be run on the generated transcript.
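As a rough illustration of the shape such a request could take, the sketch below assembles a JSON body that names an ASR provider and a list of NLP tasks. Every field name (`media_url`, `asr`, `provider`, `nlp`, `task`), the `assemblyai` identifier, and the `build_transcription_request` helper are assumptions made for this example; they are not the actual schema of the unified API, so check the API documentation before relying on any of them.

```python
import json


def build_transcription_request(media_url, provider, nlp_tasks):
    """Assemble a HYPOTHETICAL request body; field names are not the real schema."""
    return {
        "media_url": media_url,
        "asr": {"provider": provider},
        "nlp": [{"task": task} for task in nlp_tasks],
    }


body = build_transcription_request(
    media_url="https://example.com/meeting.mp4",  # placeholder URL
    provider="assemblyai",                         # assumed provider identifier
    nlp_tasks=["sentiment"],                       # assumed task name
)

# No network call is made here; we only show the serialized body.
print(json.dumps(body, indent=2))
```

Keeping the provider and the NLP tasks as plain data in one request is what makes switching providers a one-line change in a design like this.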
4. Transcript Viewer/Editor and other UI Components
The other major aspect of building this app is the UI components. Building a transcript viewer and editor can lead you down a rabbit hole, consume many developer-months, and leave you with hairy code and maintenance nightmares. All this time and effort could otherwise be invested elsewhere.
Another thing to note is that certain features cannot be built incrementally. If you have a viewer or editor that is not collaborative, you may need to rewrite the component in its entirety to add real-time collaboration.
To save you from a lot of pain, we provide a collaborative transcript editor and viewer with our SDK.
Easy to integrate
You can easily embed the transcript editor using a popular framework such as React, or using vanilla JavaScript.
yarn add @exemplaryai/conversational-doc
or
npm install @exemplaryai/conversational-doc
5. Intelligent Warehouse for Transcripts and Assets
Generating transcripts and running any NLP/NLU processes is merely one step in this process. To build our AI app, we also need an intelligent warehouse for storing all this data along with the underlying assets, such as video or audio recordings.
The intelligent warehouse supports:
- Speaker API
- Speaker / Participant Analysis (Speaker Stats)
- Semantic Search
- Continuous Learning
- Retranscribing
- Clipping
- Asset Storage
- Transcript History / Versioning
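To give a feel for what speaker stats involve, here is a minimal sketch that computes per-speaker talk time and share of the conversation from timestamped segments. The segment shape (`speaker`, `start`, `end` in seconds) and the `speaker_stats` helper are assumptions for illustration, not the warehouse's actual data model or API.

```python
from collections import defaultdict


def speaker_stats(segments):
    """Return talk time in seconds and share of total talk time per speaker."""
    talk_time = defaultdict(float)
    for seg in segments:
        talk_time[seg["speaker"]] += seg["end"] - seg["start"]
    total = sum(talk_time.values())
    return {
        speaker: {
            "seconds": seconds,
            # Guard against division by zero for zero-length segments.
            "share": seconds / total if total else 0.0,
        }
        for speaker, seconds in talk_time.items()
    }


# Hypothetical segments with speakers already assigned.
segments = [
    {"speaker": "Alice", "start": 0.0, "end": 30.0},
    {"speaker": "Bob", "start": 30.0, "end": 40.0},
    {"speaker": "Alice", "start": 40.0, "end": 60.0},
]

stats = speaker_stats(segments)
for speaker, s in stats.items():
    print(f'{speaker}: {s["seconds"]:.0f}s ({s["share"]:.0%})')
```

This sketch counts overlapping speech once per speaker, so shares are relative to total talk time rather than to the length of the recording.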
In the next post in this series, we'll guide you through using our SDK to build the automated notetaker. Stay tuned!