Scribe: Building an AI-powered meeting notetaker (Part 1)

While there's no shortage of services providing ASR/speech-to-text, it still takes a herculean effort to go from raw transcripts to a production-ready AI app such as an automated notetaker. In Part 1 of this blog series, we look at the challenges involved and the pieces we need in place to build a robust AI app.

Below is a demo of the app that we're going to build as part of this blog series:

We'll now discuss the challenges and high-level steps for building this app:

Choosing an ASR provider
Speaker Diarization and Assignment
NLP and NLU processes
Transcript Viewer/Editor and other UI Components
Intelligent Warehouse for Transcripts and Assets

1. Choosing an ASR provider

The first step is determining which ASR provider to go with. Here's a list of providers we maintain. You can choose a provider based on your needs, such as:

Transcript Quality (what works best with your data)
Pricing
Speed
Live/Streaming or Offline/Batch support
Language and Accent Support
Reliability
Privacy and Data Locality
Feature Support (such as filler words)

With our unified API, you can choose a provider without falling into analysis paralysis: you can always switch providers later, or even mix and match them, without vendor lock-in. We'll be building our notetaker using this unified API, as it simplifies this step.
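As a rough illustration of narrowing down candidates with the criteria above, the sketch below filters on hard requirements (language, streaming support, budget) and then sorts by price. The provider names, attributes, and prices are made-up placeholders, not real vendor data, and `shortlist` is a hypothetical helper rather than part of any API.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    # All values used below are hypothetical placeholders, not real vendor data.
    name: str
    price_per_hour: float  # USD per hour of audio
    streaming: bool        # supports live/streaming transcription
    languages: set = field(default_factory=set)


def shortlist(providers, language, need_streaming, max_price):
    """Keep providers that meet the hard requirements, cheapest first."""
    matches = [
        p for p in providers
        if language in p.languages
        and (p.streaming or not need_streaming)
        and p.price_per_hour <= max_price
    ]
    return sorted(matches, key=lambda p: p.price_per_hour)


candidates = [
    Provider("provider_a", 0.90, True, {"en-US", "es-ES"}),
    Provider("provider_b", 0.60, False, {"en-US"}),
    Provider("provider_c", 1.50, True, {"en-US", "hi-IN"}),
]

for p in shortlist(candidates, "en-US", need_streaming=True, max_price=1.00):
    print(p.name)
```

Softer criteria such as transcript quality on your own data are harder to reduce to a filter and usually need a side-by-side test on sample recordings.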

2. Speaker Diarization and Assignment

The speaker diarization process yields speaker labels, but we will still need to map them to actual identities. This step can be done using our transcript warehouse and by calling the speakers API.
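To make the gap between labels and identities concrete, here is a minimal sketch of the mapping step. The segment shape, the `assign_speakers` helper, and the names are assumptions for illustration only; they are not the response format of the speakers API.

```python
def assign_speakers(segments, identities):
    """Replace diarization labels with known names; unknown labels stay as-is."""
    return [
        {**seg, "speaker": identities.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


# Hypothetical diarized output: labels say "who is different", not "who".
segments = [
    {"speaker": "A", "text": "Shall we start with the roadmap?"},
    {"speaker": "B", "text": "Sure, I have the slides ready."},
    {"speaker": "C", "text": "Sorry I'm late."},
]

# Hypothetical mapping, e.g. confirmed by a user or carried over from earlier meetings.
identities = {"A": "Priya", "B": "Marco"}

for seg in assign_speakers(segments, identities):
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Leaving unmatched labels untouched, as speaker "C" is here, keeps the transcript usable while the identity is still unknown.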

3. NLP and NLU processes

You can pair base transcripts with NLP and NLU processes. For our automated notetaker, we'll use a subset of tasks that make sense for our use case. These include:

Sentiment
Action Items / Follow ups
Summarization (so you don't have to read the entire transcript)
Topic Modelling
Q&A

The following snippet shows a call made through our unified API. It requests ASR using AssemblyAI and also specifies sentiment analysis as an NLP task, run on the generated transcript using AWS Comprehend.

curl --request POST \
  --url http://exemplary.ai/v1/transcript \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer __api_key__' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "https://assets/file.mp4",
  "provider": "assemblyai",
  "sentiment": {
    "provider": "aws.comprehend"
  },
  "speaker_labels": true,
  "language": "en-US"
}
'
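For readers who prefer Python, the same request could be assembled roughly as follows. This sketch only builds the request and prints it without sending anything. The endpoint, headers, and body fields are copied from the curl example, while the `build_transcript_request` helper and its parameter names are assumptions for illustration.

```python
import json
import urllib.request

API_URL = "http://exemplary.ai/v1/transcript"  # endpoint from the curl example


def build_transcript_request(api_key, media_url, asr_provider, sentiment_provider):
    """Build (but do not send) the POST request for a transcript job."""
    body = {
        "url": media_url,
        "provider": asr_provider,
        "sentiment": {"provider": sentiment_provider},
        "speaker_labels": True,
        "language": "en-US",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_transcript_request(
    "__api_key__", "https://assets/file.mp4", "assemblyai", "aws.comprehend"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually submit the job: urllib.request.urlopen(req)
```

Because the ASR and sentiment providers are plain string arguments, switching or mixing providers only changes the values passed in, not the shape of the request.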

4. Transcript Viewer/Editor and other UI Components

The other major aspect of building this app is the UI components. Building a transcript viewer and editor can lead you down a rabbit hole, consume a lot of developer months, and lead to hairy code and maintenance nightmares. All this time and effort could otherwise be invested elsewhere.

Another thing to note is that certain features cannot be built incrementally. If you have a viewer or editor that is not collaborative, you may need to rewrite the component in its entirety to add real-time collaboration.

To save you from a lot of pain, we provide a collaborative transcript editor and viewer with our SDK.

Easy to integrate

You can easily embed the transcript editor using popular frameworks such as React, or using vanilla JavaScript.

yarn add @exemplaryai/conversational-doc

npm install @exemplaryai/conversational-doc

5. Intelligent Warehouse for Transcripts and Assets

Generating transcripts and running any NLP/NLU processes is merely one step in this process. To build our AI app, we also need an intelligent warehouse for storing all this data along with the underlying assets, such as video or audio recordings.

The intelligent warehouse supports:

Speaker API
Speaker / Participant Analysis (Speaker Stats)
Semantic Search
Continuous Learning
Retranscribing
Clipping
Asset Storage
Transcript History / Versioning
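As one example of what speaker stats could involve, the sketch below computes each speaker's share of total talk time from timed segments. The segment shape, names, timings, and the `talk_time_share` helper are assumptions for illustration and do not describe how the warehouse computes its statistics.

```python
from collections import defaultdict


def talk_time_share(segments):
    """Return each speaker's share of total speaking time (0.0 to 1.0)."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {speaker: t / overall for speaker, t in totals.items()}


# Hypothetical segments with start/end times in seconds.
segments = [
    {"speaker": "Priya", "start": 0.0, "end": 30.0},
    {"speaker": "Marco", "start": 30.0, "end": 40.0},
    {"speaker": "Priya", "start": 40.0, "end": 60.0},
    {"speaker": "Marco", "start": 60.0, "end": 80.0},
]

for speaker, share in talk_time_share(segments).items():
    print(f"{speaker}: {share:.1%}")
```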

In the next post in this series, we'll guide you through using our SDK to build the automated notetaker. Stay tuned!

AI, NOTE-TAKER, SAMPLE-APP
Johann Verghese
September 19, 2022 · 3 min read
