Assembly AI

Try and compare Assembly AI with other providers

  • About - Transcribe and understand audio with a single AI-powered API. Automatically convert audio and video files and live audio streams to text with AssemblyAI's Speech-to-Text APIs. Do more with Audio Intelligence: summarization, content moderation, topic detection, and more. Powered by cutting-edge AI models.
  • Languages - Out of the box, AssemblyAI supports English, Spanish, French, German, Italian, Portuguese, Dutch, Hindi, and Japanese. Additional languages are available on request. See here for additional language support and their language reference.
  • Features - Some of the core features supported by the provider are as follows. To see a comprehensive list of features, see here.
    • Custom Vocabulary - AssemblyAI includes support for Custom Vocabularies. You can include words, phrases, or both for boosting. Any term included will have its likelihood of being transcribed boosted, with the weight specified. See here.
    • Punctuation - This feature gives you a well-punctuated transcription with proper casing. AssemblyAI's Punctuation Restoration model is a multi-class classifier under the hood. For each word, it predicts a class that denotes one of two actions: do nothing (i.e. leave the word as-is), or add punctuation and/or casing to the word. For example, the model might predict to uppercase a word and add a comma to the end of it. Or the model might predict that the word should stay lowercase, but that a period should be added to the end of it. Our current best model uses a transformer-based architecture, which produces better results than RNN-based models. This model was trained on over 1 billion tokens and yields an accuracy of over 92% for punctuation and casing restoration.
    • Custom Spelling - This feature gives you the ability to specify how words are spelled or formatted in the transcript text. For example, Custom Spelling could be used to change the spelling of all instances of the word "Ariana" to "Arianna". It could also be used to change the formatting of "CS 50" to "CS50".
    • Disfluencies (Filler Words) - By default, the API will remove Filler Words, like "um" and "uh", from transcripts. Supported Filler Words:
      • "um"
      • "uh"
      • "hmm"
      • "mhm"
      • "uh huh"
    • Profanity Filtering - Lets you replace curse words with asterisks ("*") in the transcript. By default, the API returns a verbatim transcription of the audio, meaning profanity will be present in the transcript if it was spoken in the audio. To replace profanity with asterisks, include the additional parameter filter_profanity in your request when submitting files for transcription, and set it to true. E.g.: "It was some tough s*** that they had to go through. But they did it. I mean, it blows my f****** mind every time I hear the story." The JSON for your completed transcript will come back as usual, but the text will contain asterisks wherever profanity was spoken. Parameter: filter_profanity (boolean) - filter profanity from the transcribed text; can be true or false.
    • Audio Intelligence - Build powerful applications with features like Summarization, Entity Detection, Content Moderation, Sentiment Analysis, PII Redaction, and more.
      • PII Redaction - Control Which Types of PII to Redact
      • Detect Important Phrases and Words
      • Content Moderation
      • Topic Detection (IAB Classification)
      • Sentiment Analysis
      • Auto Chapters (Summarization)
      • Entity Detection
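
The request-level options described under Features can be sketched as a small helper that assembles the JSON body of a transcription request. Only filter_profanity is named on this page; the endpoint URL and the field names audio_url, word_boost, and custom_spelling are assumptions based on AssemblyAI's public API and should be checked against the current API reference. The helper name build_transcript_request is hypothetical, and nothing is sent over the network.

```python
import json
from typing import Any, Dict, List, Optional

# Assumed endpoint; it is not stated on this page.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(
    audio_url: str,
    filter_profanity: bool = False,
    boosted_terms: Optional[List[str]] = None,
    spelling_fixes: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for a transcription request (no network call)."""
    body: Dict[str, Any] = {
        "audio_url": audio_url,  # assumed field name
        # Named on this page: boolean, true or false.
        "filter_profanity": filter_profanity,
    }
    if boosted_terms:
        # Custom Vocabulary: words or phrases to boost (assumed field name).
        body["word_boost"] = list(boosted_terms)
    if spelling_fixes:
        # Custom Spelling: map each target spelling to the forms it replaces
        # (assumed field name and shape).
        body["custom_spelling"] = [
            {"from": list(sources), "to": target}
            for target, sources in spelling_fixes.items()
        ]
    return body


example_body = build_transcript_request(
    "https://example.com/audio.mp3",  # placeholder URL
    filter_profanity=True,
    boosted_terms=["AssemblyAI"],
    spelling_fixes={"Arianna": ["Ariana"], "CS50": ["CS 50"]},
)
print(json.dumps(example_body, indent=2))
```

Keeping the body construction separate from the HTTP call makes it easy to inspect exactly which options are switched on before any audio is submitted.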
  • Pricing -
    • Core Transcription - $0.00025 per second. Automatically convert audio and video files, and live audio streams, into accurate transcriptions with a simple API. Powered by cutting-edge research into Automatic Speech Recognition.
      • Async transcription
      • Real-time transcription
      • Process multiple files & streams in parallel
      • State-of-the-art accuracy
      • Billed per second transcribed
    • Audio Intelligence - $0.000583 per second, in addition to Core Transcription pricing. Leverage cutting-edge AI research to build powerful applications and features with higher ROI. Extract summaries and topics, detect sensitive content and sentiment, and more with Audio Intelligence.
      • Summarization
      • Sentiment Analysis
      • Entity Detection
      • PII Redaction
      • Content Moderation
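
As a rough illustration of the per-second rates above, the following sketch estimates the cost of transcribing a file. It assumes the Audio Intelligence rate is a single flat surcharge on top of Core Transcription, whichever Audio Intelligence features are enabled; the page does not spell this out. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Rates taken from the pricing listed above (USD per second of audio).
CORE_RATE = Decimal("0.00025")
AUDIO_INTELLIGENCE_RATE = Decimal("0.000583")


def estimate_cost(seconds: int, audio_intelligence: bool = False) -> Decimal:
    """Estimate the cost in USD of transcribing `seconds` of audio."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    rate = CORE_RATE
    if audio_intelligence:
        # Assumption: one flat surcharge added to the core rate.
        rate += AUDIO_INTELLIGENCE_RATE
    return rate * seconds


one_hour = 60 * 60
print("Core only:", estimate_cost(one_hour))
print("With Audio Intelligence:", estimate_cost(one_hour, audio_intelligence=True))
```

Under these assumptions, one hour of audio comes to $0.90 for Core Transcription alone and $2.9988 with Audio Intelligence added. Decimal is used instead of float so that the small per-second rates do not accumulate rounding error.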