AWS Transcribe

Try and compare AWS with other providers

  • About - Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or to add speech-to-text capabilities to any application. With Amazon Transcribe, you can improve accuracy for your specific use case with language customization, filter content to ensure customer privacy or audience-appropriate language, analyze content in multi-channel audio, differentiate the speech of individual speakers, and more.
  • Models - Custom language models are designed to improve transcription accuracy for domain-specific speech. This includes any content outside of what you would hear in normal, everyday conversations. For example, if you're transcribing the proceedings from a scientific conference, a standard transcription is unlikely to recognize many of the scientific terms used by presenters. In this case, you can train a custom language model to recognize the specialized terms used in your discipline. Unlike custom vocabularies, which increase recognition of a word by providing hints (such as pronunciations), custom language models learn the context associated with a given word. This includes how and when a word is used, and the relationship a word has to other words. For example, if you train your model using climate science research papers, your model may learn that 'ice floe' is a more likely word pair than 'ice flow'.
  • Languages - Amazon Transcribe supports English, Arabic, Spanish, French, German, Italian, Portuguese, Dutch, Hindi, Japanese, Korean, and other languages. See the Amazon Transcribe language reference for the full list of supported languages and their language codes.
  • Features - Amazon Transcribe is an automatic speech recognition service that makes it easy to add speech-to-text capabilities to any application. Its features let you ingest audio input, produce transcripts that are easy to read and review, improve accuracy with customization, and filter content to ensure customer privacy.
    • Audio inputs - Transcribe is designed to process live and recorded audio or video input to provide high-quality transcriptions for search and analysis. Separate APIs are also available for customer calls (Amazon Transcribe Call Analytics) and medical conversations (Amazon Transcribe Medical).
      • Streaming & batch transcription - You can process your existing audio recordings or stream the audio for real-time transcription. Using a secure connection, you can send a live audio stream to the service and receive a stream of text in response.
      • Domain-specific models - Select a model that is tuned to telephone calls or multimedia video content. For example, Transcribe adapts to the low-fidelity phone audio common in contact centers.
      • Automatic language identification - Amazon Transcribe can automatically identify the dominant language in an audio file and generate a transcription. This is useful when your media library contains audio files in different languages. You can also use this feature to classify media content and to verify that the main spoken language in your videos and podcasts is correctly labeled.
    • Easy to read transcripts - Amazon Transcribe enables you to produce accurate transcripts that are easy to read, review, and integrate into your applications. The output is intended to be ready for downstream activities such as call transcript analysis, subtitling, and content search.
      • Punctuation & number normalization - Amazon Transcribe automatically adds punctuation and number formatting, so that the output closely matches the quality of manual transcription at a fraction of the time and expense. Numbers are transcribed as digits ("normal form") instead of words.
      • Timestamp generation - Amazon Transcribe returns a timestamp for each word, so that you can easily find a word or phrase in the original recording or add subtitles to video.
      • Recognize multiple speakers - Speaker changes are automatically recognized and attributed in the text, so that scenarios like telephone calls, meetings, and television shows are captured accurately.
      • Channel identification - Contact centers can submit a single audio file to Amazon Transcribe, and the service will automatically produce a single transcript annotated with channel labels.
    • Customize your output - Accuracy is critical, and Transcribe provides many options to customize transcripts to your specific business needs and vernacular. Transcribe also provides up to 10 alternative transcriptions for each sentence, so you can quickly choose the option that best fits your content and domain. This is useful for human-in-the-loop subtitling workflows.
      • Custom vocabulary - With custom vocabulary, you can add new words to the base vocabulary to generate more accurate transcriptions for domain-specific words and phrases like product names, technical terminology, or names of individuals.
      • Custom language models - When needed, you can build and train your own custom language model (CLM) for your use case and domain by submitting a corpus of text data to Amazon Transcribe. A CLM improves speech recognition accuracy by learning from your own data.
    • User safety & privacy features - Ensuring customer privacy and safety is critical. When needed, Transcribe enables you to mask or remove words that are sensitive or unsuitable for your audience from transcription results.
      • Vocabulary filtering - You can specify a list of words to remove from transcripts with vocabulary filtering. For example, you can specify a list of profane or offensive words, and Amazon Transcribe removes them from transcripts automatically.
      • Automatic content redaction / PII redaction - When instructed, Amazon Transcribe can identify and redact sensitive personally identifiable information (PII) from transcripts in supported languages. This allows contact centers to easily review and share the transcripts for customer experience insight and agent training.
      • Data protection - Secure data at rest using Amazon S3-managed keys (SSE-S3) or specify your own AWS Key Management Service key. To encrypt data in transit, Amazon Transcribe uses TLS (Transport Layer Security) 1.2 with AWS certificates. TLS is a cryptographic protocol that enables authenticated connections and secure data transport over the internet. This includes streaming transcription.
    • Amazon Transcribe Call Analytics - Extract conversation insights such as call sentiment and speech loudness to improve agent productivity and customer experience with Amazon Transcribe Call Analytics.
      • Improve contact center productivity with call summarization - Generate call summaries to help agents focus on providing excellent customer experiences and increase productivity post-call by automatically capturing key parts of the customer conversation (e.g. issue, outcomes, or action items). Managers can quickly review these summaries without reviewing the entire transcript to understand the context of an interaction and investigate any customer issues.
      • Extract detailed call analytics & conversation insights - Using machine learning, you can quickly apply speech-to-text and natural language processing capabilities to uncover valuable conversation insights. You can then integrate insights such as customer and agent sentiment, detected issues, and speech characteristics like non-talk time, interruptions, and talk speed into your inbound and outbound call analytics applications. This can help your supervisors more readily identify potential customer issues, agent coaching opportunities, and call trends.
      • Improve compliance & monitoring with automated call categorization - Monitor your calls at scale to track compliance with company policies or regulatory requirements. Build and train your own custom categories based on your specified criteria (e.g. words/phrases or conversation characteristics). For example, you can set up category labels to see what percentage of calls are upsells or account cancellations.
      • Produce rich call transcripts - Give your agents access to the conversation details from past interactions. The turn-by-turn transcripts provide insights such as customer sentiment, detected issues, and interruptions.
      • Protect sensitive customer data - Conversations often contain sensitive customer data such as names, addresses, credit card numbers, and social security numbers. Transcribe Call Analytics helps customers identify and redact this information from both the audio and text.
      • Contact center integrations -
        • Genesys Cloud CX - Genesys Cloud CX is a cloud contact center solution that unifies customer and agent experiences across multiple channels such as phone, text, and chat. You can stream your call audio to Amazon Transcribe from the Genesys Cloud environment to improve agent productivity and extract customer interaction insights. Refer to the Genesys Cloud AudioHook integration for more information. You can also analyze your Genesys Cloud calls with the AWS Live Call Analytics solution.
        • Amazon Chime SDK - The Amazon Chime SDK is a set of real-time communications components that developers can use to quickly add audio calling, video calling, and screen sharing capabilities to their own web, mobile, or telephony applications.
        • Amazon Chime Voice Connector - Amazon Chime Voice Connector lets you integrate with SIP-based contact centers to generate live, user-attributed transcripts with Amazon Transcribe. Refer to the Amazon Chime Voice Connector documentation for more information.
    • Amazon Transcribe Medical - Easily transcribe your medical conversations with Transcribe Medical, a HIPAA-eligible automatic speech recognition (ASR) service.
      • Dictation mode - Accurately transcribe single-speaker audio commonly found in medical dictation use cases.
      • Conversational mode - Accurately transcribe multi-speaker conversational audio between clinicians and/or patients.
      • Medical specialties - Transcribe speech to text across a diverse range of medical specialties.
      • Batch API - Transcribe recorded medical audio files at scale with high concurrency.
      • Streaming API - Transcribe audio streams in near real time via either the WebSocket Secure or HTTP/2 protocol.
      • Custom vocabulary - Boost transcription accuracy by using custom vocabulary for potentially out-of-lexicon terminology.
      • Channel identification - Concurrently transcribe multi-channel audio at no extra charge and get one final coherent transcript.
      • Speaker diarization - Separate speech from different speakers within any mono-channel audio.
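Several of the features above (language identification, speaker recognition, alternative transcriptions, vocabulary filtering, and PII redaction) are switched on through request parameters of a batch transcription job. The sketch below is one possible way to assemble such a request in Python. It only builds the dictionary and does not call AWS. The helper name `build_job_request`, the job name, the S3 URI, and the filter name are hypothetical; the parameter keys are those of the `StartTranscriptionJob` API as exposed by boto3, and which combinations a given language or region accepts should be checked against the AWS documentation.

```python
def build_job_request(job_name, media_uri, language_code=None,
                      max_speakers=None, max_alternatives=None,
                      vocabulary_filter=None, redact_pii=False):
    """Assemble keyword arguments for a batch transcription job (illustrative helper)."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }
    # Either name the language explicitly or ask the service to identify it.
    if language_code:
        request["LanguageCode"] = language_code
    else:
        request["IdentifyLanguage"] = True

    settings = {}
    if max_speakers:
        # Speaker labels must be enabled together with a speaker count.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if max_alternatives:
        # Alternative transcriptions (the text above mentions up to 10).
        settings["ShowAlternatives"] = True
        settings["MaxAlternatives"] = max_alternatives
    if vocabulary_filter:
        # "mask" replaces filtered words; "remove" deletes them.
        settings["VocabularyFilterName"] = vocabulary_filter
        settings["VocabularyFilterMethod"] = "mask"
    if settings:
        request["Settings"] = settings

    if redact_pii:
        request["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",
        }
    return request


# Hypothetical values, for illustration only.
example = build_job_request(
    "support-call-001",
    "s3://example-bucket/calls/call-001.wav",
    language_code="en-US",
    max_speakers=2,
    max_alternatives=3,
    vocabulary_filter="example-profanity-filter",
    redact_pii=True,
)
print(example)
```

With credentials configured, a dictionary like this could be passed to `boto3.client("transcribe").start_transcription_job(**example)`. Keeping the request construction separate from the network call makes the configuration easy to inspect and test offline.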
  • Use cases -
    • Get insights from customer conversations - With Transcribe Call Analytics, quickly extract actionable insights from customer conversations. AWS Contact Center Intelligence partners and Contact Lens for Amazon Connect offer turnkey solutions to improve customer engagement, increase agent productivity, and surface quality management alerts to supervisors.
    • Search and analyze media content - Content producers and media distributors can use Amazon Transcribe to automatically convert audio and video assets into fully searchable archives for content discovery, highlight generation, content moderation, and monetization.
    • Create subtitles and meeting notes - Subtitle your on-demand and broadcast content to increase accessibility and improve customer experience. Use Amazon Transcribe to boost productivity and accurately capture the meetings and conversations that matter to you.
  • Pricing - With Amazon Transcribe, you pay as you go based on the seconds of audio transcribed per month. Upon signup, the Amazon Transcribe Free Tier lets you analyze up to 60 audio minutes monthly, free for the first 12 months. The AWS Pricing Calculator lets you calculate your Amazon Transcribe and architecture cost in a single estimate.
    • Free tier - 60 minutes per month for 12 months. As part of the AWS Free Tier, you can get started with Amazon Transcribe for free. The Amazon Transcribe Free Tier is available to you for 12 months, starting from the date on which you create your first transcription request. Your free tier usage is calculated each month across all AWS Regions except the AWS GovCloud Region and is automatically applied to your bill; unused monthly usage does not roll over. When your free usage expires, or if your application's use exceeds the free usage tier, you pay standard, pay-as-you-go service rates.
    • Standard pricing - The Amazon Transcribe API for both streaming and batch transcription is billed monthly based on tiered pricing. Tiered pricing rates and discounts vary by region; the applicable rates for each region are listed on the Amazon Transcribe pricing page. Usage is billed in one-second increments, with a minimum per-request charge of 15 seconds. This pricing includes features such as custom vocabularies and vocabulary filtering. Additional charges apply for features such as automatic content redaction and custom language models.
    • Automatic Content Redaction add-on pricing - You can choose to redact sensitive information such as personally identifiable information (PII) with Automatic Content Redaction. The additional charges are billed monthly based on tiered pricing, with rates and discounts that vary by region. Usage is billed in one-second increments, with a minimum per-request charge of 15 seconds. The free tier does not apply to automatic content redaction.
    • Custom Language Model add-on pricing - You can build your own Custom Language Models (CLM) by training Amazon Transcribe's standard models with your domain-specific text. Once you have a CLM, you can choose which transcription jobs should use it. You incur the additional CLM charge only for the transcription jobs where a custom language model is applied. Audio transcribed using your Custom Language Models is billed monthly based on tiered pricing. Usage is billed in one-second increments, with a minimum per-request charge of 15 seconds. The free tier does not apply to custom language models.
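The billing rules described above (one-second increments, a 15-second minimum per request, and 60 free minutes per month) can be sketched as a small calculation. This is a rough estimate under stated assumptions, not AWS's billing logic: it assumes partial seconds round up, it ignores the volume tiers, and the per-minute rate is a placeholder you would replace with the rate published for your region. The function names are hypothetical.

```python
import math

MIN_BILLABLE_SECONDS = 15      # minimum per-request charge, per the text above
FREE_TIER_SECONDS = 60 * 60    # 60 free minutes per month (first 12 months)


def billable_seconds(duration_seconds):
    """Seconds charged for one request, assuming partial seconds round up."""
    return max(MIN_BILLABLE_SECONDS, math.ceil(duration_seconds))


def estimate_monthly_cost(durations, rate_per_minute, free_tier=True):
    """Estimate one month's cost for a list of request durations in seconds.

    rate_per_minute is a placeholder flat rate; real pricing is tiered
    and varies by region.
    """
    total = sum(billable_seconds(d) for d in durations)
    if free_tier:
        total = max(0, total - FREE_TIER_SECONDS)
    return total / 60 * rate_per_minute


# Hypothetical month: a 4.2 s clip, a 61.3 s clip, and a 90-minute recording.
durations = [4.2, 61.3, 90 * 60]
placeholder_rate = 0.5  # NOT a real AWS rate; illustration only
print(billable_seconds(4.2), billable_seconds(61.3))
print(estimate_monthly_cost(durations, placeholder_rate))
```

Because the free tier does not apply to automatic content redaction or custom language models, an estimate for those add-ons would call `estimate_monthly_cost` with `free_tier=False` and the add-on rate.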