Google Cloud Speech-to-Text

About - A Google Cloud speech service that accurately transcribes spoken audio into text.
Models - Cloud Speech-to-Text offers multiple recognition models, each tuned to a different type of audio.
Languages - Google Cloud Speech-to-Text supports English, Arabic, Chinese (Mandarin), Spanish, French, German, Italian, Portuguese, Dutch, Hindi, Japanese, Korean, and many other languages, selected per request. The full list is available on Google Cloud's Speech-to-Text language support page.
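As a minimal sketch, assuming the public v1 REST method `speech:recognize`, the model and language are chosen per request through the `model` and `languageCode` fields of the recognition config. The helper `build_recognize_request` is hypothetical and only assembles the JSON body; authentication and the HTTP call are left out.

```python
import base64
from typing import Optional


def build_recognize_request(language_code: str, model: str = "default",
                            gcs_uri: Optional[str] = None,
                            audio_bytes: Optional[bytes] = None,
                            encoding: str = "LINEAR16",
                            sample_rate_hz: int = 16000) -> dict:
    """Assemble a JSON body for the v1 speech:recognize REST method (sketch)."""
    if (gcs_uri is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of gcs_uri or audio_bytes")
    config = {
        "encoding": encoding,
        "sampleRateHertz": sample_rate_hz,
        "languageCode": language_code,  # BCP-47 tag, e.g. "en-US" or "fr-FR"
        "model": model,                 # e.g. "default", "phone_call", "video"
    }
    if gcs_uri is not None:
        audio = {"uri": gcs_uri}  # audio already stored in Cloud Storage
    else:
        # Inline audio travels base64-encoded inside the JSON body.
        audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}
    return {"config": config, "audio": audio}
```

For example, `build_recognize_request("fr-FR", model="video", gcs_uri="gs://your-bucket/clip.wav")` (a placeholder bucket path) yields a body that could be serialized with `json.dumps` and POSTed to the endpoint. Check the current API reference for which models are available in each language.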
Use cases -
- Improve customer service - Empower your customer service system by adding IVR (interactive voice response) and agent conversation transcription to your call centers, then run analytics on the conversation data to gain more insight into the calls and your customers. Speech-to-Text and its enhanced phone call models already power Contact Center AI, Google Cloud's contact center solution. Workflow: contact center audio data is stored in Cloud Storage, then (1) transcribed with the Speech-to-Text API, (2) analyzed with the Natural Language API, (3) redacted of PII with Cloud Data Loss Prevention, and (4) stored in BigQuery, where it is (5) queried and visualized as call data, with data flowing in both directions between BigQuery and the visualization step.
- Enable voice control - Implement voice commands such as “turn the volume up” and voice search such as “what is the temperature in Paris?” Combine this with the Text-to-Speech API to deliver voice-enabled experiences in IoT (Internet of Things) applications. Workflow: (0) the user device gets a unique secure identity through Cloud IoT Core; (1) the user speaks a voice command to the device, which exchanges data with Cloud Functions; (2) Cloud Functions sends the audio to the Speech-to-Text API for transcription; (3) AutoML Natural Language extracts the intent and entities, and the result returns through Cloud Functions to the user device.
- Transcribe multimedia content - Transcribe your audio and video to add captions and improve your audience reach and experience, including real-time subtitles for streaming content. The video transcription model is well suited to indexing or subtitling video and multispeaker content, and uses machine learning technology similar to that behind video captioning on YouTube.
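For the subtitling use case, here is a sketch under stated assumptions. If word-level timestamps are requested (`enableWordTimeOffsets` in the v1 config), each recognized word in the JSON response carries `startTime` and `endTime` strings such as `"1.300s"`. The hypothetical helper `words_to_srt` groups those words into fixed-size SubRip (SRT) caption cues. It assumes speaker diarization is off, because with diarization enabled the final result repeats every word.

```python
def parse_duration(value: str) -> float:
    # Protobuf JSON encodes durations as strings like "1.300s".
    return float(value.rstrip("s"))


def format_srt_time(seconds: float) -> str:
    # SRT timestamps use the form HH:MM:SS,mmm.
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(response: dict, words_per_cue: int = 7) -> str:
    """Build SRT captions from a v1 recognize response with word time offsets."""
    words = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Use only the top alternative of each result.
            words.extend(alternatives[0].get("words", []))
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start = format_srt_time(parse_duration(chunk[0]["startTime"]))
        end = format_srt_time(parse_duration(chunk[-1]["endTime"]))
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

A fixed word count per cue is the simplest grouping. A production captioner would more likely break cues on pauses or a maximum duration.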
Features -
- Global vocabulary - Support your global user base with Speech-to-Text's support for more than 125 languages and variants.
- Streaming speech recognition - Receive real-time speech recognition results as the API processes the audio input streamed from your application’s microphone or sent from a prerecorded audio file (inline or through Cloud Storage).
- Speech adaptation - Customize speech recognition to transcribe domain-specific terms and rare words by providing hints that boost the transcription accuracy of specific words or phrases. Use classes to automatically convert spoken numbers into addresses, years, currencies, and more.
- Speech-to-Text On-Prem - Have full control over your infrastructure and protected speech data while leveraging Google’s speech recognition technology on-premises, right in your own private data centers. Contact sales to get started.
- Multichannel recognition - Speech-to-Text can recognize distinct channels in multichannel situations (e.g., a video conference) and annotate the transcripts to preserve their order.
- Noise robustness - Speech-to-Text can handle noisy audio from many environments without requiring additional noise cancellation.
- Domain-specific models - Choose from a selection of trained models for voice control, phone call transcription, and video transcription, each optimized for domain-specific quality requirements. For example, the enhanced phone call model is tuned for audio originating from telephony, such as phone calls recorded at an 8 kHz sampling rate.
- Content filtering - The profanity filter helps you detect inappropriate or unprofessional content in your audio data and filter profane words out of text results.
- Transcription evaluation - Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on your configuration.
- Automatic punctuation (beta) - Speech-to-Text accurately punctuates transcriptions (e.g., commas, question marks, and periods).
- Speaker diarization (beta) - Know who said what by receiving automatic predictions about which of the speakers in a conversation spoke each utterance.
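Several of the features above map onto fields of the v1 `RecognitionConfig`. The hypothetical `build_feature_config` helper is a sketch that assumes the v1 REST field names (`speechContexts`, `audioChannelCount`, `enableSeparateRecognitionPerChannel`, `diarizationConfig`, `profanityFilter`, `enableAutomaticPunctuation`, `model`, `useEnhanced`). The 8 kHz MULAW settings for telephony audio are an illustrative assumption, so verify field support against the current API reference before relying on it.

```python
from typing import Dict, Optional, Sequence, Tuple


def build_feature_config(language_code: str, *,
                         phrases: Optional[Sequence[str]] = None,
                         boost: Optional[float] = None,
                         channel_count: int = 1,
                         diarization_speakers: Optional[Tuple[int, int]] = None,
                         filter_profanity: bool = False,
                         punctuate: bool = True,
                         telephony: bool = False) -> Dict:
    """Map feature toggles onto v1 RecognitionConfig fields (sketch)."""
    config: Dict = {
        "languageCode": language_code,
        "enableAutomaticPunctuation": punctuate,  # automatic punctuation
        "profanityFilter": filter_profanity,      # content filtering
    }
    if telephony:
        # Domain-specific model: enhanced phone call model for 8 kHz audio.
        config.update({"encoding": "MULAW", "sampleRateHertz": 8000,
                       "model": "phone_call", "useEnhanced": True})
    if phrases:
        # Speech adaptation: phrase hints, optionally boosted.
        context: Dict = {"phrases": list(phrases)}
        if boost is not None:
            context["boost"] = boost
        config["speechContexts"] = [context]
    if channel_count > 1:
        # Multichannel recognition: transcribe each channel separately.
        config["audioChannelCount"] = channel_count
        config["enableSeparateRecognitionPerChannel"] = True
    if diarization_speakers is not None:
        # Speaker diarization with an expected speaker-count range.
        low, high = diarization_speakers
        config["diarizationConfig"] = {"enableSpeakerDiarization": True,
                                       "minSpeakerCount": low,
                                       "maxSpeakerCount": high}
    return config
```

Phrase hints can also contain class tokens such as `$MONEY` or `$YEAR`, which is how the classes mentioned under speech adaptation are referenced. Confirm the exact token names in the class token list.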
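To turn diarization output into "who said what," the following sketch assumes the v1 behavior in which, with diarization enabled, the last result holds every word of the audio together with an integer `speakerTag`. The hypothetical `speaker_turns` helper merges consecutive words from the same speaker into turns.

```python
from typing import List, Tuple


def speaker_turns(response: dict) -> List[Tuple[int, str]]:
    """Group diarized words into (speakerTag, text) turns (sketch)."""
    results = response.get("results", [])
    if not results:
        return []
    # Assumption: the final result carries all words with speaker tags.
    alternatives = results[-1].get("alternatives", [])
    if not alternatives:
        return []
    turns: List[Tuple[int, List[str]]] = []
    for word in alternatives[0].get("words", []):
        tag = word.get("speakerTag", 0)  # 0 used here when no tag is present
        if turns and turns[-1][0] == tag:
            turns[-1][1].append(word["word"])  # same speaker keeps talking
        else:
            turns.append((tag, [word["word"]]))  # speaker change starts a turn
    return [(tag, " ".join(words)) for tag, words in turns]
```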
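For streaming recognition, audio is sent as a sequence of small chunks rather than as one file. The following sketch assumes raw 16-bit PCM (LINEAR16) audio and chunks of about 100 ms, a frame size that Google's best-practices guidance has suggested as a trade-off between latency and efficiency. The helper name `chunk_audio` is hypothetical. With the Python client library, each chunk would typically be wrapped in a `StreamingRecognizeRequest(audio_content=...)`, which is not shown here.

```python
from typing import Iterator


def chunk_audio(pcm: bytes, sample_rate_hz: int = 16000, bytes_per_sample: int = 2,
                channels: int = 1, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM audio, each about chunk_ms long."""
    # Bytes per chunk = samples/s * bytes/sample * channels * seconds.
    chunk_size = sample_rate_hz * bytes_per_sample * channels * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]  # last chunk may be shorter
```

With the defaults, each chunk is 16000 × 2 × 1 × 0.1 = 3,200 bytes, so 8,000 bytes of audio split into chunks of 3,200, 3,200, and 1,600 bytes.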
Pricing - New customers get $300 in free credits to spend on Speech-to-Text. All customers also get 60 minutes of free audio transcription and analysis per month, which is not charged against those credits. Beyond the 60-minute free tier, Speech-to-Text is priced per 15 seconds of audio processed.
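A rough cost estimate follows from the 15-second billing unit. The following sketch assumes that each request's audio is rounded up to a whole 15-second increment and that the 60 free minutes are subtracted from the monthly total. `price_per_increment` is a caller-supplied parameter rather than a real price, since rates vary by model and change over time, and the $300 new-customer credit is not modeled.

```python
import math
from typing import Iterable

FREE_SECONDS_PER_MONTH = 60 * 60  # 60-minute monthly free tier
INCREMENT_SECONDS = 15            # billing unit: 15 seconds of audio


def billable_increments(request_durations_s: Iterable[float]) -> int:
    # Assumption: each request is rounded up to whole 15-second increments
    # before the monthly free tier is subtracted.
    rounded_total = sum(math.ceil(d / INCREMENT_SECONDS) * INCREMENT_SECONDS
                        for d in request_durations_s)
    billable_seconds = max(0, rounded_total - FREE_SECONDS_PER_MONTH)
    return billable_seconds // INCREMENT_SECONDS


def estimate_cost(request_durations_s: Iterable[float], price_per_increment: float) -> float:
    # price_per_increment is a hypothetical rate, not an official price.
    return billable_increments(request_durations_s) * price_per_increment
```

Under these assumptions, 300 requests of 14 seconds each round up to 300 × 15 = 4,500 seconds. Subtracting the 3,600 free seconds leaves 900 seconds, or 60 billable increments.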