How to transcribe video using Whisper via an API

Access OpenAI's Whisper like an API with AVflow.io's video workflow building tools.

AVflow.io runs OpenAI's Whisper in the cloud so you can use it in your Flow like an API.

What is Whisper?

Whisper is a new open-source transcription engine from OpenAI. It is now available in AVflow, so you can use it as a step in your Flow, much as you would call an API service. Learn more in this help article.

Watch a demo of how to integrate Whisper in your flow.

Or watch how to use Whisper with Mux.

The Whisper step in AVflow can return either a plain text file or JSON output. The JSON output is useful for generating SRT subtitles, for example.
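
To illustrate why the JSON output suits subtitles, here is a minimal Python sketch that turns timed segments into SRT text. It assumes the JSON contains a "segments" list whose items have "start" and "end" times in seconds plus a "text" field, which is the shape the open-source Whisper tool emits; the exact shape of AVflow's output is not shown in this article, so treat the field names and the sample data as assumptions. In AVflow itself, the "Subtitle" step does this conversion for you.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments with 'start', 'end', and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        text = seg["text"].strip()
        # Each SRT cue: sequence number, time range, then the subtitle text.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical transcript data, made up for illustration only.
example = {
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.04, "text": " Today we look at subtitles."},
    ]
}

print(segments_to_srt(example["segments"]))
```

The plain text output carries no timing information, which is why a subtitle file cannot be built from it.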

Get started:

1. Create an account on AVflow.io; it is free. Then create a Flow, our UI-based tool for defining a trigger (that is, what kicks off the Flow) followed by an ordered sequence of steps that transform your audio or video file.

Add the Whisper step to your Flow. The Whisper step accepts only a few audio formats: .flac, .mp3, and .wav. If you are ingesting other formats, such as mp4, use the FFmpeg step in AVflow first to transcode your media file to mp3. (Read more on how to use FFmpeg to transcode to mp3.)

2. For the "Source" field in the Whisper step, choose the audio file source (in our example, an S3 URL). For the "Action" field, choose "Transcribe" if you want JSON output, or "Speech to Text" if you only want a plain .txt file of the transcript.

3. Next, use the "Subtitle" step to convert the JSON output to SRT subtitles, or save the Whisper output back to AWS S3 using the "Transfer to Storage" step. Here's an example of what you'd fill in for the Transfer to Storage step:

Learn more about the Transfer to Storage step.

4. Save, enable, and trigger the Flow. Then check the folder you specified in the "Transfer to Storage" step to see the result file.
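
For readers who want to see what the mp4-to-mp3 transcode mentioned in step 1 looks like outside AVflow, the following Python sketch builds an equivalent FFmpeg command line. It is an illustration under stated assumptions: the file name and the 128k bitrate are hypothetical, and the fields of AVflow's FFmpeg step may not map one-to-one onto these arguments.

```python
import shlex
from pathlib import Path


def build_mp3_command(source: str, bitrate: str = "128k") -> list[str]:
    """Return an ffmpeg argument list that extracts the audio track as mp3."""
    # The output keeps the input's name but swaps the extension for .mp3.
    target = str(Path(source).with_suffix(".mp3"))
    return [
        "ffmpeg",
        "-i", source,              # input media file
        "-vn",                     # drop the video stream
        "-codec:a", "libmp3lame",  # encode audio with the LAME mp3 encoder
        "-b:a", bitrate,           # audio bitrate (assumed value)
        target,
    ]


# "interview.mp4" is a hypothetical file name used only for illustration.
command = build_mp3_command("interview.mp4")
print(shlex.join(command))
```

The sketch only prints the command. To run it locally you would pass the list to subprocess.run, which requires FFmpeg to be installed.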

Supported input formats: .flac, .mp3, .wav
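
If you are deciding programmatically whether a file needs a transcoding step before Whisper, a simple extension check against the formats listed above may be enough. This is a rough Python sketch, not part of AVflow: the function name and the example URLs are hypothetical, and it looks only at the file extension rather than inspecting the media itself.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Input formats the Whisper step accepts, per the list above.
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".wav"}


def needs_transcoding(source_url: str) -> bool:
    """Return True if the file extension is not a supported input format."""
    # Use only the path part so query strings do not confuse the check.
    path = urlparse(source_url).path
    extension = PurePosixPath(path).suffix.lower()
    return extension not in SUPPORTED_EXTENSIONS


# Hypothetical source locations, for illustration only.
print(needs_transcoding("s3://my-bucket/media/talk.MP3"))
print(needs_transcoding("https://example.com/video/clip.mp4?version=2"))
```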

Supported languages:


"en": "english",
"de": "german",
"es": "spanish",
"ru": "russian",
"fr": "french",
"pt": "portuguese",
"tr": "turkish",
"pl": "polish",
"ca": "catalan",
"nl": "dutch",
"ar": "arabic",
"sv": "swedish",
"it": "italian",
"id": "indonesian",
"hi": "hindi",
"fi": "finnish",
"iw": "hebrew",
"uk": "ukrainian",
"el": "greek",
"ms": "malay",
"cs": "czech",
"ro": "romanian",
"da": "danish",
"hu": "hungarian",
"ta": "tamil",
"no": "norwegian",
"ur": "urdu",
"hr": "croatian",
"bg": "bulgarian",
"lt": "lithuanian",
"la": "latin",
"mi": "maori",
"ml": "malayalam",
"cy": "welsh",
"sk": "slovak",
"te": "telugu",
"fa": "persian",
"lv": "latvian",
"bn": "bengali",
"sr": "serbian",
"az": "azerbaijani",
"sl": "slovenian",
"kn": "kannada",
"et": "estonian",
"mk": "macedonian",
"br": "breton",
"eu": "basque",
"is": "icelandic",
"hy": "armenian",
"ne": "nepali",
"mn": "mongolian",
"bs": "bosnian",
"kk": "kazakh",
"sq": "albanian",
"sw": "swahili",
"gl": "galician",
"mr": "marathi",
"pa": "punjabi",
"si": "sinhala",
"sn": "shona",
"yo": "yoruba",
"so": "somali",
"af": "afrikaans",
"oc": "occitan",
"ka": "georgian",
"be": "belarusian",
"tg": "tajik",
"sd": "sindhi",
"gu": "gujarati",
"am": "amharic",
"yi": "yiddish",
"uz": "uzbek",
"fo": "faroese",
"ht": "haitian creole",
"ps": "pashto",
"tk": "turkmen",
"nn": "nynorsk",
"mt": "maltese",
"sa": "sanskrit",
"lb": "luxembourgish",
"my": "myanmar",
"bo": "tibetan",
"tl": "tagalog",
"mg": "malagasy",
"as": "assamese",
"tt": "tatar",
"haw": "hawaiian",
"ln": "lingala",
"ha": "hausa",
"ba": "bashkir",
"jw": "javanese",
"su": "sundanese",
"vi": "vietnamese",