How to transcribe video using Whisper

AVflow.io supports OpenAI's Whisper, so you can use it in your Flow.

Whisper is an open-source transcription engine from OpenAI. It is available in AVflow as a step you can add to your Flow, and this help article explains how to use it.

Watch a demo of how to integrate Whisper into your Flow.

Or watch how to use Whisper with Mux.

The Whisper step in AVflow can output either a plain text file or JSON. The JSON output is useful for further processing, for example generating SRT subtitles.
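To show why the JSON output is handy for subtitles, here is a minimal Python sketch that turns Whisper-style segments into SRT text. It assumes the JSON contains a `segments` list whose items have `start` and `end` times in seconds and a `text` field, which is the shape returned by `transcribe()` in OpenAI's open-source `whisper` Python package. The exact JSON produced by AVflow's Transcribe action may differ, so check a real output file first. The function names are hypothetical, and in a Flow the "Subtitle" step does this conversion for you.

```python
import json


def format_srt_timestamp(seconds: float) -> str:
    """Convert seconds to an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(transcript: dict) -> str:
    """Build SRT text from a Whisper-style dict with a 'segments' list.

    Assumption: each segment has 'start' and 'end' (in seconds) and 'text'.
    """
    cues = []
    for index, seg in enumerate(transcript["segments"], start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        # An SRT cue is: counter, time range, caption text.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Joining with "\n" leaves one blank line between cues.
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical sample data; a real file would come from the Transcribe action.
    sample = json.loads(
        '{"segments": [{"start": 0.0, "end": 2.5, "text": " Hello there."},'
        ' {"start": 2.5, "end": 5.25, "text": " Welcome to the demo."}]}'
    )
    print(segments_to_srt(sample))
```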

Get started:

1. Add the Whisper step to your Flow. The Whisper step accepts only a few audio formats, such as mp3. If you are ingesting another format, such as mp4, add the FFmpeg step first to transcode your media file to mp3. (Read more on how to use FFmpeg to transcode to mp3.)

2. In the Whisper step, set "Source" to the mp3 (or other audio) file. For "Action", choose "Transcribe" if you want JSON output, or "Speech to Text" if you just want a plain .txt transcript.

3. Next, either use the "Subtitle" step to convert the JSON output to SRT subtitles, or save the Whisper output back to AWS S3 with the "Transfer to Storage" step. Here's an example of what you'd fill in for the Transfer to Storage step:

Learn more about the Transfer to Storage step.

4. Save and enable the Flow, then trigger it. Check the folder you specified for the result file.
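Step 1 relies on AVflow's FFmpeg step. If you want to try the transcode locally before building the Flow, a roughly equivalent FFmpeg invocation can be sketched in Python. This is not the FFmpeg step's actual configuration, which this article doesn't show; it assumes FFmpeg is installed with the libmp3lame encoder, and `build_mp3_command` and `transcode_to_mp3` are hypothetical helpers. The flags are standard FFmpeg options: `-vn` drops the video stream, `-c:a libmp3lame` selects the mp3 encoder, and `-q:a` sets variable-bitrate quality (lower is better).

```python
import shutil
import subprocess
from pathlib import Path


def build_mp3_command(source: str, quality: int = 2) -> list:
    """Return an ffmpeg argument list that extracts the audio track as mp3."""
    target = str(Path(source).with_suffix(".mp3"))
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-i", source,          # input file, e.g. an mp4
        "-vn",                 # drop the video stream
        "-c:a", "libmp3lame",  # encode audio as mp3
        "-q:a", str(quality),  # VBR quality: 0 is best, 9 is smallest
        target,
    ]


def transcode_to_mp3(source: str) -> str:
    """Run ffmpeg (must be on PATH) and return the path of the mp3 it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    cmd = build_mp3_command(source)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[-1]


if __name__ == "__main__":
    # Only prints the command; call transcode_to_mp3() to actually run it.
    print(" ".join(build_mp3_command("interview.mp4")))
```

Building the argument list separately from running it makes it easy to inspect the exact command before any file is written.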

Supported input formats: .flac, .mp3, .wav
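To decide, before triggering a Flow, whether a file needs the FFmpeg step first, you can compare its extension with the supported formats listed above. A minimal Python sketch; `needs_transcode` is a hypothetical helper, and it only inspects the filename extension, not the codec actually inside the file.

```python
from pathlib import Path

# Input formats listed for the Whisper step in this article.
SUPPORTED_INPUT_EXTENSIONS = {".flac", ".mp3", ".wav"}


def needs_transcode(filename: str) -> bool:
    """Return True if the extension is not one the Whisper step accepts."""
    # Compare case-insensitively so "Talk.MP3" counts as mp3.
    return Path(filename).suffix.lower() not in SUPPORTED_INPUT_EXTENSIONS


if __name__ == "__main__":
    for name in ["talk.mp4", "Talk.MP3", "clip.wav"]:
        print(name, "needs FFmpeg first" if needs_transcode(name) else "ok")
```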

Supported languages (code: name):

"en": "english", "de": "german", "es": "spanish", "ru": "russian", "fr": "french", "pt": "portuguese", "tr": "turkish", "pl": "polish", "ca": "catalan", "nl": "dutch", "ar": "arabic", "sv": "swedish", "it": "italian", "id": "indonesian", "hi": "hindi", "fi": "finnish", "iw": "hebrew", "uk": "ukrainian", "el": "greek", "ms": "malay", "cs": "czech", "ro": "romanian", "da": "danish", "hu": "hungarian", "ta": "tamil", "no": "norwegian", "ur": "urdu", "hr": "croatian", "bg": "bulgarian", "lt": "lithuanian", "la": "latin", "mi": "maori", "ml": "malayalam", "cy": "welsh", "sk": "slovak", "te": "telugu", "fa": "persian", "lv": "latvian", "bn": "bengali", "sr": "serbian", "az": "azerbaijani", "sl": "slovenian", "kn": "kannada", "et": "estonian", "mk": "macedonian", "br": "breton", "eu": "basque", "is": "icelandic", "hy": "armenian", "ne": "nepali", "mn": "mongolian", "bs": "bosnian", "kk": "kazakh", "sq": "albanian", "sw": "swahili", "gl": "galician", "mr": "marathi", "pa": "punjabi", "si": "sinhala", "sn": "shona", "yo": "yoruba", "so": "somali", "af": "afrikaans", "oc": "occitan", "ka": "georgian", "be": "belarusian", "tg": "tajik", "sd": "sindhi", "gu": "gujarati", "am": "amharic", "yi": "yiddish", "uz": "uzbek", "fo": "faroese", "ht": "haitian creole", "ps": "pashto", "tk": "turkmen", "nn": "nynorsk", "mt": "maltese", "sa": "sanskrit", "lb": "luxembourgish", "my": "myanmar", "bo": "tibetan", "tl": "tagalog", "mg": "malagasy", "as": "assamese", "tt": "tatar", "haw": "hawaiian", "ln": "lingala", "ha": "hausa", "ba": "bashkir", "jw": "javanese", "su": "sundanese", "vi": "vietnamese",