Description

This Gradio application converts video files into subtitle (SRT) files: it extracts the audio track from the video and transcribes the speech with an AI speech recognition model. It is useful for anyone who wants to create captions or transcriptions for their video content.
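As a rough illustration of the SRT output format, the sketch below turns timed text segments into SRT entries. The function names `format_timestamp` and `segments_to_srt` and the `(start, end, text)` tuple layout are assumptions made for this example; they are not taken from the application's code.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from an iterable of (start, end, text) tuples.

    Each entry is a 1-based index line, a time-range line, and the
    subtitle text; entries are separated by a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical segments, for illustration only.
    print(segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```

The resulting string can be written to a file such as transcription.srt with an ordinary `open(..., "w", encoding="utf-8")` call.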

Model Inference

This application uses the Whisper large model for automatic speech recognition (ASR). The model is loaded through the pipeline function of Hugging Face’s Transformers library, which handles the pre- and post-processing needed to run the model. It transcribes the speech in the audio extracted from the video into text.

Whisper was originally developed by OpenAI; Hugging Face hosts and distributes the model.
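A minimal sketch of this kind of inference call is shown below. It assumes the `openai/whisper-large` checkpoint, a 30-second chunk length, and timestamped output; the application's actual checkpoint and settings may differ. The helper names `transcribe` and `chunks_to_segments` are hypothetical.

```python
def transcribe(audio_path: str, model_name: str = "openai/whisper-large"):
    """Run Whisper on an audio file and return timestamped chunks.

    Assumption: the checkpoint name and chunk length are illustrative,
    not confirmed settings of the application.
    """
    # Imported here so the helper below can be used without Transformers.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_name,
        chunk_length_s=30,  # split long audio into 30-second windows
    )
    result = asr(audio_path, return_timestamps=True)
    # With return_timestamps=True, result["chunks"] is a list of dicts,
    # each with a "timestamp" (start, end) pair and a "text" string.
    return result["chunks"]


def chunks_to_segments(chunks):
    """Normalize pipeline chunks into (start, end, text) tuples."""
    segments = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if end is None:
            # The end time can be missing for the final chunk;
            # fall back to the start time rather than guessing.
            end = start
        segments.append((start, end, chunk["text"].strip()))
    return segments
```

Calling `chunks_to_segments(transcribe("audio.wav"))` would then yield timed segments ready to be written out as subtitles; `audio.wav` here stands in for the audio extracted from the video.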

Tools Used

Process Overview

Try it Out!

Preview:

Content of transcription.srt File: