V2T: the Voice-to-Text Translator
This project all started as a complaint, as many do.
My professor was telling us how hard it can be for foreigners to work in Germany, and the topic of language barriers in meetings came up. He lamented the number of times he had to ask his coworkers to hold meetings in English because many of the assistants, whose main job in these meetings was to take notes, were not proficient enough in German to keep up and decide which topics were worth noting. I could relate to the assistants: when I came to Germany, the hardest part, in my opinion, was the language. Artikel, Dativ, Genitiv, Akkusativ: even in my nightmares I still see my German teacher screaming about them.
As it happened, the professor's class was on Machine Learning, and the final project was completely up to us. The goal was simple: let the user record something in either German or English and have the application write down the equivalent text in a text file.
The result was V2T, a voice-to-text system that not only transcribes what you say but also automatically detects the language and provides a translation, making those dreaded multilingual meetings a thing of the past.
I built a system that can handle live recording sessions, process batches of audio files, and organize everything neatly into session folders. Transcription runs entirely offline (thanks to running Whisper locally), which means your sensitive meeting notes don't get sent to some cloud service where who-knows-who might be listening.

Features:
Live Recording & Transcription: Press the spacebar to start recording and 's' to stop. The system immediately transcribes whatever you just said, whether it was in German or English. No more frantically scribbling notes while trying to follow a rapid-fire German discussion about quarterly reports.
Automatic Language Detection: The system figures out which language you're speaking without you having to tell it. Perfect for meetings where people switch back and forth between German and English; note that detection runs once per recording, so a clip that switches languages mid-sentence gets a single language label.
Instant Translation: Not only does it transcribe what was said, but it also provides the translation in the other language. German input gets English translation, English input gets German translation.
Session Management: Each recording session gets its own folder with all the audio files, transcriptions, and translations organized neatly. No more hunting through random text files trying to remember which meeting was which.
Batch Processing: Already have a bunch of audio files sitting around? The system can process them all at once instead of making you sit there and babysit each one individually.
Offline Transcription: The Whisper model runs locally on your machine, so your private conversations stay private. The only thing that needs internet is the translation part.
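Live recording and session folders could be put together roughly as follows. This is a minimal sketch, not V2T's actual code: `create_session`, `save_wav` and `record_until_stopped` are hypothetical names, and the `audio/`, `transcriptions/` and `translations/` subfolder layout, the 16 kHz mono format and the 16-bit PCM encoding are assumptions. The keypress handling (spacebar and 's') is left out; the sketch only assumes that whatever handles the 's' key sets a `threading.Event`. `sounddevice.InputStream` and its `(indata, frames, time, status)` callback are the real sounddevice API; the .wav file is written with Python's standard `wave` module.

```python
import re
import sys
import wave
from array import array
from datetime import datetime
from pathlib import Path


def create_session(base_dir, name=None):
    """Create a session folder with audio/, transcriptions/ and translations/ subfolders."""
    if name is None:
        name = datetime.now().strftime("session_%Y%m%d_%H%M%S")
    # Collapse anything that is not filesystem-friendly into underscores.
    safe_name = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_") or "session"
    session = Path(base_dir) / safe_name
    for sub in ("audio", "transcriptions", "translations"):
        (session / sub).mkdir(parents=True, exist_ok=True)
    return session


def save_wav(path, samples, samplerate=16000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM .wav file."""
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    pcm = array("h", (int(s * 32767) for s in clipped))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(samplerate)
        wav.writeframes(pcm.tobytes())
    return Path(path)


def record_until_stopped(stop_event, samplerate=16000):
    """Capture mono microphone audio until stop_event is set (needs sounddevice + numpy)."""
    import numpy as np
    import sounddevice as sd  # imported lazily so the helpers above work without it

    chunks = []

    def callback(indata, frames, time_info, status):
        # Copy: sounddevice reuses the buffer after the callback returns.
        chunks.append(indata[:, 0].copy())

    with sd.InputStream(samplerate=samplerate, channels=1, dtype="float32", callback=callback):
        stop_event.wait()  # a key handler for 's' would call stop_event.set()
    return np.concatenate(chunks) if chunks else np.zeros(0, dtype="float32")
```

In a live session, the spacebar handler would start `record_until_stopped` in a background thread with a fresh `threading.Event`, the 's' handler would set that event, and the returned samples would be passed to `save_wav(session / "audio" / "clip_001.wav", samples)`. Whisper resamples its input to 16 kHz internally, which is why 16 kHz is a reasonable default here.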
How It Actually Works:
Record with the spacebar; SoundDevice captures the audio, which is then saved as a .wav file
Whisper AI transcribes the audio to text locally on your machine
LangDetect identifies whether the transcribed text is German or English
Google Translate converts the text to the other language
Session Manager organizes everything into named folders with audio, transcription, and translation files
Batch Processor handles multiple audio files at once for bulk processing
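The steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the project's implementation: `process_file`, `process_batch`, the backend wrappers, the `transcriptions/` and `translations/` folder names and the accepted file extensions are all hypothetical. The original does not say which Google Translate client V2T uses, so the sketch assumes the `deep-translator` package. `whisper.load_model`, `model.transcribe` (which returns a dict with a `"text"` key) and `langdetect.detect` are the real entry points of those libraries.

```python
from pathlib import Path

# German input gets an English translation and vice versa.
TARGET_LANGUAGE = {"de": "en", "en": "de"}
AUDIO_EXTENSIONS = (".wav", ".mp3", ".m4a")  # assumption: formats the batch step accepts


def _write(path, text):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")


def process_file(audio_path, session_dir, transcribe, detect, translate):
    """Transcribe one clip, detect its language, translate it and save both texts."""
    text = transcribe(audio_path)
    source = detect(text)
    if source not in TARGET_LANGUAGE:
        raise ValueError(f"expected German or English, detected {source!r}")
    target = TARGET_LANGUAGE[source]
    translation = translate(text, source, target)

    stem = Path(audio_path).stem
    session = Path(session_dir)
    _write(session / "transcriptions" / f"{stem}.txt", text)
    _write(session / "translations" / f"{stem}.{target}.txt", translation)
    return {"file": stem, "language": source, "text": text, "translation": translation}


def process_batch(audio_dir, session_dir, transcribe, detect, translate):
    """Run process_file over every audio file in a folder; failures are collected, not raised."""
    results, failures = [], {}
    for path in sorted(Path(audio_dir).iterdir()):
        if path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        try:
            results.append(process_file(path, session_dir, transcribe, detect, translate))
        except Exception as exc:  # e.g. an unsupported language or a failed translation request
            failures[path.name] = str(exc)
    return results, failures


# Possible real backends, imported lazily so the pipeline above can run without them.
def whisper_transcriber(model_name="base"):
    import whisper  # openai-whisper; runs locally once the model weights are downloaded

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(str(path))["text"].strip()


def langdetect_detect(text):
    from langdetect import DetectorFactory, detect

    DetectorFactory.seed = 0  # langdetect is non-deterministic without a fixed seed
    return detect(text)


def google_translate(text, source, target):
    # Assumption: the deep-translator package as the Google Translate client.
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source=source, target=target).translate(text)
```

Wiring it together would look like `process_batch("recordings", session, whisper_transcriber(), langdetect_detect, google_translate)`. Passing the three backends in as plain functions means the Whisper model is loaded once rather than per file, and the pipeline can be exercised with stand-in functions when the real libraries or the network are unavailable.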