Programming
Put your audio files and speeches into text with Python
Photo by Jason Rosewell on Unsplash
What is Speech Recognition?
It is referred to as speech or voice recognition. It includes making audio signals meaningful by sampling, artificial neural networks, machine learning.
We know these practices well. Applications such as Apple Siri, Google Assistant, Amazon Alexa are available on most of us. These applications are, of course, far ahead, and there is serious engineering in their background. In addition to making sound signals meaningful in their applications, NLP -Natural Language Processing algorithms are also used.
Speech to text, the process of converting speech to text, is the first step to create a voice assistant. So first, we expect the app to understand what we?re talking about, right? If it can send the sounds, it understands to its engine as text, the first step in processing this sound data is over. In this article, we will implement two applications through Python and by using several libraries and focus on the speech to the text section.
1. Application ? Converting Audio File to Text
In this application, we will try to convert audio files to text. Converting audio files to text is one of the subjects of data science in particular. For example, you can create a chatbot by processing voices or categorize the records coming to the call center by running NLP.
Our head-end library, SpeechRecognition, where we will process our voices. As can be understood from this library name, it is based on speech recognition and communicates with many APIs.
APIs supported by the library are;
- CMU Sphinx (works offline)
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection (works offline)
Let?s Start Coding!
1- Let?s install the SpeechRecognition module.
pip install SpeechRecognition
2- Let?s download the library.
import speech_recognition as sr
3- Let?s assign the recognizer to the variable that will perform the recognition process.
r = sr.Recognizer()
4- Let?s create our audio file.
It is essential to know which file types this library supports before creating the audio file. Supported formats are as follows.
- WAV
- AIFF
- AIFF-C
- FLAC
It?s okay if we work with mp3, m4a, or a different type. We can use audio file converter websites for this. I used zamzar for this process. We upload the file we have and output it in a .wav format.
zamzar.com
We assign the sound file we created to the file variable.
file = sr.AudioFile(?deneme.wav?)
5- Now, we can convert the sound into text.
with file as source: r.adjust_for_ambient_noise(source) audio = r.record(source) result = r.recognize_google(audio,language=?tr?)print(result)
Here we used the recognize_google method. This method used the Google Cloud Speech API. Besides, we have set the language to language = ?tr? so that it can perceive Turkish sounds better.
6- Run the code, and our output is ready now.
In our example, we put our small file, which we call ?deneme deneme? (Turkish words) for 2-seconds into sound processing, and the result is exactly what we expected.
You can also see that it can convert much more complex and long audio files into texts easily. For this, you can record the sound of different lengths and complexities, assign them to the file variable, and experience the results.
2. Application ? Instantly Convert Our Microphone Voice to Text
In this application, we will try to convert these sounds into texts instantly by using our computer?s microphone. For this, we will use the SpeechRecognition library again. We also have a new module. Pyaudio. This module is required to be able to receive sounds.
Let?s Start Coding!
1- Let?s load the modules
pip install SpeechRecognitionpip install pyaudio
If you are doing this on mac, you have to install it on portaudio. We will do this with brew.
brew install portaudiopip install pyaudio
If you downloaded SpeechRecognition in the previous app, you don?t need to download it again.
2- Let?s download the library and assign the recognizer to the variable.
import speech_recognition as srr = sr.Recognizer()
Again, if you did these steps in the previous application, you don?t need to do it.
3- Let?s convert the sound into text.
with sr.Microphone() as source: r.adjust_for_ambient_noise(source) data = r.record(source, duration=5) print(?Sesinizi Tan?ml?yor??) text = r.recognize_google(data,language=?tr?) print(text)
We introduced the file as a source in the application we created using the audio file. Here we introduce our microphone as the source, and for this, we use the sr.Microphone () method. In the same way, we set our language to Turkish. We can adjust the audio listening time if we want. In this example, it is set to 5 seconds. We can keep the seconds longer if we wish to.
4- Run the code, and our output is ready now.
We repeated the ?Merhaba Dnya? (Turkish word) cycle for five seconds, and the result is as follows. You can see that it works in much longer sentences.
To understand how the module works, we can better see how the engine produces alternatives using the show_all = True option.
text = r.recognize_google(data,show_all=True,language=?tr?)
These two applications we have realized are two straightforward examples of how sound is processed. What we did and the source we used was the Google Cloud Speech API. It is possible to add much more to these applications. If you want to use sounds in your projects somehow, it is easy to research and specialize in these modules. Especially if data scientists want to process audio files and use them in machine learning projects, they can use these libraries during pre-processing data stages. I hope these two applications have been useful for preliminary information. Thank you for reading!
My Linkedin Profile;