I like to record lots of things: lectures, lessons, sessions, environmental sounds, etc. They are fun to listen to afterwards, but they’re hard to navigate and organise.
Another bit of relevant background: I’m involved in research on voice training (we also used it to apply for the NSF I-Corps programme; there will be a post about it). Our data are mostly voice lesson recordings, so the first step is to separate the teacher’s instructions from the actual singing. While working on this, I kept thinking, “I’d like something that can automatically segment my recordings to help me navigate them.” So there we go: I’m building an audio segmentation tool to do exactly that.
The pic below shows what I have so far. I built it with PyQt5 and Python 3. It’s not the best interface design, but I’m definitely learning a lot by doing it, especially about object-oriented design.
From top to bottom, there’s a spectrogram, then a waveform (red is singing, green is conversation/speech), then the control buttons and plot interaction tools. At the bottom is another idea: transcribing the speech parts to provide text cues for navigating the recordings. I used the Google speech recognition API for this.
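To colour the waveform and jump around in it, the classifier’s per-frame output has to be collapsed into contiguous regions. The sketch below is a hypothetical illustration of that step, not the tool’s actual code: `smooth_labels`, `labels_to_segments` and `COLOURS` are made-up names, and it assumes the classifier emits one label (`"singing"` or `"speech"`) per analysis frame at a fixed hop in seconds.

```python
from collections import Counter
from itertools import groupby

# Hypothetical label -> colour mapping, matching the screenshot (red = singing, green = speech)
COLOURS = {"singing": "red", "speech": "green"}


def smooth_labels(labels, radius=2):
    """Majority vote over a sliding window of 2*radius+1 frames.

    Removes isolated misclassified frames so the waveform is not
    chopped into many tiny coloured slivers.
    """
    labels = list(labels)
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius): i + radius + 1]
        out.append(Counter(window).most_common(1)[0][0])
    return out


def labels_to_segments(labels, step):
    """Collapse per-frame labels into (start_sec, end_sec, label) runs.

    step is the hop between consecutive frames, in seconds.
    """
    segments = []
    idx = 0
    for label, run in groupby(labels):
        n = sum(1 for _ in run)  # length of this run of identical labels
        segments.append((idx * step, (idx + n) * step, label))
        idx += n
    return segments
```

Each resulting segment can then be drawn as a coloured span over the waveform (for example with pyqtgraph’s `LinearRegionItem` or matplotlib’s `axvspan`), and clicking a span can seek playback to its `start_sec`.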
There is still much work to be done. Before settling on PyQt (a decision driven by my familiarity with Python and by deadlines…), I looked into some very nice modern interface designs. When I have time, I might port it to something fancier, such as a Swift app or a web app. For now, there are still a few touch-ups to do: tidier transcriptions, better image manipulation, faster algorithms, etc.
It would be nice to add a database to organise the recordings and compute some statistics from the data, but I doubt I’ll have the time soon… I’m happy to collaborate, so please shoot me an email if you’re interested.
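As a rough idea of what such a database could look like, here is a minimal sketch using Python’s built-in `sqlite3`. The schema (a `recordings` table plus a `segments` table holding labelled time ranges) and the helper names `open_db`, `add_recording` and `time_per_label` are all hypothetical; they assume segments in `(start_sec, end_sec, label)` form.

```python
import sqlite3

# Hypothetical schema: one row per recording, one row per labelled segment
SCHEMA = """
CREATE TABLE IF NOT EXISTS recordings (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    recorded_on TEXT              -- ISO date string, e.g. '2017-05-01'
);
CREATE TABLE IF NOT EXISTS segments (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER NOT NULL REFERENCES recordings(id),
    start_sec REAL NOT NULL,
    end_sec REAL NOT NULL,
    label TEXT NOT NULL           -- e.g. 'singing' or 'speech'
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_recording(conn, path, segments, recorded_on=None):
    """Store a recording and its (start_sec, end_sec, label) segments; return its id."""
    cur = conn.execute(
        "INSERT INTO recordings (path, recorded_on) VALUES (?, ?)", (path, recorded_on)
    )
    rec_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO segments (recording_id, start_sec, end_sec, label) VALUES (?, ?, ?, ?)",
        [(rec_id, s, e, lab) for s, e, lab in segments],
    )
    conn.commit()
    return rec_id


def time_per_label(conn):
    """Total duration in seconds per label, summed over all recordings."""
    rows = conn.execute(
        "SELECT label, SUM(end_sec - start_sec) FROM segments GROUP BY label ORDER BY label"
    )
    return dict(rows.fetchall())
```

Keeping segments in their own table means other stats (e.g. singing time per lesson, or per month) are just different `GROUP BY` queries.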
P.S. For the algorithm that separates the music parts from the conversation parts, have a look at this very nice package (I also used it in “Audio Features Self-Similarity Matrix shows something interesting” and “Audio Features Self-Similarity Matrix shows something interesting Part-2”):
I used its feature extraction + SVM function.
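For readers curious how “feature extraction + SVM” fits together, here is a self-contained sketch in NumPy. It is not the package’s implementation: it swaps in just two simple short-term features (frame energy and zero-crossing rate) and a from-scratch linear SVM trained with Pegasos-style stochastic subgradient descent. The function names, the 50 ms window / 25 ms hop and the regularisation `lam` are illustrative assumptions.

```python
import numpy as np


def frame_features(signal, sr, win=0.05, step=0.025):
    """Short-term features per frame: [energy, zero-crossing rate]."""
    win_n, step_n = int(win * sr), int(step * sr)
    feats = []
    for start in range(0, len(signal) - win_n + 1, step_n):
        frame = signal[start:start + win_n]
        energy = np.mean(frame ** 2)
        # fraction of adjacent sample pairs whose sign changes
        zcr = np.mean(np.diff(np.sign(frame)) != 0)
        feats.append([energy, zcr])
    return np.array(feats)


def standardise(X, mean=None, std=None):
    """Zero-mean, unit-variance scaling; pass the training mean/std at test time."""
    if mean is None:
        mean, std = X.mean(axis=0), X.std(axis=0) + 1e-12
    return (X - mean) / std, mean, std


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent; y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # constant column acts as a (regularised) bias
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            violated = y[i] * (Xb[i] @ w) < 1  # hinge loss is active for this sample
            w *= 1 - eta * lam  # shrink step from the L2 penalty
            if violated:
                w += eta * y[i] * Xb[i]
    return w


def predict(w, X):
    """Return +1 / -1 for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)
```

In practice you would label frames from a few lessons by hand (+1 for singing, -1 for speech), standardise their features, train, and then classify new recordings frame by frame before merging the frames into segments. Real singing/speech separation needs richer features (e.g. MFCCs, spectral and pitch statistics) than these two.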