Audio Segmentation Interface Design

I like to record lots of things: lectures, lessons, sessions, environmental sounds, etc. They are fun to listen to afterwards, but they are hard to navigate and organise.

Another relevant piece of background is that I’m involved in research on voice training (we also used it to apply for the NSF I-Corps programme; there will be a post about it). The data we have are lots of voice lesson recordings, so the first step is to separate the teacher’s instructions from the actual singing. While doing this, I kept thinking, “I’d like to have something that can automatically segment my recordings to help me navigate them.” And there we go: I’m creating an audio segmentation tool to do this.

The pic below is what I have so far. I used PyQt5 + Python3 for this. It’s not the best interface design, but I’m definitely learning a lot by doing it, especially about object-oriented programming.

As you can see, there’s a spectrogram on top, then a waveform (red is singing, green is conversation/speech), and then the control buttons and plot interaction tools. At the bottom is another idea: transcribing the speech parts to provide cues for navigating the recordings. I used the Google speech recognition API for this.
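Colouring the waveform red and green implies turning per-frame classifier decisions into contiguous regions. Here is a minimal sketch of that step, assuming the classifier emits one label per fixed-length frame; the helper names `smooth_labels` and `labels_to_segments`, the label strings and the colour map are hypothetical illustrations, not code from the tool:

```python
from collections import Counter
from itertools import groupby

# Hypothetical colour map matching the screenshot's convention.
SEGMENT_COLOURS = {"singing": "red", "speech": "green"}

def smooth_labels(labels, window=3):
    """Majority vote over a centred window to remove one-frame flickers.

    Ties go to the label seen first in the window.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        win = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(win).most_common(1)[0][0])
    return smoothed

def labels_to_segments(labels, frame_sec):
    """Merge runs of identical frame labels into (start_sec, end_sec, label) tuples."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        end = start + sum(1 for _ in run)
        segments.append((start * frame_sec, end * frame_sec, label))
        start = end
    return segments

# Illustrative per-frame output from a classifier (made-up labels).
frames = ["speech", "speech", "singing", "speech",
          "singing", "singing", "singing", "speech"]
for start, end, label in labels_to_segments(smooth_labels(frames), frame_sec=0.5):
    print(f"{start:.1f}-{end:.1f}s {label} ({SEGMENT_COLOURS[label]})")
```

Smoothing before merging keeps the waveform from being peppered with tiny coloured slivers; a larger `window` trades responsiveness at boundaries for stability.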


There is still much work to be done. Before deciding on PyQt (a decision based on my familiarity with Python and on deadlines…), I actually looked into other very nice modern interface designs. When I have time, I might move it to something fancier, like Swift or a web app. But at the moment, there are still a few touch-ups to do: tidier transcriptions, better image manipulation, faster algorithms, etc.

It would be nice to add a database to organise the recordings and compute some statistics from the data, but I doubt I’ll have the time soon… I’m happy to collaborate, so please shoot me an email if you’re interested.

P.S. For the algorithm that differentiates the music parts from the conversation parts, please have a look at this very nice package (I also used it in “Audio Features Self-Similarity Matrix shows something interesting” and “Audio Features Self-Similarity Matrix shows something interesting Part-2”):

I used its feature extraction and SVM classification functions.
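For readers new to this kind of pipeline: features are computed over short, overlapping frames and then fed to a classifier such as an SVM. The sketch below computes just two classic frame-level features, short-time energy and zero-crossing rate, with NumPy. It is an illustrative assumption, not the package’s implementation; the function name `frame_features` and the 50 ms frames with a 25 ms hop are my own choices:

```python
import numpy as np

def frame_features(signal, sr, frame_len=0.05, hop=0.025):
    """Return an (n_frames, 2) array of [short-time energy, zero-crossing rate]."""
    n = int(frame_len * sr)  # samples per frame
    h = int(hop * sr)        # samples between frame starts
    feats = []
    for start in range(0, len(signal) - n + 1, h):
        frame = signal[start:start + n]
        energy = float(np.mean(frame ** 2))
        # Fraction of adjacent sample pairs whose sign changes.
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zcr))
    return np.array(feats)

sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220 * t)                    # pitched, sustained
noise = 0.1 * np.random.default_rng(0).standard_normal(sr)  # noise-like

print("tone  [energy, zcr]:", frame_features(tone, sr).mean(axis=0))
print("noise [energy, zcr]:", frame_features(noise, sr).mean(axis=0))
```

A pitched tone gives a low zero-crossing rate while noise-like sound gives a high one, which is the kind of contrast a classifier can learn from once many such features are stacked together.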

Learning graphical models and ProbLog

This is yet another long-overdue post…

Partly to get a better foundation in artificial intelligence (among other reasons), I revised graphical models and tried out an interesting probabilistic logic programming language, ProbLog.

(A good starting place:

I haven’t completely understood it yet, but the language seems to help compute the probabilities of many interacting events according to probability theory, graphical models, etc. It has always been a very cool idea to me to automate calculations like these and build machine learning algorithms on top of them.
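To make the idea concrete: in ProbLog, a probabilistic fact such as `0.1::burglary.` is true with the given probability, independently of the other facts, and the probability of a query is the total probability of the possible worlds in which the query can be derived. The brute-force sketch below, in plain Python, enumerates those worlds for a toy program of my own (written in ProbLog syntax in the comments). ProbLog itself uses far more efficient inference than enumeration, so treat this only as an illustration of the semantics:

```python
from fractions import Fraction
from itertools import product

# Toy program (my own illustration), in ProbLog syntax:
#   0.1::burglary.
#   0.2::earthquake.
#   alarm :- burglary.
#   alarm :- earthquake.
FACTS = {"burglary": Fraction(1, 10), "earthquake": Fraction(1, 5)}

def alarm(world):
    """The two rules: alarm holds if burglary or earthquake is true."""
    return world["burglary"] or world["earthquake"]

def query_probability(facts, query):
    """Sum the probabilities of all possible worlds in which the query holds."""
    names = list(facts)
    total = Fraction(0)
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # Facts are independent, so a world's probability is a product.
        p = Fraction(1)
        for name, value in world.items():
            p *= facts[name] if value else 1 - facts[name]
        if query(world):
            total += p
    return total

print(query_probability(FACTS, alarm))  # 7/25, i.e. 1 - 0.9 * 0.8 = 0.28
```

The enumeration doubles in cost with every probabilistic fact, which is exactly why a dedicated language with smarter inference is attractive.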

In addition, I highly recommend these talks by Christopher Bishop; they go very well with Chapter 8 of his book Pattern Recognition and Machine Learning:


I wish I had watched these before I got to Rochester; I might have understood David Temperley’s model better. But it’s never too late. There are lots of potential applications of these models in music information research, and I think that might be one crucial aspect I was lacking. It’s funny how drastically what you think and believe can change after learning a powerful new subject like this: now it even feels strange to me not to consider an element of probability in frameworks and algorithms.
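For reference, the core result of Bishop’s Chapter 8 for directed graphs (Bayesian networks) is that the joint distribution over K variables factorises into one conditional distribution per node, given that node’s parents pa_k. For an illustrative network of my own, where an alarm (a) depends on both a burglary (b) and an earthquake (e), this gives:

```latex
p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)

p(b, e, a) = p(b)\, p(e)\, p(a \mid b, e)
```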

Three PhD Applications


If someone had told me three years ago that I would go through two more PhD applications, I think I would have freaked out. But now I feel I really needed it, and it has been good for me.

My first application was in 2013. I got offers in maths and physics, but chose to attend a double master’s programme in complex systems science.

My second application was in 2015. I got offers in Germany and the US, both relevant to audio, and chose the US one.

My third application was just a few months ago. I have decided to accept the offer from Utrecht University and move there next February. The programme focuses on computational music structure analysis and functional programming. Really looking forward to it!

Since it was some time ago, I don’t recall every detail of my first application, but I certainly learnt a lot from the process. A new world was opening up in front of me. After 1.5 years in Europe, my perspective had changed again by the time I applied for the second time: I was more focused and knew my direction. But there was more at stake, and I didn’t handle it perfectly. The third time, I was more cautious and bolder at the same time. And the results made me quite happy!

My problem is having too many things going on and wanting too many things. It sometimes makes me very productive, but sometimes it overwhelms me. I’m always trying to find a balance!

141st AES Convention in LA

I can’t believe it took me a month to write this…

It was hard to write, to be fair.

This is my “third” AES convention, and the first one I fully registered for. I happened to be in town when the 139th (NYC) and 140th (Paris) took place, so I went to the free parts. But this time, thanks again to the NSF I-Corps funding, I made it to LA and fully registered!

It is a little far from my research, but I’m learning a lot about acoustics. And acoustics can sometimes be very relevant when we use audio data or consider the cognitive side of music.

I took it easier this time because something else was going on in my life (there will be a post about this), so I mainly went to the tutorials and workshops. There was an interesting tutorial on podcast making and another on user testing. There was also a think-tank-like session, but I didn’t go; I thought it was better to give others the opportunity (and I was pretty tired). I heard they had a great student party afterwards. I also went to the opening ceremony, where I learnt how the convention grew to its current size and how someone had published 1000+ papers at these conventions. Finally, as usual, I casually wandered through the exhibition.

I also took the opportunity to see a little of the city, both by myself and with a lab mate/friend and a childhood friend. I went to a $10, four-hour concert at the Walt Disney Concert Hall and did lots of fun activities thanks to the convention coupons.

It’s late, so I’ll stop here. But it was fun. Maybe someday I’ll show up again. Who knows 😀