Workshop Computational Ethnomusicology: Methodologies for a New Field



This was my first Lorentz workshop, which I went to about a month ago. There were many useful lectures and discussions. It was nice to see many familiar faces again, too.

The time allocation was particularly good: lectures in the morning with plenty of coffee time, plus working groups and general discussion in the afternoon (with plenty of coffee time, of course). Everyone normally got their own office as well. A very nice working environment.

Lectures: There was a mixture of musicology and computer science lectures. The speakers seemed to have adjusted to the audience, which was also a mixture of musicologists and computer scientists. There was some wisdom passed around, like: stay healthy by staying away a little from self abstraction, the falling Icarus painting, etc.

Discussions: The coffee time was long enough to meet old friends and new people, discuss our recent progress, make new plans for the future, and sometimes just chat away. Gotta love the coffee and juice machine.

Working group: It was a pain to decide which working groups we should form and go to. But I learnt a lot from the ones I went to (language and music, visualisation, open discussion, etc.). I hope I contributed to the discussion as well…

Social and Leiden (and Den Haag): We stayed in a hotel organised by the Lorentz Center. It’s one of the best hotels I’ve stayed in! Of course we also went to explore the city of Leiden (went to a steak place where you can choose your own knife!) and spent a few hours in Den Haag meeting friends.


It’s been so long that I feel like I have forgotten a lot about this workshop. We also planned to interview people and update the website… But the trips and the deadline last month successfully kept me away from writing anything about them on the blog. I’ll gradually write them up…

ASA, 5th Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan

The second post on my Hawaii trip (Please see the first one here)

This conference is great!

I went mainly to the music and speech talks and poster sessions. I’m also looking forward to the data science talks on Friday. Some hot topics so far include: end-to-end ML, Western and Eastern musical instruments, speech communication, etc.

There was a range of speakers from different backgrounds: performers, teachers, people who work in industry, people from universities, etc., which created a nicely diverse atmosphere.

And I’ve always been interested in the production and perception of music and speech. The speech communication sessions this time provided a good amount of information in this field of research. It was just exciting to see how many people are working on topics such as sound production in bilingualism, nuances in minority languages (Javanese, Hawaiian, etc.) and popular languages (French, Spanish, Japanese, English, etc.).

The talks were very specialised though. Lots of terms I had to google. There were also common patterns in the methods: conducting hearing experiments, using ultrasound for tongue movements, eye tracking, lip tracking, etc. This is probably also why there wasn’t anything with music and speech together: as the research questions go deeper and deeper, it gets harder and harder for the two fields to interact. But I believe there can be more progress and improvements for both fields if we look at them together. Someone just needs to be the fool.

Another fabulous thing about this conference is the socials: one free Hawaiian show, two free dinners and a jam session. Spot on!

By chance, I had a very pleasant conversation with Tony F. W. Embleton. We started with the wine holder he had, then moved on to what we’re working on, living in Canada and England, the history of this conference, and more great experiences.

Also, thanks to my Rochester friend who introduced me to her friend, who in turn introduced me to her group of friends attending this conference, I met lots of new people. And it always opens my eyes to see what people are doing in their lives and in research.

Another bonus of the conference: lots of language listening practice. It just feels good to be able to understand fairly well when people are talking in English and Japanese (and Chinese of course) and understand somewhat when people are talking in French and Korean.

(off-topic warning for the following content :P)

It also poses a problem though. Regardless of my speaking level in these languages, it’s just hard to speak anything other than English at conferences. People are looking for the most efficient way of communicating. So when people are good at English, they speak English naturally. Even when people are not that fluent in English, since they make the effort to speak it, some might be offended if we don’t reply in English.

I heard some music theory conferences still have sessions in different languages. Really looking forward to attending one like that!

I don’t like making comparisons when I’m not sure, but I’m definitely sure about this one: I enjoyed this conference much more than the AES I went to a few months ago (see the post about it here). It doesn’t imply anything about the quality of the conferences, just a feeling, maybe influenced by the location, the focus, etc. The funny thing is that, both times, when I was getting off the plane, there was a PhD offer for me. Completely unrelated to the conference, but probably gonna make me love traveling even more.




141st AES convention in LA

I can’t believe it took me a month to write this…

It was hard to write, to be fair.

This is my “third” AES convention, and the first fully registered one. Somehow I was in the city when 139 (NYC) and 140 (Paris) took place, so I went to the free parts. But this time, thanks again to the NSF I-Corps funding, I made it to LA and fully registered!

It is a little far from my research. I’m learning a lot about acoustics though. And acoustics can sometimes be very relevant when we are using audio data or considering the cognitive side of music.

I made it easier for myself this time because something else was going on in my life (there will be a post about this). So I mainly went to the tutorials and workshops. There was an interesting tutorial about podcast making and another one about user testing. A thinktank-like session was there, too, but I didn’t go, thinking it was better to give others the opportunity (and being pretty tired). I heard they had a great student party after the day. I also went to the opening ceremony, and learnt how the convention grew to this size and how someone published 1000+ papers in these conferences. Finally, as a routine, I also casually wandered through the exhibition.

I also took the opportunity to visit the city a little, by myself and with a lab mate/friend and a childhood friend. Went to a $10, 4-hour concert in the Walt Disney Concert Hall, and did lots of fun activities thanks to the convention coupons.

It’s late so I’ll stop here. But it was fun. Maybe someday I’ll show up again. Who knows 😀


SANE workshop

Sorry for not updating for a while. I have a few drafts backed up that are not ready to publish yet. But trust me, they will come out soon.

Now, the SANE (Speech and Audio in the Northeast) workshop. This one-day workshop was totally worth a 6 * 2 hours road trip! Honestly, I didn’t expect it to be such a high-level workshop. There were lots of people from Google, Apple, MERL, and academic institutions like MIT, CUNY, NYU, etc. All cutting-edge results. Brilliant discussions.

There was a range of topics: acoustics, machine learning, speech, sound events in general, etc. But there was only one talk about music, from the Google Magenta team (a very nice talk by Jesse Engel). It was about unreleased research on training LSTMs on music waveforms (e.g. deep dream audio, music hallucinations). One interesting RNN architecture was used: Multiscale Truncated Backpropagation (see photos below, sorry about the low quality, had to zoom in). It’s an intern’s project but the idea of using a hierarchy of nodes was very interesting. Some other insights include the challenges (see photos below). The long-term structure one is my favorite. And of course, WaveNet, the autoregressive CNN, was mentioned. I need to catch up on reading that paper to understand that part of the talk… I heard that the talks are normally uploaded to YouTube, so maybe keep an eye on this:


Other topics:

  • Environmental noise detection. One important issue in this area of research seems to be a lack of data. Various data augmentation methods seemed to help.
  • Neuroscience. An interesting experiment to read a ferret’s mind while letting the ferret listen to human speech. The signal recovered from the ferret’s EEG sounded not bad at all!
  • Machine learning structures. Actually this topic was in almost every talk. There was one by a MERL speaker, Shinji Watanabe, who used a beamforming acoustic model + joint CTC attention network to simplify all the signal processing, microphone array, mask estimation, feature extraction and transformation etc. But it was still pretty complicated for me.

Surely one can see how those methods in other areas can be used in music!

It was also a nice mixture of posters and audience in general. I went through almost all the posters and it was very enjoyable talking to the presenters and other attendees. The music posters were all from our lab though. Other posters:

  • speech + image processing (a paper to be presented at NIPS 2016, Yusuf Aytar et al.)
  • adult vs. kid voice recognition (from an internship at Comcast; the model was not hard, and the implementation was also done during the internship, Denys Katerenchuk et al.)
  • prosody influences from others (ongoing PhD work, Min Ma et al.), and echolocation (it was amazing to know what blind people can do using echolocation)
  • etc

Also, I was there thanks to NSF I-Corps funding (there will be a post about this once I’ve finished the whole programme). One requirement of the funding was actually to “conduct interviews” (most likely to be chats at a one-day workshop though) about the project we are doing and take photos with the interviewees (I promised not to post them online though :P). It’s a very good activity actually: lots of fun and memories from taking photos.

Interesting Posters at ISMIR 2016 (day 2)

Let me start with the posters on day 2.

There’s Bob’s poster on Analysing Scattering-Based Music Content Analysis Systems: Where’s the Music?, which won the best poster prize this year! It raises the question of the lack of a formal evaluation framework in MIR. And the drawing was great too 😀

From the industry side, I looked at the Spotify, Gracenote and Yamaha (Steinberg) ones on the third day. All very interesting: Yamaha was presenting a range of products, from plugins to hardware; Gracenote was more about the metadata database; Spotify was more about what they do in the company.

There was also the paper from a friend: Bootstrapping a System for Phoneme Recognition and Keyword Spotting in Unaccompanied Singing. She used the singing database released by Smule (I didn’t even know about it!) and cleverly extracted the phonemes by aligning the lyrics with the singing.

One paper was about automatic guitar tabbing: Minimax Viterbi Algorithm for HMM-Based Guitar Fingering Decision. Someone must have been working on a similar thing or I must have read about it before. It looks like it’s working, and it’s a good sign that the author is using it himself.

A neat analysis of MIREX tasks was this one: Cross Task Study on MIREX Recent Results: An Index for Evolution Measurement and Some Stagnation Hypotheses. A meta-look at which tasks are doing well and which tasks are in stagnation. Whether it’s good or bad that a task is in stagnation, we don’t know; there might be an explosion after a period of silence…

Lots of new presentation tricks: adding 2D codes for people to download the dataset, doing surveys on the spot, etc.

ThinkTank HAMR ISMIR COGMIR 2016 overview

Information overload at the moment, thanks to these four events.

ThinkTank was held for the first time and it was very successful! There were 1-2 people from each major music technology company, giving lectures, mentoring, networking and more. Obviously, it had to be more about general company structure, management, marketing etc., rather than algorithms and technical stuff. But I thought it was very useful for graduate students to know. At least now, with more knowledge of how these big and small companies work (different roles of software engineers, project managers and R&D people…) and what they care about (algorithms, code quality, tests, and team work…), students (at least I) can pay attention to some of the skills that would be truly useful in both industry and academia. I’ve always had the dream that the two can work together, although some of the issues are always there, including copyright, PI, etc… But the industrial PhD positions in the UK are quite good at bridging the gap. It seems not that common in the US for some reason…

HAMR was immediately after ThinkTank, held at Spotify. There were four interesting talks on the first evening, and people were making plans as well. There were tutorials on the second day so that people without a plan could find something to do, which I found really neat. Everyone seemed quite focused and there were more than 20 presentations at the end of the day. The quality of the projects wasn’t quite uniform, but still very inspiring!

ISMIR began with tutorials. I went to an overview of the field, which summarised music-related research very nicely, and an introduction to NLP in music, which had lots of information, toolboxes and examples. I’ve saved their slides for future reference. Both were of very high quality. Go go MTG!

The actual conference then started. One very good thing was the poster sessions. Since all the papers are posters by default, I found them of very high quality! Most people were very good at presenting as well. The oral presentations were a mix of all different areas. A good source of stimulation and information for everyone!

For a one-day event, COGMIR was a comparable success to ISMIR. It was a shame that I couldn’t see any other posters since I was presenting one. Also a shame that most of the time my brain was just saturated and fatigued from the last week. As for presenting the poster, I probably did it 5-6 times. Really enjoyed the process of taking in everyone’s comments and suggestions.

Social went moderately crazy this year. It was a little tiring due to some personal problems but in general it was just enjoyable to see all the familiar faces from other conferences. It was hard, though, to try to split time between the different groups of people I know. And I think I actually screwed up a few important conversations. Very sorry… Of course, what makes me happy is to meet new people. People there are full of character! Talking to them during the conferences made me realise that there’s still loads to learn, loads of possibilities to explore. It’s one of the great things about conferences: reflection. Another great thing is purely having fun: bowling and bar crawls were not bad at all! Two records were 4am and 3am, but other nights were more sane, about 12-1am. I get healthy sleep when there are interesting things for me to do!

Next year, ISMIR is going to happen in China. I’m very much looking forward to it, and just wrote an email to volunteer. Hope my communication skills and knowledge of MIR and China can help a little!

Now it’s been more than 9 hours in transit for me. Getting tired. Thinking about writing another post on interesting papers. But perhaps not today. Will follow up soon!