Language learning phase one

I went to the North American Polyglot Symposium in June 2016. My passion for learning languages was rekindled there (not that it was ever completely gone!)

I’ve been listening to many good podcasts about language learning since (I can’t believe I didn’t even know they existed!):

I will teach you a language:

Actual fluency:

Sometimes the topic is a little repetitive, but every time I think about it, there’s something new, especially if I’ve learnt a new language.

There are more good resources I’ve been using:

Multilingual DVDs: I have loads of them at home now (I thought of using them for research at one point!). They’re very easy to get in Europe, and a single movie/anime/TV show is always subbed/dubbed in 3+ languages. Now Netflix Originals have this feature, too. Learning with fun!

Various YouTube channels (영국남자 Korean Englishman, Golden Moustache, Easy Languages, etc.) and a Chinese website that has proper lessons:

The real thing is always practising with native speakers, or just asking them questions. I’m lucky to have many friends who like to Skype with me. Thank you guys so much! It’s also good to have a teacher with more experience in teaching the language, and I do pay a little for that. I’m gonna shamelessly paste my invitation code here (it’s a really nice system though!):


More personal experiences:

The most fun I get from language learning is looking at different representations of the same object and new structures for new concepts.

I don’t know why, but it’s also very automatic for me to repeat after people, especially when it’s a new language and it’s something worth remembering. Imitation, down to every detail, is fun. Stretch the perception!

Plus, I like to talk to myself in different languages: different melodies associated with the same emotion, more ways to express myself, etc.

Finally, it’s just much more reassuring when you understand more, right?

TAR, Teaching as research, and education

This draft has been sitting here for a very long time. Although the project is not entirely finished (I don’t know if it’s ever going to be entirely finished), I’ll just say what I remember from this experience anyway.

In 2016, I participated in the Teaching as Research programme with the Centre for Excellence in Teaching and Learning at the University of Rochester. It was a nice experience, but due to my early departure, I couldn’t get all the fellowships… But that’s OK.

What’s more important is what I remember from it:

  1. Getting the IRB protocol approved (to get permission to conduct an experiment with human subjects)
  2. Participating in workshops for teaching and diversity, curriculum design, new methods of teaching, etc.
  3. Actually participating in the class and distributing questionnaires

The theme of our project was: whether short breaks/pauses improve students’ experience in class.

We distributed a short questionnaire (5 questions) in every class. The questions were about their interest level in the class, their focus, their enjoyment, and whether they felt ready to tackle the homework.

The class met twice a week, so we introduced a short break (usually a comic) in one session and no break in the other. The statistics from the questionnaires are shown below:


The box on the left of each subplot is the session without breaks, and the one on the right is the session with breaks. We do see that the difference is not great, but the medians are a little higher. Since we also had a different number of subjects in every class, it’s actually very hard to tell…

But it was nice that some students left comments saying they liked the comics! (We intentionally didn’t tell the students that this was an experiment.)

I also tried to make some individual plots for the first three lectures. Lecture 2 had the comic while lectures 1 and 3 didn’t. It’s interesting to see that in lecture 2, more people chose 4 & 5 on the enjoyment scale.

I find it hard for anyone to draw a definite conclusion from an experiment like this, though. Also, since I was taking the class while running the experiment, things were complicated a little bit more.

Anyway, it was fun to talk with people about education. I think there will be more teaching duties for me one day. Hopefully there will be something fun in them!


The symmetry between recall and precision

Recall and precision, arguably the most common metrics, are very easy to understand once you see the symmetry between them.

What makes everything clear is the reference: what is the reference frame for this number (the denominator)? For recall, the reference frame is the ground truth data; for precision, the reference frame is the data we found using our algorithm. The numerator is the same for both, so that part is simple.

It’s basically two different normalisation methods. Recall normalises by the number of things we are trying to approach (the ground truth), so recall tells you how much of the “truth” we recovered; precision normalises by the number of things we calculated, so precision tells us how likely our calculations are to be correct.
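As a minimal sketch of this normalisation view (assuming exact matches and no duplicate values — the function name is my own):

```python
def recall_precision(ground_truth, predicted):
    """Same numerator (true positives); only the denominator differs."""
    true_positives = len(set(ground_truth) & set(predicted))
    recall = true_positives / len(ground_truth)   # normalise by the truth
    precision = true_positives / len(predicted)   # normalise by our output
    return recall, precision

# 2 of the 3 true items were found; 2 of the 4 predictions were correct.
r, p = recall_precision([1, 2, 3], [2, 3, 4, 5])
print(r, p)  # recall = 2/3, precision = 0.5
```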

The ideal case would be a bijection: for every ground-truth value there is exactly one corresponding calculated value, and vice versa. But this is not always the case! If repeated values are possible in the ground truth or the calculated values, be careful not to double count!

The problem is clearer when we allow a certain degree of fuzziness:


For example, in the problem in the figure above, we want to see: given the ground truth (in red), how close are the calculated values (in orange)? So we need a threshold to define how close is close!

It’s easy so far because in the case above I plotted a bijection. How about the cases below:

What should we do now? The precision and recall had better not be > 1, right?

One way is to “find” a one-to-one (injective) mapping (not necessarily onto, i.e. surjective) by not counting a value, even if it is close to a ground-truth/calculated value, when that value has already been matched.

Also, depending on which reference frame you are in (whether you are calculating recall or precision), you need to discard different values. The two mappings can be different! And the way to construct them is symmetric. You can even find an injective/surjective symmetry there. Fun fun!
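A sketch of this symmetric construction (my own greedy variant, assuming 1-D values and an absolute-distance threshold; `fuzzy_hits` is a hypothetical helper): each side builds its own injective mapping by consuming each match, so neither ratio can exceed 1:

```python
def fuzzy_hits(reference, candidates, threshold):
    """Count reference values matched injectively to candidates:
    each candidate may be used at most once, so hits never exceed
    len(reference) and the resulting ratio stays <= 1."""
    available = list(candidates)
    hits = 0
    for ref in reference:
        for cand in available:
            if abs(ref - cand) <= threshold:
                available.remove(cand)  # one-to-one: consume the match
                hits += 1
                break
    return hits

truth = [1.0, 2.0, 3.0]
calc = [1.1, 1.2, 3.05]

# Recall: the reference frame is the ground truth.
recall = fuzzy_hits(truth, calc, 0.2) / len(truth)
# Precision: the reference frame is the calculated values -- the symmetric call.
precision = fuzzy_hits(calc, truth, 0.2) / len(calc)
print(recall, precision)
```

Note how swapping the two arguments turns one metric into the other: that swap is exactly the symmetry described above.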

Commutativity and Associativity

I remember learning associativity and commutativity when I was in elementary school. The formula for associativity is (a * b) * c = a * (b * c) and for commutativity it is a * b = b * a, but they looked somewhat similar and confusing to me back then.

I think it’d be fun just to draw an illustration:
The key is to think of a concrete operation * (the weird box shape in the illustration)

Think of it as a meat grinder or a black box or an actual program. Anyway, it takes two inputs and gives one output.

So now it becomes clear that the associativity is about how to combine the two operations and the commutativity is about the property of the operation itself.

However, there is a connection between them: associativity is about switching the order in which the inputs get the operation performed on them first. It’s commutativity “in time”!

If you still don’t get it, here are more details: commutativity basically says that when the input order is changed, something is preserved. So associativity is actually also a kind of commutativity: it doesn’t matter whether you mix a and b first or b and c first. On the other hand, as mentioned before, commutativity is more like switching back and forth in space while preserving some property.

Anyway, here’s a fun thing. Knowing that in maths commutativity seems to be less common than associativity, I searched for a real-life example of an operation that is commutative but not associative. From the mighty Stack Overflow, there is a surprisingly simple one: giving birth! It then occurred to me that when probability is involved, these properties break down…
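A quick numeric illustration (my own example, not from that thread): the two-argument average is commutative but not associative:

```python
def avg(a, b):
    """Two-argument average: commutative but not associative."""
    return (a + b) / 2

# Commutative: swapping the two inputs changes nothing.
assert avg(1, 3) == avg(3, 1)  # both are 2.0

# Not associative: the grouping matters.
left = avg(avg(1, 3), 5)   # avg(2, 5) = 3.5
right = avg(1, avg(3, 5))  # avg(1, 4) = 2.5
assert left != right
```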

Going even further, in category theory the operation becomes (associative) composition of functions, and people then explicitly call the diagrams commutative!

Using counterpoint rules to check messy symbolic music data

Thanks to my colleague’s suggestion and dataset, I tried using part of my counterpoint programme to check whether the rules of species counterpoint are helpful for error detection in automatic transcription.

I just tried it on one song and its two voices:

It’s said to be a hymn from long ago. But probably not long enough.

I ran tests checking for parallel fifths, parallel octaves, and inharmonic configurations on the first note of each bar. The original song shown above has more than 5 inharmonic configurations (measures 4 and 7, for example).

Update: an obvious mistake I made here: I shouldn’t have compared the inner two voices! I ran it again on the first voice against the bottom voice, and there were no rule violations.

And then I ran the tests on the version with messy notes and durations, as shown below. This time we got only 4 inharmonic places.

No parallel octaves or parallel fifths in either one. I guess this kind of error is not that likely to happen by chance.
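This is not my actual programme, but a minimal sketch of what a parallel fifth/octave check between two voices can look like, assuming each voice is a list of MIDI pitches aligned note by note:

```python
def find_parallel_intervals(voice_a, voice_b, interval_classes=(7, 0)):
    """Flag consecutive note pairs where both voices move and the
    interval (mod 12) stays a perfect fifth (7) or octave/unison (0)."""
    violations = []
    for i in range(len(voice_a) - 1):
        prev = (voice_a[i] - voice_b[i]) % 12
        curr = (voice_a[i + 1] - voice_b[i + 1]) % 12
        # Oblique motion (one voice holds its note) is allowed.
        moved = voice_a[i] != voice_a[i + 1] and voice_b[i] != voice_b[i + 1]
        if moved and prev == curr and prev in interval_classes:
            violations.append(i)
    return violations

# Example: C4-G4 moving to D4-A4 is a parallel fifth.
print(find_parallel_intervals([67, 69], [60, 62]))  # [0]
```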

Using MOSAIC, an algorithm originally used to segment multiple aligned DNA sequences, to find segments in Chopin’s Mazurka Op.24 No.4

Thanks again to my colleague’s recommendation, I tried a new algorithm on the data I have been working on for a while: Chopin’s Mazurka (for related posts, see Visualising the problems with current music pattern extraction algorithms and Algorithmic music pattern collection).

The input is a binary matrix encoding whether a pattern is present at a particular point in time, rather like the input for the support figure in Visualising the problems with current music pattern extraction algorithms. A toy example: if we have 3 detected patterns [1,1,0,0], [0,0,1,1], [0,1,1,1] (ones mean the time point belongs to the pattern, zeros mean it does not) in 4 seconds of music, the matrix will look like

1 1 0 0
0 0 1 1
0 1 1 1

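Such a matrix is easy to build programmatically; here is a sketch (the helper `patterns_to_matrix` is my own, assuming each pattern occurrence is a (start, end) index pair):

```python
import numpy as np

def patterns_to_matrix(occurrences, n_timepoints):
    """Build the binary pattern-presence matrix: one row per pattern,
    one column per time point (1 = the time point belongs to the pattern)."""
    mat = np.zeros((len(occurrences), n_timepoints), dtype=int)
    for row, (start, end) in enumerate(occurrences):
        mat[row, start:end] = 1  # end is exclusive
    return mat

# The toy example from the text: 3 patterns in 4 seconds of music.
print(patterns_to_matrix([(0, 2), (2, 4), (1, 4)], 4))
# [[1 1 0 0]
#  [0 0 1 1]
#  [0 1 1 1]]
```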
The paper on the algorithm can be found here (it’s only two pages!)

The algorithm is interesting but the results are plain. Using my colleague’s scripts, the segmentations are as follows:


The plot at the top corresponds to the stem plot of the PI in the paper. As we can see, there is a lot of noise and there are almost no plateaus, which are important for segmentation in this algorithm.

The middle plot shows the segmentation results and the bottom plot shows the input data (it’s black and white because the data is binary; it’s actually the same plot as in Algorithmic music pattern collection).

Maybe because the vocabulary is only binary, or maybe because the algorithm is designed for DNA sequences rather than music, it doesn’t give a great result. More tweaking in the future, perhaps.