What I learned during my Master's in NLP

#nlp #datascience #octograd2020 #education

The GitHub graduation initiative gave me the idea to write this post.

I started my Master's degree in NLP at LMU Munich, Germany, in 2018. I had been interested in machine learning for about a year and had completed Coursera's Deep Learning specialization. I wanted to learn NLP more deeply and become a kind of expert in the topic. That's what a Master's degree is supposed to make you, right? After spending 2 years attending lectures on it and doing programming assignments, I should be able to contribute to the community. Or so I thought.

This degree was built for 2 types of backgrounds: computer science and linguistics. There were introductory classes for both, which meant that whichever background you came from, you didn't need about half of them. Because the students had such diverse backgrounds, the bar was set low.

Go beyond lectures

I realized that if I just went to class and did the assignments, I wouldn't learn much. So first I looked for more advanced courses. I took the most advanced computer science courses I could find. These were very demanding, but they gave me a lot of room to grow.

Between student jobs and assignments you probably won't have much time left, but if this situation applies to you, I strongly encourage you to use the resources provided by the university in any way possible. Attend other lectures as a guest, read the course literature, ask questions in class and try to talk to the professor or instructor about the topic. They will tell you things they don't mention in class.

Read papers

NLP has been moving incredibly fast in the past few years. The course materials we were given tried to keep up, but they couldn't. Using Python in assignments was a very recent change, and there was barely one Deep Learning course, in the last semester. I realized that if I really wanted to learn what was going on in NLP, I would have to learn it on my own. I started reading papers.

I picked a paper that seemed introductory, but I didn't understand much of it, so I went to the papers it was based on. I didn't understand those either, so I kept going back and accumulating papers until I had a 2003 paper I could understand, and 15 others in my queue.

I slowly started to go forward in time and understand the basics of NLP, in a way that I wouldn't have in my lectures. I used this opportunity to make a GitHub repo with my notes, which has been growing steadily since then.

Understanding papers was challenging at first, but with practice it gets easier to follow what they present. In a way, I think this long and arduous task of reading so many papers was my actual Master's in NLP.

Do programming projects with a professor if you can

The programming side of many of my lectures was optional, which meant most people used their energy on the other, more demanding mandatory assignments. But I saw an opportunity here. I could do a project, have much of the professor's attention because they wouldn't be busy with other students, and get more personal mentoring.

And that's what I did. My professor/mentor was a doctor in Linguistics turned NLP researcher, who offered a very fresh view on NLP, since most of the other professors were computer scientists and their ideas came from the other side.

This professor didn't challenge me on the coding side, but she asked me difficult questions about my results, their interpretability, meaning and impact, and forced me to think from a completely different perspective than the one I was used to.

Many times I'd spend hours coding and finally get some results, and I'd think 'ok, work done'. But then I'd show her my work and she would ask me a thousand questions and I'd have to go back and think. Not think about how to code this or that, but think about what the results were telling me. It was frustrating and fantastic; I learnt so much.

Unfortunately the semester ended and my workload grew, but there is still so much material to work with, and I want to go back to it. This is the project I did: exploring the semantic similarity between contextualized embeddings.

Own your master's thesis

By the time I had to start thinking about the master's thesis, I had read many papers, knew what the field was currently doing and had some sketches of research ideas I could work on during my thesis. I looked for approachable professors who'd be willing to supervise a thesis, and I did a tour of professors, seeing what projects they were working on.

I was surprised to see that they were delighted by my initiative, that I had taken the time to read papers and even to have my own research ideas. They told me most students just take any available project, follow the instructions of the professor and just present something.

I realized then how a little initiative can make you stand out in the crowd. In the end I picked a project suggested by my supervisor, but I made sure to make it mine. I tried to understand it deeply and read the papers related to it, which by this time didn't feel like much work.

In the beginning I didn't understand what I was doing; I was just doing what he told me. But slowly, by asking questions and recognizing my knowledge gaps, I started to own the thesis. I started to talk more in meetings, come up with ideas, and just be in control of my own thesis.

As of May 2020, I'm still working on it. I recently managed to replicate a paper from April and even improve its results by 2%. I don't know what will come of my thesis or if we'll manage to publish a paper, but I wish I could work on this longer. I'm learning so much.

Conclusion

In the end, whatever degree you choose, you will only learn what you want to learn. So I encourage you to put in the work and do your best. There will be many people graduating next to you, so do the work to stand out.

Top comments (2)

amananandrai • Jun 8 '20

A very insightful blog. Most fresh graduates aspiring to be data scientists and NLP engineers do not try to analyse the problem much and jump directly into coding and designing models. It would be ok to take a step back, do some research, read papers and analyse the problem. Then you will have fresh ideas to solve the same problem. In the end I really loved your blog and approach to learning NLP.

Ane Berasategi • Jun 8 '20

Thank you! I agree, with all the hype in machine learning many people want to just get into it and train neural networks, which you can do of course, but if you do just that, you don't understand what you're doing. I didn't get many fresh ideas haha but it did make me feel like I was understanding what was happening in the field, and I could understand new papers more easily.