Solving Japanese learning problems with code

Leonardo Teteo

While I polish my skills I also like to get involved in learning languages. My native language is Portuguese, I learned English, and now I'm learning Japanese due to my love for Japanese culture. Yesterday I had the idea to use NodeJS to automate a rather boring task that was hindering my learning. Let's start from the beginning.

Background

I have been learning Japanese since 2015. It has been a long journey and I am still far from fluent, but I am at a stage where I can read manga (Japanese comics) with relative ease and books with the assistance of a dictionary. This week I started a new book and decided to give another chance to Anki, a very powerful flashcard application that is very popular among Japanese learners, although it can be used to learn virtually anything. I have used it before in the same way: I read the book with a dictionary open, every word I don't know goes into a .txt file to be added to Anki afterwards, and then I start the memorization process. However, there is a problem, which is probably what made me stop using Anki before. Let's go into that.

Problem

Anki has an import feature where you can make a .txt file declaring both sides of the flashcard separated by a semicolon, like this:

傍ら;side, edge, beside, besides, nearby, while (doing)
飢える;to starve, to thirst, to be hungry
へたり込む;to sit down hard, to sink down to the floor
払い除ける;to ward off, to brush away, to fling off, to drive away

But you have to create this file somehow, and at first I did it manually. I took note of all the words I didn't know, a maximum of 50 per day so as not to have too much to learn at once, and afterwards I went to the dictionary and copied the meaning to the other side of the flashcard. Furthermore, Japanese has three types of characters: hiragana, katakana and kanji. Simply put, kanji represent ideas (for example, 愛 means love), while hiragana and katakana represent sounds and are used to describe how kanji must be read. Using 愛 as an example, its reading in hiragana is あい, which in our alphabet is written as ai. For a more detailed explanation, you can refer to Wikipedia, which has a very good summary of how it works. Therefore, it is also important to remember how words with kanji are read, so I had to make another file that looked like the one below, with each word and its reading separated by a semicolon:

傍ら;かたわら
飢える;うえる
へたり込む;へたりこむ
払い除ける;はらいのける

The problem is that this manual task is very boring and time-consuming. I had to copy every word, look it up in the dictionary, copy the meaning and reading, and then import everything into Anki. The dictionary is digital, so it was a matter of Ctrl+C and Ctrl+V, but it still took 30 minutes or so to have 50 words ready. It is also error-prone, since I can confuse reading with meaning, put something in the wrong file, or mix up words' meanings by putting them in the wrong rows. I had to do something to improve this experience and make reading fun again, so I came up with the idea of writing a script to do it.

Solution

Since it was a relatively simple script, I decided to take this opportunity to practice NodeJS, which I'm learning right now. However, it is not as simple as it looks, since a dictionary is needed to feed the application. Luckily, I had a dictionary sitting in DynamoDB tables that I created for another project, with Lambda and API Gateway in front to access it. Hopefully, in the near future I can talk about this other project as well, but for now assume that the script has access to an API that returns the words found for the term given as a parameter, like this: example.com?term=愛.

With this major problem solved, it was just a matter of calling the API, parsing the response and writing the files. The entire script was made using just three libraries:

  • axios: HTTP client library to call the API. I have had very good experiences with it in the past, since it seems much more straightforward than the others I have used.
  • fs: built-in NodeJS module for file I/O.
  • progress: shows a progress bar while the work is done, which makes the script feel more responsive.

First I read the input file, which has one word per line, split its content and stored the words in an array to be used later. The variables that will store the result are also declared:

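const fs = require('fs');

// Read the whole input file (one word per line) as UTF-8 text.
let input = fs.readFileSync('input.txt', {encoding: 'utf8'});
// Accept both Windows (\r\n) and Unix (\n) line endings.
let terms = input.split(/\r?\n/);
let outputReading = "";
let outputMeaning = "";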
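
Input files often end with a blank line, carry stray spaces or mix line endings, any of which would leave empty or malformed terms in the array. A minimal sketch of a more defensive parser follows; parseTerms is a hypothetical helper, not part of the original script:

```javascript
// Hypothetical helper (not from the original script): turns the raw
// contents of the input file into a clean list of terms.
function parseTerms(text) {
    return text
        .replace(/^\uFEFF/, "")              // drop a byte-order mark, if any
        .split(/\r?\n/)                      // Windows or Unix line endings
        .map(line => line.trim())            // remove surrounding whitespace
        .filter(line => line.length > 0);    // skip blank lines
}

// Example with mixed line endings, padding and a trailing blank line.
const sampleInput = "傍ら\r\n飢える\n\n へたり込む \r\n";
console.log(parseTerms(sampleInput));
```

In the script this would take the place of the plain split, for example `let terms = parseTerms(input);`.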

Then I created an axios instance and the function I use to call the API and get the word I want:

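const axios = require('axios');

// Shared client; the base URL and API key are placeholders.
const instance = axios.create({
    baseURL: "https://api.example.com",
    headers: {'x-api-key': "xxxxxxxxxx"}
});

// Look up a term and return the first result (undefined if there is none).
async function getWord(term){
    const response = await instance.get("/dictionary", {params: {term: term}});
    return response.data.body[0];
}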

In the function I call the API and return the first element of the response body. The body is an array with the possible results for the search. A simplified description of the schema is as follows:

{
    "statusCode": 200,
    "body": [
        {
            "Id": 1,
            "kanji": [],
            "kana": [],
            "sense": [
                {
                    "gloss":[]
                }
            ]
        }
    ]
}

The response has more elements detailing the entire word, but what matters for the problem I was trying to solve is the following:

  • kana: An array with all the readings of the word. A word can have more than one reading, but the first one in the array is the most popular and generally the one I'm looking for.
  • sense: An array with the meanings and their information: the part of speech, dialect, related words, antonyms, etc. A word can have different meanings, and each meaning can be expressed by several words that are synonyms of each other.
  • gloss: The synonyms are stored here in an array.

All the objects stored in the arrays mentioned have a text field where the information we are interested in is stored. Going back to our previous example with the word 愛, this is what the response looks like in a summarized way:

{
    "statusCode": 200,
    "body": [{
        "kanji": [{
            "common": 1,
            "text": "愛",
            "tags": []
        }],
        "kana": [{
            "appliesToKanji": ["*"],
            "text": "あい",
            "common": 1,
            "tags": []
        }],
        "Id": 1150410,
        "sense": [{
            "gloss": [{
                "lang": "eng",
                "text": "love"
            }, {
                "lang": "eng",
                "text": "affection"
            }, {
                "lang": "eng",
                "text": "care"
            }]
        }, {
            "gloss": [{
                "lang": "eng",
                "text": "attachment"
            }, {
                "lang": "eng",
                "text": "craving"
            }, {
                "lang": "eng",
                "text": "desire"
            }]
        }]
    }]
}

After getting the response, to handle it and get the result in the format I want, I created two functions to handle the meanings and the readings, respectively. Below we have the handleMeanings function as an example:

function handleMeanings(term, word){
    const meaningsArray = [];

    // Collect the text of every gloss across all senses.
    for(const sense of word.sense){
        for(const gloss of sense.gloss){
            meaningsArray.push(gloss.text);
        }
    }

    const joinMeanings = meaningsArray.join(", ");

    // One import line per term: "term;meaning, meaning, ...".
    return term + ";" + joinMeanings + "\r\n";
}

For each sense I iterate through its list of glosses and push each text to an array, then I join everything. Pretty simple, and just what I want.
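
The readings function and the loop that ties everything together are not shown in this post, so here is a rough sketch of how they could look. It assumes the first kana entry is the reading to keep, as described earlier. The names handleReadings and processTerms, the fallback to the term itself, and the notFound list are assumptions made for illustration. The lookup function is passed in as a parameter so the sketch runs without the real API.

```javascript
// Sketch only: names and fallbacks here are assumptions, not the post's code.

// Build one "term;reading" line, using the first kana entry as the reading.
function handleReadings(term, word) {
    const first = word.kana && word.kana[0];
    // Assumption: if no reading is listed, fall back to the term itself.
    const reading = first ? first.text : term;
    return term + ";" + reading + "\r\n";
}

// Compact equivalent of handleMeanings, repeated so the sketch is self-contained.
function handleMeanings(term, word) {
    const glosses = word.sense.flatMap(sense => sense.gloss.map(g => g.text));
    return term + ";" + glosses.join(", ") + "\r\n";
}

// Look up every term in order and accumulate both output files as strings.
// getWord: async function (term) => word object or undefined.
// onTick: called once per term, e.g. to advance a progress bar.
async function processTerms(terms, getWord, onTick = () => {}) {
    let outputReading = "";
    let outputMeaning = "";
    const notFound = [];

    for (const term of terms) {
        const word = await getWord(term);
        if (word) {
            outputReading += handleReadings(term, word);
            outputMeaning += handleMeanings(term, word);
        } else {
            notFound.push(term); // keep track of terms the dictionary lacks
        }
        onTick();
    }
    return { outputReading, outputMeaning, notFound };
}
```

The two resulting strings could then be saved with `fs.writeFileSync`, for example to reading.txt and meaning.txt (file names chosen here only as an example), and imported into Anki.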

Conclusion

For the people who saw the title and the "scary" image and thought it was something much more complex, I'm sorry. It was very simple and even anticlimactic, but it is really helping me keep up with my studies. Now the problem is doing all the reviews. I will try my best! :D
If you think something can be coded better, please let me know. NodeJS is still new to me!

Leonardo Teteo

@leoat12

Java Web Developer with a passion for Spring and cloud computing. Know a thing or two about AWS. Trying to learn NodeJS lately with the help of TypeScript.

Discussion


Really cool, Leonardo! I'm a big fan of Anki as well; it's a great tool for learning languages or really anything. Looking forward to hearing more about your NodeJS and Japanese learning journeys in the future!

 

Thanks! This was the first idea I really implemented completely so far, I'm very happy with the result. Hopefully, in the near future I will talk about my other ideas!

 

Wow... Would you gimme some tips on learning English? Like, I already have the basics and I would like to improve... And I've learnt a bit of Japanese also, but didn't manage to learn how to read or write... Just learnt through translation or subtitles on anime like Naruto... My native language is Portuguese as well... 😍😍😍 Appreciated your learning journey... Would you please share your sources?

 

My English learning path was nothing out of the ordinary at the beginning. I took an English course while I was in high school and spent my free time reading manga and comics in English. I also talked to people on international forums, practicing my writing. When I went to college I had the opportunity to go to the United States for a year on an international exchange, so I practiced a lot there, and that is where I reached an advanced to fluent level. There weren't any specific sources other than simple practice.