
moronmode's Korean journey


Warning: This page assumes you have some familiarity with Antimoon and/or AJATT. You should read up on them first to get the most out of this page, but the tl;dr is that we learn a language by consuming lots of native input, and we reinforce memorization by turning some of that input into flashcards that we review in an SRS (spaced repetition system).

Otherwise, if you want a quick intro, see my opinionated guide on how to get started as an absolute beginner.

Wall of Learnings

Learnings from my mistakes, or other general advice.

  1. Prioritize immersion and remember Anki is only a tool: Anki is motivating and feels good, but it is almost worthless without a solid amount of immersion. In fact, Anki/SRS is just a supplement to immersion. You can speedrun 30 or 40 cards per day, but without immersion you will not get very far. Do not forget to immerse in native content daily.
  2. Immerse actively without target-language (TL) subtitles: Immersing without any subtitles is important for listening practice, even if it means you understand almost nothing at first. (Shamelessly ripped: No one says, "I'm going to stop using training wheels once I can ride without them." You take them off first.)
  3. Don't study vocabulary (alone): Isolated vocab cards are cringe and a waste of time. Only study words in-context with sentence cards. If you feel like you aren't learning words from putting sentences in SRS, you aren't immersing enough.
  4. Avoid premade decks: Premade decks are OK for absolute beginners to learn some grammar, but do not rely on them. The best deck is the one you make yourself. Once you can find i+1 sentences reliably, make your own deck.
  5. Get a good dictionary: Make sure to get a comprehensive dictionary from the get-go. For Korean, Naver is popular, but IMO it kinda blows compared to the dictionaries on korean.go.kr: 한국어기초사전 (Basic Korean Dictionary) for more common words and simpler monolingual definitions, 표준국어대사전 (Standard Korean Language Dictionary) for a wider range of words and typically more comprehensive monolingual definitions, and 우리말샘 (Urimalsaem) for slang, regional dialects, nonstandard spellings, etc. 한국어기초사전 in particular has a more complete list of definitions per word than Naver, organizes its example sentences by definition, and is available both online and offline in XML format. If you know what you're doing, you can also use this to convert the XML to a GoldenDict-friendly format (xdxf).

Journal / Update Log

2026-04-04 : High-Variability Phonetic Training


Long time no see (once again). I haven't posted an update here in a while for the same reason I gave last time: my progress has been steady with no major changes. I read and listen every day. I add sentences to Anki. There isn't much flair.

But I did recently become interested in a topic called High-Variability Phonetic Training (HVPT). It's a type of minimal pair training that uses recordings from many different speakers, varying in age, gender, dialect, etc., and research shows that HVPT improves your ability to perceive phonemes correctly. It caught my interest in particular because I find that my listening relies too much on context and knowledge rather than true phonetic understanding. For example, I still can't reliably tell ㄱ and ㅋ apart, because when I hear something like 기 or 키, context usually makes it obvious which one it is.

If you don't know what minimal pair training is and don't know enough Korean to follow my example above, an English example is the L/R distinction for Japanese speakers. Japanese people learning English notoriously mix up their Ls and Rs, because Japanese has nothing quite like an English L, and its R sound is not quite like an English R either. A minimal pair is two valid words that have different meanings and differ in only one phoneme, such as "light" and "right". So L/R minimal pair training for a Japanese speaker would go like this: you listen to audio clips from a corpus of people saying "light" or "right"; after each clip, you are shown two buttons and pick which word you think it was; after making a choice, you get immediate feedback on whether you were right or wrong.
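The training loop described above is simple enough to sketch. Below is a minimal Python sketch, assuming each recording is tagged with the word spoken and a speaker id. Clip, run_session, and the answer callback (which stands in for playing the audio and showing the two buttons) are all hypothetical names for illustration, not taken from any particular app.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    path: str     # hypothetical path to one recording of one word
    word: str     # the word actually spoken in that recording
    speaker: str  # hypothetical speaker id; many different speakers is the "high variability" part


def show_feedback(correct, clip):
    print("correct!" if correct else f"wrong, it was {clip.word!r}")


def run_session(clips, pair, answer, n_trials=20, seed=None, feedback=show_feedback):
    """Two-alternative forced choice over one minimal pair, with immediate feedback.

    answer(clip, pair) stands in for the UI: play clip.path, show two buttons
    labelled pair[0] and pair[1], and return the word the learner picked.
    Returns the fraction of trials answered correctly.
    """
    rng = random.Random(seed)
    pool = [c for c in clips if c.word in pair]
    if not pool:
        raise ValueError("no clips for this pair")
    n_correct = 0
    for _ in range(n_trials):
        clip = rng.choice(pool)  # a random clip each trial, so usually a different speaker
        correct = answer(clip, pair) == clip.word
        n_correct += correct
        feedback(correct, clip)  # immediate right/wrong feedback after every choice
    return n_correct / n_trials
```

Drawing a fresh random clip on every trial (rather than replaying one speaker's recording) is what keeps the learner from latching onto one voice's quirks instead of the phoneme itself.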

Practically speaking, for listening, phoneme distinction doesn't matter as much if you understand your target language well enough to tell words apart from context. If I said "turn off the ?ight", it doesn't matter whether you heard an L or an R: you can tell from context that it was "light". But for speaking, if you can't distinguish the sounds, how can you trust that you're even pronouncing them correctly?

When I first got interested in HVPT, I didn't know where to start because I didn't know how to cultivate a corpus of high-variability audio clips. It's easy to find some individual sound clips, but in order to avoid numerous pitfalls, you need a LOT of audio.

So I asked Google Gemini where I could find such a dataset, and it suggested building one out of an existing dataset called KsponSpeech. This is a dataset of 1,000+ hours of transcribed native Korean speech, in the form of sentences. It then built a pipeline to process those sentences into individual words, and created a local web app to test me on minimal pairs. Even with some failures in the pipeline (data quality issues), the result is a database of 1.5 million word clips. Pretty awesome.

In case anybody is interested in building a pipeline for a language they're learning, at a high level here's how it works:

  1. We use the grapheme-to-phoneme (G2P) functionality of the Montreal Forced Aligner (MFA) to generate a pronunciation dictionary for every word found in the corpus.
  2. MFA runs over the audio/transcript file pairs and tries to match up the words in the text with the spoken audio. This is the "alignment" step and the output is, for each audio clip, a TextGrid: a time-aligned annotation file containing tiers for words and phones.
  3. We go through the TextGrid files and cut each word out into its own audio clip. We also store an accompanying database entry pointing to each clip file, with info such as the word, its onset jamo, and its onset "base" for minimal pair matching (e.g. 길 and 킬 both have 일 stored as their "base", so we can use a query like base = '일' and onset_jamo in ('ㄱ', 'ㅋ') to find minimal pairs), plus similar info for vowels.
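The onset/base bookkeeping in step 3 can be done with plain Unicode arithmetic, because precomposed Hangul syllables are laid out as 0xAC00 + (onset × 21 + vowel) × 28 + final. Below is a minimal Python sketch under a few assumptions: the base is formed by swapping only the first syllable's onset for silent ㅇ (which matches the 길/킬 → 일 example), the table layout and clip paths are hypothetical, and the vowel fields are left out.

```python
import sqlite3

# Onset (choseong) consonants in Unicode order, written as compatibility jamo.
ONSETS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
FIRST_SYLLABLE = 0xAC00        # '가'
SYLLABLE_COUNT = 19 * 21 * 28  # 11,172 precomposed syllables
PER_ONSET = 21 * 28            # vowels x finals (including "no final")
SILENT_ONSET = ONSETS.index("ㅇ")


def onset_and_base(word):
    """Split a word into (onset jamo of its first syllable, base with that onset replaced by ㅇ)."""
    index = ord(word[0]) - FIRST_SYLLABLE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {word[0]!r}")
    onset, vowel_and_final = divmod(index, PER_ONSET)
    base_first = chr(FIRST_SYLLABLE + SILENT_ONSET * PER_ONSET + vowel_and_final)
    return ONSETS[onset], base_first + word[1:]


def build_db(clips):
    """clips: iterable of (word, clip_path) pairs; returns a hypothetical in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE clips (word TEXT, onset_jamo TEXT, base TEXT, path TEXT)")
    for word, path in clips:
        onset, base = onset_and_base(word)
        db.execute("INSERT INTO clips VALUES (?, ?, ?, ?)", (word, onset, base, path))
    return db


def minimal_pairs(db, base, onsets):
    """Clips whose words differ only in the first onset, e.g. base='일', onsets=('ㄱ', 'ㅋ')."""
    marks = ", ".join("?" for _ in onsets)
    query = f"SELECT word, path FROM clips WHERE base = ? AND onset_jamo IN ({marks}) ORDER BY word"
    return db.execute(query, (base, *onsets)).fetchall()
```

With this layout, minimal_pairs(db, "일", ("ㄱ", "ㅋ")) returns every 길 and 킬 clip (and skips, say, 밀, which shares the base but has a different onset), which is the pool a ㄱ/ㅋ drill would sample from.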

Some app-level details:


2025-01-01 : 2025年에 온 걸 歡迎한다! (Welcome to 2025!)


It's been a long time since my last post here, mostly because my process has been stable for a while now. But the new year feels like a good time for an update.

I have made a few decisions to up the pace for my Korean journey. I don't like to think of them as new year's resolutions, but they did happen to line up right around the same time. I actually started them about a week ago.

For one, I've committed myself to 20 new sentence cards per day, until I feel like I'm relatively fluent. I previously thought this wouldn't be reasonable based on the amount of immersion I was getting (see my 2024-03-06 post for more context), but now I think it is reasonable. I think discovering an interest in reading changed that, as I was mostly getting audio immersion back then.

I am also going to commit to more audio immersion, particularly on weekends. As I said above, I was primarily getting audio immersion in the past, but once I started reading, reading ended up replacing a lot of my listening time, so my listening is nowhere near where I want it to be. As a challenge to help fix this, I'll be attempting to watch the entire Korean dub of Evangelion every weekend until I completely or almost completely understand it. The series is just short of 10 hours total, so it can easily be split into ~3.3-hour sessions on Friday, Saturday, and Sunday.

Of course, I'll also see where I can make more time during weekdays to get more listening (probably with regular kdramas rather than some headass anime dub), but since weekdays are busier with work, I will prioritize reading over listening to ensure I can mine 20 sentences per day.

Lastly, I have no particular commitment for reading, at least not for now. At my current level, and with the level of books I'm reading, mining 20 sentences per day naturally requires a good amount of reading anyway.


2024-06-07 : Korea trip, general updates


Korea trip

I went on a two-month trip to Korea since I have the option to work remotely for a little bit each year (plus I took some PTO on top of that). It's not super relevant to the actual language journey so I don't have much to spill here, though I did get to pick up a lot of books while I was there.

Regimen update

The biggest change to my regimen since my last update has been more reading. As mentioned above, I picked up a lot of books while I was in Korea and have been reading a lot more. I was surprised by how engaging novels can be even when you have to look up so many words.

Here is what my typical regimen looks like right now:

  1. Anki reviews - about 10 minutes per day. I stopped bothering to listen to audio (and stopped collecting audio altogether), and now I can blaze through my reps pretty quickly.
  2. Anki sentence mining from TV (via Korean subtitles) - probably about 30-60 minutes per day depending on how many I want to mine. I find myself mining upwards of 20 cards in a day (vs my decided minimum of 10 cards) more often than I expected, but I haven't been super consistent since I got back from my trip.
  3. Audio immersion by watching TV (no subtitles) - 30-60 minutes per day.
  4. Reading immersion (novels), looking up every unknown word - as much time as I can make on my lunch break after eating; usually ~45 minutes.

Passive immersion of course lasts for several hours a day. I'm also experimenting with "extensive reading", i.e. reading without looking up anything unknown, but not often enough to count it as part of the regimen.

This is also just a typical day, a kind of minimum. On a lazy weekend morning, for example, I may get much more TV immersion. I also often do more reading when I get home from work, but not consistently.


2024-03-06 : Check-in


I haven't posted here in over a month, so here's a quick check-in of changes over that timeframe.

(Attempting) monolingual

Recently I decided to go as monolingual as possible. For the uninitiated, monolingual learning is when you learn words by reading their definitions in your target language. In your SRS, you test yourself on understanding the sentence and the definition rather than relying on a translation (the back of the card has just the definition, no translation). When you're learning bilingually, as you usually do at the beginning of learning a foreign language, you learn a word by reading its translation in your native language.

For example: the word 본격적 in a bilingual dictionary and in a monolingual dictionary.

I wanted to go cold turkey, but I also really want to hit a consistent 10 cards per day (the third section expands on this), and was struggling to do that going fully monolingual.

If I find an i+1 sentence in immersion, I look up the word in a monolingual dictionary first. Then:

Reading

I'm adding more and more reading to my daily regimen. I started with 외국인을 위한 한국어 읽기 (basically graded reading for Korean learners) and also have started to read quite a bit of Korean Wikipedia (always loved reading random shit on Wikipedia).

Cards per day resolution

In my last post I mentioned being unsure about how many cards to do per day. I've decided to set a goal of at least 10 cards per day, with a limit of 20 for a single day. I expect like 90+% of days will only be 10 cards, and even on days when I go over that minimum, it will probably rarely exceed like 15. 20 is just the hard limit I have my deck set to in Anki.

I do love Anki, but I don't want to spend a ton of free time in it, when that could be used to actively immerse with reading and TV.

Evita

Also as a minor update, I finally finished Evita's grammar deck a little over a week ago. Overall, happy with the experience and grateful for the bootstrap it gave me. Only regret is ever wasting reps on the vocab deck, when I could have just spammed 20 cards per day of the grammar deck and been done in just about 4 months (took about 5 ¾ months for me).


2024-01-28 : How many new cards per day?


A maybe unrealistic expectation

In a previous post I noted that I plan on doing "around 20-25" of my own cards per day when I'm done with Evita's deck (which currently accounts for 20 of my 30 cards per day). However, I've come to realize that this may be unrealistic. Of course, I'm currently running more cards than that per day, so it may seem fine, and I don't mind the daily review burden. But 25 i+1 cards per day for a whole year (25 × 365) is upwards of 9,125 new words[1] in a year. Is this realistic?

Why not?

It's commonly said that 10,000 words in a language is fluency[2] which seems unrealistic for me, personally, to achieve in one year.

It's probably wrong to say you can't reach fluency in one year with enough immersion. But I question if I am immersing enough to sustain that level of progress. As noted in my Wall of Learnings, I understand now that SRS is actually only supplemental to immersion. SRS is not a magical black box for learning anything, and without enough immersion to match, making tons of SRS cards will not get you far.

Finding a realistic number

Well... to be honest this is where I am currently lost, so I have no nice conclusion to this post. lol. I have no idea how I can rigorously decide on a number. I just wanted to make this post as an update, especially since it's been almost a month since I've made a post here. I think I want to do at least 10 and upwards of 15, so I will probably start on the high side and adjust downwards if I feel like I'm not keeping up.

Footnotes

[1] "upwards" because i+1 may just include a grammar point or something rather than a new word.

[2] Yes, "fluency" is a bullshit word and whatever, but it's the best way I can get my point across here. It's a fuckton of words.


2024-01-05 : Retention rates, FSRS, and website updates


FSRS

I decided to make the change to Anki's now-built-in FSRS scheduler. It was a hard decision to make since Anki's SM-2 algorithm is battle-tested, but since I have ~98% retention rate for my Korean sentence cards, I decided to go ahead with it (next section expands on this a bit).

For now I've opted to stick with the default parameters for the scheduler. In a month or two, once I have a lot of reviewing done under FSRS, I plan on running the optimizer on only the cards that were first reviewed on/after my FSRS start date.

Why high retention is maybe bad

(Shoutout to Shirobon for introducing me to this idea) There is some evidence that a high retention rate is bad, because it means you are reviewing too often (simply put: if you reviewed less often, your retention rate would be lower). I don't mind the increased daily burden, but the theory is that memories are best "encoded" when you review something just as you're about to forget it, forcing you to work harder to remember it. For example, if you reviewed something every day for a month, you wouldn't really be trying hard to memorize it, since it would still be fresh in your mind every day. If you reviewed it only 3 times in that month and had to think a little harder each time, the memory would last longer.
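To put rough numbers on "reviewing too often", here is a small Python sketch. It assumes the FSRS-4.5 forgetting curve, R(t, S) = (1 + 19/81 · t/S)^-0.5, where t is days since the last review and S is the card's stability in days; other FSRS versions use different curve parameters, so treat the figures as an illustration only. Solving the curve for t gives the interval at which predicted recall drops to a chosen target.

```python
# Assumed FSRS-4.5 forgetting-curve constants; other FSRS versions differ.
DECAY = -0.5
FACTOR = 0.9 ** (1 / DECAY) - 1  # = 19/81, so recall probability is exactly 90% when t == S


def retrievability(t_days, stability):
    """Predicted probability of recalling a card t_days after its last review."""
    return (1 + FACTOR * t_days / stability) ** DECAY


def interval_for(desired_retention, stability):
    """Days until predicted recall falls to desired_retention (the curve above, solved for t)."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


for target in (0.80, 0.90, 0.98):
    ratio = interval_for(target, stability=1.0)
    print(f"target {target:.0%}: review after {ratio:.2f} x stability")
```

Under these assumptions, a 90% target schedules the next review after exactly one stability's worth of days, while a 98% target schedules it after only about 0.18 of that, i.e. roughly five to six times as many reviews over the same period, each one made while the card is still very easy to recall.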

Website updates

I'm also adding a new box on this page aside from this little "blog" where I'll be summarizing important learnings I've had. I'll still be talking about them here in this blog, but I want to have a small box for quick reference.


2024-01-02 : Method Check-in


How I Study

Seems I forgot in my first post that part of the point of this blog is to record how I'm studying and note changes over time. So it's a good idea to check in on what that looks like now, before the first update.

Anki:

Passive immersion: I typically listen to Korean 24/7 news the whole day at work, except when in meetings (probably about 45-90 min per day in meetings). At home, I continue to listen, but not as constantly as at work.

Active immersion for sentence mining: I watch K-dramas with Korean subtitles on. I watch, not pausing or looking anything up, until I find an i+1 sentence (by now, this happens very often). Pause, look up the unknown word, and make a card[1]. Do this until I have 10 cards and then I'm done with this kind of immersion for the day.

General active immersion: I watch maybe 30-60 minutes of K-dramas, or a full Korean movie, with subtitles off for listening practice. This is a new part of the process as of a few days before writing this.

Footnotes

[1] Since I am still very early on and trying to develop conversational skills, I don't (yet) like to make cards for uncommon or domain-specific words. As for determining what is "common," I typically consult the star-rating when I look up the word in Naver Dictionary. 3-stars is preferable for now, 2-stars is good too, but 1-star or no stars I tend to avoid.


2023-12-30 : Hello World


Intro

This is my first post here but I am already a few months into my Korean journey. It started around the middle of July 2023 and was pretty spontaneous. I had always been interested in other languages and was fascinated by the idea of reaching fluency in another language. Having taken 3 years of Spanish in high school and growing up around many Hispanics (Southern California), Spanish was my first attempt at that, but I admittedly never became very good at it - enough skill to read a bit and have daily conversations even with strangers, but that's all. After I left my hometown, I stopped putting any time towards Spanish anyway. That was almost 6 years ago now.

Feeling a bit inspired to pursue this interest again, I had passively thought about learning Japanese or Korean for a few weeks. Japanese was on the table particularly because I have many friends who either speak or are learning Japanese, though I ended up more or less randomly deciding to learn Korean after a past-midnight coding session. After a few weeks of learning Korean, I decided to look into Japanese for real, but I stuck with Korean just because 한글 is obviously easier than learning thousands of Kanji, and because, by anecdotal claims, Korean culture is much more "compatible" with American culture than Japanese culture is. In the end, though, I found that studying the Han characters is not so bad, and I have in fact studied a few hundred 漢字 (Hanja), which makes learning Sino-Korean words very easy. On top of that, all my Japanese-speaking friends speak English anyway, so it's not like I would have gotten much real utility out of Japanese.

In the beginning I used Duolingo because I had used it in the past for Spanish to some "success"[1], but with Korean I quickly realized it was actually pretty useless. After fumbling around with books and online videos for a few weeks, a friend of mine introduced me to Antimoon/AJATT, which he used to become fluent[2] in Japanese in two years.

Anki

My Anki journey started on 2023-08-31, and now I am sitting at a streak of 122 days. I started off with the popular Evita decks: Grammar and Vocabulary. Though Evita's grammar deck is great, I don't want to rely on premade decks forever; since I started from almost nothing, I'm using it as a stepping stone of "curated" comprehensible cards. Eventually I got to the point where I could consistently mine sentences from TV shows and other native content and make my own cards from them. I'm currently studying 25 new cards per day, split into 15 Evita grammar cards and 10 of my own cards (I started off with 14 total, 7/7 from Grammar/Vocab, and gradually increased this).

I also decided sometime early December that studying vocabulary in isolation is mad cringe and dropped Evita's vocabulary deck, so I'm just using the grammar deck to get exposure to the many grammar constructs in Korean.

At my current rate I should be done with Evita's grammar deck by mid-March 2024, at which point I will be solely sourcing sentences on my own by finding i+1 input.


Footnotes

[1] It was only a coincidence that I was making any progress; any real progress likely came from studying independently and interacting with Spanish speakers every day.

[2] "Fluency" is of course a loosely defined word. By his own account, by this 2-year mark, he was able to "comprehend 90+% of native content" and the last 10% was mostly advanced or less common vocabulary.