Joe Holmes

GPT-4 Spaced Repetition with "The Making of Modern Ukraine"

tl;dr

I developed a series of Colab notebooks that use the transcripts of Timothy Snyder's "Making of Modern Ukraine" course from Yale University to create spaced repetition flashcards for Anki using GPT-4.

They're intended to supplement listening to the lectures themselves. After finishing each lecture, you can review the key concepts from the lecture with Anki each day. In my experience, this resulted in super effective digestion/retention of the material--in this case, a sweeping and fascinating history from ~988 AD to the present.

You can check out the two Colab notebooks here:

And you can download the whole course's Anki flashcards, as well as its collection of generated markdown notes, here.

Finally, if you use these resources or want to express support for this project, please donate what you can spare to Razom.

Background history

I first discovered the awesome combination of spaced repetition and high-quality online courses while working through The Teaching Company's course Everyday Engineering in 2021-22. For each of those lessons, I took a first pass of notes and screenshots, then followed it up with a second pass in which I converted my notes into flashcards. I did so using Obsidian and Flashcards, one of several community plugins for uploading notes to Anki.

While this manual process of card creation was a powerful way to retain the material, it took a lot of work over a long period of time (over a year!). So when ChatGPT and GPT-3.5 were released, I knew I wanted to try using them to streamline the process.

My first naive attempt was to upload chapters from US History's PDF guidebook and ask GPT-3.5 to convert each passage into a few cards in a single prompt. It did not do a good job. But when GPT-4 was released and I was fortunate enough to get API access, I decided to dive in further and try developing something more robust.

For my first attempt, I used The Teaching Company's Fall and Rise of China course. The lessons were remarkably clear and vivid, and GPT-4 did a great job generating cards, but the course's terms of service prohibit duplicates and derivative works. I knew I wanted to share this process with the wider world somehow, so for my next experiment I opted for Timothy Snyder's Making of Modern Ukraine course, which is offered for free on Yale University's YouTube channel.

How it was built

The first notebook, the course importer, creates a .json file with all the course data using YouTube, Substack, and the OpenAI embeddings API. In the second notebook, all I had to do was upload that JSON file to Colab each time I wanted to generate new cards.

The notebook uses the course playlist's YouTube ID to fetch the information for each video using the YouTube Data API v3. If you want to follow along, you can create your own API key in the Google Cloud Platform console under "APIs and Services."
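A minimal sketch of that playlist lookup with the google-api-python-client package might look like the following. The variable names and the single-page fetch are my own simplification rather than the notebook's actual code, and a full playlist needs pagination via nextPageToken:

from googleapiclient.discovery import build

# Sketch: list the videos in the course playlist (first page only).
# YOUTUBE_API_KEY is assumed to hold the key created in Google Cloud Platform.
youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)
response = youtube.playlistItems().list(
    part="snippet",
    playlistId="PLh9mgdi4rNewfxO7LhBoz_1Mx1MaO6sw_",
    maxResults=50,
).execute()

videos = [
    {"title": item["snippet"]["title"],
     "id": item["snippet"]["resourceId"]["videoId"]}
    for item in response["items"]
]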

You'll also need your secret key from OpenAI, along with GPT-4 access.

For each lecture, Professor Snyder wrote a blog post on his Substack that provides a brief summary along with a list of some key terms from the lesson. I ended up making a list of these, too, so I could experiment with them later.

The YouTube API gets each video's title, then the youtube-transcript-api pip package gets each one's transcript. As you've likely noticed if you've ever messed with YouTube's generated transcripts, the transcript is chunked into short, irregular strings of words that are each a few seconds long--not very useful for our purposes.

I found through experimentation that one minute was a good chunk length (it translated to a paragraph of text, more or less). After splitting the transcript into one-minute chunks, I used the nltk.tokenize module from the Natural Language Toolkit to ensure each chunk started and ended with a complete sentence.
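As a rough sketch of the chunking step (the function name and exact windowing logic are my own illustration, not necessarily what the notebook does), grouping transcript entries by start time looks something like this; nltk's sent_tokenize can then be used to shift a trailing partial sentence into the next chunk:

from youtube_transcript_api import YouTubeTranscriptApi

def minute_chunks(video_id, chunk_seconds=60):
    # Each transcript entry looks like {"text": ..., "start": ..., "duration": ...}
    entries = YouTubeTranscriptApi.get_transcript(video_id)
    chunks, current, window_end = [], [], chunk_seconds
    for entry in entries:
        if entry["start"] >= window_end and current:
            chunks.append(" ".join(current))
            current = []
            window_end += chunk_seconds
        current.append(entry["text"])
    if current:
        chunks.append(" ".join(current))
    return chunks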

Next, to scrape the Substack posts, I used the readabilipy library, a Python wrapper of the awesome readability lib from Mozilla. You can see the code that accomplishes all of this here.
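For reference, a minimal sketch of that scraping step with readabilipy could look like this (fetching the page with requests is my assumption, and use_readability=True requires Node.js to be available in the environment):

import requests
from readabilipy import simple_json_from_html_string

html = requests.get("https://snyder.substack.com/p/making-of-modern-ukraine-lecture").text
article = simple_json_from_html_string(html, use_readability=True)
# plain_text is a list of {"text": ...} blocks; join them into one string.
blog_text = " ".join(block["text"] for block in article["plain_text"])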

At this point, the data object of the JSON file looks like this:

const course_data = {
  title: 'The Making of Modern Ukraine',
  teacher: 'Timothy Snyder',
  tag: 'MMU',
  lectures_url: 'https://www.youtube.com/playlist?list=PLh9mgdi4rNewfxO7LhBoz_1Mx1MaO6sw_',
  lectures: [
    {
      title: 'Timothy Snyder: The Making of Modern Ukraine. Class 1: Ukrainian Questions Posed by Russian Invasion',
      id: 'bJczLlwp-d8',
      number: 1,
      blog_url: 'https://snyder.substack.com/p/making-of-modern-ukraine-lecture',
      lecture_transcript: [
        {start: '00:00', end: '01:00', text: 'one minute of talking here'},
        // and so on until the end of the lecture
      ],
      blog_text: 'scraped blog text'
    },
    // and so on until the end of the playlist
  ]
}

With all the content imported, I moved on to creating embeddings of each chunk of transcript text. Embeddings are arrays of model-generated numbers that encode the meaning of a string of text so that semantically similar strings can be ranked against each other. In this project, they're used to look up the relevant sections of the transcript when improving the question and answer pairs.

If you want a great technical introduction to embeddings, Jay Alammar's blog post on the subject is sublime.

The code below adds an embedding to every chunk of the lecture transcript for every lecture in the course. Pretty simple.

import openai

openai.api_key = openai_sk

def get_embeddings(text):
    # text-embedding-ada-002 accepts up to 8,191 input tokens per request.
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text,
    )
    return response["data"][0]["embedding"]

for lecture in course_data['lectures']:
    for chunk in lecture['lecture_transcript']:
        chunk['embedding'] = get_embeddings(chunk['text'])

All that was left after this was to export it.
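That step amounts to dumping the dictionary to disk, something like this (the filename is just a placeholder):

import json

with open("making_of_modern_ukraine.json", "w", encoding="utf-8") as f:
    json.dump(course_data, f, ensure_ascii=False)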

It's interesting to think of AI-friendly data formats like this functioning as a kind of portable memory for LLMs as time goes on. If I ever want to do anything else with this course, I'll now have all of its content in a single JSON file, ready for similarity search and prompt injection with minimal fuss. I foresee creating more files like this in the future!

Flashcard generation

In the second Colab notebook, the JSON file is imported and the relevant lecture is selected. (As mentioned above, the notebook is designed to generate flashcards "on demand" as lessons are completed. This had the benefit of letting me tweak the prompts as I got a better grasp of what was effective.)

I set up a simple OpenAI API wrapper with the following settings:

def get_completion(
    user_prompt,
    model="gpt-3.5-turbo",
    system_prompt="You are a helpful history professor's assistant. You love capturing the details of history, distilling complex ideas into clear language, and effectively teaching for long-term retention.",
):
    completion = openai.ChatCompletion.create(
        model=model,
        temperature=0.2,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_prompt}],
    )
    return completion['choices'][0]['message']['content']

GPT-3.5 is the default simply to save money, but throughout the notebook I found myself falling back on GPT-4's superior reasoning capabilities. The temperature is low to encourage factual accuracy, and the system prompt tries to put the LLM in a professorly state of mind.

I then used FAISS to create a simple function for cosine similarity search.

Put briefly, cosine similarity is an efficient way of measuring the similarity of two vectors (i.e. embeddings). We'll use it to find the transcript chunks most similar to an embedding of each flashcard Q&A (a "query embedding").

If you'd like to know more about how cosine similarity works, I'd recommend Jeremy Howard's lesson on recommendation systems from Practical Deep Learning for Coders. Here's the relevant section. You can see the code I wrote for this function here.
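For reference, here's a minimal sketch of what such a search can look like with FAISS, treating inner product over L2-normalized vectors as cosine similarity (the function and variable names are my own illustration rather than the notebook's actual code):

import numpy as np
import faiss

def build_index(embeddings):
    # Normalizing first makes inner product equivalent to cosine similarity.
    vectors = np.array(embeddings, dtype="float32")
    faiss.normalize_L2(vectors)
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    return index

def most_similar_chunks(index, query_embedding, k=2):
    query = np.array([query_embedding], dtype="float32")
    faiss.normalize_L2(query)
    scores, ids = index.search(query, k)
    return ids[0], scores[0]

In this setup, build_index would be fed the list of chunk embeddings for the selected lecture, and most_similar_chunks returns the positions of the closest transcript chunks.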

Now that the setup is done, we're ready to move on to the prompting. I've sketched out the basic structure in a flowchart, which you can click on below to look through:

[Flowchart of the prompting pipeline]

In brief, I wrote an "orientation and rules" prompt that explains what the project is and what effective summarization/question answering looks like (lots of detail, always defining proper nouns, etc.) This prompt is inserted into pretty much every subsequent prompt.

Then the lecture transcript is summarized in a "rolling" way, similar to the "refine" summarization chain in LangChain: each micro-summary is inserted into the subsequent summary prompt (as "the story so far") until the summary is done.
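As a rough illustration (the prompt wording is my paraphrase, not the notebook's exact prompt), the rolling summary loop looks something like this, reusing the get_completion wrapper from above:

def rolling_summary(transcript_chunks):
    summary = ""
    for chunk in transcript_chunks:
        prompt = (
            f"The story so far:\n{summary}\n\n"
            f"New transcript passage:\n{chunk['text']}\n\n"
            "Extend and refine the summary to incorporate the new passage."
        )
        summary = get_completion(prompt, model="gpt-4")
    return summary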

The vocabulary terms from the blog post are all defined using the same process. (I tried a few experimental things with the vocab lists, such as inserting them into later prompt injections and developing Cloze deletion Anki cards out of them, but ended up keeping it simple and not using them.)

Then the summary is used to generate 5-7 great questions. I posed this to GPT-4 as a challenge of asking questions whose answers would display a total, complete, expert-level mastery of the content.

I found this more effective than generating the questions and answers at the same time. It seemed like the model had more focus when the tasks were split. However, decisions like these can cause increased expenses down the road, so you have to weigh your priorities.

The vocab definitions and summary are used together to answer the questions, and because the transcript timestamps are carried through every output along the way, each flashcard answer ends with a "SOURCES" section. This is an important way to mitigate hallucinations and make these cards more valuable learning tools.

Once the questions and answers are generated, they're converted into JSON. (If the GPT-4 function-calling update had been released when I worked on this, it likely would have simplified this code quite a bit.) After the JSON is imported in the following section, I was able to loop through each question and answer string and give each one its own prompt and output.

In this loop, the prompt first obtains the embedding of the question and answer string using the same get_embeddings function from earlier. It then uses the FAISS cosine similarity search to retrieve the 2 most similar chunks of transcript text to the flashcard. These chunks are inserted into the prompt as background information which GPT-4 is expected to use to improve the quality of the original question and answer.
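Sketched in code (again a paraphrase, reusing get_embeddings, get_completion, and the FAISS helpers assumed above; cards stands for the list of question/answer dicts parsed from the JSON, and lecture for the selected lecture's data):

refined_cards = []
for card in cards:
    qa_text = card["question"] + "\n" + card["answer"]
    ids, _ = most_similar_chunks(index, get_embeddings(qa_text), k=2)
    context = "\n\n".join(lecture["lecture_transcript"][i]["text"] for i in ids)
    prompt = (
        f"Background from the lecture transcript:\n{context}\n\n"
        f"Original flashcard:\n{qa_text}\n\n"
        "Improve the accuracy and detail of this question and answer, "
        "keeping the SOURCES section."
    )
    refined_cards.append(get_completion(prompt, model="gpt-4"))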

Once that's done, each refined Q&A is concatenated and added to a big markdown file along with a tiny summary, a haiku about the lesson's subject, the blog summary and defined terms list, and the completed flashcards.

Every time I finished a lecture, I'd download this markdown file to my Obsidian vault, review the notes and make whatever small tweaks I wanted, and then upload the cards into Anki for review.

Results

Relative to the amount of time and work it took, the system performed quite well. If I intended to use these cards and notes for more applications, they'd all probably need a second pass; I'm sure I'd find some important themes to draw out and drill if I listened to the lectures a second time after this rapid-fire study session. But as a simple prototype, these notebooks succeeded in helping me retain the broad strokes of Snyder's challenging and fascinating lectures. Comparing the time it took to write the notebooks (about a week and a half of focused side-project time) with the time it took to manually create flashcards for the engineering course I mentioned earlier (over a year), it boggles the mind how much GPT-4 increased my productivity.

Here's an example flashcard from lecture 7, on the rise of Muscovite power:


What are the two key factors that set East Slavic Rus' apart from other post-Viking states, and how do they relate to the rise of Mongol Rus' and its connection to Kyivan Rus'? #card #MMU #MMU7

  • East Slavic Rus' stands apart from other post-Viking states due to its Eastern Christian influence, which connects it religiously and culturally to the Byzantine Empire and other regions. The other post-Viking states were Western Christian.
  • Unlike other post-Viking states like England or Denmark, Rus' no longer exists as an independent entity, raising questions about its successors and their roles in shaping regional politics. Many states come to claim Rus' in the ensuing years.
  • One such successor is Mongol Rus', which is distinct from other Rus' territories like Galicia-Volhynia due to its centuries-long Mongol rule, positioning it within the broader Mongol or post-Mongol states across Asia and eastern Europe. Mongol Rus' is what will eventually become modern Russia.
  • Mongol Rus' is a fresh start after the Mongols, in which Eastern Christianity exists (without alternatives) but the laws and "civilizational package" from Kyivan Rus are largely absent.

SOURCES: Eastern Christian influence 4:30-6:30, Rus' no longer exists 11:00-12:00, Mongol Rus' and its connection to Kyivan Rus' 20:00-21:00


The hashtags at the end of the question are translated by the plugin into tags usable in Anki, such that I can easily find all of a lesson's cards. Note also the sources section at the bottom. One thing I'd do next time for a publicly available course is design these to work as hyperlinks that take the user to the YouTube lecture's timestamp.

Another nice result is that proper names are almost always spelled correctly, which can be challenging, particularly with old Slavic spellings. This was one of my motivations for creating the vocab terms list I described earlier.

While this card's answer content is on the long side, it's still manageable. You can see in the notebook's prompts that I encourage it to be as brief as possible at this stage, but it can still bring in some fluff. In other cards, the answers become way too long, such that midway through the course I was regularly in "Anki hell," taking 30+ minutes each morning to review my cards. A good brain workout, to be sure! But a bit too time-consuming.

Take this one, on the Ukrainian famine of 1932-3:


What were the seven policies that led to the political famine in Soviet Ukraine, and how did they impact food distribution and the lives of Ukrainian peasants? #card #MMU #MMU15

  • The seven policies were:
    1. The return of grain advances
    2. The meat penalty
    3. The blacklists
    4. The national interpretation of the famine
    5. The affirmation of the existing grain quota
    6. The ban on peasants going to cities to beg for food
    7. The separation of the Ukrainian Republic from its neighbors
  • These policies impacted food distribution by prioritizing grain requisition targets over feeding the population, leading to widespread starvation and death among Ukrainian peasants.
  • The famine was a result of political choices about how to treat particular people, not a lack of food.
  • The Soviets had food reserves and were exporting food from ports in Soviet Ukraine.
  • The ban on peasants going to cities to beg for food forced peasants to stay in the countryside, where the state had taken total control and extracted food.
  • Blacklisted collective farms were cut off from the rest of the Soviet economy, making it illegal for them to exchange goods or services.
  • The national interpretation of the famine blamed Ukrainization for the famine and labeled those in favor of Ukrainization as dangerous, putting pressure on party members to continue requisitioning grain or face punishment.

SOURCES: explanation of seven policies 34:00-42:00, Soviets had food reserves and were exporting food 34:00-35:00, unusual situation of peasants fleeing to cities 40:00-41:00.


In this case, I needed to split the card into two sections--one testing for simple recall of the policies, another for the more detailed information in the second section. This was fine, though, because the system already appeared to function best with a "human in the loop" approach, reviewing each lesson's notes and editing its cards before sending it on to Anki.

Where the prompts did not perform as well was in Snyder's more abstract lectures, which took the scenic route through some elaborate and deeply nuanced arguments. The model had a hard time following Snyder's train of thought, but to be fair to GPT-4, I think most humans (myself included) would have a hard time as well. These lectures fell mostly at the start and end of the course, and their flashcards required more human attention after the fact. The shared version at the top of this article contains the edited versions of those lessons, which are well worth the added difficulty: Snyder's sweeping historical statements are brilliant, and one gets the sense of watching a groundbreaking historian create new interpretations of current events before one's very eyes.

Making matters more challenging, the course also had assigned readings, so there was important background information the transcripts missed. Here, more consumer-oriented products like the Great Courses series interface much better with the language model.

Take-aways and next steps

One set of ideas that kept coming to mind during this project was the notion of LLM interaction as a form of "cyborgism" (to be contrasted with RLHF-enabled personification of the language model), as described by Janus in a popular LessWrong article. In it, Janus counsels LLM users to try thinking of the AI as an extension of their own mind--like a special pair of glasses, or a robotic hand--instead of as an Other they're expected to converse with. This framing can reveal creative possibilities for applications, and this project felt like one: the AI compressed my own thinking and note-taking process into a tiny fraction of the time it would otherwise take to learn the subject, like Neo learning kung fu in The Matrix.

Another thought was that history is uniquely well-suited to edtech experiments with AI at this stage in its development, because it's almost all textual. Other disciplines may require more images or fancy notation (such as chemistry notation, LaTeX, etc.).

Next time I think I'll try codifying the "human in the loop" aspect of generation more, perhaps splitting the flashcard-making process into two stages: first, generating notes on the lecture with GPT-4 and summarizing it from a variety of angles, then prompting the human to select which of those ideas they find most valuable to emphasize. The amended notes could then be uploaded to a new notebook, and that human feedback could be incorporated into the generation of the cards.

Thanks for reading. If you'd like to see more of my work, check out my portfolio site. And if you enjoyed this article, please consider donating what you can to Ukrainian relief efforts. Thank you.
