TL;DR
I used the ChatGPT API to translate the Rails Guide into several languages:
- Taiwan's Traditional Chinese🇹🇼 https://ai.rails-guide.com/zh-TW
- French🇫🇷 https://ai.rails-guide.com/fr
- Lithuanian🇱🇹 https://ai.rails-guide.com/lt
- Brazilian Portuguese🇧🇷 https://ai.rails-guide.com/pt-BR
- Thai🇹🇭 https://ai.rails-guide.com/th
- Simplified Chinese🇨🇳 https://ai.rails-guide.com/zh-CN
Update on 2023/08/12
I added 3 more languages:
- Japanese🇯🇵 https://ai.rails-guide.com/jp
- Korean🇰🇷 https://ai.rails-guide.com/ko
- Español🇪🇸 https://ai.rails-guide.com/es
What's the Rails Guide?
I guess people who read this article already know Rails. Just in case, though, I'll briefly introduce Ruby on Rails and the Rails Guide. Feel free to skip this section if you already know them.
Ruby on Rails is a full-stack web application framework. With Rails, you can build a website that reads data from your database and either returns it as an API payload or renders it in the user's browser, easily and safely. The Rails Guide is the user manual from which developers learn how to use Rails. It is also crowd-sourced and lives in the same GitHub repository as Rails itself. Its quality is very high because it has been reviewed and revised again and again by numerous seasoned Rails developers. To anyone who wants to learn Ruby on Rails, I definitely recommend reading the guide first.
Why translate the Rails Guide?
Translating the Rails Guide is not for diversity. The Ruby on Rails guide is written exclusively in English, and that is totally fine. However, there are many talented developers all around the world who just cannot read English well. It is really a pity that they don't get a chance to discover this wonderful and powerful web framework just because the information is not available in their languages. I believe that translating the Rails Guide gives people all over the world a better chance to learn Rails.
Why use generative AI to translate Rails Guide?
First of all, generative AI can produce more natural, human-sounding text. Moreover, with more context, it can generate more accurate and suitable translations. You must have read articles that you could immediately tell were translated by Google Translate because they felt very unnatural.
Second, there are already many repositories of the Rails Guide in different languages: https://guides.rubyonrails.org/contributing_to_ruby_on_rails.html#translating-rails-guides. However, the problem is that most of them are out of date. Those repositories depend on volunteers' efforts. The Rails community used to have enthusiastic fans who were willing to help translate the guide. Unfortunately, since the popularity of Rails plummeted, there haven't been enough volunteers to continue the work. Using generative AI to translate documents saves time and human effort. One person can easily refine the translation result alone. It also means that we can update the translations more frequently. It could be a more sustainable method.
Proposed Workflow
My original plan was simple.
- Write a script to read the Rails guide files and send their content to ChatGPT to translate to a specified language.
- Then use the existing Rails Guide script to generate HTML files just like the current translation workflow
I planned to wrap the code in a class, `AiTranslator`.
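A minimal sketch of what such a wrapper might look like. Only the class name comes from the plan above; the constructor arguments, the `translate_all` method and the injected `translate` callable are assumptions made for illustration, not the repository's actual interface.

```ruby
require 'fileutils'

# Hypothetical sketch: only the class name is taken from the article.
class AiTranslator
  # translate: any callable taking (text, language) and returning a String.
  # In the real workflow it would call the ChatGPT API.
  def initialize(target_language:, source_dir:, output_dir:, translate:)
    @target_language = target_language
    @source_dir = source_dir
    @output_dir = output_dir
    @translate = translate
  end

  # Translates every Markdown file in source_dir and writes the result,
  # under the same file name, into output_dir. Returns the written paths.
  def translate_all
    FileUtils.mkdir_p(@output_dir)
    Dir.glob(File.join(@source_dir, '*.md')).sort.map do |path|
      translated = @translate.call(File.read(path), @target_language)
      target = File.join(@output_dir, File.basename(path))
      File.write(target, translated)
      target
    end
  end
end
```

Passing the translation step in as a callable keeps the file handling testable without any API call.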
However, it was not as simple as I imagined 😅
Challenges
There are many challenges in this simple task. I picked some more significant ones here.
Tokens
ChatGPT and other generative AI models can only accept a limited number of tokens per request, and the limit covers both the input and the output. A token count is not the same as a character or word count, although it correlates with both. OpenAI also bills you by the token.
The most popular model at the moment, `gpt-3.5-turbo`, only allows 4097 tokens per request. Remember, that budget is shared by the input and the output. That means I cannot just upload a whole file to ChatGPT; I need to process a file piece by piece.
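As a rough illustration of that budget, the sketch below estimates whether a piece of text can be translated in one request. It assumes the common rule of thumb of about 4 characters per token for English text, and that the translation needs about as many tokens as the input. Both are approximations, an exact count requires the model's tokenizer, and the helper names are made up for this example.

```ruby
# Rough heuristic only: about 4 characters per token for English text.
CHARS_PER_TOKEN = 4
CONTEXT_LIMIT = 4097 # gpt-3.5-turbo limit quoted in the text

def estimated_tokens(text)
  (text.length / CHARS_PER_TOKEN.to_f).ceil
end

# Input and output share the limit, so reserve as many tokens for the
# translation as the input uses (an assumption; some languages need more).
def fits_in_one_request?(text, limit: CONTEXT_LIMIT)
  estimated_tokens(text) * 2 <= limit
end
```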
Maybe you think: it's easy, you can just send one or two sentences per ChatGPT API call, and then you'll never exceed the limit.
You're right. However, each ChatGPT request is independent; requests don't share any context. Here is an example from the ChatGPT web page. If I ask ChatGPT "Do you know NBA?" and then ask "Who's the champion of 2019?", it will answer that it's the Toronto Raptors.
However, if I ask "Who's the champion of 2019?" directly in a new session, ChatGPT cannot answer because it lacks the context.
Google Translate is like a souped-up dictionary. A generative AI model is better treated like a very smart student: the more input you give it, the better the result it returns. As a result, I want to feed ChatGPT as much text as possible so it has the appropriate context to translate the Rails Guide properly.
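In API terms, a request only knows the messages it carries. A minimal sketch of the two situations above, assuming the `role`/`content` message format of OpenAI's Chat Completions API; `build_messages` is a hypothetical helper and the assistant reply is invented for illustration.

```ruby
# Nothing is remembered between requests, so earlier turns must be resent
# together with the new question.
def build_messages(history, question)
  history + [{ role: 'user', content: question }]
end

with_context = build_messages(
  [
    { role: 'user', content: 'Do you know NBA?' },
    { role: 'assistant', content: 'Yes, it is a basketball league.' } # made-up reply
  ],
  "Who's the champion of 2019?"
)

# A new session: the question arrives alone, with no hint about the topic.
without_context = build_messages([], "Who's the champion of 2019?")
```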
My approach is like the code block below.
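buffer = []
result = ''
File.readlines(file).each do |line|
  # Flush at a paragraph break (blank line) once the buffer is large enough
  if line == "\n" && buffer.join.split.length > @buffer_size
    translated_text = ai_translate(buffer.join)[:text]
    result += translated_text + "\n"
    buffer = []
  else
    buffer << line
  end
end
# Translate whatever is left after the last flush
result += ai_translate(buffer.join)[:text] unless buffer.empty?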
- I declare `buffer = []` at the beginning.
- I iterate over the file line by line. On each iteration, I put one line into `buffer`.
- When the number of words exceeds a threshold, I send a request to the ChatGPT API with the content of `buffer`. The threshold, `@buffer_size`, defaults to `700`. It's just an empirical magic number.
- Paragraphs in Markdown are separated by blank lines, so I only send the buffer at a blank line. That way a whole paragraph is always translated in one ChatGPT request.
Prompt phrase
The prompt phrase for the Generative AI model affects the result drastically. I tried a lot of different combinations. And eventually, I made it this way:
LANGUAGES = {
  'zh-TW' => "Traditional Chinese used in Taiwan(台灣繁體中文).",
  'lt' => 'Lithuanian',
  'fr' => 'French',
  'pt-BR' => 'Brazilian Portuguese',
  'th' => 'Thai',
  'zh-CN' => 'Simplified Chinese',
}
system_prompt ||= "Translate the technical document to #{LANGUAGES[@target_language]} without adding any new content."
- `Translate the technical document`: this tells the model that it is translating an excerpt of a technical document, so it knows it does not need to translate elements such as code blocks.
- `LANGUAGES[@target_language]`: I don't know whether this problem is unique to Traditional Chinese. Although both are written Chinese, the terminology, writing style and tone of Traditional Chinese in Taiwan are very different from those of Simplified Chinese. I need to name the target precisely to get the desired result.
- `without adding any new content.`: it is also important to tell ChatGPT not to add extra information, because we're translating an article. Otherwise, it behaves like those annoying students in your classroom who keep talking and adding needless knowledge.
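To show where the prompt ends up, here is a minimal sketch of a request body in the shape used by OpenAI's Chat Completions API (`model` plus a `messages` array). Sending the text to translate as the `user` message is an assumption, and `LANGUAGE_NAMES` and `translation_request` are hypothetical names, not the repository's code.

```ruby
require 'json'

# A two-entry subset of the language table, for illustration only.
LANGUAGE_NAMES = { 'fr' => 'French', 'th' => 'Thai' }.freeze

def translation_request(text, target_language, model: 'gpt-3.5-turbo')
  # fetch raises KeyError for an unknown language instead of sending "to  without..."
  language = LANGUAGE_NAMES.fetch(target_language)
  system_prompt = "Translate the technical document to #{language} " \
                  'without adding any new content.'
  {
    model: model,
    messages: [
      { role: 'system', content: system_prompt }, # the instruction
      { role: 'user', content: text }             # the chunk to translate
    ]
  }
end

puts JSON.pretty_generate(translation_request("# Getting Started\n", 'fr'))
```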
Markdown parsing
The Rails Guide is full of code blocks showing code examples, so a code block should not be cut in the middle or sent on its own. I made the line reader a simple state machine. It changes the state to `:codeblock` when it starts parsing a code block, and it won't call the ChatGPT API until it finishes that block.
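fence = '`' * 3 # three backticks, built this way because a literal fence breaks Dev.to's rendering
state = :readline
buffer = []
result = ''
File.readlines(file).each do |line|
  if line.include?(fence)
    # A fence line opens or closes a code block: keep it and toggle the state
    buffer << line
    state = state == :codeblock ? :readline : :codeblock
  elsif line == "\n" && state == :readline && buffer.join.split.length > @buffer_size
    # Only flush at a blank line that is outside a code block
    translated_text = ai_translate(buffer.join)[:text]
    result += translated_text + "\n"
    buffer = []
  else
    buffer << line
  end
end
# Translate whatever is left after the last flush
result += ai_translate(buffer.join)[:text] unless buffer.empty?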
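To try the chunking rules without calling the API, the loop can be reduced to a function that only returns the chunks. This is a sketch under that assumption: `chunk_markdown` and the sample document are made up for this example, and the tiny `buffer_size` is only there to force splits.

```ruby
FENCE = '`' * 3 # three backticks

# Splits Markdown lines into chunks. A chunk ends at a blank line that is
# outside a code block, once the chunk holds more than buffer_size words.
def chunk_markdown(lines, buffer_size: 700)
  state = :readline
  buffer = []
  chunks = []
  lines.each do |line|
    if line.include?(FENCE)
      buffer << line
      state = state == :codeblock ? :readline : :codeblock
    elsif line == "\n" && state == :readline && buffer.join.split.length > buffer_size
      chunks << buffer.join
      buffer = []
    else
      buffer << line
    end
  end
  chunks << buffer.join unless buffer.empty? # keep the trailing text
  chunks
end

sample = [
  "Intro paragraph here\n",
  "\n",
  "#{FENCE}ruby\n",
  "puts 1\n",
  "\n", # blank line inside the code block: must not split
  "puts 2\n",
  "#{FENCE}\n",
  "\n",
  "Closing words\n"
]
chunks = chunk_markdown(sample, buffer_size: 2)
puts chunks.length # 3
```

The second chunk contains the whole code block, including its inner blank line.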
Anchors
When you open any Rails Guide page, you can see a Chapters block on the right that serves as a table of contents.
That table is generated automatically by a script. The headings, such as <h1>, <h2>, <h3>, etc., are assigned an id derived from the heading's text. For example, if the heading is "Guide Assumptions" in the Markdown,
### Guide Assumptions
it will be rendered in the final HTML as
<h3 id="guide-assumptions">...</h3>
The links in the table of contents then point to the elements with those id values.
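The mapping from heading text to id can be pictured with a simplified sketch. This is not the Rails guides generator's actual code, and `heading_id` is a hypothetical helper: it lowercases the title, drops punctuation and joins the words with hyphens.

```ruby
# Simplified illustration of deriving an id from a heading's text.
def heading_id(title)
  title.downcase
       .gsub(/[^\p{L}\p{N}\s-]/, '') # keep letters, digits, spaces, hyphens
       .strip
       .gsub(/\s+/, '-')
end

puts heading_id('Guide Assumptions') # guide-assumptions
puts heading_id('指南假設')           # 指南假設
```

With a rule like this, a translated heading produces a different id from the English one, so any link that still uses the English id no longer matches.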
It works fine in the original Rails Guide: when you click a link in Chapters, the browser jumps to the corresponding section. However, a problem appears once all titles are translated. After some investigation, I found that it's related to Turbo. I suspect it's a bug in Turbo. My current solution is to disable Turbo for the links in the Chapters block.
<ol class="chapters" data-turbo="false">
...
</ol>
Code
Repository: https://github.com/kevinluo201/rails-guide-ai
This repo is forked from the Rails repo so that it can pull the updates of the guide's files. It only has 2 new files:
- `guides/rails_guides/ai_translator.rb`: the main program.
- `guides/ai_translate.rb`: the entry point.
Follow these steps if you want to play around with it.
1. Set a new environment variable called `OPENAI_ACCESS_TOKEN` to your personal access token from OpenAI.
2. Add a new language to `RailsGuide::AiTranslator`, for example, `'jp' => 'Japanese'`.
3. Open the terminal, go to `guides/`, and start translating by executing
```bash
ruby ./ai_translate.rb jp
```
4. You can also translate a single file by adding a filename after the command
```bash
ruby ./ai_translate.rb jp getting_started.md
```
5. After all files are translated, run the existing Rails script to generate the HTML, CSS and JS. Unfortunately, it is likely to fail. Usually the cause is duplicated titles, which lead to duplicated `id` values in the HTML. You can fix it by finding the title that causes the problem and changing it slightly. Other problems can appear depending on the target language. Just keep solving them until the process finishes.
```bash
bundle exec rake guides:generate:html GUIDES_LANGUAGE=jp
```
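To locate the duplicated titles mentioned above, a small helper along these lines might save some searching. It is a sketch, not part of the repository: `duplicate_headings` is a hypothetical name, and it only compares heading text within one Markdown file, skipping lines inside fenced code blocks.

```ruby
# Returns the heading texts that appear more than once in a Markdown document.
def duplicate_headings(markdown)
  fence = '`' * 3 # three backticks
  in_code = false
  titles = []
  markdown.each_line do |line|
    # A fence line opens or closes a code block
    in_code = !in_code if line.lstrip.start_with?(fence)
    next if in_code # a leading hash sign inside code is a comment, not a heading

    match = line.match(/\A[#]{1,6}\s+(.+?)\s*\z/)
    titles << match[1] if match
  end
  titles.group_by(&:itself).select { |_, group| group.size > 1 }.keys
end
```

For example, `puts duplicate_headings(File.read('getting_started.md'))` would print each repeated title of that file, assuming it exists in the current directory.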
Help Wanted
It is just an experimental project now. There are several issues that can be improved. If you think it is an interesting topic, feel free to discuss it with me.
Current Issues
Anchor links
The table of contents issue is solved by disabling Turbo. However, there are anchor links scattered throughout the articles. They cannot be converted to the correct URLs reliably, especially when they refer to an anchor on another page.
Versioning
The Rails Guide has versions. A version is kind of a snapshot of the guide at a particular time. I haven't thought of a good way to manage them.
Different models
I'm now using `gpt-3.5-turbo`. I live in Canada, so I cannot use Google's Bard. Feel free to change the code so it can switch between different models, such as GPT-4 or Llama 2.
EPUB
EPUB files can be generated by the Rails Guide script. However, they produce errors when I import them into EPUB reader software, such as "Books" on OSX. I think it may be related to the broken anchor links.
Other stuff
If you have any ideas that can make this project more sustainable, please discuss it with me. For example, it's a guide for Rails, why not build it as a Rails app?
Conclusion
The quality of AI translation is not perfect but acceptable. I'm not concerned about the quality. As far as I can see, the token limit and the trained model are the most significant factors. I believe this problem will be solved by swapping the current model (`gpt-3.5-turbo`) for a more advanced model in the future. The result shows that this workflow really works, and that's the most important lesson for me.
As for the cost, I ran many experiments for this idea and translated the Rails Guide into 6 different languages. It cost me about $27, so each translation cost less than $5 on average ($27 / 6 = $4.50). The actual price per translation should be lower than that, because the total includes many failed experiments.
*Due to its good quality and low cost, generative AI might be a good solution for the technical documents of open-source projects.*
Buy me a coffee
Lastly, if you like what I'm doing, you can buy me a coffee 😉☕️ https://www.buymeacoffee.com/kevinluo
Top comments (12)
Thanks for your informative article on translation! I am one of core maintainers of the Japanese version of Rails Guides.
This would be true on some translation projects but not true, at least on the Japanese one. Our repository is still very active since we released in 2014, as well as rorlakr/rails-guides and morsbox/rusrails repos. On Japanese one, you can check out how we actively maintain it here: github.com/yasslab/railsguides.jp
But anyway, I know some of the other translation projects are not actively maintained, and I am personally interested in your approach. I'd be glad if the information above helps make your article more precise. :)
Thanks for your comment!
Sorry, I shouldn't have used "All". I checked the repos and I agree with you that the Japanese translation is pretty active. I'm so envious of that 🥹 If I remember correctly, not only do open source communities' documents often have an up-to-date Japanese translation, but a Japanese translation of a new computer-related book is also released very quickly after the book is published.
Anyway, I think it's still a good idea to use an LLM like ChatGPT to generate an initial version of a translation. It can save the volunteers a lot of time. Since translating an open source project's documents doesn't get paid, if we can make volunteers' lives easier, I guess it could make people more willing to join the project and stay in it longer.
Yeah, using "Some" or maybe "Most of" instead of "All" makes this article more precise. ;)
P.S.
In the Japanese translation project, we have used AI-powered tools like DeepL Pro for draft translations since 2018. And yes, it helps a lot!
Also we do fundraising for the Japanese documentation, which helps to continue the project and reduce the cost to maintain. It definitely helps to learn, especially for new Rails developers in Japan. ;)
Our project is well-documented in Japanese but not in English because most of our expected users are Japanese speakers. But I hope our example helps to translate in other languages. :D
I updated that
Hi Kevin. Nice job on doing this. I was recently thinking of using ChatGPT to update the current version I have.
I am the one who translated the Rails 6 guide into Spanish (and yes, it's not fully updated; a change messed up the styling), and it's a lot of work and hours to translate it the 'traditional' way. Due to time constraints, yes, it's hard to keep the translations maintained.
Will try to come back and see how your project continues. I'm curious if others are using the translations you did.
Hey @isis, I haven't checked its status recently because my son was born after I finished this side project. 😁
Here's today's GA result of those websites
I think most users are from Asia.
Anyway, after more than one year, LLM technology has improved a lot. For example, the total number of tokens allowed can be much higher than it was. I recently heard about the concept of "embedding": store the document in a vector database first, and accessing it won't count against the tokens. I think what we'd better do now is wait a little longer. Let the tech giants compete among themselves and push the LLMs' limits as far as possible. Then we can start again. And I don't think that will happen in the very near future.
@isis My other thought is that the current LLM approach might never achieve the highest quality, because non-determinism is in an LLM's nature. Maybe we should give the translators of open-source projects a better translation tool that is assisted by AI.
Wow, this is awesome, Kevin!
Great job!
You may use the gpt-3.5-turbo-16k model to address the token issue.
Yeah, but that only mitigates the problem a bit, since it still cannot swallow a whole article at once.