react.dev was released on March 17. I had been reading the beta version for a while, and I love the Escape Hatches section, which covers many correct and recommended ways to use React hooks. After the new react.dev was released, I noticed that there were no translations yet. I hadn’t played with the OpenAI API before, so I thought this was a good opportunity to try ChatGPT’s translation ability on react.dev.
Repo: https://github.com/josephMG/chatGPT-translate-docs
You can get my ChatGPT translations from the repo; README.md explains how to use it.
In this repo, I use TypeScript and Node.js to integrate ChatGPT. You need an API key from OpenAI if you want to run it.
Ask ChatGPT first
First of all, I had to check whether asking ChatGPT to translate markdown is feasible, so I copied part of a markdown file into it.
The translation looked fine, but the markdown syntax was gone, so I tried a follow-up prompt:
Well done! I only need to send the markdown to ChatGPT with the prompt “translate the following markdown content to zh-tw, and give me unrendered markdown output”.
Let’s start coding.
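As a starting point, the prompt can be wrapped in a small helper. This is only a minimal sketch: the function name buildTranslationPrompt and the way the markdown is appended after a blank line are my own assumptions, not code from the repo.

```typescript
// Hypothetical helper (not from the repo): wraps a markdown snippet
// in the translation prompt found during the ChatGPT experiment.
function buildTranslationPrompt(markdown: string, locale: string): string {
  return [
    `Translate the following markdown content to ${locale}, and give me unrendered markdown output.`,
    "", // blank line between the instruction and the content
    markdown,
  ].join("\n");
}

const prompt = buildTranslationPrompt("# Hello\n\nSome **bold** text.", "zh-tw");
console.log(prompt);
```

Keeping the locale as a parameter means the same helper works for whichever i18n language the CLI user picks.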
Build a CLI
After a lot of research, I finally found a tutorial on how to implement a CLI tool with Node.js. The CLI asks just two questions: where the docs are located and which i18n language to translate into.
prompts.next({
The locales file is a JSON file that was converted from this CSV using convertcsv. Now that we have a CLI tool, let’s discuss tokens.
Learn the Token
When you sign up for the first time, OpenAI gives you $18 USD of free credit. Based on the pricing, the Chat API costs $0.002 / 1K tokens. So what is a token?
From OpenAI’s tokenizer page:
The GPT family of models process text using tokens, which are common sequences of characters found in text. The models understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.
Each model has a limit on the number of tokens it can handle in one request. For example, the gpt-3.5-turbo model only allows 4,096 tokens, and that limit covers the prompt and the response together.
To stay under that maximum, I have to separate a markdown file into multiple chunks. Thanks to @dqbd/tiktoken, I can easily pass in a string and get its length in tokens.
import { encoding_for_model } from "@dqbd/tiktoken";
Chunk the markdown
In this step, I have two functions: convertContentToParagraph separates the content into paragraphs, and convertParagraphToChunk converts those paragraphs into chunks.
type TextWithTokens = {tokens: number, text: string}
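To make the two steps concrete, here is a minimal sketch of one way to do them. It is not the repo’s implementation: the names splitIntoParagraphs and packIntoChunks, the blank-line splitting rule, and the greedy packing are all my own assumptions. The token counter is passed in as a function, so a counter built on @dqbd/tiktoken could be plugged in; the demo uses a simple word count as a stand-in.

```typescript
type TextWithTokens = { tokens: number; text: string };

// Assumption: the token counter is injected so any tokenizer can be used.
type CountTokens = (text: string) => number;

// Split markdown on blank lines and record each paragraph's token count.
function splitIntoParagraphs(content: string, countTokens: CountTokens): TextWithTokens[] {
  return content
    .split(/\n{2,}/)
    .map((text) => text.trim())
    .filter((text) => text.length > 0)
    .map((text) => ({ text, tokens: countTokens(text) }));
}

// Greedily pack consecutive paragraphs into chunks of at most maxTokens.
// A single paragraph larger than maxTokens becomes its own oversized chunk.
// The chunk's token count is the sum of its paragraphs and ignores the
// separators, so treat it as an estimate.
function packIntoChunks(paragraphs: TextWithTokens[], maxTokens: number): TextWithTokens[] {
  const chunks: TextWithTokens[] = [];
  let current: TextWithTokens | null = null;
  for (const p of paragraphs) {
    if (current !== null && current.tokens + p.tokens <= maxTokens) {
      current = { text: current.text + "\n\n" + p.text, tokens: current.tokens + p.tokens };
    } else {
      if (current !== null) chunks.push(current);
      current = { text: p.text, tokens: p.tokens };
    }
  }
  if (current !== null) chunks.push(current);
  return chunks;
}

// Stand-in counter for the demo: counts whitespace-separated words.
const countWords: CountTokens = (text) => text.split(/\s+/).filter((w) => w.length > 0).length;

const demoParagraphs = splitIntoParagraphs("# Title\n\nFirst paragraph here.\n\nSecond one.", countWords);
const demoChunks = packIntoChunks(demoParagraphs, 5);
console.log(demoChunks.map((c) => c.tokens)); // [5, 2]
```

Packing whole paragraphs keeps each chunk a valid piece of markdown, which matters because a chunk cut in the middle of a list or code block would be harder to translate cleanly.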
And here is the config for my ChatGPT client:
const clientOptions = {
The max_tokens parameter limits the tokens used for the returned response, while maxPromptTokens specifies the maximum number of tokens that can be used in a prompt.
CHUNK_TOKENS refers to the number of tokens in a paragraph, while CONTEXT_TOKENS refers to the number of tokens in the context.
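One way to sanity-check these four settings against each other is sketched below. The relationships are my own reading, not something the repo states: I assume the prompt plus the reply must fit in the model’s 4,096-token window, and that a chunk plus its context must fit in the prompt. The checkBudget function and all numbers are illustrative.

```typescript
// Model window for gpt-3.5-turbo, as stated earlier in the post.
const MODEL_LIMIT = 4096;

type TokenBudget = {
  max_tokens: number;      // tokens reserved for the reply
  maxPromptTokens: number; // upper bound for the prompt
  CHUNK_TOKENS: number;    // tokens of markdown to translate per request
  CONTEXT_TOKENS: number;  // tokens of context sent with the chunk
};

// Returns a list of problems; an empty list means the budget is consistent
// under the assumptions described above.
function checkBudget(b: TokenBudget, modelLimit: number): string[] {
  const problems: string[] = [];
  if (b.maxPromptTokens + b.max_tokens > modelLimit) {
    problems.push("prompt + reply exceeds the model limit");
  }
  if (b.CHUNK_TOKENS + b.CONTEXT_TOKENS > b.maxPromptTokens) {
    problems.push("chunk + context exceeds maxPromptTokens");
  }
  return problems;
}

// Illustrative values only, not the repo's actual settings.
const exampleBudget: TokenBudget = {
  max_tokens: 1000,
  maxPromptTokens: 3000,
  CHUNK_TOKENS: 1500,
  CONTEXT_TOKENS: 500,
};
console.log(checkBudget(exampleBudget, MODEL_LIMIT)); // []
```

Since a translation is usually about as long as its source, the reply budget should be at least comparable to the chunk size; the example leaves that check out to stay short.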
After these steps, a markdown file is separated into multiple chunks! I’ve now introduced the important parts of my repo, so you can just pull it and give it a try.
Conclusion
ChatGPT translation: https://github.com/josephMG/chatGPT-translate-docs/blob/main/chatGPT/zh-TW/blog/2021/12/17/react-conf-2021-recap.md
react.dev source: https://github.com/reactjs/react.dev/blob/main/src/content/blog/2021/12/17/react-conf-2021-recap.md
I have translated a blog post into Traditional Chinese. The react-conf-2021-recap markdown file was divided into three parts, and the cost was $0.02 USD. If you translate the react-conf-2021-recap file twice, you may notice some minor differences, but the main points remain the same.
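As a rough check on that cost, the $0.002 / 1K tokens price means $0.02 corresponds to about 10,000 billed tokens, counting prompt and reply tokens together. The sketch below just does that arithmetic; the function names are hypothetical.

```typescript
// Chat API price quoted earlier in the post.
const PRICE_PER_1K_TOKENS_USD = 0.002;

// Cost in USD for a given number of billed tokens (prompt + reply).
function estimateCostUsd(totalTokens: number): number {
  return (totalTokens / 1000) * PRICE_PER_1K_TOKENS_USD;
}

// Inverse: how many tokens a given cost corresponds to.
function tokensForCostUsd(costUsd: number): number {
  return (costUsd / PRICE_PER_1K_TOKENS_USD) * 1000;
}

console.log(estimateCostUsd(10000).toFixed(3)); // "0.020"
console.log(Math.round(tokensForCostUsd(0.02))); // 10000
```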