· Joseph · AI & Machine Learning  · 5 min read

Use ChatGPT to translate the new react.dev docs

react.dev was released on March 17. I had been reading the beta version for a while, and I love the Escape Hatches section, which covers many correct and recommended usages of React hooks. After the new react.dev was released, I noticed that there was no translation yet. I hadn't played with the OpenAI API before, so I figured this was a good opportunity to try out ChatGPT's translation ability on react.dev.



Ask ChatGPT first

First of all, I had to check the feasibility of asking ChatGPT to translate a markdown file, so I copied part of one into the chat.

(screenshot: first prompt and ChatGPT's answer)

It translated fine, but the markdown syntax was gone. So I tried a follow-up prompt:

(screenshot: second prompt and ChatGPT's answer)

Well done! I only need to feed the markdown to ChatGPT with the prompt "translate the following markdown content to zh-tw, and give me unrendered markdown output". Let's start coding.
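For later automation, that prompt can be produced by a tiny helper. This is a hypothetical sketch (`buildPrompt` is not from the repo; the wording follows the prompt above):

```typescript
// Hypothetical helper: wrap a markdown chunk in the translation prompt.
// The wording mirrors the prompt that worked in the ChatGPT UI.
const buildPrompt = (markdown: string, locale: string): string =>
  `translate the following markdown content to ${locale}, ` +
  `and give me unrendered markdown output:\n\n${markdown}`;
```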

Build a CLI

After doing some research, I found a tutorial on how to implement a CLI tool with Node.js. I then only needed two prompts: one for the docs location and one for choosing the i18n language.

prompts.next({
  type: 'input',
  name: 'Path',
  default: "/docs",
  message: 'What\'s the doc path?',
});

prompts.next({
  type: 'rawlist',
  name: 'Locale',
  message: 'What\'s the language you want?',
  choices: locales.map(locale => ({
    name: `${locale['Display Name']} (${locale['Language Culture Name']})`,
    value: locale['Language Culture Name'],
  })),
  default: 29
});

The locales variable is a JSON array converted from this csv using convertcsv. Now that we have a CLI tool, let's discuss tokens.
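For reference, here is the shape I'd expect each converted locale entry to have — a sketch, assuming the JSON keeps the CSV's column headers as keys (which is what the choices mapping above relies on):

```typescript
// Assumed entry shape: keys mirror the CSV column headers.
type LocaleEntry = {
  'Display Name': string;
  'Language Culture Name': string;
};

const sampleLocales: LocaleEntry[] = [
  { 'Display Name': 'Chinese (Traditional)', 'Language Culture Name': 'zh-TW' },
  { 'Display Name': 'English (United States)', 'Language Culture Name': 'en-US' },
];

// The same mapping used for the rawlist choices above.
const toChoice = (locale: LocaleEntry) => ({
  name: `${locale['Display Name']} (${locale['Language Culture Name']})`,
  value: locale['Language Culture Name'],
});
```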

Learn the Token

When you sign up for the first time, OpenAI gives you $18 USD of free credit. Based on the pricing, the Chat API costs $0.002 per 1K tokens. But what is a token?

From OpenAI's tokenizer page: "The GPT family of models process text using tokens, which are common sequences of characters found in text. The models understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens."

Each model has a token limit. For example, the gpt-3.5-turbo model only allows 4,096 tokens per request.


In order not to exceed that maximum, I have to split a markdown file into multiple chunks. Thanks to @dqbd/tiktoken, I can easily pass a string and get its token count.

import { encoding_for_model } from "@dqbd/tiktoken";

// Count the tokens gpt-3.5-turbo would see for a given string.
const calcToken = (paragraph: string) => {
  const enc = encoding_for_model("gpt-3.5-turbo");
  const tokens = enc.encode(paragraph)
  enc.free() // release the WASM-backed encoder
  return tokens.length
}

Chunk the markdown

In this step, I have two functions: one splits the content into paragraphs (convertContentToParagraph), and the other merges paragraphs into chunks (convertParagraphToChunk).

type TextWithTokens = {tokens: number, text: string}

// Split markdown content into paragraphs on blank lines, but never split
// inside a fenced code block.
const convertContentToParagraph = (content: string, cb: VoidFunction) => {
  const codeMatches = [...content.matchAll(/^```.+\n([\s\S]*?)```/gm)]
  const paragraphMatches = content.matchAll(/\n\n/g)
  let startPos = 0
  let codeIndex = 0
  const paragraph: TextWithTokens[] = []
  for (const paragraphMatch of paragraphMatches) {
    cb()
    if (codeIndex < codeMatches.length) {
      const codeMatch = codeMatches[codeIndex]
      // Compare against undefined: an index of 0 is valid but falsy.
      if (paragraphMatch.index !== undefined && codeMatch.index !== undefined) {
        if (codeMatch.index < paragraphMatch.index) {
          // Blank line falls inside a code block: skip this break point.
          if (codeMatch.index + codeMatch[0].length > paragraphMatch.index) {
            continue
          }
          // The code block ended before this break point; advance to the next one.
          codeIndex += 1
        }
      }
    }
    const endPos = paragraphMatch.index! + paragraphMatch[0].length
    const text = content.substring(startPos, endPos)
    paragraph.push({ tokens: calcToken(text), text })
    startPos = endPos
  }
  // Don't forget the trailing text after the last blank line.
  const text = content.substring(startPos)
  paragraph.push({ tokens: calcToken(text), text })
  return paragraph
}

// Greedily merge paragraphs into chunks that stay under the token budget.
const convertParagraphToChunk = (paragraphs: TextWithTokens[]): TextWithTokens[] => {
  return paragraphs.reduce((chunks, paragraph, index) => {
    // Append the current paragraph to the last (open) chunk.
    const s = chunks[chunks.length - 1]
    s.text = `${s.text}${paragraph.text}`
    s.tokens = s.tokens + paragraph.tokens

    if (index === paragraphs.length - 1) {
      return [...chunks]
    }
    // If the next paragraph would overflow the budget, open a new chunk.
    if (s.tokens + paragraphs[index + 1].tokens > CHUNK_TOKENS - CONTEXT_TOKEN) {
      return [...chunks, { tokens: 0, text: '' }]
    }
    chunks[chunks.length - 1] = s
    return chunks
  }, [{ tokens: 0, text: '' }])
}
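To see the greedy-merge idea in isolation, here is a minimal, self-contained sketch of the same chunking strategy. It uses a fixed numeric limit instead of `CHUNK_TOKENS - CONTEXT_TOKEN` and plain numbers instead of real token counts, so it runs without tiktoken:

```typescript
// Minimal sketch of the greedy chunking idea, with made-up token counts.
type Piece = { tokens: number; text: string };

const LIMIT = 10; // stand-in for CHUNK_TOKENS - CONTEXT_TOKEN

const chunk = (paragraphs: Piece[]): Piece[] =>
  paragraphs.reduce<Piece[]>((chunks, p) => {
    const last = chunks[chunks.length - 1];
    if (last.tokens > 0 && last.tokens + p.tokens > LIMIT) {
      // Adding this paragraph would overflow the budget: start a new chunk.
      chunks.push({ tokens: p.tokens, text: p.text });
    } else {
      // Otherwise keep growing the current chunk.
      last.tokens += p.tokens;
      last.text += p.text;
    }
    return chunks;
  }, [{ tokens: 0, text: '' }]);
```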

And here is the configuration for my ChatGPT client:

const clientOptions = {
    modelOptions: {
        model: 'gpt-3.5-turbo',
        temperature: 0,
        max_tokens: 4097 - CHUNK_TOKENS - CONTEXT_TOKEN,
    },
    maxContextTokens: 4097,
    maxPromptTokens: CHUNK_TOKENS + CONTEXT_TOKEN,
    debug: false,
};

The max_tokens parameter caps the length of the reply, while maxPromptTokens specifies the maximum number of tokens that can be used in a prompt.

CHUNK_TOKENS refers to the token budget for one chunk of markdown, while CONTEXT_TOKEN refers to the tokens reserved for the conversation context.
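As a worked example of that budget, assume CHUNK_TOKENS is 1500 and CONTEXT_TOKEN is 500 (illustrative values only; the real constants live in the repo's config):

```typescript
// Illustrative values, not the repo's actual constants.
const CHUNK_TOKENS = 1500;  // budget for one markdown chunk
const CONTEXT_TOKEN = 500;  // budget reserved for context/instructions

// Prompt side and reply side must share the model's 4,097-token window.
const maxPromptTokens = CHUNK_TOKENS + CONTEXT_TOKEN;    // 2000 for the prompt
const max_tokens = 4097 - CHUNK_TOKENS - CONTEXT_TOKEN;  // 2097 left for the reply
```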

After these steps, a markdown file is separated into multiple chunks! I've now covered the important parts of my repo, so you can just pull it and give it a try.

{% video 'video.mov' %}

Conclusion

ChatGPT translation: https://github.com/josephMG/chatGPT-translate-docs/blob/main/chatGPT/zh-TW/blog/2021/12/17/react-conf-2021-recap.md

react.dev source: https://github.com/reactjs/react.dev/blob/main/src/content/blog/2021/12/17/react-conf-2021-recap.md

(screenshot: diff between the ChatGPT translation and the source)

I translated a blog post into Traditional Chinese. The react-conf-2021-recap markdown file was divided into three parts, and the translation cost $0.02 USD. If you translate the same file twice, you may notice some minor differences, but the main points remain the same.
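As a rough sanity check on that cost, using the $0.002 per 1K tokens price quoted earlier (a back-of-the-envelope estimate only; actual billing counts prompt and completion tokens):

```typescript
// How many tokens $0.02 buys at $0.002 per 1K tokens.
const costUSD = 0.02;
const pricePer1KTokens = 0.002;
const approxTokens = Math.round((costUSD / pricePer1KTokens) * 1000); // ~10,000 tokens
```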

References:

  1. resume-builder-cli-demo - a Nodejs CLI repo
  2. 打造美觀的互動式 CLI 介面 (Building a beautiful interactive CLI interface)