Joseph · AI & Machine Learning · 5 min read

Use ChatGPT to translate the new react.dev docs

react.dev was released on March 17. I had been reading the beta version for a while, and I love the Escape Hatches section, which covers many correct and recommended ways to use React hooks. After the new react.dev was released, I noticed that it had no translations. I hadn't played with the OpenAI API yet, so I thought this was a good opportunity to try ChatGPT's translation ability on react.dev.

Ask ChatGPT first

First, I needed to check whether ChatGPT could translate Markdown at all, so I pasted part of a Markdown file into it.

[Screenshot: first prompt and ChatGPT's answer]

The translation looks fine, but the Markdown syntax is gone, so I asked a follow-up prompt:

[Screenshot: second prompt and ChatGPT's answer]

Well done! All I need to do is send the Markdown to ChatGPT with the prompt "translate the following markdown content to zh-tw, and give me unrendered markdown output". Let's start coding.
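As a minimal sketch, assuming the instruction is simply prepended to each Markdown chunk, building the request text could look like this. `buildTranslatePrompt` is a hypothetical helper name, and the exact wording and separator used in the repo may differ.

```typescript
// Hypothetical helper: prepend the translation instruction to a Markdown chunk.
// The locale is interpolated so the same prompt works for any target language.
const buildTranslatePrompt = (markdown: string, locale: string): string =>
  `translate the following markdown content to ${locale}, and give me unrendered markdown output\n\n${markdown}`;

const prompt = buildTranslatePrompt("## Hello\n\nSome *text*.", "zh-tw");
console.log(prompt.split("\n")[0]);
// translate the following markdown content to zh-tw, and give me unrendered markdown output
```

Keeping the instruction and the content in one string mirrors what was typed into the ChatGPT UI above.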

Build a CLI

After some research, I found a tutorial on building a CLI tool with Node.js. My CLI asks just two questions: where the docs are located, and which i18n language to translate them into.

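```typescript
import inquirer from 'inquirer';
import { Subject } from 'rxjs';
// Assumed path: the locales JSON converted from the CSV mentioned below.
import locales from './locales.json';

// inquirer accepts an RxJS Subject, so questions can be pushed one at a time.
const prompts = new Subject<any>();
inquirer.prompt(prompts);

prompts.next({
  type: 'input',
  name: 'Path',
  default: '/docs',
  message: "What's the doc path?",
});

prompts.next({
  type: 'rawlist',
  name: 'Locale',
  message: "What's the language you want?",
  choices: locales.map((locale) => ({ name: `${locale['Display Name']} (${locale['Language Culture Name']})`, value: locale['Language Culture Name'] })),
  default: 29, // index into `choices`
});

// Tell inquirer that no more questions are coming.
prompts.complete();
```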

`locales` is a JSON file converted from this CSV with convertcsv. Now that we have a CLI tool, let's talk about tokens.

Learn the Token

When you sign up for the first time, OpenAI gives you $18 USD in free credit. According to the pricing page, the Chat API costs $0.002 per 1K tokens. So what is a token?
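To put the pricing in numbers, here is a tiny sketch that converts a token count into dollars at the $0.002 / 1K tokens rate quoted above. `estimateCostUSD` is a hypothetical helper of my own, not part of any OpenAI SDK. At that rate, the $18 credit covers 18 / 0.002 = 9,000 units of 1K tokens, i.e. roughly 9 million tokens.

```typescript
// Hypothetical helper: Chat API price for gpt-3.5-turbo as quoted in this post.
const PRICE_PER_1K_TOKENS_USD = 0.002;

// Estimate the cost of a request from its total token count (prompt + completion).
const estimateCostUSD = (tokens: number): number =>
  (tokens / 1000) * PRICE_PER_1K_TOKENS_USD;

// Example: 10,000 tokens cost about $0.02.
console.log(estimateCostUSD(10000).toFixed(4)); // "0.0200"
```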

The GPT family of models process text using tokens, which are common sequences of characters found in text. The models understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.

Each model has a limit on how many tokens a single request can use, shared between the prompt and the completion. For example, the gpt-3.5-turbo model allows at most 4,096 tokens.


To stay under that limit, I have to split each Markdown file into multiple chunks. Thanks to @dqbd/tiktoken, I can pass in a string and get its length in tokens.

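```typescript
import { encoding_for_model } from "@dqbd/tiktoken";

// Count how many tokens gpt-3.5-turbo sees in a string.
const calcToken = (paragraph: string): number => {
  const enc = encoding_for_model("gpt-3.5-turbo");
  try {
    return enc.encode(paragraph).length;
  } finally {
    // The encoder lives in WASM memory, so it must be freed explicitly.
    enc.free();
  }
};
```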

Chunk the markdown

This step uses two functions: convertContentToParagraph splits the content into paragraphs at blank lines without cutting through fenced code blocks, and convertParagraphToChunk merges those paragraphs into chunks.

```typescript
type TextWithTokens = { tokens: number; text: string };

// Split Markdown at blank lines ("\n\n"), but never inside a fenced code block.
// `cb` is called once per blank line, e.g. to advance a progress bar.
const convertContentToParagraph = (content: string, cb: () => void): TextWithTokens[] => {
  // Fenced code blocks that declare a language, e.g. ```js ... ```
  const codeMatches = [...content.matchAll(/^```.+\n([\s\S]*?)```/gm)];
  const paragraphMatches = content.matchAll(/\n\n/g);
  let startPos = 0;
  let codeIndex = 0;
  const paragraphs: TextWithTokens[] = [];
  for (const paragraphMatch of paragraphMatches) {
    cb();
    const breakPos = paragraphMatch.index!;
    // Skip every code block that ends before this blank line.
    while (
      codeIndex < codeMatches.length &&
      codeMatches[codeIndex].index! + codeMatches[codeIndex][0].length <= breakPos
    ) {
      codeIndex += 1;
    }
    // If the next code block starts before this blank line, the blank line is inside it.
    const codeMatch = codeMatches[codeIndex];
    if (codeMatch && codeMatch.index! < breakPos) {
      continue;
    }
    const endPos = breakPos + paragraphMatch[0].length;
    const text = content.substring(startPos, endPos);
    paragraphs.push({ tokens: calcToken(text), text });
    startPos = endPos;
  }
  // Whatever follows the last blank line is the final paragraph.
  const text = content.substring(startPos);
  paragraphs.push({ tokens: calcToken(text), text });
  return paragraphs;
};
```
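The regex bookkeeping above is easy to get wrong, so here is an alternative, dependency-free sketch of the same idea (not the repo's code): walk the file line by line, toggle an `inFence` flag on every line that starts with three backticks, and treat a blank line as a paragraph break only outside a fence. `splitParagraphs` is a hypothetical name. Unlike the regex above, it also treats fences without a language tag as code; it does not handle `~~~` fences or indented fences.

```typescript
const FENCE = "`".repeat(3);

// Alternative sketch: split Markdown into paragraphs on blank lines,
// but never inside a fenced code block.
const splitParagraphs = (content: string): string[] => {
  const paragraphs: string[] = [];
  let current: string[] = [];
  let inFence = false;

  for (const line of content.split("\n")) {
    if (line.startsWith(FENCE)) {
      inFence = !inFence; // an opening or closing fence
    }
    if (!inFence && line.trim() === "") {
      // A blank line outside code ends the current paragraph; extra blank lines are dropped.
      if (current.length > 0) {
        paragraphs.push(current.join("\n"));
        current = [];
      }
      continue;
    }
    current.push(line);
  }
  if (current.length > 0) {
    paragraphs.push(current.join("\n"));
  }
  return paragraphs;
};

const md = ["# Title", "", "Intro text.", "", FENCE + "js", "const a = 1;", "", "const b = 2;", FENCE, "", "Outro."].join("\n");
console.log(splitParagraphs(md).length); // 4
```

Because the separators are dropped, the translated paragraphs would be rejoined with "\n\n" afterwards.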

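```typescript
// Greedily merge consecutive paragraphs into chunks that stay under
// CHUNK_TOKENS - CONTEXT_TOKEN tokens (constants from the repo, not shown here).
const convertParagraphToChunk = (paragraphs: TextWithTokens[]): TextWithTokens[] => {
  return paragraphs.reduce((chunks, paragraph, index) => {
    // Append the paragraph to the chunk currently being filled.
    const current = chunks[chunks.length - 1];
    current.text = `${current.text}${paragraph.text}`;
    current.tokens += paragraph.tokens;

    // Start a new chunk if adding the next paragraph would exceed the budget.
    const next = paragraphs[index + 1];
    if (next && current.tokens + next.tokens > CHUNK_TOKENS - CONTEXT_TOKEN) {
      chunks.push({ tokens: 0, text: '' });
    }
    return chunks;
  }, [{ tokens: 0, text: '' }]);
};
```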
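To see how this greedy packing behaves without calling tiktoken, here is a dependency-free sketch that takes precomputed token counts and an explicit limit. `packChunks`, the limit of 100, and the token counts are made up for illustration; in the repo the limit is `CHUNK_TOKENS - CONTEXT_TOKEN`. Note that a paragraph larger than the limit still ends up alone in its own chunk, because paragraphs are never split.

```typescript
type TextWithTokens = { tokens: number; text: string };

// Same greedy rule as convertParagraphToChunk, with the limit passed in.
const packChunks = (paragraphs: TextWithTokens[], limit: number): TextWithTokens[] => {
  const chunks: TextWithTokens[] = [{ tokens: 0, text: "" }];
  paragraphs.forEach((paragraph, index) => {
    const current = chunks[chunks.length - 1];
    current.text += paragraph.text;
    current.tokens += paragraph.tokens;
    const next = paragraphs[index + 1];
    // Open a new chunk when the next paragraph would push this one past the limit.
    if (next && current.tokens + next.tokens > limit) {
      chunks.push({ tokens: 0, text: "" });
    }
  });
  return chunks;
};

// Hypothetical token counts with a limit of 100 tokens per chunk.
const demo = packChunks(
  [
    { tokens: 40, text: "A" },
    { tokens: 50, text: "B" },
    { tokens: 30, text: "C" },
    { tokens: 150, text: "D" },
    { tokens: 20, text: "E" },
  ],
  100,
);
console.log(demo.map((c) => `${c.text}:${c.tokens}`).join(" | ")); // AB:90 | C:30 | D:150 | E:20
```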

Here is the config for my ChatGPT client:

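```typescript
// CHUNK_TOKENS and CONTEXT_TOKEN are constants from the repo (values not shown here).
// 4097 is the total context length: the prompt and the completion share it.
const clientOptions = {
  modelOptions: {
    model: 'gpt-3.5-turbo',
    temperature: 0, // keep the output as deterministic as possible
    max_tokens: 4097 - CHUNK_TOKENS - CONTEXT_TOKEN, // room left for the reply
  },
  maxContextTokens: 4097,
  maxPromptTokens: CHUNK_TOKENS + CONTEXT_TOKEN,
  debug: false,
};
```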

max_tokens caps the number of tokens in the response, while maxPromptTokens caps the number of tokens in the prompt.

CHUNK_TOKENS is the token budget for a chunk of paragraphs, while CONTEXT_TOKEN is the token budget for the context.
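The budget arithmetic in `clientOptions` can be checked with a small sketch. The values 1,500 and 500 are hypothetical, since the post does not show the real `CHUNK_TOKENS` and `CONTEXT_TOKEN`. Whatever the values, the prompt budget and `max_tokens` add up to the 4097-token context, so even a full prompt leaves room for the reply.

```typescript
// Hypothetical values: the real constants live in the repo.
const MAX_CONTEXT_TOKENS = 4097;
const CHUNK_TOKENS = 1500;
const CONTEXT_TOKEN = 500;

const maxPromptTokens = CHUNK_TOKENS + CONTEXT_TOKEN; // 2000
const maxTokens = MAX_CONTEXT_TOKENS - CHUNK_TOKENS - CONTEXT_TOKEN; // 2097, reserved for the reply
// Limit on the text of one chunk, as used by convertParagraphToChunk.
const chunkTextLimit = CHUNK_TOKENS - CONTEXT_TOKEN; // 1000

console.log(maxPromptTokens + maxTokens === MAX_CONTEXT_TOKENS); // true
```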

After these steps, a Markdown file is split into multiple chunks! That covers the important parts of my repo, so feel free to pull it and give it a try.

[Video: video.mov]

Conclusion

ChatGPT translation: https://github.com/josephMG/chatGPT-translate-docs/blob/main/chatGPT/zh-TW/blog/2021/12/17/react-conf-2021-recap.md

react.dev source: https://github.com/reactjs/react.dev/blob/main/src/content/blog/2021/12/17/react-conf-2021-recap.md

[Screenshot: difference]

I translated one blog post into Traditional Chinese. The react-conf-2021-recap Markdown file was split into three chunks, and translating it cost $0.02 USD. If you translate the react-conf-2021-recap file twice, you may notice some minor differences between the outputs, but the main points remain the same.

References:

  1. resume-builder-cli-demo: a Node.js CLI repo
  2. 打造美觀的互動式 CLI 介面 (Building a beautiful interactive CLI interface)