Race on to decipher oracle bone characters with AI

Henan Daily
Zhengzhou: For over a century, generations of scholars have pursued the ultimate dream of decoding the oracle bone script, China’s earliest known writing system. Inscribed on ox scapulae or turtle shells, the script was used mainly for divination purposes from the 14th to 11th centuries BC, during the late Shang Dynasty (c. 16th century-11th century BC), and was rediscovered in 1899 at the Yinxu Ruins in Anyang, Henan province.
Among the more than 4,500 oracle bone characters discovered, the meanings of over 3,000 remain a mystery. Scholars now hope the rapid development of artificial intelligence will help accelerate text deciphering and the reassembly of oracle bone fragments.
That hope is reflected in a competition to promote AI-assisted research on, and use of, the oracle bone script. It was announced during the 2025 China (Anyang) International Conference of Chinese Characters, held from April 20 to 22.
Organized by the Henan Provincial Administration of Cultural Heritage, the Anyang government, Anyang Normal University and Tencent’s sustainable social value division, the competition seeks candidates for interdisciplinary academic research and algorithm development projects to advance the deciphering of oracle bone characters, as well as for AI-generated cultural products that promote the script and its cultural connotations.
Potential candidates should register and submit their work by Sept 20. The awards will be announced in late October.
Traditionally, deciphering a single oracle bone character requires rigorous research based on intensive reading, according to Liu Yongge, director of the Laboratory of Oracle Bone Inscriptions Information Processing at Anyang Normal University, a key laboratory under the Ministry of Education.
To verify a presumption about a particular pictogram, scholars must examine the evolution of the glyph’s appearance and meaning throughout history, as well as the oracle bone character’s semantic consistency across different contexts and scenarios of use.
They also need to investigate the character’s pronunciation, often by resorting to oracle bone characters with similar composition or meaning, as well as descendant scripts like xiaozhuan (small seal script), for reference.
Liu adds that scholars have to digest a large canon of textual materials — classics from the pre-Qin period (before 221 BC) in particular — in order to make an informed attempt.
In 2016, the National Museum of Chinese Writing in Anyang announced a program offering a reward of 100,000 yuan ($13,900) for successfully deciphering a single oracle bone character. Only three scholars have so far received the reward.
For more than two decades, Liu’s laboratory has been making use of digital methods to facilitate the study of oracle bone inscriptions. As early as 2000, they developed a visual input method for oracle bone characters that is still widely used today.
The more than 4,500 oracle bone characters are categorized by radicals for search and input, enabling users to click and directly paste them into documents.
In 2019, the team launched the online platform Yinqi Wenyuan, which integrates databases of oracle bone glyphs — including their explanations, pronunciations, variations and related research — along with image collections of oracle bones, such as rubbings, facsimiles and photographs, as well as literature in the field.
The platform, one of the most comprehensive and authoritative oracle bone script databases, is open access and free of charge.
Liu’s laboratory has also worked with Tencent and academic institutions to develop an AI-powered collaborative research platform with a set of intelligent tools, which have so far identified some 500 duplicate rubbings.
The set of tools can also detect the presence of a specific glyph across different rubbings or facsimiles, and identify similar pictograms, sorting them by similarity.
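The article does not say how the platform measures similarity between glyphs. As a rough illustration only, assuming each glyph image has already been reduced to a numeric feature vector, ranking candidates by cosine similarity could look like the sketch below. The glyph labels and vectors are invented for the example and do not come from the platform.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Score every candidate against the query, most similar first."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical feature vectors; a real system would extract them from images.
candidates = {
    "glyph_a": [1.0, 0.0, 0.0],
    "glyph_b": [0.9, 0.1, 0.0],
    "glyph_c": [0.0, 1.0, 0.0],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

How the feature vectors are produced matters far more than the ranking step itself, and the article gives no detail on that part.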
Last year, Liu and his colleagues began using large language models to decode oracle bone characters. They aim to build a specialized large language model, trained on vast amounts of academic literature and vocabulary, that can propose well-grounded hypotheses and rule out untenable ones, thereby accelerating the deciphering process.
He says that the development of this specialized large language model requires the participation of more industry and research organizations. By holding the competition, they hope to involve more young people in the sector and stimulate interdisciplinary collaboration, while also looking to the oracle bone script, in turn, to contribute to the development of large language models.
According to its official website, the competition consists of three areas. In the invitational research area, paleographers and computer specialists will work together on studies to decode oracle bone characters and inscriptions.
They will analyze the evolution of glyphs, their calligraphic structures and grammar, before submitting a research report that includes AI algorithm models, the rationale behind these models and an interpretation of the academic value. Participating teams for this area must undergo a preliminary qualification review.
An algorithm challenge constitutes the second area, focusing on using AI to help collate oracle bone script data within a given dataset. Participants will be required to develop code for one of two tasks: automatically rejoining oracle bone fragments, evaluated by accuracy rate and the proportion of results that are new; or automatically identifying duplicate rubbings of the same oracle bones, assessed by accuracy and the number of previously unpublished duplicate images found.
The last area invites members of the public to create images, animations or short videos with AI tools that incorporate the pictograms or historical connotations of the oracle bone script.
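The competition's exact scoring rules are not given here. As a minimal sketch of how the two criteria for the duplicate-rubbing task might be computed, assume a submission is a list of rubbing-ID pairs, that "accuracy" means the share of submitted pairs confirmed as true duplicates, and that a pair counts as new if it is confirmed but absent from a list of already published duplicates. The function name, the IDs and these definitions are all assumptions made for illustration.

```python
def score_duplicate_pairs(predicted, confirmed, already_published):
    """Return (accuracy, number of new confirmed pairs) for a submission.

    Each argument is an iterable of (id_a, id_b) pairs. Pair order is
    ignored, so ("r3", "r4") and ("r4", "r3") count as the same pair.
    """
    def normalize(pairs):
        return {frozenset(pair) for pair in pairs}

    predicted_set = normalize(predicted)
    confirmed_set = normalize(confirmed)
    published_set = normalize(already_published)

    correct = predicted_set & confirmed_set
    accuracy = len(correct) / len(predicted_set) if predicted_set else 0.0
    new_finds = correct - published_set
    return accuracy, len(new_finds)


# Hypothetical rubbing IDs, for illustration only.
predicted = [("r1", "r2"), ("r4", "r3"), ("r5", "r6"), ("r7", "r8")]
confirmed = [("r1", "r2"), ("r3", "r4"), ("r7", "r8")]
already_published = [("r1", "r2")]

accuracy, new_count = score_duplicate_pairs(predicted, confirmed, already_published)
print(accuracy, new_count)
```

In this made-up case three of the four submitted pairs are confirmed, and two of those three were not previously published.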
Shu Zhan, head of Tencent’s digital cultural lab, explains that through AI technology they are seeking to enhance research efficiency and raise public awareness of oracle bone script.
He expects the competition to accelerate research progress on the script, inspire intelligent restoration and virtual exhibitions of oracle bone inscriptions and other categories of cultural heritage, and lower the threshold for public participation through AI-driven creation, thereby integrating the script into daily life.
Shu adds that in the future, the company will continue to enhance the text corpus for the specialized oracle bone script large language model and advance the virtual aggregation and exhibition of overseas collections of oracle bones.