Let’s peel back the layers of what a true Jawi translator would actually entail, and why this ancient script is fighting for its life in the Unicode era. To understand why building a Jawi translator is a nightmare for AI, we must first abandon a common misconception. Jawi is not Arabic. And it is not a one-to-one cipher for Rumi (Latin Malay).
If you transliterate blindly from Rumi (p-e-r-g-i), you might write ڤيرݢي. But a Jawi reader would pronounce that "Pee-ree-gee." Wrong.
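To make the failure concrete, here is a minimal sketch of that blind, letter-by-letter approach. The table `NAIVE_MAP` and the function `naive_rumi_to_jawi` are hypothetical names invented for this illustration, and the table covers only the five letters needed for the example. It is deliberately the wrong way to do it.

```python
# A deliberately naive one-to-one table (hypothetical, for illustration only).
# Real Jawi orthography does not map Rumi letters to Jawi letters this way.
NAIVE_MAP = {
    "p": "\u06A4",  # ڤ (pa)
    "e": "\u064A",  # ي (ya), blindly used for every 'e'
    "r": "\u0631",  # ر (ra)
    "g": "\u0762",  # ݢ (ga)
    "i": "\u064A",  # ي (ya)
}


def naive_rumi_to_jawi(word: str) -> str:
    """Map each Rumi letter to one Jawi letter, ignoring spelling conventions."""
    # A KeyError here means the toy table has no entry for that letter.
    return "".join(NAIVE_MAP[ch] for ch in word.lower())


naive = naive_rumi_to_jawi("pergi")
print(naive)  # five letters, with a ya standing in for the 'e'
```

Because 'e' and 'i' both collapse onto the same letter, the output carries a long vowel in the first syllable that the spoken word does not have, which is where the "Pee-ree-gee" reading comes from.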
In Jawi, both look similar. The word باتو could be read as "ba-tu" (stone) or "ban-tu" (help) depending on context. The nasal sound 'n' is often assimilated.
But if you are willing to do the hard work (to understand tanda baris, to know when to use 'kaf' vs 'qaf', to respect the regional differences), then you are not looking for a translator.
When we lose the ability to translate between Rumi and Jawi fluently, we lose access to 700 years of history. We lose the Hikayat Hang Tuah in its original voice. We lose the letters of the Malaccan Sultans.
A single Jawi translator cannot exist without asking the user: Which century? Which country? Which school of thought?
We have GPT-4, Gemini, and Llama. Why can't they handle Jawi?
Neural Machine Translation (NMT) needs millions of parallel sentences (Rumi || Jawi). While the Quran has parallel corpora for Arabic, Jawi secular literature is locked in dusty archives. The National Library of Malaysia has thousands of manuscripts, but they are not digitized or aligned sentence-by-sentence.
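For a sense of what "aligned sentence-by-sentence" means in practice, here is a small sketch of reading one line of a parallel file. The `rumi || jawi` line format and the function name `parse_pair` are assumptions made for this illustration, borrowed from the notation above; they are not an established corpus standard.

```python
from typing import Tuple

SEPARATOR = "||"  # assumed delimiter, mirroring the "Rumi || Jawi" notation


def parse_pair(line: str) -> Tuple[str, str]:
    """Split one aligned line into a (rumi, jawi) tuple."""
    rumi, sep, jawi = line.partition(SEPARATOR)
    rumi, jawi = rumi.strip(), jawi.strip()
    # Reject lines with no separator or with an empty side.
    if not sep or not rumi or not jawi:
        raise ValueError(f"not an aligned pair: {line!r}")
    return rumi, jawi


pair = parse_pair("batu || باتو")
print(pair)
```

An NMT system would need millions of lines like this one, each checked by a reader of both scripts. A scanned manuscript provides none of them until someone transcribes and aligns it.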