## DEV Community

Hrishikesh Terdalkar

# Devanagari Transliteration in LaTeX

Write in Devanagari to render as IAST, Harvard-Kyoto, Velthuis, SLP1, WX etc.

Devanagari is the fourth most widely adopted writing system in the world and is used primarily in the Indian subcontinent. It is used to write more than 120 languages, notably Sanskrit, Hindi, Marathi, Pali and Nepali, along with several of their variants.

Devanagari text can be transliterated into various standard schemes. Several input systems based on these schemes let users type text easily, and more often than not, a user prefers one particular scheme for typing. At times, however, the text needs to be rendered in a different scheme in the PDF document.

In my case, I prefer using ibus-m17n to type text in Devanagari. While writing articles that contain Devanagari text, I also needed to render the text as IAST in the final PDF.
One could always learn to type in another input scheme, but that gets tedious; transliterating each word with an online system such as Aksharamukha is equally tedious. So I was looking for a way to type in Devanagari and have the text rendered in IAST after PDF compilation. As a solution, I came up with a system consisting of a small set of LaTeX commands that add custom syntax to LaTeX, and a Python transliteration script (based on the indic-transliteration package) that serves as a middle layer: it processes the LaTeX file and creates a new LaTeX file with the proper transliteration.

## LaTeX Compilation System with Transliteration Support

The system has two primary components:

1. LaTeX Syntax
2. Transliteration Script

### LaTeX Syntax

XeTeX (xelatex) and LuaTeX (lualatex) have good Unicode support and can be used to write Devanagari text. The current example uses XeTeX.

We first add the required packages to the preamble of the LaTeX (.tex) file.

```latex
% Source files are assumed to be UTF-8 (XeLaTeX reads UTF-8 natively and ignores inputenc)
\usepackage[utf8]{inputenc}

% Devanagari Related Packages
\usepackage{fontspec, xunicode, xltxtra}
```


Using fontspec, we can define font families for writing text in specific scripts. To write Devanagari text, a Devanagari font must be available. (It is assumed here that one may need to write both in Devanagari and in other transliteration schemes.)

For more on Devanagari fonts, see the fonts section of this article. This section assumes that the Sanskrit 2003 font is installed on the system.

To define the font families and the corresponding commands, we add the following lines to the preamble.

```latex
% Define Fonts
\newfontfamily\textskt[Script=Devanagari]{Sanskrit 2003}
\newfontfamily\textiast[Script=Latin]{Sanskrit 2003}

% Commands for Devanagari Transliterations
\newcommand{\skt}[1]{{\textskt{#1}}}
\newcommand{\iast}[1]{{\textiast{#1}}}
\newcommand{\Iast}[1]{{\textiast{#1}}}
\newcommand{\IAST}[1]{{\textiast{#1}}}
```


This provides us with four commands. \skt{} renders Devanagari text, while \iast{}, \Iast{} and \IAST{} render Devanagari text in IAST (once the Python script has processed the file) in lower case, title case and upper case, respectively. Note that, from the perspective of the LaTeX engine, the commands \iast{}, \Iast{} and \IAST{} are identical; they differ only syntactically, so that the Python script can tell them apart when performing the transliteration and applying the appropriate case changes.
We can also define new font families and new commands for any other valid scheme as required, which gives us additional commands such as \velthuis{}, \hk{} and so on.
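Since the script looks for a tag named after the lowercase scheme name, the preamble lines for extra schemes follow a fixed pattern and could be generated rather than typed by hand. The helper below, make_scheme_commands, is a hypothetical addition and not part of the original setup. It assumes that every Latin-script scheme reuses the \textiast font family defined above, and that names such as hk, velthuis and wx match indic-transliteration's scheme identifiers. It also rejects names containing anything other than letters, because a LaTeX control word may consist of letters only: a tag such as \slp1 would be read as \slp followed by the character 1.

```python
# Hypothetical helper (not part of the original setup): emit one \newcommand
# per transliteration scheme, so that each LaTeX tag name matches the
# lowercase scheme name the Python script looks for.


def make_scheme_commands(schemes, font_switch="textiast"):
    """Return preamble lines defining \\<scheme>{...} for each scheme.

    Assumes every scheme is rendered with the same Latin-script font family,
    selected by the declaration \\<font_switch> (e.g. \\textiast).
    """
    lines = []
    for scheme in schemes:
        tag = scheme.lower()
        # LaTeX control words may contain letters only, so e.g. "slp1"
        # cannot become \slp1 under this naming convention.
        if not (tag.isascii() and tag.isalpha()):
            raise ValueError(f"{scheme!r} is not a valid LaTeX command name")
        # Doubled braces in the f-string produce literal { and } in LaTeX.
        lines.append(f"\\newcommand{{\\{tag}}}[1]{{{{\\{font_switch}{{#1}}}}}}")
    return "\n".join(lines)


print(make_scheme_commands(["hk", "velthuis", "wx"]))
# prints:
# \newcommand{\hk}[1]{{\textiast{#1}}}
# \newcommand{\velthuis}[1]{{\textiast{#1}}}
# \newcommand{\wx}[1]{{\textiast{#1}}}
```

The generated lines follow the same shape as the existing \iast{} definition, so they can be pasted into the preamble next to it.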

### Minimal Example

Equipped with these commands and some Devanagari text, we have the following minimal example, stored in the file minimal.tex:

```latex
\documentclass[10pt]{article}

% Source files are assumed to be UTF-8 (XeLaTeX reads UTF-8 natively and ignores inputenc)
\usepackage[utf8]{inputenc}

% Devanagari Related Packages
\usepackage{fontspec, xunicode, xltxtra}

% Define Fonts
\newfontfamily\textskt[Script=Devanagari]{Sanskrit 2003}
\newfontfamily\textiast[Script=Latin]{Sanskrit 2003}

% Commands for Devanagari Transliterations
\newcommand{\skt}[1]{{\textskt{#1}}}
\newcommand{\iast}[1]{{\textiast{#1}}}
\newcommand{\Iast}[1]{{\textiast{#1}}}
\newcommand{\IAST}[1]{{\textiast{#1}}}

\title{Transliteration of Devanagari Text}
\author{Hrishikesh Terdalkar}

\begin{document}

\maketitle

\skt{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\iast{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\Iast{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\IAST{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\end{document}
```


### Transliteration Script

The Python script, finalize.py, performs the transliteration along with some clean-up of the LaTeX source.

```bash
python3 finalize.py minimal.tex final.tex
```


This results in the content being transformed as follows:

```latex
% ...

\skt{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\iast{ko nvasmin sāmprataṃ loke guṇavān kaśca vīryavān|}

\Iast{Ko Nvasmin Sāmprataṃ Loke Guṇavān Kaśca Vīryavān|}

\IAST{KO NVASMIN SĀMPRATAṂ LOKE GUṆAVĀN KAŚCA VĪRYAVĀN|}

% ...
```


We can now proceed to compile the final.tex file.

```bash
xelatex final
```


This results in the following output:

(Figure: the compiled PDF of final.tex)

### Anatomy of the Transliteration Script

At the core of the transliteration script is the function transliterate_between.

```python
import re
from typing import Callable

from indic_transliteration.sanscript import transliterate


def transliterate_between(
    text: str,
    from_scheme: str,
    to_scheme: str,
    start_pattern: str,
    end_pattern: str,
    post_hook: Callable[[str], str] = lambda x: x,
) -> str:
    """Transliterate the text appearing between two patterns

    Only the text appearing between patterns start_pattern and end_pattern
    is transliterated.
    start_pattern and end_pattern can appear multiple times in the full
    text, and for every occurrence, the text between them is transliterated.

    from_scheme and to_scheme should be compatible with scheme names from
    indic-transliteration

    Parameters
    ----------
    text : str
        Full text
    from_scheme : str
        Input transliteration scheme
    to_scheme : str
        Output transliteration scheme
    start_pattern : str
        Literal string marking the start tag (escaped with re.escape)
    end_pattern : str
        Literal string marking the end tag (escaped with re.escape)
    post_hook : Callable[[str], str], optional
        Function to be applied on the text within tags after transliteration
        The default is lambda x: x.

    Returns
    -------
    str
        Text after replacements
    """

    if from_scheme == to_scheme:
        return text

    def transliterate_match(matchobj):
        # Group 1 holds the text between the start and end tags
        target = matchobj.group(1)
        replacement = transliterate(target, from_scheme, to_scheme)
        replacement = post_hook(replacement)
        # Re-emit the tags around the transliterated text
        return f"{start_pattern}{replacement}{end_pattern}"

    # Non-greedy match: each tag ends at the first following end_pattern
    pattern = "%s(.*?)%s" % (re.escape(start_pattern), re.escape(end_pattern))
    return re.sub(pattern, transliterate_match, text, flags=re.DOTALL)
```


We can provide \iast{ and } as the start and end patterns, respectively, to transliterate the text enclosed in these tags.
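To see what the regular expression in transliterate_between does without installing indic-transliteration, the matching logic can be exercised with a stand-in converter. The sketch below reproduces the same non-greedy pattern, but replace_between and its convert parameter are hypothetical names, and str.upper is used purely as a placeholder for the real transliterate call. It also shows a limitation of the (.*?) approach: the match stops at the first closing brace, so a nested command such as \textbf{...} inside a tag is split incorrectly.

```python
import re
from typing import Callable


def replace_between(text: str, start: str, end: str,
                    convert: Callable[[str], str]) -> str:
    """Apply convert() to every span enclosed by the literal markers."""
    # re.escape turns the markers into literal patterns; (.*?) is
    # non-greedy, so each match ends at the FIRST following `end`.
    pattern = f"{re.escape(start)}(.*?){re.escape(end)}"

    def repl(match):
        # A function's return value is inserted verbatim, so backslashes
        # in `start` (e.g. "\iast{") are not treated as escapes.
        return f"{start}{convert(match.group(1))}{end}"

    return re.sub(pattern, repl, text, flags=re.DOTALL)


doc = r"Plain \iast{abc} text \iast{de\textbf{f}g} end"
print(replace_between(doc, r"\iast{", "}", str.upper))
# prints: Plain \iast{ABC} text \iast{DE\TEXTBF{F}g} end
```

The second tag shows the failure mode: the converter receives de\textbf{f}, and the trailing g} is left untouched. Keeping tag contents free of nested braces avoids the problem; handling nesting properly would need a brace-counting scan rather than a single regular expression.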

Using this function, we can write a generic function to work with any transliteration scheme.

```python
def latex_transliteration(
    input_text: str,
    from_scheme: str,
    to_scheme: str
) -> str:
    """Transliterate parts of the LaTeX input enclosed in scheme tags

    A scheme tag is of the form \\to_scheme_lowercase{} and is used
    when the desired output is in to_scheme.

    i.e.,
    - Text for the IAST scheme is enclosed in \\iast{} tags
    - Text for the HK scheme is enclosed in \\hk{} tags
    - ...

    Parameters
    ----------
    input_text : str
        Input text
    from_scheme : str
        Transliteration scheme of the text written within the input tags
    to_scheme : str
        Transliteration scheme to which the text within tags should be
        transliterated

    Returns
    -------
    str
        Text after replacement of text within the scheme tags
    """
    # e.g. "\iast{" for to_scheme == "iast"; including the opening brace
    # avoids matching longer command names that merely start with the tag
    start_tag_pattern = f"\\{to_scheme.lower()}{{"
    end_tag_pattern = "}"
    return transliterate_between(
        input_text,
        from_scheme=from_scheme,
        to_scheme=to_scheme,
        start_pattern=start_tag_pattern,
        end_pattern=end_tag_pattern
    )
```


Note: The names of the schemes (and therefore of the corresponding LaTeX commands) have to match the scheme names used by the indic-transliteration package.

IAST is a case-insensitive transliteration scheme: capitalization carries no meaning in it, and the transliterated output is in lower case. Still, we might want specific capitalization for certain words (e.g. proper nouns). The post_hook argument provides this functionality. Using it, we can write a function that handles the three IAST variants mentioned previously, namely \iast{} (lower), \Iast{} (title) and \IAST{} (upper).

```python
from indic_transliteration import sanscript


def devanagari_to_iast(input_text: str) -> str:
    """Transliterate parts of the input enclosed in
    \\iast{}, \\Iast{} or \\IAST{} tags from Devanagari to IAST

    Text in \\Iast{} tags also undergoes a .title() post-hook.
    Text in \\IAST{} tags also undergoes a .upper() post-hook.

    Parameters
    ----------
    input_text : str
        Input text

    Returns
    -------
    str
        Text after replacement of text within the IAST tags
    """
    # Lower case: plain transliteration
    intermediate_text = transliterate_between(
        input_text,
        from_scheme=sanscript.DEVANAGARI,
        to_scheme=sanscript.IAST,
        start_pattern="\\iast{",
        end_pattern="}"
    )
    # Title case
    intermediate_text = transliterate_between(
        intermediate_text,
        from_scheme=sanscript.DEVANAGARI,
        to_scheme=sanscript.IAST,
        start_pattern="\\Iast{",
        end_pattern="}",
        post_hook=lambda x: x.title()
    )
    # Upper case
    final_text = transliterate_between(
        intermediate_text,
        from_scheme=sanscript.DEVANAGARI,
        to_scheme=sanscript.IAST,
        start_pattern="\\IAST{",
        end_pattern="}",
        post_hook=lambda x: x.upper()
    )

    return final_text
```
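The three nearly identical calls in devanagari_to_iast could also be driven by a table that maps each tag to its case post-hook. The sketch below is an illustrative refactoring, not the author's code: CASE_HOOKS and apply_iast_tags are hypothetical names, and a to_iast callable stands in for transliterate so that the example runs without indic-transliteration. It also normalizes to NFC before changing case, as a precaution: str.title() starts a new word after any uncased character, and a combining mark such as U+0304 (combining macron) is uncased, so decomposed input like "sa\u0304m" would come out as "Sa\u0304M".

```python
import re
import unicodedata
from typing import Callable, Dict

# Illustrative table: each IAST tag variant mapped to its case post-hook.
CASE_HOOKS: Dict[str, Callable[[str], str]] = {
    "\\iast{": lambda s: s,          # lower case: left as transliterated
    "\\Iast{": lambda s: s.title(),  # title case
    "\\IAST{": lambda s: s.upper(),  # upper case
}


def apply_iast_tags(text: str, to_iast: Callable[[str], str]) -> str:
    """Convert the contents of every IAST tag and apply its case hook.

    `to_iast` is a stand-in for transliterate(x, DEVANAGARI, IAST).
    """
    for start, hook in CASE_HOOKS.items():
        # Same non-greedy, first-closing-brace matching as the original.
        pattern = re.escape(start) + r"(.*?)\}"

        def repl(match):
            # Normalize to NFC so combining marks cannot split words for title().
            converted = unicodedata.normalize("NFC", to_iast(match.group(1)))
            return start + hook(converted) + "}"

        text = re.sub(pattern, repl, text, flags=re.DOTALL)
    return text


# Demo stand-in: the tag contents are already IAST but deliberately
# decomposed (NFD), which is the case the normalization step guards against.
def demo_convert(s: str) -> str:
    return unicodedata.normalize("NFD", s)


doc = "\\iast{sāmprataṃ loke} \\Iast{sāmprataṃ loke} \\IAST{sāmprataṃ loke}"
print(apply_iast_tags(doc, demo_convert))
# prints: \iast{sāmprataṃ loke} \Iast{Sāmprataṃ Loke} \IAST{SĀMPRATAṂ LOKE}
```

Adding another variant then only means adding a row to the table rather than another block of code.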


Finally, the script has a few utility functions to remove comments and collapse excessive whitespace.
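The clean-up utilities are not shown in the article, so the following is only one possible way to write them; remove_comments and collapse_whitespace are hypothetical names, implemented with the standard re module. The comment pattern tries to follow how TeX reads a commented line (the % also swallows the line ending, and leading blanks on the next line are skipped anyway), while keeping the line ending when the next line is blank so that a paragraph break is not lost. It does not understand verbatim-like contexts such as the verbatim environment or \url{}, where % is literal.

```python
import re

# Hypothetical clean-up helpers; the article does not show the originals.

# A % starts a comment unless it is escaped as \%. Pairs of backslashes
# (\\, a line break) are captured and kept, so "\\%" is still a comment.
# The comment is removed together with its line ending and the next line's
# leading blanks, except when the next line is blank: that line ending
# must survive so the paragraph break is kept.
_COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)%[^\n]*(?:\n(?![ \t]*\n)[ \t]*)?")


def remove_comments(tex: str) -> str:
    """Drop LaTeX comments while keeping escaped percent signs (\\%)."""
    return _COMMENT.sub(r"\1", tex)


def collapse_whitespace(tex: str) -> str:
    """Strip trailing blanks and squeeze runs of blank lines into one."""
    tex = re.sub(r"[ \t]+$", "", tex, flags=re.MULTILINE)
    # Three or more newlines become exactly one empty line (a paragraph break).
    return re.sub(r"\n{3,}", "\n\n", tex)


sample = "50\\% off % a comment\nnext line\n\n\n\nnew paragraph  \n"
print(collapse_whitespace(remove_comments(sample)))
```

Removing the line ending along with the comment matters for lines such as foo% followed by bar, which TeX reads as foobar; deleting only the comment text would insert a space there.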

### Extras

Additionally, we may want more structure in our setup, such as:

• Separation of content into multiple files

```latex
\input{sections/section_devanagari.tex}
\input{sections/section_iast_lower.tex}
\input{sections/section_iast_title.tex}
\input{sections/section_iast_upper.tex}
```

• Bibliography

```latex
\bibliographystyle{acm}
\bibliography{papers}
```


#### Final LaTeX Preparation

We may have used the scheme tags across multiple sections. One option is to apply the transliteration script to every section file, creating a new set of section files, and to compile the final LaTeX file from those.
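The per-file option could be sketched roughly as follows. This is a hypothetical sketch rather than part of the original setup: finalize_tree and the build directory are made-up names, and finalize stands for whatever string-to-string transformation the script applies (for example devanagari_to_iast). Mirroring the directory layout keeps the relative \input{} paths valid when compiling from the output directory.

```python
from pathlib import Path
from typing import Callable, List


def finalize_tree(src_dir: Path, out_dir: Path,
                  finalize: Callable[[str], str]) -> List[Path]:
    """Write a transformed copy of every .tex file under src_dir to out_dir."""
    written = []
    for src in sorted(src_dir.rglob("*.tex")):
        # Mirror the directory layout so \input{sections/...} still resolves
        # relative to the output directory.
        dst = out_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        text = src.read_text(encoding="utf-8")
        dst.write_text(finalize(text), encoding="utf-8")
        written.append(dst)
    return written
```

The drawback is bookkeeping: every section file gets a processed twin, which is what the single-file approach avoids.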

A simpler solution is available in the form of latexpand, which resolves the \input{} commands by inlining their content, producing a single consolidated LaTeX file.

```bash
latexpand main.tex > single.tex
```


Now we can run the Python script on this consolidated file to resolve the transliteration tags.

```bash
python3 finalize.py single.tex final.tex
```


#### Compilation

When working with BibTeX, we often need to run the compiler several times to get the references rendered correctly in the PDF. Usually, this requires:

```bash
xelatex final
bibtex final
xelatex final
xelatex final
```


Alternatively, we can use latexmk, which takes care of the tedious compilation routine and reduces our job to a single command:

```bash
latexmk -pdflatex='xelatex %O %S' -pdf -ps- -dvi- final.tex
```


Another benefit of using latexmk is that we can clean up the numerous auxiliary files generated by the LaTeX engine with a one-liner as well:

```bash
latexmk -c
```


#### Makefile

Finally, we can place all of the console commands together in a Makefile.

```makefile
.PHONY: all clear clean

all: .all

.all: main.tex sections/*.tex papers.bib
	latexpand main.tex > single.tex
	python3 finalize.py single.tex final.tex
	latexmk -pdflatex='xelatex %O %S' -pdf -ps- -dvi- final.tex

clear:
	latexmk -C
	rm -f single.tex final.tex

clean:
	latexmk -c
```


Now we can focus on writing content in the .tex files and, once we are done, simply run:

```bash
make
```


### Requirements

We have made use of a number of external tools, which need to be set up before using the described solution.

#### Minimal Requirements

The minimal example mentioned earlier requires only three things,

#### Extra Requirements

The extras have some more dependencies.

## Devanagari Fonts

Nowadays, there are several good Devanagari fonts available. Google Fonts also provides a wide variety of Devanagari fonts.

Two of my personal favourites are,

## Code

The source code for the entire setup is available at hrishikeshrt/devanagari-transliteration-latex.