ENIAT standards

ENIAT – Early New Indo-Aryan Texts, Lausanne

General introduction

The texts in the ENIAT collection stem from different sources and follow different input conventions. The starting point was the corpus of texts digitized by Winand Callewaert in a Devanāgarī font (Parvati), i.e. the ca. 50 texts that form the source material of his Bhakti Dictionary.

The collection is intended as a forum to facilitate exchange of sources and resources. Ideally all texts will be available in transliteration (in Roman characters) and in Devanāgarī script.

The depository's strategy is that the transliterated version constitutes the normative standard; the Devanāgarī version is derived from it automatically. Corrections, additions, etc. must therefore be made in the transliteration.

Where scholars have produced and delivered a text in Devanāgarī, the collaborator(s) of ENIAT will produce a transliterated version.

The Lausanne team intends to achieve a certain standardisation of the files entrusted to its depository:

– Unicode
– format of textual references
– markup of layout (footnotes, pagination, etc.) and outline (titles, sections, etc.)
– conventions for transliteration that allow automatic transcription into Devanāgarī (every inherent short a must be represented in the transliteration, and a divider must separate adjoining vowels that are not diphthongs)
– uniform syntax of file names and extensions to distinguish formats

 

Files

Each text is stored as a separate file. A text-specific introduction with information about the source, the author or provenance of the digitization, the outline, the structure of references, etc., is placed at the beginning of each file.

 

Transliteration

All short a-s after consonants (considered “inherent” and often not pronounced in modern standard Hindi) are transliterated, e.g., “rāma”, not “rām”.

Where two consecutive vowels do not form a diphthong (ai, au) but are joined with hiatus, a backslash separates them (a\i, a\ī, ā\u). The intervocalic backslash is changed to a colon (e.g., a:ī) in the printout. (The backslash is also used in other places where the conversion into Devanāgarī (see below, 2.1) would otherwise not use the word-initial form of a vowel, e.g., after parentheses, daṇḍas and sigla letters.)
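The change from backslash to colon for the printout can be illustrated with a short Python sketch. It is not part of the ENIAT tool chain; the vowel inventory and the function name printout_form are assumptions made for the illustration. Backslashes that do not stand between two vowels (e.g., after a parenthesis) are left untouched, since the text above does not say how they are treated in the printout.

```python
import re
import unicodedata

# Assumed vowel inventory of the transliteration; extend as needed.
VOWELS = unicodedata.normalize("NFC", "aāiīuūeoṛ")

# A backslash standing directly between two vowels marks hiatus.
HIATUS = re.compile(rf"(?<=[{VOWELS}])\\(?=[{VOWELS}])")


def printout_form(text: str) -> str:
    """Replace intervocalic backslashes with colons for the printout."""
    # NFC ensures that a vowel such as "ā" is a single code point, so the
    # look-behind sees the vowel itself and not a combining macron.
    text = unicodedata.normalize("NFC", text)
    return HIATUS.sub(":", text)
```

Applied to the transliterated sequence a\ī, this sketch returns a:ī, while a backslash after an opening parenthesis is kept as it is.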

 

References

The texts of the repository do not all observe the same conventions for the placement and syntax or structure of the textual references.

Where references were modified or created by the editors of the repository, the punctuation of a textual reference identifies and distinguishes its parts. The semicolon is used exclusively to separate complete references, e.g. in indexes or quotations. Within a reference the sequence of separators is colon, comma, period, slash, which allows for references with five parts, e.g. part:section, chapter, verse/line, or 3.42, 11.9/2. A siglum in letters for the title of the source text may be added. This may result in Ke 94.2 for the Rāsa Māna ke Pada by Kevalarāma (where first and last unit are not used), or TM 1.13/2 for the Rāmacaritamānasa by Tulsīdāsa (with the line number of the verse added).
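As an illustration only, the following Python sketch reads references that follow the separator sequence strictly (colon, comma, period, slash, without spaces inside the reference). The assignment of each unit to the separator that closes it, the unit names in the result, and the function names are assumptions of this sketch, not a specification of the repository.

```python
import re
from typing import Optional

REFERENCE = re.compile(
    r"^(?:(?P<siglum>[^\W\d_]+)\s+)?"  # optional siglum in letters, e.g. "TM"
    r"(?:(?P<part>\d+):)?"             # unit closed by a colon
    r"(?:(?P<section>\d+),)?"          # unit closed by a comma
    r"(?:(?P<chapter>\d+)\.)?"         # unit closed by a period
    r"(?P<verse>\d+)"                  # unit before the optional slash
    r"(?:/(?P<line>\d+))?$"            # unit after the slash
)


def split_references(text: str) -> list[str]:
    """Split a list of complete references at the semicolons."""
    return [ref.strip() for ref in text.split(";") if ref.strip()]


def parse_reference(ref: str) -> Optional[dict]:
    """Return the units of one reference, or None if it does not fit."""
    match = REFERENCE.match(ref.strip())
    if match is None:
        return None
    units = match.groupdict()
    for name, value in units.items():
        # The siglum stays a string; numeric units become integers.
        if name != "siglum" and value is not None:
            units[name] = int(value)
    return units
```

Under these assumptions, TM 1.13/2 is read as siglum TM, chapter 1, verse 13, line 2, and Ke 94.2 as siglum Ke, chapter 94, verse 2, with the remaining units unset.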

 

Markup

Markup uses tags in angle brackets that imitate the format of XML tags (e.g., <p> … </p> for the beginning and end of a paragraph). The tagged information must be enclosed by the opening tag and the corresponding closing tag.

The nomenclature is at present not standardized. (Ideally one should aim at following the conventions of the Text Encoding Initiative, which provides an international standard.)

Examples of tags used in deposited files:

<mt> dohā </mt> : identification of the metre
<st> : subtitle
<m> : metrical line; the abbreviation also suggests the Hindi “mūla”, i.e. “root (text)”.
<cm> : commentary
<hd> : header
<pn> : page number
<funo> : footnote number (in running text)
<fntxt> : footnote text
<quote> : passages quoted from other (identified) texts
<col> : colophon
<t1>, <t2> etc. : title of level 1, 2 etc., level four being the highest, most important, e.g. the book or section title.

These tags are necessary where the tagged information needs to be transliterated into Devanāgarī as an integral part of the source text. Marking fields makes it possible to exempt certain fields from the transliteration process (for output in two typescripts).
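The requirement that every opening tag has a corresponding closing tag can be checked mechanically. The following Python sketch is a hypothetical helper, not an ENIAT tool; it assumes that tags carry no attributes and that they are strictly nested.

```python
import re

# An opening or closing tag without attributes, e.g. <m> or </m>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def unbalanced_tags(text: str) -> list[str]:
    """List tags that are closed without being open, or never closed."""
    problems = []
    open_tags = []  # stack of tag names that are still open
    for match in TAG.finditer(text):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if not is_closing:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
        else:
            problems.append(f"unexpected </{name}>")
    # Whatever is left on the stack was never closed (innermost first).
    problems.extend(f"unclosed <{name}>" for name in reversed(open_tags))
    return problems
```

An empty list means that all tags found in the text are properly paired; each entry otherwise names one offending tag.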

 

Devanāgarī transcription

The transliteration into Devanāgarī employs the transliteration module of Oliver Hellwig's HindiOCR (a stand-alone version which was kindly made available by Oliver Hellwig); this program makes it possible to define the beginning and end of passages to be skipped during transliteration (e.g., tags, comments, footnotes, etc., all enclosed in curly braces, {…}, for this purpose). We wish to thank Oliver Hellwig very cordially for his continuing competent and patient support.
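The principle of skipping brace-enclosed passages can be sketched in Python as follows. This does not reproduce HindiOCR or its interface; the function name convert_outside_braces and the convert argument (any function that maps a string to a string) are placeholders introduced for the illustration, and braces are assumed not to be nested.

```python
import re
from typing import Callable

# A passage enclosed in curly braces (assumed not to be nested).
SKIPPED = re.compile(r"(\{[^{}]*\})")


def convert_outside_braces(text: str, convert: Callable[[str], str]) -> str:
    """Apply convert to everything except brace-enclosed passages."""
    # With one capturing group, re.split returns the brace-enclosed
    # passages at the odd indices and the remaining text at the even ones.
    pieces = SKIPPED.split(text)
    return "".join(
        piece if index % 2 else convert(piece)
        for index, piece in enumerate(pieces)
    )
```

With a stand-in conversion such as str.upper, the text outside the braces is changed while a passage like {<pn>12</pn>} is passed through unchanged.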

 

Inquiries, comments and corrections may be addressed to:

Maya Burger (maya.burger_at_unil.ch)
Nadia Cattoni (nadia.cattoni_at_unil.ch)

 

 

 

 

 
