The LEXalia Library
“realia you can read”
Lexalia are artifacts and images whose lexical content makes them interesting. The LEXalia Library photographs and scans lexalia, then builds searchable text corpora and indexed image banks for educational and research use.
The LEXalia Library is particularly concerned with the complex scripts used for Thai, Burmese, Khmer, and Lao in Southeast Asia, and for dozens of languages (from Arabic to Urdu) found across South and Central Asia and the Middle East.
A key component of the project is the LEXalia List, a checklist of some 500 items that can help guide the collection of lexalia. These include drug warning inserts, bumper stickers, and even McDonald’s placemats: all of the texts that help define the boundaries of adult language competence in societies around the world. The list helps ensure broad coverage within, and consistent coverage across, writing systems, languages, and countries.
Easily overlooked, lexalia exist in the interstices of three separate worlds: too ephemeral or prosaic to claim the status of cultural artifacts, not quite substantial enough to be respected as fully fledged documents, and still too complex and too insistent on interpretation to be dismissed as simple images. They are, indeed, ‘realia you can read’: lexical resources of primary importance as content, rather than as snapshots of content.
Characteristics and use
Lexalia typically contain between 5 and 500 words of lexical content. This range accommodates headlines, bumper stickers, and ‘beware of dog!’ signs at the short end, and a wide range of one- or two-page forms, including leases, loan agreements, and product warning inserts, at the other. These texts are useful not only for the stories they tell, but for the way they are told, using abbreviations, slang, or social registers not ordinarily encountered in classroom materials.
Lexalia hold the greatest interest for the less commonly taught languages (LCTLs), which are generally starved for the extensive vernacular content that instruction in more widely taught languages can take for granted. Nowhere is this more true than for the least common languages that require complex scripts. These languages use non-Roman alphabets, non-linear ordering, and context-dependent letterforms or ligatures.
Complex-script
languages engender the very largest gaps between the
Other applications
Beyond educational applications, lexalia provide useful, hard-to-obtain data for a variety of research purposes in computational linguistics. These include data on the creation of transliterated phonetic equivalents (as in product, drug, and chemical names), on special-purpose neologisms and formal terminology (as in contracts), and ground truth for the image data, for use in optical character recognition and text-to-speech. Lexalia are also helpful in applied linguistics, e.g. for extending the coverage of dictionaries and studying the process of language modernization, as well as in the social sciences, since lexalia are often the most visible manifestation of the collision between globalization and local societies.
Finally (and, hopefully, not entirely overshadowed by our technical interests), there is the content of the content: the messages of social values and official attitudes. For example, Thailand’s “Drinking is prohibited by the Five Precepts” (accompanying some liquor advertising), “Please give your seat to monks” (found on public transport), and “Rise to pay homage to the King” (flashed before every movie showing), as well as its messages about AIDS, clean food preparation, and so on, all have equivalents around the world. But the exact manner in which each country’s local messages address global concerns provides a direct, powerful illumination of national cultures.