The LEXalia Library

“realia you can read”

 

Lexalia are artifacts and images whose lexical content makes them interesting.  The LEXalia Library photographs and scans lexalia, then builds searchable text corpora and indexed image banks for educational and research use.

     The LEXalia Library is particularly concerned with the complex scripts used by Thai, Burmese, Khmer and Lao in Southeast Asia, and by dozens of languages – from Arabic to Urdu – found across South and Central Asia and the Middle East.  Lexalia supply the data we need to develop new theories, and put them to the test:  investigating automaticity in reading, the lexical coverage and true cultural content of classroom curricula, and, ultimately, the mechanisms we use for evaluation of both student performance and pedagogical methods.

     A key component of the project is the LEXalia List – a checklist of some 500 items that can help guide collection of lexalia.  These include drug warning inserts, bumper stickers, and even McDonald’s placemats:  all of the texts that help define the boundaries of adult language competence in societies around the world.  The list helps assure broad coverage within – and consistent coverage across – writing systems, languages, and countries.

     Easily overlooked, lexalia exist in the interstices of three separate worlds:  too ephemeral or prosaic to demand the status of cultural artifacts, not quite substantial enough to be respected as fully fledged documents, and still too complex and insistent on interpretation to be dismissed as simple images.  They are, indeed, ‘realia you can read’ – lexical resources of primary importance as content, rather than as snapshots of content.

Characteristics and use   Lexalia typically incorporate between 5 and 500 words of lexical content.  This allows for headlines, bumper stickers, and ‘beware of dog!’ signs at the short end, while accommodating a wide range of one- or two-page forms, including leases, loan agreements, and product warning inserts, at the other.  These texts are useful not only for the stories they tell, but for the way they are told, using abbreviations, slang, or social registers not ordinarily encountered in regular classroom materials.

     Lexalia hold the greatest interest for the less commonly taught languages (LCTLs), which are generally starved for the extensive vernacular content that instruction of more widely encountered languages can take for granted.  Nowhere is this more true than for those least common languages that require complex scripts.  These languages use non-Roman alphabets, non-linear ordering, and context-dependent letterforms or ligatures.

     Complex-script languages engender the very largest gaps between the U.S. classroom experience and on-the-ground reality.  Teaching materials – which are typically created using the limited set of printing fonts available in the U.S. – are woefully inadequate at preparing the student for the modern or stylized printing styles that dominate the urban landscape.  And even more pertinently, we know very little about how non-fluent speakers learn to read complex scripts, or whether they can ever develop true automaticity, automatically ‘chunking’ text rather than methodically decoding it.

Other applications   Beyond educational applications, lexalia provide extremely useful, hard-to-obtain data for a variety of research purposes in computational linguistics.  These include data on the creation of transliterated phonetic equivalents (as in product, drug, and chemical names), special-purpose neologisms and formal terminology (as in contracts), and ground truths (of the image data) for optical character recognition and text-to-speech.  Lexalia are also helpful in applied linguistics (e.g., for extending the coverage of dictionaries and studying the process of language modernization) and in the social sciences, where lexalia are often the most visible manifestation of the collision between globalization and local societies.

     Finally (and, hopefully, not entirely overshadowed by our technical interests), there is the content of the content:  the message of social values and official attitudes.  For example, Thailand’s “Drinking is prohibited by the Five Precepts” (accompanying some liquor advertising), “Please give your seat to monks” (found on public transport), and “Rise to pay homage to the King” (flashed before every movie showing), as well as messages about AIDS, clean food preparation, etc., have equivalents around the world, but the exact manner in which each country’s local messages address global concerns provides direct, powerful illumination of national cultures.