TAL Journal: Special issue on NLP for Under-Resourced Languages (59-3)
Until recently, most research in Natural Language Processing (NLP) has focused on a few well-described languages with many speakers. The lack of interest in other "under-resourced" languages and language varieties can be explained by several factors: lack of funding, of human resources, of appropriate technology, of complete and precise linguistic descriptions, of academic recognition by the scientific community, etc. Under-resourced languages nevertheless pose important scientific challenges, which open avenues of progress for NLP in general. First, at a time when state-of-the-art methods usually require large amounts of annotated data, work on under-resourced languages often calls for methods able to deal with small datasets (small data). Second, given the difficulty of finding resources such as lexicons or corpora, the collected datasets are often very heterogeneous with respect to time, space or domain, e.g. corpora of texts corresponding to different geolinguistic varieties and different topics at different points in time. This often also involves dealing with variation in writing, due either to the evolution of spelling standards over time or to the lack of spelling standards for languages or language varieties which are mostly oral and only seldom written. Third, NLP for under-resourced languages tends to be carried out by isolated or scattered research groups, and the resulting products often come in different formats and follow different standards. Discovering and accessing those resources, and making them interoperable so that they can be reused, can become a challenge in itself. When dealing with under-resourced languages, the interoperability of data and metadata becomes crucial for combining and reusing the few resources and tools that might be available.
The goal of this issue of Traitement Automatique des Langues (TAL) is to give an overview of current research on NLP for under-resourced languages from all over the world, encompassing a large variety of tasks.
Authors are invited to submit original papers on all aspects of NLP for under-resourced languages, in particular regarding, but not limited to, the following issues and tasks:
Methods for the acquisition, collection and elicitation of resources and annotations (e.g., OCR, crowdsourcing), for textual or spoken data
Spelling normalisation and character-level models for spelling variation
Projection of annotations from closely-related languages and cross-lingual models
Methods to deal with data sparsity, low quality issues and out-of-vocabulary words
Language and language variety identification, in particular for short texts and mixed language texts with code-switching
Computer-assisted language learning and writing aids (spelling correction, predictive text and word completion)
Issues related to the reuse of NLP tools, techniques and resources for languages other than those originally targeted, with special concern for the interoperability of resources and tools
Computational approaches to the documentation of under-resourced and endangered languages
We also invite authors to provide a short but accurate description of the languages or language varieties under study, focusing on both their linguistic and sociolinguistic characteristics:
Brief history, location of current speakers;
Main linguistic properties (morphology, syntax) and language family;
Writing system;
Vitality, approximate number of speakers, and contexts of use.
TO NOTE
IMPORTANT DATES
Submission deadline: May 25, 2018 (extended from May 15, 2018)
Notification to authors after the first review: July 16, 2018
Notification to authors after the second review: October 20, 2018 (postponed from September 30, 2018)
Final version: November 30, 2018
Publication: January 2019
THE JOURNAL
TAL (Traitement Automatique des Langues / Natural Language Processing) is an international journal published since 1960 by ATALA (French Association for Natural Language Processing, http://www.atala.org) with the support of CNRS (National Centre for Scientific Research). It is now published online, with immediate open access to published papers and an annual print-on-demand edition. This does not change its editorial and reviewing process.