=====================================================================================
TWEET-NORM 2013
Tweet Normalization Workshop at SEPLN 2013
Madrid, Spain
15-20 September, 2013
http://komunitatea.elhuyar.eus/tweet-norm/
=====================================================================================
Call for papers
=====================================================================================

TWEET-NORM 2013, which will be held at the 29th edition of the Annual Conference of the Spanish Society for Natural Language Processing (SEPLN 2013) in Madrid (Spain), invites researchers to submit articles or unpublished recent studies on systems, methods and algorithms for the lexical normalization of tweets in Spanish and, especially, to participate in the proposed shared task.

Introduction
------------

One of the most important challenges facing us today is how to process and analyze the large amount of information on the Internet, especially on social networking sites like Twitter, where millions of people express ideas and opinions every day on any topic of interest. These texts, called tweets, are limited to 140 characters, far shorter than texts in traditional genres. Consequently, users of these networks have developed a new form of expression that includes SMS-style abbreviations, lexical variants, letter repetitions, emoticons, etc. As a result, current NLP tools can have trouble processing and understanding these short, noisy texts unless they are normalized first.

The TWEET-NORM lexical normalization task proposes the automatic "cleansing" of a set of tweets by identifying and normalizing abbreviations, words with repeated letters, and, in general, any out-of-vocabulary (OOV) words, regardless of syntactic or stylistic variants. While there has been some progress in this field for English tweets, very few studies and resources are available to date for Spanish.
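
As a rough illustration of the kind of processing the task involves, the Python sketch below handles the phenomena named above: abbreviations, repeated letters and OOV words. It is only a minimal sketch under stated assumptions: the tiny lexicon, the abbreviation table and all function names are hypothetical and are not part of the workshop resources, and a real system would need a full Spanish lexicon and a far more careful treatment of ambiguity.

```python
import itertools
import re

# Hypothetical toy resources, for illustration only.
LEXICON = {"hola", "que", "tal", "porque", "no", "calle", "te", "quiero"}
ABBREVIATIONS = {"q": "que", "xq": "porque", "tq": "te quiero"}


def repetition_candidates(token):
    """Yield variants of token with each run of repeated letters cut to 1 or 2.

    Keeping the length-2 option preserves legitimate Spanish double letters
    such as "ll" or "rr". The number of variants grows exponentially with the
    number of runs, which is acceptable for single short tokens.
    """
    # Split into runs of identical characters: "calleee" -> c, a, ll, eee.
    runs = [m.group(0) for m in re.finditer(r"(.)\1*", token)]
    options = [(r,) if len(r) == 1 else (r[0], r[0] * 2) for r in runs]
    for combo in itertools.product(*options):
        yield "".join(combo)


def normalize_token(token):
    """Return a normalized form of token, or token itself if none is found."""
    lower = token.lower()
    if lower in LEXICON:
        return token  # in-vocabulary word: leave untouched
    if lower in ABBREVIATIONS:
        return ABBREVIATIONS[lower]
    for candidate in repetition_candidates(lower):
        if candidate in LEXICON:
            return candidate
    return token  # unresolved OOV word: leave unchanged


def normalize(tweet):
    """Normalize a tweet token by token, splitting on whitespace."""
    return " ".join(normalize_token(t) for t in tweet.split())
```

With these toy resources, `normalize("holaaaa q tal")` returns `hola que tal`, and `normalize("calleee")` returns `calle` because the double "l" is kept while the repeated "e" is collapsed.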

Given this scarcity of work on Spanish, the aim of the workshop is to provide a forum for discussion and communication where researchers can test approaches, algorithms and resources, in order to promote the application of techniques and algorithms in this area. To this end, we propose a shared task in which participants will have to normalize a set of tweets. An annotated corpus will be provided to the participants so that they can develop and test their proposed solutions.

Corpus
------

The corpus is composed of tweets gathered between the 1st and 2nd of April 2013 covering the geographic area of the Iberian Peninsula, but ignoring those regions that have co-official languages. A large portion of these messages contain serious normalization problems. From this initial corpus two subsets are generated: a development set consisting of 500 tweets, and a test set consisting of 2000 tweets. The corpora will be available on the workshop web page at http://komunitatea.elhuyar.eus/tweet-norm/resources/

Registration
------------

Participants are required to register for the task in order to obtain the corpus by sending an email before May 31 to tweet-norm@elhuyar.com

Submitting articles
-------------------

Submitted papers will have a maximum length of 4 pages, should follow the format established by the SEPLN (http://nil.fdi.ucm.es/sepln2013/callen.html) and must be submitted via the web.

Important Dates
---------------

May 30: Registration deadline for participants and publication of the development set.
July 5: Publication of the test set.
July 15: Result submission deadline.
July 25: Publication of results.
July 31: Article submission deadline.
September 20: Workshop at SEPLN 2013 in Madrid.