=====================================================================================
TWEET-NORM 2013
Tweet Normalization Workshop at SEPLN 2013
Madrid, Spain

15-20 September, 2013

http://komunitatea.elhuyar.eus/tweet-norm/



=====================================================================================
Call for papers
=====================================================================================

TWEET-NORM 2013, which will be held as part of the 29th Annual Conference
of the Spanish Society for Natural Language Processing (SEPLN 2013) in Madrid, Spain, invites researchers to submit articles
or recent unpublished studies on systems, methods and algorithms for the lexical normalization
of tweets in Spanish and, especially, to participate in the proposed shared task.


Introduction
------------

One of the most important challenges facing us today is how to process and analyze the large amount
of information on the Internet, especially on social networking sites like Twitter, where millions of people
express ideas and opinions every day on any topic of interest. These texts, called tweets, are
limited to 140 characters, far shorter than texts in traditional genres.
Consequently, users of these networks have developed a new form of expression that
includes SMS-style abbreviations, lexical variants, repeated letters, emoticons, etc.
As a result, current NLP tools can have trouble processing and understanding these short, noisy texts unless they are normalized first.

The TWEET-NORM lexical normalization task proposes the automatic "cleansing" of a set of
tweets by identifying and normalizing abbreviations, words with repeated letters and, in general,
any out-of-vocabulary (OOV) words, regardless of syntactic or stylistic variants.
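
As a rough illustration of the kind of processing the task involves, the following Python sketch
flags words missing from a lexicon, expands abbreviations and collapses repeated letters. It is only a toy
under stated assumptions: the lexicon, the abbreviation table and all function names are hypothetical,
invented for this example, and are not part of the workshop resources or an official baseline.

```python
import re

# Hypothetical toy lexicon and abbreviation table (assumptions for illustration only).
VOCABULARY = {"que", "hola", "bien", "por", "favor", "muy", "llamar"}
ABBREVIATIONS = {"q": "que", "xq": "porque", "pf": "por favor"}

# A run of three or more identical characters. Runs of two are left alone
# because Spanish has legitimate double letters (ll, rr, cc).
REPEATS = re.compile(r"(.)\1{2,}")


def collapse_repeats(word, keep):
    """Shorten every run of 3+ identical characters to `keep` characters."""
    return REPEATS.sub(lambda m: m.group(1) * keep, word)


def normalize_word(word):
    """Return a normalized form of `word`, or `word` itself if none is found."""
    lower = word.lower()
    if lower in VOCABULARY:
        return word  # in-vocabulary words are left untouched
    if lower in ABBREVIATIONS:
        return ABBREVIATIONS[lower]
    # Try keeping two letters per run first, then one.
    for keep in (2, 1):
        candidate = collapse_repeats(lower, keep)
        if candidate in VOCABULARY:
            return candidate
    return word  # unresolved OOV word: leave unchanged


def normalize_tweet(text):
    """Normalize a whitespace-tokenized tweet word by word."""
    return " ".join(normalize_word(w) for w in text.split())


print(normalize_tweet("holaaaa q tal muuuuy bien"))
```

With this toy lexicon, "tal" is not recognized and is simply passed through unchanged; a real system
would need a much larger lexicon and a way to rank competing candidates.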

While there has been some progress in this field for English tweets, there are very few
studies and resources available to date for Spanish. Thus, the aim of
the workshop is to provide a forum for discussion and communication where researchers can
test approaches, algorithms and resources in order to promote the application of techniques and algorithms
in this area. To this end, a shared task is proposed in which participants will have to normalize a set of tweets.
An annotated corpus will be provided to participants so that they can develop and test their proposed solutions.


Corpus
------

The corpus is composed of tweets gathered on the 1st and 2nd of April 2013 from the geographic area of the Iberian Peninsula,
excluding those regions that have co-official languages. A large portion of these messages contain serious normalization problems.

From this initial corpus, two subsets are generated: a development set consisting of 500 tweets and a test set consisting of 2000 tweets.
The corpora will be available on the workshop web page at http://komunitatea.elhuyar.eus/tweet-norm/resources/


Registration
------------

To obtain the corpus, participants are required to register for the task by sending an email to tweet-norm@elhuyar.com before May 31.


Submitting articles
-------------------

Submitted papers must not exceed 4 pages, should follow the
format established by the SEPLN (http://nil.fdi.ucm.es/sepln2013/callen.html) and are to be submitted via the web.



Important Dates
---------------

May 30: Registration deadline for participants and publication of the development set.

July 5: Publication of the test set.

July 15: Result submission deadline.

July 25: Publication of results.

July 31: Article submission deadline.

September 20: Workshop at SEPLN 2013 in Madrid.