2015/11/11: TweetMT Corpus is publicly available in the Resources section!
2015/09/11: Workshop Proceedings have been published!
2015/09/08: Workshop Program is out!
2015/07/21: Paper submission deadline is extended to July 30th! Even if you did not take part in the shared task, send your paper!
2015/06/23: Results released! Available in the Participation section.
2015/06/10: Paper submission is open until July 21st!
2015/05/19: The test period has been delayed to May 26-29.
2015/04/27: A list of additional parallel and monolingual resources is available in the Resources section.
2015/04/23: Development-set released! Available in the Resources section.
2015/03/01: Registration is open
TweetMT is a workshop and shared task on machine translation applied to tweets. It will take place in September, 2015, in Alicante, co-located with SEPLN 2015. The objective of the task is to bring together interested researchers to join forces to experiment with and compare different approaches to tweet MT. This workshop is a follow-up to two other workshops organized previously also at SEPLN: TweetNorm2013 and TweetLID2014.
The machine translation of tweets is a complex task that greatly depends on the type of data we work with. Translating tweets is very different from translating well-edited texts posted, for instance, through a content manager. Tweets are often written on mobile devices, which exacerbates poor spelling, and they include errors, symbols and diacritics. The texts also vary in structure and contain tweet-specific features such as hashtags, user mentions and retweets, among others. The translation of tweets can be tackled as a direct translation (tweet-to-tweet) or as an indirect translation (tweet normalization to standard text (Kaufmann & Kalita, 2011), text translation and, if needed, tweet generation). Although the first approach looks attractive, the lack of parallel or comparable tweets for the working languages (Petrovic et al., 2010) tends to lead us towards an indirect approach. Some authors also try to gather similar tweets in other languages using cross-lingual information retrieval (CLIR).
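The indirect approach can be illustrated with a minimal Python sketch. It is not the method of any participant: masking hashtags, user mentions and URLs is used here as a simplified stand-in for normalization, and restoring them stands in for tweet generation. The names `mask_tweet`, `unmask_tweet` and `translate_indirect`, the placeholder format, and the `translate` callable are all hypothetical.

```python
import re
from typing import Callable, List, Tuple

# Assumption: URLs, user mentions and hashtags are the tokens to protect.
PROTECTED = re.compile(r"https?://\S+|[@#]\w+")


def mask_tweet(tweet: str) -> Tuple[str, List[str]]:
    """Replace tweet-specific tokens with numbered placeholders."""
    saved: List[str] = []

    def _swap(match):
        saved.append(match.group(0))
        return f"__TOK{len(saved) - 1}__"

    return PROTECTED.sub(_swap, tweet), saved


def unmask_tweet(text: str, saved: List[str]) -> str:
    """Put the original tokens back in place of the placeholders."""
    for i, token in enumerate(saved):
        text = text.replace(f"__TOK{i}__", token)
    return text


def translate_indirect(tweet: str, translate: Callable[[str], str]) -> str:
    """Mask, translate the remaining text, then restore the tokens.

    `translate` is any text-to-text MT function supplied by the caller;
    it is assumed to leave the placeholders untouched.
    """
    masked, saved = mask_tweet(tweet)
    return unmask_tweet(translate(masked), saved)
```

With a toy `translate` function that only replaces "hola" with "hello", `translate_indirect("hola @ana #TweetMT", ...)` keeps the mention and the hashtag intact while the rest of the text is translated.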
Work in this area is scarce in the literature but a growing interest is evident (Gotti et al., 2013). An important point of reference is the work done to translate SMS texts during the Haiti earthquake (Munro, 2010).
The current task will focus on MT of tweets between languages of the Iberian Peninsula (Basque, Catalan, Galician, Portuguese and Spanish), as well as English. The organizing committee will release development data including parallel tweets that will enable participants to train their systems. For the final evaluation participants will have to submit the automatic translation of a number of tweet corpora in a short period of time. The evaluation will be carried out using automatic distances to the reference corpora.
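The text above does not name the metrics used, so the following is only an illustration of what an automatic distance to a reference can look like: a word-level edit distance normalized by the reference length (word error rate). The function names and the whitespace tokenization are assumptions made for this sketch.

```python
from typing import List


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance between hypothesis and reference."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # extra word in hypothesis
                            curr[j - 1] + 1,      # word missing from hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance divided by reference length (whitespace tokens)."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(hypothesis.split(), ref) / len(ref)
```

A lower value means the system output is closer to the reference translation; identical strings score 0.0.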
These corpora are not meant to be representative of all types of messages that can be observed in informal communication. Instead, this is an initial attempt that starts by addressing one of the simplest parts of the task. We are planning to use more informal and varied corpora in future tasks as we make progress on these initial issues.
The workshop aims to be a forum where researchers will have a chance to compare their methods, systems and results.
- March 1: Registration opened
- April 21: Release of the development-set
- May 12: Registration deadline
- May 26: Release of the test-set
- May 29: Result submission deadline
- June: Evaluation. Publication of results
- July 30: Paper submission deadline
- August 10: Paper acceptance notification
- August 31: Camera-ready version of papers
- September 15: Workshop
15.30-17.00: SHARED TASK
15.30-16.00 Overview of TweetMT: A Shared Task on Machine Translation of Tweets at SEPLN 2015.
Iñaki Alegria, Nora Aranberri, Cristina España-Bonet, Pablo Gamallo, Hugo G. Oliveira, Eva Martínez, Iñaki San Vicente, Antonio Toral and Arkaitz Zubiaga
16.00-16.15 EHU at TweetMT: Adapting MT Engines for Formal Tweets
Iñaki Alegria, Mikel Artetxe, Gorka Labaka and Kepa Sarasola
16.15-16.30 The UPC TweetMT participation: Translating Formal Tweets using Context Information
Eva Martínez Garcia, Cristina España-Bonet and Lluís Màrquez
16.30-16.45 Dublin City University at the TweetMT 2015 Shared Task
Antonio Toral, Xiaofeng Wu, Tommi Pirinen, Zhengwei Qiu, Ergun Bicici and Jinhua Du
17.30-19.00: GENERAL TALKS
17.30-18.10: INVITED TALK Meritxell Gonzale
18.10-18.30: Language Segmentation of Twitter Tweets using Weakly Supervised Language Model Induction
18.30-18.50: Understandability of machine translated Hindi tweets before and after post-editing: perspectives for a recommender system information
Ritesh Shah and Christian Boitet
Gotti, Fabrizio, Philippe Langlais, and Atefeh Farzindar. “Translating Government Agencies’ Tweet Feeds: Specificities, Problems and (a few) Solutions.” NAACL 2013 (2013): 80.
Jehl, Laura, Felix Hieber, and Stefan Riezler. “Twitter translation using translation-based cross-lingual retrieval.” Proceedings of the Seventh Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2012.
J. Kaufmann and J. Kalita, “Syntactic normalization of Twitter messages,” in International Conference on Natural Language Processing. (ICON 2011). New Delhi: McMillan, India, 2010, pp. 149–158.
Robert Munro. 2010. Crowdsourced translation for emergency response in Haiti: the global collaboration of local knowledge. In AMTA Workshop on Collaborative Crowdsourcing for Translation, Denver.
S. Petrovic, M. Osborne, and V. Lavrenko. The Edinburgh Twitter corpus. In Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media, pages 25–26, 2010.