- The objective of the normalization task is to standardize a set of tweets in terms of spelling. We will release a tweet corpus which will contain tweets in serious need of normalization.
- Hashtags and IDs will not be considered OOV words.
- These OOV words will be manually annotated as variant, invariant, or special case; each variant will be accompanied by its normalized form.
- The corpus we will release contains a list of tweet IDs. Each tweet ID will be associated with its corresponding OOV words and manual annotation.
- Participants will be able to test their methods on a reference corpus (for which no annotations will be provided) and on a separate, small development corpus with annotations.
- Finally, they will have to annotate the test corpus automatically in a short period of time.
- The task consists of either tagging an OOV word as correct or proposing its correct form. The evaluation will consider an OOV word proposal correct if the proposed form matches the reference:
- Correct: if the original form was correct (class 1) or unknown (class 2) and no normalization is done, or if the original form required normalization (class 0) and the correct normalized form is proposed.
- Wrong: otherwise.
- The performance of the system will be measured in terms of Precision, according to the following formula:
- Precision = #correct proposals / #OOV words in the whole collection
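The scoring rule above can be sketched in a few lines of Python. This is only an illustration, not the official evaluation script: the tuple layout, the names `is_correct` and `precision`, and the example words are all hypothetical. It assumes that "no normalization is done" means the proposed form is identical to the original form, and that forms are compared by exact, case-sensitive string match.

```python
from typing import List, Optional, Tuple

# Class labels follow the numbering used in the list above.
VARIANT = 0  # original form required normalization
CORRECT = 1  # original form was already correct
UNKNOWN = 2  # unknown / special case

# (original form, gold class, gold normalized form or None, proposed form)
Item = Tuple[str, int, Optional[str], str]


def is_correct(original: str, gold_class: int,
               gold_form: Optional[str], proposed: str) -> bool:
    """Apply the Correct/Wrong rule to a single OOV word."""
    if gold_class in (CORRECT, UNKNOWN):
        # Assumption: leaving the word untouched means proposing it unchanged.
        return proposed == original
    # Variant: the proposal must match the reference normalization exactly.
    return proposed == gold_form


def precision(items: List[Item]) -> float:
    """#correct proposals / #OOV words in the whole collection."""
    if not items:
        return 0.0
    correct = sum(is_correct(*item) for item in items)
    return correct / len(items)


# Invented examples, for illustration only.
examples: List[Item] = [
    ("q", VARIANT, "que", "que"),           # normalized as in the reference
    ("iPhone", CORRECT, None, "iPhone"),    # correctly left untouched
    ("jajaja", UNKNOWN, None, "ja"),        # changed although it should not be
    ("tb", VARIANT, "también", "tan bien"), # wrong normalization
]

score = precision(examples)
print(f"Precision = {score:.2f}")
```

Note that the denominator is the total number of OOV words, so a word for which a system makes no proposal at all would still count against it under this reading.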