Literature
Publication details
All available information on the selected publication is listed below.
Baldwin, Timothy & Marco Lui (2014). “Accurate Language Identification of Twitter Messages.” In: Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM), pp. 17–25.
Further information
Abstract: We present an evaluation of “off-the-shelf” language identification systems as applied to microblog messages from Twitter. A key challenge is the lack of an adequate corpus of messages annotated for language that reflects the linguistic diversity present on Twitter. We overcome this through a “mostly-automated” approach to gathering language-labeled Twitter messages for evaluating language identification. We present the method to construct this dataset, as well as empirical results over existing datasets and off-the-shelf language identifiers. We also test techniques that have been proposed in the literature to boost language identification performance over Twitter messages. We find that simple voting over three specific systems consistently outperforms any specific system, and achieves state-of-the-art accuracy on the task.
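
The "simple voting" idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three systems or say how disagreements are resolved, so the function `vote`, the three toy identifiers, and the fallback to the first system's prediction are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

# A language identifier maps a message to a language code such as "en".
Identifier = Callable[[str], str]


def vote(text: str, identifiers: Sequence[Identifier]) -> str:
    """Return the label predicted by a strict majority of the identifiers.

    Assumption: if no label reaches a strict majority, fall back to the
    prediction of the first identifier in the sequence.
    """
    predictions = [identify(text) for identify in identifiers]
    label, count = Counter(predictions).most_common(1)[0]
    if count > len(predictions) / 2:
        return label
    return predictions[0]


# Hypothetical stand-ins for real language identification systems.
def system_a(text: str) -> str:
    return "de" if "ü" in text else "en"


def system_b(text: str) -> str:
    return "de" if "und" in text.split() else "en"


def system_c(text: str) -> str:
    return "nl"  # a deliberately poor system that always answers "nl"


if __name__ == "__main__":
    systems = [system_a, system_b, system_c]
    print(vote("grün und blau", systems))
    print(vote("hello world", systems))
```

In this sketch two agreeing systems outvote the third, so a single weak system cannot change the result on its own. In practice the toy functions would be replaced by wrappers around real identifiers.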