This is the news blog of the Sprachwissenschaftliches Institut (Linguistics Department) at Ruhr-Universität Bochum.





Ruhr-Universität Bochum
Sprachwissenschaftliches Institut





Talk on Tuesday, 26 January 2010, 14:30 to 16:00

Friday, 15 January 2010. Filed under 'Vortragsreihe' (lecture series). The Sprachwissenschaftliches Institut invites you to the talk by Hans van Halteren (Nijmegen): "Text Classification, a walk down memory lane".
In this talk I will present my excursions into the field of text classification, which in part also reflect the general development of the field, beginning some decades ago with a task called authorship attribution.
As for the classification task, authorship recognition is still a central one: methods are developed on collections with known authors and then applied to collections with unknown authors, e.g. 14th-century Dutch scribes. Apart from author identity, possible target classes include properties of the author such as gender, age, psychological profile or second-language proficiency. Alternatively, the history of a text can be investigated, e.g. the source language of translations of speeches in the European Parliament.
As for the algorithms, there has been a shift from small, traditional feature sets paired with techniques such as PCA to enormous feature sets paired with techniques such as Support Vector Machines.

The talk begins at 14:30.