Neural networks for language identification: A comparative study

Author(s): Cunningham, P., Byrne, J.

Journal/Book: Inform Process Manage. 1998; 34: 395-403. Pergamon-Elsevier Science Ltd, The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, England.

Abstract: Since the advent of Jordan's recurrent network [Jordan, M. I. (1986) Serial Order: A Parallel Distributed Processing Approach. Tech. Rep. No. 8604. Institute for Cognitive Science, University of California, San Diego.], which allows the processing of data with a temporal component, neural networks have been used routinely for sequence processing. This type of network is analysed in this paper for its ability to discriminate between different languages based on its processing of a small sample of text. The motivation for developing this model was its potential use in the on-line version of a Trinity College 1872 Printed Catalogue, a library catalogue which has entries in 14 different languages spanning over 5 centuries. It was thought that neural networks would perform well where entries to be analysed comprised only a few words. The neural network's performance was compared with that of trigrams and a suffix/morphology analysis. The trigrams proved to be superior, classifying over 92% of the entries correctly compared to 88% for the neural network and 85% for the morphology/suffix analysis. Trigrams were also far superior in the speed at which statistics were compiled and the rate at which text was processed.
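
Illustration (not part of the original record): the abstract does not describe how the trigram classifier was built, so the following is only a minimal sketch of one common way to identify a language from character trigrams. It counts trigrams per language and picks the language with the highest add-one-smoothed log-likelihood. The function names, the smoothing scheme, and the toy training sentences are all assumptions made for this example, not the authors' method or data.

```python
from collections import Counter
import math

def trigrams(text):
    """Return the character trigrams of text, with one space of padding at each end."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(samples):
    """Build a trigram count table per language from a {language: text} mapping."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}

def classify(text, profiles):
    """Return the language whose trigram counts give the highest smoothed log-likelihood."""
    # Vocabulary size: every trigram seen in any language, plus one slot for unseen ones.
    vocab_size = len(set().union(*profiles.values())) + 1
    scores = {}
    for lang, counts in profiles.items():
        total = sum(counts.values())
        # Add-one smoothing keeps unseen trigrams from producing log(0).
        scores[lang] = sum(
            math.log((counts[g] + 1) / (total + vocab_size)) for g in trigrams(text)
        )
    return max(scores, key=scores.get)

# Hypothetical toy training data, invented for this sketch.
SAMPLES = {
    "english": "the history of the kings and the church of england with notes on the laws",
    "german": "die geschichte der kirche und das recht von deutschland mit anmerkungen",
    "french": "histoire des rois et de la france avec des notes sur les lois",
}
PROFILES = train(SAMPLES)

if __name__ == "__main__":
    for entry in ["the laws of the church", "die geschichte der kirche"]:
        print(entry, "->", classify(entry, PROFILES))
```

Compiling the statistics in this sketch is a single counting pass over the training text, which is consistent with the abstract's remark that trigram statistics were quick to compile. A real catalogue would need far larger training samples for each of its languages.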

Note: Article Cunningham P, Univ Dublin Trinity Coll, Dept Comp Sci, Dublin 2, IRELAND
