October 10, 2004

Random Forests in Language Modeling

This paper is about language modeling. The reading outline is as follows:

Title: Random Forests in Language Modeling

Author(s): Peng Xu and Frederick Jelinek

Author Affiliation: Center for Language and Speech Processing, the Johns Hopkins University, Baltimore, MD 21218, USA

Conference Title: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.

Language: English

Type: Conference Paper (PA)

Treatment: Practical (P); Experimental (X)

Abstract: In this paper, we explore the use of Random Forests (RFs) (Amit and Geman, 1997; Breiman, 2001) in language modeling, the problem of predicting the next word based on words already seen before. The goal in this work is to develop a new language modeling approach based on randomly grown Decision Trees (DTs) and apply it to automatic speech recognition. We study our RF approach in the context of n-gram type language modeling. Unlike regular n-gram language models, RF language models have the potential to generalize well to unseen data, even when a complicated history is used. We show that our RF language models are superior to regular n-gram language models in reducing both the perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system.
Descriptors: Natural Language Processing basic problem
Identifiers: random forests, language model, decision tree, perplexity

Personal feeling: The paper introduces the decision tree language model and the random forest concept. The main idea is wonderful: randomly generate a number of decision tree language models and combine them into a single model (see the sketch below). This approach can alleviate the data sparseness problem to some extent.
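
To make the idea concrete for myself, here is a minimal Python sketch in the same spirit. It is not the authors' actual algorithm: instead of growing real decision trees over histories, each "tree" here is just a randomized clustering of bigram histories with add-one smoothing, and the forest averages the per-tree probabilities before computing perplexity. The names RandomClusterLM, forest_prob, and perplexity are my own illustrative inventions.

# Simplified illustration only, NOT the method from the paper:
# each "tree" randomly clusters bigram histories into a few equivalence
# classes; the forest averages the member trees' probabilities.
import math
import random
from collections import defaultdict

class RandomClusterLM:
    """One randomized 'tree': a bigram model over random history classes."""
    def __init__(self, vocab, num_classes, seed):
        rng = random.Random(seed)
        self.vocab = vocab
        # Randomly assign each history word to one of num_classes classes.
        self.cls = {w: rng.randrange(num_classes) for w in vocab}
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def train(self, sentences):
        for sent in sentences:
            for h, w in zip(sent, sent[1:]):
                c = self.cls[h]
                self.counts[c][w] += 1
                self.totals[c] += 1

    def prob(self, h, w):
        c = self.cls[h]
        # Add-one smoothing keeps unseen events at nonzero probability.
        return (self.counts[c][w] + 1) / (self.totals[c] + len(self.vocab))

def forest_prob(trees, h, w):
    """Forest prediction: average the member trees' probabilities."""
    return sum(t.prob(h, w) for t in trees) / len(trees)

def perplexity(prob_fn, sentences):
    log_sum, n = 0.0, 0
    for sent in sentences:
        for h, w in zip(sent, sent[1:]):
            log_sum += -math.log(prob_fn(h, w))
            n += 1
    return math.exp(log_sum / n)

if __name__ == "__main__":
    train = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]] * 20
    heldout = [["the", "cat", "sat", "on", "the", "rug"]]
    vocab = {w for s in train for w in s}

    trees = [RandomClusterLM(vocab, num_classes=3, seed=i) for i in range(10)]
    for t in trees:
        t.train(train)

    print("single tree PPL:", perplexity(trees[0].prob, heldout))
    print("forest PPL     :", perplexity(lambda h, w: forest_prob(trees, h, w), heldout))

Because each randomized model makes different mistakes, averaging their probabilities tends to give a lower held-out perplexity than any single one, which is the intuition behind the paper's result.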

Something that could be improved: the random decision tree generation method is not good enough yet. I believe some optimization principles could be applied to obtain better random decision trees.
