This post is about language modeling. The reading outline is as follows:
Title: Random Forests in Language Modeling
Author(s): Peng Xu and Frederick Jelinek
Author Affiliation: Center for Language and Speech Processing, the Johns Hopkins University, Baltimore, MD 21218, USA
Conference Title: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
Language: English
Type: Conference Paper (PA)
Treatment: Practical (P); Experimental (X)
Abstract: In this paper, we explore the use of Random Forests (RFs) (Amit and Geman, 1997; Breiman, 2001) in language modeling, the problem of predicting the next word based on words already seen before. The goal in this work is to develop a new language modeling approach based on randomly grown Decision Trees (DTs) and apply it to automatic speech recognition. We study our RF approach in the context of n-gram type language modeling. Unlike regular n-gram language models, RF language models have the potential to generalize well to unseen data, even when a complicated history is used. We show that our RF language models are superior to regular n-gram language models in reducing both the perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system.
Descriptors: Natural Language Processing basic problem
Identifiers: random forests, language model, decision tree, perplexity
Personal feeling: It introduces decision tree language models and the random forests concept. The main idea is wonderful: randomly grow several decision tree language models and then combine them into a single model. This model can alleviate the data sparseness problem to some extent.
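To make that combination idea concrete, here is a minimal Python sketch. The names (RandomDTLM, RandomForestLM), the salted-hash partition of histories into equivalence classes, and the add-one smoothing are my own simplifications for illustration only; the paper actually grows each decision tree with randomized, data-driven question selection and uses more refined smoothing.

    import random
    from collections import defaultdict

    class RandomDTLM:
        # One randomly "grown" decision-tree language model. For illustration the
        # tree is replaced by a salted random hashing of histories into leaves;
        # the paper grows real DTs with randomized question selection.
        def __init__(self, ngrams, vocab_size, n_leaves=100, seed=0):
            self.vocab_size = vocab_size
            self.n_leaves = n_leaves
            self.salt = random.Random(seed).random()
            self.counts = defaultdict(lambda: defaultdict(int))
            self.totals = defaultdict(int)
            for history, word in ngrams:   # history is a tuple of preceding words
                leaf = self._leaf(history)
                self.counts[leaf][word] += 1
                self.totals[leaf] += 1

        def _leaf(self, history):
            # Map a history to one of n_leaves equivalence classes (a "leaf").
            return hash((history, self.salt)) % self.n_leaves

        def prob(self, history, word):
            # Add-one smoothing inside the leaf (a stand-in for the paper's smoothing).
            leaf = self._leaf(history)
            return (self.counts[leaf][word] + 1) / (self.totals[leaf] + self.vocab_size)

    class RandomForestLM:
        # Aggregate several independently randomized DT LMs by averaging probabilities.
        def __init__(self, ngrams, vocab_size, n_trees=10):
            self.trees = [RandomDTLM(ngrams, vocab_size, seed=i) for i in range(n_trees)]

        def prob(self, history, word):
            return sum(t.prob(history, word) for t in self.trees) / len(self.trees)

    # Toy usage: trigram-style (history, word) pairs from a tiny corpus.
    data = [(("the", "cat"), "sat"), (("cat", "sat"), "on"), (("sat", "on"), "the")]
    rf_lm = RandomForestLM(data, vocab_size=1000, n_trees=5)
    print(rf_lm.prob(("the", "cat"), "sat"))

Because each tree partitions histories differently, a history unseen by one tree's leaf may still receive useful counts from another tree, which is where the better generalization to unseen data comes from.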
Something that could be improved: the random decision tree generation method is not yet good enough. I believe some optimization principles could be applied to obtain better random decision trees.