At noon today, I read a paper written by Changling Huang. The paper has a good idea for word sense disambiguation (WSD). Usually, we do WSD by choosing among a fixed number of semantic classes, but we do not build a very good language model for the words in the context. In this paper, the context is fully used to construct the language model. Every word is given what the paper calls an observational window. For each key word, collect statistics on the probability of the words that appear in its window, and use them to build a vector that represents the key word's context. If you build such vectors for all words, you get a vector space. Then choose some typical high-frequency words and apply a clustering algorithm to them to get many sets of words.
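To make the idea concrete for myself, here is a minimal sketch of the steps above. It is not the paper's implementation: the window size, the cosine similarity, the greedy single-pass clustering, the threshold, and the toy English corpus are all my own assumptions for illustration.

```python
from collections import Counter, defaultdict
import math


def context_vectors(tokens, window=1):
    """Map each word to the relative frequencies of the words
    seen within `window` positions of it (its observational window)."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    vectors = {}
    for word, ctx in counts.items():
        total = sum(ctx.values())
        vectors[word] = {c: n / total for c, n in ctx.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(p * v.get(c, 0.0) for c, p in u.items())
    norm_u = math.sqrt(sum(p * p for p in u.values()))
    norm_v = math.sqrt(sum(p * p for p in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(words, vectors, threshold=0.95):
    """Greedy single-pass clustering (an assumed stand-in for the paper's
    clustering step): a word joins the first cluster whose first member
    is at least `threshold` similar, otherwise it starts a new cluster."""
    clusters = []
    for w in words:
        for c in clusters:
            if cosine(vectors[w], vectors[c[0]]) >= threshold:
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters


# Toy corpus, invented for this example only.
tokens = ("the cat eats fish the dog eats meat "
          "the cat drinks milk the dog drinks water").split()

vectors = context_vectors(tokens, window=1)

# Keep only the high-frequency words (here: seen at least twice).
frequent = [w for w, n in Counter(tokens).most_common() if n >= 2]

clusters = cluster(frequent, vectors, threshold=0.95)
print(clusters)
```

In this toy corpus, "cat" and "dog" have exactly the same distribution of neighbours, so they end up in the same set. A real experiment would need a large corpus and a proper clustering algorithm.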
The paper concludes that applying this method to a large corpus yields many semantic sets that are consistent with Cilin, with an average probability of 81%.
This is a very good idea for WSD. But I think there are many other good methods for WSD, and we should dig them out.