Bill_Lang: Text Mining

These days I have been reading many papers and articles about Text Mining.

First, I found some English materials about it, and I found that it is a very promising field that can be applied to many areas of NLP. But I had so many questions that I couldn't fully understand them. So I changed my approach: I downloaded many papers from CNKI (China National Knowledge Infrastructure). After reading through some of them, I began to get a feel for Text Mining.

Among them I found some notable things. There are techniques for calculating word weights; tf*idf is one example. I also saw two other methods, but the basic idea is the same: a word's weight is proportional to how frequently it occurs (in tf*idf, its frequency within a document), and inversely proportional to the number of documents that contain it. I wonder whether it is necessary to check how valid each method is for a given problem. Maybe it is.
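To make the weighting idea concrete, here is a minimal sketch of one common tf*idf variant: raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf` and the toy documents are illustrative assumptions, not taken from any of the papers, and the other two methods I saw may differ in details such as smoothing or normalization.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The weight is count * log(N / df),
    one common textbook variant (an assumption, not a specific paper's formula).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Toy corpus, made up for illustration.
docs = [
    "text mining finds patterns in text".split(),
    "rough set theory reduces rule sets".split(),
    "text mining can use rough set theory".split(),
]
weights = tf_idf(docs)
```

With this variant, a word that appears in every document gets weight 0, since log(N / N) = 0, while a word that appears often in one document and rarely elsewhere gets a high weight.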

The other notable thing is that Rough Set (RS) theory can be used in Text Mining. In the paper "Methods for Information Extraction and Data Mining from Web", the author used Rough Set theory to reduce the rule set produced by the mining step. But the paper gives no detailed information about how RS was applied.

This evening, Dr. Tliu asked me to make a thorough study of Text Mining (TM) and to think about how to make full use of it.

It is a challenge for me. I should try my best and not give up too quickly.

Bill_Lang

March 25, 2004

