August 4, 2005

Reading (6): Rule vs. Stat, Evaluation, Baseline

This morning, I wanted to write a program to compute the distribution of the existing knowledge base. Since I had not yet mastered C# well enough for such a task, I used C++ instead. I overloaded the operators "<" and "==" for the sort and unique algorithms, respectively. C++ proved effective for the task.
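The idea can be sketched as follows. The post does not show the actual record layout of the knowledge base, so the `Entry` struct and the `sortAndUnique` helper below are only illustrative; the point is that `std::sort` needs `operator<` and `std::unique` needs `operator==`:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A hypothetical knowledge-base entry; the real record layout is not
// shown in the post, so this struct is only an illustration.
struct Entry {
    std::string word;
    int freq;
};

// operator< gives std::sort its ordering; operator== lets std::unique
// detect adjacent duplicates after the sort.
bool operator<(const Entry& a, const Entry& b) { return a.word < b.word; }
bool operator==(const Entry& a, const Entry& b) { return a.word == b.word; }

// Sort the entries by word and drop duplicate words in place.
// std::unique only removes *adjacent* duplicates, so sorting first
// is what makes the dedup global.
void sortAndUnique(std::vector<Entry>& kb) {
    std::sort(kb.begin(), kb.end());
    kb.erase(std::unique(kb.begin(), kb.end()), kb.end());
}
```

Note that `std::unique` does not shrink the container itself; it moves the unique elements to the front and returns the new logical end, so the `erase` call is what actually discards the leftovers.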

---------------------------------------------------------------------------------
Pages 152-157 of Natural Language Understanding, second edition, by James F. Allen, 1995
I'd like to pick out the useful information from this book. Since the opening chapters were about syntactic parsing, I changed my reading strategy: I skimmed the book and picked out some useful segments. They contain several techniques relevant to my NLP research, which I'll share with you.

Evaluation:
For most research topics in machine learning and natural language processing, once we have a set of estimated probabilities and the related algorithms, we want to compare our method against the classical ones. The usual solution is to divide the corpus into a training set and a test set; a typical share for the test data is 10% to 20% of the total. A more refined testing method is cross-validation: different segments of the corpus are held out for testing in turn, so each round trains on a different training set and evaluates on fresh test data, and the average over all rounds is taken as the final result. This gives a much more reliable estimate of performance than a single split. I have seen many machine learning packages use this method, but I have not used it in my own research yet. I will try it in my forthcoming NLP experiments.
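The k-fold procedure described above can be sketched in C++. The `evalFold` callback is a placeholder for "train on the training portion and score on the held-out portion"; its name and the round-robin fold assignment are my own illustration, not from the book:

```cpp
#include <cstddef>
#include <vector>

// k-fold cross-validation sketch: split the data into k folds; each
// round holds out one fold for testing and trains on the rest, then
// the k scores are averaged into the final result.
// evalFold is a placeholder: "train on `train`, return a score on `test`".
double crossValidate(const std::vector<int>& data, std::size_t k,
                     double (*evalFold)(const std::vector<int>& train,
                                        const std::vector<int>& test)) {
    double sum = 0.0;
    for (std::size_t fold = 0; fold < k; ++fold) {
        std::vector<int> train, test;
        // Round-robin assignment: item i belongs to fold i % k.
        for (std::size_t i = 0; i < data.size(); ++i)
            (i % k == fold ? test : train).push_back(data[i]);
        sum += evalFold(train, test);
    }
    return sum / static_cast<double>(k);  // average over the k held-out folds
}
```

Because every item is tested exactly once, no single lucky or unlucky split dominates the reported score.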

The importance of the baseline:
How should we measure the performance of an experiment? I believe we must consider the difficulty of the problem. There is a concept called the baseline: the performance achieved by the simplest possible method. Take part-of-speech tagging: if you simply assign each word the tag it carries most often in the training corpus, you can achieve about 90% accuracy. That is surprising, and the reason is that more than half of the words in the corpus have only a single POS. We can therefore use this simple method as the baseline for evaluating more complicated algorithms: unless your method scores well above 90%, it is not really effective. Alongside the baseline, there is another concept I want to share with you: the upper bound, the best result achievable by human annotators. The closer your method gets to the upper bound, the better it is.
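The most-frequent-tag baseline above is easy to make concrete. This sketch is my own illustration, not code from the book; the `defaultTag` fallback for unseen words ("NN" here) is an assumption I added, since the post does not say how unknown words are handled:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Tagged = std::pair<std::string, std::string>;  // (word, tag)

// Most-frequent-tag baseline: for each word, remember the tag it
// carried most often in training, then tag test words with that
// choice and report accuracy. Unseen words fall back to defaultTag
// (an assumption for this sketch; the book does not specify this).
double baselineAccuracy(const std::vector<Tagged>& train,
                        const std::vector<Tagged>& test,
                        const std::string& defaultTag = "NN") {
    // Count how often each (word, tag) pair occurs in training.
    std::map<std::string, std::map<std::string, int>> counts;
    for (const auto& wt : train) counts[wt.first][wt.second]++;

    // Pick the most frequent tag per word.
    std::map<std::string, std::string> best;
    for (const auto& wc : counts) {
        int max = -1;
        for (const auto& tc : wc.second)
            if (tc.second > max) { max = tc.second; best[wc.first] = tc.first; }
    }

    // Score the test set against the gold tags.
    int correct = 0;
    for (const auto& wt : test) {
        auto it = best.find(wt.first);
        const std::string& guess = (it != best.end()) ? it->second : defaultTag;
        if (guess == wt.second) ++correct;
    }
    return test.empty() ? 0.0
                        : static_cast<double>(correct) / test.size();
}
```

Any tagger worth publishing has to beat this trivially cheap method by a clear margin, which is exactly why the baseline matters.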
---------------------------------------------------------------------------------
