July 31, 2004

Rafting in Tieli

Going rafting in Tieli had long been a hope of ours. Today it came true.

We all set off for Tieli at 7:00 this morning. After three and a half hours in our beach wagon, we arrived at the Tonglong manor. After a simple lunch, we began our main activity: rafting on the Yijimi River.

We were excited on the rafts and splashed water at each other. The current was not fast, so we had to paddle to move our rafts along.

We not only splashed each other but also played with some strangers all the way. Twice we stopped our rafts at the shore, and those who arrived late were splashed by those who had arrived earlier. The first time, Yiheng Chen and Xueting Li were splashed by all of us. The second time, it was Wanxiang Che, Yongguang Huang, and Huipeng Zhang.

All of us enjoyed the rafting. It was a good time, I thought.

July 30, 2004

The next ACE CR task

The next ACE CR task was to use the newly trained decision tree to resolve the coreference (CR) relations in the evaluation corpus.

This was the core step of my whole task, so I had to pay close attention to it.

July 29, 2004

C4.5 to VC

The original C4.5R8 source code was written in C and ran under Unix, which made debugging and tracing difficult. In order to extract some intermediate results, I had to convert the source code to C++ and port it to Windows.

This work was tiring, as many function definitions were not valid C++ and many redefinition errors had to be solved.

I have just finished the C4.5 project, but there are still some linking errors in the Consult project.

Tomorrow will be a tiring day, too.

July 28, 2004

Fix the 27 features

After testing all the parameters of C4.5, I could settle on the 27 features for my ACE CR task.

There was a bug in C4.5R8: the atof() function in C4.5.c could not convert the data to a float. After I changed atof() to atoi(), the program could recognize the cutoff option.

So I could move on to the next task, consulting new cases.

July 27, 2004

Three bugs

There were three bugs in my ACE CR sample generation programs. I found them when I was trying to improve the F-score of the CR algorithm.

After I fixed them, I carried on with my program. The new scores of my CR module were as follows:

------------------------------------------------------------------
bnews 3:1 train:test
------------------------------------------------------------------
Precision: 5224/(5224+2246)=0.6993 Recall: 5224/(5224+1741)=0.7500 F: 2PR/(P+R)=0.7238
------------------------------------------------------------------
treebank 3:1 train:test
------------------------------------------------------------------
Precision: 2739/(2739+970)=0.7385 Recall: 2739/(2739+1133)=0.7074 F: 2PR/(P+R)=0.7226
------------------------------------------------------------------
nwire 3:1 train:test
------------------------------------------------------------------
Precision: 5627/(5627+1365)=0.8048 Recall: 5627/(5627+3551)=0.6131 F: 2PR/(P+R)=0.6960
------------------------------------------------------------------
all 3:1 train:test
------------------------------------------------------------------
Precision: 13837/(13837+3752)=0.7867 Recall: 13837/(13837+6178)=0.6913 F: 2PR/(P+R)=0.7359
------------------------------------------------------------------

The number of samples was smaller than ever before. As I suspected, the three bugs were rather serious.

The new F-score was good enough for me to continue my tasks.
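The arithmetic behind the scores above can be rechecked with a short script. This is only a sketch: the counts are copied from the result blocks, while the function name and the reading of the three counts as correct, spurious and missed decisions are my own assumptions about what the numerators and denominators stand for.

```python
def precision_recall_f(correct, spurious, missed):
    """Return (precision, recall, F) from raw counts.

    correct  -- decisions counted in both numerators above
    spurious -- second term of the precision denominator (assumed: wrong outputs)
    missed   -- second term of the recall denominator (assumed: missed links)
    """
    precision = correct / (correct + spurious)
    recall = correct / (correct + missed)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Counts copied from the four result blocks above.
counts = {
    "bnews": (5224, 2246, 1741),
    "treebank": (2739, 970, 1133),
    "nwire": (5627, 1365, 3551),
    "all": (13837, 3752, 6178),
}

for name, (correct, spurious, missed) in counts.items():
    p, r, f = precision_recall_f(correct, spurious, missed)
    print(f"{name:9s} P={p:.4f} R={r:.4f} F={f:.4f}")
```

Rounded to four decimals, the printed values agree with the scores listed above.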

July 26, 2004

Visit Sun Island

This summer vacation we had not yet gone anywhere on a trip. At noon today, ten students from our lab went to visit Sun Island, which is one of the most beautiful places of interest.

We had planned to visit the science and technology hall, but it is closed on Mondays, so we went to Sun Island instead. It is very beautiful. We spent an hour riding tandem bikes and then visited all the sights. The wonderful pictures are as follows:

Our Group Photo



A lovely Squirrel



July 25, 2004

Update to new features

I added three new features to the feature set: entity_information_match, gender_match, and number_match. The new test results were as follows:

Type      Training samples  Test samples  P         R         F
bnews     348405            116134        0.729318  0.707436  0.71821
treebank  151249            50417         0.750872  0.72314   0.736745
nwire     351814            117271        0.808811  0.62384   0.704385
all       851468            283822        0.793076  0.693803  0.740126

The results showed that the three new features were effective.
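For illustration, match features of this kind could be computed for a pair of mentions roughly as follows. This is a hypothetical sketch, not my actual sample generation code: the Mention record, its field names, the three-valued match/mismatch/unknown encoding, and the reading of entity_information_match as an entity-type comparison are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    # Hypothetical mention record; the field names are illustrative only.
    text: str
    entity_type: Optional[str] = None  # e.g. "PER", "ORG"; None if unknown
    gender: Optional[str] = None       # e.g. "male", "female"; None if unknown
    number: Optional[str] = None       # "singular" or "plural"; None if unknown


def attribute_match(a, b):
    """Three-valued comparison: 'unknown' when either side is missing."""
    if a is None or b is None:
        return "unknown"
    return "match" if a == b else "mismatch"


def pair_features(antecedent, anaphor):
    """Build the three match features for one candidate mention pair."""
    return {
        # Assumed here to compare entity types; the real feature may differ.
        "entity_information_match": attribute_match(
            antecedent.entity_type, anaphor.entity_type),
        "gender_match": attribute_match(antecedent.gender, anaphor.gender),
        "number_match": attribute_match(antecedent.number, anaphor.number),
    }


# Made-up example mentions.
clinton = Mention("Bill Clinton", "PER", "male", "singular")
he = Mention("he", "PER", "male", "singular")
they = Mention("they", None, None, "plural")

print(pair_features(clinton, he))
print(pair_features(clinton, they))
```

Each feature dictionary would become part of one training or test sample for the decision tree. The "unknown" value keeps missing attributes from being treated as disagreements.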