July 23, 2004

The initial result

I used C4.5R8 for my task and obtained the first results on three samples (bnews, treebank, and nwire) and on all three combined, as follows:

------------------------------------------------------------------
bnews 3:1 train:test
------------------------------------------------------------------
Evaluation on training data (348405 items):
    Before Pruning            After Pruning
  ------------------   ----------------------------
   Size       Errors    Size       Errors  Estimate
  65340 13719( 3.9%)    1172 17106( 4.9%)   ( 5.1%)   <<
Evaluation on test data (116134 items):
    Before Pruning            After Pruning
  ------------------   ----------------------------
   Size       Errors    Size       Errors  Estimate
  65340  4709( 4.1%)    1172  3819( 3.3%)   ( 5.1%)   <<
     (a)      (b)   <-classified as
  ------   ------
    3468     3498   (a): class +
     321   108847   (b): class -
Precision: 3468/(3468+321)=0.91528 Recall: 3468/(3468+3498)=0.49785 F: 2PR/(P+R)=0.64491

------------------------------------------------------------------
treebank 3:1 train:test
------------------------------------------------------------------
Evaluation on training data (151249 items):
    Before Pruning            After Pruning
  ------------------   ----------------------------
   Size       Errors    Size       Errors  Estimate
  20672  4558( 3.0%)     384  5630( 3.7%)   ( 3.9%)   <<
Evaluation on test data (50417 items):
    Before Pruning            After Pruning
  ------------------   ----------------------------
   Size       Errors    Size       Errors  Estimate
  20672  2700( 5.4%)     384  2407( 4.8%)   ( 3.9%)   <<
     (a)      (b)   <-classified as
  ------   ------
    1688     2184   (a): class +
     223    46322   (b): class -
Precision: 1688/(1688+223)=0.88331 Recall: 1688/(1688+2184)=0.43595 F: 2PR/(P+R)=0.58378
------------------------------------------------------------------
nwire 3:1 train:test
------------------------------------------------------------------
Evaluation on training data (351814 items):
    Before Pruning            After Pruning
  ------------------   ----------------------------
   Size       Errors    Size       Errors  Estimate
  53589 11635( 3.3%)     880 14826( 4.2%)   ( 4.4%)   <<
Evaluation on test data (117271 items):
    Before Pruning            After Pruning
  ------------------   ----------------------------
   Size       Errors    Size       Errors  Estimate
  53589  6599( 5.6%)     880  5942( 5.1%)   ( 4.4%)   <<
     (a)      (b)   <-classified as
  ------   ------
    3763     5507   (a): class +
     435   107566   (b): class -
Precision: 3763/(3763+435)=0.89638 Recall: 3763/(3763+5507)=0.40593 F: 2PR/(P+R)=0.55881
------------------------------------------------------------------
all 3:1 train:test
------------------------------------------------------------------
Evaluation on training data (851468 items):
    Before Pruning            After Pruning
  ------------------   ----------------------------
   Size       Errors    Size       Errors  Estimate
  96104 31886( 3.7%)    1753 37773( 4.4%)   ( 4.6%)   <<
Evaluation on test data (283822 items):
    Before Pruning            After Pruning
  ------------------   ----------------------------
   Size       Errors    Size       Errors  Estimate
  96104 13422( 4.7%)    1753 12154( 4.3%)   ( 4.6%)   <<
     (a)      (b)   <-classified as
  ------   ------
    8872    11236   (a): class +
     918   262796   (b): class -
Precision: 8872/(8872+918)=0.90623 Recall: 8872/(8872+11236)=0.44122 F: 2PR/(P+R)=0.59348
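
The precision, recall, and F figures above can be recomputed from the test-data confusion matrices. Below is a small Python sketch of that arithmetic. The function name `precision_recall_f` and the `counts` dictionary are my own illustration and not part of C4.5. It assumes class + is the positive class, so row (a) gives the true positives and false negatives, and the first number of row (b) gives the false positives.

```python
def precision_recall_f(tp, fn, fp):
    """Return (precision, recall, F) for the positive class.

    tp: class + items classified as +   (row (a), column (a))
    fn: class + items classified as -   (row (a), column (b))
    fp: class - items classified as +   (row (b), column (a))
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# (tp, fn, fp) copied from the test-data confusion matrices in this post.
counts = {
    "bnews": (3468, 3498, 321),
    "treebank": (1688, 2184, 223),
    "nwire": (3763, 5507, 435),
    "all": (8872, 11236, 918),
}

for name, (tp, fn, fp) in counts.items():
    p, r, f = precision_recall_f(tp, fn, fp)
    print(f"{name:9s} P={p:.5f} R={r:.5f} F={f:.5f}")
```

In every sample precision is around 0.9 while recall stays below 0.5, so the F-score is held down mainly by the class + items that the pruned tree labels as class -.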

Although these F-scores are close to the international results reported for MUC, I feel they could be improved.

I will keep working to improve them.
