July 31, 2004

Rafting in Tieli

Going rafting in Tieli had been a wish of ours, and today it came true.

We all set out for Tieli at 7:00 this morning. After three and a half hours in our beach wagon, we arrived at the Tonglong manor. After a simple lunch, we began our main activity: rafting on the Yijimi River.

We were excited on the rafts and splashed water at one another. The stream was not fast, so we had to row to move our rafts along.

We splashed not only one another but also some strangers along the way. Twice we stopped our rafts on the sandbanks, and whoever arrived late got splashed by those who had arrived earlier: the first time Yiheng Chen and Xueting Li were soaked by all of us, and the second time Wanxiang Che, Yongguang Huang, and Huipeng Zhang.

All of us enjoyed the rafting. It was a good time, I thought.

July 30, 2004

The next ACE CR task

The next ACE CR task was to use the newly trained decision tree to resolve the coreference relations in the evaluation corpus.

This was the core step of my whole task, so I had to pay close attention to it.
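A minimal sketch of what this consulting step could look like, assuming a Soon-style closest-first linking strategy (an assumption; the entry does not fix the strategy). The Mention structure is hypothetical, and classifyPair() merely stands in for consulting the trained C4.5 tree on one mention pair's feature vector:

    #include <cstddef>
    #include <vector>

    struct Mention { int cluster; /* ... mention attributes ... */ };

    // Hypothetical stand-in: would run the trained C4.5 tree on the
    // feature vector of one (antecedent, anaphor) pair.
    static bool classifyPair(const Mention&, const Mention&) { return false; }

    // Link each mention to its closest preceding positive antecedent.
    static void resolve(std::vector<Mention>& mentions)
    {
        for (std::size_t j = 0; j < mentions.size(); ++j)
        {
            mentions[j].cluster = static_cast<int>(j);  // start as a singleton
            for (std::size_t i = j; i-- > 0; )          // scan right to left
                if (classifyPair(mentions[i], mentions[j]))
                {
                    mentions[j].cluster = mentions[i].cluster;
                    break;                              // closest positive wins
                }
        }
    }

    int main()
    {
        std::vector<Mention> mentions(5);
        resolve(mentions);  // with the stub classifier, all stay singletons
        return 0;
    }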

July 29, 2004

C4.5 to VC

The original C4.5 R8 source code was written in C and ran under Unix, which made debugging and tracing difficult. In order to extract some internal results, I had to convert the sources to C++ and port them to Windows.

This work was tiring, as there were lots of function definitions not valid in C++ and lots of redefinition errors to solve.
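As an illustration of the kind of rewriting involved (a made-up function, not C4.5's actual source), an old K&R-style definition has to be converted to an ANSI prototype before a C++ compiler will accept it:

    // K&R style, accepted by old C compilers but rejected by C++:
    //
    //     float GainRatio(Gain, SplitInfo)
    //         float Gain, SplitInfo;
    //     {
    //         return SplitInfo > 0 ? Gain / SplitInfo : 0.0;
    //     }
    //
    // The equivalent ANSI/C++ form:
    float GainRatio(float Gain, float SplitInfo)
    {
        return SplitInfo > 0.0f ? Gain / SplitInfo : 0.0f;
    }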

Just now I finished the C4.5 project, but there were still some linking errors in the Consult project.

Tomorrow would be a tiring day, too.

July 28, 2004

Fixing the 27 features

After testing all the parameters of C4.5, I could confirm the 27 features for my ACE CR task.

There was a bug in C4.5 R8: the atof() call in c4.5.c failed to convert the option string into a float. After changing atof() to atoi(), the program recognized the cutoff option.
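A hypothetical reconstruction of the symptom (not the actual C4.5 code): in pre-ANSI C, calling atof() with no declaration in scope makes the compiler assume an int return value, so the converted float comes back as garbage; for an integral option such as this cutoff, parsing with atoi() sidesteps the problem entirely:

    #include <cstdio>
    #include <cstdlib>

    int main()
    {
        const char* optarg = "25";       // e.g. the value following the option
        int cutoff = std::atoi(optarg);  // was: atof(optarg), which misparsed
        std::printf("cutoff = %d\n", cutoff);
        return 0;
    }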

So I could move on to the next task of consulting new cases.

July 27, 2004

Three bugs

There were three bugs in my ACE CR sample generation programs. I found them while trying to improve the F-score of the CR algorithm.

After debugging them, I reran my program. The new scores of my CR module were as follows:

------------------------------------------------------------------
bnews 3:1 train:test
------------------------------------------------------------------
Precision: 5224/(5224+2246)=0.6993 Recall: 5224/(5224+1741)=0.7500 F: 2PR/(P+R)=0.7238
------------------------------------------------------------------
treebank 3:1 train:test
------------------------------------------------------------------
Precision: 2739/(2739+970)=0.7385 Recall: 2739/(2739+1133)=0.7074 F: 2PR/(P+R)=0.7226
------------------------------------------------------------------
nwire 3:1 train:test
------------------------------------------------------------------
Precision: 5627/(5627+1365)=0.8048 Recall: 5627/(5627+3551)=0.6131 F: 2PR/(P+R)=0.6960
------------------------------------------------------------------
all 3:1 train:test
------------------------------------------------------------------
Precision: 13837/(13837+3752)=0.7867 Recall: 13837/(13837+6178)=0.6913 F: 2PR/(P+R)=0.7359
------------------------------------------------------------------
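For reference, the bnews row can be re-derived from its raw counts with the same three formulas used throughout these entries:

    #include <cstdio>

    int main()
    {
        const double tp = 5224, fp = 2246, fn = 1741;  // bnews test counts
        const double p = tp / (tp + fp);               // precision = 0.6993
        const double r = tp / (tp + fn);               // recall    = 0.7500
        const double f = 2 * p * r / (p + r);          // F         = 0.7238
        std::printf("P=%.4f R=%.4f F=%.4f\n", p, r, f);
        return 0;
    }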

The number of samples was smaller than before; as I had suspected, the three bugs were rather nasty.

The new F-scores were good enough for me to continue my tasks.

July 26, 2004

Visit Sun Island

We had not gone anywhere for an outing this summer vacation. At noon today, ten of us students from our lab went to visit Sun Island, one of the most beautiful places of interest here.

We had planned to visit the science and technology hall, but it is free only on Mondays, so we went to Sun Island instead. It is very beautiful. We spent an hour riding tandem bikes and then visited all the sights. Some wonderful pictures follow:

Our Group Photo



A lovely Squirrel



July 25, 2004

Update to new features

I added three new features to the feature set: entity_information_match, gender_match, and number_match. The new test results were as follows:

Type      Training samples  Test samples  P         R         F
bnews     348405            116134        0.729318  0.707436  0.71821
treebank  151249            50417         0.750872  0.72314   0.736745
nwire     351814            117271        0.808811  0.62384   0.704385
all       851468            283822        0.793076  0.693803  0.740126

The results showed that the three new features were effective.
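A minimal sketch of how such boolean match features might be computed; the Mention fields and the exact match definitions here are illustrative assumptions, not my actual extraction code:

    #include <string>

    enum Gender { Male, Female, Neuter, UnknownGender };
    enum Number { Singular, Plural, UnknownNumber };

    struct Mention {
        std::string entityType;  // e.g. PER, ORG, GPE from the ACE annotation
        Gender gender;
        Number number;
    };

    // True when the two mentions carry the same entity type.
    bool entityInformationMatch(const Mention& a, const Mention& b)
    { return a.entityType == b.entityType; }

    // True only when both genders are known and agree.
    bool genderMatch(const Mention& a, const Mention& b)
    { return a.gender != UnknownGender && a.gender == b.gender; }

    // True only when both numbers are known and agree.
    bool numberMatch(const Mention& a, const Mention& b)
    { return a.number != UnknownNumber && a.number == b.number; }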

July 24, 2004

Analyzing the features in depth

I had obtained the initial coreference resolution results, but the low F-scores revealed some problems.

The precision was much higher than the recall, for two reasons. First, the chosen twenty-four features were not good enough: some coreference features were only used implicitly and could be made explicit. Second, previous research on the same problem reported a positive-to-negative ratio of nearly 1:3, whereas my samples were at a ratio of about 1:10.
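One plausible remedy for the second problem (a sketch of one option, not necessarily the fix I will adopt): keep every positive sample and keep each negative with probability of about 0.3, which would bring a 1:10 ratio down to roughly 1:3.

    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    struct Sample { bool positive; /* ... feature values ... */ };

    // Keep all positives; keep each negative with probability keepNeg.
    std::vector<Sample> rebalance(const std::vector<Sample>& in, double keepNeg)
    {
        std::vector<Sample> out;
        for (std::size_t i = 0; i < in.size(); ++i)
            if (in[i].positive || std::rand() < keepNeg * RAND_MAX)
                out.push_back(in[i]);
        return out;
    }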

The next task was to solve these problems.

July 23, 2004

The initial result

I used C4.5 R8 for my task and got the first results on the three corpora (and their union), as follows:

------------------------------------------------------------------
bnews 3:1 train:test
------------------------------------------------------------------
Evaluation on training data (348405 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
65340 13719( 3.9%) 1172 17106( 4.9%) ( 5.1%) <<
Evaluation on test data (116134 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
65340 4709( 4.1%) 1172 3819( 3.3%) ( 5.1%) <<
   (a)    (b)    <-classified as
  ----   ----
  3468   3498    (a): class +
   321 108847    (b): class -
Precision: 3468/(3468+321)=0.91528 Recall: 3468/(3468+3498)=0.49785 F: 2PR/(P+R)=0.64491

------------------------------------------------------------------
treebank 3:1 train:test
------------------------------------------------------------------
Evaluation on training data (151249 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
20672 4558( 3.0%) 384 5630( 3.7%) ( 3.9%) <<
Evaluation on test data (50417 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
20672 2700( 5.4%) 384 2407( 4.8%) ( 3.9%) <<
   (a)    (b)    <-classified as
  ----   ----
  1688   2184    (a): class +
   223  46322    (b): class -
Precision: 1688/(1688+223)=0.8833 Recall: 1688/(1688+2184)=0.43595 F: 2PR/(P+R)=0.58378
------------------------------------------------------------------
nwire 3:1 train:test
------------------------------------------------------------------
Evaluation on training data (351814 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
53589 11635( 3.3%) 880 14826( 4.2%) ( 4.4%) <<
Evaluation on test data (117271 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
53589 6599( 5.6%) 880 5942( 5.1%) ( 4.4%) <<
   (a)    (b)    <-classified as
  ----   ----
  3763   5507    (a): class +
   435 107566    (b): class -
Precision: 3763/(3763+435)=0.89638 Recall: 3763/(3763+5507)=0.405933 F: 2PR/(P+R)=0.5588
------------------------------------------------------------------
all 3:1 train:test
------------------------------------------------------------------
Evaluation on training data (851468 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
96104 31886( 3.7%) 1753 37773( 4.4%) ( 4.6%) <<
Evaluation on test data (283822 items):
Before Pruning After Pruning
---------------- ---------------------------
Size Errors Size Errors Estimate
96104 13422( 4.7%) 1753 12154( 4.3%) ( 4.6%) <<
   (a)    (b)    <-classified as
  ----   ----
  8872  11236    (a): class +
   918 262796    (b): class -
Precision: 8872/(8872+918)=0.90623 Recall: 8872/(8872+11236)=0.44122 F: 2PR/(P+R)=0.59348

Although the F-scores were close to the international results reported at MUC, I felt they could be improved.

I will keep working to improve them.

July 22, 2004

"0D 0A" and "0A"

There is a difference between "0D 0A" and "0A". Both serve as the line break of a text file, but when a text file is written in DOS (text) mode, the system automatically expands "0A" into "0D 0A". If you want to write a bare "0A" into a text file, you must open the file in binary mode, convert an int variable holding the value 10 into a char variable, and then write it to the file.

In text mode, "endl" and a press of the Enter key both produce "0D 0A".
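A small C++ sketch of the difference (the file names are arbitrary): a '\n' written through a text-mode stream becomes "0D 0A" on Windows, while the bare byte 10 must go through a binary-mode stream:

    #include <fstream>

    int main()
    {
        std::ofstream text("text.txt");        // text mode
        text << '\n';                          // Windows writes 0D 0A

        std::ofstream bin("bin.txt", std::ios::binary);
        char lf = 10;                          // the int value 10 as a char
        bin.write(&lf, 1);                     // writes the single byte 0A
        return 0;
    }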

This problem discouraged me a little, but it has been solved.

Just now I trained on the samples of the ACE corpus; the final result was 96.1% accuracy on the closed test and 95.9% on the open test. The accuracy was too high to believe, so I will check it tomorrow.
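One reason for suspicion: with negative pairs far outnumbering positive ones, say around 10:1, a classifier that always answers "not coreferent" already reaches 10/11 = 90.9% accuracy, so 96% is not far above that trivial baseline; precision and recall on the positive class are the figures to check.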

July 21, 2004

Such hot weather

It has been very hot these days. Every time we went to the dining hall, we felt as if we were in a bathhouse.

This evening we went to the third floor of the dining hall building. The ceiling was plastic sheeting and the windows could not be opened, so the air felt like the inside of a plastic bag.

While we were having supper, beads of sweat ran down our faces.

Such a hot dining hall; we will not go back there for a long time.

July 20, 2004

Makefile

I wanted to use C4.5 for my coreference resolution task, but there were some problems with the makefiles of the libraries.

There are many decision tree libraries, and most of them are meant for Unix. After installing Cygwin I began to try them, but there were some problems with the makefiles: they used some compile-environment variables that I could not understand completely.

This bottleneck must be solved tomorrow.

July 19, 2004

Recovering my machine

Last evening, I tried to install Cygwin and deleted some files on drive C. This morning, when I started the machine, there was a prompt saying "NTLDR is missing...". I found some material about this error message: NTLDR is a file needed to boot the XP operating system, and if it is reported missing, it can be copied from a system CD or from another XP installation of the same edition.

So I made a DOS boot floppy, started the computer into DOS, and copied the NTLDR file to drive C. The "NTLDR is missing" prompt went missing itself, but then the computer rebooted endlessly.

I asked Carl for help. He told me to install a new system on another disk, change the boot order, and then delete the new system's files. I carried out his idea and managed to recover my operating system.

His idea was very good and useful. Thanks to Carl.

July 18, 2004

C4.5

I had an original plan for the ACE coreference resolution evaluation, but Quinlan's demo program had its limits. So I wanted to use the C4.5 method for my task, yet I had not found a good C4.5 package.

This problem must be solved in the next few days.

July 17, 2004

Seeing Off the Last One

A little after 4 p.m. I got a phone call saying that another classmate of ours was leaving Harbin by train. It turned out to be the last classmate of our class to leave Harbin.

Seeing people off from Harbin seems to have settled into a fixed procedure: help the classmate carry the luggage, take a bus to the railway station, buy platform tickets, go to the waiting area, squeeze into the crowd waiting for the ticket check, follow the stream of people to the carriage door, help stow the luggage on the train, shake hands and hug goodbye, the classmate boards, those inside and outside the window grow sad, and as the train starts everyone waves farewell. I suppose this is the standard senior-year send-off routine. Somehow, seeing off this last classmate, I did not shed a single tear; perhaps I have gone numb, or perhaps I know we will meet again.

With the last classmate seen off, my undergraduate life has come to a close. People always travel farther and farther apart; I wish every classmate of ours smooth work, good health, and a splendid future...

July 16, 2004

Machine Learning Resources

Browsing the BBS forum of Nanjing University, I found lots of useful information. On this forum I like the AI and DataMining boards very much, though I had not visited them for some days. This afternoon I found the link to the lab of Zhi-Hua Zhou, one of the best researchers in the machine learning area. His lab is named "LAMDA", short for "Learning And Mining from DatA". Among the lab's links there is a page listing machine learning resources, with so many things I had never seen or known.

Such good resources. I will go through them one by one!

July 15, 2004

The EntityInformation

Building on yesterday's related work, I finished the entity information extraction task.

Just now my program passed its tests.

Good news! Keep on!

Tomorrow I should finish the CR features extraction sub-task.

Let me begin!

July 14, 2004

The ACE basic process

The ACE source files are in a special format and must be pre-processed before any deep analysis. Based on the outline that taozi, Carl, and I made yesterday, I implemented the pre-processing this afternoon.

The next problem is the extraction of my coreference features.

OK. Let me keep up this enthusiasm for the work.

Let me begin!!

July 13, 2004

The breadth of anaphora resolution

Before today I thought anaphora resolution was a hot research topic with merely lots of papers. However, when I read "Anaphora Resolution: The State of the Art", Mitkov recommended a primer, Graham Hirst's book "Anaphora in Natural Language Understanding". I searched for it on Google, and the result shook me: there are more than one hundred books about anaphora. The earliest one I found was Benveniste, E., «L'anaphorique prussien din et le système des démonstratifs IE», Studi baltici 3: 121-130.

It seems the list of books I could read about anaphora keeps growing.

July 12, 2004

ACL Workshop

This morning I translated the front matter of the ACL'99 workshop The Relation of Discourse/Dialogue Structure and Reference. While doing this work, I almost felt I was listening to their talks. OK, I can read the first paper of this workshop tomorrow. The translated document was as follows:

The Relation of Discourse/Dialogue Structure and Reference
A Workshop
37th Annual Meeting of the Association for Computational Linguistics
Editors
Dan Cristea http://thor.info.uaic.ro/~dcristea/
Nancy Ide http://www.cs.vassar.edu/~ide/
Daniel Marcu

June 21, 1999
University of Maryland, USA

Organizers
Dan Cristea http://thor.info.uaic.ro/~dcristea/
Nancy Ide http://www.cs.vassar.edu/~ide/
Daniel Marcu

Program Committee:
Nicholas Asher(University of Texas) http://www.utexas.edu/cola/depts/philosophy/faculty/asher/main.html
Eugene Charniak(Brown University) http://www.cs.brown.edu/people/ec/
Udo Hahn(Freiburg University) http://www.coling.uni-freiburg.de/~hahn/hahn.html
Lynette Hirschman(MITRE Corporation)
Graeme Hirst(University of Toronto)
Massimo Poesio (University of Edinburgh) http://cswww.essex.ac.uk/staff/poesio/
Ehud Reiter(University of Aberdeen)
Michael Strube (University of Pennsylvania) http://www.eml.org/english/homes/strube/index.html
Wietske Vonk(Max Planck Institute)
Marilyn Walker(AT&T Labs Research) http://www.research.att.com/~walker/cv.html


Workshop Program
Monday, June 21

8:35-8:45 Welcome
Reference and Pragmatics
8:45-9:10 An Integrated Approach to Reference and Presupposition Resolution
Robert Kasper, Paul Davis, and Craige Roberts
9:10-9:35 Approaches to Japanese zero pronouns: Centering and relevance
Tomoko Matsui
9:35-10:00 Anaphora Resolution using an Extended Centering Algorithm in a Multi-modal Dialogue System
Haeksoo Kim, Jeong-Mi Cho, and Jungyun Seo
10:00-10:25 Knowledge-Lean Coreference Resolution and its Relation to Textual Cohesion and Coherence
Sanda Harabagiu and Steven Maiorano
10:25-11:00 Break
Corpus-Based Approaches
11:00-11:25 Posting and Resolving Bridging Anaphora in Deverbal NPs
Elena Not, Lucia Tovena, and Massimo Zancanaro
11:25-11:50 Discourse Structure and Co-reference: An Empirical Study
Dan Cristea, Nancy Ide, Daniel Marcu, and Valentin Tablan
11:50-12:15 Building a Tool for Annotating Reference in Discourse
Jonathan Decristofaro, Michael Strube, and Kathleen McCoy
12:15-1:30 Lunch
Reference and Natural Language Generation
1:30-1:55 Generating Anaphoric Expressions: Pronoun or Definite Description?
Kathleen McCoy and Michael Strube
1:55-2:20 Cb or not Cb? Centering Theory Applied to NLG
Rodger Kibble
2:40-3:00 Comprehension of Coreferential Expressions
Peter Gordon and Randall Hendrick
2:45-3:15 Break
Semantics-Based Approaches
3:15-3:40 Reference-based Discourse Structure for Reference Resolution
Helen Seville and Allan Ramsay
3:45-4:05 Reference Hashed
Frank Schilder
4:05-4:30 Logical Structure and Discourse Anaphora Resolution
Livia Polanyi and Martin van den Berg
4:30-4:45 Break
4:45-5:30 Discussion and Conclusion


July 11, 2004

MIT AI Lab: How to do research?

While going through the documents on my computer, I came across the Word file MIT AI Lab: How to Do Research. It is a classic article for any researcher; I read it once last October 29, and just this afternoon I reviewed it again.
I had tagged the important sentences of the file. They are useful to me, and I list them below:

Keep in touch with people working on related topics.
Read the literature, starting today.
Know the ten most important papers in your field.

Reading a paper can be divided into three stages:
The first stage is to see whether the paper contains anything of interest. AI papers have abstracts, which may introduce the content, but may not, or may summarize it badly, so you have to skip around, reading a bit here and a bit there, to learn what the author has actually done. The table of contents, the conclusion, and the introduction are the three key places to look. If none of these works, you just have to skim quickly in order. Once you know roughly what the paper is about and what is new in it, you can decide whether to go on to the second stage. In the second stage, find the part of the paper that has real content. Many fifteen-page papers could be rewritten in about one page, so you must hunt for the genuinely exciting parts, which are often hidden somewhere. What the author finds interesting in his work may not interest you, and vice versa. Finally, if the paper seems truly valuable, go back and read the whole thing carefully.

Keep a few questions in mind while reading: "How can I use this paper?" "Is it really as the author claims?" "What happens if ...?" Understanding a paper's conclusions is not the same as understanding the paper. Understanding it means knowing its purpose, the choices the author made (many of them implicit), whether its assumptions and formalizations are viable, the directions it points to, the open problems in its area, the recurring patterns of difficulty in the author's research, the strategic viewpoint it expresses, and so on. It helps to tie reading to programming: if an area interests you, then after reading a few papers, try implementing "toy" versions of the programs they describe; this will certainly deepen your understanding. Sadly, many AI labs are insular by nature, and their members mainly read and cite work from their own school's lab. Remember that other institutions think about problems in different ways that are worth reading, taking seriously, and citing, even if you believe you know where they went wrong. Often someone hands you a book or paper and tells you to read it because it is brilliant and/or applicable to your work; when you finish it you find nothing especially brilliant, merely something barely usable, and you wonder, "What is wrong with me? What did I miss?" In fact your friend, reading with ideas already formed in his head, saw in it, catalyzed by those ideas, something valuable for your research topic.

Jo Cool has a good idea. She combines a still-incomplete implementation with some other work, writes a draft paper, and, wanting to know how good the idea is, sends copies to ten friends and asks for comments. The friends think the idea is great and point out its errors, then pass copies to friends of their own, and so on. A few months later Jo revises the paper heavily and submits it to AAAI; six months after that it is published at five pages (the length the AAAI proceedings allow). Finally Jo cleans up the program and writes a longer article (based on the feedback from the AAAI paper), which she submits to the AI journal. The AI journal takes about two years, counting refereeing, the author's revisions, and the publication delay. So, in the ideal case, it takes about three years for Jo's idea to appear in a journal, which is why the sharpest people rarely learn anything from the journal articles of their own field; those arrive too late.


There are mailing lists for discussing the various AI subfields (such as connectionism or vision); join the ones that interest you.

When you read a paper that really excites you, copy it five times and give it to the five people most likely to be interested. They may send back good suggestions. This lab has many informal (ever-evolving) paper discussion groups for different subfields, which meet every week or two to discuss the papers everyone has read.

Whenever you have written something, hand out copies of the draft to whoever might be interested. (There is a potential problem here: although plagiarism is rare in AI, it does exist; writing "Please do not copy or cite" on the first page gives partial protection.) Most people do not read most of the papers they are given, so do not mind if only a few return comments. You can repeat this several times; journal papers require it. Note that, apart from your advisor, you should rarely give more than two successive drafts to the same person.

Keep a log of the references that interest you.

The "reference graph": the web formed by citations, where paper A cites B and C, B cites C and D, C cites D, and so on.

Discuss with these people the papers you think are truly excellent.

At some point you will start attending conferences, and when you do, you will discover that almost all conference papers are boring or downright silly. (The reasons are interesting but irrelevant here.) Then why go? Mainly to meet people outside your own lab. People outside will spread news of your work, invite you to give talks, tell you about the atmosphere and the researchers at some place, introduce you to others, help you find a summer job, and so on. How do you meet them? If you find someone's paper valuable, walk up and say, "I really enjoyed your paper," and ask a question. Get a chance to spend a summer working in another lab; you will meet another circle of people and perhaps learn another way of looking at things.

Find out which journal is the best in the field by asking its senior people, then find the articles worth reading from recent years and chase their references.

Find the most famous scholars in the field and read their books. Spend time with graduate students in the field. Look at the course lists of departments at other schools that study it. Visit their graduate offices and pick out the useful literature.


Do problem sets. Take as many mathematics courses as possible, as early as possible; courses in other fields are easy to pick up later.

Everyone needs to know something about cognitive psychology.

At MIT, Susan Carey teaches a very good introductory graduate course on developmental psychology.

If you want to work on natural language processing, linguistics is important; beyond that, it embodies many constraints on human cognition. At MIT, linguistics is largely in the hands of the Chomsky school; go and see whether it matches your interests. George Lakoff's recent book "Women, Fire, and Dangerous Things" can serve as an example of a different research program.

Keep a research notebook.

Leaf through your own notebook regularly. Some people make monthly summaries for easy reference later. What you record there can often become the skeleton of a paper, which makes life easier. Conversely, you will find that writing rough papers (a title, an abstract, section headings, and fragments of the body) is an effective way to record what you are doing now, even if you never intend to turn it into a real paper. (After a while you may change your mind.)
You may find Vera John-Steiner's book "Notebooks of the Mind" useful; it is not about how to take notes, but describes how creative ideas emerge as fragments of thought accumulate.


Read books on how to write. Strunk and White's "The Elements of Style" covers the basic dos and don'ts.

Write a draft first, then go back and revise it. Drafting helps sort out your thoughts. If you cannot write the body, write an outline and refine it step by step until the subsections become easy to write. If you cannot even manage a draft, hide all the windows you are writing in and type whatever comes into your head, even if it looks like garbage; once you have produced plenty of text, reopen the windows and edit what you typed into them. Another mistake is thinking you can write everything in order; usually you should write the core of the paper first and the introduction last.

Keeping a diary is also a way to practice writing (and lets you try more styles than just the technical paper). Both habits have other substantial benefits as well.

If that is not enough, borrow some turns of phrase from others who do this kind of research.

Write the paper so that readers can find what you have done. Both within paragraphs and across the whole paper, put the most central part first. Craft the abstract carefully, and make sure it reflects what your good idea is. Make sure you yourself know what the innovation is, then state it in a few sentences.

After finishing a paper, delete the first paragraph or the first few sentences. You will find they were content-free generalities, and that a better opening sits at the end of the first paragraph or the start of the second.

To write useful comments you must read the paper twice: once to grasp the ideas, the second time to comment. If someone keeps making the same mistake throughout a paper, do not mark it every time; work out the pattern, why he does it, and what could be done about it, then point it out clearly on the first page or discuss it in private.

Do not write destructive criticism such as "garbage" on a paper; it helps the author not at all. Take the time to make constructive suggestions, and put yourself in the author's place. Comments come in many kinds: on presentation and on content. Comments on presentation also vary, from proofreading the typescript (punctuation, spelling, missing words; learn the standard editing marks) to correcting grammar, rhetoric, and confused or unclear passages. People usually make the same grammatical mistakes again and again, so it is worth pointing them out explicitly. Then come comments on organization: disorder at every scale (clauses, sentences, paragraphs, sections, even chapters), redundancy, irrelevant content, missing arguments. Comments on content are harder to characterize: you might suggest the author expand an idea, consider a problem, fix an error, note a potential problem, offer praise, and so on. "You should read X because of Y" is a comment that is always useful.

Note that, as a courtesy, you should run your paper through a spelling checker before asking others to comment on it.

Make sure the paper is reasonably readable.

Before being sent to a journal, a paper should circulate for a while and be revised in light of the comments. Resist the urge to rush results to a journal: in AI there is no race, and in any case the publication delay far exceeds the time needed to comment on a draft. Read past issues of the journal or conference you plan to submit to, and make sure your paper's style and content fit. Most venues have a one-page "instructions to authors"; read it carefully.


If your paper is rejected, never be discouraged.

Whatever kind of review you are writing, be polite as a reviewer.

If your advisor runs a regular research seminar, volunteer to give a talk.

For more formal talks, especially your thesis defense, rehearse in front of a few friends and ask them to criticize.

Essentially all AI programming is done in Common Lisp.

What is used in design and in application programming differs greatly. To start learning, read Abelson and Sussman's "Structure and Interpretation of Computer Programs" and do some of the exercises. The book is not essentially about AI programming, but it teaches some of the same techniques. Then read the third edition of Winston and Horn's Lisp book, which contains many elegant AI programs. In the end, real programming, not reading, is the best way to learn to program.

Comment your code. Use proper data abstraction. Keep graphics separate from your code; since the language you use is basically Common Lisp, portability is then good. And so on. After the first couple of years, you should write some standard AI modules of your own.

Like papers, programs can be polished to excess. Endlessly rewriting code to make it perfect, abstracting everything to the maximum, writing macros and libraries, fiddling with the operating system kernel: all of this has drawn many people away from their theses and their field. (On the other hand, perhaps that is exactly how you will need to earn a living later.)

Look through the lab's research summary, which describes in about a page what each faculty member and many of the graduate students are doing. If some faculty member's work interests you, read his or her recent papers. In your first semester, talk with as many faculty members as you can; get a feel for what they like to do and for their research and supervision styles.

Remember, once you have chosen a topic, you must reach a clear agreement with your advisor on what counts as finishing the thesis. If the two of you have different expectations, you will suffer badly in the end. Define the criteria of the "completion test", such as a series of examples that demonstrate your theory and program.

There are many ways to waste time while doing a thesis. Avoid the following activities (unless they are truly relevant): language design; fussing over user interfaces or graphics; inventing new formalisms; over-optimizing code; building tools; bureaucracy. Minimize any work not closely tied to your thesis. A well-known phenomenon, "thesis avoidance", is when fixing some operating-system bug suddenly seems fascinating and important; at such moments you are drifting, consciously or not, away from the thesis. Remember what you are supposed to be doing.

July 10, 2004

Days of Parting (4)

7:21 p.m., train T18, car 7. Wang Zhen took this train to Beijing. slchen, taozi, zsq, and I, together with three of Wang Zhen's roommates, went to see him off. We carried his luggage onto the train and stowed it, then all came back down and stood beside the carriage. Wang Zhen was deeply moved and hugged each of us in turn.

I am truly grateful to Wang Zhen. I clearly remember the afternoon of July 25 last year, when our monitor gave me Wang Zhen's phone number and said Mr. Liu's lab was recruiting; if I wanted to join, I should contact Wang Zhen. I was busy finishing my student innovation project at the time, but I called him anyway, just to try. Over supper at the specialty dining hall, Wang Zhen told me about Mr. Liu and the lab, and I told him about myself. As we left, he said cheerfully that I would surely get into the lab. The next morning he called to say he would first go to Mr. Liu's to see how things stood; to my surprise, he called again shortly afterwards to tell me to come for an interview. In that one short day my fate may have changed greatly, because Mr. Liu, the other teachers in the lab, and the senior and junior students have influenced me deeply and given me a goal and the motivation to strive for it. Within minutes after the interview I was assigned the machine next to Wang Zhen's, and he introduced me in detail to the lab's common resources and the things to watch out for. In my first days he patiently guided my programming, and in the days that followed we got along very well; once, at a dinner together, we even founded "Beer-4".

Thinking of all this, my eyes grew moist too. Just before boarding, Wang Zhen hugged each of us goodbye once more. We stood outside the train window watching him cry inside, and our own tears burst out as well. This lasted about three minutes. The moment the train started, Wang Zhen cried even harder. Well, in his place I would have been the same.

zsq and I walked along with the train for a stretch, waving goodbye to Wang Zhen...

Parting is painful. Four days of send-offs in a row have made us true university graduates. Parting is painful, like the birth of a baby. Young men who have completed their studies must go where society needs them most to temper themselves, build their careers, and devote their youth.

With this painful series of partings behind me, I must now plunge into the busy, orderly work and study of the lab. Starting tomorrow.

Let me begin!!

July 9, 2004

Days of Parting (3)

In the morning everyone was busy moving out of the dorms, and at noon we barely squeezed out time to get to the railway station. Another classmate, off to work in Dalian, had a 1:20 p.m. train.

Just as when we saw off the others, we stood in a row outside the train window, all waving goodbye to him. Every departing classmate cries; even the "strongest" have tears in their eyes when the train starts.

At 9:20 p.m., the last girl in our class left for Shandong. The two girls who are not leaving yet cried their eyes out. After the train started, we caught the last No. 11 bus back to school.

After such a day I was so tired I could have fallen asleep sitting on the edge of the bed.

July 8, 2004

Days of Parting (2)

There were brothers' trains in the morning, the afternoon, at dusk, and at night, so the whole day was spent shuttling between Harbin station and the dorm. After seeing off so many people, my eyes gradually went numb, and the tears I could cry grew fewer and fewer.

Before, I never deeply understood the bitter weeping of people parting outside a train carriage. A few days ago a high school classmate studying at Beijing Normal University said that when she sees people off she dare not go to the railway station; she is already crying helplessly at the school gate. I always thought I was strong and would not shed tears on such occasions, but I was wrong. When a departing brother cried so hard he could not stand up and we all hugged together in pain, only tears occupied my mind. Seeing people off at the railway station is truly hard.

It was about ten at night when we saw off the last classmate of the day. Back in the dorm the room was empty; lying alone on my bed, recalling our late-night talks and all the joys and sorrows we brothers went through together, I felt wretched. I was exhausted, and fell asleep without knowing when.

July 7, 2004

Days of Parting (1)

Four years ago today I was sitting the last two subjects of my college entrance exam. That exam determined today's parting from the brothers of my dorm room. Parting is painful.

In the morning the eldest of our room took the K58 to Suzhou; the four of us brothers who were free saw him off at the railway station. After hugging each of us in turn, he boarded. Platform tickets in hand, we watched helplessly as our brother stood inside the carriage. So there we were, four brothers outside the carriage and our eldest inside, all of us, usually so strong, crying. The tears could not be held back no matter how we tried, streaming down our cheeks. The eldest is like me, the self-reliant, hard-working type, and we went through so many storms together. The tears were reluctance to let go. The moment the train started we all wept aloud; he kept waving inside while we ran alongside, running and crying. He is going to work in Suzhou. We are proud of him and wish him well. In the afternoon he texted that he felt terrible and still had many things left unsaid. Actually we all know what he wanted to say: a few days ago he got dead drunk, made a scene in the dorm, and in his ramblings "reviewed" every brother in the room. He has a good heart; we wish him every success in his new job.

Around 9 p.m., the most honest brother of our room (the fourth of our original room) also boarded a train, with two girls from our class coming along to see him off. After the five of us got him onto the train it was the same scene again: he cried inside, we cried outside, and the two girls were also very distressed. The phone became our only means of communication, and the fourth brother said so much that was on his mind. Because the train was late leaving, watching the four beside me cry, my own tears stopped for a while and I stared blankly at the wheels of a freight train. Suddenly there was a hiss of air and the train started. I could not hold back anymore, ran along with the carriage for a stretch, and with my palm pressed to the carriage glass saw off my dear brother. Leaving the station we all felt heavy-hearted; the two girls said seeing people off was too painful and they "did not want" to come again. But back at the dorm they phoned to ask who was leaving tomorrow and when.

Days of parting are days of pain.

July 6, 2004

Graduation Ceremony

In the morning was the college's graduation ceremony, attended by the college leaders (Dr. Tliu was there too), representatives of the students' parents, faculty representatives, and all the graduates. The teachers, the parents' representatives, and the students were all deeply moved; the warm speeches inspired and encouraged every graduate. What impressed me most was the speech by the faculty representative, Ms. Wang Yuying, which contained several suggestions for us: in our new jobs and studies we should stay diligent, steady, hardworking, and innovative, not fear failure or setbacks, and keep moving forward. She quoted a saying: "Few lazy people ever succeed, and few who stay diligent fail forever." In other words, diligence makes up for lack of talent.

In the afternoon was the university's graduation ceremony, attended by all the university's Party and administrative leaders and some college leaders. After a few procedural speeches came President Wang Shuguo's graduation address. He carried no script and, straight from the heart, offered us a great deal of advice: "Do not belittle yourself, and do not overestimate yourself. A person must have ambition, confidence, and steadiness." Earnest yet dignified, his words moved every student present...

July 5, 2004

Poster news for my paper

Just now I heard the news about taozi's paper and mine. Taozi's paper was accepted as an oral presentation, and mine as a poster. I analyzed the reasons for my paper's result as follows:

On the bright side: I only began studying this field in mid-April and began writing the paper on June 1, so its preparation time was short, and the coreferring samples were too few to construct a full tree for Chinese noun phrase coreference resolution. A poster was enough for my work. I was lucky! Thanks to Dr. Tliu and Mrs. Qin for their guidance.

Shortcomings: I did not make full use of my time for the paper's experiments; I had not mastered the methods of this field; the noun phrase recognition was too simplistic; and I did not tag more coreferring samples.

The plan from here: read all the papers in this area and become familiar with all the machine learning methods.

Try to get better results. Let me keep on.

July 4, 2004

ACE Coreference

The ACE coreference task for me was not easy.

The detailed processing flow was clear in my mind, but how to choose the feature vector was a big problem, because it was not the same as in my recent work. So my idea was to pick out all the entities and analyze their features.

Let me begin!!

July 3, 2004

Class group photo

Our monitor wanted everyone in our class to have some photos in bachelor's gowns. We had planned to take them the day before yesterday, but because of the rain we postponed to this morning.

All of us were delighted and a great many photos were taken. They will be a fond memory of our class.

This afternoon we had our medical examination. This evening all the boys of our class went to Zhangxin's home, where his parents, aunt, and uncle welcomed us warmly. Thanks to them.

The days our class has left can be counted on one hand.

July 2, 2004

Work for ACE

The tasks have become explicit: mine is the coreference of entities.

This afternoon we in the ACE group discussed the details of the ER, CR, and RE tasks. My task is now definite, with its input and output fixed. I will study the details of the CR guidelines and evaluation.

The bottleneck of our tasks is ER (entity recognition), and the most difficult problem there is recognizing nested noun phrases. With the help of Mjs, we should be able to solve it.

Try to finish this task.

July 1, 2004

The Last Dinner of Our Class

There was so much business we had to attend to, and this evening was the only free time for us to have our last dinner.

Our monitor said some words of blessing to us, and then we had the dinner and drank a lot. Some classmates got very excited and said a great deal. After the dinner, we all sang some songs.

All good things come to an end. We had been through so much together, happy, painful, and everything in between, and we were sorry to part company with so many good friends.

We have only five days left in each other's company, and we will cherish them.