August 31, 2004

Back to School

At three o'clock sharp this afternoon we arrived back in Harbin, bringing our week-long trip to Beijing to a satisfying close. Tomorrow I still have to make up the physical examination and the classes I missed. Today I will rest well, then use tomorrow to sum up the trip and draw up a plan, and set out on a new journey!

August 30, 2004

First National Seminar on Computational Linguistics (2)

The morning was again Dr. Li Hang's lecture; today's main topics were the minimum description length principle, the maximum entropy method, and hyperplane classifiers, taught in the same style as yesterday. After covering all the material, Dr. Li shared some of the experience he and his colleagues have gained in studying statistical machine learning. The gist: the three elements of machine learning, namely model, strategy, and algorithm, combine in endless variations, and every concrete problem calls for its own analysis. If you ask which method suits which application, there is no fixed answer; for a given application you simply have to compare alternatives. Machine learning is broad and deep and its methods and models too numerous to count, so there is no need to study every one. Rather than skimming many methods, it is better to master a few thoroughly; the methods are in fact interconnected, and once you have truly mastered a few, the rest come by analogy. The reference books and papers Dr. Li recommended are all valuable, and I should set aside time to study them carefully.

The afternoon lecture was "The Perl Language and Natural Language Processing" by Dr. Xun Endong of Beijing Language and Culture University. Dr. Xun earned his PhD at our own Harbin Institute of Technology, a fellow alumnus! I had long heard that Perl is well suited to text processing, and I once taught myself a little, but I never really put it to use; only after today's lecture did I appreciate its power. The lecture introduced Perl's basic data structures and control-flow statements. Perl's most powerful feature is pattern matching, that is, the use of regular expressions. After the basics, Dr. Xun presented applications of Perl in natural language processing, with examples including dictionary lookup, word frequency counting, Chinese word segmentation, part-of-speech tagging, simplified-traditional Chinese conversion, web robots, database connections, and calling the Google API. Seeing a few short lines of Perl accomplish what would take many times as many lines of C++, I could not help feeling excited; back at school I must learn Perl properly and put it to use. One demonstration was especially suggestive: for simplified-traditional conversion, the approach was to record a macro in Word and graft the recorded code into a Perl script. Office integrates a large number of COM components of many kinds, and by recording macros we can graft together many special-purpose applications.
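
The word frequency demo is easy to reproduce. A minimal sketch of the same idea, in Python rather than Perl (the filename is hypothetical):

```python
import re
from collections import Counter

# Count word frequencies in a text file: the classic few-line demo
# from the lecture, using a regex and a hash (Counter).
def word_freq(path):
    with open(path, encoding="utf-8") as f:
        words = re.findall(r"[A-Za-z']+", f.read().lower())
    return Counter(words)

if __name__ == "__main__":
    for word, n in word_freq("corpus.txt").most_common(10):
        print(word, n)
```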

After Dr. Xun's lecture, our two-day seminar on computational linguistics came to an end. At 11:00 pm we boarded train K39 back to Harbin.

August 29, 2004

First National Seminar on Computational Linguistics

At 8:30 am the First National Seminar on Computational Linguistics formally opened. The morning speaker was Dr. Li Hang of Microsoft Research Asia, lecturing on "A Guide to Statistical Machine Learning". Dr. Li is highly accomplished in this field, and his talk covered the main threads of the whole machine learning area: first an overview of statistical machine learning, then, one by one, the EM algorithm, the minimum description length criterion, maximum entropy, and hyperplane classifiers. The three elements of statistical machine learning are model, strategy, and algorithm. Each admits many variations; even for a fixed model and strategy there may be many applicable algorithms, and once the model is determined, what remains is essentially an optimization problem. For each method Dr. Li first reviewed its history, then stated the problem in mathematical form, and closed with pointers for further study and references.
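
To make the "model, then optimization" view concrete, the maximum entropy classifier from the lecture can be written as follows (my own reconstruction from standard references, not Dr. Li's slides):

$$
P_\Lambda(y \mid x) = \frac{1}{Z_\Lambda(x)} \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
\qquad
Z_\Lambda(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x, y') \Big),
$$

where the $f_i$ are binary features; the model is the exponential family they span, and training reduces to the optimization problem of choosing the weights $\Lambda = (\lambda_1, \lambda_2, \dots)$ that maximize the log-likelihood of the training data.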

The afternoon lecture was "Rapid Construction of Machine Translation Systems" by Dr. Shi Xiaodong of Xiamen University. He described methods for building machine translation systems quickly, together with the basic resources and tools involved, and mentioned many well-known MT systems. The models used include the Bayesian (noisy channel) model, the EM algorithm, and so on. One important idea concerns words in the generated sentence that match no word in the source sentence: assume a NULL token at the head of the source sentence, and treat any such unmatched word as having been produced by NULL. This makes the situation describable and gives it a proper probability model. Word alignment is enormously useful in machine translation.
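
The NULL-word trick is the one used in the IBM alignment models. In IBM Model 1, for instance (standard formulation, added here for my own notes), the source sentence $e_1^l$ is padded with $e_0 = \mathrm{NULL}$, and the probability of a target sentence $f_1^m$ is

$$
P(f_1^m \mid e_0^l) = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i),
$$

so every target word $f_j$ may be translated from any source position, including the artificial NULL at position 0; the translation table $t(\cdot \mid \cdot)$ is estimated with EM.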

Both scholars are renowned experts in Chinese information processing, and their lectures gave me a deep sense of how important statistics and machine learning are to the NLP field.

August 28, 2004

Day Three of the Conference

The morning invited talk was by Dr. Xun Endong of Beijing Language and Culture University, introducing some of their research results in Chinese information processing. The two parallel sessions that followed covered algorithms, minority language processing, and technologies related to knowledge bases and concept systems. Since it was the last day, many papers had no presenter. Che Wanxiang chaired the second session. One of the more interesting papers was "A Chinese Automatic Summarization System Based on HowNet Concept Acquisition". The method resembles conventional extractive summarization; the novelty is in using HowNet to compute word importance and, from that, the weight of each sentence. The evaluation compared sentence-level precision and recall against 50 manually annotated summaries. On detailed questioning it emerged that the manual summaries were produced without any constraints at all: annotators marked whatever they pleased. The conclusion was also the usual one: as the summary length percentage increases, precision gradually falls and recall gradually rises.
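
A minimal sketch of that kind of weighting scheme as I understood it (the word importance scores would come from HowNet; here they are just an assumed dictionary), together with the sentence-level precision/recall used in the evaluation:

```python
# Extractive summarization by sentence weighting: score each sentence
# by the importance of the words it contains, keep the top fraction,
# and return the indices of kept sentences in document order.
def summarize(sentences, importance, ratio=0.2):
    scored = [(sum(importance.get(w, 0.0) for w in sent), i)
              for i, sent in enumerate(sentences)]   # sentences: token lists
    k = max(1, int(len(sentences) * ratio))
    keep = sorted(i for _, i in sorted(scored, reverse=True)[:k])
    return keep

def precision_recall(selected, gold):
    # Sentence-level comparison against a manual summary.
    sel, g = set(selected), set(gold)
    p = len(sel & g) / len(sel) if sel else 0.0
    r = len(sel & g) / len(g) if g else 0.0
    return p, r
```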

Liao Xiantao's talk, "Chinese Named Entity Recognition Combining HMM and Automatic Rule Extraction", was the second-to-last report of the whole conference. She was well prepared in every respect, and quite a few attendees raised questions and discussed the work with her.

The last item was the closing ceremony, held around 11:00 am in Room 301 of Teaching Building No. 2 and chaired by Yang Erhong of BLCU. Professor Li Yuming, Director-General of the Department of Language Information Management of the Ministry of Education, sketched the long-term prospects of Chinese information processing. One thing he mentioned shocked us: this year's Global Chinese Physicists Congress in Shanghai stipulated that only English, not Chinese, could be used, yet the Nobel laureate Ding Zhaozhong (Samuel C. C. Ting) insisted on delivering his report in Chinese. In today's international community, Chinese is not nearly as prominent as we imagine; its future needs the efforts of us all. Professor Li has set out some of his views in "The Language of a Strong Nation and a Nation Strong in Language", an article published in Guangming Daily. At the ceremony, Cao Youqi, Secretary-General of the Chinese Information Processing Society of China, summed up some features of the conference. The most distinctive feature of a student conference is that the new faces far outnumber the familiar ones; among the familiar faces she mentioned was Che Wanxiang. (He seemed a little flattered at the time.) Finally she handed out membership application forms for the society and encouraged everyone to join.

With the end of the closing ceremony, the three-day Second National Student Workshop on Computational Linguistics came to a close. Tomorrow begins the two-day First National Seminar on Computational Linguistics.

August 27, 2004

Day Two of the Conference

There were two invited reports in the morning: "Business Semantics and Information Integration" by Dr. Pan Yue of IBM Research and "Content Understanding and Information Services" by Dr. Meng Yao of Fujitsu. Many of the following reports concerned topic detection and tracking, which is in essence text clustering: the features of each document are weighted in various ways and the documents clustered. One report on language generation was very interesting: "Mixed-Template Chinese Generation Based on a Phrase-Centered Grammar System". Its background material on language generation was instructive. Current natural language generation has four main strategies: canned text (trigger conditions are attached to prepared sentences, and a sentence is emitted once its condition is met; the output is fluent, but the approach generalizes poorly and is hard to port), template-based methods, phrase-based methods, and feature-based methods. The authors adopt a mixed-template method: trigger conditions first instantiate phrase templates, and the resulting phrases are then assembled into sentences by sentence templates according to further trigger conditions, as in the sketch below. The work is only exploratory: the templates are laborious to build, portability is very poor, and the evaluation is simply whether a human judge is satisfied with the output. Still, the work does fix an architecture for generation, and if the templates could instead be learned statistically from large corpora, solving the portability and generality problems, its prospects would be considerable.
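
Mixed-template generation in miniature (the templates and facts here are invented for illustration):

```python
# Phrase templates fill in fragments from trigger conditions on the
# facts; a sentence template then assembles the fragments.
PHRASES = {
    "subject": lambda f: f["team"],
    "event":   lambda f: "won" if f["score"] > f["opp_score"] else "lost",
    "score":   lambda f: "{}:{}".format(f["score"], f["opp_score"]),
}
SENTENCE = "{subject} {event} the match {score}."

def generate(facts):
    return SENTENCE.format(**{k: fill(facts) for k, fill in PHRASES.items()})

print(generate({"team": "Team A", "score": 3, "opp_score": 1}))
# -> Team A won the match 3:1.
```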

Relatively speaking, the afternoon gave me much more. First, Dr. Gao Jianfeng, a researcher at Microsoft Research, presented the construction strategy of an adaptive word segmentation system from a paper they had just published at the recently concluded ACL. The background is that the NLP community has not yet agreed on what counts as a word, and different corpora handle the question differently, so a segmenter trained on one corpus usually performs badly when evaluated on another. Their method modifies the basic Bayesian model with linear weighting, training finer-grained, corpus-adapted models to carry out segmentation; experiments show this greatly improves the same system's results across different evaluation corpora. Another researcher demonstrated Microsoft Research's information extraction work: the current tool is built on SQL Server 2003, and APIs will later be provided so users can develop on top of the extraction layer. The main idea is to extract relations on top of chunks and display them as a visual graph. A third speaker, who had defended his dissertation only that morning, showed his work on Chinese chunk identification, likewise building a fine-grained chunking model by linear weighting. The Microsoft session closed with comments from Dr. Huang Changning, a senior figure in Chinese information processing, who spoke briefly about the field's prospects and about the achievements of Dr. Gao Jianfeng during his time at Microsoft Research Asia. It turns out that in roughly four years Dr. Gao has published seven papers at ACL. What a star! It also shows this is a field for young people: work hard, think hard, and results will come.
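
My rough understanding of the linear-weighting idea that both the segmentation and the chunking talks used (a sketch, not their actual model; the component models and weights are placeholders):

```python
import math

# Linearly interpolate word probabilities from several source models,
# so a segmenter can be adapted toward a new corpus by reweighting.
def interp_prob(word, models, weights):
    # models: list of dicts word -> P(word); weights sum to 1.
    return sum(lam * m.get(word, 1e-9) for lam, m in zip(weights, models))

def score_segmentation(words, models, weights):
    # Log-probability of one candidate segmentation under the mixture.
    return sum(math.log(interp_prob(w, models, weights)) for w in words)
```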

The most exciting part came after the lectures: demonstrations by several groups in the BLCU exhibition hall. BLCU showed a segmentation and tagging system, a corpus retrieval system, and a Chinese-teaching system; Che Wanxiang demonstrated our laboratory's integrated demo; Dr. Zhou Qiang of Tsinghua University demonstrated a syntactic treebank and parser; Dr. Gao Jianfeng of Microsoft Research Asia demonstrated their word segmenter; TRS showed an enterprise information gathering system and an image retrieval system; the Chinese Academy of Sciences showed an information extraction system; and so on. Every demo was excellent. Right beside the demo area, a crowd of students sat around Huang Changning, who answered question after question with great patience and good cheer. I asked him about the difficulty of anaphora and about methods for researching anaphora resolution, and he approved of my approach. I then asked about the usefulness of knowledge bases in Chinese information processing. He answered very earnestly that my question was the one the NLP community debated widely ten years ago: people dreamed of building knowledge bases to solve a great many problems in natural language processing and thereby push artificial intelligence forward; Japan invested hundreds of millions of dollars in building many knowledge bases, yet so far very few have actually been put to use. Others asked how to go about NLP research in general, and a student from Sichuan Normal University asked how to do research in the environment of a normal university. Huang offered a fine hint: study linguistic phenomena through the analysis and understanding of large-scale corpora, and read the newly published "Corpus Linguistics".

August 26, 2004

Day One of the Conference

At 8:30 am the opening ceremony of the Second National Student Workshop on Computational Linguistics was held in the first-floor lecture hall of the Yifu Building at Beijing Language and Culture University. Luo Zhiyong of BLCU presided, and Vice President Cui Xiliang delivered the opening address. BLCU has a very distinctive feature: its foreign and Chinese students number one to one. The whole university centers on teaching Chinese as a foreign language, not by simply teaching foreigners the way one would teach Chinese schoolchildren, but by teaching through the students' own mother tongues. BLCU has made an outstanding contribution to exchange between China and the world. Shi Shuicai, chairman of TRS, gave a short address on behalf of the Chinese Information Processing Society and TRS. Its main points: the society is a fine one; cooperation between companies and research institutes can quickly turn laboratory results into products that serve society and yield returns; the various NLP evaluations at home and abroad are a great spur to Chinese information processing; and the field has a bright future. Representatives of Toshiba and Fujitsu also spoke.

The invited speaker at the opening was Professor Yu Shiwen of Peking University, on the construction of large-scale knowledge bases, an area in which the Institute of Computational Linguistics at Peking University has accumulated many results. The report mentioned several characteristics of modern Chinese:
1) the units of written Chinese are not clearly delimited;
2) words lack morphology;
3) function words are flexible in part of speech and usage;
4) syntactic structures nest without overt markers;
5) tense, voice, and mood vary freely.

Some new topics have also appeared in NLP research, such as ambiguity, anaphora, ellipsis, and discourse-level metaphor. Anaphora, a relatively new area, is also a very difficult one.

Then came the regular sessions. The morning reports ran in parallel in Rooms 301 and 401 of Teaching Building No. 2, 301 on word segmentation and 401 on machine translation, and I picked reports by interest. The first machine translation report was "Fuzzy Matching in Machine Translation Evaluation". The idea in it that is valuable for automatic summarization evaluation is to compare machine translation output with reference translations by the overlap rate of their n-grams. For summarization this suggests comparing overlapping n-grams (for several values of n) for abstractive summaries, and borrowing the same device to compare fluency for extractive ones. Some morning reports had no presenter, so both sessions ended rather early.
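
One simple way to realize the n-gram overlap idea (a sketch of the general device, not the paper's exact metric):

```python
# N-gram overlap between a candidate text and a reference text, the
# fuzzy-matching idea applied to translation or summary evaluation.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_rate(candidate, reference, n=2):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    return len(cand & ref) / len(cand) if cand else 0.0
```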

The afternoon sessions covered linguistics and corpus-based language analysis in one room and machine translation in the other; I mainly attended the former. This conference had three reports on word sense disambiguation, all using unsupervised methods, which well reflects the research trend in WSD. Only three papers in the proceedings concerned anaphora resolution: "Resolution of 'this/that + NP' Based on Corpus Analysis", "Chinese Personal Pronoun Resolution Using a Preference-Based Strategy", and my own "Decision-Tree-Based Anaphora Resolution for Chinese Noun Phrases". The first had no presenter. The second was by Luo Yunfei of the computer science department of Shanxi University. His method is essentially the same as mine, but he resolves only pronouns; his experiments add an analysis of each individual feature's effect on the final result, and his main method introduces a notion of anaphora similarity, yet his noun phrases are marked by hand. My questions about his training corpus annotation process and sample distribution went unanswered. Comparing his paper with mine, I began to grasp some of the craft of writing a good paper: analyze more, highlight the novelty, describe common knowledge briefly, and make the experiments thorough. In future experiments I can concentrate the problem on anaphora resolution itself and, for the lower-level steps that still lack good solutions, simply annotate by hand.
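
For the record, the skeleton shared by Luo's system and mine, as a hypothetical sketch (the feature names are illustrative, not the actual feature set of either paper):

```python
# Pairwise anaphora resolution: each (candidate antecedent, anaphor)
# pair becomes a feature vector; a trained classifier (C4.5 in my
# case) decides "coreferent or not", and the first positive candidate,
# searching backward from the anaphor, is linked.
def pair_features(antecedent, anaphor):
    return {
        "sentence_distance": anaphor["sent"] - antecedent["sent"],
        "string_match": antecedent["text"] == anaphor["text"],
        "number_agree": antecedent["number"] == anaphor["number"],
        "gender_agree": antecedent["gender"] == anaphor["gender"],
    }

def resolve(mentions, classify):
    # mentions: dicts in document order; classify: features -> bool.
    links = []
    for j, ana in enumerate(mentions):
        for ant in reversed(mentions[:j]):   # prefer closer candidates
            if classify(pair_features(ant, ana)):
                links.append((ant["id"], ana["id"]))
                break
    return links
```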

August 25, 2004

Weiming Lake

After registering at the Yifu Building of BLCU in the morning, we went over to Peking University to find Wang Zhen, who treated us to a working lunch in the canteen atop the Founder Group office building. After that Wang Zhen had to work and Che Wanxiang had business of his own, so the remaining three of us set off to see the campus, mainly a circuit of Weiming Lake. The weather was poor and the whole sky grayish, but the lake's beauty was not diminished in the least. To my surprise, Edgar Snow's grave lies right on the lakeshore, the headstone laid with white tulips; one could not help feeling a surge of respect. Two small tree-covered hills stand by the lake, with many students reading on them: truly a fine place for study, the whole campus like a scenic park. Peking University and Tsinghua are not far apart, so in the remaining time we walked along one side of Tsinghua and saw the old Tsinghua Garden archway. Time was short; after Wang Zhen got off work he rejoined us, and after shuttling by subway and taxi we arrived at Zhang Gang's home, which is large and beautiful. Zhang Gang was the earliest of our laboratory's students to graduate and is now a PhD student at the Chinese Academy of Sciences; he is capable in every respect and very modest. There I also met Zheng Shifu.

After dinner we returned to the dormitory at BLCU. My roommates are two newly enrolled PhD students from Central China Normal University, who came specially to attend the conference, learn the field's trends, and look for research directions.

August 24, 2004

Setting Out for Beijing

I got up very early and boarded the train to Beijing.
Because of yesterday's exam, I traveled to Beijing alone today, but the ride was not as lonely as I had expected. The people seated together soon became acquainted and were glad to chat. A neurosurgeon named Zhao Dong told us many of his experiences and said he was preparing to apply for the doctoral program at Harbin Medical University; their medical doctorates can be finished in two years. A third-year student from Suihua, studying computer science at the China University of Geosciences in Beijing, swapped information with me about courses and graduate admission. Next to me sat a starter on the women's basketball team of the University of Science and Technology Beijing, who had gone home to recover from an injury and was now returning to school for training camp. Another passenger, from Daqing, had graduated in electronic information engineering from Sun Yat-sen University and was preparing to spend two years supporting development in Urumqi before returning to study for a master's degree. The twelve hours on the train passed quickly in our chatting. From Beijing railway station I took the subway with the geosciences student, got off at Gulou Dajie, and took a taxi to our laboratory's lodgings in Beijing. Dinner, a shower, a little television, and I was soon asleep.

August 23, 2004

Exam of Fault Tolerance

I was preparing for the Fault Tolerance exam from yesterday evening until this morning. It was difficult for me, as some of the topics confused me. Still, I tried my best, or so I thought.

Tomorrow I will go to Beijing to take part in SWCL2004. It is a good chance to exchange ideas with others.

August 22, 2004

Prepare for the Fault Tolerance Exam

This was my whole task today.

August 21, 2004

A good book

I recommend a good book to you: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, one of the Springer Series in Statistics. There is a sentence at the beginning of the preface: "We are drowning in information and starving for knowledge." -- Rutherford D. Roger.

The book gave me a new view of supervised and unsupervised learning:
"The learning problems that we consider can be roughly categorized as either supervised or unsupervised. In supervised learning, the goal is to predict the value of an outcome measure based on a number of input measures; in unsupervised learning, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures."

The book's coverage is broad, ranging from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting -- the first comprehensive treatment of that topic in any book.

So perfect I believe.

August 20, 2004

The final class of the Wearable Computing

This afternoon we finished the last Wearable Computing class. After three presentations, Prof. Daniel summed up our five days of study. The main ideas of his words were as follows:

It was a challenge for us to listen, speak, read, write, and think entirely in English. That was unimaginable before we took part in the classes. It can be viewed as special training for our English.

The whole process was research in miniature. When you do research, you must first read hundreds of papers: read them, understand others' work, come up with new ideas, and realize them. After that, you have to sell your ideas to others to obtain money to support your research. That is why he asked us to write a business plan for some application of wearable computing. All of the work above is what research consists of.

Finally we took a group photo. It's a good memory.


August 19, 2004

The busiest days (2)

Having submitted the description document for the ACE evaluation, I could concentrate fully on preparing the Wearable Computing presentation. I got up at 6:30 as yesterday, prepared the presentation until 8:30, attended the Fault Tolerance class, continued preparing until 13:50, and at 14:00 began my presentation on "Power Management". However, I did not feel I had prepared enough; all I could do was present as well as I could.

The 35 minutes were hard for me. At first I kept calm and described the diagrams one by one, but when I came to the third part I began to read the slides to the audience. I knew that was not good; I had simply not prepared that part well. At the end I read out the five conclusions. As with the other presentations, Dr. Daniel P. Siewiorek asked me a question about the meaning of some points.

I had packed my stuff at Flat Four: today was the deadline for moving to Flat Nine, so after class I moved in a hurry. It was tiring.

Coming into the new bedroom, everything felt novel. When I came back at night the dormitory was quiet, good enough for reading some materials.

August 18, 2004

The busiest days (1)

This was the busiest day of my life. Yeah, I thought so.

I dragged myself out of bed at 6:30, still half asleep, after my alarm clock had quivered for half an hour. At about 6:50 I set to finishing my ACE evaluation task. Yeah! The simulated evaluation achieved good performance: the unweighted F value, total Value, and Value-based F score were 84.3%, 75.6%, and 81.1%, respectively. Good news!! Following the original plan, I generated the final 246 APF documents. Checking them, I found most of the mentions clustered into the right entities, but the accuracy of the "PRO" type mentions was not good enough; I believe that is because of the missing entity type and entity subtype. I hoped we would gain good performance in the ACE evaluation.

At 8:50 I hurried to A315 to attend the Fault Tolerance class. The class was excellent, conducted in pure English.

Luckily we were free this afternoon. Based on the system description of the ACE EDR, I added some new features and finished the algorithmic approach and system description for ACE EDR Co-Reference. By the time I finished the document it was nearly 20:00, leaving only four hours to complete the Fault Tolerance homework and make the PowerPoint for tomorrow's presentation.

Time pressed hard on me. I read the papers as fast as I could; at 22:30 I had the outline in my mind and began to prepare the slides. By the time the building administrator came to close up, I had finished half of them.

It was nearly 23:00 when I got back to my bedroom. I had not finished today's homework; writing in poor light, I completed it at 00:30.

Being busy can enrich my life.

August 17, 2004

Two tasks

Keep attending the Fault Tolerance lectures and finish the ACE Co-Reference evaluation. Those are my two tasks.

Starting at 6 this afternoon, I worked on the ACE task and, following my plan, finished a basic version of the testing documents. The ACE evaluation program is now running on my test results.

It is too late now, and my Fault Tolerance homework is still unfinished. I must head back.

August 16, 2004

Dr. Daniel P. Siewiorek

I am very glad to study in classes taught by Dr. Daniel P. Siewiorek. He will teach us two courses over these five days: Fault Tolerant Computing and Wearable Computing. The timetable is: Fault Tolerant Computing every morning, Wearable Computing in the afternoon.

In the morning Dr. Daniel taught us himself. Frankly speaking, his speaking speed was a little too quick for me to follow, but I believe I can adapt to it. The main content of this morning's Fault Tolerant Computing class was some basic concepts, ones that are common and useful in this area.

We met a big challenge: each of us must give a 30-minute presentation comparing three papers. This afternoon three of us gave their presentations, and there were some problems: some spoke too fast to be heard clearly, and some talks had no hierarchy at all, so the listeners could hardly follow them. I will give my presentation in three days; I must avoid these problems.

I must go to sleep early. Tomorrow will be another busy day.

August 15, 2004

Finish the preparation work

Right now I can say I have fully prepared for the ACE EDR Co-Reference task.

Building on the last few days' work, I completed the comparative tests. The first simulated the ACE EDR Co-Reference evaluation, giving a final unweighted F score of 83.2, a weighted F score of 80.7, and a final overall value of 71.5. The second added the original entity class attributes to construct the EMD output corpus for testing, giving an unweighted F score of 85.3, a weighted F score of 81.8, and a final overall value of 72.4.

So the missing entity class attributes do not affect the results much!

I believe I have made all the preparations. Let me meet the challenge!

August 14, 2004

Good effect

Having finished today's task, I had linked all five programs and tested them on ten training documents. The final result showed that my method works well: the current performance is only a little lower than the perfect (oracle) result.

OK. All the preparatory work is done. The next job is to simulate the real environment and evaluate my system on the training corpus.

August 13, 2004

Entity Class Classification

I began my additional EDR Co-Reference task evaluation plan with entity class classification.

ACE will not provide the corresponding entity class, so this is an additional classification task for me. I had three schemes for finishing it.

The first was to adopt the EMD modules, since the EMD output includes the entity class, which made it the most direct scheme. However, on reflection I found a flaw: the EMD output includes not only the entity class attribute but much other information as well. According to the algorithmic approach, if that other information is already provided and we only need to classify the entity class, we can use a machine learning algorithm over the provided entity and entity mention features. So I chose to classify the entity class myself rather than reuse the EMD modules.

The second was to use SPC as the default value for the entity class attribute. Why? A statistical regularity: SPC accounts for 89.45% of all entity class attributes in the whole training corpus. Using one third of the corpus for testing, under the ACE entity class scoring plan, a score of 8798.5 could be obtained.

The last was a decision tree approach. When I discussed it with Xiantao Liao, she said I could write some manual rules for classifying the entity class attribute. There were many feature values I could use, and a decision tree was the best choice for inducing good rules from the training corpus. I tested this scheme: the classification accuracy was 90.3%, and using one third of the corpus for testing, a score of 8942.75 could be obtained under the ACE entity class scoring plan.

Comparing the three schemes above, I decided to use decision trees as the final scheme.
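
The second and third schemes in miniature (a hedged sketch; the real decision tree came from C4.5, and the names below are placeholders):

```python
from collections import Counter

# Scheme 2: a majority-class baseline ("SPC" covers ~89% of entity
# class attributes in the training corpus). Scheme 3 replaces the
# constant prediction with rules induced by a decision tree learner.
def majority_baseline(train_labels):
    default = Counter(train_labels).most_common(1)[0][0]   # e.g. "SPC"
    return lambda features: default

def accuracy(classify, test_set):
    # test_set: list of (features, gold_label) pairs.
    hits = sum(classify(f) == y for f, y in test_set)
    return hits / len(test_set)
```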

The next problem is evaluating the whole system by myself.


August 12, 2004

My problem of summarization evaluation

I have studied one of the three ACL 2004 papers on summarization evaluation. My notes are as follows:

Title: Automatic Evaluation of Summaries Using Document Graphs
Author: Eugene Santos Jr., Ahmed A.Mohamed, and Qunhua Zhao
Organization: Computer Science and Engineering Department, University of Connecticut, 191 Auditorium Road, U-155, Storrs, CT 06269-3155 {Eugene,amohamed,qzhao}@engr.uconn.edu
Abstract: Summarization evaluation has been always a challenge to researchers in the document summarization field. Usually, human involvement is necessary to evaluate the quality of a summary. Here we present a new method for automatic evaluation of text summaries by using document graphs. Data from Document Understanding Conference 2002 (DUC-2002) has been used in the experiment. We propose measuring the similarity between two summaries or between a summary and a document based on the concept/entities and relation between them in the text.
Essentials: There is a bottleneck in the text summarization research field: how to evaluate the quality of a summary, or the performance of a summarization tool.
In this paper there is a new view of summary evaluation based on document graphs (DGs). A document graph contains only two kinds of nodes, concept/entity nodes and relation nodes, and currently only two kinds of relations, "isa" and "related to", are captured, for simplicity. By comparing the similarity of two document graphs we can compare two documents. The observed trend is that the similarity between the summary DG and the source DG agrees closely with human evaluation, so the tool could take the place of human involvement in evaluation.
Use for reference: The concept/entity nodes and relation nodes are generated within noun phrase boundaries, but there is no good noun phrase identifier for Chinese yet. That is the bottleneck for applying this approach to Chinese summarization.
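
My reading of the DG comparison, reduced to a sketch (the triples are assumed to be already extracted; building them from parsed noun phrases is the hard part the paper describes):

```python
# Document-graph similarity in miniature: represent each text as a
# set of (concept, relation, concept) triples and measure overlap.
def dg_similarity(triples_a, triples_b):
    a, b = set(triples_a), set(triples_b)
    if not a or not b:
        return 0.0
    # Average of the two directed overlaps, e.g. summary DG vs. source DG.
    return (len(a & b) / len(a) + len(a & b) / len(b)) / 2

print(dg_similarity({("dog", "isa", "animal"), ("dog", "related to", "bark")},
                    {("dog", "isa", "animal")}))  # -> 0.75
```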

August 11, 2004

Four vs. six of my work

This morning I received another task, Automatic Text Summarization Evaluation. Dr. Tliu suggested I divide my work into two parts: four tenths for Automatic Text Summarization Evaluation and six tenths for the ACE EDR Co-Reference evaluation.

ACL 2004 has just finished, and it included a workshop on text summarization. I found three papers there on summarization evaluation: Automatic Evaluation of Summaries Using Document Graphs; Rouge: A Package for Automatic Evaluation of Summaries; and Evaluation Measures Considering Sentence Concatenation for Automatic Summarization by Sentence or Word Extraction. In the preface of the workshop there were some words that excited me: automatic evaluation of summaries is playing a more and more important role in the text summarization research area. That is to say, it has become the bottleneck to further advances in text summarization.

What a promising research area.

August 10, 2004

The final evaluation of ACE EDR

According to the original plan, we got the evaluation corpus at 9 o'clock this morning. But we could not yet run the processing programs to produce the final documents, as there were some small problems in the EMD task processing program. Meanwhile we tested our systems on the training corpus.

At 20:00 the EMD program reached its best state. After about one hour, all the final documents for EMD, EDR, and RMD had been produced. While the programs were running, we were busy writing the system descriptions.

At about 21:10 we submitted all the result files with the three system descriptions.

Then we could have our supper.

We had been preparing for these evaluation tasks for more than a month. It felt like my experience of the American MCM.

A good experience!! But there is one more ACE evaluation task for me: EDR Co-Reference. I have some doubts about it and should write an email asking for clarification.

August 9, 2004

Keep on improving our systems

Nothing else could disturb us; we all concentrated on improving the processing programs.

August 8, 2004

Prepare the document for ACE CR

It's time to prepare the document for ACE CR.

August 7, 2004

Link with Entity Mention Detection

My co-reference resolution is a subtask of the EDR evaluation, and the ACE EDR deadline was drawing near, so I had to test my system on the EMD output. The new evaluation result showed that the entity scores improved a little, though that was the result of evaluating only three documents.

Tomorrow I can finish the more detailed evaluation.

August 6, 2004

Comparing evaluation results across training scales

Yes, I have got the final evaluation result from the Perl program ace04-eval-v09.pl. Using three fourths of the training corpus for training and one fourth for testing, the final evaluation result was as follows:
Unweighted F = 70.4%, Value = 49.6%, Value-based F = 50.5%

The unweighted F score is close to the 74.17% that was my best original result under my own evaluation method, but the ACE evaluation's main item, Value, is a little lower.

Last night I found that we should submit a description of our algorithmic approach and a comparison with some preliminary system. I also wanted to know whether there is a rule that the final Value gets better and better as the training corpus grows larger, so I planned about seven experiments with the training scale ranging from 10% to 70%.

Just now I finished four of them; each experiment takes about one hour. And there was an astonishing regularity: the three main evaluation scores fluctuate only slightly.

This regularity is so far only reflected in the compared evaluation results; I will make a deeper analysis.

August 5, 2004

Playing Table Tennis

I had played almost no table tennis for two months. This evening, when I was tired of the ACE evaluation result file format, Shiqi called us together to play. After changing clothes, I went to the basement of the 12th dormitory.

I played with Yongguang Huang. We both like to smash and top-spin the ball, and that style is tiring.

We played for about an hour.

August 4, 2004

GetFinalDocument

The last program of the ACE CR task was finished just now. Its main flow is composed of four parts: GetDecisionInformation, FormatAPF, Cluster, and GetFinalDocument.

I had just finished the last part and obtained the final document, but there were some errors in it. It was too late, so I will debug all the errors tomorrow.

August 3, 2004

Three Programs

I need three programs to finish my last task for ACE CR. They are the result of careful consideration and abstraction.

The first program is named EMDProcess; it prepares all the materials for C4.5.
The second is named ExtractDecisionResult.
The last is named GetFinalDocument.

Just now I finished the first two. Tomorrow I can finish the last one.

Good news! Keep on!!

August 2, 2004

ACE CR next plan

There were still some problems for me on ACE CR, but the detailed plan is now clear in my mind.

In order to finish the ACE EDR evaluation task with Xiantao Liao, I can:
1. reduce some tagged corpus into the EMD output format,
2. suppose that it is the final result of Xiantao Liao's programs,
3. construct the final training samples under the hypothesis that the samples are all positive,
4. use the trained decision tree to extract the decision results,
5. run the sample-generation programs over the decision results and, wherever a positive result appears, do some clustering (a sketch of this step follows below),
6. output the final XML documents.
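
Step 5 is the heart of the flow; a minimal sketch of the clustering I have in mind (my own reconstruction, using a simple union-find):

```python
# Cluster mentions into entities by linking every pair the decision
# tree judged positive; connected components become entities.
def cluster_positive_pairs(num_mentions, positive_pairs):
    parent = list(range(num_mentions))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in positive_pairs:
        parent[find(i)] = find(j)

    entities = {}
    for m in range(num_mentions):
        entities.setdefault(find(m), []).append(m)
    return list(entities.values())

print(cluster_positive_pairs(5, [(0, 2), (2, 4)]))  # -> [[0, 2, 4], [1], [3]]
```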

This process flow can be realized in two days.

Hoho, another two intense working days.

August 1, 2004

DLL interface problem

One of my recent tasks has been to build a co-reference resolution DLL for automatic summarization. The detailed steps were in my mind, but when I realized and tested the DLL, there was an error message.

I asked lots of people. The answer was that passing std::vector across the interface of a DLL module causes errors (the two sides may be built with different runtimes or STL layouts). This problem has no direct solution so far; the compromise is to pass char* through the interface instead.