It was time for me to write my monthly report. Although we had two more days to submit it, I wanted to write it today, since this was the last day of the year. Why not finish this year's work within the year? If I put it off until tomorrow or the day after, I would be keeping a bad habit.
The monthly report was easy to finish. I based it on my blog and my regular working documents. This time I first described my experience managing papers, then wrote reading outlines for seven papers, and finally made a detailed plan for next month.
Now I know the advantage of keeping a blog. Why not keep it up?
December 31, 2004
Discourse-new detectors for definite description resolution: a survey and a preliminary proposal
Title: Discourse-new detectors for definite description resolution: a survey and a preliminary proposal
Author:Massimo Poesio; Olga Uryupina; Renata Vieira
Author organization:
Massimo Poesio, University of Essex, Computer Science and Cognitive Science (UK)
Olga Uryupina, Universitat des Saarlandes, Computerlinguistik (Germany)
Renata Vieira, Unisinos, Computacao Aplicada (Brazil)
Conference: Proceedings of the Workshop on Reference Resolution and its Applications. ACL2004
Summary:
English:
Vieira and Poesio (2000) proposed an algorithm for definite description (DD) resolution that incorporates a number of heuristics for detecting discourse-new descriptions. The inclusion of such detectors was motivated by the observation that more than 50% of definite descriptions (DDs) in an average corpus are discourse new (Poesio and Vieira, 1998), but whereas the inclusion of detectors for non-anaphoric pronouns in algorithms such as Lappin and Leass’ (1994) leads to clear improvements in precision, the improvements in anaphoric DD resolution (as opposed to classification) brought about by the detectors were rather small. In fact, Ng and Cardie (2002a) challenged the motivation for the inclusion of such detectors, reporting no improvements, or even worse performance. We re-examine the literature on the topic in detail, and propose a revised algorithm, taking advantage of the improved discourse-new detection techniques developed by Uryupina (2003).
Why this topic:
Poesio and Vieira (1998) studied several corpora, such as the Penn Treebank, and found that 52% of definite descriptions (DDs) are discourse-new, i.e. not previously mentioned in the text.
Prior work:
Vieira and Poesio (2000) proposed a DD resolution algorithm that uses a series of heuristics to detect discourse-new descriptions. However, while adding detectors for non-anaphoric pronouns to algorithms such as Lappin and Leass (1994) clearly improves precision, the detectors brought only small gains for anaphoric DD resolution (as opposed to classification). Indeed, Ng and Cardie (2002a) challenged the motivation for such detectors, reporting no improvement or even worse performance.
Many related studies (Poesio and Vieira, 1998; Bean and Riloff, 1999; Ng and Cardie, 2002a; Uryupina, 2003) agree that a variety of factors play a role in discourse-new (DN) detection for DDs. Most algorithms use a mix of techniques to recognize predicative DDs, discourse-new proper names, functional DDs, and DDs whose modifiers establish the reference.
These studies also agree that DN detection cannot be carried out separately from anaphora resolution.
What the problem is:
One problem with these machine-learning approaches is that the systems perform well on neither pronoun nor DD resolution when compared with specialized algorithms: Ng and Cardie's best version reaches F=65.8 over all anaphoric expressions, but only F=29.6 on DDs (Vieira and Poesio's best result is F=77) and F=28.2 on pronouns (Tetreault's 2001 pronoun resolution algorithm reaches F=80). Clearly, such comparisons are only meaningful on the same data set, and, as Mitkov (2000) discusses, pre- and post-processing strongly affect measured performance in anaphora resolution evaluations. Even so, we believe that evaluating DN detection on top of a high-performing system is a better way to reach the expected conclusions.
The authors' new approach:
The work first compares resolution performance with and without DN detection. The Vieira and Poesio algorithm provides the baseline without DN detection; a simple statistical resolution model built on Uryupina's feature set provides the version with DN detection. Both are evaluated on the MUC-7 data used by Ng and Cardie.
Experimental results:
----------------------------------
         |  R   |  P   |  F   |
----------------------------------
Pronouns | 65.5 | 63.0 | 64.2 |
----------------------------------
DDs      | 56.7 | 56.1 | 56.4 |
----------------------------------
Table 7. Evaluation of the GUITAR system without DN detector off raw text
----------------------------------------------
                     |  R   |  P   |  F   |
----------------------------------------------
Without DN detection | 44.7 | 54.9 | 49.3 |
----------------------------------------------
With DN detection    | 41.7 | 80.0 | 54.6 |
----------------------------------------------
Table 8. Using an oracle
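The F column in both tables is the harmonic mean of R and P; a quick Python check against the Table 8 numbers reported above:

```python
def f_score(recall, precision):
    """F1: the harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)

# Table 8 rows: (R, P) without and with DN detection
print(round(f_score(44.7, 54.9), 1))  # 49.3, matching the reported F
print(round(f_score(41.7, 80.0), 1))  # 54.8, close to the reported 54.6
```

The small gap in the second row presumably comes from rounding in the source table.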
This experiment shows that DN detection can raise precision substantially (from 54.9 to 80.0, roughly 30% in the authors' terms), but it does not yet demonstrate a gain on a high-performance anaphora resolution system.
The paper also proposes a new feature set for DN detection, again to be evaluated on MUC-7, but the experiments were not finished, so no results or analysis are available yet.
Relevance to my own research
This approach could help build and evaluate a coreference resolution algorithm for ACE.
Open problems and my own improvement ideas
None for now.
December 30, 2004
Happy New Year's Day!
There were only two days left until New Year's Day.
Starting at 15:00, all the members of our laboratory celebrated at the Qianshoufo Hotel. There were so many activities for us, such as swimming, skating, billiards, table tennis, bowling, and shuffleboard. We all enjoyed ourselves.
After the activities, we all sat down to a nice dinner. There were twenty-four people in the dining room, divided into three tables. My table included Dr. Tliu, Hong Yu, Yu Haibin, Hu Xiaoguang, Zhao Yongzhen, me, Huang Yongguang, Gao Liqi, and Chen Yihen. During dinner, everyone at our table sang at least one song. We named ourselves the All Star Table.
Yesterday, my neighbor Mr. Luo Junping, who had come to Harbin on business, brought me some sausage prepared by my parents. It is my favourite. I asked the kitchen to cook it and divide it into three dishes. Everyone at our table liked it. Me, too; I had a lot of it. Such a nice dish.
Abid was invited to sing a song. Although none of us could understand what he sang, we all thanked him.
After dinner, we took a photo of the whole laboratory. We were all happy today!
Thanks to Dr. Tliu, and thanks to our laboratory.
Happy new year to everyone!
December 29, 2004
Using word similarity lists for resolving indirect anaphora
Title: Using word similarity lists for resolving indirect anaphora
Venue: ACL2004 workshop on coreference resolution
Date: July 25-26, 2004
Authors: Caroline Gasperin and Renata Vieira
Affiliation: PIPCA-Unisinos, Sao Leopoldo, Brazil
Summary:
English:
In this work we test the use of word similarity lists for anaphora resolution in Portuguese corpora. We applied an automatic lexical acquisition technique over parsed texts to identify semantically similar words. After that, we made use of this lexical knowledge to resolve coreferent definite descriptions where the head-noun of the anaphora is different from the head-noun of its antecedent, which we call indirect anaphora.
Why this topic
Indirect anaphora resolution handles cases where the anaphor's head noun differs from the antecedent's, yet the semantically related phrases still need to be resolved.
Prior work
Word similarity lists have previously been used for anaphora resolution in English (Poesio et al., 2002; Schulte im Walde, 1997; Bunescu, 2003).
This paper focuses on anaphora resolution for Portuguese, and understanding its word-similarity-list method requires consulting many other papers. It does not match my current survey direction, so I stopped reading here; I may return to it later.
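The underlying idea, independent of the paper's Portuguese-specific details, can be sketched with a hypothetical similarity list: pick the candidate antecedent whose head noun is most similar to the anaphor's head noun. All words and scores below are invented for illustration:

```python
# Hypothetical similarity lists: head noun -> [(similar word, score), ...]
SIM_LISTS = {
    "vehicle": [("car", 0.82), ("truck", 0.75), ("house", 0.10)],
}

def resolve_indirect(anaphor_head, candidate_heads):
    """Return the candidate head most similar to the anaphor head, if any."""
    scores = dict(SIM_LISTS.get(anaphor_head, []))
    scored = [(scores.get(h, 0.0), h) for h in candidate_heads]
    best_score, best = max(scored, default=(0.0, None))
    return best if best_score > 0.0 else None

# "the vehicle" after mentions headed by "car" and "house": the head nouns
# differ, so string matching fails, but the similarity list picks "car".
print(resolve_indirect("vehicle", ["car", "house"]))  # car
```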
Learn Prolog Now!
While looking for information on machine learning, I clicked into a web page introducing natural language processing. On that page there were links to Prolog syntax analysis programs.
I had seen many useful Prolog programs that could solve natural language processing problems, so I wanted to learn something about this programming language.
Prolog, short for Programming in Logic, can infer many results from a knowledge base.
After finding some useful information, I found a nice book named Learn Prolog Now! A useful piece of software for Prolog programming is SWI-Prolog.
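Since the entry quotes no Prolog, here is a rough illustration (in Python, with made-up facts) of the kind of inference Prolog performs: deriving new relations from a knowledge base of facts and a rule.

```python
# Knowledge base of facts, as in Prolog: parent(tom, bob). parent(bob, ann).
PARENT = {("tom", "bob"), ("bob", "ann")}

def grandparents():
    """Apply the rule grandparent(X, Z) :- parent(X, Y), parent(Y, Z)."""
    return {(x, z) for (x, y) in PARENT for (y2, z) in PARENT if y == y2}

print(grandparents())  # {('tom', 'ann')}
```

In Prolog itself, the query `?- grandparent(X, Z).` would perform this derivation automatically.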
December 28, 2004
Creative Research
During some free time at tonight's lab Reading Group meeting, I discussed with my senior schoolmate the question of creative research by Chinese versus foreign PhD students. She thinks we do far too little creative research now, whereas many American doctoral theses lay the foundation for their authors' future achievements. I added that, according to reports, China now has more PhD students than the United States, yet our research strength still lags far behind. Why? I suppose the expectations are different.
Some say a PhD is very hard to earn, but looking at what domestic PhD students actually study, apart from a few exceptions, much of it is rather low-level.
Why is this so?
My guess, and it can only be a guess, is that there are problems in the research process. First, the literature survey is not thorough, so some work duplicates others' without the author even knowing. Second, there is not enough perseverance; some say a PhD is a trial in life that demands real hardship. Third, there is not enough openness: doctoral research should be very open, but much research narrows as it deepens, and the deeper people go the less willing they are to absorb results and ideas from other disciplines.
Facing my own research, what can I do? Right now I am surveying the state of the art in coreference resolution, and the survey is still insufficient. First solve the survey problem: only after writing a decent review can I start the next step.
December 27, 2004
Applying Coreference to Improve Name Recognition
Title: Applying Coreference to Improve Name Recognition
Venue: ACL2004 workshop on Coreference resolution
Date: July 25, 2004
Authors: Heng JI and Ralph GRISHMAN
Affiliation: Computer Science Department, New York University
Abstract:
English:
We present a novel method of applying the results of coreference resolution to improve Name Recognition for Chinese. We consider first some methods for gauging the confidence of individual tags assigned by a statistical name tagger. For names with low confidence, we show how these names can be filtered using coreference features to improve accuracy. In addition, we present rules which use coreference information to correct some name tagging errors. Finally, we show how these gains can be magnified by clustering documents and using cross-document coreference in these clusters. These combined methods yield an absolute improvement of about 3.1% in tagger F score.
Why this topic: to improve the accuracy of name recognition.
Prior work:
For NE recognition, people have used HMMs (Bikel et al., 1997), maximum entropy (Borthwick et al., 1998; Chieu and Ng, 2002), decision trees (Sekine et al., 1998), conditional random fields (McCallum and Li, 2003), class-based language models (Sun et al., 2002), agent-based methods (Ye et al., 2002), and support vector machines.
What the problem is
When applied to name recognition, the performance of these machine-learning methods depends on the size of the annotated corpus and the range of features used. In particular, most methods use only a small amount of context, such as the one or two words before the current word and the name that follows. When the test data contains an unseen word in an uninformative context, recognition becomes very difficult.
The authors' new approach
The authors use the results of coreference resolution to mine global information and improve recognition accuracy. Coreference within a single document already helps name recognition, and coreference across multiple documents helps even more. The coreference algorithm used in the paper is based on heuristic rules, listed below:
-----------------------------------------------
Rule Type / Rule Description
Name & Name
----All
--------Ident(i, j): Mention_i and Mention_j are identical
--------Abbrev(i, j): Mention_i is an abbreviation of Mention_j
--------Modifier(i, j): Mention_j = Modifier + “de” + Mention_i
--------Formal(i, j): Formal and informal ways of referring to the same entity (e.g. “美国国防部 / American Defense Dept.” & “五角大楼 / Pentagon”)
----PER
--------Substring(i, j): Mention_i is a substring of Mention_j
--------Title(i, j): Mention_j = Mention_i + title word; or Mention_j = LastName + title word
----ORG
--------Head(i, j): Mention_i and Mention_j have the same head
----GPE
--------Head(i, j): Mention_i and Mention_j have the same head
--------Capital(i, j): Mention_i: country name; Mention_j: name of the capital of this country. Applied in restricted context.
--------Country(i, j): Mention_i and Mention_j are different names referring to the same country (e.g. “中国 / China” & “华夏 / Huaxia” & “共和国 / Republic”)
Name & Nominal
----All
--------RSub(i, j): Name_i is a right substring of Nominal_j
--------Apposition(i, j): Nominal_j is the appositive of Name_i
--------Modifier2(i, j): Nominal_j = Determiner/Modifier + Name_i/head
----GPE
--------Ref(i, j): Nominal_j = Name_i + GPE Ref Word (examples of GPE Ref Word: “方面 / Side”, “政府 / Government”, “共和国 / Republic”, “自治政府 / Municipality”)
Nominal & Nominal
----All
--------IdentN(i, j): Nominal_i and Nominal_j are identical
--------Modifier3(i, j): Nominal_j = Determiner/Modifier + Nominal_i
-----------------------------------------------
These rules do not cover pronoun resolution.
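As a rough sketch (not the authors' code), the simpler string-based rules such as Ident and Substring could look like this; the mention strings are invented examples:

```python
def per_name_match(mention_i, mention_j):
    """Approximate the Ident(i, j) and Substring(i, j) rules for PER names."""
    if mention_i == mention_j:      # Ident(i, j): identical strings
        return True
    if mention_i in mention_j:      # Substring(i, j): i is a substring of j
        return True
    return False

print(per_name_match("Smith", "John Smith"))  # True
print(per_name_match("Smith", "Jones"))       # False
```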
Using the MUC scoring scheme, evaluation on mentions produced by human annotation gives R=82.7%, P=95.1%, F=88.47%.
On automatically generated mentions, the results are R=74.3%, P=84.5%, F=79.07%.
Theoretical strengths of this approach
For name recognition: it adds a new source of features and makes good use of global information.
For coreference resolution: it is simple and fast.
Remaining problems
As the rule descriptions suggest, many rules, such as Formal(i, j), are hard to apply fully automatically; implementing these modules still requires some engineering.
My own improvement ideas and observations
Combine the rules with some statistical methods to accomplish the task.
December 26, 2004
Remember the advice
Remember the advice:
You can lose your money, you can spend all of it, and if you work hard you can get it all back. But if you waste your time, you're never gonna get it back.
Those were classic lines from Without a Paddle (2004).
I believed it was true.
December 25, 2004
Christmas Greetings
I received a great many Christmas greetings this year. A post even appeared online: "100 Christmas greetings, there is always one that suits you." Very amusing. I list them below. May my friends be happy and joyful forever!
圣诞祝语100句,总有一句适合你
01 以往的圣诞都是灰色的,今年有了你,一切都变得不同,我的世界一下子变得豁然开朗多姿多彩,我衷心地谢谢您。
02 我要把一切喜讯变成奶油,所有祝福柔成巧克力,所有快乐做成蛋糕答谢你,然后说声圣诞快乐!
03 我默默祈祷愿圣诞老人能在即将到来的圣诞之夜送我一个与我牵手同伴共同度过这奇妙的圣诞夜,结果他将你送给我。
04 考虑到24小时之内将会有铺天盖地的祝福短信堵塞网络,一向有远见聪明的我提前恭祝圣诞快乐、新年快乐!
05 如果每年的今夜有一个很肥的老人从窗口跳进来抓住你,把你装进袋子里,你不用担心,因为我想要的圣诞礼物就是你。
06 也许岁月将往事退色,或许空间将彼此隔离,但知道珍惜的依然是真心的友谊将再次对你说声圣诞快乐!
07 圣诞老人说所谓幸福是一个有健康的身体,有深爱你的人,一帮可依赖的朋友,当你收到此信息时,一切随之拥有。
08 送你一颗聚满礼物的圣诞树,顶上最大最亮的那颗星是我的真心,下面挂的是我的痴心,制造材料的是我一颗不变有心:圣诞快乐!
09 这是我发给你的三天后的信息,别偷看哦,叫你别看,还看,祝你圣诞快乐!
10 在这洋人的节日里,好想和你在一起,享受这醉人的气氛,然而你我分割两地,我只好在这轻声地对你说:“亲爱的,圣诞快乐!”
11 想念你的笑,想念你的外套,想念你白色袜子,装满圣诞的礼物。
12 圣诞节到了也,你有没有在床头挂起臭袜子哦,圣诞老公公会把我最好的礼物丢进去的,圣诞快乐!
13 圣诞老人说,今年他要把礼物装在我们两的袜子里,所以平安夜你一定要陪在我身边。
14 Merry Christmas and best wishes for happy new year!
15 在这迷人的圣诞,你躲在家里生蛋蛋,生了一堆恐龙蛋,还有一只小鸡蛋,猪,圣诞快乐!
16 快乐圣诞,什么是圣诞快乐?不是那快乐的阳光,也不是鸟儿的啁啾,那是愉快的念头和幸福的笑容,是温馨慈爱的问候。
17 这些天来一直有个问题困惑着我,你明明不是鸡,为什么人人都要祝你圣诞快乐呢?
18 知道圣诞节是谁的节日吗?不知道,是你的节日嘛,是圣诞节啊!笨蛋。
19 HI,你怎么还在这啊,你知道你的重要性吗?没了你,谁拉着圣诞老公公去给大家送礼物啊,圣诞快乐!
20 心到,想到,看到,闻到,听到,人到,手到,脚到,说到,做到,得到,时间到,你的礼物没到,只有
我的祝福传到。
21 因为你的存在,这一天是有更特别的意义,因为可以和你一起相约在树下许下一个共同的心愿,让我们相爱一生吧。
22 如果你是圣诞,我是元旦,你是圣诞老人,我是驯鹿道夫,你是圣诞老婆婆,我是圣诞老公公,祝你圣诞快乐!
23 平安夜请给我与你共度的机会,小小的要求能满足我吗?
24 在这美好的日子,没有最美的词句,没有多情的言语,没有精美的礼品,有的只是朋友深深的祝福,圣诞
快乐!
25 在这24号的晚上,煮两个鸡蛋,我吃一个,送给你的就是一个圣诞,祝你节日快乐!
26 圣诞树上耀眼的彩灯,那是我祈祷你平安一生,圣诞夜里优扬的钟声,那是我祝福你快乐一生。
27 圣诞前夜的晚上,我想和你一起走入教堂,好不好?
28 喜欢你是很久远的事了,真的好想在这个圣诞之夜与你共跳华尔兹,伏在你的身边轻轻地说,我好喜欢你。
29 亲爱的,尽管我不能陪你度过我们的第一个圣诞节,但是我还要送给你我深深的祝福,愿你明天更美丽。
30 白雪飘飘,鹿铃霄霄,甜蜜的平安夜又来到,小手摆摆,舞姿曼曼,快乐的圣诞节日多美好。
31 在这个特别的日子里,我想跟你说一声:“圣诞快乐!”
32 各位圣诞老人,圣诞快乐吗?不快乐就多寄一些礼物给我吧,我知道你们都是购物狂,一个个购完物就再送点,心里才觉得爽。
33 Merry Christmas 愿世界充满祥和,我以最真诚的心祝福你拥有幸福的一年,愿主保佑你,阿门。
34 圣诞节快乐!看短信的快把礼物送来,不然你这个圣诞夜会坐立不安咯,听到没有,别笑大傻瓜!
35 相识相知未相见,平安夜的朋友,平安夜我们能相聚在一起吗?
36 我向圣诞老人许了愿,我希望不管你的脚多臭,在明早当你穿起袜子时,等收到我托圣诞老人带给你的满
满的祝福,暖暖你的心和脚丫子。
37 小巫婆,圣诞节又要到了,我有祝福给你,希望你不要再笨了呆了,要可可爱爱的哦,哎呀,反正就是你
要过的比我幸福就对了哦。
38 值此圣诞到来之际,我只有一句话要告诉你,今天早饭我没吃,中饭我没吃,下班我去找你。
39 好久没有听到你的声音,好久没有人听我谈心,在雪花飞舞的日子里,真的好想你,祝你圣诞节快乐!
40 如果你现在一个人,我祝你圣诞快乐,二个人那也祝圣诞快乐,如果是一伙人,请告诉我,你们在什么地
方。
41 我想在你最高兴时说出我的心里话,浪漫的圣诞夜里机会来了,你高兴得像头小猪,生气时更像,哈哈。
42 今年圣诞不收礼,收礼只收短信息
43 圣诞老人问:“今天是什么日子啊?”小精灵说:“今天是圣诞节啊!”圣诞老人说:“哦,真糟糕,又要加班,我最恨这一天了。”
44 美酒、蜡烛、大餐,多么完美的圣诞节,唯独就缺你我的朋友,还有你的钱包。
45 亲爱的,你比圣诞树上的星星还明亮,你比驯鹿还可爱,但你把胡子剃了吗,我可不想你和圣诞老人一个模样。
46 听,圣诞老人的铃声,快去看看啊,怎么这么快就回来了,什么,倒垃圾的,别太急哦!
47 你怎么才起啊,快睁大眼睛,昨天夜里我爬上你的床,在你枕头下藏了一件很特别的礼物哦!
48 愿圣诞之光普照你的每一个日子,愿阳光鲜花洒满你的人生旅程。
49 你快乐,我快乐,大家快乐,快乐圣诞节,哦,我的圣诞礼物呢,快找找,快找找,哦,收到了吗,我带给你的是快乐。
50 圣诞快乐,并不是只在特别的日子才会想起你,但是圣诞节的时候,一定会让你收到我的祝福。
51 当钟声响起,我就是你的!别讲错了,不是婚礼上的钟声,是圣诞的钟声,而我是你献给礼物的人。
52 面对圣诞,面对身边匆匆而过的人,想起你,心中有一种感动。爱就是那种无法言抒的表达。
53 圣诞节又要到了,希望今年的圣诞节能和我爱的人一起过。想问你,你愿意当我爱的人吗?
54 HI!已经有一阵子没见到你了,不知道你现在好不好?圣诞节和新年就要到了,愿你拥有一个难忘和快乐的圣诞!希望你在新的一年要快乐的过哦!
55 平安夜我们去聚餐,圣诞夜我们去唱歌,狂欢夜我们去蹦迪。我要我们在一起!
56 你的离去我不知如何面对,你没有给我任何安慰。我的眼中有泪水,圣诞节你会回来吧?不要让我再次心碎!
57 有句话每年圣诞我都想说,可是苦于没有机会。现在我实在憋不住了,请把你留在我沙发上的袜子拿走!!
58 圣诞节真的觉得好寂寞哦!因为没有你在身边,其实我真的想你了,好想好想好想让你陪我度过这个浪漫
的平安夜。
59 孩子啊,我是圣诞老人,有一份圣诞礼物要送给你。什么,你们家没有烟窗,还是不去买了!
60 圣诞之夜祝福你,愿圣诞节的欢声笑语和欢乐气氛永远萦绕着你!
61 淡淡一点的友情很深,淡淡一点的味道很纯,淡淡一点的祝福最真,祝愿圣诞快乐!
62 请选择愿望:A:巧克力+玫瑰 B:自助餐+烛光 C:电影+零食 D:以上皆是
63 只有钟声响起,愿我的祝福化作飞翔的天使,飞向你的窗口,圣诞快乐!
64 如果你今天没有收到我的圣诞礼物,那一定是你的袜子有个大洞,快补吧。
65 为了响应环保,节省纸张,在圣诞节不要送我圣诞卡了,请直接在尽可能大的纸币上写你的祝词就行了。
66 我没法去教堂为你祈祷,也没有圣诞的歌声,更没有圣诞的礼物,只在心里祈求,希望你健康每一天。
67 昨晚我做了一个梦,圣诞老人送我的礼物是一张两人圣诞晚餐券,你愿意和我一起过我们的第一个圣诞节
吗?
68 宝贝,平安夜的晚上我将和圣诞老人一起了现在你的面前,把眼睛闭上数到三。
69 亲爱的圣诞节快乐,你知道我是谁吗?这个问题对你来说也许不重要,但对我很在意哦。
70 雪在下啊,圣诞老人正踩在外面青青的圣诞树窃笑,睡吧,宝贝,明天你将收到心爱的礼物,恭候我。
71 今年你愿意做我的圣诞老人吗,在圣诞的晚上将礼物放在我的床头。
72 在这时髦的大好日子里,我有万千祝福而无从说起,只想很老土的向你说四个字:圣诞快乐!
73 有你在的每一天都像在过圣诞节。
74 平安夜,祝福你,我的朋友,温馨平安!欢乐时我和你一道分享,不开心时我和你一起承担。
75 送你的礼物实在太重了,鹿车拉不动,只好亲自送了,记得等着我,等着我说圣诞快乐!
76 用中文说圣诞快乐,用英文说Merry Christmas,用心里话说我想要的圣诞礼物什么时候给我啊。
77 空中点点闪烁的银光环绕着缤纷的梦想,祝福你,双手合十许下的心愿,都一一实现在眼前。
78 恭贺圣诞快乐,在新的一年里有甜有蜜,有富有贵,有滋有味,有安有康。
79 钟声是我的问候,歌声是我的祝福,雪花是我的贺卡,美酒是我的飞吻,轻风是我的拥抱,快乐是我的礼
物。
80 请选择愿望:A:巧克力+玫瑰 B:自助餐+烛光 C:电影+零食 D:以上皆是
91 每一朵雪花飘下,每一个烟火燃起,每一秒时间流动,每一份思念传送,都代表着我想要送你的每一个祝
福,圣诞快乐!
92 春节人们用筷子吃饺子,中秋节人们用手吃月饼,圣诞节人们用刀叉吃烧鹅。现在,圣诞节快到了,你还是躲一下吧,免得刀叉落到身上。
93 圣诞佳节恭喜你,发个短信祝福你,成功的事业属于你,开心的笑容常伴你,健康长寿想着你,最后还要
通知你,财神爷爷也要拜访你哦。
94 晚上笑一笑,睡个美满觉,早晨笑一笑,全天生活有情调,工作之余笑一笑,满堂欢喜又热闹,烦恼之时笑一笑,一切烦恼全忘掉,祝圣诞快乐,笑口常开!
95 当雪花飘落,寒风吹起,才发觉,浪漫的圣诞已经飘然而至,这一刻什么都可以忘记,唯独不能忘记的是向好朋友你说声天冷了,注意身体,圣诞快乐!
96 圣诞乐,圣诞乐,快乐心涌,祝福手中握,条条短信是礼物,条条短信是快乐!礼物堆成堆,快乐汇成河。圣诞老人在说话,圣诞快乐!
97 圣诞节的快乐是因为有你在我身边,以后的日子里我会让你天天快乐,祝福是属于我们的,这不是承诺
是信心。
98 圣诞节到了,向支持我的朋友和我所爱的朋友说声感谢,感谢你走进我的生活,我会尽我最大的努力给
你无限的快乐!
99 圣诞老人,你现在已经收到了我的祝福,请马上跑到烟囱口处等待礼物的派送吧,谢谢。
100 ………………………………
December 24, 2004
Christmas Eve
This is another Christmas Eve since I joined IRLab. Remembering last year's Christmas Eve, I was studying in the lab and watching a movie.
This evening, my girlfriend WF and I went to the Lee pond. There were so many people in that big room. All of us enjoyed ourselves.
Happy eve! Thanks to WF!
December 23, 2004
Noon Naps
Do we need a nap at noon?
I have debated this with myself many times. As an undergraduate I could not fall asleep when I went back to the dorm at noon, so I slept for half an hour in the classroom instead. After joining the lab, I switched to napping at my desk in the lab.
This semester I have kept up the habit. A while ago I sometimes skipped the nap, and by the afternoon I would get drowsy and my productivity would plummet. Today I was busy until about one o'clock, went back to the dorm, and slept until two. I woke up feeling great, and my afternoon work went very efficiently.
Whether in the classroom or in the lab, I have observed that almost everyone takes some kind of noon rest, just in different forms. People in the classroom usually sleep slumped over a desk, but wake up feeling a bit sore. Others go back to the dorm, where not only the brain but also the limbs and body get to rest.
Going back to the dorm to sleep works for me now because all my roommates have the nap habit, so there is no noise at noon and the sleep quality is very high.
An hour's nap at noon makes health, work, and study all more efficient!
Good habits should be kept up.
December 22, 2004
Nice software: Source Insight
The Problem
You have a multitude of source files spread out all over the place. You have to deal with functions that somebody else wrote. You have to figure out how some piece of code works and see all of its clients. You didn’t write the code, or you wrote it in a past life.
You may be one of the cleverest developers in the world, but if you can’t find all the myriad pieces of your program, or can’t get your head wrapped around the code, then you will not be very productive.
The Solution
Source Insight was designed to enhance your ability to understand and modify your program. Our company mission is to increase programming team productivity by clarifying source code, presenting information in a useful way, and allowing programmers to modify software in large, complex projects.
Think of your program’s source code as a free form database of information. It has not only classes, members, and functions in it, but it has many important comments. (You do have comments, don’t you?)
Your source code also has a history. In fact, many large programs have a long lifetime that includes contributions by many programmers over many years. Some of it is not pretty, but you have to live with it.
Source Insight acts as an information server that surrounds your project’s source code. With it, you can have instant access to symbolic and textual information in your program.
Whether you are new to a project, or an old-timer, Source Insight will give you useful leverage to stay productive.
December 21, 2004
Comments in VS.NET
In VS6.0, adding comments to modules is very convenient with "注释精灵" (a comment-wizard add-in). But it is very inconvenient in VS.NET.
To solve this problem, I tried several approaches.
1. Add "注释精灵" to VS.NET itself. I did not manage it; after trying many paths, it still would not run correctly.
2. Find add-in modules for VS.NET. But few of the modules I found included enough functions.
Finally, I decided to write a module for this function myself using VBA.
December 20, 2004
The snow football match
We had discussed it for a long time, and this morning we made it happen.
At 9:00 in the morning, about 16 people gathered on the snow-covered football field. We chose a nice court and divided into two groups. The first group consisted of the five members of IRClub and four members of IRLab; the other group included all the other members of IRLab.
The final score was 2:1.
After the match we were all tired.
December 19, 2004
Coreference resolution recent research report
This afternoon it was my turn to give the report on coreference resolution.
I finished the reading outline of the paper Coreference Resolution for Information Extraction and then prepared for the talk yesterday. But I only made eight slides yesterday.
This morning I got up very early, at 6:30, and continued preparing the presentation. By 11:30 I had finished it as planned.
This afternoon, in the weekly meeting of our lab, I gave the presentation. My topic was a report on recent research in coreference resolution. I gave an introduction to anaphora resolution and coreference resolution.
After the report, Mrs. Qin gave me some suggestions about my speaking speed: I should slow down. Thanks to Mrs. Qin.
December 18, 2004
Coreference Resolution for Information Extraction
Paper title: Coreference Resolution for Information Extraction
Source: ACL2004 workshop on coreference resolution
Published: 2004
Authors: Dmitry Zelenko, Chinatsu Aone, Jason Tibbetts
Affiliation: SRA International, 4300 Fair Lakes Ct., Fairfax, VA 22033, USA
Abstract:
English:
We compare several approaches to coreference resolution in the context of information extraction. We present a loss-based decoding framework for coreference resolution and a greedy algorithm for approximate coreference decoding, in conjunction with Perceptron and logistic regression learning algorithms. We experimentally evaluate the presented approaches using the Automatic Content Extraction evaluation methodology, with promising results.
Why this topic:
Coreference resolution is a classic research problem: deciding whether expressions in a text refer to the same real-world entity. This paper restricts coreference resolution to texts prepared for information extraction (the named entities have already been extracted). It does not tackle the full coreference problem; it only partitions the entities extracted from the text.
Extraction-based coreference comes from the Entity Detection and Tracking (EDT) task of the ACE evaluation. EDT requires detecting entity mentions such as names, nominals, and pronouns, and then merging the mentions that refer to the same real-world entity into one entity. Following the ACE guidelines, an entity formed by merging entity mentions is treated as an equivalence class of mentions.
The work in this paper merges entity mentions that have already been extracted.
What others have done
A survey of coreference resolution
Anaphora resolution has been widely studied (see Mitkov's monograph Anaphora Resolution); coreference resolution is a closely related problem. The referring expression is called the anaphor, and the expression referred to is the antecedent. Anaphora resolution limits itself to nominal and pronominal anaphors, and thus ignores the resolution of names, which is crucial for information extraction. Furthermore, anaphora resolution studies only anaphora proper (the anaphor follows the antecedent) and ignores the rarer cataphora (the anaphor precedes the antecedent). The authors take coreference resolution to cover name, nominal, and pronominal resolution, including both anaphora and cataphora.
They define a coreference relation coref over the entity mentions of a document: coref(x, y) holds for two entity mentions if and only if x and y corefer.
It is often useful to split the coreference relation into three subtasks according to the mention types involved. More precisely, if x or y is pronominal, we speak of pronoun resolution; if x or y is nominal, of noun phrase resolution; and if x and y are both names, of name resolution.
An information extraction system must address all three subtasks, but different models and algorithms may suit name resolution, noun phrase resolution, and pronoun resolution differently.
Most early work on anaphora and coreference resolution dealt with pronoun resolution (Lappin and Leass, 1994; Kennedy and Boguraev, 1996). Early approaches searched for the best antecedent for each pronoun in a document, and different definitions of "best" produced different complex rule sets based on theories of discourse analysis.
Pronoun and noun phrase resolution advanced greatly in the mid-1990s with the application of machine learning methods (Aone and Bennett, 1996; McCarthy and Lehnert, 1995; Ng, 2001; Ng and Cardie, 2002).
A resolution instance is a feature representation of a pair of entity mentions, capturing properties of a candidate antecedent and an anaphor. These features help decide whether the anaphor under consideration and the candidate antecedent corefer. Each instance carries a label indicating coreference or not, usually +1 or -1. Most learning-based systems rely on large hand-crafted feature sets (Ng, 2001).
Many machine learning methods have been applied experimentally to coreference resolution. Much published work uses decision trees (Aone and Bennett, 1996; Ng, 2001; Ng and Cardie, 2002). Global probabilistic models have also been proposed for coreference: a generative probabilistic model (Charniak et al., 1998) and a conditional random field model (McCallum and Wellner, 2003).
The output of a learned coreference classifier must be combined with a decoding algorithm to partition the entity mentions into equivalence classes. One of the most popular decoding algorithms links an anaphor to the closest preceding candidate that qualifies (Ng, 2001); call this link-first decoding. An alternative, link-best decoding, computes a link probability for every candidate antecedent and picks the candidate with the highest probability (Ng and Cardie, 2002). The paper considers both and compares them experimentally within its new decoding framework.
The decoding framework is close to the conditional random field approach of McCallum and Wellner (2003). Coreference decoding with conditional random fields gives rise to a correlation clustering problem (Bansal et al., 2002). This paper also reduces coreference decoding to correlation clustering, but uses a different approximation.
For lack of training data, coreference clustering is performed over noun phrases. In other words, noun phrase attributes feed a distance function, and a heuristic clustering algorithm produces a partition corresponding to a coreference resolution.
The authors' new approach
A coreference resolution framework: coreference instances and their feature representation, instance generation, learning algorithms for the coreference classifier, and integration of the learned classifier into clustering-based discourse analysis.
Coreference instances are represented with features of five types.
Instances are generated by walking backward from the current entity mention: a coreferring mention within a window of M mentions yields a positive instance, and each non-coreferring mention yields a negative one.
Decoding uses the link-first and link-best methods.
The learning methods are logistic regression and the perceptron.
A loss function is constructed to express logistic regression and the perceptron. Minimizing the transformed loss exactly is NP-hard, so the problem is handed to a greedy decoding algorithm: classify first, then turn the classification results into clusters.
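The decoding schemes discussed here (link-first, link-best, and the greedy classify-then-cluster procedure) can be sketched as follows. This is a minimal illustration, not the paper's implementation: pairwise scores come from an arbitrary function, the 0.5 link threshold is an assumption, and mentions are bare indices.

```python
def link_first(mentions, score, threshold=0.5):
    """Link each mention to the CLOSEST preceding candidate above threshold."""
    antecedent = {}
    for j in range(1, len(mentions)):
        for i in range(j - 1, -1, -1):            # walk backward: closest first
            if score(mentions[i], mentions[j]) > threshold:
                antecedent[j] = i
                break
    return antecedent

def link_best(mentions, score, threshold=0.5):
    """Link each mention to the HIGHEST-scoring preceding candidate."""
    antecedent = {}
    for j in range(1, len(mentions)):
        best = max(range(j), key=lambda i: score(mentions[i], mentions[j]))
        if score(mentions[best], mentions[j]) > threshold:
            antecedent[j] = best
    return antecedent

def greedy_clusters(mentions, score, threshold=0.5):
    """Greedy decoding: classify all pairs, then merge mentions into clusters."""
    parent = list(range(len(mentions)))           # union-find over mention indices
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for j in range(len(mentions)):
        for i in range(j):
            if score(mentions[i], mentions[j]) > threshold:
                parent[find(i)] = find(j)         # merge the two clusters
    clusters = {}
    for idx in range(len(mentions)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())
```

Link-first and link-best return antecedent links, which still have to be closed into equivalence classes; the greedy variant returns the classes directly.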
Theoretical advantages of the method
The greedy procedure in this paper is very efficient.
What experiments did the authors run to validate the method, and with what results?
Evaluations were run on the ACE 2003 English corpus. With the same greedy framework, six configurations were tested:
Instance generation   Learner               Decoding     ACE score
consecutive           logistic regression   link-first   75.9
full                  logistic regression   link-best    74.2
full                  logistic regression   greedy       76.4
consecutive           voted perceptron      link-first   75.8
full                  voted perceptron      link-best    75.4
full                  voted perceptron      greedy       75.8
ACE 2003 contains 105 English documents, split into 53 training and 52 test documents. Note that (LDC, 2003) puts human annotation performance at roughly 85 points, and that under the ACE scoring rules, missing mentions of a gold entity are penalized more heavily than spurious ones.
Did the experiments demonstrate the superiority of the method?
Yes.
Remaining problems
The ACE evaluation is an extrinsic one; some intrinsic evaluations are still needed.
My own ideas for improvement or innovation
The method could be compared with other machine learning algorithms, such as decision trees or genetic algorithms.
December 17, 2004
Event Clustering on Streaming News Using Co-reference Chains and Event Words
Paper title: Event Clustering on Streaming News Using Co-reference Chains and Event Words
Source: ACL2004 workshop on coreference resolution
Published: 2004
Authors: June-Jei Kuo, Hsin-Hsi Chen
Affiliation: Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Abstract
English:
Event clustering on streaming news aims to group documents by events automatically. This paper employs co-reference chains to extract the most representative sentences, and then uses them to select the most informative features for clustering. Due to the long span of events, a fixed threshold approach prohibits the latter documents to be clustered and thus decreases the performance. A dynamic threshold using time decay function and spanning window is proposed. Besides the noun phrases in co-reference chains, event words in each sentence are also introduced to improve the related performance. Two models are proposed. The experimental results show that both event words and co-reference chains are useful on event clustering.
Why this topic
News spreads all over the Web. In the fast-changing Internet era, detecting and tracking news events is very useful for decision making. Event clustering aims to cluster the given documents by event effectively.
Event clustering raises five underlying questions:
How many features can be used for event clustering?
For an incoming document, which cue patterns can be used to assign it to a cluster?
How do different clustering strategies affect performance on retrospective versus online data?
How does the time factor affect clustering performance?
How can multilingual data be clustered?
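The paper's answer to the time-factor question, per the abstract, is a dynamic threshold built from a time decay function and a spanning window, so that a fixed threshold no longer shuts out later documents of a long-running event. The exact function is not recorded in these notes, so the sketch below assumes a simple exponential decay; the half-life and floor constants are illustrative, not from the paper.

```python
import math

def dynamic_threshold(base, doc_time, cluster_time, half_life=7.0, floor=0.2):
    """Lower the clustering threshold as the gap (in days) between a new
    document and a cluster's last update grows, so that later documents
    of a long-spanning event can still join the cluster."""
    gap_days = doc_time - cluster_time
    decay = math.exp(-math.log(2) * gap_days / half_life)  # 1.0 at gap 0, 0.5 after one half-life
    return max(floor, base * decay)
```

With these assumed constants, a document arriving one half-life after the cluster's last story needs only half the base similarity to be admitted, and the floor keeps the threshold from vanishing entirely.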
What others have done
Chen and Ku (2002) treated named entities, other nouns, and verbs as cue patterns of documents describing the same event. They proposed a two-threshold centroid clustering method to compute the relatedness between a new document and existing clusters, with a time-aware least-recently-used removal model to keep overly old and unimportant terms from affecting the clustering.
Chen and Su (2003) treated event clustering as multilingual multi-document summarization. They showed that clustering before translation outperforms translation before clustering; translating after clustering reduces translation errors.
Fukumoto and Suzuki (2000) proposed using topic words and event words for event tracking. For feature extraction, they proposed a method that leans more on semantics than on part of speech.
Wong, Kuo and Chen (2001) used these methods to select informative words for headline generation, and for ranking the extracted sentences in multi-document summarization (Kuo, Wong, Lin and Chen, 2002).
Bagga and Baldwin (1998) proposed cross-document coreference resolution based on named entities: the coreference chains in each document are used to generate a summary of that document, and informative words are then extracted from the summary rather than from the full text as document features.
Azzam, Humphreys, and Gaizauskas (1999) proposed a simple summarization model using co-reference chains.
Silber and McCoy (2002) proposed a summarization model using lexical chains, noting that pronoun and anaphora resolution are both indispensable features.
The authors' new approach
To some extent, co-reference chains and event words are complementary approaches to semantics-based feature selection. Co-reference chains can be seen as equivalence classes of noun phrases, while event words capture noun and verb term features across documents.
This paper uses both co-reference chains and event words for event clustering.
The coreference resolution method in this paper
Since the paper only uses the results of coreference resolution, it gives no coreference algorithms or procedures. Its discussion of coreference resolution is summarized here.
Cardie and Wagstaff (1999) note that the co-reference chains of a document list the equivalence classes of its noun phrases. The first step of a coreference algorithm is to find all candidate noun phrases, which involves word segmentation, named entity recognition, part-of-speech tagging, and noun phrase chunking. Classification uses attributes such as the word/phrase itself, the part of speech of the head word, named entity type, position in the document, number (singular, plural, unknown), pronoun type, gender (male, female, unknown), and head word semantics. In MUC-7 (1998) the best F-measure for automatic coreference resolution of English documents was 61.8%; the evaluation used a corpus with manually annotated named entities and co-reference chains.
Using co-reference chains
A sentence that contains any node of a co-reference chain is said to cover that chain. The more chains a sentence covers, the more important it is.
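The sentence-selection idea (a sentence covers a chain if it contains any of the chain's mentions, and more chains covered means more important) can be sketched as follows; the data layout is my own assumption for illustration.

```python
def sentence_scores(sentences, chains):
    """Score each sentence by the number of co-reference chains it covers.

    sentences: list of token lists.
    chains: list of chains, each a list of (sentence_index, token) mentions.
    """
    scores = [0] * len(sentences)
    for chain in chains:
        covered = {sent_idx for sent_idx, _ in chain}   # sentences touching this chain
        for sent_idx in covered:
            scores[sent_idx] += 1                       # one point per chain covered
    return scores
```

The most representative sentences, from which clustering features are then drawn, are simply those with the highest scores.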
December 16, 2004
Cross Document Co-reference Resolution Applications for People in the Legal Domain
Authors: Christopher Dozier and Thomas Zielund
Conference: Proceedings of the Workshop on Reference Resolution and its Applications. ACL2004
Summary:
English:
By combining information extraction and record linkage techniques, we have created a repository of references to attorneys, judges, and expert witnesses across a broad range of text sources. These text sources include news, caselaw, law reviews, Medline abstracts, and legal briefs among others. We briefly describe our cross document co-reference resolution algorithm and discuss applications these resolved references enable. Among these applications is one that shows summaries of relation chains between individuals based on their document co-occurrence and cross document co-reference.
Reading outline:
Why this topic:
Judges, attorneys, and expert witnesses all play important roles in the legal system. Attorneys need to look through a great many documents when handling their cases. To support attorneys' research needs, the authors built a system that automatically indexes cross-document references to attorneys, judges, and expert witnesses. The text sources include news, case law, law reviews, Medline abstracts, and legal briefs.
What others have done:
The paper does not mention others' work.
Where the problem lies:
Not mentioned in the paper.
The authors' new approach:
The approach first extracts the entities in each document following MUC-style templates, then matches these entities with Bayesian record linkage. The resulting entity groups are finally used to generate various brief summaries over a large document collection.
The coreference method builds a frame for each entity from a large electronic dictionary of legal professionals, then clusters the template-extracted entity information from the documents together with the entity frames extracted from the dictionary.
Theoretical advantages of the method:
Not mentioned.
What experiments were done to validate the method:
The system was built directly; only some examples are given, and no experiments are mentioned.
Experimental results:
Not mentioned.
Did the experiments demonstrate the superiority of the method:
Not mentioned.
Remaining problems:
It is only a domain-specific system; because it relies on a very complete electronic dictionary, switching domains would be difficult.
My own ideas for improvement or innovation:
Broad-domain dictionaries could be used to build the entity frame information before doing coreference resolution.
December 15, 2004
Open your eyes
This evening I was working in the laboratory, and so was Abid Khan.
When we talked about the habit of keeping a diary, we had different ideas. He thought it was time-consuming and that there were not enough things to record. But I believed that if you opened your eyes and observed your life, you could find many, many things worth recording.
I introduced the popular blog to him and suggested he write a blog too. He thought it was a good idea.
As Wangxiang Che is the only one with the rights to add a user to our laboratory's blog system, we must wait for his return from Beijing.
December 14, 2004
Typeset paper
The paper on summarization system evaluation had to be typeset again for another journal. It was a little troublesome. I used many techniques from the book on the art of writing and typesetting large documents in Word.
December 13, 2004
Multi-Document Person Name Resolution
Author: Michael Ben Fleischman, Eduard Hovy
Conference: Proceedings of the Workshop on Reference Resolution and its Applications. ACL2004
Summary:
English:
Multi-document person name resolution focuses on the problem of determining if two instances with the same name and from different documents refer to the same individual. We present a two-step approach in which a Maximum Entropy model is trained to give the probability that two names refer to the same individual. We then apply a modified agglomerative clustering technique to partition the instances according to their referents.
Reading outline:
Why this topic:
Philosophers and artists pointed out long ago that instances bearing the same name refer to the same entity. Recently, person name disambiguation has drawn more and more attention in computational linguistics. As the Internet grows in size and coverage, it becomes less and less likely that same-named person instances on different websites refer to the same individual. This poses a serious challenge for information retrieval and question answering, which rely on small amounts of data to handle user queries.
Another disambiguation problem arises when building an ontology from instances. Ontology construction often extracts concept/instance pairs from websites (e.g., Paul Simon/pop star) and adds them to a database. When adding a pair, one must make sure it refers to the same entity as the existing entries in the concept/instance base. Pairs with the same name but different instances often refer to different entities (e.g., Paul Simon/pop star vs. Paul Simon/politician).
What others have done:
Mann and Yarowsky (2003) cast multi-document person name resolution as a clustering problem: the feature combinations extracted from the text are treated as bags of words, and a clustering algorithm produces two clusters. They used two evaluations: on a hand-annotated data set built from real searches they obtained precision/recall of 0.88/0.73, and with pseudonames (fusing any two names into one name with two referents) they reached 86.4% accuracy.
Bagga and Baldwin (1998) took another approach. They first perform person coreference resolution within each document and mark all coreference chains, then extract the text around each node of a chain to form a summary of that chain's entity in the document. A bag-of-words model then builds a vector for each chain in each document, and a clustering algorithm completes the cross-document person resolution. Resolving 11 instances named John Smith across 173 New York Times articles, they reached an F-measure of 0.846.
Where the problems lie:
Mann and Yarowsky (2003) proposed many useful features, but the number of clusters is fixed in advance, and the pseudoname evaluation makes it hard to tell how well the method generalizes to real-world problems.
Although Bagga and Baldwin (1998) can discover a variable number of referents, the simple bag-of-words clustering model fundamentally limits the applicability of their method. Moreover, they tested on a single name only, so good performance on real-world data is hard to guarantee.
The authors' new approach:
The approach has two steps: first, a maximum entropy model gives the probability that any two concept/instance pairs corefer; second, a modified agglomerative clustering algorithm merges the likely concept/instance pairs.
The experimental setup was as follows:
Data:
2,675 concept/instance pairs were extracted and annotated from the ACL data set, split into a training set (1,875), a development set (400), and a test set (400).
Features:
Name features (census dictionary frequency, ACL corpus frequency, number of hits returned by Google)
Web features (split the concept terms into head1 and head2, then build Google queries: name+head1+head2, abs((name+head1)-(name+head2)), (name+head1+head2)/((name+head1)+(name+head2)))
Overlap features (overlap ratio within sentence scope)
Semantic features (semantic similarity between any two terms via the WordNet ontology)
Statistical features (four conditional probabilities: p(i1=i2|i1->A,i2->B), p(i1->A,i2->B|i1=i2), p(i1->A|i2->B)+p(i2->B|i1->A), p(i1->A,i2->B)/(p(i1->A)+p(i2->B)))
Model: the YASMET Max.Ent package, smoothed with a Gaussian prior with mean 0.
Clustering uses an O(n^2) algorithm.
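The second step, turning pairwise referent probabilities into clusters with an agglomerative merge, can be sketched as follows. The pairwise maximum entropy model over the features above is replaced here by a given probability function, and the average-link merge rule and 0.5 stopping threshold are my assumptions, since the notes do not record the exact variant.

```python
def agglomerate(items, prob, threshold=0.5):
    """Greedy agglomerative clustering: repeatedly merge the two clusters
    whose members have the highest average pairwise coreference probability,
    as long as that average stays above the threshold."""
    clusters = [[x] for x in items]               # start with singletons

    def avg_link(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(prob(x, y) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # pick the most probable cluster pair (i < j)
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_link(clusters[ij[0]], clusters[ij[1]]),
        )
        if avg_link(clusters[i], clusters[j]) <= threshold:
            break                                 # nothing left worth merging
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

For example, with prob("a", "b") high and both of their probabilities against "c" low, the procedure merges a and b and leaves c alone.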
Theoretical advantages of the method:
It blends a large number of features, many drawn from the Web, and can handle the task well.
What experiments were done to validate the method, and with what results:
Maximum entropy was trained to give the coreference probability of any two concept/instance pairs. The baseline (same concept and same referent) reached 83.5% accuracy, while maximum entropy reached 90.75%. For clustering, the hypothesis-testing methods from T. Mitchell's Machine Learning were used extensively to judge the experimental results.
Did the experiments demonstrate the superiority of the method:
Yes.
Remaining problems:
Maximum entropy combines features organically through learned weights, but whether that weighting is optimal still needs comparison with other methods.
My own ideas for improvement or innovation:
Compare a genetic algorithm against maximum entropy. For feature selection, the features in Soon's work could be borrowed to bring in more features and to perform more effective feature selection.
December 12, 2004
Europe feeling & get-together
This afternoon the regular weekly meeting was held. Today Dr. Tliu gave us a wonderful speech on his visit to Europe and his impressions.
His impressions can be outlined as follows:
High quality: Their research, architecture, personal qualities, diligence, and so on are all of high quality.
Self-confidence: In some respects we are not behind them. We should have the self-confidence to engage with them.
Working hard: Their students are more diligent than we are. We must study and work harder.
Beautiful scenery: Europe is very beautiful.
Dr. Tliu also introduced some possible cooperation with Europe. This is a nice opportunity.
Tonight, as originally planned, all of us classmates from last term's graduate English course went to our teacher's home.
She Zhiyong and I went to the flower shop to buy some beautiful flowers. Then we all gathered in the hall of the A building, and at 4:30 pm the eight of us went to our teacher's home.
Mrs. Zhang was waiting for us. She had a fever and could not touch anything cold, so we all helped prepare the hot pot. Mrs. Zhang, Lou Xiutao, Luo Chenglin, Sun Jin and I made up the arrangement group for this dinner.
It was such a nice chance for us to prepare this dinner together. We cut the vegetables and got the hot pot ready. When we began to savour the dishes, a fragrant smell filled the house. We all felt happy.
Mrs. Zhang was so good to us. Thanks to her! She is one of my best teachers.
December 11, 2004
International intercommunion
Abid Khan had invited us to his room for dinner. This evening at 6:00 he came to the lab to guide us there. Dr. Tliu, Carl, Simply, Victor, Slchen and I went to his room.
Three other classmates of Abid were there. They are cousins and classmates from the same university, and every weekend they have dinner together. This evening they invited us.
They had prepared well for us. Following the tradition of their country, the guests eat first, and then the hosts. We were not quite used to this at first. The dinner was very ample. Abid is good at cooking rice; some other dishes were made by his cousins. There were salad, mutton, sweets, and so on. Eating foreign-style dishes and chatting with them, we enjoyed ourselves.
After the dinner we chatted on many topics, including Ph.D. studies, life in Harbin, cooking, and so on.
For those three hours we were in a purely English-speaking environment. Such a nice experience.
December 10, 2004
Lee pond
This evening some of us, Jiang Hongfei, Zhang Jian, Yang Yuhang, Zhong Bineng, WF and I, went to the Lee pond, which is located near our campus. I had never been there before, and had assumed it would be very expensive.
When we got there, the cost was 10 yuan per person with a VIP card. We played chess, cards, building blocks, table football, and shuffleboard.
While playing shuffleboard, we all enjoyed ourselves.
Such nice games.
It is a nice leisure place.
2004年12月9日
GMCM award
The first national Graduate Mathematical Contest in Modeling (GMCM, my own abbreviation) had been held over four days, from Sep. 17 to Sep. 20, 2004.
Yesterday morning I found the award announcement on our campus website. How exciting that our group had won a first prize! This was wonderful news for me and our group. I notified Liu Yu and Yu Qiyue, my partners; they were excited, too.
We were lucky and excited. Such good news!
2004年12月8日
Anaphora/coreference resolution research!
This morning Dr. Tliu came back. He was in good spirits and brought us Irish chocolate. We were all excited.
I discussed my research plan on anaphora/coreference resolution with Dr. Tliu. The conclusion was that I should read more papers on anaphora/coreference resolution. I understood his suggestion: having read only a few papers so far, any experiments I ran might merely repeat others' work, and my original plan would indeed have repeated some of it.
So, considering the situation, I should read papers for one or two months and then begin my own research.
2004年12月7日
Abid Khan
It was Abid Khan's turn to present at the doctoral reading group. This evening he was nicely dressed, in a shirt and a brown tie.
He told us a lot about his education, projects, and experience. His English was better than ours, though I thought he pronounced some sounds, like "T" and "D", incorrectly.
After one and a half hours the meeting ended, and we chatted about some other subjects.
By the time we ran out of words it was 22:40. Too late.
After three hours of communicating in English, I found my spoken English had improved.
Good!
2004年12月6日
Biblioscape
Recently I have read many papers, but how to manage them effectively, in a way that also supports paper writing, has been a real puzzle for me.
I originally chose Biblioscape 6.1 as my management software, but later found a serious problem displaying Chinese when inserting references into Word: wherever Chinese should appear, garbled characters showed up instead. I then heard that EndNote 8.0 supported Chinese very well and did not suffer from Biblioscape's garbling problem in Word. But after actually using EndNote 8.0, I found it far less convenient and effective than Biblioscape 6.1 for managing papers.
So two choices lay before me: Biblioscape or EndNote. Once the software is chosen, a great deal of labor goes into organizing the papers one has read. Biblioscape proved very convenient in my own hands-on use, while a quick web search showed EndNote is also very widely used.
To settle the question once and for all, I spent an entire day comparing various reference management tools, mainly Reference Manager, EndNote, and Biblioscape.
The most authoritative page I found was:
Evaluation of Reference Management Software on NT (comparing Papyrus with ProCite, Reference Manager, Endnote, Citation, GetARef, Biblioscape, Library Master, Bibliographica, Scribe, Refs)
http://eis.bris.ac.uk/~ccmjs/rmeval99.htm
However, it says nothing about Chinese support. After reading many introductory articles online, the software with the best Chinese support so far appears to be Biblioscape.
Following the page on the Biblioscape website about solving the Chinese-support problem, I smoothly fixed the Chinese display problem in Word that I had run into before.
Seeing everything display correctly in Word made me extremely happy; from now on I can use Biblioscape with peace of mind.
Conclusion: for researchers likely to handle a large number of Chinese papers (such as us), the best choice is Biblioscape. I hope this note saves you some time when choosing similar software.
2004年12月5日
2004年12月4日
2004年12月2日
Reviewing Pattern Recognition
The Pattern Recognition exam is approaching, and my classmates are all busy studying and reviewing; I am no exception. I have read the book and understand the basic principles, but some people say the Pattern Recognition homework can be done without learning pattern recognition at all, that it bears no trace of the subject. I half agree.
The basic ideas of pattern recognition are very simple, but understanding those simple ideas deeply requires a great deal of mathematical knowledge and derivation, which is why the course has felt like a mathematics course all semester. The textbook is excellent, a classic from abroad, yet the chapter the book itself singles out as its essence was never covered in class, and the book's distinctive series of computer exercises was not included in the teaching plan either. No wonder some classmates complain that pattern recognition is hard to learn.
Indeed, without some practical experience, the algorithms in pattern recognition are not easy to understand. Although I have now reviewed the material roughly once through, I feel I still have not truly grasped the essence of the book. I plan to set aside a little time every day after the exam to digest it carefully and to complete those very worthwhile computer exercises.
2004年12月1日
NLP exam
This evening we all took the Natural Language Processing exam, from 6:00 pm to 8:00 pm.
There were eight questions, all on key points of the course that Dr. Guan Yi had gone over with us many times. After my thorough review, I solved them one by one quickly and handed in my examination paper after 70 minutes.
Now only the Pattern Classification exam remains, three days from now. I must prepare for it thoroughly.
2004年11月30日
2004年11月29日
2004年11月28日
GA for anaphora resolution(2)
After some searching on the internet, I found a new paper using GA for Chinese pronoun resolution. I read it quickly.
Its main idea was the same as my original one. So I learned a rule: if you have a new idea, you should implement and publish it as soon as possible.
2004年11月27日
Make new acquaintances
This afternoon, while searching for information about anaphora resolution, I came across a Ph.D. student at Beijing Normal University. He focused on centering theory and had published a paper on zero anaphora resolution based on it. He was in the second year of his Ph.D. program; his school requires two papers in core journals before a Ph.D. student can graduate.
We talked a lot about research on anaphora resolution. He was familiar with the linguistic side, and we agreed that we could cooperate on this point.
Good idea. Good news.
2004年11月26日
The introduction about anaphora/coreference resolution
Dr. Tliu had asked us to prepare a full introduction to each research field. I was in charge of the introduction for our information extraction and summarization group. Since there were three sub-areas, I divided the work into three parts: multi-document summarization, single-document summarization, and anaphora/coreference resolution. I wrote the introduction to anaphora/coreference resolution and handed the other two to people more familiar with them.
Following the usual framework for introducing a research area, I finished it in three hours. In that time I found lots of material and became more familiar with the area.
2004年11月25日
Research plan of anaphora/coreference resolution
These days, one of my tasks was to submit a research plan for anaphora/coreference resolution.
In my original idea, I divided the research into two main parts: intrasentential and intersentential. Viewed another way, I divided it into four threads: combining syntactic information for resolution, combining knowledge bases for resolution, feature-set generation and optimization, and comparison and selection of machine learning algorithms.
Only now do I appreciate how difficult drawing up a research plan is.
2004年11月24日
2004年11月23日
Such interesting plays!
This evening, WF and I went to watch the graduate English play contest in Building A of our campus.
I met our best English teacher, Mrs. Zhang, and classmates Wang Shuting, Jiang Haiyan, and others. Some of the plays were truly wonderful and I admired them. Many groups performed 'Pilgrimage to the West' (西游记). One group performed snippets of The Matrix Reloaded, with some very interesting fight scenes. I liked it very much.
So interesting.
2004年11月22日
Such a busy day!
This morning I was notified that I would give a presentation at the CS Clubs' inaugural meeting, and I spent the whole morning preparing for it.
I was also notified to prepare the introduction to ACE EDR co-reference.
This evening I took part in the CS Clubs' inaugural meeting, and afterwards, from 9:30 pm, in the weekly IRClub meeting.
So busy a day!
2004年11月21日
Perl 5 编程详解
The most worthwhile task on today's dynamic task list was to continue learning Perl. Everyone knows Perl is extremely well suited to text processing, and thus to many of the programming tasks that arise in natural language processing research, but I had long put off studying it in earnest. Today I began studying Perl carefully.
Opening 《Perl 5 编程详解》, which senior student yhb lent me, I was quickly drawn in by Perl's power, above all its nearly perfect regular expressions. Looking closely at how Perl implements them, I found its matching approach surprisingly efficient. Regular expressions are the lifeblood of Perl, and I must master them.
Having already learned the basic language elements, today I mainly studied the chapter on Perl's regular expressions, and I plan to use Perl to process a new batch of corpus data I need to handle.
The fastest way to learn programming is to keep reading other people's code and keep practicing oneself. I believe I will quickly master Perl's basic features.
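The kind of regex-driven corpus cleanup described above can be sketched as follows (shown in Python rather than Perl, and with a made-up markup format as the example input):

```python
import re

def clean_corpus_line(line: str) -> str:
    """Remove SGML-style markup and collapse whitespace in one corpus line."""
    line = re.sub(r"<[^>]+>", " ", line)   # drop tags like <s id="1">
    line = re.sub(r"\s+", " ", line)       # collapse runs of whitespace
    return line.strip()

print(clean_corpus_line('<s id="1">The  cat\tsat.</s>'))  # prints: The cat sat.
```

The same two substitutions translate almost literally into Perl's `s/<[^>]+>/ /g` and `s/\s+/ /g`.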
2004年11月20日
战胜Matlab必做练习50题
Checking today's dynamic task list, I found the task best suited to the day was studying 《战胜Matlab必做练习50题》.
Opening the book, I skimmed its table of contents and main content. It is organized as unit exercises, starting from the most basic MATLAB questions and progressing step by step to more complex problems in mathematics, signal analysis, mechanics, and electricity, weaving the study of MATLAB into the solving of practical problems from different fields. Each exercise introduces the relevant MATLAB knowledge in the context of a problem, and the fifty exercises together cover essentially all of MATLAB's main features.
Each part of the book is explained fairly simply; compared with other books, only the format really differs. Still, I learned some commands and methods I had not been familiar with: pinv for the pseudo-inverse of a matrix (subsuming inv); matrix left division, which solves linear systems faster than computing the inverse and then multiplying; subspace, which gives the angle between the subspaces spanned by two vectors of equal length; roots, which first converts a polynomial into its companion matrix and then finds the eigenvalues, and which is more reliable and accurate than the classical methods and, better than purely numerical root-finding, can return complex roots; end, which can denote the last element of a vector; and plot(p), which, when p is a real matrix, plots each column against its index, making comparison curves very easy to draw.
Now I understand a little of the saying "read a book a hundred times and its meaning reveals itself."
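The commands noted above have close NumPy analogues; as a rough illustration (NumPy names rather than MATLAB's, and the matrices here are my own toy data):

```python
import numpy as np

# Pseudo-inverse: generalizes inv to non-square or singular matrices.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
A_pinv = np.linalg.pinv(A)                 # analogue of MATLAB's pinv

# Solving A x = b by least squares, the analogue of MATLAB's A \ b,
# is preferred over explicitly forming an inverse.
b = np.array([1.0, 2.0, 3.0])
x, *_ = np.linalg.lstsq(A, b, rcond=None)  # here x is exactly [0, 0.5]

# Polynomial roots: np.roots likewise works via the companion matrix's
# eigenvalues, so it can return complex roots; x^2 + 1 = 0 gives +-1j.
r = np.roots([1.0, 0.0, 1.0])
```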
2004年11月19日
Two Reflections
Can "reflections" take the measure word 件 in Chinese? Perhaps; let me try it here ^-^
Reflection one
The Machine Translation course began at 8:00 this morning. Today's lecture, on knowledge engineering, was given by Mr. Yang Muyun of the machine translation lab. The content was more or less the standard material of knowledge engineering, but some of his asides benefited me greatly. My takeaways:
1. When doing research on a natural language processing topic, carefully select the papers you read. Many papers one can find today are of very low quality (call them "garbage"); vet whatever you find, or you may waste your time. The best approach is to read the papers of the leading researchers, together with the papers theirs cite.
2. Do not evaluate your own research by yourself, because self-evaluation always reaches favorable conclusions; take part in public evaluations instead. Nearly every subfield of natural language processing now has relevant shared evaluations, so before starting hands-on work, survey the relevant evaluation organizations, campaigns, and likely competitors. Following and joining these evaluations is very good for one's research.
3. In the course of any research, take care to write high-quality code, which is easier to maintain and to hand over to others. It seems my earlier ACM plan needs to continue.
Reflection two
At 2:30 pm, President Lei Jun of Kingsoft, together with the technical directors of 金山词霸 and 金山毒霸, held a short discussion with the members of our lab. It consisted mainly of self-introductions by the lab members and Mr. Lei's talk.
From Mr. Lei I learned that a CMU Ph.D. usually takes five to eight years, and that before really starting, a student is expected to have written a hundred thousand lines of code, which is why every Ph.D. who graduates from CMU programs extremely well. Research work demands a solid programming foundation; without it one cannot write useful code, or even do effective research. I had never put much effort into programming, so the ACM plan I started a few days ago must be carried through seriously and thoroughly.
Like other large software companies, Kingsoft stresses maintainable software architecture, maintainable code, and complete documentation. I learned these things in software engineering class but have not yet felt them deeply in my own research work; this is an ability I still need to exercise.
Summary of the two reflections
Research requires writing code, and writing it efficiently and correctly, with software engineering standards for discipline and quality control.
Since I know all this now, let me go and practice it ^-^
2004年11月18日
2004年11月17日
Science and Technology Philosophy exam
This evening, we took the Science and Technology Philosophy exam.
Frankly speaking, the exam was long for us. As far as I know, all of us were writing for nearly two hours. When I handed in my paper, my right hand felt a little tired.
We must begin preparing for the Combinatorics exam the next day.
2004年11月16日
Some nice books
While studying in the campus library, I went up to the second floor to browse the books. I found some nice ones and borrowed them: Program Generators with XML and JAVA, Introduction to Management Science: A Modeling and Case Studies Approach with Spreadsheets, and The Bible of Visio 2000.
I planned to read them after the two exams.
2004年11月15日
IRClub weekly meeting
This evening we held the weekly IRClub meeting. As it is exam season, many members were absent.
All eight members who came introduced their work and progress. The good news was that the mp3 and pdf information extraction modules had been finished quickly, and the research subgroup had chosen their supervisors and would begin their research.
We decided to choose some members as the main speakers for the next meeting.
2004年11月14日
Paper discussion
This afternoon, I invited Dr. Tliu, Miss Qin, Mr. Lu, and Carl to discuss the framework of my paper on summarization evaluation.
The outcome of the discussion was that I should add some experimental results comparing automatic summarization with human summarization.
This is a wonderful way to support the idea and results in my paper. After the two upcoming exams I can finish it.
2004年11月13日
ACM.hit.edu.cn
This is a wonderful website!
It was set up by paws, xiaoyin, and xiong a year ago. I had known about it since it began, but only tried it this evening.
After such a long time without practicing programming, my programming ability was very poor (or so I thought). The first problem was very easy and I passed it quickly, but the second took me about two hours, since I had forgotten so many restrictions of the C language.
Finally, my program passed within the limits.
Such a wonderful and useful website. I would like to solve one problem per day.
Thanks to Paws for his recommendation.
2004年11月12日
2004年11月11日
GA for anaphora resolution
I was looking for materials related to using GA for anaphora resolution.
2004年11月10日
2004年11月9日
2004年11月7日
2004年11月6日
2004年11月5日
The first snow of this winter
This morning when I got up, I found a thin layer of snow on the ground, the first snow of this winter. I was a little excited.
I recalled the first snow of my freshman year, when the whole class was excited and many of us had a snowball fight on the playground. We were happy.
Those days remain a collection of memorable moments from undergraduate campus life.
2004年11月4日
Studying for the Pattern Classification
Right now we were all busy with the Pattern Classification homework, and all puzzled by some of the problems.
I felt we were learning mathematics rather than pattern classification. The reason, I guessed, was that our teacher emphasized the mathematical side instead of the applications.
Chapter nine is the core of the book. I liked it.
2004年11月3日
Abid Khan
Abid Khan is a foreign student in our lab. He had come by twice before, but each time I was not in.
This evening he came again. By Dr. Tliu's arrangement, he used CYH's computer. He was polite; when he entered the room, he shook hands with everyone. I introduced myself and asked him how to spell his name. His English was fluent, better than mine.
I had a conversation with him and introduced the library resources. He asked me what he should master for his Ph.D. study, and I told him a few things.
While talking with him, I found I had forgotten some vocabulary. I should improve my English.
2004年11月2日
Doctoral English Reading activity
This evening, it was Ph.D. Mjs's turn to give his presentation in English.
His topic was Learning Random Walks for Inducing Word Dependency Distributions.
The main idea of the paper was to construct a graph of links from expert experience, WordNet, and other resources. Treated as a Markov chain, random walks over this link graph yield, after enough steps, the desired probabilities. The method can alleviate the data sparseness problem.
Such a wonderful idea. This use of randomness is a little similar to random forests; based on the random idea, we could do lots of things.
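A toy sketch of that random-walk idea (the graph here is hypothetical, not the paper's actual model): treat the link graph as a Markov chain, repeatedly step to a uniformly random neighbor, and read off visit frequencies as estimates of the chain's stationary probabilities.

```python
import random

# Hypothetical link graph: each node maps to its neighbors.
graph = {
    "bank": ["money", "river"],
    "money": ["bank"],
    "river": ["bank"],
}

def walk_distribution(graph, start, steps=100_000, seed=0):
    """Estimate Markov-chain visit probabilities by one long random walk."""
    rng = random.Random(seed)
    counts = {node: 0 for node in graph}
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])  # uniform step to a random neighbor
        counts[node] += 1
    return {n: c / steps for n, c in counts.items()}

dist = walk_distribution(graph, "bank")
# On this graph the stationary distribution is bank=1/2, money=1/4, river=1/4.
```

For an undirected uniform walk like this, the stationary probability of a node is proportional to its degree, which is why "bank" gets half the visits.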
2004年11月1日
IRClub activity
This evening all the members got together. The main topic was distributing tasks among the members, in two areas: the research group and the development group.
Everyone was excited. But after the meeting, one member sent me a mail saying he had decided to leave our lab. He wrote as follows:
I must say thank you for informing me of attending the meeting.and thank the club for teaching me a lot.
however i really don't think i am quite ready for the club,also there is some other reason of myself,so i hope you will permit me of farewelling with you now.and i am also very sorry for having taking you a lot of trouble,sincerely sorry.
maybe when i am better prepared,i can learn more from you and our club and lab of IR.
again, thank you and sorry.
I replied as follows:
I am very glad that you have known yourself more. We are welcoming you paying attention to our lab and club. Also, we welcome you take part in us when you prepare well.
May you good achievements!
A little sad, for him and for our club.
2004年10月31日
Reply to the robot attack on my blog
Recently I discovered many spam comments on my blog. Since I had not disabled commenting, robots had posted lots of them. I believe some SEO (搜索引擎优化公司, search engine optimization) companies use this trick to raise the rank of certain hyperlinks in Google.
Previously I deleted the spam one by one by hand. This time I dealt with the robots as they dealt with my blog: I used a tool called ROBOT5 to delete the comments automatically, without any human intervention. With it I could record a sequence of operations as a macro and replay the macro any number of times.
This robot software is useful for all kinds of mechanical operations. Wonderful!
2004年10月30日
Pattern Classification homework
This evening, I was studying in a classroom in D building. My Pattern Classification homework was not yet finished. After studying chapter four, I began the five problems. The first was nearly pure mathematics; I thought about it for an hour without finding a solution. The others were easy to solve. To some extent, I believe mathematics is very important for applied computer subjects.
2004年10月29日
A Day of Self-Study
I planned a full day of review today, so I went to the library in campus district two, hoping to study in fresh surroundings. But soon after I started I grew sleepy, and the afternoon went the same way; I slept away a good part of the day.
In my recent self-study sessions I always become sleepy after a short while and can only continue after a nap. I used to think I simply was not getting enough sleep, but this never happens when I am in front of a computer. Thinking it over now, it must be that the computer's radiation keeps me in an excited state.
I once read, in someone's health advice for IT people, that spending more than six hours a day in front of a computer harms the body. I do not know whether that is right, but judging from my own condition, I need to limit my time at the computer.
Perhaps that is today's biggest gain.
2004年10月28日
2004年10月27日
Google desktop search
Today I received the latest issue of 《计算机世界》. As usual, I skimmed the whole issue quickly. Two articles discussed Google Desktop Search. One praised it highly: it is simple and efficient, the program itself is very small, and its almost magical way of finding local material is integrated with Google's web search.
The other article raised the question anyone who has used the software will think of: security. Google Desktop Search makes it very convenient to find material on your machine, but the same capability makes it a perfect spy, since your mail, chat logs, and personal Office documents are all within its sight. If someone breaks into your machine, your personal data and privacy can be stolen completely.
A new thing has its good and bad sides; let the facts be the judge.
2004年10月26日
Tranquility Carries One Far
I remember that soon after I joined the lab, I was once in low spirits because I was so busy. Dr. Tliu's advice to me then was: a person who can organize all his tasks and keep inner calm amid the bustle is bound to accomplish great things.
These past days I was again in such a busy state: the club's recruitment interviews and inaugural meeting, the framework of my paper needing revision, growing exam pressure, and so on, all pressing down at once. Judging from my state of mind over these days, I have matured a great deal. Facing all this, I am no longer as prone to anxiety as before, and I treat every single matter conscientiously. In fact, when many things weigh on you, what you need is simply an attitude: the calm that lets you see far. The same tasks must be faced and completed whatever your mood; rather than fret, it is better to face them with equanimity. With a good attitude comes a good mood, and everything flows naturally. All this busyness leaves me feeling fulfilled.
"Tranquility carries one far" (宁静以致远): I saw the phrase on the wall of my cousin's home seven years ago, but I could not appreciate it then. I hope to keep this state of mind.
2004年10月25日
The students club plan
Our college leaders intend to go ahead with the student club plan. The clubs are founded on research centers and labs and led by Ph.D. and master's students. The aim is to exercise students' research and development abilities; the basic idea is to let sophomores and juniors join the clubs early and learn more about each research center and lab.
The results of our IRClub interviews were published at noon. I phoned the students who had passed, one by one; they were all excited. In the afternoon, a junior who had not passed came here to ask us for a chance, as he cherished the opportunity very much. Carl and I were moved by his spirit and let him take part in tomorrow's meeting.
Such a chance is very good for undergraduates. When I was an undergraduate, we had no chance to join a research center or lab. I envy them.
2004年10月24日
IRClub Interview(2)
Our Information Retrieval Club interviewed about twenty-seven sophomores this evening. Based on last evening's experience, we ran the interviews in a more standardized way.
Facing the sophomores, I could clearly feel that their experience and thinking were less rich and deep than the juniors'. Thinking back on my own sophomore and junior years, I was just like them. At that moment I understood more about what university does for us. We must cherish campus life all the more.
The process took three and a half hours, like yesterday. We were tired again, but I felt better.
2004年10月23日
IRClub Interview(1)
Our original plan was to hold the IRClub interviews tomorrow evening. But at 18:10, some juniors came to our lab saying they had been notified to be interviewed this evening. Dr. Tliu said that in that case we should interview them.
Carl, zsq, and I, using the interview Excel sheet I had designed at noon, quickly formed a panel of three interviewers. Our rule was three students per group; each student was interviewed three times by us within 15 minutes. We asked each interviewee some questions and gave a score, and finally we reconciled our opinions of each student.
This process was rigorous and fair to each interviewee. Twenty-seven juniors were interviewed.
We were all tired after the four hours, but it was a nice experience, and my first time as an interviewer.
2004年10月22日
Begin to study VC++
This afternoon, I began to study VC++. This time I had made up my mind to study VC++ and give up on VB.
I started with MFC, building on some earlier experience with a simple calculator, and finished a "Hello World!" program. It was simple but useful for understanding how VC++ works.
Continue this process!
2004年10月21日
Reading the new paper about Anaphora Resolution
There is a new paper on Chinese anaphora resolution, On Anaphora Resolution within Chinese Text. The author is Wang Houfeng, an expert in Chinese anaphora resolution.
In the paper he raises several issues in anaphora resolution for Chinese text and analyzes why they are hard to solve given the current state of the art. Three aspects of anaphora resolution are discussed: (1) some Chinese anaphors, such as zero forms and common-noun anaphors, are difficult to identify; (2) it is difficult to recognize potential antecedents and their features, such as gender, number, and grammatical role; (3) both the necessary NLP technology and language resources are lacking.
I learned a new syntactic technique for anaphora resolution: the c-command condition.
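The c-command relation can be checked mechanically on a parse tree. Below is a toy sketch of my own (using the classic "first branching ancestor" definition; the tree and names are illustrative, not taken from Wang's paper):

```python
class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []
        self.parent = None
        for c in self.children:
            c.parent = self

def dominates(a, b):
    # a dominates b if b is a descendant of a
    return any(c is b or dominates(c, b) for c in a.children)

def c_commands(a, b):
    # a c-commands b if neither dominates the other and the first
    # branching ancestor of a dominates b (the classic definition)
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    anc = a.parent
    while anc is not None and len(anc.children) < 2:
        anc = anc.parent
    return anc is not None and dominates(anc, b)

# S -> NP VP; VP -> V NP
np1 = Node("NP")
v = Node("V")
np2 = Node("NP")
vp = Node("VP", [v, np2])
s = Node("S", [np1, vp])
print(c_commands(np1, np2))  # subject c-commands object -> True
print(c_commands(np2, np1))  # object does not c-command subject -> False
```

For binding-theory-style constraints, the useful fact is the asymmetry shown in the two calls: the subject c-commands into the VP, but not vice versa.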
October 20, 2004
Time Management
How do you manage your time when you have more on your plate than you expected or can carry? This is a big problem in my study and life.
Recently I had spent a lot of time on the lab's tasks, and I concentrated on them even in my spare time. As a result, there were chapters of Combinatorics and Pattern Classification I had not read, and some of their homework I had not finished.
This evening I thought more about my recent life and study. I found that I should manage my time more reasonably, and I drafted a simple schedule as follows:
6:30 Get up, do morning exercise, and have breakfast.
8:00~11:30 Finish lab tasks or read materials on my research theme.
11:30~1:30 Have lunch and take a nap.
2:00~5:30 Practise programming or finish the lab's programming tasks.
6:30~10:00 Study my graduate courses.
10:00~10:30 Write my diary and make a detailed plan for the next day.
11:00 Go to sleep.
This plan is flexible enough for my study and life. There is a well-known saying that you should devote all your energy to your work and study while working or studying, and devote all your energy to enjoying yourself in your spare time. This rule suits my needs.
New management, new life! I wish so.
October 19, 2004
Facing the visit
This afternoon, I was given the task of introducing our laboratory to some foreign visitors in English. At 3:10 the guests, a couple, came to our laboratory. After an introduction by our associate dean, I began to introduce the laboratory to them. During the associate dean's speech, the woman smiled at me; I felt her kindness.
Frankly speaking, I had spent an hour preparing the English introduction before their visit. First, I talked about our research areas: IF, IE, and NLP. Then I gave them some demos. At the beginning I felt a little nervous, but after about two minutes the couple and I sat down to chat. The man was very interested in natural-text understanding technology. When he saw the demo of the Chinese sentence dependency parser, he asked some questions about Chinese word segmentation and said it was very different from English words. When I demoed the summarization system, he told me something about his own work, which involves reading lots of information. Finally, he was interested in the Chinese character recognition system. He wrote a character that was an old Chinese symbol, so the system could not recognize it. He was very taken with this character and told us some of its history.
Frankly speaking, I did not understand some of their sentences, but I could tell their English was excellent. This was a nice chance for me to practice my spoken English.
Nice experience!
October 18, 2004
First TA
This evening, I went to the Second Campus to be a TA for the C programming experiments, substituting for a senior classmate. This was a good chance to exercise my abilities.
Fifteen students were under my guidance. Some of them finished the experiment quickly and well, but some struggled with the problem and with the language. I thought back to my own C language experiments as a freshman, when the supervision was stricter: every fifteen students had a guiding teacher. I believe that rule benefited the students more.
It was a good experience.
October 17, 2004
October 16, 2004
Get together
This noon, Hang Chen came to campus, and ten of our class got together at the Hong Ming restaurant. We talked a lot about work experience and our recent situations. Hang Chen was more mature and sturdy. He gave us some advice about finding a good job and told us news of other classmates: some wanted to change jobs, and some were preparing for the coming graduate enrollment exam.
We all felt happy recalling the good memories of our four undergraduate years.
October 15, 2004
Questioning Grey System Theory
It has been a long time since I discussed grey system theory with anyone. Today, on the Machine Learning board of HIT-IR-BBS, I met a user with the ID phew. He began by questioning grey system theory, and a discussion between us followed. The exchange is listed below:
phew:
-----------------------------------------------------------------------------------------
I didn't realize grey theory was so popular here.
When Deng Julong's grey theory first appeared, I checked his worked examples and found that he had never actually done the calculations; he simply wrote down the results he imagined. Later, I checked the so-called GM(1,1) and GM(1,2) models by his own method, and concluded that his theory is an insult to calculus.
If you just want to echo a fashionable theory, by all means put "grey" in your titles. But if you want to do scientific research, please think about how the first derivative at a point (x,y) is defined, and then go check his worked examples, preferably in his first book.
I also checked every grey-theory example published at the time in the journal Systems Engineering (I no longer remember the exact title); not one of them produced the correct result by his procedure, yet the reported prediction accuracies were all remarkably high.
So I gave up on this profound theory.
I therefore hope everyone here also verifies these things before using them.
-----------------------------------------------------------------------------------------
billlang:
-----------------------------------------------------------------------------------------
Discussion is welcome.
So is criticism.
But what you said is too general. Could you please give a concrete counterexample that would convince people?
-----------------------------------------------------------------------------------------
phew:
-----------------------------------------------------------------------------------------
Here is an example. If you have doubts, find the book Deng Julong published around 1986 and redo the calculation yourself.
Since grey theory has been elevated to the level of systems engineering, it ought to work across the range he claims for it, and it must not violate the laws of mathematics.
Let t = (1,2,3,4,5)
Assume the time step dt = 1.
The sequence (35,47,22,150,47,33) is not increasing, so accumulate it, giving the sequence
(82,104,254,301,334)
Then: [dx/dt] = (47,22,150,47,33)
[x] = (82,104,254,301,334)
GM(1,1) should be:
dx/dt + ax + b = 0
x = exp(-a * t) - b/a
Once the coefficients (a,b) are found, everything is settled.
E1 = 47 + a * 82 + b
E2 = 22 + a * 104 + b
E3 = 150 + a * 254 + b
E4 = 47 + a * 301 + b
E5 = 33 + a * 334 + b
By least squares, minimize [e] * [e]'
{-dx/dt} = {C}{a,b}'
This gives (a,b) = (-2.4362, -0.0105)
dx/dt - 2.4362 * x - 0.0105 = 0
x = exp(2.4362 t) - 0.0043
x = (10 13 14.9 1707 19509)
This result cannot be corrected back to the original sequence merely by changing the initial condition.
No need for a GM(1,2) example. Can mere accumulation really force every increasing sequence to obey an exponential law?
The matrix computations here were done in Matlab.
-----------------------------------------------------------------------------------------
billlang:
-----------------------------------------------------------------------------------------
Hello phew!
I agree with your analysis of this example.
But rejecting grey system theory because of this single example seems inappropriate. Personally, I think you could improve the theory, or even propose new ideas of your own.
Grey system theory has improved greatly since its birth. The original GM(1,1) model was only an embryonic prediction model, or you could call it a data-fitting method; many later refinements have substantially improved the models based on GM(1,1).
-----------------------------------------------------------------------------------------
phew:
-----------------------------------------------------------------------------------------
This is not merely an isolated example. Using that ordinary differential equation comes with simple preconditions, and they are not automatically satisfied by the notion of greyness.
--------------------------------
--------------------------------
Bill, I have read your ppt. While studying grey systems I had trouble with the part that derives the GM(1,1) model from the whitening equation; I could not reproduce the correct answer. Is there a detailed theory and derivation for that part? If you have a related electronic document, could you send me a copy? My mailbox: yang.guan@gmail.com. Many thanks.
----------------------------------
So I am not the only one with questions. Fifteen years ago, Mr. Deng's reply to me was just like the encouragement you gave me.
1. dx/dt = dx holds only under a condition: dx/dt must be near 0.
2. Even with the so-called grey differential equation, you cannot violate the rules for solving differential equations.
With so many conditions attached to the grey differential equation, what is the point of its claimed "small data" advantage?
I have read your grey-systems lecture notes, and I have to admire the packaging you give the theory. What I do not understand is this: with your solid mathematical background, why not plug such elementary mathematical holes?
-----------------------------------------------------------------------------------------
phew:
-----------------------------------------------------------------------------------------
Here is a thesis on grey prediction from Chaoyang University of Technology in Taiwan. Please check its results.
Page 44 of the thesis:
------------------------------------------------------
http://ethesys.lib.cyut.edu.tw/ETD-db/ETD-search/getfile?URN=etd-0724104-142556&filename=etd-0724104-142556.pdf
------------------------------------------------------------
Key matrices:
{B} =
-349 1
-612 1
-858 1
{Y} =
272
254
239
A = {a,b}
Matrix equation:
{B} * A' = {Y}
Solve for A by least squares.
------------------------------------------------------
If his conclusion is correct, then least squares itself must be at fault.
-----------------------------------------------------------------------------------------
billlang:
-----------------------------------------------------------------------------------------
You were already discussing this with Mr. Deng fifteen years ago, so your study of grey systems clearly goes much deeper than mine. I have known grey systems for less than two years. I first took them up for the mathematical modeling contest, and later finished a small research project based on them. I am still a student, my research direction is not grey systems, and with many other things to do I have no time to study the theory in depth. But I firmly believe grey system theory has its value and meaning; it is a theory full of vitality. Many classic theories did not develop smoothly at first, and I think the future of grey system theory is bright.
My lecture notes introducing grey system theory were prepared from the book Grey System Theory and Its Applications; I did not package the theory myself (I suggest you browse that book). Online you will find many articles on grey system theory. It has many researchers, and their productive work has carried it to a higher level. My own mathematical foundation is not deep, and I am not yet able to advance the theory.
Your study of grey system theory is very deep. I suggest you discuss it with Liu Sifeng, the current expert on grey systems.
-----------------------------------------------------------------------------------------
phew:
-----------------------------------------------------------------------------------------
Precisely because so many people use it, I want to remind everyone: before applying a new theory, first understand its preconditions; the current stampede toward it is far too common. On the GM models of grey theory, my views are:
1. It uses least squares, but least squares only gives a minimum of the sum of squared errors, and that minimum is not 0. This point gets overlooked.
2. Too many people fabricate data. In the thesis mentioned above, the value of a is clearly 0.0648, yet in the final back-substituted formula it miraculously becomes 0.01804 (I hope that is the author's typo). But with either value of a (0.0648 or 0.01804), following the grey procedure the thesis describes, the original sequence cannot be recovered (I hope my check is wrong).
Describe grey theory, make a show of preparing some data, then produce a miracle: this has almost become the standard routine for applying grey theory.
I understand the importance of publishing papers. But knowingly using something wrong, in front of peers who also need papers, is something these authors should restrain themselves from.
-------------------------------------------------------
Too many problems cannot be explained clearly, which is why theoretical breakthroughs are needed. Not only grey theory: the fashionable Fuzzy Probability is likewise under interrogation by mathematicians. We cannot explain the unknown with a muddled theory. I hope grey theory achieves a major breakthrough, because my own field is waiting for one too. But not by these methods.
-----------------------------------------------------------------------------------------
billlang:
I partly agree with your views.
But I do not quite accept "knowingly using something wrong".
Whether grey systems are right or wrong must be verified by a large body of facts. I believe practice is the sole criterion of truth, so why not let time and facts prove everything?
May I ask what your field is, and what problems you hope grey systems will solve for you?
Thank you for your instruction.
-----------------------------------------------------------------------------------------
The depth of phew's engagement with these problems is admirable. I look forward to further exchanges with him!
For the full discussion, see: http://ir.hit.edu.cn/cgi-bin/newbbs/topic.cgi?forum=20&topic=4&start=24&show=0
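Phew's least-squares step can be checked without Matlab. The sketch below is my own (pure Python, closed-form simple regression fitting dx = -a*x - b on his accumulated sequence). The coefficients it produces differ from the (a,b) phew posted, which may reflect a different setup on his side; either way, the residuals at the optimum are far from zero, which is exactly his first point.

```python
def lstsq_gm11(x, dx):
    """Least-squares step of GM(1,1): fit dx = -a*x - b
    (i.e. dx/dt + a*x + b = 0) by closed-form simple regression."""
    n = len(x)
    sx, sy = sum(x), sum(dx)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, dx))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope     = -a
    c = (sy - m * sx) / n                          # intercept = -b
    return -m, -c

# phew's accumulated sequence and its first differences
x = [82, 104, 254, 301, 334]
dx = [47, 22, 150, 47, 33]
a, b = lstsq_gm11(x, dx)
print(round(a, 4), round(b, 2))  # -0.0966 -39.04

# Residuals at the least-squares optimum: the minimum of the
# squared error is not 0, which is the point at issue.
residuals = [d + a * v + b for v, d in zip(x, dx)]
print([round(r, 1) for r in residuals])
```

The residuals sum to zero (as they must for least squares with an intercept), but individually they are tens of units wide, so the fitted exponential cannot reproduce the original sequence.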
New book recommendation: Neural Networks and Their Applications (《神经网络及其应用》)
Publisher: Tsinghua University Press
Authors: Zhou Zhihua / Cao Cungen
Series: China Computer Federation Academic Monograph Series
Publication date: September 2004
About the book:
This book brings together well-known Chinese experts in neural networks and related fields to discuss the theoretical foundations and typical applications of neural networks. Topics include neural network learning methods, optimization, knowledge theory, manifold learning, process neural networks, random binary networks, and discrete associative-memory neural networks, as well as applications of neural networks to medical data processing and Chinese-language cognition. Drawing on extensive literature and research, it reviews and analyzes the latest progress and is a valuable reference for academic research.
It is suitable for graduate students, teachers, engineers, and researchers in computer science and automation.
October 14, 2004
Related materials on Summarization Evaluation
There are many materials on summarization evaluation. At the recent DUC 2004 conference there was a summarization evaluation tool named ROUGE, whose main idea is computing n-gram co-occurrence rates. Following the successful application of automatic evaluation methods such as BLEU to machine translation evaluation, Lin and Hovy (2003) showed that similar methods, i.e., n-gram co-occurrence statistics, could be applied to evaluating summaries.
ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes several automatic evaluation methods that measure the similarity between summaries.
Reference:
[1] Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries. ACL 2004.
[2] Chin-Yew Lin and E. H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of the 2003 Language Technology Conference, Edmonton, Canada.
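The n-gram co-occurrence idea behind ROUGE-N is easy to sketch. Below is my own minimal version (clipped n-gram recall against a single reference; the real package additionally supports multiple references, stemming, stopword removal, and other variants):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=1):
    """Clipped n-gram recall of a candidate summary against one reference."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    if not ref:
        return 0.0
    # each reference n-gram is credited at most as often as it occurs
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / sum(ref.values())

reference = "the cat sat on the mat"
candidate = "the cat was on the mat"
print(rouge_n(candidate, reference, n=1))  # 5 of 6 reference unigrams matched
```

Being recall-oriented, the score asks how much of the reference the candidate recovers, which is why a longer candidate is never penalized by this measure alone.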
October 13, 2004
New Scheme
I have finished the summarization evaluation task, but I have not worked on CR for nearly two months, so my next task is to read lots of papers on CR.
We have had many classes in Pattern Classification and Combinatorics, and I have not reviewed them for nearly half a month.
These are the two main lines of work for me. Try again!
October 12, 2004
Poor pronunciation
During this year I had given two English presentations. The first was in the graduate English class; my topic was high tech, and it lasted only eight minutes. Afterwards my English teacher, Mrs. Zhang, suggested that I improve my pronunciation. The second was in the summer holiday class on Fault-Tolerant Computing and Wearable Computing; my topic was power management, and it lasted thirty-five minutes. Afterwards the teacher, Dr. Daniel P. Siewiorek, suggested that I improve my pronunciation.
This evening I gave my third presentation, in the Doctoral English Forum, where we discuss papers in our research fields. It was my turn to present; my topic was Random Forests in Language Modeling, and it ran seventy minutes. I kept my speaking pace steady in order to express myself clearly, and afterwards I answered lots of questions. Dr. Tliu suggested that I improve my pronunciation.
Clearly my pronunciation is not good. I feel this problem is very serious, and I should start solving it from now on.
October 11, 2004
Continue reading paper
For tomorrow's presentation I must keep reading the paper. There are still lots of puzzles for me.
One thing I'd like to note is that the author, Peng Xu, is Chinese. His education has been as follows:
1990 ~ 1995 B.S. at Tsinghua University;
1995 ~ 1998 M.S. at the National Lab of Pattern Recognition, Beijing;
1998 ~ 1999 Ph.D. candidate at Brown University;
1999 ~ now Ph.D. candidate at the Center for Language and Speech Processing, Johns Hopkins University, USA.
Fields of interest: speech recognition, pattern recognition, language modeling, machine translation, natural language processing, multimedia coding.
He has accomplished a great deal.
October 10, 2004
Random Forests in Language Modeling
It is about language modeling. My reading outline was as follows:
Title: Random Forests in Language Modeling
Author(s): Peng Xu and Frederick Jelinek
Author Affiliation: Center for Language and Speech Processing, the Johns Hopkins University, Baltimore, MD 21218, USA
Conference Title: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
Language: English
Type: Conference Paper (PA)
Treatment: Practical (P) Experimental (X)
Abstract: In this paper, we explore the use of Random Forests (RFs) (Amit and Geman, 1997; Breiman, 2001) in language modeling, the problem of predicting the next word based on words already seen before. The goal in this work is to develop a new language modeling approach based on randomly grown Decision Trees (DTs) and apply it to automatic speech recognition. We study our RF approach in the context of n-gram type language modeling. Unlike regular n-gram language models, RF language models have the potential to generalize well to unseen data, even when a complicated history is used. We show that our RF language models are superior to regular n-gram language models in reducing both the perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system.
Descriptors: Natural Language Processing basic problem
Identifiers: random forests, language model, decision tree, perplexity
Personal impression: The paper introduces decision-tree language models and the random forest concept. The main idea is appealing: randomly generate several decision-tree language models and combine them into a single model. This model can relieve the data sparseness problem to some extent.
Something that could be improved: the random decision-tree generation method is not good enough. I believe some optimization principle could be used to obtain better random decision trees.
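The PPL the abstract reports is the standard perplexity measure, which applies to any language model. A sketch of how it is computed, using a toy add-one-smoothed bigram model of my own (purely illustrative, not the paper's RF model):

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Add-one-smoothed bigram model; returns p(w | prev) as a function."""
    vocab_size = len(set(tokens))
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    def prob(prev, w):
        return (bi[(prev, w)] + 1) / (uni[prev] + vocab_size)
    return prob

def perplexity(prob, tokens):
    # PPL = exp(-mean log-probability of each test token given its history);
    # a lower PPL means the model finds the test text less surprising.
    logp = sum(math.log(prob(p, w)) for p, w in zip(tokens, tokens[1:]))
    return math.exp(-logp / (len(tokens) - 1))

model = train_bigram("the cat sat on the mat the cat ate".split())
print(perplexity(model, "the cat sat".split()))
```

The paper's claim is that replacing the single n-gram estimator inside `prob` with an average over many randomly grown decision trees lowers both this PPL and the recognizer's WER.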
October 9, 2004
Three new summarization systems
Based on the ideas behind the best evaluation method, I implemented three summarization systems. The first followed the traditional route: compute a weight for every sentence and select the best ones.
The new idea in this first system was the weighting method: I used the evaluation method itself to compute each sentence's weight. On the test documents, the resulting summaries scored somewhat better under my best evaluation system.
Then I changed tack: enumerate all possible sentence subsets and compute each subset's similarity to the source document. This method is very slow, since its complexity is O(2^n).
The third system generalized the sentence weights of the first system, but its evaluation results were worse than the four systems implemented by yhb.
Three systems, three methods; I will think more about them.
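The first system's weighting route can be sketched as follows (my own minimal version, not the lab's code: score each sentence by the average document-level frequency of its words and keep the top k in original order):

```python
from collections import Counter

def summarize(sentences, k=2):
    """Score each sentence by the mean document-level frequency of its
    words, then return the top-k sentences in their original order."""
    words = [w for s in sentences for w in s.lower().split()]
    tf = Counter(words)
    total = len(words)
    def score(sent):
        toks = sent.lower().split()
        return sum(tf[w] / total for w in toks) / len(toks)
    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

doc = ["The cat sat on the mat.",
       "Dogs bark loudly.",
       "The cat ate the fish on the mat."]
print(summarize(doc, k=2))  # keeps the first and third sentences
```

By contrast, the second system's exhaustive route would have to score all 2^n sentence subsets against the source, which is exactly why it is impractical beyond small n.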
October 8, 2004
Some checking results
The original plan was to use the new evaluation method on yhb's systems to find the best one. yhb gave me eight new systems, and I used my program to evaluate them one by one.
It was quite effective, but some trends did not follow Mrs. Qin's judgments. We can analyze this further.
October 7, 2004
Exciting Scheme
This morning, after reading the day's latest news, I made my daily plan. Just now I finished the first item: implementing the relative word frequency approach. The experimental result was close to what I intended: it can distinguish different summarization systems, but its correlation with human judgments is not high. It is not as good as the other two methods, but it can serve as a comparison baseline.
Right now I have an exciting scheme. I discussed the recent developments with Wanxiang Che, a Ph.D. student. He did not believe my TF method was powerful and suggested I try some new method; in particular, he suggested building a new summarization system based on my TF method, so that we could have humans evaluate its output and test the method's usefulness.
This system is not complicated, and I can build it quickly. So exciting!
After comparing this system with the others, I can do another thing: compare its summaries with human summaries and derive a new evaluation approach.
Such exciting news for me today!
Let me start the new plans.
October 6, 2004
Simple but effective method for SE
It is said that the most effective method is the simplest one. I could never believe it before; now I cannot help believing it.
This morning I still felt stuck on the SE task and had no ideas, but I held on to my view of the breaking point: how to incorporate a new summarization system into my SE system. I wanted to run the well-known DUC 2004 summarization evaluation package, ROUGE, but some unknown problem kept the programs from running. Out of ideas, I began to review the presentation slides from 26 Sep. Suddenly I saw that the two methods in those slides could be re-implemented for a new use.
I compared the new four-rank data and got the ideal result. Wonderful!
I implemented it and told Dr. Tliu the news. He discussed it with me and was excited too. He suggested I think more about the methods.
At noon I kept working, implemented an even more basic method, and achieved a better result. Good news for me.
Frankly speaking, the two new methods are not sophisticated, but they are simple and effective. I cannot fully explain them yet.
2004年10月5日
Visiting science and technology museum
This morning we carried out our visiting plan: the science and technology museum.
There were 17 of us, including members of our lab and WF.
Last time, we had planned to visit this place, but it is closed every Monday, so instead we played on the Sun Island. Today was Tuesday, and sunny.
The beautiful building had three floors, and all kinds of science and technology exhibits were interesting.
The most exciting program was the four-dimensional film.
2004年10月4日
No answer for SE
For the whole day, I was thinking about the key to the SE problem.
I found I had no ideas for it, so I turned to the published papers of others. There were lots of papers about SE by Hongyan Jin, a famous person in this area. I had read some of her papers, but none of them guided me toward finishing my task.
The key problem of the SE task was how to combine the new summarization system into the SE system.
There was a famous summarization evaluation conference, DUC. Its evaluation tool was ROUGE, which was based on n-grams and related statistics. I wanted to test it, but there were some problems with my Perl environment.
Until just now, I had not gotten anywhere with it.
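The core of ROUGE-N is simple even though the Perl package would not run: score a candidate summary by how many of the reference summary's n-grams it reproduces. Below is my own minimal sketch of ROUGE-N recall, not code from the actual ROUGE package; the function names are mine.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the reference n-gram count."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum(min(cand[g], count) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

The "clipping" (taking the min of the two counts) prevents a candidate from being rewarded for repeating one reference n-gram many times.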
2004年10月3日
Three new methods for SE
My recent task was designing new methods for SE (Summarization Evaluation). Today, I tested three new methods: artificial neural networks, decision trees, and multiple regression analysis. But none of them was good for my task.
I believed there was a breakthrough to be found. I must combine the new system into my SE methods. How to combine them? That was the essence of my problem. I should ponder it more.
2004年10月2日
New method for SE
There was a new method for SE (Summarization Evaluation), based on hints from Dr. Tliu, Car and Yhb. This method was so good that I could use my machine learning methods: I wanted to extract lots of features from the summaries and let the learner fit the data.
The framework was fixed; I only needed to implement the sub-modules one by one.
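The idea of treating SE as a learning problem can be sketched like this: extract a few numeric features from each summary and fit them against human scores. Everything below is my own illustration, with a made-up feature, made-up data, and simple one-variable least squares standing in for whatever learner is actually used.

```python
def extract_features(summary, source):
    """One toy feature: fraction of the source's words covered by the summary."""
    s, src = set(summary.lower().split()), set(source.lower().split())
    return len(s & src) / len(src) if src else 0.0

def fit_line(xs, ys):
    """Closed-form least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# hypothetical (summary, source, human score) training triples
data = [
    ("cats sleep", "cats sleep all day", 0.7),
    ("cats sleep all day long", "cats sleep all day", 0.9),
    ("dogs bark", "cats sleep all day", 0.1),
]
xs = [extract_features(s, src) for s, src, _ in data]
ys = [score for _, _, score in data]
a, b = fit_line(xs, ys)  # a > 0: more coverage, higher predicted score
```

In practice one would use many features and a richer learner, but the framework, features in, human score out, is the same.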
2004年10月1日
National Day!
This is National Day!
According to the original plan, I worked in the lab this morning and afternoon. At 3:00 pm, WF and I went to the Harbin Odeum, where there was a wonderful concert by the Harbin Philharmonic Group for National Day, beginning at 6:30 pm.
The conductor was a famous young man named Qiuhong Teng. He conducted 12 compositions, including the famous Carmen, the Blue Danube, and so on. They were beautiful.
This was the first time I had gone to a concert. A wonderful experience. Afterwards WF and I wandered along some streets, including the Central Street.
At about 8:00 pm, we came back.
2004年9月30日
Welcome Hang Chen!
This afternoon Hang Chen phoned me while I was in the Science and Technology Philosophy class. He told me that he would be sent to Harbin for practical training on Oct 6.
Great! Hang Chen was one of my undergraduate classmates, now working at Hua Wei. Last time he told me that he would be sent to various places for training. Last month he was in Shenyang. Unexpectedly, he would be in Harbin in October. Good. We could get together again!
2004年9月29日
Hidden Markov Model
The Hidden Markov Model is very famous. I first heard of it in Simply's presentation last year, but I had not understood it fully until this afternoon.
These days I have been studying the chapter on Markov models in Pattern Classification. This afternoon, in the Natural Language Processing class, Dr. Yi Guan introduced the Markov model to us.
Just now I reviewed the relevant content of Pattern Classification and Dr. Yi Guan's teaching materials. I think I have understood this model. Markov was so great.
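One way to check the understanding is to code the forward algorithm, which computes the probability of an observation sequence by summing over all hidden state paths. The weather/activity numbers below are a made-up toy example of mine, not taken from either textbook.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """P(obs) under an HMM, computed with the forward algorithm."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
```

A nice sanity check: for a fixed length, the probabilities of all possible observation sequences must sum to 1.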
2004年9月28日
Useful English Phrase
roger
interj.
Used especially in radio communications to indicate receipt of a message.
知道了,已收到了尤用于无线电通讯中表示收到讯息的答语
Mid-autumn day!
My best wishes to all my friends! Happy Mid-Autumn Day!
On this traditional Chinese festival, I had dinner with my roommates this evening. It was the first time we had all gotten together. We were happy.
2004年9月27日
My birthday!
I am twenty-three years old today! Twenty-three years ago, I was born on this day. Thanks to my parents: they gave me life and brought me up.
I was very happy today! Lots of friends sent their blessings and good wishes: my parents, FangWang, Simply, Taozi, Zsq, Carl, Cr999, Heng Cao, Lou, Shuting Wang, Jinzhong Chen, Fan Xu, Heqing Rao, Ligang Long, Boqiang Luo, Chenlin Luo, Yongkun Zhang, Wenji Song, Qi Wang, Hang Chen, Xianjun Li, Xin Zhang, Qingren He, Xuan Hu, Yin Xiao, Kai Zhao, Jun Li, Jingwei Liu, Zhengjing Tang, Hao Wang, and so on. I was so happy!
I was very happy today! This afternoon, I went for the interview for the continuous academic programme combining postgraduate and doctoral study. I started my doctoral study plan today; it was a special birthday present. I love my choice. For my doctoral degree and for my ideals, I should be more studious, think more about my research direction, and learn more.
I was very happy today! I met a wonderful girl. She was so good. It was my pleasure to have dinner with her.
What a happy day!
A new year of my life. I can do better.
2004年9月26日
Next plan for Summarization Evaluation
This afternoon, in our lab's colloquium, I presented the summarization evaluation task, the finished work, and the next plan. Dr. Tliu and other seniors gave me lots of suggestions.
My task was very important for the Summarization project. I should try my best to achieve the goal.
Some useful translations
A AA制 Dutch treat; to go Dutch
B 毕业答辩 thesis/dissertation defence 毕业设计 final project
博士生 a PhD Candidate 报销 to apply for reimbursement
博导 PhD student supervisor 班主任 class tutor
必修/选修课 compulsory/optional courses/modules
辩论队 debate team 辩论赛 debate contest 本命年 one's own Chinese zodiac year
C 成就感 sense of accomplishments/achievements
D 第三产业 the tertiary industry 导师 tutor, supervisor
独立思考能力 capacity for independent thinking 党支部 Party branch
党支部书记 Party branch secretary 调研 research; survey
E 厄尔尼诺现象 El Nino phenomenon 二等奖 the second prize
F 附中 affiliated (high or junior etc) school of ....
附件(email): attachment 房地产 real estate
G 公务员 civil servant (工作)单位 work unit
工学学士/硕士 Bachelor/Master of Science (B.S & M.S)
高考 National College Entrance Examination
国家重点实验室 state key laboratory
股份制 shareholding system; joint-stock system
股份有限公司 Co. Ltd; company/corporation limited: limited corporation
H 户口簿 residence booklet; household register; household registration booklet
获六级证书 obtain a certificate of CET-6
J 甲方乙方 Party A and Party B 基础设施 infrastructure
敬业精神 professional dedication; professional ethics
讲师 lecturer 高级讲师 senior lecturer 技术支持 technical support
精神文明建设 ideological and ethical progress
机电一体化 Electromechanical Integration
激烈的竞争 intense/fierce/bitter competition
九五攻关 The 9th 5-year plan 竞争力 competitiveness
K 可持续发展 sustainable development
考研 take part in the entrance exams for postgraduate schools
课代表 subject representative
L 理论联系实际 to link theory with practice
论文答辩 thesis defence 劳动密集型 labour-intensive
联系方式 contacts;contact details; how to contact;
M 民工 migrant workers/labourers 满分 full mark 面试 interview
P 平面设计 graphic design
Q 全职 full-time
R 人才 talent; talented people 理念 philosophy; value; doctrine
入世 china's accession to the wto; china joins the wto
S 三个代表(论) the Three Represents (Theory) 三等奖 the third prize
双刃剑 double-edged sword 上网 to get on the internet
适者生存 survival of the fittest 私营经济 private sector
事业单位 public institution 私/民营企业 private enterprise
三好学生 merit student; three good student(good in study, attitude and health)
师兄 (no exact English equivalent; can be expressed as 'junior or senior (fellow) schoolmate/student')
双赢(局面) win-win; a win-win situation 实习 internship 实习生 intern
双学位 double degree/dual degree 手机短信SMS/short message/instant message
上市 to go public; to be listed (in the stock market)
市场营销(活动) marketing (activities)
硕博联读 a continuous academic project that involves postgraduate and
doctoral study; a PhD programme
水平一/二 English Proficiency Test I/II (of Tsinghua University)
社会实践 social practice
社会实践优秀个人 excellent individual in social practice
T 团队精神 esprit de corps OR team spirit 特此证明 this is to certify that.
团支部书记 League branch secretary 团委 the Youth League committee
特等奖学金 top class/level scholarship
通过大学四级考试 pass the College English Test Band 4
W 物业管理 asset management, property management
物流 logistics
外联部 liaison department (a small office is called an 'office';
a company's 外联部 is usually PR: Public Relations Division/Department)
X 性价比 cost performance 学术交流 academic exchange
信息化 adj and n. information v. informatise/informationise
n. informatisation/informationisation
选修课 optional/selective courses/modules
学位课 degree course 学号 student number
Y 营销(学) marketing
优胜互补 (the two parties...) have complementary advantages
优胜劣汰,适者生存 survival of the fittest
院士 (see the 中科院 entry under Z)
与时俱进 to advance/progress with times 研究所 research institute
以人为本 people oriented; people foremost
研一生 first-year graduate student
一等奖学金 first class scholarship 一等奖 first prize
有限公司 limited company; Ltd.
Z
振兴xxx: to rejuvenate/revitalise xxx 准考证 admission ticket
知识经济 knowledge economy; knowledge-based economy
知识密集(性) knowledge-intensive
知识产权 intellectual property rights
中科院 the Chinese Academy of Sciences; Academia Sinica
(院士 member, academician)
中国工程院 the Chinese Academy of Engineering
正版 adj. authorised
综合国力 comprehensive national strength
政治面貌 political status
助教 teaching assistant (TA)
自强不息,厚德载物 Self-discipline and Social Commitment
自我评价 self-assessment; self-evaluation
2004年9月25日
Cheng Junyi: From the Three Kingdoms to Journey to the West: People-Oriented Management Wisdom in Traditional Chinese Culture
This morning I had the good fortune to attend, in the third-floor hall of the main building, Cheng Junyi's lecture "From the Three Kingdoms to Journey to the West: People-Oriented Management Wisdom in Traditional Chinese Culture".
The outline was as follows:
1. Six motivations for writing Shuizhu Sanguo (Water-Boiled Three Kingdoms)
2. Personality-centred human resource management
a. What is personality?
b. The story of the spider
3. Classifying personalities
a. Cao Cao and Sun Wukong: outstanding representatives of the power type. Special topic: Sun Wukong's "Bimawen" effect
b. Liu Bei and Zhu Bajie: the typical lively type
c. Zhuge Liang and Tang Seng: embodiments of the perfectionist type
d. Sun Quan and Sha Monk: excellent examples of the peaceful type. Special topic: the peaceful type's attitude to life has many seemingly contradictory aspects
4. Role-playing
a. As an individual: a way of handling emotional problems. Fable: "Sun Wukong Wreaks Havoc in Wuzhuang Temple"
b. As a team manager. Fable: "The Tiger Eats Grass Today"
Special topic: why put the golden headband on Sun Wukong?
c. As a collaborator and a competitor
5. Strengths and weaknesses of the different personality types
a. Strengths and weaknesses of the power type
b. Strengths and weaknesses of the lively type
c. Strengths and weaknesses of the perfectionist type
d. Strengths and weaknesses of the peaceful type
6. Personality changes in interpersonal conflict: several cases of handling interpersonal conflict
※ Tang Seng and Sun Wukong
※ Sun Wukong and Zhu Bajie
※ Liu Bei and Sun Shangxiang
※ Sun Quan and Sha Monk
7. Advice for each personality type
a. Advice for the power type
b. Advice for the lively type
c. Advice for the perfectionist type
d. Advice for the peaceful type
8. Discussion of several puzzles in Romance of the Three Kingdoms and Journey to the West: where did Sun Wukong come from?
※ The origins of the names of Tang Seng and his three disciples
※ The real meaning of the eighty-one tribulations
※ Interpreting the Heart Sutra
9. Closing words: reviewing the old to learn the new
Some of his remarks made good sense. For example: if, when looking at a person, you always pick at his faults and shortcomings, you will find him more and more annoying, and that is an objective fact; conversely, if you always appreciate a person's merits and strengths, you will increasingly feel that he is worth having as a friend, and that too is an objective fact. If we always look at the world with a sunny heart, what we observe will always be beautiful.
Another point: as two people who appreciate each other come to know each other more deeply, each discovers the other's various shortcomings, and over time may find the other more and more annoying; the deeper the understanding, the more this problem arises. This is also why conflicts appear between lovers or spouses. Problems arising is not the issue; the issue is that both sides should take a positive, responsible attitude toward solving them rather than treating them negatively.
One should face history with a sunny attitude; only then will its breadth and depth unfold before you. If one always dwells on the dark side of society and on regression, one's own mentality will in time grow darker and darker.
So far Cheng Junyi has written two books, Shuizhu Sanguo (Water-Boiled Three Kingdoms) and Sun Wukong Is a Good Employee. They fully embody his views; they are indeed worth reading when I have time.
2004年9月24日
A feast of intelligence science
From September 10 to 12, 2004, the Symposium on Major Issues in the Basic Theory of Intelligence Science and Technology, sponsored by the Information Science Department of the National Natural Science Foundation of China and hosted by the Chinese Association for Artificial Intelligence and Yanshan University, was held at Yanshan University. More than 50 representatives from the intersecting fields of intelligence science, brain science, cognitive science, logic, and philosophy attended, giving 27 invited talks.
Some of the talks, for reference:
Yanda Li: Some thoughts on intelligence research (ppt)
Yixin Zhong: Intelligence science: a century's challenge and a century's opportunity (ppt)
Ruqian Lu: Studying knowledge science, developing knowledge engineering, and advancing the knowledge industry (ppt)
Zhongzhi Shi: Fundamental problems of intelligence science (ppt)
Shoujue Wang: Biomimetic pattern recognition and machine imaginal thinking (ppt)
Aike Guo: The natural computation of decision-making (ppt)
Deyi Li: Artificial intelligence with uncertainty (ppt)
Zhuoqun Xu: Web of Distributed Ontologies (ppt)
Feiyao Wang: A computational framework for computing with words and linguistic dynamical systems (ppt)
Zhihua Zhou: Pervasive machine learning (ppt)
Jue Wang: A review of machine learning research (ppt)
Fangzhen Lin: Many uses of classical logic (pdf)
Huacan He: On the logical foundations of generalized intelligence science (ppt)
Tianxiang Tong: Intelligentization is the inevitable trend of informatization (doc)
After studying them one by one, I felt I had learned some new things. My notes are as follows:
1. The cloud model is an emerging theory [1]. Built on statistics and fuzzy mathematics, it uniformly characterizes the randomness and fuzziness between linguistic atoms and numerical values. The forward cloud generator [2] is a model for the uncertain conversion between a basic concept described by a linguistic value and its numerical representation. A cloud's numerical characteristics are expressed by three values: the expectation Ex, the entropy En, and the hyper-entropy He. The model fully integrates fuzziness and randomness, forming a mapping between the qualitative and the quantitative that can serve as a basis for knowledge representation. Because clouds in nature are themselves uncertain, the name "cloud" was borrowed for this data-to-concept conversion model. A cloud consists of many cloud drops; each drop is one point in the numerical domain onto which the qualitative concept is mapped, that is, one concrete realization carrying uncertainty. The model also gives the degree of certainty with which each drop represents the qualitative concept, and it can generate arbitrarily many drops.
Conversely, the backward cloud model converts numerical values back into linguistic values at any time. A basic problem of data mining is that data come first and concepts are formed afterwards; continuous quantities come first, and discrete symbolic quantities follow.
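The forward normal cloud generator is easy to sketch from the description above: draw a per-drop entropy En' from N(En, He²), draw the drop x from N(Ex, En'²), and attach the certainty degree exp(-(x-Ex)²/(2·En'²)). This is my own minimal reading of the algorithm in [2], not code from the papers.

```python
import math
import random

def forward_cloud(Ex, En, He, n):
    """Generate n cloud drops (x, certainty) for the concept (Ex, En, He)."""
    drops = []
    while len(drops) < n:
        En2 = random.gauss(En, He)       # per-drop entropy sample
        if En2 == 0:                     # skip a degenerate sample
            continue
        x = random.gauss(Ex, abs(En2))   # the drop itself
        mu = math.exp(-(x - Ex) ** 2 / (2 * En2 ** 2))  # certainty degree
        drops.append((x, mu))
    return drops
```

With He = 0 this degenerates to an ordinary normal distribution; a positive He thickens the cloud, which is exactly the "pan-normal" extension discussed below.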
2. The central limit theorem states the theoretical conditions for a normal distribution. A simple intuitive statement: if the outcome of a random event is determined by the sum of a large number of tiny, independent random factors, each contributing a uniformly small amount with no single factor dominating, then the random variable follows a normal distribution.
The normal distribution is the limiting distribution of many important probability distributions; many non-normal random variables are functions of normal ones; and the normal density and distribution functions have many good properties and relatively simple mathematical forms. All of this makes the normal distribution extremely widely used in both theory and practice. While studying the mathematical foundations of pattern recognition [4], I learned: "Among all continuous probability density functions with a given fixed mean u and variance s, the one that maximizes the entropy is the Gaussian (normal) distribution, with maximum entropy H = 0.5 + log2(sqrt(2*pi*s)) (bits)." Entropy characterizes information content, and this maximum-entropy property helps explain why the normal distribution is so widespread in nature.
In fact, in the real world the individual contributions of the various factors are not uniformly small, and many random phenomena cannot be described by a normal distribution. If the determining factors do not contribute uniformly small amounts and are not independent but somewhat interdependent, the conditions for a normal distribution are not met: the result is not normal, or can only be approximated as normal. Probability theory handles such cases with joint distributions, but determining a joint probability distribution is usually very complex and impractical. Academician Deyi Li proposed describing this kind of randomness with the cloud model, extending the normal distribution to a "pan-normal" one and using a new independent parameter, the hyper-entropy, to measure the degree of deviation from normality. This treatment is looser than a pure normal conditional distribution, yet simpler, easier to represent, and easier to manipulate than a joint distribution.
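The central limit intuition is easy to check numerically: summing many small independent uniform factors produces a bell curve whose spread is predicted by theory (for n uniform(-0.5, 0.5) terms, the standard deviation is sqrt(n/12)). A quick sketch of mine:

```python
import math
import random

def sum_of_factors(n_factors):
    """One sample: the sum of many small, independent, uniform factors."""
    return sum(random.uniform(-0.5, 0.5) for _ in range(n_factors))

random.seed(1)
samples = [sum_of_factors(48) for _ in range(5000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
# theory: mean 0, std sqrt(48/12) = 2; about 68% of samples within one std
```

Replacing the independent uniform draws with correlated or dominant factors breaks exactly the conditions discussed above, and the bell shape degrades accordingly.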
3. Artificial intelligence with uncertainty takes natural language as its entry level for studying human cognitive activity. This is undoubtedly an affirmation of natural language processing research, and it gives NLP confidence.
4. Machine learning research now faces many opportunities and challenges. A few examples, using medicine and finance as representatives:
Example 1: cost sensitivity
Medicine: in breast-cancer diagnosis, the cost of misdiagnosing a patient as healthy differs from the cost of misdiagnosing a healthy person as a patient.
Finance: in credit-card fraud detection, the cost of mistaking fraud for normal use differs from the cost of mistaking normal use for fraud.
Traditional ML techniques basically assume uniform costs.
How should cost sensitivity be handled?
There is no ready answer in the textbooks, for example:
Tom Mitchell, Machine Learning, McGraw-Hill, 1997
Nils J. Nilsson, Introduction to Machine Learning, draft 1996 - 2004
Example 2: imbalanced data
Medicine: in breast-cancer diagnosis, "healthy" samples far outnumber "patient" samples.
Finance: in credit-card fraud detection, "normal use" samples far outnumber "fraudulent" samples.
Traditional ML techniques basically assume balanced data.
How should data imbalance be handled?
There is no ready answer in the textbooks.
Example 3: comprehensibility
Medicine: in breast-cancer diagnosis, one must explain to the patient why this diagnosis was made.
Finance: in credit-card fraud detection, one must explain to the security department why a card is judged to be in fraudulent use.
Traditional ML techniques basically consider only generalization, not comprehensibility.
How should comprehensibility be handled?
There is no ready answer in the textbooks.
In my view, these challenges are among the driving forces behind machine learning's continued existence and development, and they call for everyone's efforts. Of the three problems, the one I have personally encountered is the second, data imbalance. The approach I once used was to trim the data until it was balanced, but that loses a lot of information. When I used a decision-tree algorithm without trimming, learning still worked, but the results needed careful analysis.
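The trimming approach, and where its information loss comes from, can be made concrete: random undersampling simply discards majority-class examples until the classes match. This is my own minimal sketch with made-up labels, not any particular library's implementation.

```python
import random
from collections import Counter

def undersample(examples, labels):
    """Randomly drop majority-class examples until all classes are the same size."""
    counts = Counter(labels)
    target = min(counts.values())
    by_class = {}
    for ex, y in zip(examples, labels):
        by_class.setdefault(y, []).append(ex)
    balanced = []
    for y, exs in by_class.items():
        for ex in random.sample(exs, target):  # the unsampled rest is thrown away
            balanced.append((ex, y))
    return balanced

random.seed(2)
X = list(range(100))
y = ["healthy"] * 90 + ["patient"] * 10   # 9:1 imbalance, like the diagnosis example
balanced = undersample(X, y)
```

A cost-sensitive alternative keeps all the data and instead weights errors on the minority class more heavily, which avoids discarding the 80 majority examples this sketch throws away.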
5. Statistical machine learning requires the data to be independent and identically distributed (i.i.d.), a strict condition.
My own feeling for this i.i.d. precondition is not deep. When I used neural networks or decision trees for tasks, I never checked whether it held. At first I took the hint to mean the usual constraint that the features be mutually independent, but on reflection, the very purpose of a decision tree is to mine the correlations among features. So the independence here refers to independence between successive data points: each sample is unaffected by past or future data.
Writing this reminds me of the example often cited in normal cloud model analyses: judging a shooter's target scores. Ordinary statistical methods assume the shots are mutually independent, with no relation between them, but in reality each shot is influenced by the results of the previous ones. When this is analyzed with the hyper-entropy of the normal cloud model, the smaller the hyper-entropy, the smaller the influence between shots and the better the athlete's psychological quality, and vice versa.
Therefore, when we use machine learning algorithms for a task, we need to examine this precondition carefully; if the assumption does not hold in the first place, then the problems that subsequently appear, and their solutions, involve an element of chance.
References
1. Huijun Lv, Ye Wang, Deyi Li, Changyu Liu. The application of the backward cloud in qualitative evaluation. Chinese Journal of Computers, 2003, 26(8): 1009-1014.
2. Deyi Li, Haijun Meng, Xuemei Shi. Membership clouds and membership cloud generators. Journal of Computer Research and Development, 1995, 32(6): 16-21.
3. Deyi Li, Changyu Liu. On the universality of the normal cloud model. Engineering Science, 2004, 6(8): 28-34.
4. Richard O. Duda et al. Pattern Classification. Chinese translation by Hongdong Li and Tianxiang Yao. Beijing: China Machine Press, September 2003.
2004年9月23日
The Library of Second Campus of HIT
In the morning, I went to the library to find two papers in Engineering Science, the journal of the Chinese Academy of Engineering. But unfortunately there was no 2004 issue of this journal there. After a search in the database, the administrator told me to go to the second campus. The two papers looked wonderful after I read their abstracts. Their titles were "Study on the Universality of the Normal Cloud Model" and "An Axiomatic Definition of Degree of Greyness of Grey Number". The school bus between the first and second campuses ran every half hour.
It was nearly 8:30. I ran to the platform and went to the second campus.
This was my second time at the second campus. After about twenty-five minutes, I got off near the library.
I was not familiar with the library of the second campus, so I asked a staff member for help. Its basic layout was the same as that of the first campus, apart from the architectural style. I went to the science and technology periodicals reading room and soon found the journal.
After half an hour I returned to the first campus. The two papers were precious to me. I would study them carefully.
2004年9月22日
Artificial Intelligence with Uncertainty
The link of this ppt is: http://www.intsci.ac.cn/research/lidy04.ppt
When I reviewed the slides on this topic, I was excited, as there were some uncertainty elements related to grey system theory.
As I studied it, I found lots of useful knowledge. Some of it is listed below:
1. Artificial Intelligence with Uncertainty was a new research field in AI.
2. There was a wonderful discussion questioning reductionism:
Just as we cannot infer the behaviour of e-mail on a computer network from the activity of the underlying silicon chips, we cannot understand the feeling of admiring a beautiful sunset by analysing the properties of individual ions, neurons, and synapses. How, then, could we expect to infer the cognitive and thinking activities of the human brain by analysing the properties of individual organs, cells, genes, and protein molecules and their neural conduction? The systems-theory principle that the characteristics of a whole system are not the sum of its lower-level elements calls reductionism into question.
3. It is much harder for a computer to play Go than chess:
Chess has a clear final goal state; Go does not.
A chess program can keep searching for the most reasonable move (by reasoning) from one goal state toward the next; Go offers no such progression.
A chess program can search purposefully toward a particular goal state; in Go it cannot.
Go, which aims to surround the opponent, has far more possible responses in any given position than chess, and relies more on imagistic thinking and a view of the whole board.
4. The level at which artificial intelligence with uncertainty studies human cognition is natural language. So we, NLP researchers, should be happy with this conclusion :)
5. The pursuit of universality, depth, and meaningfulness, of truth and beauty, is the charm of basic and interdisciplinary research. Appreciating the wonderful scenery created where different disciplines, and even science and art, meet is one of life's great pleasures.
Grey system theory also deals with uncertainty. So what role can grey systems play in artificial intelligence with uncertainty? It is very interesting.
2004年9月21日
The English presentation and discussion
This evening's colloquium was the first doctoral paper-reading colloquium conducted purely in English. During the discussion time, we had to speak English without any Chinese words.
The first speaker was Wanxiang Che. The topic was Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. His presentation was perfect, I think. While he was speaking, I felt that this kind of chance to present in English was very scarce for us. I should cherish it.
After his presentation, we discussed its details.
English is so important for us. I should keep practising it.
2004年9月20日
GMCM(4)
This was the final day. I went to sleep at 4:00 and got up at 7:00. The third problem was more difficult than the previous two. We had to build an emulator as the solution to the fourth problem, but as the rotation of the tapers was not clearly specified, we could not implement it.
The remaining five sub-problems were solved by Qiyue Yu and Yu Liu.
There were nine hours between my getting up and handing in our paper, and they flew by. When we finished our work we had not yet eaten anything.
So tired! We all needed a good rest.
2004年9月19日
GMCM(3)
There were only three and a quarter days in this contest, and this was the third. After solving the first problem, and since the second added only a few more restrictions, I believed I could solve it very easily. But there were lots of problems in it.
At about 23:50, I got the perfect solution. The rule was simple, but I had never used it before.
So tired, but I must stick to it.
2004年9月18日
GMCM(2)
On the second day I was not very lucky. The second sub-problem, with some stricter restrictions, was more difficult than the first. The three schemes we obtained were not perfect in my mind. I discussed it with Yu Liu, my partner, but we still had no clear solution. This problem occupied my whole day to a certain extent; it was the bottleneck of my tasks. I had to solve it.
Some exciting news: three more sub-problems were solved by Qiyue Yu, and a draft paper on the solved problems was being written by Yu Liu. Also, one of my friends sent us some watermelon and encouraged us.
Try, try, and try!
2004年9月17日
GMCM(1)
This was the first day of the first national Graduate Mathematical Contest of Modeling (GMCM, an abbreviation I coined myself). GMCM was a little different from UMCM (the MCM for undergraduates): GMCM had four problems, but UMCM only two. The first problem was related to positioning systems, with lots of related knowledge and sub-problems to work through. The second was a minimum-cost-of-materials problem. The third was a data mining problem, with lots of information about the after-sale servicing of automobiles. The last was about two-way selection between supervisors and graduate students.
After our discussion we chose the first one. It was hard, as it had nine sub-problems, and we spent nearly a whole day solving the first of them. Fortunately, we got some perfect solutions. We were excited, but we got little sleep that night.
Let us try our best to solve the remaining problems.
2004年9月16日
Prepare for the first graduate mathematical modeling contest
Tomorrow I will take part in the first graduate mathematical modeling contest. It will last another four days.
I will try my best again! Wishing myself luck!!