Bill_Lang: 2005-08

August 31, 2005

Found some nice course materials

In the links on the left, I added three "concerned courses". All of them are very good.
I have seen many wonderful courses but never studied them carefully, so I decided to list them on my blog, where I can come back to them often.

Just sharing with my blog readers. Welcome~!

August 30, 2005

Dinner with MSRA friends

We had dinner at DongBei again this evening. Guihong Cao will check out tomorrow, so this was his farewell dinner. I remembered his ice fishing on a frozen river on a winter night in Canada. It was a beautiful scene.
After dinner, we chatted about many interesting topics.
Nice dinner, nice talk!

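August 29, 2005

How to use Minipar: step by step

Has anyone here used Minipar, Dekang Lin's well-known dependency parser for English? This morning Yichen mentioned that he could not get it to compile. Since I will need it myself before long, I volunteered to build and debug it in VS7.0. I expected my earlier experience debugging C45R8 to make this quick, but the build kept failing. The code is written in Linux style, and installing Cygwin did not help either. The problem was only solved just now, with Chengjie Sun's help. I remember someone asking a similar question on our IR-BBS, so here is the whole process of getting the pdemo example to run.

1. Download Minipar
Fill in the form at http://www.cs.ualberta.ca/~lindek/minipar.htm to get minipar-0.5-Windows.tar
2. Unpack
Unpacking with Winrar gives a file named minipar-0.5-Windows with no extension. Add the extension zip by hand and unpack it again with Winrar to get the package.
It contains the following subfolders: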

minipar-0.5-Windows
|-README
|-data
|-include
|-lib
|-pdemo

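3. Set the environment variable
Right-click "My Computer" and, under "Advanced", choose "Environment Variables". Create a new variable "MINIPATH" whose value is the absolute path of the data folder above, e.g. "D:\MiniPar\minipar-0.5-Windows\data".

4. Configure VC7.0
Create a new empty VC++7.0 Windows console project and add the cpp file pdemo.cpp from the pdemo folder.
Open the project properties and, under "Configuration Properties", C/C++, General, Additional Include Directories, add the path of the include folder above.
Open the project properties and, under "Configuration Properties", Linker, General, Additional Library Directories, add the path of the debug folder under lib above; then, in Additional Dependencies at the same level, add "minipar.lib Ws2_32.lib" (separated by a space).

5. Build and run
After the build, open a DOS prompt, change to the corresponding Debug folder, and enter "pdemo.exe -p "D:\MiniPar\minipar-0.5-Windows\data" ". This initializes the dependency parser. Type an English sentence after the > prompt to parse it, as follows: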

> Many students like programming.
(
E0 (() fin C * )
1 (Many ~ Det 2 det (gov student))
2 (students student N 3 s (gov like))
3 (like ~ V E0 i (gov fin))
E2 (() student N 3 subj (gov like) (antecedent 2))
4 (programming ~ N 3 obj (gov like))
5 (. ~ U * punc)
)
>

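That completes the whole process. Minipar's intermediate output can be consumed by other programs, which makes it easy to find the dependency structure of a sentence.
I am keeping these steps here as a record for future reference.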
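To use the parser output from another program, one option is to read the printed node lines back in. The sketch below is only an illustration and does not use the Minipar API: the field layout (id, word, lemma, category, head, relation) is an assumption inferred from the sample output above, and `DepNode` and `parseNodeLine` are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One node of the printed dependency tree (layout assumed from the sample).
struct DepNode {
    std::string id;        // e.g. "2" or "E2"
    std::string word;      // surface form, "()" for empty nodes
    std::string lemma;     // root form; "~" in the output is read as "same as word"
    std::string category;  // e.g. N, V, Det
    std::string head;      // id of the governing node, "*" if none
    std::string relation;  // e.g. subj, obj, det; empty if absent
};

// Parse one line such as "2 (students student N 3 s (gov like))".
// Returns false for lines that are not node lines, e.g. "(" or ")".
bool parseNodeLine(const std::string& line, DepNode& node) {
    const std::string::size_type open = line.find('(');
    if (open == std::string::npos || open == 0) return false;

    // Everything before the first "(" is the node id.
    std::istringstream idStream(line.substr(0, open));
    std::string id;
    if (!(idStream >> id)) return false;

    // Collect whitespace-separated fields up to the first nested "(...)".
    std::istringstream body(line.substr(open + 1));
    std::vector<std::string> fields;
    std::string token;
    while (body >> token) {
        if (!fields.empty()) {
            if (token[0] == '(') break;  // "(gov ...)" style annotation
            while (!token.empty() && token.back() == ')') token.pop_back();
            if (token.empty()) continue;  // a lone closing parenthesis
        }
        fields.push_back(token);
    }
    if (fields.size() < 4) return false;

    node.id = id;
    node.word = fields[0];
    node.lemma = (fields[1] == "~") ? fields[0] : fields[1];
    node.category = fields[2];
    node.head = fields[3];
    node.relation = fields.size() > 4 ? fields[4] : std::string();
    return true;
}
```

Calling `parseNodeLine` on each output line and keeping the nodes for which it returns true gives a list of (word, head, relation) entries that a later program can walk.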

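August 28, 2005

A happy time: reading Complexity

I have not written this blog in Chinese for a long time, but an excitement I can hardly put into words makes me feel that Chinese is the more fitting way to share this with you. It comes from reading Complexity over the past two days.
This evening a friend came by my desk to see me, and he was surprised when I mentioned Complexity. I expect you are too. In fact, the book has almost nothing to do with the complexity of algorithms that we usually talk about in computer science. Its full title is Complexity: The Emerging Science at the Edge of Order and Chaos. The link is in the reading list on the left side of this blog.
It is an e-book, so I can only read it in my spare time in the evenings at the lab; back in the dorm I cannot read it. When I read it a while ago, my impression was that it was about chaos in economics and the exciting story of founding the Santa Fe Institute. I have now finished the first six chapters, with three left. The writing feels a bit scattered to me: every time a legendary figure appears, the narrative shifts to that person for a long while and only with difficulty returns to the original thread. I think the book as a whole is mostly about the background to the founding of the study of complex nonlinear systems, for which the Santa Fe Institute is an institution that has to be introduced. It brought together many elite scientists, so naturally their careers get a full treatment too. Of course, the whole book still revolves around complex systems, and economics is the first example it takes up.
What excites me is the biographical sketches of these great people. They share one trait: all of them are doing what excites them, and their research and study are entirely self-driven. The book introduces interesting things such as molecular automata, genetic algorithms, and unstable economic networks. Their work is entirely self-directed, and it often turns out that one person's ideas closely resemble someone else's in a different field. This fills their research with passion and frequent flashes of inspiration. What the book reveals is the true nature of researchers: their work is creative, and it keeps people working in a state of excitement.
After lunch I chatted for a while with Lei Shi (石磊), who has just moved from VS to full-time employee. He is very interested in what I am working on. For confidentiality reasons I cannot describe my work here on my blog. It is very challenging. Reading Complexity has given me a lot of inspiration. I want to keep exploring and deepening my ideas so that the module I am responsible for takes a qualitative leap.
Complexity really is a good book. Unfortunately I have only been reading it in fits and starts; once I finish this pass I will certainly read it several more times. It takes me back to the heartfelt reverence for science I had in primary and secondary school, and it has shown me the joy of doing scientific research. I will keep going.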

August 27, 2005

Swimming for four hours

Jianguo Du, Yi Chen and I swam from 13:00 to 17:20. It was the longest swim I have ever had. I felt a little tired, and I expect to get up late tomorrow morning.

I met the same elderly man again. I have met him on all four of my visits to the pool. He said he is 75 now and swims every afternoon. He can swim 1500 meters. All three of us admired him.

Good health is the basis of everything. I felt good about swimming.

August 26, 2005

Wonderful BBQ of MSRA

After long preparation, the MSRA BBQ (barbecue) finally came this evening. We all gathered in front of Sigma Building. Our destination was the Central Garden of the Friendship Hotel. By 19:00, all the MSRA interns and the Microsoft Student Summer Camp attendees had arrived, and the evening started. There were nearly 500 people.

At the beginning, I met four MSClub students from our HIT: Zhongqi Yang, Yongqing Xiang, Chao Zhang and Junheng Zhang. They are the current leaders of the HIT MSClub. They came to Beijing four days ago and will return to Harbin tomorrow morning. We were very happy to meet each other and took many group photos. I wish them a nice trip.

There were many programs this evening. Most were singing, and there were some interesting games too. The most exciting moment was when our Libra poster for the Ping Pong Club won the award of "the Best of Best". We five Libra guys were all excited. Harry was humorous. He cheered along with us and told us some jokes. My mentor Ming Zhou was very happy sitting among our NLC VSs. At the end, there were two dance programs. Ming Zhou went on stage twice and played along with us. He said it was very interesting.

During the event, I chatted with some friends and took group photos with them. Xu Sun, a VS of NLC, said that he had seen me at SWCL last year at Beijing Language University. He is a student of Houfeng Wang, whom I consider a leader in anaphora resolution. We discussed some problems in anaphora resolution, and I learned something about the state of research in Houfeng Wang's group. I plan to visit Dr. Houfeng Wang. Xu Sun's current work at BeiDa is on abbreviation resolution, which is very important for anaphora resolution and coreference resolution. Their work on anaphora resolution seems to go deep. We also discussed some related work.

It was a nice evening. Thanks to MSRA!

August 25, 2005

Wonderful Presentation of Study Group

This afternoon, Guihong Cao and Shenghua Bao each gave us a wonderful presentation.

Guihong's topic was his SIGIR 05 paper, Integrating Word Relationships into Language Models. As it is his own paper, he presented it in great detail. The main idea is to combine co-occurrence and WordNet for information retrieval. From his talk, I learned that Lemur supports language modeling and offers three smoothing techniques. I had heard that it could be used for information retrieval, but this was the first time I heard about its language modeling ability. It is interesting and useful. I believe I should master more tools for NLP research. I also learned another evaluation measure for information retrieval: non-interpolated average precision, which is used in the TREC evaluation.

Shenghua gave a talk on Comparative Study of Name Disambiguation Problem Using a Scalable Blocking-based Framework. The paper is on name disambiguation. I think it is a subtype of coreference resolution and could be viewed as person-name coreference resolution. The only feature used was the co-authors of papers. One heuristic idea was that we could do coreference resolution using web information. It is new and could open up many research topics.

Thanks to them for such nice talks. Next time it is my turn to give a presentation. There are only 13 days left. I should prepare for it day by day.

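August 24, 2005

[Collection] Machine Learning masters (continued)

From: Car (得之,我幸,不得,我命;如此而已), Board: AI
Subject: Machine learning masters (5): Michael Collins
Site: HIT Lilac BBS (Tue Aug 23 19:50:58 2005), forwarded

Michael Collins (http://people.csail.mit.edu/mcollins/)

The number one master of the natural language processing (NLP) martial world. A graduate of UPenn, he first made his name with a kung fu called the Collins Parser. Talent aside, his pedigree helped a good deal. In his early years a master named Mitchell P. Marcus passed him a secret manual, the Penn Treebank. From then on Collins immersed himself in it day and night and finally perfected his peerless skill.

After completing his training, Collins took leave of his master and set out into the world, joining a gang called AT&T Labs Research, where he was lucky enough to meet Robert Schapire, Yoram Singer and many other masters. Do not underestimate this gang: if you have never heard of it, you must at least know its half-brother, Bell Labs. Back to the story: Collins spent three happy years there, during which he established himself as the boss of the NLP world and mastered several signature techniques, including Discriminative Reranking, Convolution Kernels and Discriminative Training Methods for Hidden Markov Models. But things are hard to foresee. The gang was badly run, and these masters were not the type to fight for its business, so in the end they were kicked out and scattered. Schapire went to Princeton, and Singer went home to Israel. Collins came to MIT, became a six-bag elder of the biggest gang in the martial world, and taught a kung fu called Machine Learning Approaches for NLP (http://www.ai.mit.edu/courses/6.891-nlp/). Although this rank was far below his ability, it did not dampen his enthusiasm. Through hard work he earned the title of Sloan Research Fellow, and in July this year he was gloriously promoted to seven-bag Associate Professor.

In just seven years since coming down from the mountain, Collins has won four world-class martial arts championships (EMNLP 2002, 2004, UAI 2004, 2005). Young as he is, I believe he will one day unite the Beggars' Sect, and even the whole martial world.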

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－
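From: Car (得之,我幸,不得,我命;如此而已), Board: AI
Subject: Machine learning masters (6): Dan Roth
Site: HIT Lilac BBS (Wed Aug 24 18:52:49 2005), forwarded

Dan Roth
http://l2r.cs.uiuc.edu/~danr/

After tallying up the many scholars in NLP, I reached a startling conclusion: an unusual number of the big names are called Daniel. They range from Prof. Dan Melamed, long established in MT, to Dan Klein, fresh out of Stanford, with a Bull Demon King like Dan Jurafsky in between, and even Michael Collins's junior fellow apprentice Dan Bikel (IBM Research), ISI's Dan Marcu, Prof. Dan Moldovan (UTexas Dallas), winner of countless TREC QA evaluations, and Dan Gildea (U Rochester), a UC Berkeley graduate. But among all the Dans, the one I admire most is Dan Roth, Associate Professor at UIUC and head of its Cognitive Computation Group.

This fellow is also extremely young. Exactly ten years after finishing his PhD at Harvard, he and his team hold up a brilliant patch of sky for UIUC in machine learning and NLP. The SNoW tool developed under his lead is a peerless sword. It comes close to "wanting the horse to run without wanting it to eat grass": learning and prediction are exceptionally fast with no loss of classification accuracy. What? You have not heard of SNoW? What does it have to do with Snow White? It seems I must learn from the "Super Girl" fans and do some basic education: SNoW stands for Sparse Network of Winnows. It implements the Winnow algorithm, but remember that the Sparse Network is the key point. It is this piece of dark iron that makes the SNoW sword so sharp.

In recent years Roth has also followed the fashion and reached into learning over structured data. But unlike others, who try to add structural information at learning time (CRF is the typical example), Roth advocates adding constraints and performing inference at the final prediction stage. This greatly improves learning efficiency and, in some applications, even gives better results.
Then there is kernel learning; I guess he has so many students that he ran out of places to put them, so he had to open up new territory.

Coming from Harvard, Roth also has a very deep theoretical foundation. Much of his work involving statistical learning theory is not something an engineering person like me follows.

Finally, a bit of gossip: Roth is hiring a postdoc. Anyone interested can contact him, hehe.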

August 23, 2005

Happy birthday to Jizhou and Shenghua

It is such a coincidence that Jizhou and Shenghua share the same birthday and are in the same group at MSRA.
We nine VSs had a nice dinner at Jiutouying. There are two related blog posts about it: Jizhou and Xiaoyuan Cui

Good luck to them!

August 22, 2005

Knowing my task more clearly

Before I came here, I did not know exactly what my task would be. During the past three months I did many scattered pieces of work, and this month I did some research on dialogue. This afternoon, after our weekly project discussion, my tasks and steps became very clear to me.

I believe this is good news for me. Now I can approach my research the way I feel about swimming. Let me start on the survey now!

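August 21, 2005

[Collection] Machine Learning masters

Recently, on the AI board of our Lilac BBS, the user University wrote a series of posts titled "Machine Learning masters", which I am reposting here. University is widely read and has a great memory. Thanks!

From: university (侠之大者，学之小者), Board: AI
Subject: Machine Learning masters (1): M.I.Jordan
Site: BBS HIT Lilac (Wed Aug 17 18:06:26 2005)
With nothing much to do, I thought I would write a bit about the machine learning masters I know of. My knowledge is shallow and my experience limited, and confined to certain areas; about scholars active in NLP or in the recently hot field of bioinformatics I know next to nothing. So where I get things wrong, please just laugh it off.

In my eyes, M Jordan is without doubt the towering authority of the martial world. He came out of MIT and now holds his ground at Berkeley; across the two nearby elite schools (counting Stanford) nobody surpasses him. Stanford's Daphne Koller is also widely known, but she is still some distance behind Jordan. Jordan is a professor in both the stat and cs departments, and in him you can see the merging of Stat and ML.
Jordan first focused on mixtures of experts and quickly established his standing. Our alumnus Lei Xu (徐雷) benefited a lot in this direction while a postdoc with him. Jordan and his students have done pioneering work in many areas, such as spectral clustering, graphical models and nonparametric Bayesian methods. The latter two are now red-hot directions in ML, and to a large extent it is Jordan's lab that has driven them.

Even more admirably, Jordan is not only highly skilled himself but also good at raising money and at teaching. He has many disciples, a good number of whom have become major figures, so that he has quietly formed a big gang. More than ten of his students are professors. Personally I think the most outstanding of them today is Stanford's Andrew Ng; for reasons of seniority he is still an assistant professor, but becoming a big professor is only a matter of time. Tommi Jaakkola and David Blei are also formidable: Tommi Jaakkola teaches at MIT, while David Blei is a postdoc at CMU.
And because he has become a center of gravity, many strong people now flock to Jordan. One is Ben Taskar, formerly of Stanford, who has won NIPS best paper awards several times and is famous for combining the max-margin method of SVMs with the structure of Markov networks. Another postdoc is Yee Whye Teh from Toronto, who is very good; I have had the luck to deal with him a few times, and he is very nice. There is also a postdoc who, of all things, works on bioinformatics, so it seems Jordan has picked up money in that area too.
Overall, I think Jordan's main work now is still graphical models and Bayesian learning. Last year he wrote a book on graphical models, published this year by MIT Press, which should be a landmark in the field. In March someone promised me a printed copy to look at, because Jordan did not allow him to circulate the electronic version, but he seems to have forgotten about it afterwards (so Americans do not always keep their word either). As I do not know him well, I was too embarrassed to ask, which is a great pity. Another interesting thing I noticed is Jordan's fondness for "hierarchical": a large share of his papers are about something hierarchical. So anyone who can go hierarchical should do so quickly, or he will grab it all.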

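From: university (侠之大者，学之小者), Board: AI
Subject: Machine Learning masters (2): D Koller
Site: BBS HIT Lilac (Thu Aug 18 15:25:58 2005)
I first heard of D Koller because she won a major award; embarrassingly, the names of such awards are so long and awkward that I did not remember it. The award has an amusing story. In the 1980s a big name received it for pointing out that machines can only learn independently, but Koller received the same award for proposing Probabilistic Relational Models and thereby showing that machines can reason. So nothing is absolute, and science moves in cycles.

D Koller's Probabilistic Relational Models were active at NIPS, ICML and other top conferences for quite a long time and proved their value for information search, at least in the lab. This also led many of her students to join Google. Joining Google may not carry the same prestige as being faculty at a top school, but bear in mind that many Google employees are now millionaires, buying houses and cars all over America.

Koller's research mainly centers on probabilistic graphical models such as Bayesian networks, but I have never worked with this stuff. I have only read a few of their Markov network papers, and reading them was as far as it went; I came away with no ideas at all. This water is rather deep, not something a person without formal training like me can wade into, and it feels hard to apply in my current field.

Koller has only been teaching for ten years, so not many big names have emerged among her students yet. This is where she cannot compare with Jordan. Also, because of Stanford, many of her students went straight to Silicon Valley to make big money rather than founding schools of their own in academia. Most professors actually hope their students will become faculty, so as to build greater influence in academia, but at Stanford that may be too hard to achieve, because the lure of money is simply too strong. There is one student of Koller's I admire greatly, though: Ben Taskar, the postdoc of Jordan's I mentioned in (1), winner of best paper awards at several top conferences. He combined the max-margin method of SVMs with Markov networks, which can be called a standard tool for handling structured data, and it brought a new surge of interest in max-margin methods; in recent years many top conferences have held workshops on this. When I first visited Ben Taskar's personal page at Stanford, he had just graduated, and at the top it said: the rumor has come true, I have finally graduated! Evidently Koller is brutal, keeping her students penned up until they are this miserable. I am afraid this is a common failing of most female faculty, and I guess she is very pushy too!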

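From: university (侠之大者，学之小者), Board: AI
Subject: Machine learning masters (3): JD Lafferty
Site: BBS HIT Lilac (Sun Aug 21 16:37:16 2005)
Everyone knows that NIPS and ICML have always been carved up among strongholds large and small, and John Lafferty is undoubtedly one of the taller peaks among them, as the number of NIPS and ICML papers in his publication list clearly shows. Although rumor has it that CMU, a stronghold of computer science, is now in decline, Lafferty's influence keeps growing. Open the Journal of Machine Learning Research, ranked first in the AI "book of weapons", and in many articles you will find Lafferty's name among the authors or editors.

The work Lafferty is best known for seems to be his 2001 conditional random fields. The paper was later cited wildly and widely applied in language and image processing, and many variants followed, such as Kumar's discriminative random fields. Everyone knew discriminative learning was good, but for a long time nobody found a good discriminative method for data with rich contextual information, until Lafferty came along.

What Lafferty does now seems quite varied: semi-supervised learning, kernel learning, graphical models and even manifold learning. Perhaps it is as in martial arts novels: once you have mastered the Nine Yang Divine Skill, one glance is enough to grasp the essence of any other kung fu. My favorite of these is semi-supervised learning. As the amount of data to process keeps growing, labeling all of it is too hard, while fully unsupervised methods do not inspire much confidence; in that situation semi-supervised learning is the best compromise. Unfortunately, in my view academia does not yet have a very clear understanding of how to use supervised information to control the unlabeled data, though this also gives the younger generation an opening to make a name. So far I think CMU does semi-supervised learning best: earlier Kamal Nigam did pioneering work, and now Lafferty and his students have contributed many syntheses and innovations.

Lafferty does not seem to have many students, and they do not seem to be very famous. But this year a Chinese student graduated, Xiaojin Zhu, the one who works on semi-supervised learning, and he is now an assistant professor at wisc. He wrote the most comprehensive Semi-supervised learning literature survey to date, which you can find on his homepage. He looks like an honest, kindly person, probably a good one to get in touch with when applying. Also, D Blei, the star student of Jordan's I mentioned in (1), joined Lafferty as a postdoc this year, which shows how strong Lafferty is.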

August 20, 2005

Swimming & Research

I like swimming now; I have started to get a good feel for it. This afternoon, Jianguo Du and Songnan Li came swimming at my invitation. This was my third time swimming in Beijing. The second time, I could only do one 25-meter length of breaststroke at a time, and at the end of each 25 meters I would swallow water. This afternoon I worked more on my breaststroke. In the end I could swim 100 meters without swallowing water.

I thought more about my progress in swimming and found a link between swimming and doing research. In research, I believe the baseline system for your topic is very important. To build it, you read many papers and design your architecture. Then you build simple modules one by one. Finally, you link the modules together, run the whole thing once and evaluate it by some mechanism. The first versions of the basic modules may be very poor, and the resulting system will have very low performance.

In swimming, you cannot swim at the beginning. Somebody tells you how to swim and shows you the motions. That is like reading papers and getting to know the whole architecture. Then you form your own understanding of swimming and practice each motion, which is like building each module. Certainly, you cannot do every motion well at first; you have to work on them one by one. Every beginner is excited and wants to try the whole stroke right away, so at first he fails again and again. Only when all the modules perform well enough does he succeed once. That is the baseline system of your research. Then he keeps running through the whole stroke. Only now can he understand the importance of each motion and try to improve it. Along the way his standard of evaluation becomes finer and finer, just as in the first research experience with the baseline system. With a fine feel for each module while it is running, each module gets improved, and finally the baseline system becomes the best system. But this is only the first swimming method. Building on the experience of the first baseline system, he tries other methods, and through a similar process he masters more and more of them. Finally, he links all the methods, finds what they have in common and puts forward some new methods.

Yeah. Just like swimming, you must practice instead of standing on the bank.

August 19, 2005

Farewell to Chengjie Sun

Chengjie Sun will go back to Harbin for his visa to Korea. The three of us had dinner at Caozhou Restaurant near Sigma Building.

August 18, 2005

Beginning to try OneNote

It is a nice piece of software, like a combination of MindManager and Tadalist. I used it to write up the results of our project discussion yesterday. It was convenient and practical. I like it and will try it more for my research and project.

August 17, 2005

Knowing what you will do comes first!

Today is Aug. 17th. I have been here for three months, which is half of our stay at MSRA. During these three months, I learned a lot, made many friends from different universities, listened to many high-quality research talks, surveyed my research topics, organized ten ping pong activities, and began to like swimming.

This afternoon, our project group had its routine meeting. We discussed only my report on my recent tasks. After my presentation, our mentor described the scenario for my task in detail. Now all four of us VSs know our tasks in detail again. As we have done much work on the project, our goals and tasks are becoming clearer and clearer.

Yeah! I believe that knowing what you will do comes first. Our four tasks are all novel and interesting. We will all concentrate on them. I believe we can do research at the state of the art.

Today was the weekly ping pong day. About ten people accepted my invitation. We could all look forward to a nice evening of sport!

August 16, 2005

Do you master your tools?

Sharpening your axe will not delay your job of cutting wood. Take the office software we use every day, such as Word, Outlook, Excel and PowerPoint: I believe nobody has mastered all of their functions.

This evening, I read many training materials on Office tools, covering Outlook, Project and OneNote. While reading, I found that Microsoft SharePoint is a nice tool for teamwork, and that OneNote can be used as a to-do list.

Why not read them? Please visit: http://office.microsoft.com/zh-cn/default.aspx

August 15, 2005

How to found an exciting and effective research organization?

The question in the title is a tremendous problem for any organizer. I have some experience organizing the machine learning group at our university, and I ran into many difficulties. After three colloquia, we had almost no volunteers to present and no topics to share.

I believe I now know more about how to solve this problem, or at least what the important issues are. The second chapter of Complexity is mainly about the motivation for, and the beginnings of, the Santa Fe Institute. I knew nothing about this institute before reading the chapter. Its goal was to set up a place where scientists from different disciplines could talk to each other. The first organizer, George Cowan, had the original idea of building Grand Unified Theories (GUT) based on complexity theory. Their first two colloquia resonated widely. Then, as the institute developed and its influence grew, more and more scientists became interested in GUT on complexity. One important thing for this institute was that many Nobel laureates were among its members. From this story, I realized that the most important thing in organizing such an organization is how to guide and mobilize everyone's interest.

Through this chapter, one viewpoint became clearer and clearer: many research topics have something in common to some degree. This holds across many areas, such as physics, biology and economics. I think that in chapter two, this common thing is complexity.

I find I like reading this book more and more. It is really good!

August 14, 2005

Nice Sunday: read and swim

I found that I am fond of reading books. Yesterday I found the book Complexity and was drawn in by the first chapter. I noticed that English books have a certain style: they raise many questions that catch the reader's interest. This book is like that.

I read the first chapter yesterday. Today is the weekend, so after doing some work on our project, I went on reading. I only have the Chinese translation, but I think the translation is very good. In chapter two, the author introduces the phenomenon of giving to those who already have. I understand it as the Matthew Effect. The author gives many examples to explain it and uses it to dissect the new economy. In this chapter, I learned something very important for research on any topic or area: breadth of learning and the ability to connect ideas across fields. The author describes how he discovered these phenomena and where his inspiration came from. Most of us, I believe, find it hard to see any link between physics, genes and economics. But after reading many books in these areas, he arrived at his idea and the whole architecture of his theory. I doubt you can appreciate this process without reading it, so I recommend the book. It is so interesting that I am sure you would enjoy it as I did.

Wonderful books are a source for our life and research. I know an old line, "问渠哪得清如许，唯有源头活水来" ("Ask the channel how it stays so clear: only because fresh water keeps flowing from its source"). It says the same thing as the previous sentence.

This afternoon, Chengjie Sun and I went to Sunshine Swimming Pool again. During the two hours, I practiced my breaststroke. I first practiced breathing while staying in one place, then began to swim from one side to the other. Finally, I got the hang of all the techniques of breaststroke and could swim from one end to the other. That was good news. I feel I should practice more. Do it and learn it: that is the rule.

It was such a nice day. I liked this weekend. I should do more work on my project from tomorrow. Let me try harder.

August 13, 2005

One nice book: Complexity

While going through my favorite blogs, I found a nice book to read: Complexity: The Emerging Science at the Edge of Order and Chaos, 1995, by Mitchell Waldrop.

An English introduction is at http://www.tnellen.com/ted/tc/complexity.html

I found a Chinese version at http://www.egr.msu.edu/~hujianju/onlinebook/complexity/ComplexityIndex.htm.

After reading the introduction, I became interested in it and added it to my reading list.

August 12, 2005

Libra won an award in the MSRA poster design competition

We won an award in the MSRA Student Poster Design Competition! Yeah~! That is very exciting news for our Libra group at MSRA. Our five main members were Yuhao Zheng, Tian Fang, Ying Song, Zhizhi Zhou, and me. We spent four days preparing the poster and discussed many ideas, as you can see from my earlier blog posts.

Yes. Maybe the Libra character is balanced and hesitant. We had not settled on our final design before yesterday. The final design was based on the Windows game Minesweeper. We used twenty-nine ping pong tables for our main frame, spelling PPC. Around the ping pong balls were many numbers, as in the game. The timetable above the numbers and balls was two red tags reading 730 and 930, our club's weekly activity time. Our slogan was "We are surrounding you!" A mouse tag was stuck on the central ball.

At the poster show, we five Libra VSs went up to the stage and introduced the highlights of our design. We were a great team. In the end, our work was among the top three of the twelve constellations. The final result will be announced at the last big MSRA event on Aug. 26.

We were all very happy. After the competition we had dinner and played billiards together.

So nice teamwork, so nice day!

The related pictures are as follows:

[Picture] Our rudimentary idea
[Picture] Our final design on site
[Picture] The final mouse tag
[Picture] Our work on show
[Picture] Other groups

August 11, 2005

Poster design for our Libra

There is a poster design competition. I am in Libra, and our Libra group has seven people. We have had three discussions about our poster so far. Until now we concentrated on new ideas and the main design. During the brainstorming stage we discussed a lot, and sometimes we all ran out of words and just looked at each other's faces. It was interesting!

This evening, we started our discussion at 18:30. First, we went over the ideas from last night and voted on the designs. Finally, at 21:00, we settled on the idea of using 29 ping pong tables to form our club name. I went to Wall Market to buy 39 balls. When I returned to Sigma 4F, they had finished the main frame. Tomorrow afternoon, we will take part in the competition.

I got back to BUAA at 23:30. It was too late, and I felt tired.

August 10, 2005

Ping Pong Activity

Since the swimming last Sunday, I had not done any exercise, as I had a little muscle ache and no time these two days.

This evening, I went to brainstorm our poster design with my Libra friends. When we finished supper, it was 19:17. I had invited the MSRA ping pong members to gather at 19:00 in the 4F hall, so I was late. When I ran there, nobody was waiting for me. Maybe some people who wanted to play had left because I was late. I owe them an apology.

I hurried back to BUAA. When I brought my paddle to the ping pong room, there were only four MSRA VSs playing. After a while, Taifeng Wang came. The six of us played on three tables for two hours. With no substitutions during the two hours, we all enjoyed ourselves.

Caizhi Zhu would like to swim this Saturday. I feel good about this form of exercise.

August 9, 2005

Poster design and English Corner

This morning, I read two papers on dialogue act classification and dialogue clustering. The afternoon was busy.

According to my plan, I should have written some corpus for our project. But there is a poster design competition with a deadline on Friday. This afternoon, Yuhao Zheng called us Libras together for a discussion. Our Libra group was assigned to design the poster for our Ping Pong Club. To help us design better posters, last week a VS who is a professional in this area gave a presentation on how to design a poster. We five Libra VSs discussed our poster in detail, but in the end we did not settle on a final design. We must discuss it again tomorrow afternoon.

This evening, there was an English corner. The speaker was Nikola, who is from Yugoslavia and studying in America. He talked about how to survive in the US. He was very talkative and gave a presentation like a cookbook, covering nearly everything relevant to a first stay as a student in the US. This was his last talk at MSRA; he will return to the US tomorrow. Thanks for his lively introduction.

Such a busy afternoon and evening. I will do more work tomorrow.

August 8, 2005

Typhoon Matsa

These two days have been terrible. It was said that a typhoon named Matsa would reach Beijing and bring the biggest storm the city had seen in 10 years. It had already damaged many cities in China. It was reported that Matsa would arrive in Beijing this morning at a speed of 30 km/h, and before this morning thousands of residents had been moved to safe places.

But by 18:30, after our supper, it had not appeared. None of us stayed in Sigma this evening. According to the latest report on a well-known website, it would arrive at 20:00 this evening. So after lunch I went back to my dorm room at BUAA. We could do nothing except watch TV.

We stayed in the room waiting for the storm. But by the time we fell asleep, there was still no trace of it. It seems the weather forecast was inaccurate.

So I can only do more work tomorrow.

August 7, 2005

Swimming

I had not swum for three years. During my undergraduate years, I swam only three times, in P.E. class. As a child in my hometown, I liked swimming a lot. Every summer vacation, we boys went to reservoirs, rivers and swimming pools.

This afternoon, Caizhi Zhu, a VS of MSRA, invited us to Sunshine Swimming Pool. Swimming felt very nice. In the morning, Chengjie Sun and I went to Wall-Market to buy swimming trunks, caps and goggles. After lunch, seven of us went to the pool. Carl and Xiaoguang Hu like swimming very much and are advanced swimmers. They joined us.

We swam from 14:30 to 17:30. Never having had any professional coaching, my technique was poor. Luckily Xiaoguang swims at a professional level, and his guidance was effective. He coached Chengjie Sun, Caizhi Zhu and me on breaststroke. At my original level, I could only swim breaststroke with my head above the surface; I could not put my head under and come up to breathe. Xiaoguang corrected my leg position and taught me to breathe out under the surface and breathe in above it. With these two hints, I practiced many times, and finally I became a little familiar with breaststroke. I need more practice.

Swimming is an effective way to exercise and keep healthy. I would like to swim again next weekend.

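August 6, 2005

[Collection] Six tips for learning English

Once we realize that we need to learn English and make up our minds to conquer it, we must:

1. Invest our time and mind. We are not stupid; we have enough intelligence and room in our brains to digest and store those ABCDs. If others can learn it, so can we, as long as we invest our time well. God gives us 24 hours a day: 8 hours of work, 7 hours of sleep, 2 hours for three meals, 2 hours killed for no reason at all; whatever happens, there should be 1 hour for study. The busier people are, the more time they have to get things done; the idler and lazier they are, the less time they can find.

2. Grow a genuine fondness for English from the bottom of your heart. Treat learning English as a happy, pleasant task, not as drudgery to be endured with gritted teeth. So start with something simple: find a good textbook or a storybook (with no more than 30% new words) and study it carefully and thoughtfully. You will feel you are gaining something and taste the sweetness, and your confidence will grow. If you start by gnawing on a book with too much vocabulary, one you cannot get through without a dictionary, it will only kill your interest, lower your spirits, and end in giving up.

3. Have self-discipline; let us call it "strength of mind". Spring is no time for study, the hot summer is just right for sleep, autumn brings mosquitoes and winter is cold, so pack up the schoolbag and wait for next year. There is always some excuse not to study. If this goes on, our tree of English will never grow. The ancients said: "Be still and you will be at peace; at peace, you can be settled; settled, you can be wise; wise, you can understand; understanding, you can attain." That makes good sense. Among the Buddhist rock carvings at Dazu in Sichuan there is a large group of sculptures, the Ox-Herding Pictures, which tells the story of a herd boy and an ox moving from struggle and confrontation to gradual harmony and finally becoming one. The Buddha said: "The demons of the human mind are hard to subdue. Like the ox, it has far too many selfish and distracting thoughts. The practitioner must be like the herd boy, training and taming them, to perfect his own life." Learning English is the same: we must tame the oxen, big and small, that interfere with our study, resist temptations, concentrate, and study with a single mind.

4. Have confidence. English is nothing more than a tool for expressing thoughts, a habit of speech. We must firmly believe that if we invest and put in the effort, we will be rewarded. It will never be a case of "love given that cannot be taken back".

5. Take real action. A true marathon runner never waits idly for an Olympic gold medal to fall from the sky. Start acting now.

6. Be continuous and persistent. Learning English is a long process, and stopping and starting brings little success. It is like boiling water: heat it to 80 degrees and stop, wait until it cools and heat it again, stop again before it boils, round and round. That wastes both energy and electricity, and you will hardly ever get a drink of water. Learn English in one sustained effort. Keep at it every day, reviewing in time before you have completely forgotten, deepening the impression, again and again until permanent memory forms. If you wait until you have forgotten before you review, it is like learning new material all over again. Then we will always be beginners: toiling away at boiling water, yet hardly ever tasting its sweetness.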

August 5, 2005

Reading(7): Second Part - Semantic Interpretation

This part has a lot of information that is new to me. I would like to read it in detail.

---------------------------------------------------------------------------------
Pages: 179~188 of Natural Language Understanding, second edition, by James F. Allen, 1995

The introduction contains a useful sentence that gives the basic idea of this part: there are two main steps in determining the meaning of a sentence. First, the context-independent form, namely the logical form, is computed. Then the final semantic representation is produced by interpreting the logical form in context. The study of context-independent meaning is called semantics, while the study of language in context is called pragmatics.

Two terms are used in the rest of the book: meaning refers to the context-independent sense, and usage refers to the context-dependent aspects.

Without question, ambiguity is the most serious problem in semantic interpretation. If a word can represent more than one meaning, we call it semantically ambiguous.

Phrases that refer to objects are context-sensitive, but some sentence structures constrain what they can refer to.
For example, the object is obviously different in "Jack saw him in the mirror" and "Jack saw himself in the mirror." The most important task of semantic interpretation is to reduce the candidate senses of each word using such constraints.

When objects are described in context, we can use discourse variables. When a new discourse variable is introduced in context, it is given a unique name. Typically, later sentences will refer back to it.
---------------------------------------------------------------------------------

August 4, 2005

Reading(6):Rule vs. Stat, Evaluation, Baseline

This morning I wanted to write a program to compute the distribution of the existing knowledge base. Because I have not yet mastered C# well enough for such a task, I used C++ instead. I overloaded the operators "<" and "==" for the sort and unique algorithms, respectively. Yes, C++ was effective for my task.
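
For readers who have not used this trick, here is a minimal sketch of it. The Entry record, its fields, and the sorted_unique function are hypothetical names made up for illustration; the post does not show the actual program or say what the knowledge base stores.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type; the real knowledge-base entries are not shown in the post.
struct Entry {
    std::string key;
    int value;
};

// Ordering used by std::sort: by key first, then by value.
bool operator<(const Entry& a, const Entry& b) {
    if (a.key != b.key) return a.key < b.key;
    return a.value < b.value;
}

// Equality used by std::unique to detect adjacent duplicates.
bool operator==(const Entry& a, const Entry& b) {
    return a.key == b.key && a.value == b.value;
}

// Sort the entries, then erase duplicates. std::unique only collapses
// adjacent equal elements, so the sort has to come first.
std::vector<Entry> sorted_unique(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end());
    entries.erase(std::unique(entries.begin(), entries.end()), entries.end());
    return entries;
}
```

Both algorithms pick up the overloaded operators automatically, so no comparison function has to be passed in.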

---------------------------------------------------------------------------------
Pages: 152~157 of Natural Language Understanding, second edition, by James F. Allen, 1995
I'd like to pick out the useful information in this book. As the early chapters are about syntactic parsing, I changed my reading strategy: I skimmed the book and picked out some useful passages. They contain some useful techniques for my NLP research. Just sharing them with you.

Evaluation:
For most research topics in machine learning and natural language processing, once we have estimated the probabilities and chosen the algorithms, we want to know how our method compares with other classical ones. The usual solution is to divide the corpus into a training set and a test set; typically 10% to 20% of the data is held out for testing. A more refined method is cross-validation. It uses different segments of the corpus for testing in turn: each time, a different segment is held out as test data and the rest is used for training, and the average over all runs is the final result. This gives a more reliable estimate than a single split. I have seen many machine learning packages use this method, but I have not used it in my own research. I will try it in the forthcoming NLP experiments.
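
The cross-validation procedure can be sketched roughly as follows. This is only an illustration under my own assumptions: the names Fold, k_fold_split, and mean_score are made up, the items are assigned to folds round-robin, and a real experiment would normally shuffle the corpus first.

```cpp
#include <cstddef>
#include <vector>

// One fold: indices of the training items and of the held-out test items.
struct Fold {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Split item indices 0..n-1 into k folds. Item i is held out in fold i % k,
// so every item is used for testing exactly once across the k folds.
std::vector<Fold> k_fold_split(std::size_t n, std::size_t k) {
    std::vector<Fold> folds(k);
    for (std::size_t f = 0; f < k; ++f) {
        for (std::size_t i = 0; i < n; ++i) {
            if (i % k == f) {
                folds[f].test.push_back(i);
            } else {
                folds[f].train.push_back(i);
            }
        }
    }
    return folds;
}

// The final result is the mean of the per-fold scores.
double mean_score(const std::vector<double>& scores) {
    if (scores.empty()) return 0.0;
    double sum = 0.0;
    for (double s : scores) sum += s;
    return sum / static_cast<double>(scores.size());
}
```

With k = 5 or k = 10, each fold holds out 20% or 10% of the data, which matches the usual share of test data mentioned above.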

The importance of baseline:
How should we measure the performance of an experiment? I believe we must consider the difficulty of the problem. There is a concept called the baseline: the performance obtained by the simplest method. Take part-of-speech tagging: if you always choose the tag that occurs most often with each word in the training corpus, you achieve about 90% accuracy. It is amazing. The reason is that more than half of the words in the corpus have only a single POS. Therefore, we can use this method as the baseline for evaluating more complicated algorithms. Unless your method scores well above 90%, it is not really effective. Alongside the baseline, there is another concept I want to share with you: the upper bound. It is the best result achieved by humans. The closer your method comes to the upper bound, the better it is.
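
A most-frequent-tag baseline of this kind might look like the sketch below. The class and function names are hypothetical, and the fallback for unknown words (the most frequent tag overall) is my own assumption, not something taken from the book.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using TaggedWord = std::pair<std::string, std::string>;  // (word, tag)

class BaselineTagger {
public:
    // Count how often each tag occurs with each word in the training data.
    void train(const std::vector<TaggedWord>& corpus) {
        for (const TaggedWord& tw : corpus) {
            ++counts_[tw.first][tw.second];
            ++tag_totals_[tw.second];
        }
    }

    // Return the tag seen most often with this word. For unknown words,
    // fall back to the most frequent tag overall (an assumption of this sketch).
    std::string tag(const std::string& word) const {
        auto it = counts_.find(word);
        return best(it != counts_.end() ? it->second : tag_totals_);
    }

private:
    using TagCounts = std::map<std::string, int>;

    // Pick the tag with the highest count; ties go to the alphabetically first tag.
    static std::string best(const TagCounts& counts) {
        std::string best_tag;
        int best_count = 0;
        for (const auto& entry : counts) {
            if (entry.second > best_count) {
                best_count = entry.second;
                best_tag = entry.first;
            }
        }
        return best_tag;
    }

    std::map<std::string, TagCounts> counts_;
    TagCounts tag_totals_;
};

// Fraction of test tokens whose predicted tag matches the gold tag.
double accuracy(const BaselineTagger& tagger, const std::vector<TaggedWord>& test) {
    if (test.empty()) return 0.0;
    std::size_t correct = 0;
    for (const TaggedWord& tw : test) {
        if (tagger.tag(tw.first) == tw.second) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(test.size());
}
```

The accuracy of this tagger on held-out data is the number a more complicated tagger has to beat.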
---------------------------------------------------------------------------------

August 3, 2005

Reading(5): Syntax and its Analysis

I am now doing some research on dialogue modeling and management. The more I learn, the more important Natural Language Understanding seems to me. This evening we have the weekly Ping Pong club activity at BUAA, so I read the book in the afternoon after our project discussion.

---------------------------------------------------------------------------------
Pages: 31~40 of Natural Language Understanding, second edition, by James F. Allen, 1995

How do we compute the syntactic structure of a sentence? Two aspects must be considered: the grammar of the language and the parsing technique. For a new grammar, we care about its generality, selectivity, and understandability.

How can we check whether a group of words forms a particular grammatical constituent? There are two ways. One is to construct a new sentence in which the group is conjoined with another constituent of the same type. This is a very good test because, in most cases, only constituents of the same type can be coordinated. The other way is to insert the group into other sentences and check whether it can serve as the same kind of constituent there.

There are two kinds of parsers: top-down and bottom-up. They work the same way as the parsers in programming-language compilers, so I only mention them briefly here.

When studying grammar, you find that each grammatical constituent has many kinds of usage. Therefore you should test a grammar against many examples to check whether it holds. Meanwhile, you must revise the grammar to cover new forms. With limited background knowledge, such revision is necessary. However, there is a key point: you must carefully review each new rule and its relation to the existing ones. I believe it is the same as in our rule-based system. Sometimes you add a new rule for a special situation, and then many counterexamples to it turn up. So you must do it very carefully. I believe natural languages such as English and Chinese originally had no grammar; linguists added rules for them based on observation. This process is a great achievement of mankind.
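
To make the top-down idea concrete, here is a small recursive-descent recognizer. The three-rule toy grammar, the lexicon format, and the class name are all my own assumptions for illustration and are not taken from the book. A bottom-up parser would work in the opposite direction, starting from the words and combining them into larger constituents.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy grammar (an assumption of this sketch):
//   S  -> NP VP
//   NP -> Det N | N
//   VP -> V NP | V
class TopDownRecognizer {
public:
    // The lexicon maps each known word to a single category: "Det", "N", or "V".
    explicit TopDownRecognizer(std::map<std::string, std::string> lexicon)
        : lexicon_(std::move(lexicon)) {}

    // A sentence is accepted if S can be expanded to cover every word.
    bool accepts(const std::vector<std::string>& words) {
        words_ = words;
        pos_ = 0;
        return parse_s() && pos_ == words_.size();
    }

private:
    // Consume the next word if it belongs to the given category.
    bool match(const std::string& category) {
        if (pos_ >= words_.size()) return false;
        auto it = lexicon_.find(words_[pos_]);
        if (it == lexicon_.end() || it->second != category) return false;
        ++pos_;
        return true;
    }

    bool parse_s() { return parse_np() && parse_vp(); }

    bool parse_np() {
        std::size_t start = pos_;
        if (match("Det") && match("N")) return true;
        pos_ = start;  // backtrack and try the second NP rule
        return match("N");
    }

    bool parse_vp() {
        if (!match("V")) return false;
        std::size_t after_verb = pos_;
        if (!parse_np()) pos_ = after_verb;  // the object NP is optional
        return true;
    }

    std::map<std::string, std::string> lexicon_;
    std::vector<std::string> words_;
    std::size_t pos_ = 0;
};
```

The parser starts from the goal symbol S and expands rules until it reaches the words, which is what "top-down" means; each grammar rule becomes one small function.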

August 2, 2005

Reading(4):Linguistic background knowledge of English syntax outline

From my reading experience these days, I have found that my daily time budget is fixed. If I work in the morning and afternoon, I spend the evening reading; if I read in the morning, I can work in the evening. As a working principle, I chose to finish the reading task first.

---------------------------------------------------------------------------------
Pages: 21~30 of Natural Language Understanding, second edition, by James F. Allen, 1995

These ten pages are the main body of chapter 2, 'Linguistic Background: An Outline of English Syntax.' The chapter mainly introduces English phrase structure. It covers noun phrases, sentences, prepositional phrases, adjective phrases, and adverbial phrases, all of which concern English syntax. So it felt like reviewing the English I learned in high school and in my freshman year as an undergraduate. I also gained some new understanding of English noun phrases. From my experience with the ACE EDR evaluation, the most difficult task was noun phrase identification, because there are so many kinds of cases. Section 2.4 gives a deeper introduction to noun phrases. I believe it addresses the same problem as in the ACE evaluation, so I should pay more attention to it.

There is some related reading on English syntax. Baker (1989) gives an excellent summary. The most comprehensive materials are the books that try to describe the whole of English structure, such as Huddleston (1988), Quirk et al. (1972), and Leech and Svartvik (1975).

Reference:
Baker, C. L. 1989. English Syntax. Cambridge, MA: MIT Press.
Huddleston, R. 1988. English Grammar: An Outline. New York: Cambridge University Press.
Quirk, R., S. Greenbaum, G. Leech, and J. Svartvik. 1972. A Grammar of Contemporary English. New York: Seminar Press.
Leech, G., and J. Svartvik. 1975. A Communicative Grammar of English. Singapore: Longman Singapore Publishers Ltd.
---------------------------------------------------------------------------------

Tomorrow I will start reading chapter 3, Syntax and its Analysis.

August 1, 2005

Reading(3):Architecture, Word, and Noun Phrase

This afternoon, we four VSs had a group discussion with Prof. Zhou. We finished at 18:30, which was late for supper. We went to the Luck of Four Seasons restaurant on B1 of the Sigma Building. The four of us were so hungry that we ate quickly and ate our fill. After a tiring day, we all wanted to return to BUAA to rest. But to keep my reading habit, I stayed here and did my third reading session of Natural Language Understanding.

---------------------------------------------------------------------------------
Pages: 11~20 of Natural Language Understanding, second edition, by James F. Allen, 1995

NLU System Architecture:
The book is organized around three layers: syntactic structure, logical form, and the final meaning representation. The intended meaning of a sentence depends heavily on its context. The context-independent meaning of a sentence is its logical form, which encodes the possible word senses and expresses the relations between words and phrases. The final meaning representation is a general knowledge representation. Mostly, first-order predicate calculus (FOPC) is used as the representation language, because it is well defined and well known. The process of mapping the syntactic structure and logical form to the final meaning representation is called contextual processing. It includes coreference and anaphora resolution, analyzing the tense and the new information of the sentence, determining the speaker's intention, and the reasoning needed to interpret the sentence.
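
As a small illustration of what an FOPC expression looks like, here are two sentences written as formulas. The sentences and the predicate names (Student, Likes, Dog, Saw) are my own examples and are not taken from the book, whose notation may differ.

```latex
% "Every student likes programming."
\forall x\, \bigl(\mathit{Student}(x) \rightarrow \mathit{Likes}(x, \mathit{Programming})\bigr)

% "Jack saw a dog."
\exists y\, \bigl(\mathit{Dog}(y) \land \mathit{Saw}(\mathit{Jack}, y)\bigr)
```

The quantifiers and connectives have fixed, well-understood meanings, which is one reason FOPC is a convenient target representation.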

On to chapter 2, the outline of English syntax
Word and noun phrase:
Words have many properties, such as tense, derivation, number, and gender. Noun phrases are used to describe things, for instance objects, places, concepts, events, and qualities. The most basic noun phrases are single pronouns, such as ‘he’, ‘she’, ‘they’, ‘you’, ‘me’, ‘it’, and ‘I’. Another form is the proper noun, such as "John" and "Rochester". Determining the boundary of a noun phrase is very complicated. In the ACE 2004 evaluation we tried our best but still achieved low performance. There are so many cases that the best approach is one based on statistical analysis. A noun phrase may include specifiers and qualifiers. Specifiers can be categorized as articles, demonstratives, possessives, wh-determiners, and quantifying determiners. Qualifiers include adjectives and noun modifiers. Nouns have two number forms: singular and plural.
---------------------------------------------------------------------------------

On working days I only have the evening for keeping my reading habit, so I cannot exercise every day. I might change my daily routine. Maybe exercising in the morning is the better way. I could try a morning exercise habit alongside my reading habit. Just another habit. Why not try again, following the rules in my blog on how to keep a habit. ^_^
