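November 30, 2004

Key elements of learning technology

Today I came across a rather classic passage on fatcat's blog, and I'm excerpting it here:

The key elements of learning technology are:
interest, information, communication, thinking, and practice

Interest: being self-motivated and curious about new technical knowledge, rather than having it force-fed by others.
Information: being able to use all kinds of tools (Google, mailing lists, forums) to find material and pick out what is useful.
Communication: being able to discuss with others and ask valuable questions, without depending on them.
Thinking: being able to think independently, innovate, summarize well, and feel your way forward.
Practice: getting your hands dirty, testing your own ideas, and putting the techniques you have learned to real use.

Cultivating these abilities during university is crucial!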

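November 29, 2004

Several theorems of machine learning

Today I studied the most central chapter of Pattern Classification, "Algorithm-Independent Machine Learning". It presents several very interesting theorems:

No Free Lunch Theorem
Ugly Duckling Theorem
Minimum Description Length Principle
Occam's Razor

Very interesting!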

November 28, 2004

GA for anaphora resolution (2)

After some searching on the Internet, I found a new paper that uses a GA for Chinese pronoun resolution. I read it quickly.

Its main idea is the same as my original one. So I learned a rule: if you have a new idea, you should implement it and publish it as soon as possible.

November 27, 2004

Making new acquaintances

This afternoon, while searching for information about anaphora resolution, I found some information about a Ph.D. student at Beijing Normal University. He focused on centering theory and had published a paper on zero anaphora resolution based on centering theory. He was in the second year of his Ph.D., and his school required two papers in core journals for a Ph.D. to graduate.

We talked a lot about anaphora resolution research. He was familiar with the linguistic knowledge. We agreed that we could cooperate on this.

Good idea. Good news.

November 26, 2004

An introduction to anaphora/coreference resolution

Dr. Tliu had asked us to give a full introduction to each research field. I was in charge of the introduction for our information extraction and summarization group. Since the group has three sub-areas, I divided the introduction into three parts: multi-document summarization, single-document summarization, and anaphora/coreference resolution. I wrote the part on anaphora/coreference resolution myself and handed the other two to people who were familiar with them.

Following the usual framework for introducing a research area, I finished it in three hours. During those three hours I found lots of material and became more familiar with the area.

November 25, 2004

Research plan for anaphora/coreference resolution

These days, one of my tasks was to submit a research plan for anaphora/coreference resolution.

Following my original idea, I divided the research into two main parts: intrasentential and intersentential resolution. Viewed another way, I divided it into four parts: combining syntactic information for resolution, combining knowledge bases for resolution, generating and optimizing the feature set, and comparing and selecting machine learning algorithms.

Only now did I realize how difficult it is to make a research plan.

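November 24, 2004

New books arrived

In the evening, all the books Carl had ordered arrived. One of them, The Art of Word Typesetting, is a book that both victor and I had been eager to read. The Chinese edition of The Elements of Statistical Learning: Data Mining, Inference, and Prediction and Following the Way of Nature: A Practical Guide to Object Orientation arrived with it.

With a pile of new books in front of me, I really didn't know where to start. Rummaging through my drawer, I also found Maxims of a Programming Master, a book I bought last semester but still haven't read. It seems that making a reading plan and sticking to it is very important. When I opened Maxims of a Programming Master, I found that by its ranking I am still at the level of a junior programmer, which is embarrassing. I clearly need to study and practice programming seriously.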

November 23, 2004

How interesting the plays were!

This evening, WF and I went to watch the graduate English drama contest in Building A on our campus.

I met our best English teacher, Mrs. Zhang, and our classmates Wang Shuting, Jiang Haiyan, and others. Some of the plays were wonderful, and I admired them. Many groups performed Journey to the West (西游记). One group performed some snippets of The Matrix Reloaded, with some very interesting fight scenes. I liked it very much.

So interesting.

November 22, 2004

So busy a day!

This morning I was told to give a presentation at the CS Club's inaugural meeting. I spent the whole morning preparing for it.

I was also asked to prepare an introduction to ACE EDR coreference.

This evening I attended the CS Club's inaugural meeting. After that meeting, from 9:30 pm, I attended the weekly IRClub meeting.

So busy a day!

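November 21, 2004

Perl 5 Programming Explained

On today's dynamic task list, the task most worth finishing was continuing to learn Perl. I have always known that Perl is excellent at text processing and well suited to many of the programming tasks in natural language processing research, but I kept putting off studying it in detail. Today I started learning Perl carefully.

I opened Perl 5 Programming Explained, which senior labmate yhb lent me, and was quickly drawn in by its power, especially its regular expressions, which are close to perfect. Looking into how Perl implements regular expressions, I found its matching approach surprisingly efficient. Regular expressions are the lifeblood of Perl, and I must master them.

Building on the basic language elements I had already learned, today I mainly studied the chapter on Perl regular expressions. I plan to use Perl to process the latest batch of corpus data I need to handle.

The fastest way to learn programming is to keep reading other people's code and keep practicing yourself. I'm confident I will master Perl's basic features soon.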
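As a rough illustration of the kind of corpus processing regular expressions are good at, here is a minimal sketch using Python's re module, whose pattern syntax is modeled on Perl's. It assumes a hypothetical corpus format of space-separated word/TAG tokens. The format, the sample sentences, and the function names are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical corpus format: space-separated tokens of the form word/TAG,
# e.g. a segmented, POS-tagged Chinese sentence.
TOKEN_RE = re.compile(r"(\S+)/([A-Za-z]+)")

def parse_line(line):
    """Return the list of (word, tag) pairs found in one corpus line."""
    return TOKEN_RE.findall(line)

def tag_counts(lines):
    """Count how often each tag occurs across all lines."""
    counts = Counter()
    for line in lines:
        for _word, tag in parse_line(line):
            counts[tag] += 1
    return counts

if __name__ == "__main__":
    sample = ["我/r 爱/v 北京/ns 天安门/ns", "他/r 来到/v 哈尔滨/ns"]
    print(parse_line(sample[0]))
    print(tag_counts(sample))
```

The same pattern carries over almost unchanged to a Perl `while (/(\S+)\/([A-Za-z]+)/g)` loop.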

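November 20, 2004

50 Must-Do Exercises to Master MATLAB

Checking today's dynamic task list, I found that the most suitable task was studying 50 Must-Do Exercises to Master MATLAB.

I opened the book and skimmed its table of contents and main content. The book is organized as unit exercises. It starts from the most basic MATLAB problems and progresses step by step to more complex problems in mathematics, signal analysis, mechanics, and electricity, so that learning MATLAB is woven into solving real problems from different fields. Each exercise introduces the MATLAB knowledge relevant to its problem, and together the 50 exercises cover essentially all of MATLAB's main features.

Each part of the book is introduced rather briefly, and compared with other books only the format differs. Still, I learned some commands and techniques I wasn't familiar with before. For example: pinv computes the pseudoinverse of a matrix (it generalizes inv); solving a linear system by matrix division is faster than computing the inverse and then multiplying; subspace computes the angle between the subspaces spanned by two vectors of the same length; roots first converts the polynomial into its companion matrix and then computes the eigenvalues, which is more reliable and accurate than the classical methods and, unlike numerical root-finding, can also find complex roots; end denotes the last element of a vector; and plot(p), when p is a real matrix, plots the elements of each column against their indices, which makes it easy to draw comparison curves.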
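MATLAB code isn't shown here, so the sketch below assumes Python with NumPy to illustrate three of the points above: solving A x = b directly versus via the inverse, the companion-matrix approach behind roots, and the angle between the lines spanned by two vectors. The function names are hypothetical. The angle uses a simple arccos formula that gives the same result as subspace for single vectors, but it is not MATLAB's own algorithm.

```python
import numpy as np

def solve_both_ways(A, b):
    """Solve A x = b directly (like MATLAB's A\\b) and via the explicit inverse."""
    x_direct = np.linalg.solve(A, b)   # LU factorization, no inverse formed
    x_via_inv = np.linalg.inv(A) @ b   # inverse first, then multiply (typically slower)
    return x_direct, x_via_inv

def roots_via_companion(coeffs):
    """Polynomial roots as eigenvalues of the companion matrix.

    coeffs run from the highest-degree term down to the constant term, as in
    MATLAB's roots; the leading coefficient must be nonzero and degree >= 1.
    """
    c = np.asarray(coeffs, dtype=float)
    c = c / c[0]                          # make the polynomial monic
    n = len(c) - 1                        # degree of the polynomial
    companion = np.zeros((n, n))
    companion[0, :] = -c[1:]              # first row: negated lower coefficients
    companion[1:, :-1] = np.eye(n - 1)    # ones on the subdiagonal
    return np.linalg.eigvals(companion)

def line_angle(a, b):
    """Angle in [0, pi/2] between the lines spanned by vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, 0.0, 1.0)))

if __name__ == "__main__":
    A = np.array([[4.0, 1.0], [2.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(solve_both_ways(A, b))
    print(roots_via_companion([1, 0, 1]))   # x^2 + 1 has only complex roots
    print(line_angle([1, 0], [1, 1]))
    # For an invertible matrix the pseudoinverse equals the ordinary inverse.
    print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))
```

The example x^2 + 1 shows why the eigenvalue route is convenient: it returns the complex pair ±i directly.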
Now I have grasped a little of what "read a book a hundred times and its meaning will reveal itself" means.

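November 19, 2004

Two reflections (两件感触)

Can 件 ("piece") be used as a measure word for 感触 ("reflection")? Perhaps it can; I'm trying it out here ^-^

Reflection one

The Machine Translation class started at 8:00 am. Today's lecture, on knowledge engineering, was given by Mr. Yang Muyun of the Machine Translation Lab. The content was pretty much the standard material of knowledge engineering, but some of the side remarks he made along the way taught me a lot. My takeaways:

1. When doing research on a particular topic in natural language processing, you must be selective about the papers you read. Many of the papers you can find are of low quality (you could call them "garbage"), so assess every paper you find, or you may waste your time. The best approach is to read the papers of the leading researchers and the papers they cite.

2. Don't evaluate your research yourself, because self-evaluation always reaches favorable conclusions; take part in public evaluations instead. Almost every subfield of natural language processing now has related evaluations. When you start doing actual work, be sure to investigate the relevant evaluation bodies and organizers, as well as potential competitors. Following and participating in these evaluations is very good for research.

3. While working on a research project, make sure to write very high-quality code, so that it is easy to maintain and to hand over to others. It seems my original ACM plan needs to keep going.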



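Reflection two

At 2:30 pm, Lei Jun, president of Kingsoft, together with the technical directors of Kingsoft PowerWord and Kingsoft Antivirus, held a short discussion with the members of our lab. It mainly consisted of self-introductions by the lab members and a talk by Mr. Lei.

From Mr. Lei I learned that a CMU Ph.D. usually takes five to eight years to graduate, and that students need to write 100,000 lines of code before starting their Ph.D., so every CMU Ph.D. graduate is very strong at programming. Research requires very solid programming skills; otherwise you cannot write useful code or even do effective research. I haven't put much effort into programming so far, and now it seems the ACM plan I started a few days ago must be carried out seriously and thoroughly.

Like other large software companies, Kingsoft emphasizes the maintainability of the software architecture, the maintainability of the code, and the completeness of the documentation. Although I learned about these in software engineering courses, I haven't really felt their importance in my own research work. I need to build up my abilities in these areas.

Summary of the two reflections
Research requires writing code, and writing it very efficiently and accurately, with software engineering standards used to guide it and control its quality.

Now that I know this, I'd better put it into practice ^-^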

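November 18, 2004

Combinatorics exam

The combinatorics exam took place in the evening as scheduled. Since it was open-book, it wasn't as stressful as the math exams of my undergraduate days. Everyone had brought their books and looked relaxed before the exam.

The questions and question types were all familiar, but one question was worded in a puzzling way and took some pondering to understand. It was as if the examiner had set up many checkpoints for us to break through one by one. It felt like solving a puzzle, which was fun.

After the exam I felt more relaxed, but back in the lab I immediately listed 22 tasks to finish. I need to organize them carefully.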

November 17, 2004

Science and Technology Philosophy exam

This evening, we took the Science and Technology Philosophy exam.

Frankly speaking, this exam was long for us. As far as I know, all of us had been writing for nearly two hours. When I handed in my exam paper, my right hand felt a little tired.

We had to start preparing for the next day's Combinatorics exam.

November 16, 2004

Some nice books

While studying in the campus library, I went to the second floor to browse the books. I found some nice ones and borrowed them: Program Generators with XML and JAVA, Introduction to Management Science: A Modeling and Case Studies Approach with Spreadsheets, and The Bible of Visio 2000.

I planned to read them after the two exams.

November 15, 2004

IRClub weekly meeting

This evening we held the weekly IRClub meeting. Since it was exam season, many members were absent.

All eight members who came introduced their work and progress. One piece of good news was that the mp3 and pdf information extraction modules had been finished quickly. The research subgroup had chosen their supervisors and would begin their research.

We decided to choose some members as the main speakers for the next meeting.

November 14, 2004

Paper discussion

This afternoon, I invited Dr. Tliu, Miss Qin, Mr. Lu, and Carl to discuss the framework of my paper, which is about summarization evaluation.

The conclusion of the discussion was that I should add some experimental results comparing automatic summarization with human summarization.

This was a wonderful suggestion for supporting the ideas and results in my paper. I could finish it after the two upcoming exams.

November 13, 2004

ACM.hit.edu.cn

This is a wonderful website!

It was set up by paws, xiaoyin and xiong a year ago. I have known about it since it started, but I only began trying it this evening.

After so long without practicing programming, my programming ability was very poor (or so I thought). The first problem was very easy, and I passed it very quickly. But the second problem took me about two hours, because I had forgotten so many of the C language's restrictions.

Finally, my program passed within the limit.

What a wonderful and useful website. I plan to solve one problem per day.

Thanks to Paws for his recommendation.

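November 12, 2004

Syntactic parsing for anaphora resolution

I have read quite a few papers and have many ideas waiting to be implemented. One of them is to make use of syntactic parsing, but Chinese syntactic parsing in China is still not very good. Senior labmate Jinshan in our lab is building a very good parser, and I look forward to building a new anaphora resolution system on top of his results. Looking forward to it...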

November 11, 2004

GA for anaphora resolution

I was looking for material related to using GA for anaphora resolution.

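November 10, 2004

The psychology of programming

I arrived at the lab at 10:00 am. Soon after, an expert in program development from Kingsoft gave us a training session on the psychology of programming.

The novelty of the topic excited us. The talk began with a lot of psychology and mentioned the book Genetic Epistemology, which he strongly recommended everyone read.

The psychology of programming is about taking users' psychology into account as much as possible. For example, one principle is that the number of things a person can attend to at the same time is between 5 and 7.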

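November 9, 2004

Combinatorics

As my review has gone deeper these days, I have slowly come to appreciate some of the elegance and magic of combinatorics.

Problems that were originally very complicated look simple and easy to derive from its point of view.

Still savoring it...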

November 8, 2004

Perl

I'm relearning Perl, and I'm struck by its flexibility, convenience, and elegance in text processing.

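November 7, 2004

A day of review

Exams are coming soon, so I'm reviewing as hard as I can...

November 16: Science and Technology Philosophy exam, 8 days to go;
November 18: Combinatorics exam, 10 days to go;
Early December: Pattern Recognition exam, 23 days to go.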

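November 6, 2004

King Arthur

In the evening, WF and I watched the film King Arthur at the campus cinema.

The film has grand scenes and a plot full of twists and turns; I think it's a very good film. I was moved by the spirit of Arthur and his knights, who prize freedom and dare to fight for it. I had also seen Braveheart before. It likewise shows people's longing for freedom and how hard it is to pursue, and in both films freedom is won in the end.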


November 5, 2004

The first snow of this winter

This morning, when I got up, I found a thin layer of snow on the ground. It was the first snow of this winter, and I was a little excited.

I recalled the first snow of my freshman year, when everyone in our class was excited. Many of us had snowball fights on the playground. We were happy.

It is one of the memorable pieces of my undergraduate campus life, which is now behind me.

November 4, 2004

Studying Pattern Classification

Right now all of us were busy with the Pattern Classification homework. We were all puzzled by some of the problems.

Just now I felt that we were learning mathematics instead of pattern classification. The reason, I guessed, was that our teacher presented the mathematical side rather than the applications.

Chapter nine was the core part of this book. I liked it.

November 3, 2004

Abid Khan

Abid Khan is a foreign student in our lab. He had come to our lab twice, but both times I was not there.

This evening he came again. As arranged by Dr. Tliu, he used CYH's computer. He was polite: when he came into the room, he shook hands with everyone. I introduced myself and asked him how to spell his name. His English was fluent and better than mine.

I chatted with him and introduced the library resources to him. He asked me what he should master for his Ph.D. studies, and I told him a few things.

While talking with him, I found I had forgotten some vocabulary. I should improve my English.

November 2, 2004

Doctoral English Reading activity

This evening, it was Ph.D. student Mjs's turn to give a presentation in English.

His topic was the paper Learning Random Walks for Inducing Word Dependency Distributions.

The main idea of the paper was to construct a graph of links between words from expert experience, WordNet, and other resources. Using a Markov chain, random walks are taken over this link graph, and after a number of steps the probabilities can be obtained. This method can address the data sparseness problem.
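To make the idea concrete, here is a minimal sketch of a plain fixed-length random walk over a small, made-up word link graph. The words, links, and weights are all hypothetical. This is not the learned walk model from the paper; it only shows how multi-step walks give nonzero probability to word pairs that have no direct link, which is how random walks help with sparse data.

```python
# Hypothetical link graph: each word points to related words with a link weight.
LINKS = {
    "eat":   {"food": 2.0, "apple": 1.0},
    "food":  {"apple": 1.0, "bread": 1.0},
    "apple": {"fruit": 1.0},
    "bread": {"food": 1.0},
    "fruit": {"apple": 1.0},
}

def transition_probs(links):
    """Normalize each word's outgoing link weights into a Markov transition distribution."""
    probs = {}
    for word, outs in links.items():
        total = sum(outs.values())
        probs[word] = {w: weight / total for w, weight in outs.items()}
    return probs

def walk_distribution(probs, start, steps):
    """Probability of being at each word after `steps` random-walk steps from `start`.

    Words without outgoing links would drop their probability mass in this sketch;
    every word in LINKS above has at least one outgoing link.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for word, p in dist.items():
            for neighbor, q in probs.get(word, {}).items():
                nxt[neighbor] = nxt.get(neighbor, 0.0) + p * q
        dist = nxt
    return dist

if __name__ == "__main__":
    probs = transition_probs(LINKS)
    # "eat" has no direct link to "bread", yet a two-step walk reaches it.
    print(walk_distribution(probs, "eat", 2))
```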

What a wonderful idea. This idea of randomness is a little similar to the one behind random forests. Based on it, we could do lots of things.

November 1, 2004

IRClub activity

This evening, all the members got together. The main topic was distributing tasks to the members. The tasks fell into two groups: a research group and a development group.

All the members were excited. But after the meeting, a member sent me an email saying he had decided to leave our lab. He wrote:

I must say thank you for informing me of attending the meeting.and thank the club for teaching me a lot.
however i really don't think i am quite ready for the club,also there is some other reason of myself,so i hope you will permit me of farewelling with you now.and i am also very sorry for having taking you a lot of trouble,sincerely sorry.
maybe when i am better prepared,i can learn more from you and our club and lab of IR.
again, thank you and sorry.

I replied to him as follows:
I am very glad that you have come to know yourself better. You are welcome to keep following our lab and club, and we welcome you to join us when you are well prepared.
I wish you great achievements!

A little sad for him and for our club.