December 31, 2003

New Year

This is the last day of 2003; tomorrow a new year begins.

A new year will bring a new outlook.

I'll try again!

December 30, 2003

First Volume of FSNLP

Late last night I finished reading the first volume of FSNLP.
This morning I exchanged it for the second volume.

The first volume gave me an overview of FSNLP. I have found that FSNLP is a very good book for beginning researchers who want to become familiar with NLP.

I have stuck with the Adding Knowledge Program and gained a little.

I will keep on.

December 29, 2003

YOCSEF Meeting

This morning I went to the YOCSEF assembly room in the administration building.
After we had set up the room, the chairman, Prof. Zhao Tiejun, declared the YOCSEF meeting open. This is the second time I have attended YOCSEF.

The subject of this YOCSEF was the Digital Olympics and multilingual information processing. Three specially invited guests described the blueprint for the Olympics and the greatest current difficulties.

I now understand the general picture of the Digital Olympics. I also found that information processing is becoming more and more useful, and that the prospects for our lab are very bright.

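December 28, 2003

Finding My High School Alumni

This evening in the lab I suddenly got a phone call from someone who said he had graduated from the same high school as I did. After a moment I realized it was the sophomore from my high school, who had tracked me down with the help of a junior in the Computer Science department. I asked him to come over right away.

We talked a lot: about his studies, work, and life, and about mine. From him I got the contact details of several other alumni of our high school who are at HIT. I had heard before that a few such alumni were here, but I had never managed to find them. Today I finally did. I just spoke on the phone with one of them, a junior in the Chemistry department. It turns out they all knew about me; they just had never been able to find me.

I want to arrange a time for us all to get together and share our feelings for our hometown.

Like sweet rain after a long drought, like meeting an old friend in a distant land. Hometown ties are always the truest. I am moved……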

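December 27, 2003

Thoughts after Watching 《手机》 (Cell Phone)

Back in the dorm, I watched 《手机》 with my classmates.

First, I admire Ge You's acting.

Second, the theme of the film is heavy. A line from the film: "Close, too close, so close that people can hardly breathe." At the end, when Yan Shouyi's niece shows him the phone's precise global positioning and instant photo features, Yan Shouyi is stunned.

Technology is a double-edged sword. Compared with the not-so-connected era of my childhood, information now reaches every corner, and humanity has certainly made great progress. But as we progress, have we also lost a great deal……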

December 26, 2003

Rough Set Theory

Our WSD research group will run some experiments with rough sets. Mr.Lzm believes that rough sets can be used very effectively for WSD.

So I have been assigned to read some materials on RS.

First I read Knowledge Discovery, published by Tsinghua University Press. But the book does not cover enough, so I found some papers on RS.

I have learned that reading a survey article is the fastest way to become familiar with an area. I read a survey paper on RS, and its key points are laid out very clearly.

December 25, 2003

Studying Li Juanzi's Paper

This morning I read Li Juanzi's paper 《语言模型中的一种改进的最大熵方法及其应用》, published in the JOURNAL OF SOFTWARE.

In the paper, she uses an improved method that combines maximum entropy, mutual information, and the Z-test to select the best context features for a polysemous word, and then uses the IIS algorithm to optimize the parameters of the linear model for Word Sense Disambiguation.

The experimental results show the advantage of this approach. But I think the paper has two flaws. First, there are not enough WSD experiments. Second, the Z-test is usually applied to large samples that follow a normal distribution. The paper carries an implicit hypothesis that the mutual information between the feature set and the sense category of a polysemous word is normally distributed, and the experiments do not verify this hypothesis.

I think the experiments could be more thorough, and they should include a test of this hypothesis.

OK, here is an idea: we could arrange some small experiments on our current corpus to run such hypothesis tests. Hmm, this idea needs more thought.
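
One possible way to run the kind of check proposed in this entry is a Jarque-Bera normality test on the mutual-information scores before trusting a Z-test. The Python sketch below is only an illustration, not the method of the paper: the function names are made up, the input is assumed to be a plain list of scores, and the test is asymptotic, so it is only meaningful for large samples.

```python
def jarque_bera(samples):
    """Return (skewness, kurtosis, JB statistic) for a list of numbers."""
    n = len(samples)
    mean = sum(samples) / n
    # Central moments in population form (divide by n).
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    # A normal distribution has skewness 0 and kurtosis 3.
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb


# 5% critical value of the chi-square distribution with 2 degrees of freedom.
CHI2_2DF_95 = 5.991


def looks_normal(samples):
    """True when the JB test does not reject normality at the 5% level."""
    return jarque_bera(samples)[2] < CHI2_2DF_95
```

If `looks_normal` returns False for the scores of a given word, the Z-test assumption would be doubtful for that word and a distribution-free test might be the safer choice.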

December 24, 2003

Received Teacher Li Juanzi's Doctoral Dissertation

Many thanks to Teacher Li Juanzi for her kind help!

December 23, 2003

A Greeting Card from Korea

One of my MCM teammates from last year, who is studying for her graduate degree at Puxiang University in Korea, mailed me a Christmas card.

She told me a lot of news about her studies and life. She usually ranks first in her class. She won't come back for this Spring Festival. Instead she will go to Seoul University to meet her friends. She has a long-term plan to publish two or more international papers, and she is very confident about it. I believe she can do it.

I also told her about my recent situation and my work over the past few months. She told me to study harder, practise more, and prepare to achieve more.

I wish her all the best in her studies and life in Korea, and a good trip to Seoul.

December 22, 2003

Exciting news!!

There is exciting news: our lab has taken first place in a recent evaluation.

I'm excited. Dr.Tliu tells us that we should have the self-confidence to do everything better.

Great!


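December 21, 2003

Upgrading the Matlab Programs

Mr.Lu asked me to run further neural-network word sense disambiguation experiments under the 10-10, 3-3, 3-7, 7-3 and other configurations.
The last 5-5 experiment took me a whole day and needed constant manual switching, which was a real nuisance.
This round is four times the size of the 5-5 experiment, so the old semi-automatic approach is definitely out. I therefore spent several hours writing Matlab programs that run the experiments fully automatically.

Sharpening the axe does not delay the woodcutting. At 4:30 in the afternoon I finished the programs, normalized the whole 10-10 corpus, and started the run on the server. The 21 experiments for pseudo-word 01 finished in about three hours.

This reminded me once again of the benefits of full automation. Compared with the constant manual switching last time, this run needed no manual intervention along the way, and efficiency improved greatly.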
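
The shape of such an unattended batch run can be sketched in a few lines. The original programs were written in Matlab and are not shown here; the Python below is only an assumed outline, and `run_experiment`, the configuration labels, and the pseudo-word name are hypothetical placeholders.

```python
import itertools


def run_experiment(config, pseudo_word, run_index):
    """Placeholder for one training-and-evaluation run (hypothetical).

    A real version would train the network and return its accuracy.
    """
    return {"config": config, "word": pseudo_word, "run": run_index}


def run_all(configs, pseudo_words, runs_per_word):
    """Run every (config, pseudo-word, run) combination with no manual steps."""
    results = []
    for config, word, run in itertools.product(
            configs, pseudo_words, range(1, runs_per_word + 1)):
        results.append(run_experiment(config, word, run))
    return results


# Four configurations, one pseudo-word, 21 runs each.
results = run_all(["10-10", "3-3", "3-7", "7-3"], ["pseudo01"], 21)
```

Collecting every result in one list means the whole batch can be written to a file at the end, so nothing depends on someone watching the run.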

December 20, 2003

Studying VC++ again!

Visual C++.NET is somewhat different from VC++ 6.0.

I think so.

December 19, 2003

Visual C++

In yesterday's Adding Knowledge Program session I only read some basics of Visual C++. But this evening I wrote a small MFC program by following the guide in the book.

I was very happy once I understood some of the code in this program.

MFC in .NET is even nicer, because building a desktop program with it feels like Visual Basic.

Great. Learning to program takes a lot of practice. I'll keep on.

December 18, 2003

My Adding Knowledge Program

I have now been in IRLab for nearly five months, and I have gradually discovered the gaps in my knowledge. I summarize them in three main areas: general English ability, Visual C++ programming ability, and the fundamentals of Natural Language Processing.

I spent a whole week analyzing them. After this fairly systematic analysis I drew up the Adding Knowledge Program, aimed at these three areas.

When I added up the daily study time, I was somewhat astonished: it comes to five hours. But after careful thought, I found it feasible.

The detailed program was settled yesterday evening, and I carried it out this evening. So far it seems reasonable for me.

I like the program. I believe I can stick to it.

December 17, 2003

Deeper Research into Our WSD Method

Our method has achieved good results. But our paper contains many points that should be researched in more depth.

Dr.Tliu discussed them in detail with Mr.Lzm and me. We reached many constructive conclusions that can guide deeper research on these points for WSD.

I should run some constructive experiments for our research.

Just so.

December 16, 2003

Dr. Rahmat Shoureshi's visit

This afternoon, at about 4 o'clock, the American guest Dr. Rahmat Shoureshi visited our computer department. When he came into our lab with our dean, I gave the demos for him.

I had been preparing for his visit for three hours, but I was still somewhat nervous. When I gave the Dependency Parsing demo, the visitor seemed confused, so Dean Xu had me switch to another one. I then gave the demo of the English Writing Assistant and delivered many sentences I had prepared. In the end he understood the main idea of the demos.

I am very happy to have had this chance to communicate with Dr. Shoureshi.

December 15, 2003

Rough Set theory

Today I read through Dr.Chen Qingcai's dissertation, which uses rough set theory very often.

It can be used to convert pinyin into words and to reduce the set of baseline words used for computing word similarity.

A doctoral dissertation is very rich in content, and I found it difficult.
At the same time I found that grey systems, like rough sets, could be widely applied in Chinese Information Processing. But the foundations of Chinese Information Processing and grey systems would need to be combined, and to do that I should learn more about both.

Try, and try again.

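December 14, 2003

Studying Chen Qingcai's Dissertation, Part 1

It is very rich in content. I only got through 35 pages today and will continue tomorrow.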

December 13, 2003

Getting On with the Experiments

At 8:30 I began the one hundred and sixty-eight experiments. I divided them into four parts. I followed the approach I had used in the science and technology innovation contest: vary the number of iterations and observe the trend in the results.

I ran them one by one and wrote the results in my notebook.

At 4:30 this afternoon I finally finished them. I was very happy.

Mr.Lzm looked over the results and asked me to study Dr.Chen Qingcai's dissertation and try to use rough set theory.

Rough set theory is very good. I think it is as useful as grey system theory. I will compare them and use them in WSD.

December 12, 2003

Detailed experiments for NN-WSD

Based on the results of yesterday's full experiments, and after discussing the detailed experiments with Mr.Lzm, I began the series of experiments on corpora of various sizes.

The difficult part of the series was converting the original corpus into the format for Matlab.

Once that was done, I ran the programs and got the results quickly.

When I handed today's results to Mr.Lzm, he gave me many suggestions for improving the experiments. So tomorrow I will run careful experiments on five computers.

It will take a whole day.

December 11, 2003

Full Experiment of WSD

To determine the optimal internal structure of the network, I drew up a complete experiment plan.

As in my science and technology innovation experiments, I settled the parameters one by one.

Just now I finally completed the whole plan. My conclusions: the magnify parameter is 9, err_goal is 0.3, one hidden layer is both necessary and sufficient, and the hidden layer has 12 nodes.

I originally thought that two hidden layers would be the best choice. But my test experiments showed that I was wrong.

Experiment is the best tool for testing your ideas.
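
Settling parameters one by one, as described in this entry, can be sketched as a simple one-at-a-time search. The Python below is an assumed illustration rather than the Matlab code actually used: `tune_one_at_a_time` and `toy_score` are hypothetical names, and the toy scoring function merely stands in for a real training-and-evaluation run.

```python
def tune_one_at_a_time(evaluate, grid, start):
    """Hold all parameters fixed but one, keep that one's best value, move on."""
    best = dict(start)
    for name, candidates in grid.items():
        scores = {}
        for value in candidates:
            trial = dict(best)
            trial[name] = value
            scores[value] = evaluate(trial)
        # Keep the candidate with the highest score for this parameter.
        best[name] = max(scores, key=scores.get)
    return best


def toy_score(params):
    """Stand-in for a real experiment; peaks at magnify=9, hidden_nodes=12."""
    return -((params["magnify"] - 9) ** 2 + (params["hidden_nodes"] - 12) ** 2)


grid = {"magnify": [3, 6, 9, 12], "hidden_nodes": [4, 8, 12, 16]}
best = tune_one_at_a_time(toy_score, grid, {"magnify": 3, "hidden_nodes": 4})
```

This costs far fewer runs than trying every combination, at the price of possibly missing settings where two parameters only work well together.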

December 10, 2003

O-O for Software Engineer

This evening I began the course O-O for Software Engineer.

In class I found that the teacher's style is very different from that of most Chinese teachers, because she has returned from America after five years of working overseas. My friend and I discussed it and came to the same view: the American teaching style is to make the problem simpler and simpler. The questions we are asked are very easy and can be answered quickly. The teaching uses many examples but lacks a clear logical thread.

By the end of the class I had begun to adapt to her teaching style.

Great! I will keep studying and going to the classes.

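December 9, 2003

Continuing from Yesterday

This evening, as planned, I kept reading 《自然语言理解-计算机能思维吗》. Although I am now halfway through the book by thickness, I clearly feel that what I have learned is far less than half of it, perhaps only a tenth. I skipped some of the puzzling or tedious passages.

I now feel the book is very well written and well suited for careful study by students like us who are just getting into Chinese information processing.

My biggest takeaway from the book today: the essential reason syllogisms are valid is the transitivity of the inclusion relation.

One more point: grammars include traditional grammar, structural grammar, phrase structure grammar, transformational-generative grammar, case grammar, and CD (conceptual dependency) theory.

This book must be read a hundred times before its meaning begins to show. That is how it feels.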

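December 8, 2003

《自然语言理解-计算机能思维吗》

Yesterday I began studying 《自然语言理解-计算机能思维吗》, written by Teacher Wang Kaizhu in 1995. The book is thin but incisive, and it gave me a lot to think about.

Some excerpts:

After a long process of social evolution, today's eight major language families have formed: Sino-Tibetan, Indo-European, Afro-Asiatic, Altaic, Uralic, Niger-Congo, Malayo-Polynesian, and Dravidian.

Three views of natural language understanding: the systems engineering view, the hierarchical structure view, and the view of one-way flow between layers.

The speech chain between the two parties in a dialogue: thought level -> physiological level -> physical level -> physiological level -> language level -> thought level

All of these are taken from Chapter 1. On careful reflection, they really are thought-provoking.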

December 7, 2003

The adjusting period

This afternoon our lab held its weekly report meeting. It was the turn of our Dic-Constructing group to report on our work and difficulties.

Zhu Liuliu reported first, and then I gave two demos of our Cup-Dic. Admittedly, our data files and demos have many shortcomings. But I am confident we can make them better under the guidance of Dr.Tliu and Mr.Lzm.

Once you find the errors, you can solve them one by one. Our group has found the shortcomings in our work, and we will fix them one by one. So I think we will do better after some time.

After these busy days, I have also found some flaws in my working style and in how I organize my life. My analysis and conclusions are as follows.

First, my attention is easily disrupted. If an urgent task comes up that I must finish within a few days, I tend to drop my earlier plan.

Second, my mood is easily affected by my current workload. A week ago I had to finish three big tasks in parallel within a few days. While working so hard, I was not happy. But after the busy period I see that I could have done them better with better planning.

Sharpen the saw. I must plan carefully and work through tasks one by one every day. Every morning I should plan the day's tasks in detail. In the evening I must check them off one by one and reschedule whatever is left. I should do the same every week and every month.

Start at once.


December 6, 2003

Construction of the Dictionary

This weekend it is the turn of Zhu Liuliu and me to report the latest progress on the construction of the dictionary.

Zhu Liuliu and I have been preparing the materials for the report. I outlined the report in two main parts. The first, for Zhu Liuliu, covers the construction of the underlying data files and the similarity of Chinese words. The other, for me, covers the two demos of IdeaNet and Cup-Dic, the shortcomings, and the next steps.

I have finished my part, and Zhu Liuliu has just finished too.


December 4, 2003

Exciting News from Dr.Tliu

These days our lab is devoting itself to the big project of the National center.

And Dr.Tliu gave us some very exciting news.

This afternoon Dr.Tliu came into our room with news of success.

All of us were excited.

Yes, it is exciting. We should try even harder!

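December 3, 2003

State of Mind 2

A few days ago I wrote "State of Mind". I was in a bad mood then, because so many things were pressing down on me that I felt I could not bear it.

Writing "State of Mind" again today, what I feel is the need for a full life. The feeling grew stronger when the Software Engineering exam ended. Not much is left of our four years at university. Every day in the dorm I hear the countdown to the graduate entrance exam, and with each day that passes I silently wish my roommates well. I also notice how fast time really goes. I feel it especially after getting up in the morning, and even more strongly when going to bed at night.

Time flows by like water. We cannot change the speed of its flow, but we can live every minute in earnest. Only then will life be full, rather than busy all day with nothing to show for it.

Cherish this precious time and do not waste a single second.
That is my state of mind about time.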

December 2, 2003

Exam of S.E.

Software Engineering is the last exam among the courses that count toward the overall grade.

I reviewed for it for five days. I read through the book at least twice, and I did one practice paper and all the homework in Mrs.Wang Yuying's PowerPoint slides.

I thought I had prepared fully.

But when I sat the exam, I found it quite different from what I expected. The paper included some material on Compiler Theory and on our school's whole system. I drew heavily on my own experience of developing software in my answers.

I think I did my best. That is enough.

December 1, 2003

The last class of Communist Party College

This evening was the last class of the Communist Party College.

After the host played recordings about the Shenzhou ("Divine Boat") spacecraft and Yang Liwei, some students went up to the platform to share their feelings.

I was the third to go up. First I talked about the significance of the Shenzhou success. Then I gave my own view of it. I contrasted the small success of Shenzhou with the severe situation our nation faces. I think the success of Shenzhou is a very small part of the whole project of our nation's renaissance, and we must also focus on the other aspects of that project.

But such an undertaking is not accomplished in a day; it is a long-term effort. We must do our best at what we must do and what we want to do.

After my speech, all the students agreed with my view.

It went OK!


November 30, 2003

NN for WSD

This afternoon Mr.Lu gave us a wonderful lecture on WSD in Chinese. He thinks it is good for the new students to understand WSD, and in his lecture he added some of the newest thinking on WSD.

At the end I gave a short talk on how to design the network structure in Matlab. It covered fifteen points drawn from experience.

It went well, I think. But I found that I speak somewhat too fast.

November 29, 2003

MonthReport and Matlab for NN

When I opened my email, I found a message from Dr.Tliu asking for the monthly reports of every group.

Some days ago Mr.Lu had also told me to write some reports for him. So I prepared a monthly report on the state of my work this month.

I organized my work into four parts: Audio for IRLab, NN_Experiment_of_WSD, CUP-DIC, and WSD_Studying.

When I finished, I generated a table of contents for the report and converted it to PDF. I sent it to Mr.Lu for his opinion.

When I came here to write my blog, I found an email from Mr.Lu asking me to give a short lecture on Matlab for NN. I agreed and made the PPT at once.

OK. I am now prepared for the weekly lecture, which is the turn of Mr.Lu's WSD group.

November 28, 2003

Chat with a good teacher!

At the last class of Software Engineering, Mrs.Wang Yuying invited two good teachers who had returned from America after many years of development experience in software companies.

After the class I got the email and MSN addresses of one of them. She is enthusiastic and has helped a lot with my Software Engineering studies and my MSCVB project.

Yesterday she suggested that I take her class, Object Oriented with UML, and this evening I went to sit in. She had prepared thoroughly for all of us and opened the class with a warm-up session. Her English is very good, and she plans to teach entirely in English. I think this is a very good way for me to practise my English listening.

I have also added her on MSN. We have chatted in English several times. This evening's topic was why her Outlook Express cannot send email.

Many thanks to this enthusiastic and good teacher.

November 27, 2003

Making headway in WSD

Today I have some good news to note.

The input format for the NN-WSD experiment has been settled. After many discussions, Mr.Lu and I decided on a new kind of information to extract from the 50M corpus, and Mr.Lu has written the program for it. We have also confirmed the NN-WSD model I designed. So our experiment will produce its first conclusions soon.

I have made a WSD study plan: read one paper every day. With my computer's operating system repaired, I put it into practice today.
I read a paper by a graduate student from Dalian University of Technology. It contains some good ideas for WSD. The author defines a dynamic context window that adapts to the WSD process, together with a filter that removes inessential words from the context of the polysemous word.

One good idea for NLP: fix a small standard corpus as the basis for an experiment, and then use it to find more standard material in a large-scale corpus.

OK. I must go to MCM class now.

November 26, 2003

New Operating System

Yesterday I spent nearly a whole day changing my operating system.
But in the end there was no free space left on the C disk, so I had to install the operating system again on another disk.
This morning I formatted my D disk, which is nearly 13 GB, and then installed all my software.
During the installation I cleaned up all my documents.
Just now the installation finished and I changed my desktop theme. The new theme is cool.

Good tools are the foundation of my other work. I will do my best from now on!

November 24, 2003

Some good news

This evening we had the Communist Party College exam. After the exam I went to the review meeting for CUMCM 2003. In the end, YU qiyue, Hongweijun, and I formed a team for the 2004 MCM. That is good news for me.
After the meeting, Victor and I went to a lecture by a successful entrepreneur. The speaker said that we should keep moving forward every day, stick to what we have devoted ourselves to, and never give up easily.
Indeed, I think so too.

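November 23, 2003

State of Mind

I have been in a bad mood these past few days.
The main reason is that the scheduling of my various tasks went wrong, which has left things rather tense.
We have two exams in the next few days, but for a while I have been busy with the lab's experiment tasks and the lab introduction video. My original plan was to set aside time before the exams to review and to finish the papers I need to hand in.
But the lab's word sense disambiguation experiments then set me back further.
It looks like I will need to work overtime again.
Once these few days are over, I must adopt a strictly disciplined approach to time management to bring order to my life and work.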

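November 21, 2003

Three Days of Hard Work

For three days I have been making the IRLab introduction video. It really does require collecting many pictures and editing and adjusting them one by one. Most infuriating of all, at 11:00 last night, just as I had worked out how to generate the final video, the machine suddenly froze, and even after N restarts it would not boot into the system.

Luckily, with Carl's help I got into the system this morning. But the first version I generated was a full 9 GB, which gave us a fright. Happily, Dr.Tliu, the director of the video, is very pleased with it, except that it is too long. I need to make a cut-down version.

Ha, Victor just told me that the video can be compressed; it was 9 GB because no compression had been applied at all. The video is now being generated for the last time…………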

November 17, 2003

More tasks

I listed my recent tasks, and the table frightened me: there are eight big tasks to do.

I made a full plan to complete them and found that I will be very busy this week.

OK. I should begin with the first one.

November 16, 2003

Ulead Video Studio

Ulead Video Studio is very powerful for editing and generating video files in formats such as AVI and MPEG.

This afternoon Dr.Tliu gave me the script for the IRLab introduction. After supper I began using Ulead Video Studio to produce it.

Ulead Video Studio is very interesting. You can merge video clips, pictures, audio clips, and text into a nice, rich video file.

Just now I finished the overall framework. Tomorrow I will take some photos to insert into it, and the day after tomorrow I can invite Zsq and Wanglijuan to record the voice-over. With some time spent on revisions, I can make a very good video to introduce our lab.

It is good work.

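November 15, 2003

《生死抉择》 (Fatal Decision)

The Communist Party College course requires us to watch 《生死抉择》, and I watched the film in L001 from 6:00 to 9:00 this evening.
It moved me deeply. The film analyzes the causes of corruption and the life-and-death choice facing a Communist Party member.
At the end, Li Gaochen chooses the Party and the people, and in the end the Party and the people choose Li Gaochen too.
The theme is profound!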

November 14, 2003

Project management

Project management is very interesting and difficult.
There are many tools for project management, for example Microsoft Project.
The Gantt chart and the PERT chart are two kinds of project management chart.

Constructing a PERT chart is troublesome.
And so on……

November 13, 2003

Software engineering

It is useful and powerful.

November 12, 2003

An experiment scheme

This morning I went to 610 and discussed a WSD experiment with Mr.Lzm. We had some different views on the experiment, but we settled on the scheme in the end.
The whole scheme is now in place.

November 11, 2003

WSD and dictionary groups' progress

This morning I discussed the work of the WSD and dictionary groups with Mr.Lzm.
I listed all the problems and tasks of the two groups. We discussed how to construct the third and fourth layers, and how to run the WSD experiments.
There was good progress.
We will continue the discussion tomorrow.

November 10, 2003

Yesterday's busy day!

This entry should have been written yesterday. But yesterday I did not leave the Mathematics department office until 23:30, because all the algorithm modules of our teaching evaluation project had to be modified.
The three of us students modified all the modules and tested them one by one. By the time we had tested the last module it was 23:30, and we were all tired.
During testing we found a strange phenomenon: when we tested whether two long-type numbers with the same value were equal, the system's answer varied at random. At first we were puzzled. In the end we found that to compare two such numbers we should not use "=" directly; instead we can check that the absolute value of a minus b is less than a very small number. This was the effective solution.
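
The tolerance comparison described above can be sketched as follows. This is an illustration in Python, not the project's own code; it assumes the values involved were computed as floating point, which is what the random-looking results suggest, and `nearly_equal` and its default `eps` are hypothetical choices.

```python
import math


def nearly_equal(a, b, eps=1e-9):
    """Treat a and b as equal when they differ by no more than eps."""
    return abs(a - b) <= eps


a = 0.1 + 0.2
b = 0.3

exact = (a == b)                     # direct comparison, sensitive to rounding
tolerant = nearly_equal(a, b)        # absolute-tolerance comparison
relative = math.isclose(a, b, rel_tol=1e-9)  # standard-library alternative
```

An absolute tolerance suits values of known, modest magnitude; for values that can be very large or very small, a relative tolerance such as the one `math.isclose` uses is usually the safer choice.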

Today is Monday, as usual another busy day.
Tomorrow I will discuss some problems with Mr. Lzm.

November 8, 2003

Two surprises

Last night I found a nice paper on using neural networks for word sense disambiguation, but I did not finish reading it. This morning, when I planned today's work, I decided to read the paper first.
The idea of the paper is very nice, I think!
From context vectors we can build many input and output patterns, then train the network to find the optimal structure. After that we can feed in the open test corpus and run the simulation to get the results.
The idea is nice.

Just now I searched for "灰色" in Super Star and got a book with the word "grey" on the cover. At first glance I thought it was wrong, since I expected "gray". To check, I looked up "gray" and "grey" in a dictionary. They turn out to be the same word, but "grey system" is the usual translation of "灰色系统". Having learned this, I searched for "grey system" on the Internet. Wow! The results for "grey system" were more closely related to "灰色系统" than those for "gray system".
My conclusion: when you need to translate an English word into Chinese, or a Chinese word into English, you had better search thoroughly and in detail, or ask someone else.

November 7, 2003

WSD for research

This afternoon, when Mr.LZM came to 615 to read the Journal of Harbin Institute of Technology (New Series), we talked a lot about WSD research. He told me there are many research points in WSD. We discussed many techniques that could be used for WSD, such as neural networks, genetic algorithms, simulated annealing, grey systems, and so on. At first we did not know whether neural networks had been used in WSD. After searching the Internet, we found that they had been used abroad in the early 1990s and domestically in 2001. So I think this is a good approach for WSD, and the other approaches are also worth trying.
There is so much that could be researched.

November 6, 2003

A very busy day!

It is true that this is a very busy day!

This morning we had a whole morning of classes in VC++ and Software Engineering. At 13:00 I went to visit the Harbin Electric Machinery Factory. The factory really is famous and large. At 15:30 I came back to the lab and got on with my work. At 18:30 I went to the acceptance check of a piece of software that we have spent more than half a year on. The result was that we must modify nearly half of the evaluation algorithm modules.

Right now I must finish my Software Engineering study.

So busy……

November 5, 2003

Notice of Paper Transfer

This afternoon I received a notice from the Editorial Department of CONTROL AND DECISION about transferring my paper. They told me that because there are too many papers awaiting publication, the publishing cycle has been delayed and the acceptance rate for new papers is very low, so my paper was not accepted. But they proposed transferring the paper to the 2004 annual academic conference on control and decision. If I agree, they will forward it to the conference directly, and the conference will give it priority for acceptance.
Later I looked up some information about the conference. It is one of the six authoritative academic conferences in China, its scope includes the areas that grey systems belong to, and it is organized by six major organizations.
I described the situation to Dr.Liu and Dr.Su. Dr.Liu suggested that I not send the paper to the conference but revise it again and submit it to another journal. Dr.Su suggested that I submit it to the HIT journal.
I have thought about it for several hours without reaching a conclusion.

November 4, 2003

Design errors

This morning I met Mr.Tliu in 615. After I started my computer, he told me there were many design errors in my dictionary program. I ran the program, and he analyzed some of the errors and gave me a lot of constructive advice.
After examining the program, I tried a well-known dictionary program and found some other errors in mine.
Why were there so many errors? I think there are two reasons.
First, when I was writing it, I wanted to complete a preliminary version and then keep refining it. I still think this is a good working model for me right now.
Second, although I completed many small projects before I joined IR, my programming has many flaws, and I should improve it. We are studying Software Engineering right now, and I want to learn it well first.

November 3, 2003

An idea for WSD

At noon I studied a paper written by Changling Huang. It contains a good idea for WSD. Usually we do WSD by choosing among a number of semantic classes, but we do not build a very good language model for the words in the context. In this paper the context is fully used to construct the language model. Every word has what the paper calls an observation window. Compute the probability statistics of the words in the window with respect to the key word, and build a vector for the key word from its context. If you build the vectors for all words, you get a vector space. Then choose some typical high-frequency words and apply a clustering algorithm to obtain many sets of words.
The paper concludes that applying this method to a large corpus yields many semantic sets that agree with Cilin with an average probability of 81%.
This is a very good idea for WSD. But I think there are many other good methods for WSD. We should dig them out.
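
The window-and-vector step summarized in this entry might look roughly like the sketch below. It is a guess at the general shape in Python, not the paper's actual procedure: the window size, the use of relative frequencies, and the names `cooccurrence_vectors` and `cosine` are all assumptions, and the clustering step itself is left out.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, window=2):
    """Map each word to the relative frequencies of words within +/-window."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    vectors = {}
    for word, ctx in counts.items():
        total = sum(ctx.values())
        vectors[word] = {c: n / total for c, n in ctx.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors("the cat sat on the mat".split(), window=1)
```

A clustering algorithm could then group words whose vectors have high cosine similarity, which is one plausible way to arrive at the word sets the entry mentions.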

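November 2, 2003

Programming Task Completed

For the past two days I have been working on the design of the dictionary interface. The task Mr.Lu gave me was to provide lookup for the fifth, fourth, and third layers of the dictionary through a good interface, implemented in VB because that would be quicker.
After looking at some well-known dictionaries, I decided to use the treeview control for the fifth-layer lookup. But I had never used the treeview control before. Once again feeling my way across the river stone by stone, I solved the difficulties one at a time and finally produced the first version of the interface. Shortly afterwards I added the fourth-layer and third-layer information, and yesterday morning I completed the final version. When I showed it to Mr.Lu at noon today, every requirement had been met.
VB really is powerful, as I have now seen for myself.
Today I spent some more time learning VB programming tips. It feels like a real accomplishment.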

October 29, 2003

Manage my time!

How should I manage my time? It has been an intractable problem for me recently and will remain one for some time.

I have been trying for a long time. After 12 days of tagging for the auto-abstract task, I made a detailed plan for every day. Today is the first day on which I have carried out the plan in full.

I divide my day into three main parts.
The first is the morning, from 8:00 to 11:00, when I will study English writing techniques and some basic theory of NLP and IR.

The second is the afternoon, from 14:00 to 17:00, when I will complete the tasks assigned to me, including programming tasks and others.

The last is the evening, from 18:00 to 22:00. During these four hours I will study my major's courses and pursue my hobbies.

While completing the auto-abstract tasks, I learned something very important: when you must complete several tasks side by side, you should design your plan in detail and assign some time to each task. Only in this way can you complete them all.

I have also paid attention to my spare time around the three main parts of the day. In the morning I get up at 6:30 to do morning exercises. At noon I go to the gymnasium to play table tennis, or take a nap. At night I go to bed at 23:30 to rest for the next day.

I will carry out this life plan and keep improving it.

October 26, 2003

An Chonggen's award ceremony

Ninety-four years ago today, on October 26, 1909, An Chonggen, a national hero of Korea, shot Ito Hirobumi (伊藤博文) and gave his life for his country's independence.

I heard this story at the award ceremony for the An Chonggen Scholarship. I was greatly moved when the president told it.

After the ceremony we had a social gathering with fifteen Korean graduate students from Seoul. I chatted in English with a graduate student who does research on databases. We talked a lot about campus life, database theory, and programming techniques.

The Korean students were generous and enthusiastic.

October 25, 2003

Fourteen days' task has been completed!

This afternoon, after I handed in my last programs, which evaluate the quality of the auto-summaries, my fourteen-day task was complete.
I am very pleased, because I have learned a lot.
First, I have practised my VC++ programming. I am now more familiar with VC++, and I find it very good for handling all kinds of corpus-processing problems.
Second, my teamwork skills have been exercised once more. Nowadays teamwork is the most important thing for a team. If everybody contributes their full ability and cooperates well, I think the team can do anything!
Lastly, I have learned how fussy it is to organize many people to tag a corpus, and how to organize their work well.

I learned so much!

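October 24, 2003

A Day of Programming

Today I spent the whole day writing programs.
I finished one 378-line program without trouble and then wrote a 132-line one, but I have now been debugging it for an hour and it still has problems. I will continue debugging tomorrow!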

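October 23, 2003

Evaluation Corpus Fully Finished

At 8:00 this morning the annotators finished the tagging task completely and hurried off to class.
All at once, the feeling of the past few days that there was no end in sight was gone.
Come to think of it, if you finish each day's task every day, how could you fail to finish the whole task?

At 8:00 I began to sort out the tagged corpus. Because I had not properly understood what output the evaluation program needed, I spent quite a lot of time trying to align the sentences of the corresponding articles in the corpus by hand. The workload was far too large, since nearly 2000 articles would have needed manual alignment. In the afternoon, when I found Teacher Qin to check and accept the corpus, we talked about the evaluation program's output, and during the conversation I thought of a method that does not need manual alignment. It saves a great deal of time: if the workload I originally envisioned was 1000, with this method it is only 100. A huge saving!

Once again I have truly experienced the truth that sharpening the axe does not delay the woodcutting.

Arriving at the Lab Early

Last night I again had no time to write my blog, because the annotators were not free to come over until 9:00 in the evening, and each of them still had some articles left to tag. But they are all very responsible: at 7:00 this morning they were all back in the lab to continue tagging. So I can only make up the blog entry now.

For the practical-writing articles originally planned, each student added an average of 4 yesterday, reaching the upper limit of 18. I also had them find about 24 Olympics articles, of which 7 were rejected, leaving 17. Right now the students have not even had breakfast and are finishing the last of the task. It is really touching!

It looks like this 12-day tagging task can be wrapped up today. Once I polish the evaluation program, my part of the task will be completely finished too. :-)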

2003年10月21日

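Annotation Work Going Smoothly

For 11 days in a row I have been handling this rather tedious annotation work.
Today I finally caught sight of the end.
Because of constant interruptions, I had never managed to organise the work the annotators finished over the past few days, so the job always felt as if it would never end. Today I set aside time to sort out each day's work one by one, and whenever something was missing I asked them to fill it in right away. I soon reached the results of the eighth day, and those are now fully sorted as well. Tomorrow I only need to sort out the practical-writing articles and Olympics articles 3.

Smooth progress, worth celebrating.

It seems each day's work should be organised properly that same day. That way every day brings a sense of achievement, and the task no longer feels endless.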

2003年10月20日

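The Teacher Who Clears the Building Hasn't Come Yet

The teacher who clears the building hasn't come yet, so I can write my Blog now.

After algorithms class today I finally had time to discuss the n-best problem in dynamic programming with the teacher again.
I first reported my refined versions of both my masking-based method and the backtracking-based method the teacher had suggested, then raised some problems with the two algorithms and asked for further guidance.

The teacher proposed several ideas for reaching the global optimum through locally greedy optimal choices, and I rejected them one by one. The final answer was that the problem is too hard: I should look at existing solutions to the n-best problem and then work out an answer.

Back at the lab, the problem came up again a little later in a group discussion among several teachers. Listening in, I learned that it is currently considered a very hard problem, with no good existing solution found yet.

This problem really is hard. The masking method I originally designed is not a good algorithm. After a few more busy days I must study it properly again.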
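
As a point of comparison only, here is a minimal sketch of one textbook way to get several top-ranked paths out of a dynamic program, assuming the search space is an acyclic graph with additive edge costs: keep the k best partial paths at every node instead of just one. This is neither the masking-based nor the backtracking-based method discussed in this entry, whose details are not recorded here, and it may not fit the lab's actual problem. The function name `k_best_paths` and the toy graph are made up for illustration.

```python
import heapq


def k_best_paths(edges, order, source, sink, k):
    """Return up to k lowest-cost paths from source to sink in a DAG.

    edges: dict mapping node -> list of (next_node, cost) pairs
    order: all nodes in topological order (hypothetical input format)
    Result: list of (total_cost, path) pairs, cheapest first.
    """
    # best[node] holds up to k (cost, path) pairs reaching that node.
    best = {node: [] for node in order}
    best[source] = [(0, [source])]
    for node in order:
        # Topological order guarantees best[node] is final at this point.
        for nxt, cost in edges.get(node, []):
            extended = [(c + cost, p + [nxt]) for c, p in best[node]]
            # Merge with what nxt already has and keep only the k cheapest.
            best[nxt] = heapq.nsmallest(
                k, best[nxt] + extended, key=lambda cp: cp[0]
            )
    return best[sink]


# Toy graph (illustrative data only).
edges = {
    "s": [("a", 1), ("b", 2)],
    "a": [("t", 3), ("b", 0)],
    "b": [("t", 1)],
}
order = ["s", "a", "b", "t"]
print(k_best_paths(edges, order, "s", "t", 2))
# -> [(2, ['s', 'a', 'b', 't']), (3, ['s', 'b', 't'])]
```

With k = 1 this reduces to the ordinary single-best dynamic program. The price of larger k is that every node stores and merges up to k partial paths, so time and memory grow with k.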

2003年10月18日

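A Mistake at Work

Today I was criticised. The reason is that I did not finish my assigned task on time: the three dictionaries were due by the 15th, today is already the 18th, and I have only roughly finished one of them.

Ah! The teacher who clears the building is here again. I'll have to finish this tomorrow!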

2003年10月17日

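Busy && Feeling Fulfilled

These past few days everything has piled up at once.
I was busy all day today and only now have some time to myself.
After algorithms class this morning I went back to the lab and then straight to 618 for the talk by 许峰雄 (Feng-hsiung Hsu). What struck me most was that Deep Blue did not use any advanced artificial intelligence algorithm. It relied on hardware acceleration and fast tree search, and it does not possess real intelligence.

Oh no, the teacher who clears the building is here, so I am forced to stop.
I'll finish this tomorrow.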

2003年10月16日

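First Impressions of 张亚勤 (Ya-Qin Zhang) and His Delegation

The two news items that have stirred the campus over the past two days are the successful flight of the Shenzhou 5 spacecraft and the visit to HIT (哈工大) by a delegation led by Ya-Qin Zhang, director of Microsoft Research Asia.
At noon today they came to our lab. Director Zhang came in first, smiling, and warmly shook hands with every one of us. From that single gesture I felt the bearing of a true master.
At 7:00 in the evening I saw them again in the university's main auditorium.
Director Zhang spoke about the development and outlook of information technology, which was very encouraging.
In the "Heart-to-Heart Dialogue" that followed, the experts shared many of their own experiences and lessons, along with encouragement for us HIT students.
What benefited me most was a remark by Dr. 周明 (Ming Zhou), who does natural language processing research: "Take careful stock of yourself and see what your strengths and weaknesses are, then look at other people's strengths, then adjust yourself and keep improving."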

2003年10月15日

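Improving the Dynamic Programming Algorithm

At last week's lab forum Simply presented the dynamic programming algorithm, and afterwards everyone discussed how to modify it so that it yields several top-ranked paths instead of only the single best one.
Yesterday I discussed a masking-based method with Teacher Tliu, Teacher lzm, and senior labmate gold.
Today I saw on the bbs that gold had raised some doubts about this method. After finding him and talking it over, we arrived at an algorithm that improves on yesterday's. Its efficiency still needs to be improved further.

More digging needed!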

2003年10月14日

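Annotation Progress & Urgent Switch of Research Topic

Annotation Progress

Yesterday I worked until 11:00 at night and finally got the annotation work done. Today's work went fairly smoothly. It continues tomorrow.
It seems the daily-task-driven management style I adopted is up to the job of running annotation work done by recruited helpers.

Urgent Switch of Research Topic

Today I was notified that my research topic has been switched to word sense disambiguation. After talking with Teacher Lu, I found that this field has a lot of room for research and also requires learning many theories and algorithms. It is very challenging.
I will face the difficulty head-on.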

2003年10月12日

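Adjusting the Corpus

The corpus that I had five students from the foreign languages department collect yesterday and this morning was checked today by Teacher Qin and measured against Teacher Liu's requirements, and nearly one third of it turned out not to meet them. So I patiently had the students make up that third.

I never expected that by the time the corpus was ready it would already be 9 in the evening. After talking with Teacher Liu, we decided to start annotating tomorrow, in periods of five days each.

I still need to organise this well!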

2003年10月11日

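Collecting the Corpus

I got to the lab early this morning and set up the machines. Later, after discussing with Teacher Tliu and senior labmate Carl, we settled the detailed work plan for the students from the foreign languages department. After that came guiding them through the work.
First a demonstration, then getting the LAN working, then setting up UltraEdit. I have been busy right up to now.

It seems that organising other people to complete a job is no simple matter. It takes careful, thoughtful work!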

2003年10月10日

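Revising the Paper

My paper for the science and technology innovation project has finally been revised after many rounds. But two figures still have a problem: the comparison cannot be made out in the printed result.
I have changed them many times and they are still not right. It looks like I will have to keep revising them, tirelessly!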

2003年10月5日

Must Recover Quickly!

During the National Day vacation, our dormitory supplies electricity all day. Some of my roommates who like watching American movies enjoy themselves at night, so we can only get to sleep very late.

I think life should follow a regular routine. I should recover as soon as possible!

2003年10月4日

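Perseverance Is Victory

Today I completed my daily task of copying out Foundations of Statistical Natural Language Processing by hand.
This task really does take a lot of time, but I cannot give up. Perseverance is victory!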

2003年10月3日

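MSCVB Group

On September 28 I officially joined the HIT Microsoft Club (哈工大微软俱乐部) as leader of the VB technical group. This is the first time I have held a post in a student society. Since I have become group leader, I want to do it well: to take good care of my group members, let them learn what they really want to learn, and let their strengths come into full play.
I will work hard toward this goal.
Today the Microsoft Club held an activity for all members: "team-building training". I had heard Victor describe the team-building activities at the summer camp they attended in Beijing, and they sounded very worthwhile. So this afternoon I asked where they were meeting and went along.
In X721 we did many team-spirit activities such as word guessing, singing, and hoop jumping, and then moved to the volleyball court for a network message-passing activity. It was great fun.
The atmosphere in the Microsoft Club is very good. I am starting to like it.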

2003年10月2日

Foundations of Statistical Natural Language Processing

Today I began reading Foundations of Statistical Natural Language Processing. Having read some of it, I feel it will help me understand NLP quickly. I think it can also improve my poor English.

Thinking comes first, but doing matters most. I want to copy it out by hand, ten pages every day. So far I have written 8 pages, and I will finish today's quota later.

I think I should keep it up.

2003年10月1日

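National Day & the Chinese Men's Basketball Team

Today is National Day, a holiday every Chinese person should celebrate properly.
At around 6:00 in the evening, zsq, taozi, slchen, carl, lee, and I had a little celebration together in 615 (a pile of fruit & boiled sunflower seeds). Then we started watching tonight's Asian men's basketball championship final, China vs Korea.

The game was superb. With united, hard-fighting spirit, the Chinese team finally won the championship. It was far from easy. After losing to Korea last year and training for a year, China finally came out on top. In the last three minutes of the final quarter, the two teams were briefly deadlocked at 82:90. Then veteran Fan Bin (范斌) missed a three-pointer, Yao Ming (姚明) grabbed the rebound and passed it back to him, and Fan Bin drained a clean three. Next Fan Bin dribbled quickly to the opponents' basket and drove in to score. With his 5 straight points, China broke the deadlock. The rest of the game was played as a fight to the death: the opponents fouled again and again, and Fan Bin was steady at the free-throw line, scoring repeatedly. When the whistle blew, China had won 106:96, a fine gift for National Day.

The last three minutes were the most exciting. In those brief three minutes I deeply felt the unity and fighting spirit of the Chinese players.

I am moved by it, and proud of it!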

2003年9月29日

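Stage Summary and Plan for the Next Step

Reflection:> Time flows by like water and is gone in a hurry!

Status:>
Since March 1 this year I have been extremely busy.
The main pieces of individual work I completed are as follows:

Period | Work | Result
Mar 1 to Sep 12 | Algorithm design and module implementation for the HIT teaching status evaluation system |
Apr 1 to Apr 11 | Applied for a science and technology innovation project |
Apr 12 to May 10 | Worked on the science and technology innovation project |
May 11 to May 31 | Reviewed courses and sat subject exams |
Jun 1 to Jun 10 | University mathematical modelling contest | Third prize in the university mathematical modelling contest (a bit of a pity)
Jun 10 to Jul 14 | Reviewed for final exams | Ranked 31st in my year (slipped a little from before)
Jul 15 to Sep 10 | Completed the science and technology innovation project | First prize in the university science and technology innovation contest (effort => reward)
Aug 10 to Aug 14 | Attended YOCSEF and JSCL |
Aug 20 to Sep 29 | Completed the research internship project |
Sep 22 to Sep 25 | Took part in the national mathematical modelling contest |

Now the thought of relaxing a little has crept in. But this is the best time for a stage summary.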

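Summary:>

1. On research

Topic => review the literature => analyse => build a model => verify by programming => improve => draw conclusions
↑__________________________________↓

2. On innovation

Innovation = a solid theoretical foundation + lively sparks of thought + steady, diligent work

3. On cooperation

Almost any work in modern society is built on cooperation

4. On mathematics

Mathematics => everything

5. On perseverance

Diligence and hard work are the key to success in study. To realise an extraordinary ambition you must pay an extraordinary price!

6. On new knowledge

How does the pond stay so clear? Only because fresh water keeps flowing in from the source!

7. On physical exercise

The body is the capital of the revolution. Exercise is also a way to temper the will.

8. On learning English

English = the means of walking. Not learning English well => unable to move a single step.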


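Plan for the next step:>

1. Make up the knowledge I let slip in Software Engineering
2. Complete the graduation project
3. Study Artificial Intelligence, Machine Learning, and Statistical Natural Language Processing Models
4. Strengthen my study of mathematical modelling fundamentals and of Matlab, Word, Excel, and VC++
5. Strengthen my English study
6. Strive to exercise every day

Near-term goal: rest properly for two days over National Day, then start carrying out the plan in detail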

2003年9月25日

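Mathematical Modelling Contest

Three days and three nights again. This year's national mathematical modelling contest finally came to a close at 8 this morning.

Every time I finish a contest I come away with strong impressions. This time was no different.

I spent all three days and nights in the School of Science's big computer room. I slept less than 8 hours in total, skipped two meals, and wrote about three thousand lines of code (though only a third of it produced valid results). The conclusions seem acceptable, but the method is really not clever.

I used a combination of linear programming and simulated dynamic programming. The linear programming part was done with the linprog function in matlab. For the dynamic programming part I first tried simulation by computer program, but the program logic was too complicated and the results it produced made no sense. I then switched to working it by hand in tables, as dynamic programming with event-based steps, and completed the design that way. Compared with other teams' results, mine are about the same, so the approach seems workable.

Reflection: just as victor's Blog says, "Only when you need to use what you've read do you regret having read too little." Genetic algorithms, simulated annealing, neural networks: I had studied all of these, but none thoroughly, so I did not dare to use them lightly in the contest. Among the roughly 15 teams working on Problem B, some did use a genetic algorithm, and their partial answers matched the results of my linear programming closely.

It seems I need to study every book I want to learn in a down-to-earth way. Reading is not about quantity but about depth: every book I study, I should study until I have mastered it.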
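
For reference, the general problem form that MATLAB's linprog solves can be written as below. The entry does not record the contest model itself, so the symbols are just linprog's generic arguments (f, A, b, Aeq, beq, lb, ub), not the actual contest data.

```latex
\min_{x} \; f^{\mathsf{T}} x
\quad \text{subject to} \quad
A x \le b, \qquad
A_{\mathrm{eq}}\, x = b_{\mathrm{eq}}, \qquad
lb \le x \le ub
```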


2003年9月21日

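What Annotation Feels Like

When I first came to the lab I already knew about the "pain" of annotation.

Today I truly experienced it.

My share of the annotation is 280k in total (the same as everyone else's), about 5,200 lines.

I was rather busy for a while and did not annotate much each day. Over the last two days I started annotating in bulk, and today I did the most: a full 1,000 lines.

I am now at the point where I want to tag every named entity I see. Sigh!

Still, I did pick up some tricks along the way, such as making heavy use of macros: record a macro for each kind of tag keyword in advance, and then during actual annotation a single tag takes only a few keystrokes.

Yes! Practice brings true knowledge, and practice brings skill. I really must practise properly!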

2003年9月19日

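Table Tennis Match

To get ready for next week's table tennis match against MTLab, most of the lab's members went to the table tennis hall at noon today for ranking matches.

At first I was worrying about how to arrange the order of play. Then, together with Teacher Tliu and senior labmate Lee, I split the 14 people into two groups: a basic group (including all the girls, which may be a little unfair, :~~) and a ranked group. Each group played its own internal round robin. My first opponent was Teacher Zhang Lei (张磊) of the speech group (a female player), and in two straight games I dropped points at the crucial moments. Sigh! I need to strengthen my nerves. My second opponent was Lü Bin (吕滨, a male player from the speech group). We are about equally matched. I narrowly won the first game and felt much more relaxed, lost the second 12:14, and in the third I tightened my defence and won 11:8. It seems table tennis takes not only technique but also strong nerves!

My third opponent was Teacher Tliu. Really there was no need to play: whenever I have played Teacher Tliu before, I have always lost by a wide margin (unless he went easy on me, :~~). But we played anyway. In the first game, thanks to my recent practice on long backspin strokes, I held him to 8:11 (a narrow defeat, :~~). In the second game I fell apart, losing 6 points in a row before getting one back, and was crushed 3:11. Teacher Tliu is really good. Once I have practised more, I will challenge him again.

My fourth opponent was Truman. He does not know how to return backspin, so I won.

The others' matches were fierce too. But at 1:45 we had to leave the table tennis hall because some people had class.

Sigh! We had not finished, and the ranking is still undecided. The scores so far are: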
Lee 4
yuchen 2
stream 3
Carl 0
zhanglei 2
lvbing 1
Truman 1
Bill 2

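It looks like we need to discuss together how to handle this!

Today's takeaway: whatever you do, keep your composure. That is often the most crucial thing.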


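Running

I have known the benefits of running since high school. Three years of "forced" morning runs there greatly improved my fitness. I kept it up for two years from my freshman year. In my third year, study and other work kept me too busy, so I had little time to run, and for the past few months I hardly ran at all.

After finishing the science and technology innovation project I felt my fitness had dropped a lot, so I have gone back to my old trade.

I get up at 6:30 every morning, run 3 to 5 kilometres, do some stretching, and am back at the dorm by about 7:10. I tidy up and head out. I leave the dormitory at about the same time as before, but the feeling is completely different: even walking feels stronger than it used to. I have kept this up for 5 days in a row, and my spirits are better every day.

It feels like my freshman and sophomore years again.

Running is really good. Running is the start of my day. It strengthens my will and shakes my nerves out of their lazy state. Leg stretches especially: they are really stretching out the "lazy tendons". Stretch your legs every day and you won't turn lazy. :)

The key to morning exercise is persistence. This time I must keep it up.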


2003年9月18日

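Thoughts on the "Daily Post"

Today on the lab BBS I saw that a "Daily Post" has appeared in the TTS board. To quote the TTS moderator, senior labmate Simply:

"So, based on the considerations above, I had an idea. After discussing it with Angela and candywinter, I have decided to launch a "Daily Post" series in the TTS board of the bbs. Each day it will introduce one basic concept, along with where the concept comes from and the related algorithms, applications, and so on. For example, to introduce the concept of "pitch period", the post would need to cover: how did this concept come about? Why define such a concept? Where is it applied? What algorithms are there for computing the "pitch period"? Of course the post may not go into the related algorithms in detail, but any algorithm it mentions must be explained clearly in the follow-up posts.
This is my preliminary idea. Comments and discussion are welcome from members of the tts group and anyone interested in tts, so that we can all make progress together!"

This is a very good idea. It puts pressure on the moderator, who has to understand one basic concept every day and then write it up. But the moderator also benefits greatly: they get to understand one basic concept a day, and others will discuss their understanding. This format will make the moderator progress very fast, but only if they keep it up. If they cannot, all the effort is wasted.
The other participants in the discussion benefit too, because raising questions or views on top of someone else's understanding is itself a form of progress.
With something so beneficial and so harmless, I want to join in too! So I immediately launched the first post of "Daily Post: Grey Systems" in the Machine Learning board, which I am responsible for, writing up my understanding of grey systems after 9 months of studying grey system theory. After writing it I felt my understanding had deepened a little further.
It seems the "Daily Post" really is like a daily medicated plaster (in Chinese 贴 means both "post" and "plaster"): the effect is remarkable! :)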


2003年9月17日

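Moved by This


When I first came to the lab, I read Carl's Blog on Victor's recommendation. Chatting with victor one day, I learned that many people have a Blog. Yesterday, after Victor told me the address, I went there and read through all the Blogs of Victor, Carl, Stream, Bert, Lee, and Simply in one sitting. It left a deep impression.

Everyone has passing thoughts, or an idea that exists in the mind for only a few minutes, and these are precious resources. On the 5th day of my freshman year I started keeping My University Diary (《我的大学日记》). I have written every day ever since and have filled five notebooks, which record a great many of my ideas and my summaries of each stage. Keeping a diary really is useful.

After reading everyone's electronic diaries, and especially seeing the comments and pointers from others under each entry, I feel this form of diary is very good: others can see your personal ideas and opinions, and they can join in discussing them. This completely breaks the old restriction that a diary is not for others to read, and you can deepen your own understanding of a problem with other people's guidance.

It is truly a revolution in diary keeping!

Moved by this, I now have a Blog of my own, thanks to help from Carl and Victor. I am very excited. Special thanks to Carl and Victor for their help. Thank you!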