
Reply to the robot attack on my blog

Recently, I discovered that a great many spam comments had appeared on my blog. Because I had not turned off the option that allows comments, lots of robots posted spam. I believe that some SEO (search engine optimization) companies use this trick to raise the Google rank of certain hyperlinks.

Formerly, I deleted the spam comments one by one. These days I deal with it the same way it deals with my blog: I use a robot named ROBOT5 to delete the comments one by one automatically, without any human intervention. With this robot I can record some operations as a macro and run the macro any number of times.

This robot software is useful for lots of mechanical operations. Wonderful!


Pattern Classification homework

This evening, I was studying in a classroom in D building. My Pattern Classification homework was not finished. After studying chapter four, I began to work on the five problems. But the first was nearly a pure mathematics problem. I thought about it for an hour without finding any solution. The other problems were easy to solve. To some extent, I believe that mathematics is very important for computer application subjects.


Google desktop search

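Today I received the latest issue of 《计算机世界》 (Computer World). As usual, I quickly skimmed through the whole issue. Two of the articles discussed Google Desktop Search. One said that Google Desktop Search is simple and efficient, that the whole program is very small, and that its remarkable way of finding material is integrated with Google's web search. In short, it was full of praise.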

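The other article discussed a question that occurs to anyone who has used this software: its security. Google Desktop Search makes it very convenient for users to find material on their machines, but the same capability also makes it a perfect spying tool, because your email, chat logs, and personal Office documents are all within its view. Once someone breaks into your machine, your personal data and privacy will be 100% stolen.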


The students club plan

Recently, our college leaders have decided to go ahead with the students club plan. The clubs are founded on some research centers and labs, under the leadership of some Ph.D. and master's students. The aim is to exercise the students' scientific research and development ability. The basic idea is to let sophomores and juniors join the students clubs early and get to know more about each research center and lab.

The results of our IRClub interviews were published this noon. I phoned the students who had passed the interview one by one. They were all excited. This afternoon, a junior who had not passed our interview came here to ask us to give him a chance, as he cherished this chance very much. Carl and I were moved by his spirit and let him take part in tomorrow's meeting.

This kind of chance is very good for every undergraduate. When I was an undergraduate, we had no chance to join a research center or lab. I envy them.


IRClub Interview(2)

Our Information Retrieval Club interviewed about twenty-seven sophomores this evening. Based on last evening's experience, we interviewed them in a more standardized way.

Facing the sophomores, I could clearly feel that their experience and thoughts were not as abundant and profound as the juniors'. Thinking back to my own sophomore and junior years, I was like them. At this moment I understood more about the effect of university. We must cherish campus life more.

The process took three and a half hours, like yesterday. We were tired again. But I felt better.


IRClub Interview(1)

Our original plan was to hold the IRClub interviews tomorrow evening. But at 18:10, some juniors came to our lab and said that they had been notified to be interviewed this evening. Dr.Tliu said that in this case we should interview them.

Carl, zsq and I quickly formed a group of three interviewers, working from the interview Excel table that I designed this noon. Our rule was to take three students as a group. Each student was to be interviewed three times by us within 15 minutes. We asked each interviewee some questions and gave a score. Finally we unified our opinions on each student.

This process was strict with and responsible to each interviewee. We interviewed twenty-seven juniors.

We were all tired after the four hours. But this was a nice experience. It was my first time as an interviewer.


Begin to study VC++

This afternoon, I began to study VC++. This time I have made up my mind to study VC++ and never study VB.

I studied MFC first, building on some experience with a simple calculator. I finished a "Hello World!" program. It was a simple but useful program for understanding the mechanism of VC++.

Continue this process!


Reading the new paper about Anaphora Resolution

There is a new paper about Chinese anaphora resolution, On Anaphora Resolution within Chinese Text. The author is Wang Houfeng, who is an expert in Chinese anaphora resolution.

In his paper, he mentions some issues in anaphora resolution within Chinese text and analyzes the difficulties of solving these issues with the current state of the art. Three aspects of anaphora resolution are discussed: (1) it is difficult to identify some Chinese anaphors, such as zero forms and common-noun ones; (2) there are many difficulties in recognizing potential antecedents and their features, such as gender, number, and grammatical role; (3) there is a lack of both the necessary NLP technology and language resources.

I learned a new syntax-based technique for anaphora resolution: the C-command condition.
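
To make the C-command condition concrete for myself, here is a minimal Python sketch. It assumes the usual textbook definition: node A c-commands node B if neither dominates the other and the first branching node above A also dominates B. The Node class, the function names and the toy tree are my own illustration, not something taken from the paper.

```python
class Node:
    """A node in a toy constituency tree (hypothetical helper class)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def dominates(a, b):
    """Return True if a is a proper ancestor of b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a, b):
    """Return True if a c-commands b under the textbook definition."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    # Walk up from a to the first ancestor with at least two children.
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)


# Toy tree for "John saw himself": S -> NP-subj VP, VP -> V NP-obj
subj = Node("NP-subj")
verb = Node("V")
obj = Node("NP-obj")
vp = Node("VP", [verb, obj])
s = Node("S", [subj, vp])

print(c_commands(subj, obj))  # subject position versus object position
print(c_commands(obj, subj))  # the reverse direction
```

In this toy tree the subject c-commands the object but not the other way round, which is the kind of asymmetry a resolver can use to filter candidate antecedents.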


Time Management

How do you manage your time when you are busy with more things than you expected or can bear? This is a big problem in my study and life.

Over this period, I have spent lots of time on the lab's tasks. When I had some spare time I concentrated on them too. So there are some chapters of Combinatorics and Pattern Classification that I have not read yet, and some of their homework that I have not finished.

This evening I thought more about my recent life and study. I found that I should manage my time more reasonably. So I listed a simple time management plan as follows:

6:30 Get up, do morning exercise and have breakfast.

8:00~11:30 Finish lab tasks or read some materials on the research theme.

11:30~1:30 Have lunch and take a nap.

2:00~5:30 Practise programming techniques or finish the lab's programming tasks.

6:30~10:00 Study the graduate courses.

10:00~10:30 Write my diary and make a detailed plan for the next day.

11:00 Go to sleep.

This plan is flexible for my study and life. There is a celebrated remark that you must devote all your energy to your work and study when you are working or studying, and devote all your energy to enjoying yourself in your spare time. This rule suits my needs.

New management, new life! I hope so.


Facing the visit

This afternoon, I was given the task of introducing our laboratory to some foreigners in English. At 3:10 this afternoon, the guests, who were a couple, came to our laboratory. After an introduction by our associate dean, I began to introduce our laboratory to them. During our associate dean's speech, the woman smiled at me. I felt her kindness.

Frankly speaking, before their visit I had prepared the English introduction for one hour. First, I talked a little about the research areas: IF, IE and NLP. Then I gave them some demos. At the beginning, I felt somewhat nervous. About two minutes later, the couple and I sat down to chat. The man was very interested in natural text understanding technology. When he looked at the demo of the Chinese sentence dependency parser, he asked some questions about Chinese word segmentation and said that it was very different from English words. When I demoed the summarization system, he told me something about his work, which involves reading lots of information. Finally, he was interested in the Chinese character recognition system. He wrote a character that was an old Chinese symbol, so the system could not recognize it. He was very interested in this character and told us some of the history of the symbol.

Frankly speaking, I did not understand some of their sentences. But I could tell that their English was perfect. This was a nice chance for me to practise my spoken English.

Nice experience!


First TA

This evening, I came to the Second Campus as a TA for the C language programming experiments. I substituted for a senior labmate to guide the experiments. This was a good chance for me to exercise my ability.

There were fifteen students under my guidance. Some of them finished the experiment quickly and well. But some of them were not quick with the problem and the language. I thought back to my own C language experiments when I was a freshman. The guideline was stricter. Every fifteen students had a guiding teacher. I believe this rule benefits the students more.

It was a good experience.


Continue writing SE paper

This was a whole day's work.


Get together

This noon, Hang Chen came to the campus. Ten of our classmates got together at Hong Ming restaurant. We talked a lot about work experience and our recent situations. Hang Chen was more mature and stouter. He gave us some advice about finding a good job. He also told us some news about other classmates. Some of them wanted to change jobs and some were preparing for the upcoming graduate entrance exam.

We all felt happy with the good memories of our four undergraduate years.



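I have not discussed grey system problems with anyone for a long time. Today, on the Machine Learning board of HIT-IR-BBS, I met a friend whose ID is phew. At first he raised doubts about grey system theory, and then we had some discussion. I list the discussion below: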




Let t = (1,2,3,4,5)

Assume the time step dt=1
The sequence (35,47,22,150,47,33) is not increasing, so accumulate it to obtain the following sequences
Thus: [dx/dt]= (47,22,150,47,33)
[x]= (82,104,254,301,334)

dx/dt + ax + b =0

x=exp(-a * t) - b/a

E1=47 + a * 82 + b
E2=22 + a * 104 + b
E3=150 + a * 254 + b
E4=47 + a * 301 + b
E5=33 + a * 334 + b

By least squares, minimizing [e] * [e]'
we obtain (a,b) = ( -2.4362 ,-0.0105)

dx/dt - 2.4362 * x - 0.0105 = 0

x=exp(2.4362 t ) - 0.0043

x=(10 13 14.9 1707 19509)
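
The least-squares step in the setup above can be sketched in a few lines of Python. This is only a minimal sketch under the assumptions written in the discussion (dt = 1, the accumulated values used directly as [x], and the residuals E = dx/dt + a*x + b); the function names are my own, and textbook GM(1,1) formulations usually build the x column from means of adjacent accumulated values instead. Running it on the series above is one way to check the quoted (a, b).

```python
def accumulate(series):
    """Running sum of the series (the accumulation step)."""
    total, out = 0, []
    for value in series:
        total += value
        out.append(total)
    return out


def fit_a_b(series):
    """Least-squares fit of dx/dt + a*x + b = 0 with dt = 1.

    x is the accumulated series from its second element on, and dx/dt
    is the matching original value, as in the setup above.
    """
    x = accumulate(series)[1:]
    dx = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_d = sum(dx) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, dx))
    slope = sxd / sxx                    # dx/dt is modelled as slope*x + intercept
    intercept = mean_d - slope * mean_x
    return -slope, -intercept            # move both terms to the left-hand side


series = [35, 47, 22, 150, 47, 33]
a, b = fit_a_b(series)
print("a =", a, "b =", b)
```

Because the model is linear in a and b, the fit is just an ordinary straight-line regression of dx/dt on x with the signs flipped.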





1. dx/dt = dx holds only under a condition, namely that dx/dt is near 0.

-349 1
-612 1
-858 1





Use least squares to solve for A





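2. Too many people are fabricating data. In the paper mentioned above, the value of a is clearly 0.0648, yet in the final back-calculation formula it miraculously becomes 0.01804 (I hope this is a slip of the pen by the author). But whichever value of a is used (0.0648 or 0.01804), following the grey approach described in the paper, the original sequence cannot be recovered (I hope my checking is wrong).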



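Too many questions cannot be explained clearly, so a theoretical breakthrough is needed. It is not only grey theory: the popular Fuzzy Probability is also being interrogated by mathematicians. We cannot use a muddled theory to explain unknown things. I hope grey theory makes a big breakthrough, because my own field is waiting for it too. But not by this kind of method.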





Publisher: Tsinghua University Press
Authors: Zhou Zhihua / Cao Cungen
Series: China Computer Federation Academic Works Series
Publication date: September 2004


Related materials on Summarization Evaluation

There are many materials on summarization evaluation. At the recent DUC 2004 conference, there was a summarization evaluation tool named ROUGE. Its main idea is calculating the n-gram co-occurrence rate. Following the successful application of automatic evaluation methods, such as BLEU, in machine translation evaluation, Lin and Hovy (2003) showed that methods similar to BLEU, i.e. n-gram co-occurrence statistics, could be applied to evaluate summaries.

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes several automatic evaluation methods that measure the similarity between summaries.
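
To see what an n-gram co-occurrence rate looks like in practice, here is a minimal Python sketch of a ROUGE-N style recall against a single reference summary. It is not the ROUGE package itself: the function names, the whitespace tokenization and the toy sentences are my own assumptions, and the real tool supports multiple references and other options.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Share of reference n-grams that also occur in the candidate.

    Matches are clipped, so an n-gram counts at most as often as it
    appears in the candidate.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
score_1 = rouge_n_recall(candidate, reference, n=1)
score_2 = rouge_n_recall(candidate, reference, n=2)
print(score_1, score_2)
```

The measure is recall-oriented because the denominator counts n-grams in the reference, not in the candidate.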


[1] Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. ACL2004.
[2] Chin-Yew Lin and E.H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of 2003 Language Technology Conference, Edmonton, Canada.


New Scheme

I have finished the summarization evaluation task. But I have not studied CR for nearly two months. So my next task is to read lots of papers about CR.

We have had lots of classes on Pattern Classification and Combinatorics. I have not reviewed them for nearly half a month.

These are the two main kinds of tasks I can work on. Try again!


Poor pronunciation

During this year I have given two English presentations. The first one was in the Graduate English Class. My topic was High-tech, and it lasted only eight minutes. After that presentation my English teacher Mrs. Zhang suggested that I improve my pronunciation. My second presentation was in the summer holiday's Fault Tolerant Computing and Wearable Computing Class. My topic was Power Management, and it lasted thirty-five minutes. After that presentation the teacher, Dr. Daniel P. Siewiorek, suggested that I improve my pronunciation.

This evening I gave my third presentation. This time it was in the Doctoral English Forum, where we discuss papers in our research fields. It was my turn to present. My topic was Random Forests in Language Modeling, and it lasted seventy minutes. I kept my speaking speed steady in order to express myself clearly. After the presentation I answered lots of questions. Dr.Tliu suggested that I improve my pronunciation.

It is clear that my pronunciation is not good. I feel this problem is very serious for me. I should work on it from now on.


Continue reading paper

For tomorrow's presentation, I must continue reading the paper. There are still lots of puzzles for me.

One thing I'd like to note is that the author Peng Xu is Chinese. His education is as follows:

1990 ~ 1995 B.S. in Tsinghua University;
1995 ~ 1998 M.S. in National Lab of Pattern Recognition Beijing;
1998 ~ 1999 Ph.D. Candidate in Brown University;
1999 ~ now Ph.D. Candidate in The Center for Language and Speech Processing, Johns Hopkins University, USA.
Fields of Interest: Speech Recognition, Pattern Recognition, Language Modeling, Machine Translation, Natural Language Processing, Multimedia Coding.

He has achieved a lot.


Random Forests in Language Modeling

It is about language modeling. My reading outline is as follows:

Title: Random Forests in Language Modeling

Author(s): Peng Xu and Frederick Jelinek

Author Affiliation: Center for Language and Speech Processing, the Johns Hopkins University, Baltimore, MD 21218, USA

Conference Title: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.

Language: English

Type: Conference Paper (PA)

Treatment: Practical (P) Experimental (X)

Abstract: In this paper, we explore the use of Random Forests (RFs) (Amit and Geman, 1997; Breiman, 2001) in language modeling, the problem of predicting the next word based on words already seen before. The goal in this work is to develop a new language modeling approach based on randomly grown Decision Trees (DTs) and apply it to automatic speech recognition. We study our RF approach in the context of n-gram type language modeling. Unlike regular n-gram language models, RF language models have the potential to generalize well to unseen data, even when a complicated history is used. We show that our RF language models are superior to regular n-gram language models in reducing both the perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system.
Descriptors: Natural Language Processing basic problem
Identifiers: random forests, language model, decision tree, perplexity

Personal feeling: It introduces the concepts of the decision tree language model and random forests. The main idea is wonderful: randomly generate some decision tree language models and combine them into a whole model. This model can solve the data sparseness problem to some extent.

Something that could be improved: the random decision tree generation method is not good enough. I believe we can use some optimization principles to get better random decision trees.


Three new summarization systems

Based on the idea of the best evaluation method, I implemented three summarization systems. The first one follows the traditional approach, which calculates the weights of all sentences and selects the best ones.

The new idea in my first system is the weighting method. I used the evaluation method to calculate the weight of each sentence. The final summaries of the test papers achieved somewhat better scores under my best evaluation system.

Then I changed my approach: enumerate all possible sentence sets and calculate the similarity of each one to the source file. This method is very slow, as its algorithmic complexity is pow(2,n).
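
A minimal Python sketch of this exhaustive idea shows where the pow(2,n) comes from. The similarity measure here (cosine over bag-of-words counts), the function names and the toy sentences are assumptions for illustration only, not the measure my system actually used.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_subset(sentences, max_sentences):
    """Try every sentence subset up to max_sentences and keep the best one.

    Returns the chosen sentence indices, their similarity to the whole
    source, and how many subsets were tried.
    """
    source = Counter(w for s in sentences for w in s.lower().split())
    best, best_score, tried = (), -1.0, 0
    for k in range(1, max_sentences + 1):
        for subset in combinations(range(len(sentences)), k):
            tried += 1
            bag = Counter(w for i in subset for w in sentences[i].lower().split())
            score = cosine(bag, source)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score, tried


sentences = [
    "grey models fit short series",
    "rouge counts shared n-grams",
    "decision trees split histories",
    "summaries select key sentences",
]
chosen, score, tried = best_subset(sentences, len(sentences))
print(chosen, score, tried)
```

With no limit on summary length the loop visits every non-empty subset, that is pow(2,n) - 1 of them, so each extra sentence in the source doubles the work.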

The third one generalizes the weight of each sentence based on the first system. But its final evaluation results were worse than those of the four systems implemented by yhb.

Three systems, three methods. I will think more about them.


Some checking results

The original plan was that I would apply the new evaluation method to yhb's systems to find the best one. yhb gave me eight new systems, and I used my program to evaluate them one by one.

It was perfectly effective. But some trends did not follow Mrs. Qin's intuition. We could analyze this further.


Exciting Scheme

This morning, after reading the latest daily news I made my daily plan. Just now I finished the first item: implementing the relative word frequency approach. The final experimental result was close to what I intended. The approach was able to distinguish different summarization systems, but it did not agree closely with human judgment. It was not as good as the other two methods, but it could serve as a comparison result.

Right now I have an exciting scheme. I discussed the recent progress with Wanxiang Che, who is a Ph.D. student. He did not believe that my TF method was powerful and suggested that I implement some new method. But he also suggested that I build a new summarization system based on my TF method. We could do some human evaluation to prove the usefulness of this method.

This system is not complicated. I can implement it quickly. So exciting.

After comparing this system with others I can do another thing: compare its summaries with human summaries and get a new evaluation approach.

Such exciting news for me today! I am excited!!!!

Let me start the new plans.


Simple but effective method for SE

It is said that the most effective method is the simplest one. I never used to believe it. But now, I can't help believing it.

This morning, I kept feeling sad about the SE task. I had no ideas. But I stuck to my view that the breakthrough point was how to combine the new system into my SE system. I wanted to run the famous summarization evaluation package from DUC2004: ROUGE. But because of some unknown problem I could not run the programs. I had no ideas and began to review the presentation PPT from 26 Sep. Suddenly, I found that the two methods in that PPT could be re-implemented with some new usage.

I compared the four new sets of rank data, and got the ideal result. Wonderful!!

I implemented it. I told Dr.Tliu the news. He discussed it with me and was excited, too. He suggested that I think more about the methods.

At noon, I kept working, implemented a more basic method, and achieved a better result. Good news for me.

Frankly speaking, the two new methods are not sophisticated, but they are simple and effective. I cannot explain it fully.


Visiting the science and technology museum

This morning we began our visiting plan: a trip to the science and technology museum.
There were 17 of us, including members of our lab and WF.

Last time, we had planned to visit this place, but it is closed every Monday. Instead we played on the Sun Island. Today was Tuesday and sunny.

In this beautiful building there are three floors. All kinds of science and technology exhibits were interesting.

The most exciting program was the four-dimensional film.


No answer for SE

For one whole day, I was thinking about the key to the SE problem.

I found I had no ideas for this problem. So I turned to the published papers of others. There are lots of papers about SE by Hongyan Jin. She is a famous person in this area. I had read some of her papers. But there was no information in them to guide me in finishing my task.

The key problem of the SE task was how to combine the new summarization system into the SE system.

There is a famous summarization evaluation conference, DUC. Its evaluation tool is ROUGE, which is based on n-gram and other gram information. I wanted to test it. But there were some problems with my Perl environment.

Until just now, I had not got it working.


Three new methods for SE

My recent task is designing new methods for SE (Summarization Evaluation). Today, I tested three new methods for SE: Artificial Neural Networks, Decision Trees, and multiple regression analysis. But none of them was good for my task.

I believe there is a breakthrough point. I must combine the new system with my SE methods. How to combine them? This is the essence of my problem. I should ponder it more.


New method for SE

There is a new method for SE (Summarization Evaluation), based on hints from Dr.Tliu, Car and Yhb. This method is good in that I can use my machine learning methods. I want to extract lots of features from the summaries and let the learner fit the data.

The framework has been fixed. I only need to implement the sub-modules one by one.


National Day!

This is National Day!

According to the original plan, I worked in the lab this morning and afternoon. At 3:00 pm, WF and I went to the Harbin Odeum. There was a wonderful National Day concert by the Harbin Philharmonic Group. It began at 6:30 pm.

The conductor was a famous young man named Qiuhong Teng. He conducted 12 compositions, including the famous Carmen, the Blue Danube, and so on. They were beautiful.

This was the first time I had gone to a concert. A wonderful experience. WF and I also wandered through some streets, including the Central Street.

When it came to 8:00 pm, we came back.