December 31, 2005

Celebrating New Year: Football Match

This was the last day of 2005. After a good discussion and some preparation, we held a football match to celebrate the New Year. Our opponent was ITNLP. At 9:00, we started our game on time.

We played 11 vs. 11. Over the two hours, we had a good competition. The final score was 2 vs. 2, a draw. We all felt tired. Liqi Gao and Jianguo Lin were slightly injured during the match. We showed courage and physical strength in the game.

It was the first match between our IRLab and ITNLP. Nice match, just for New Year. Thanks to everybody in the match!

December 30, 2005

Bug on Map

Map is a nice container in C++. I love it. But this afternoon, after four hours of debugging, I found that there was a bug in my understanding of how to use map.

Here is some sample code:
------------------------------------
#include <map>
#include <string>

using namespace std;

int main()
{
    // Map from a word to an integer value.
    map<string, int> mapTest;
    mapTest["Good"] = 2;
    mapTest["Morning"] = 3;
    mapTest["To"] = 4;
    mapTest["You"] = 5;

    return 0;
}
------------------------------------
In the above program, before returning, the map content is:
=======Content of mapTest=======
Good 2
Morning 3
To 4
You 5


Then I looked up this map with "YYThanks" as the key, as follows:
------------------------------------
#include <map>
#include <string>

using namespace std;

int main()
{
    map<string, int> mapTest;
    mapTest["Good"] = 2;
    mapTest["Morning"] = 3;
    mapTest["To"] = 4;
    mapTest["You"] = 5;

    // "YYThanks" is not in the map; operator[] inserts it before returning.
    int number = mapTest["YYThanks"];

    return 0;
}
------------------------------------
Before returning, the content of mapTest is:
=======Content of mapTest=======
Good 2
Morning 3
To 4
YYThanks 0
You 5

So the lookup of "YYThanks" changed the content of the map. Why does this happen? After discussing it with a member of our lab, we concluded that querying mapTest with a key that does not exist inserts a new key-value pair, "YYThanks"-0. After changing the value type from int to string, I ran the program again, and the new pair was "YYThanks"-"". In other words, operator[] inserts the missing key with the default value of the value type (0 for int, an empty string for string).

It was a deeply hidden bug in my program. Luckily, I found it and fixed it.

C++ Primer gives two other ways of searching a map:
-------------------------------------
map<string, int> word_count;
int count = 0;
// 1. count(key): returns 1 if the key is present, 0 otherwise
if (word_count.count("good")) count = word_count["good"];

// 2. find(key): returns an iterator to the element, or end() if absent
map<string, int>::iterator it = word_count.find("good");
if (it != word_count.end()) count = it->second;
-------------------------------------

There is an introduction to the map search operations in C++ Primer, page 251. If you'd like to know more, please look it up.
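To make the difference concrete, here is a small sketch that puts the two behaviors side by side. The helper names (lookup_or_default, make_map_test, demo_lookup) are made up for this illustration and do not come from my program or from C++ Primer. The only assumption is the standard behavior of std::map: find() never inserts, while operator[] inserts a missing key.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: read-only lookup that returns `fallback` when the
// key is absent. Taking the map by const reference guarantees that the
// lookup cannot insert anything.
int lookup_or_default(const std::map<std::string, int>& m,
                      const std::string& key, int fallback)
{
    std::map<std::string, int>::const_iterator it = m.find(key);
    return it != m.end() ? it->second : fallback;
}

// Rebuilds the four-entry map used in the sample code of this post.
std::map<std::string, int> make_map_test()
{
    std::map<std::string, int> mapTest;
    mapTest["Good"] = 2;
    mapTest["Morning"] = 3;
    mapTest["To"] = 4;
    mapTest["You"] = 5;
    return mapTest;
}

// Shows that find() leaves the map alone while operator[] grows it.
void demo_lookup()
{
    std::map<std::string, int> mapTest = make_map_test();

    // Lookup through find(): the missing key is reported, nothing is added.
    assert(lookup_or_default(mapTest, "YYThanks", -1) == -1);
    assert(mapTest.size() == 4);

    // Lookup through operator[]: the missing key is inserted with value 0.
    int number = mapTest["YYThanks"];
    assert(number == 0);
    assert(mapTest.size() == 5);
}
```

Calling demo_lookup() runs without any assertion failing. A side benefit of passing the map as a const reference is that the compiler rejects operator[] on it, so this kind of accidental insertion is caught at compile time.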



Three conclusions:
1. Doing is more important than thinking.
2. C++ Primer is very good. It should be read once more.
3. No hypothesis is better than a wrong one.

December 29, 2005

Bayes Model for recognition

Doing is harder than thinking. That is my conclusion from practice. These days, I was trying the Naive Bayes model for my gender recognition task. First, I derived the Naive Bayes formulation for my recognition problem. There were two experiments I had to finish. But how should I write a program to implement the formulas? This was my first time using the Naive Bayes model, and many questions, small and large, lay in front of me.

For example, how to compute the model? Which parameters should be calculated? How to use the model for open testing? After asking many friends for help, I worked out the plan for my experiments.
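As a rough answer to those three questions, here is a minimal textbook-style sketch of Naive Bayes, not the formulation or the program I actually used. It assumes each example is a bag of string features with a class label, and it uses add-one smoothing with one extra slot reserved for features never seen in training. The names (Example, NaiveBayes, train, classify) are made up for this illustration. The parameters to estimate are the class counts and the per-class feature counts; open testing means calling classify on examples that were not used in train.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical training example: a bag of string features plus a class label.
struct Example {
    std::vector<std::string> features;
    std::string label;
};

class NaiveBayes {
public:
    // Estimate the parameters: class counts and per-class feature counts.
    void train(const std::vector<Example>& data)
    {
        for (const Example& ex : data) {
            ++class_count_[ex.label];
            ++total_examples_;
            for (const std::string& f : ex.features) {
                ++feature_count_[ex.label][f];
                ++features_in_class_[ex.label];
                vocabulary_.insert(f);
            }
        }
    }

    // Return the label c that maximizes log P(c) + sum over f of log P(f | c).
    std::string classify(const std::vector<std::string>& features) const
    {
        assert(total_examples_ > 0 && "train() must be called first");
        std::string best_label;
        double best_score = 0.0;
        bool have_best = false;
        for (const auto& entry : class_count_) {
            const std::string& label = entry.first;
            double score =
                std::log(static_cast<double>(entry.second) / total_examples_);
            for (const std::string& f : features) {
                score += std::log(smoothed(label, f));
            }
            if (!have_best || score > best_score) {
                best_label = label;
                best_score = score;
                have_best = true;
            }
        }
        return best_label;
    }

private:
    // Add-one smoothed estimate of P(f | label). The extra +1 in the
    // denominator reserves one slot for features unseen in training.
    double smoothed(const std::string& label, const std::string& f) const
    {
        int count = 0;
        auto by_label = feature_count_.find(label);  // find(): no insertion
        if (by_label != feature_count_.end()) {
            auto by_feature = by_label->second.find(f);
            if (by_feature != by_label->second.end()) count = by_feature->second;
        }
        int total = 0;
        auto t = features_in_class_.find(label);
        if (t != features_in_class_.end()) total = t->second;
        double denominator =
            total + static_cast<double>(vocabulary_.size()) + 1.0;
        return (count + 1.0) / denominator;
    }

    std::map<std::string, int> class_count_;
    std::map<std::string, std::map<std::string, int> > feature_count_;
    std::map<std::string, int> features_in_class_;
    std::set<std::string> vocabulary_;
    int total_examples_ = 0;
};
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing keeps a feature that never occurred with a class from forcing that class's probability to zero.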

After this process, I became much more familiar with the Naive Bayes model. Before today, I had thought Naive Bayes was very easy to use. Now I know that nothing is easy. If you doubt it, just try it!

December 28, 2005

Coreference Resolution Research Status

This afternoon, our Text Mining Group had its weekly group meeting. This time, I presented the research status of our CR (coreference resolution) sub-group.

Abstract:
After a long time doing projects, I have returned to my favorite research topic, coreference resolution. Our CR (coreference resolution) group will do in-depth research. In the presentation, we review the tortuous past work, show our current research on gender and number recognition, and put forward our plans for the future. Although it is a short draft, we will carry it through thoroughly.




December 27, 2005

On English Learning

"A thoroughbred cannot cover ten paces in a single leap; a worn-out nag can go far in ten days of travel, because its merit lies in not giving up." --- Xunzi, "Encouraging Learning" (劝学)

I have collected some nice websites for English learning. They are:

Voice of America (VOA) http://www.voa.gov
BBC (UK) http://www.bbc.co.uk
CNN (US) http://www.cnn.com
Cambridge Dictionaries Online http://www.dictionary.cambridge.org
Merriam-Webster Online Dictionary http://www.m-w.com
Longman Web Dictionary http://www.longmanwebdict.com
Collins COBUILD online http://www.cobuild.collins.co.uk
Encarta Online (Microsoft encyclopedia) http://www.encarta.msn.com
BusinessWeek (US) http://www.businessweek.com
The Economist (UK) http://www.economist.com
National Geographic (US) http://www.nationalgeographic.com
Premiere (US) http://www.premiere.com
Billboard (US music site) http://www.billboard.com
New Oriental School http://www.neworiental.org


One more nice blog about English learning is the following:
古德明每日开讲.
I think it is very good!

December 26, 2005

Reading Group--My presentation

This afternoon at 16:00, we started our reading group. It was my turn to give the presentation. After a week's preparation, I had worked out 31 slides. At the beginning, I explained why I chose this paper on gender recognition and gave our sub-group some background on anaphora resolution research.

I had invited many friends to my presentation, and when it started, many Ph.D. candidates came. My supervisor, Prof. Tliu, came as well. During the one-hour presentation, I introduced the paper in detail. Finally, I summarized the research in this paper and gave some plans for my current research. Many attendees gave good advice and suggestions on my research. During the discussion, we found some doubtful points in this paper. I will send an email to discuss them with the author.

Prof. Tliu gave me some good suggestions about my current research. He advised me to use long-distance context information for anaphora resolution. Chengjie Sun and Guanglu Sun thought that gender information based on a parsed corpus and the web was not enough for anaphora resolution. After our discussion, we all agreed that we should combine it with the context. Hongfei Jiang took part in our reading group for the first time. He gave me some suggestions about context modeling, and I listed the problems with context modeling. Maybe in a few days I will discuss them with Jiang. Wanxiang Che thought that a linear kernel could not adequately combine the expected value and the variance squared. As I did not know much about SVM kernels, I could not discuss it further with Wanxiang. But I believed that using an SVM was one way of combining the 20 expected values and variances squared. Maybe we will discuss this later.

Do you still remember the slogan of the HIT machine learning group? "Let intercommunion be a habit." In that spirit, I saw the virtues of communication. Yeah! After the reading group, I knew more about this research topic. There were so many suggestions and pieces of advice that I can learn from. I liked this format.

After the meeting, in our research room, Prof. Tliu gave me three pieces of advice on my presentation. First, I spoke so quickly that many people could not follow me. I should slow down. Second, I should walk around less. The audience pays more attention to moving objects, so if I walk in front of the screen, the presentation becomes less effective. The final problem was my poor English pronunciation. Yeah! It is a very serious problem for my English presentations. I had not spent much time on it. After the one-and-a-half-hour presentation, my throat hurt a little. I told myself that I should study and practice pronunciation and voice.

Yeah! I'd like to list what I gained from my reading group presentation:

1. Consider context modeling techniques.
2. Use long-distance context information to improve the performance of anaphora resolution.
3. Learn more about SVM.
4. Practice pronunciation and voice more.
5. Walk less in front of the screen.
6. Discuss more with other researchers.

You can download my presentation slides here: Automatic Acquisition of Gender Information for Anaphora Resolution

December 25, 2005

Merry Christmas Day

Christmas is here! In the morning, Yajie and I went to our lab. We stuck some pictures on the wall. When we looked around our lab room, we both thought it was a beautiful scene.

It was Christmas Day and a Sunday. We went to see a movie at the Cultural Palace of Harbin Railway. The movie was the newest one: The Promise. Although it was Christmas Day and a Sunday, there were very few people. The scenery of the film was great. We both liked it.

At five o'clock in the afternoon, Yajie returned to her campus. It was a nice Christmas Day.

December 24, 2005

Happy Christmas Eve with Yajie

Merry Christmas Eve to you! After I had finished my tasks for the week, I met Yajie this afternoon at HongBo Square. We went to some markets. Finally, we chose a beautiful down jacket and a sweater. In the evening, I taught Yajie billiards and played with her. In the end, she beat me. :)

In the morning, we played cards in Lilac Bar. Yajie taught me how to play. But she was good at it, so I lost more often. We talked a lot about everything.

It was a nice day. In the morning there was heavy snow. The entire world was white. Walking on the thick snow was a nice feeling.

December 23, 2005

"Crazy" on Google Talk

Recently, I was suffering from MSN Messenger. It disconnected so often that I could not reach my friends conveniently. Many of our lab members had encountered the same problem. After a short discussion, we all wanted to switch to another chat product: Google Talk.

There are many tips and tricks for using Google Talk. I collected them at the following link: Google Talk Tips and Skills

You are welcome to add yours.

December 22, 2005

Reading Group Invitation

Welcome to IRLab Reading Group
Speaker: Jun Lang
Date: 2005-12-26(Monday) 16:00
Location: Room 618, New Tech Building





Paper Information:
Author: Shane Bergsma
Title: Automatic Acquisition of Gender Information for Anaphora Resolution
Conference: Canadian AI 2005, May 9-11
-- Winner, AI'2005 Best Paper Award


Abstract:
  We present a novel approach to learning gender and number information for anaphora resolution. Noun-pronoun pair counts are collected from gender-indicating lexico-syntactic patterns in parsed corpora, and occurrences of noun-pronoun pairs are mined online from the web. Gender probabilities gathered from these templates provide features for machine learning. Both parsed corpus and web-based features allow for accurate prediction of the gender of a given noun phrase. Together they constructively combine for 96% accuracy when estimating gender on a list of noun tokens, better than any of our human participants achieved. We show that using this gender information in simple or knowledge-rich pronoun resolution systems significantly improves performance over traditional gender constraints. Our novel gender strategy would benefit any of the current top-performing coreference resolution systems.

December 21, 2005

Ping Pong with Sweat

What is sweat? You can read the answer in several dictionaries or reference sites found through Google: Definition of "Sweat". I think sweating is very good for health. Sweat can carry away many impurities from the body.

This evening, our IRLab had its ping pong activity again in the basement of Flat 12. There were about ten people this time. We all played very happily. During the two hours, I played once with Prof. Tliu, but the final scores were 8:11 and 9:11. Prof. Tliu is very good at ping pong. Under his influence, ping pong has become the favorite activity of our lab. I also played with a classmate of Liqi. He played at a high level, and we played in high spirits. But in the end, my racket was nearly broken. There was a crack in the blade. I will repair it tomorrow.

I sweated a lot. It was a nice feeling.

December 20, 2005

Preparing reading group

Recently, I was preparing for the reading group next Monday. It was my turn to give an English presentation on a good paper. Considering my current research focus and the requirements of the reading group, I selected a paper about gender recognition. Its details are as follows:

Shane Bergsma, Automatic Acquisition of Gender Information for Anaphora Resolution, In Balázs Kégl and Guy Lapalme (Eds.) Advances in Artificial Intelligence: Proceedings of the 18th Conference of the Canadian Society for Computational Intelligence, (Lecture Notes in Computer Science, Volume 3501, © Springer Verlag), Canadian AI 2005, May 9-11, Victoria, British Columbia, Canada, pages 342-353.
-- Winner, AI'2005 Best Paper Award

I like this paper very much. I will read through it and then give a nice English presentation on it. The presentation will be at 4:00 pm next Monday in Room 618 of the New Tech Building. You are welcome to take part.

December 19, 2005

Nice talk with Zhenghua

I am now Zhenghua's mentor. I have supervised him for about one year, and we have cooperated on several projects. His current work is implementing a coreference resolution system. We have discussed this work a lot. We have now finished the third version. Our evaluation score is still below that of the top systems in the international evaluation. We will keep trying and surpass their scores. We have the resolve and the belief.

After the whole lab's meeting, I had a discussion with Zhenghua. We discussed some problems related to our work. We also chatted about some free topics. I got to know him better. He is an excellent student. Bless him!

December 18, 2005

Beautiful Harbin with Yajie

Do you know how beautiful Harbin is? Do you know much about Harbin? I got to know more today!

This morning, Yajie came to our campus. We met in front of the supermarket. After a nice breakfast, we went to the Big World Marketplace. After looking around, we found the goods we wanted. Then we went to Center Street. I wanted to buy a bag. Yajie said there was a very big market selling bags. Inside, we were surrounded by bags. We compared several types and finally selected one nice bag. We both liked it.

We had a nice shopping experience today. Thanks, Yajie!

December 17, 2005

Snow Football match with score 5 vs. 5

Nobody can hold you back! That was the nice feeling of a snow football match. I thought so.

This morning, we had a snow football match. The two sides were our lab and one class of CS graduate students. The day before yesterday, it snowed. After that, the football field looked very beautiful. We all thought it was time for a snow football match. Our captain, Jianguo Lin, organized the match and contacted our opponent. He was indeed a good captain.

We started our match at 9:00. Each side had 9 players. After an hour and a half of "fighting", the final score was 5 vs. 5. We all enjoyed ourselves. We'd like to play football twice a week. It should be mentioned that our professor, Tliu, played at the highest level. Of our 5 goals, he scored two. He was the best professor at football. I thought so.

December 16, 2005

Reading Roadmap

Do you like reading? I mean reading papers related to your research. I feel that reading is a very important skill for any research. Through reading, you can learn what others have done. Research is full of competition. If you cannot be the best, you will be the last. There are many new papers every day. As your collection grows, you will read them one by one. But how should you read them? I believe there should be a reading roadmap.

So what is a reading roadmap? In my opinion, you can read papers according to several rules. For example, you can read papers by time. Every researcher must master the papers of the last decade related to his or her research topic. So, reading by time, you read the papers year by year. The second rule is reading by person. In research, the best way to understand a person, I believe, is to read all of his or her papers. Then you can discuss with him or her. As for the best rule, I believe you should communicate with others. You could spend some time finding the "others". If you become a member of the community, you will learn more.

The above is my personal opinion. Just sharing with you.

December 15, 2005

Congratulations to my roommate

There was a piece of great news in our dorm room. One of my roommates got his offer and his visa for studying abroad. Several days ago, he went to Shenyang for his visa. In his words, he was very lucky. The visa officer approved him that same day. After waiting for the final document, he returned today.

We were all delighted at his news. Yes, he had taken great pains for it. Somebody said: no pains, no gains. His case proved it. I know his effort and how he stayed up every night. I congratulated him. I wish him good luck in the USA too.

Aha! I will do morning exercise alone after he leaves. I will miss him.

December 14, 2005

Survey to end

Who? Where? When? How? What? These are all goals of a survey. I believe so.

I had been away from research on coreference/anaphora resolution for more than seven months. However, it is still my favorite research topic. It is a very interesting topic, and I could do a lot in it. But my time is limited, so I should concentrate all my energy on it.
So now I have come across the old and new question: how to do research? In my personal understanding, the first step is defining a roadmap for your research. This step is very important, and nobody can get it done in one pass. It should be a dynamic, ongoing activity. We should spend time on it regularly.

Based on my understanding, I have defined a simple research roadmap. The next step, I think, is the survey. So there are some requirements for the survey. This topic has come up several times in my blog. I want to collect all the related papers from this decade. Meanwhile, there is another question: how to collect all the papers? So by now you will find that there are many problems related to research. To be a professional researcher, you must master all these skills.

These days, my task is to survey to the end. It is very important.

December 13, 2005

Exercise in the evening

I found that my physical strength had fallen a lot. I needed some exercise every day. Considering my recent daily schedule, I chose the evening as my exercise time.

Yesterday evening and this evening, after I returned to my dorm room, I began to exercise. It was a little cold. I wore my gloves and earmuffs. I first jogged around our campus, and then did some physical exercises in the P.E. aisle.

The whole process took about half an hour. I have felt very good these days. Nice habit, nice feeling. I will keep it up.

December 12, 2005

Getting Data

Recently, I was doing some research on Chinese names. I needed some data. These days, I asked some friends for help. Yajie helped me obtain some data. This morning, I went to a place to ask for data, but in the end it failed.

Never mind. The reason was that personal names are private information, which is difficult to get. I will keep trying.

December 11, 2005

Bowling

After a long time away from bowling, I had the chance to bowl again. Yajie and her brother came together. We each played four frames. It was a nice feeling. We were very happy to play together. It was the first time Yajie and her brother had bowled, but they played very well. Each of them got two strikes in the final round.

Nice feeling on bowling.

December 10, 2005

Using pure Linux system

After trying again and failing with Cygwin, I chose the Linux system of our lab. It was the best choice. On this system, I ran the decision trees correctly. It was a nice feeling. Now I can do my experiments. It was the best feeling.

December 9, 2005

Problem with installing Cygwin

I was used to compiling source code Linux-style in Cygwin. When I used it again this evening, I encountered a serious problem: I could not install it correctly. I copied an installed copy from another machine, but it failed again.

This was the second time I had encountered this problem. I did not know the reason in detail. It was annoying. I will look for some other solution.

December 8, 2005

Lei Yu's blog

Eight months ago, I removed Lei Yu's blog from my blog links. The reason was that he had not updated his blog frequently. Last night, Lei told me that he had sent me his blog link. He had written many posts, and there were some nice collections.

This morning, I opened his blog, Torpedoes' Blog. I found many new articles there. They fall into several categories: Computer Science, Miscellaneous, Probability and Random Process, Linux and Unix, Internet Technology, and Programming. I found some nice articles. I liked the following two: 细谈VC程序调试的若干方法 (A Detailed Look at Several Methods for Debugging VC Programs) and gnuplot-数据可视化工具 (gnuplot: A Data Visualization Tool).

So his blog has returned to my blog links. I like his blog.

December 7, 2005

Research Roadmap

How to define the research roadmap? It was a big problem in front of me. I will work out the research roadmap for coreference resolution. It is my current task.

Based on my understanding and study, I listed the problem domain, theory domain, approach domain, and corpora domain. But they were still very simple. I will work out the second version in detail tomorrow.

December 6, 2005

Ping Pong Club

There is a Ping Pong Club in our IRLab. Before I went to MSRA, we played often, but not regularly.

This time, I would like to play ping pong with everyone and organize it regularly. This evening, Gold and I booked the ping pong room. Tomorrow evening we will play together. I like playing ping pong.

December 5, 2005

Discussion and presentation in IRLab

This morning, as planned, I had a discussion with Zhenghua. His work follows my original research plan. But because of some unexpected problems, his work was not finished on time. After his introduction, I learned that three features had not been implemented, and that his whole system runs on VS.NET 2003. Finally, we listed all the action items and deadlines. This week, our research on coreference resolution is preparation for the whole month's work.

According to the overall plan of our lab, I was to present this afternoon. At two o'clock, we had the biweekly meeting. I introduced all the work and life I had experienced at MSRA. Many members asked questions. I liked the questions. They were nice feedback for me.

After the meeting, we had the weekly reading group discussion. Zhichang Zhang gave the talk this time. He introduced one of the papers from SIGIR 2005, about passage retrieval for QA. The basic idea was to use sentence dependency relations and path information as additional features with ME. The evaluation was on TREC. I believed it was a nice approach. We had used some dependency information in our NLP research, but only nodes with a single arc. We could use more arcs and more features in our research.

New meetings, new feelings. I like our IRLab very much!

December 4, 2005

New planning

How to make a new plan? I need to think more about it. I have been away from research for half a year. For a researcher, that is very dangerous. There are many things I should finish. I listed all of them on a piece of paper and categorized them into three parts: research, study and life. They are just like my blog name: bill_lang's study and life.

My work this week is just planning. It is like making a choice, which is the most difficult thing in one's whole life. With many nice tools and approaches, I believe I can manage it very well.

December 3, 2005

Snowing

Just now, outside the window, it was snowing. It was a light snow. After several hours, I saw a thick layer of snow on the ground. It is a nice feeling when you walk on it. It is just like sand.

When I looked through the window, I found a white world around our school. It was very beautiful. There is a saying that a snowy year is a rich year. I believe a snowy season is a rich season.

I like such scenes and feelings. Do you?

December 2, 2005

New all

Dec. 2 was my first formal day back at work in IRLab. A new machine and monitor were assigned to me. I formatted the machine and installed all the software. It felt fresh to me. As my IP address was a new one, there were many problems when I accessed the servers. Victor and Gc helped me a lot. Thanks to them.

After installing a new operating system, I could do many things. I cleaned up all my materials and installed a lot of common software. I think this weekend I should prepare the whole prerequisite environment. Then I can do my work next week.

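December 1, 2005

Beautiful Harbin, I am back

Harbin is very cold now, my senior labmate Jinshan told me yesterday. At 5:20 in the morning, the train finally stopped. The moment I rushed out of the station, it felt much like September 6, 2000, the day I first came to Harbin, and also like last year, when I returned to Harbin alone after the Spring Festival: my heart was full of boundless joy. Yes, when I left Harbin half a year ago it was also a little chilly. Now Harbin is already very cold, and a little after five in the morning it was bitterly cold, probably about minus 20 degrees. Fortunately, I was soon back in my dorm room.

After catching up on two hours of sleep in the morning, I came to the Comprehensive Building, which felt both wonderfully familiar and a little strange. I knocked on the door as I entered the lab; the teachers were all hard at work there. Seeing the teachers and all my senior and junior labmates, I really felt that I had come home. Harbin, it seems, has truly become my second hometown.

It feels great to be back in the lab. My old seat had been given to Zhenghua, and only one seat was left. After setting up the monitor and the machine, I began my study life in the lab. The motherboard of this machine seemed to have some problem; luckily Jinshan, the lab's chief machine steward, quickly found someone to replace it for me.

At around six in the evening, I finally saw Yajie. She was very tired today too; she hurried over right after class. I was very happy to have dinner with her. In the evening she looked at all the photos I took in Beijing and also met many of my labmates; for a moment she felt she could not remember them all. Haha, she will surely see them often in the future.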