September 30, 2005

Wavelets are a good thing!

I first met wavelets in some mathematics courses. Last year I learned that somebody had used them to analyze flame temperature. His slides showed some tree structures, which link nicely to the tree structure of a sentence. I wanted to do some research on applying wavelets to NLP, but there were so many mathematical formulas that I never understood them clearly. Surrounded by many other tasks, I nearly abandoned the idea.

These days Ke Wu, who is from Shanghai Jiao Tong University, has been teaching me a lot of the related basics. We discussed many ideas for applying wavelet methods to NLP, and this afternoon we had a casual brainstorm on the combination. However, after checking the web, we found that some similar ideas already exist.

We believe we should persist with our ideas. We will try more.

September 29, 2005

The art of ASCII

I had one more task for Giza++: tokenizing the parallel corpus. The simplest approach was to separate punctuation marks from words with spaces.
With C# regular expressions, this was simple.
str_temp = Regex.Replace(str_temp, @"\.", " . ");
separates the period.
But there were many symbols to separate: ,./;'[]\<>?:"{}|`~!@#$%^&*()-_=+
Handling them one by one with this approach would be tedious.

However, regular expressions offer another feature: a range inside a character class, such as "[\x09-\x15]", matches many characters at once by their codes. To use it, I needed to check the ASCII table.
I found a nice ASCII table:

[Figure: ASCII table]

From it you can see that the punctuation marks fall into four ranges: 21-2F, 3A-40, 5B-60, and 7B-7E in hex (7F is the DEL control character, so it is left out).
So my solution for this task was
str_temp = Regex.Replace(str_temp, @"([\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E])", " $1 ");
str_temp = Regex.Replace(str_temp, @" +", " ");
The second step collapses the redundant spaces.
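
For readers who do not use C#, here is a minimal sketch of the same two-step idea in Python. The function name tokenize and the sample sentence are made up for illustration, and the final strip() of leading and trailing spaces is an extra step that the C# lines above do not perform.

```python
import re

# The four hexadecimal ranges of ASCII punctuation named in the post.
# 0x7F (DEL) is a control character, so the last range stops at 0x7E.
PUNCT = re.compile(r"([\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E])")
SPACES = re.compile(r" +")


def tokenize(text: str) -> str:
    """Put spaces around each ASCII punctuation mark, then collapse repeated spaces."""
    spaced = PUNCT.sub(r" \1 ", text)
    return SPACES.sub(" ", spaced).strip()


print(tokenize("Hello, world! (test)"))  # prints: Hello , world ! ( test )
```

Note that this treats every punctuation mark the same way, so "3.14" or "don't" would also be split; a real tokenizer for a parallel corpus may need exceptions for such cases.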

Only now do I see that ASCII is an art. What do you think?

September 28, 2005

Happy Birthday to Zzz

This morning, our Libra members discussed how to celebrate the birthday of Zhizhi Zhou (Zzz). This time none of us hesitated. We decided to gather in the first-floor hall and go to a party near Renmin University.

Tian Fang, his girlfriend, Ying Song, and I represented MSRA. We were all in Libra and had got to know each other during the poster design competition. Zzz's three roommates and the boyfriend of one of the girls came along to the Western-style restaurant. We were very happy during those four hours. We had a nice dinner there; I liked the beefsteak with black pepper very much. After dinner we played some games. At the beginning we told many jokes. Then we played the classic game "Kill man".

How time flew! When we noticed the time, it was already 10:30 pm. We had such a nice evening. Happy birthday to Zzz.

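September 27, 2005

My birthday

I remember that on my birthday in 2003 our lab played a friendly table tennis match against the Machine Translation Lab, and on my birthday in 2004 I took the interview for the combined master's-PhD program. This year, regrettably, I could not spend my birthday in the lab. But it does not matter: the Internet is so well developed now that even in Beijing I received many good wishes. Today I am blessed and happy.
In the morning I went to the BUAA post office to pick up the birthday present Yajie had mailed. By coincidence, she sent it several days ago and it arrived exactly today. Carefully opening the parcel, I found an exquisite wallet and a box of fine chocolates. Inside the wallet was a photo of Yajie, one I had taken for her when she was in Beijing, and a very nice one. I am very grateful for her gift.

For dinner I invited my NLC friends to the Xiangjunyan restaurant opposite Sigma. It mainly serves Hunan cuisine, and the food is very good. The friends at the table were: my brother Shiqi; from Chongqing University, Jizhou, who flies to the US tomorrow, and Yi Chen, a big fan of Dae Jang Geum; from Tianjin University, Xiao Cui, whom I knew well before we ever met, Yonghuo, known for his startling remarks, and Yang Zhang, moderator of the AI board on Tianjin University's Qiushi BBS; from Nankai, Jun Xu, humorous and with wide interests, and Jingjing Liu, who leads our outings; from Shanghai Jiao Tong University, Shenghua Bao, who reads papers every night, and Ke Wu, who taught me about wavelets just today; from USTC, Fan Jiang, who told me the origin of my surname; and from Peking University, Long Jiang, a talented coding expert. Everyone chatted and laughed freely at the table. Many thanks to all of you for spending my 24th birthday with me.

Turning 24 means I have completed two twelve-year cycles of life. My next five-year plan is my PhD journey, and I will strive for it.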

September 26, 2005

Busy with C# programming

My current task is extracting dictionary entries from a SQL Server database, and I am programming in C#. After writing the knowledge base import program and developing a simple website for knowledge base authoring, I have become quite familiar with C#. Its string processing is very good.
This time my focus was on ArrayList. I ran into what I thought was a bug in C#. That is, if you use:
-------------------------------------------
string[] array_test= ……
foreach(string i in array_test)
{
    string str_temp = i.ToString();
    int int_start = str_temp.IndexOf("test"); // an error is reported here, saying that int cannot be converted to string
}
-------------------------------------------
I did not know why it happened. My workaround was to pass i.ToString() to a separate function that does the string processing and returns the result.

There are some differences between C# and C++ programming. I will learn more about them.

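September 25, 2005

Wang Yin again

The article I mentioned in yesterday's blog, 完全用Linux工作,摈弃Windows (Work Entirely in Linux, Abandon Windows), reveals only at its end that its author is Wang Yin. This article has already caused a strong reaction in the Linux community.

Over the past two days I saw many classmates around me reading an article, 清华梦的粉碎—写给清华大学的退学申请 (The Shattering of the Tsinghua Dream: A Withdrawal Application to Tsinghua University). I had assumed it was much like the other accounts I had read of disheartening experiences during a PhD, so I paid no attention. This morning when I turned on my computer, the article appeared again in my Gmail forum mail; I opened the link and found that the author was once again Wang Yin. Recalling yesterday's article, I could not help wanting to know who on earth this master was.

His article begins with his childhood experiences. Reading about his excitement in exploring nature and observing his surroundings, I forgot the article's title, because it seemed to resonate with something in me; I had similar experiences as a child. In describing his high school years, the article voices an indictment of Chinese high school education. In the university part, what stands out most is what a teacher said while signing a form, and the heartfelt words of a classmate at the farewell dinner with his buddies. Once the story reaches Tsinghua, that is, his master's-PhD path, he begins exploring the question "What is research?". In the Computational Geometry course he experienced the joy of research. Afterwards, the struggles with his advisor and co-advisor over papers were what he felt most intensely. The university research he experienced amounted to little, and it was hard for him to give play to the desire for discovery and innovation he had stored up since childhood. This finally led to his withdrawal application.

The article was written on September 22, three days ago. He may go abroad to look for the research paradise he hopes for. Some people commenting online say that the author's decision may be extreme, but the phenomena he mentions do exist in universities: papers have in effect become the only measure of research, and no one really asks about turning research results into applications.

Perhaps Wang Yin's case reflects the current state of research in Chinese higher education. By coincidence, Wang Yin's home is Meishan, which is very close to my hometown Emeishan, only about thirty kilometers away. Wang Yin was in a combined master's-PhD program, and so am I. Thinking of this, I naturally ask myself, "Will I run into Wang Yin's situation?"

When someone lays out his experience, it is a lesson for those who have not gone through anything similar. I should try my best to avoid similar problems. In fact, the research I am experiencing now has not shown what Wang Yin encountered. The Common Room of foreign universities that his article mentions already exists in good form at our school. The reading group & coding group run by our lab, and the machine learning group that I started at school with a senior labmate, have all worked very well. My advisor strongly supports my organizing such activities. What I experience in my lab is a freedom of research, and that room for freedom is very evident. My advisor's open-mindedness and teaching by word and example make me feel that the combined master's-PhD program was the right choice.

Perhaps I should thank the author for this article, which gave me one more chance to examine myself. I am now even more convinced that my choice of the combined program is right. I also want to thank my advisor and my lab, HIT-IR-LAB.


ps: I found many interesting things on Wang Yin's personal homepage (Tsinghua); you may want to take a look.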

September 24, 2005

The process of learning computer applications

I have come to think that everybody learns computer applications from the high level down to the low level. That may seem incomprehensible, so read on for my explanation.

Somebody who has never known or used a computer will first be attracted by the wonderful interface and nice basic applications. For some well-known reasons, his first operating system will probably be Windows 2000/XP. He will get used to Microsoft Office 2000/XP/2003 for his everyday needs. After enjoying a lot of nice, beautiful software, he will want to learn programming, so he will choose a nice IDE such as Microsoft Visual Studio. Such tools are integrations of many small tools. But IDEs have some faults that hinder advanced users. He will discover these faults and then try other tools. Following the mainstream, he will try command-line tools, such as DOS, Linux, and so on. Finally, he will become discerning about which tools fit his needs.

I have some feelings about one article related to the above. The article is 完全用Linux工作,摈弃Windows (Work Entirely in Linux, Abandon Windows). Maybe you will get some inspiration from it too.

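September 23, 2005

US selects 35 young scientists in cutting-edge hot technologies

Initiated by the US magazine Technology Review, a judging panel of 19 presidents of large US companies and university professors selected the 35 most outstanding scientists and engineers from among scientists under 35 at US universities, large companies, and start-ups. Their main research directions represent today's hottest technology fields, and they are trying to solve the most difficult scientific and technological problems facing humanity.

Many of them are closely related to IT. Some of these elites are listed below. The award for the multi-document summarization system Newsblaster is exciting news for our NLP research.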

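1. Aarabi, 29, of the University of Toronto, who is working on improving computer listening technology.

An algorithm he invented computes the difference in a sound's arrival at two closely spaced microphones. From this delay difference, the software can determine the direction of the speaker, amplify the voice of one of many speakers, and filter out all other speech as noise. Aarabi's invention can remove unwanted noise from mobile phone conversations and improve voice control in cars.

3. Barzilay, 34, of MIT, who is leading the development of the "Newsblaster" system.

The system identifies news reports from different news organizations on the same topic, then extracts the important content from all the reports and generates a news summary. Although a person can easily infer a word's meaning from the content of an article, a computer cannot. Barzilay uses statistical machine learning software to teach computers to make good inferences. The software she has developed can already produce fairly accurate summaries of news reports. She is also developing new software to summarize the content of recorded speech and to handle booking calls to airlines.

7. Butterfield, 32, who founded the photo-sharing site Flickr.

The photo-sharing site Flickr, which he created in the summer of 2004, has grown rapidly. He uses "tags" so that people can find photos online by their content. In March this year, Flickr was acquired by Yahoo. Flickr now has more than one million users, with several hundred thousand photos uploaded every day.

8. Candea, 30, of Aster Data Systems, who designed a technique to protect software against crashes.

After being "trained", the software can monitor its own operation. If it detects certain errors in itself, it begins a "surgical" repair of the faulty part without interrupting the functioning of the overall system.

9. Cantrill, 31, of Sun Microsystems, who solved the problem of diagnosing software in real time.

Normally, when software does not work properly, system administrators spend days finding and eliminating the problem. The application Cantrill developed, called DTrace, can automatically diagnose software in real time, so that staff can repair the broken software within minutes.

10. Carvin, 34, of the Digital Divide Network, who is helping to solve the problem of Internet access for the "poor".

So far he has helped 750 technology activists, educators, and small business owners build an online community whose mission is to address unequal access to information in the information age. His ultimate vision is that people can use mobile phones to upload information in the form of "blogs".

11. Cohen, 29, of BitTorrent, whose BitTorrent software solved the problem of transferring large files.

Normally, sending a 400-megabit video file to a single user takes several hours. BitTorrent splits a large file into thousands of small pieces, each of which takes only a few seconds to send over the network. He also eliminated the bandwidth problem, so a single user can receive a large file from others in a very short time. Game companies and Linux developers are currently trying the software for transferring large files.

14. Crowley, 29, of Dodgeball, who created a website through which you can tell your friends where you are.

After registering your own and your friends' names on the Dodgeball site, you enter your destination online before going out, and your whereabouts are known to your friends at any time. Google liked Crowley's idea very much and acquired Dodgeball in May this year. Crowley said: "Knowing where your friends are all the time is a really great feeling."

18. Griffith, 31, of Squid Labs, an inventor who follows his "inspiration".

He made custom lenses for customers in only 5 minutes at a cost of 5 US dollars. After finishing his PhD at the MIT Media Lab, he set up Squid Labs to pursue the business of invention. He is developing open-source hardware for computing devices.

21. Chari, 31, of Tropos Networks, who is setting the standard for wireless mesh networks.

Wireless mesh networks used to be confined to the military. While a physics graduate student at Harvard, Chari invented an algorithm that brought wireless mesh networks to civilian communications. In 2000 he founded Tropos Networks to commercialize them. When he mounted routers on lampposts, the low cost led to wide use of wireless mesh networks outdoors, in hospitals, and in factories. Wireless mesh network services built on the "routing protocol draft" Chari established have taken the lead in the emerging mesh network industry. Telephone companies are very worried that the spread of this technology will threaten their Internet customer base. Now Chari hopes his technology will enable communication anywhere, at any time, in developing countries. His first batch of wireless network equipment has already been shipped to India.

22. Tracey Ho, 29, of Caltech, who is studying how to make the Internet more efficient.

Today's Internet transmits information by turning files into data packets; each packet travels from one router to another until it reaches the end user. But as files get larger or are sent to multiple users, transmitting these packets becomes very complicated. Tracey has mastered a new alternative: let network nodes mix packets randomly and tag them with enough information to help the end user's computer recover the original data. This decentralized approach automatically optimizes bandwidth usage. One month after Tracey proposed "distributed random network coding", it attracted the attention of Microsoft researchers. Microsoft has now invested in this project, called "Avalanche", hoping to commercialize Tracey's idea as soon as possible.

28. Manohar, 33, of Cornell University, whose research helps computers work better without a clock.

Normally, the different functions of a computer chip are coordinated by a "clock" inside the chip, which means that the fastest operation cannot proceed until the slowest one completes. Manohar's chip designs eliminate the clock, yet run faster and save 10% of the energy. Off the chip, he uses short wires to carry global timing signals, so that when one operation completes, the next is put on standby.

29. Pennock, 34, of Yahoo Research, who is studying how to predict the future of markets.

Pennock tries to express economic theory through computation. His research not only forms a foundation for prediction markets but has also greatly improved the search functions of the Yahoo and Google sites.

30. Rabinowitz, 32, of Rosum, who improved the positioning accuracy of the Global Positioning System (GPS).

Inside buildings and in valleys outside cities, GPS is often inaccurate. Rabinowitz improved GPS accuracy by using synchronization codes embedded in broadcast television signals. His company, Rosum, has developed a handheld device that uses the synchronization codes and determines the user's position by computing the distance to the signal sources. His technology brings the GPS positioning error indoors or in cities down to between 1 and 2 meters.

33. Stellacci, 32, of MIT, who can make gene chips faster.

Gene chips are very useful for studying genetic diseases such as diabetes and many cancers. At present they are costly and slow to make. Stellacci hopes to find a way to produce gene chips quickly at a cost close to 50 US dollars per chip. He "engraves" the genetic information of a single DNA fragment onto a substrate, then uses this substrate as the master template for producing many identical gene chips.

34. Stubblefield, 24, of Johns Hopkins University, who found vulnerabilities in many secure information systems.

He showed that the early wireless security protocol (WEP) is not secure, helped break the digital watermarks of the Secure Digital Music Initiative, and helped expose security flaws in the software of an electronic voting machine.

Full text link: http://www.stdaily.com/gb/stdaily/2005-09/23/content_437393.htm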


September 22, 2005

Keep healthy and do better

Some of us found that we had become a little fatter than before. Three days ago I began to exercise every evening. At the beginning I found that, after being lazy for a long time, my physical strength had got worse. These three evenings I ran and did some exercises. Now I feel a little sore all over. In my experience this is very common; my body is recovering.

Keep healthy and do better. This is my belief about exercise!


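September 21, 2005

[Collection] Six ways to move data in SQL Server 7.0

1. Import or export with the DTS designer
The DTS designer is powerful, supports multiple tasks, and has a visual interface that is easy to operate, yet few people know about it. If you only need to move some of the tables in a SQL Server database, this method is the best; of course, it can also move all the tables. In SQL Server Enterprise Manager, expand the + to the left of the server, select the database, right-click, and choose All tasks/Import Data... (or All tasks/Export Data...) to enter the wizard, then follow the prompts step by step. The wizard is finely divided and can flexibly copy data between different data sources, which is very convenient. You can also save the job as a DTS package; if the same copy task comes up again, just run the package, saving time and effort. You can also open the DTS designer directly: expand Data Transformation Services under the server name, select Local Packages, right-click in the right-hand window, and choose New Package. Note: if the tables to be copied from the source database have foreign keys, pay attention to the order in which they are moved; sometimes they must be moved in batches, otherwise foreign keys, primary keys, and indexes may be lost. The hints next to the options explain this clearly during the move. Alternatively, copy everything to the target database in one pass and then re-create the foreign keys, primary keys, and indexes.
In fact, when creating a database, the scripts that create the foreign keys, primary keys, and indexes should be kept separate from the script that creates the tables, and the data files they use should also be separate and placed on different drives, which helps optimize the database.

2. Use the Bcp utility
Although this tool is not recommended in SQL Server 7, many database administrators still like it, especially those who used earlier versions of SQL Server. Bcp has limitations: first, its interface is not graphical; second, it only copies between SQL Server tables (views) and text files. But its advantages are good performance, low overhead, a small memory footprint, and speed. Interested readers can consult the reference manual.

3. Use backup and restore
First make a full backup of the source database to a device, then copy the backup file to the destination server (restoring is fast) and restore the database. Enter the source database's name as the name of the restored database (the names must be identical), select forced restore (the option that allows overwriting an existing database), then choose to restore from device and browse to the backup file. This method restores the database completely, including foreign keys, primary keys, and indexes.

4. Copy the data files directly
Copy the database's data file (*.mdf) and log file (*.ldf) to the destination server, then restore with a statement in SQL Server Query Analyzer:
EXEC sp_attach_db @dbname = 'test',
@filename1 = 'd:\mssql7\data\test_data.mdf',
@filename2 = 'd:\mssql7\data\test_log.ldf'
This attaches the test database to SQL Server, and it can be used as usual. If you do not want to use the original log file, you can use the following commands:
EXEC sp_detach_db @dbname = 'test'
EXEC sp_attach_single_file_db @dbname = 'test',
@physname = 'd:\mssql7\data\test_data.mdf'
This statement loads only the data file; SQL Server adds the log file automatically, but the data recorded in the original log file is lost.

5. Customize in an application
You can run a program you wrote yourself in an application (PB, VB), or run it in Query Analyzer. This method is flexible: it essentially uses a platform to connect to the database, and what the platform mainly uses is SQL statements. It has little impact on the database, but if a remote linked server is involved, it requires good network transfer performance. There are generally two kinds of statements:
1> select ... into new_tablename where ...
2> insert (into) old_tablename select ... from ... where ...
The difference is that the former inserts the data into a new table (the table is created first, then the data is inserted), while the latter inserts the data into an existing table. I personally prefer the latter, because in terms of program structure and range of application the second statement is stronger than the first.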
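
The difference between the two statement patterns in method 5 can be tried out with a small Python sketch. This is only an illustration under loud assumptions: it uses the sqlite3 module from the Python standard library rather than SQL Server, SQLite has no SELECT ... INTO, so CREATE TABLE ... AS SELECT stands in for pattern 1, and the table and column names (src, new_table, old_table, word, freq) are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, word TEXT, freq INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [(1, "wavelet", 3), (2, "corpus", 12), (3, "token", 7)],
)

# Pattern 1: the query creates a NEW table and fills it in one statement
# (SQLite's counterpart of SQL Server's SELECT ... INTO new_tablename).
conn.execute("CREATE TABLE new_table AS SELECT id, word FROM src WHERE freq > 5")

# Pattern 2: the target table must already exist; the query appends rows to it.
conn.execute("CREATE TABLE old_table (id INTEGER, word TEXT)")
conn.execute("INSERT INTO old_table SELECT id, word FROM src WHERE freq <= 5")

new_rows = conn.execute("SELECT id, word FROM new_table ORDER BY id").fetchall()
old_rows = conn.execute("SELECT id, word FROM old_table ORDER BY id").fetchall()
print(new_rows)  # [(2, 'corpus'), (3, 'token')]
print(old_rows)  # [(1, 'wavelet')]
```

Pattern 2 can be run repeatedly against the same target table, which is one reason it fits better into application code.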
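6. SQL Server replication
SQL Server provides powerful data replication, which is also the hardest feature to master. For concrete usage please consult the relevant material. Note that for replication to succeed, some conditions are indispensable:
1> SQL Server Agent must be started, and MSDTC must be started.
2> Every table to be replicated must have a primary key.
3> If a table has text or image columns, the with log option must be used; the with no_log option cannot be used.
In addition, the max text repl size option controls the maximum size of text and image data that can be replicated; operations exceeding this limit will fail.
4> The computers taking part in replication should have at least the hidden shares, i.e. share names C$ or D$...
5> The Windows NT account used by SQL Server Agent must not be a local system account, because the local system account is not allowed network access.
6> If the servers taking part in replication are in other computer domains, trust relationships must be established between those domains.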

September 20, 2005

On Chatbot: Loebner Prize in 2005

I heard that the results of the 2005 Loebner Prize have been announced. The fifteenth annual Loebner Prize contest, Loebner Prize 2005, was held on Sunday, Sept 18, 2005 in New York City.

Here are the results:

First Place - Rollo Carpenter Mean Rank 5.75
Second Place - Vladimir Veselov Mean Rank 6.00
Third Place - Steve Watkins Mean Rank 7.00
Fourth Place - Richard Wallace Mean Rank 7.25

Detailed results are at: http://loebner.net/Prizef/2005_Contest/results.html

Hugh Loebner, the sponsor of the Loebner Prize, announced three things at the contest.
1. The next contest will be held Sunday, Oct 1, 2006
2. The final four contestants, or their representative(s) must be present and supervise their entries during the contest.
3. He will award 4 stipends of USD 250 to each of the Final Four. This is to underwrite travel expenses, or to allow them to hire a local representative if they can not appear.

The session logs(transcripts) should be up by 9/20/2005.

I remember that when I studied Artificial Intelligence I learned something about the Loebner Prize. You can review my blog post at http://ir.hit.edu.cn/~bill_lang/blog10/archives/001007.html. Yeah~! It is a very interesting contest. Maybe someday we could take part in it.

Some experience with text processing

This morning my Giza++ run crashed. After checking, I found that the two files being compared had different numbers of lines: three lines were missing in one file. Giza++ is robust; I learned this from its error file. So when I write a toolkit one day, I should give it robust error reporting as well.
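
A check like the following could catch such a mismatch before the alignment run starts. It is a minimal Python sketch, not part of Giza++; the function names and the file names corpus.en and corpus.zh are hypothetical, and the files are assumed to be UTF-8 text with one sentence per line.

```python
import tempfile
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines by iterating, so a large corpus is never loaded into memory at once."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(source: Path, target: Path) -> None:
    """Raise ValueError with a readable message if the two sides are not line-aligned."""
    n_source, n_target = count_lines(source), count_lines(target)
    if n_source != n_target:
        raise ValueError(
            f"{source.name} has {n_source} lines but {target.name} has {n_target}"
        )


# Tiny demonstration with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "corpus.en"
    tgt = Path(tmp) / "corpus.zh"
    src.write_text("a\nb\nc\n", encoding="utf-8")
    tgt.write_text("x\ny\n", encoding="utf-8")
    try:
        check_parallel(src, tgt)
    except ValueError as err:
        print(err)  # corpus.en has 3 lines but corpus.zh has 2
```

Reporting both counts in the message follows the lesson above: the error itself should say what went wrong.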

Here are some useful text processing techniques. I record them here to share with you.

1. Merging many files into one.
You can use the classic DOS command COPY for this. I first copied all the files I wanted to merge into one folder. Then
"copy *.txt final.txt"
did the job. You can also use a similar command, such as
"copy one_*.txt final.txt".

2. Counting the lines of a text file
Somebody told me DOS has no such ability. Under Linux you can use the command "wc", which prints newline, word, and byte counts for each file, and a total line if more than one file is specified. I used this command under Cygwin: "wc *.txt -l".

3. Batch processing
I had heard that console programs have one advantage: you can use batch processing. I did not understand it clearly until these days.
Maybe you have had an experience like mine. You have written a console program with many parameters to type at the DOS prompt. After one run, you have to type the same parameters again. It is boring. Batch processing means you can list a sequence of commands in a text file named xx.bat, with all the parameters set in it. Then each time you can run the .bat file directly. It is very convenient. I think it is like macros in UltraEdit or Office. It is very useful.
Here are some related materials about batch processing.
Batch processing From Wikipedia
Batch processing command in detail
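
Techniques 1 and 2 above can also be combined in a short script. The following is a minimal Python sketch, assuming UTF-8 text files; the function name merge_files and the file names in the demonstration are made up. Unlike a bare "copy *.txt final.txt", it skips the output file when the pattern also matches it.

```python
import tempfile
from pathlib import Path


def merge_files(folder: Path, pattern: str, output: Path) -> int:
    """Concatenate the files matching pattern (sorted by name) into output.

    Returns the number of lines written, which doubles as a line count.
    """
    total = 0
    with output.open("w", encoding="utf-8") as out:
        for path in sorted(folder.glob(pattern)):
            if path.resolve() == output.resolve():
                continue  # never read the output file back into itself
            with path.open(encoding="utf-8") as handle:
                for line in handle:
                    # Make sure a file without a final newline does not glue
                    # its last line to the first line of the next file.
                    out.write(line if line.endswith("\n") else line + "\n")
                    total += 1
    return total


# Tiny demonstration with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "one_a.txt").write_text("x\ny\n", encoding="utf-8")
    (folder / "one_b.txt").write_text("z", encoding="utf-8")
    print(merge_files(folder, "one_*.txt", folder / "final.txt"))  # 3
```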

It's never too late to learn.

For every researcher who works with text, UltraEdit is a very powerful tool. It has many nice features, such as macros, regular expression matching, format conversion, and so on.

September 19, 2005

Giza++ in Windows

Giza++ is a famous tool for machine translation research. But its original edition is for Linux and cannot be used directly in Windows. The main problem is the GNU compiler.

How to run it in Windows? It was a big problem. After searching the web for an answer, I found that a student had solved this problem. The related code and documentation are at this link: http://blogs.gcomputing.com/bluegene/archives/000466.html.

September 18, 2005

Nice Mid-Autumn

I remember the past two Mid-Autumn Festivals clearly. I spent them with my friends of IRLab. Two Chinese sentences describe them: "2003:数人夜话紫丁香,2004:月饼飘香六一五" (2003: a few of us chatting at night among the lilacs; 2004: the scent of mooncakes drifting through 615). They were very beautiful. This year I had a similarly wonderful festival.

At noon, Shiqi, Jizhou, Yichen, and I had a nice meal at Ganguoju near the Wudaokou subway station. The four of us celebrated the festival first. During the meal we talked freely about many topics. Following our improvised feelings, we composed a poem: each of us contributed one line in the classical Chinese style. Our final work was as follows:

中秋佳节时,(shiqi) At the Mid-Autumn Festival,
月夜花香似。(jizhou) the moonlit night is like the scent of flowers.
梦憬乡人情, (me) In dreams I long for the folks of home,
遥想君何是。(yichen) and from afar I wonder how you are.

What do you think of it? We all liked it very much. :)

After lunch we went to Xidan again to buy autumn and winter clothes. We walked around for a very long time and bought some nice clothes. We had supper on the top floor of the big shopping mall. We were very relaxed and happy.

We were all so tired that we cancelled our original plan to visit Tiananmen. We returned to our dormitory and had a good sleep. In the evening Yajie phoned me. We talked about some interesting topics again and were very happy to talk to each other.

September 17, 2005

Astonished by new programming techniques

There are so many new programming techniques around us. If you do not pay attention to them, you will never be astonished by them. While working through the exercises of the book Professional UML with Visual Studio .NET, I found some new techniques.

I used Visio Professional to edit the UML structure. But this edition cannot generate code from the class structure. After checking the Microsoft website, I learned that Visio has several editions, and only Visio for Enterprise Architects can generate code. Visual Studio .NET Enterprise edition includes this Visio edition. Unfortunately I am now using Visual Studio .NET Professional edition, so I have to look for other choices.

While looking for solutions on the web, I found some nice tools for programming in VS.NET, in the article Ten Must-Have Tools Every Developer Should Download Now. I tried two of them, Snippet Compiler and NUnit. NUnit fits the XP (Extreme Programming) way of thinking. CR999 introduced it in our lab last year, but I had not tried it much. After reading the article "Test-Driven C#: Improve the Design and Flexibility of Your Project with Extreme Programming Techniques", I think I have come to like it.

By coincidence, VS.NET 2005 has integrated features similar to NUnit. After trying VS 2005, I found it very convenient. VS 2005 has many other nice features, such as unit testing, refactoring, and Team Test. I'd like to try it more.

September 16, 2005

Professional UML with Visual Studio .NET

I have run into a problem. I need to do some program architecture design. How should I represent the architecture? After a survey, I found UML very well suited to this need. There are some related materials on UML design. It was reported that Microsoft Visio was the most popular UML software in 2004. And there is a book directly on the subject: Professional UML with Visual Studio .NET: Applying Visio for Enterprise Architects.

I bought it from China-pub. After practicing along with the guide, I could produce some nice UML diagrams. Software engineering methodology is very important for programming. It seems time-consuming, but over the whole process it is needed. The programs I wrote before were, I think, very weak in software engineering. I can practice more on my current design task.

I have read one third of the book. I think I can finish it this weekend.

September 15, 2005

Farewell to Chengjie Sun

Chengjie Sun, a Ph.D. candidate of the ITNLP lab of HIT, returns to Harbin tonight. Last evening all of us NLC visiting students had a farewell dinner with him. Today his check-out went smoothly. Shiqi, Yichen, and I had a last supper with him in the BUAA restaurant. Then we saw him off at the gate of BUAA. As the taxi had little space left once his boxes were loaded, we said goodbye to him at the gate.
I got to know him at the Ph.D. candidate seminar of our three labs. But half a month later he went to MSRA. When Shiqi and I came here, he helped us a lot. Every day we had dinner together, and we visited some places of interest together. He was a good brother to us.

From tomorrow, only Shiqi and I from HIT are left at MSRA. We are a little lonely. We will do our best on behalf of HIT.

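September 14, 2005

1000 posts, 1000 dreams!

Link: http://ir.hit.edu.cn/cgi-bin/newbbs/topic.cgi?forum=20&topic=360&show=0

Warm congratulations to HIT-IR-BBS-MachineLearning on passing 1000 posts!
This board was founded at about 9 pm on 2003/08/01.
Up to today (2005/09/14) it has been running for 776 days in total.
There are now 1000 posts and 336 topics.
On average 1.289 posts are born every day, and a new topic appears every two to three days! 1000 posts, 1000 dreams; everyone's level keeps rising too!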
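
The figures above can be rechecked with a few lines of Python. This is just a worked-arithmetic sketch; it assumes that both the founding day and today are counted, which is what yields 776 days.

```python
from datetime import date

founded = date(2003, 8, 1)
today = date(2005, 9, 14)
posts, topics = 1000, 336

# Counting both end days gives the 776 days quoted in the post.
days = (today - founded).days + 1

posts_per_day = posts / days
days_per_topic = days / topics

print(days)                      # 776
print(round(posts_per_day, 3))   # 1.289
print(round(days_per_topic, 2))  # 2.31
```

So a new topic appears about every 2.3 days on average.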

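Over the past two-plus years, with the active participation of all our netizens, we have discussed many problems, including current hot and difficult problems in machine learning. On this small board we have also deepened our friendships; although most of us have never met, that does not affect our exchanges in the slightest!

Admittedly, the board also has some problems. Posts that explore machine learning questions in depth are rather rare; most are still introductory material and links to resources. This precisely reflects the encouraging fact that our understanding of machine learning is deepening. I believe that with everyone's efforts we will surely come to understand machine learning in depth, and our research and development abilities in machine learning will be greatly improved!

Here I want to say to everyone: make communication a habit! Our tomorrow will be even better! I look forward to your further participation and discussion!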

September 13, 2005

Moving to another project group

This morning I got the news that I am to move to another project group. My direct supervisor is Cheng Niu, who is well known in Information Extraction. I should pay less attention to my original project. My new task puts more emphasis on coding. There are some new concepts and techniques, and I have ordered some books from China-pub.

New task, new challenge!

September 12, 2005

Reading more on another topic

This afternoon, in our project discussion, I presented my ideas on dialogue management. Many problems remain.

Maybe I should change to another topic; I need to give a direct answer to this question. After the project meeting, we four students discussed some related topics in the lounge. In our brainstorm we came up with some good ideas about my research topic. After checking the web, I found there was not much research on such a topic.

Good. I can do a simple survey first.

September 11, 2005

Visiting a friend at Beijing Normal University

As planned, this morning I went to Beijing Normal University. I got up as usual. When I arrived at the south gate of Beijing Normal University, it was 8:50.
My friend is Deliang Wang, a third-year Ph.D. candidate in the Foreign Language Department. He is now an English teacher in his department. Meanwhile, he is studying Chinese zero anaphora resolution. He had a few questions. We talked in a drinks bar from 9:00 to 12:00. It was a wonderful talk. He introduced the outline and table of contents of his Ph.D. thesis. We discussed his forthcoming experiment and exchanged some ideas on anaphora resolution research. Some concepts were fresh to me, such as exophora, RST trees, and head-veroins. He gave me a very nice book on anaphora resolution.

It was a very nice exchange. Thanks to Deliang Wang. I wish him an on-time graduation. :)

September 10, 2005

Teachers' Day & Beijing Arboretum

It was Teachers' Day. Best wishes to my IRLab teachers!
Chengjie Sun, Yumei Li, Pengbo Xu, and I gathered at the door of Sigma with Mr. Changning Huang, a senior member of MSRA. We all expressed our congratulations to him, and he was very glad. Then we went to Beijing Arboretum by taxi; Mr. Huang came with us. He is in good health and physically strong. He walked with us and talked humorously, in a good mood the whole time. It was a long walk.

Beijing Arboretum is very large, just like Harbin Arboretum. So many trees and plants were green. There was the 8th Asia-Pacific show of miniascape (bonsai) and ornamental stones. We saw so many surprising stones and were astonished by their shapes and artistic conception. When I get the pictures, I will share them with you. There was a unique garden, a very big greenhouse. It was said to contain many tropical rain forest plants. But the entrance fee was very expensive, and some of us had already been inside. So we decided to visit it another day.

September 9, 2005

MS Scholar

One visiting student of our NLC Group won the MS Scholar Award. We all congratulated him. He was very glad to feast with us.

September 8, 2005

My Presentation at the Study Group

This afternoon I gave my first presentation at the study group of the NLC group. The paper I read was Dialogue Act Tagging for Instant Messaging Chat Sessions, a paper from the ACL 2005 Student Workshop. The topic is very interesting.

I prepared the slides from the morning until 2:00 pm. It was a tiring and exciting process, because there are so many tricks in making slides. When I finished, I thought my slides were very beautiful.

There is a wonderful resource on how to make slides: How to Make Slides & Overheads. I believe it will be useful to you.

Yeah. Making slides is definitely an art. If you have got beautiful results, why not pay more attention to making beautiful slides?

September 7, 2005

One more nice free software: R

This morning, while I was deep in the papers on IM message act classification, I came across the concept of the Kappa statistic. It is very useful for testing annotation consistency. At the moment I cannot use Matlab, and Excel has no such function. So I began to try other tools.
I remembered that Jietang once recommended a free GNU software package, R. I had read some materials about it. Jun Xu has used it a lot. He said many professional statistics researchers use it a lot. Maybe the reason is that it's free. :)
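
To make the consistency test concrete, here is a minimal Python sketch of Cohen's kappa for two annotators. It assumes that the "Kappa" in the papers is Cohen's kappa, kappa = (P_o - P_e) / (1 - P_e), where P_o is the observed agreement and P_e the agreement expected by chance; the dialogue act labels in the example are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


a = ["question", "statement", "statement", "greeting", "question", "statement"]
b = ["question", "statement", "question", "greeting", "question", "statement"]
print(round(cohen_kappa(a, b), 3))  # 0.739
```

In this example the annotators agree on 5 of 6 items (P_o = 5/6) and the chance agreement is 13/36, so kappa = 17/23, about 0.739.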

I collected some materials. Their links are as follows.

http://www.r-project.org/
Its left sidebar has many useful links about R.

Introduction to R (Chinese version)

After running it on my computer, I felt it was similar to Matlab. Yeah. It's very good for me.


It can plot many beautiful pictures, like the following one:

[Figure: a plot produced with R]


September 6, 2005

Yichen's Birthday

Today is Yichen's birthday. We had a wonderful dinner at Tianfuyuyuan. We were Yichen, Shiqi, Jingjing, me, Xiaocui, Jizhou, Shenghua, Longjiang, Junxu, Chengjie Sun, and Yonghuo, all of the NLC group. After dinner we played the classic game of stick, tiger, chicken, and worm.

There was another piece of exciting news: I have finished the first version of the specification for our project. Now I can do more on my other tasks.

September 5, 2005

Specification and corpus

These days I have been concentrating on the specification. By the time of the project discussion I had finished most of it. There was more exciting news for me: I have got the LDC corpus for my research, so I can do lots of interesting research.

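September 4, 2005

The beauty of the night sky

In the evening I ran on BUAA's big sports field and walked the last lap. I have always strolled the last lap after running, but tonight I suddenly saw the stars shining in the night sky. I started the last lap walking fairly fast, but once I had this feeling my steps slowed without my noticing. For a moment what I felt was harmony and liveliness. The dreams and longings of my childhood, deep in my heart, once again appeared before my eyes.

It is a very subtle feeling. I suddenly thought of the article I read recently, Time Management Tips for Developer. One passage of it reads:

Set up your goals: long term and short term


To make life better, first of all you need to know what is "better" personally for you. Where do you want to be next week, next quarter, in the next 2 years, or even the next 20 years (if you are young enough :-)? You must decide for yourself what you want from your life, and why you are still where you are.

Indeed, I now need to get this done properly.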

September 3, 2005

Specification~!

These days I am working on the specification. It is a little time-consuming. Fortunately, after working on it these days, I have worked out the framework and finished the main parts. I should flesh it out and report to my mentor.

After the specification, I will pay more attention to my dialogue modeling and management, which is my main task. There are many papers waiting for me. I should try harder.

This afternoon, Dr. Tliu had dinner with us. He is in Beijing on business. He shared his recent experiences and feelings. Thanks to him.

September 2, 2005

[Collection] Time Management Tips for Developer

Source: http://www.codeproject.com/useritems/Time-Management-Tips.asp
Author: Alexander Fedorenko

There is a way to make life better. Just a few time management principles can bring success to our lives and turn software and web development into really profitable and fascinating work.

Introduction


Software and web development can be really exciting; after years of development it can reward you with a million dollars or become a groove. For many of us, about all we can hope for is keeping our heads above water. But I think this is not the goal we dreamed of in school and university.


To earn more, many of us look for additional work and cannot care about anything except heads-down programming. We can't take a rest, can't spend more time with family and friends, can't do anything other than work. This leads to stress and an unsatisfying life.


But wait. There is a way to make life better. Just a few time management principles can bring success to our lives and turn software and web development into really profitable and fascinating work.


Set up your goals: long term and short term


To make life better, first of all you need to know what is "better" personally for you. Where do you want to be next week, next quarter, in the next 2 years, or even the next 20 years (if you are young enough :-)? You must decide for yourself what you want from your life, and why you are still where you are.


Do not be lazy about planning


Napoleon said that only properly planned things can produce the desired result. Don't ignore this principle; invest time in planning. Remember, musing is not planning. I like the classic quotation: "Sometimes I sit and think and sometimes I just sit". This phenomenon can eat up a lot of time. If you find yourself musing, switch to other work, look out of the window, or simply rest your eyes.


Update your plans according to reality regularly.


If you can't plan, just track


Watch yourself. If you can't plan anything this time, you will be able to come back to planning later. Just track what you are doing on paper, in an Excel sheet, or with task management software. Update at least hourly, not at the end of the day. This will help you find common interrupters and recurring tasks, so you can plan for these things in the future.


Look at your time journal and try to find things that don't really need to be done, things that could be done by someone else, work that can be done more effectively or quickly, and actions that waste others' time.


You can download a simple time tracking template here.


Collect all tasks in a to do list


Sometimes we don't have anything to do, but later we remember (or a manager remembers for us) a lot of important tasks, which automatically become urgent. The only way to avoid such situations is to collect tasks in a to-do list. Add a task to the list whenever it comes from your boss, a colleague, or your own mind. If you can't access a computer, don't try to remember the task; write it down on scratch paper or any other medium. Transfer it to the main list when possible.


Estimate every task, set deadlines yourself. This will help you avoid doing things at the last minute.


Adjust priorities


Drucker's dictum says: "Doing things right is not as important as doing the right things". In software and web development it is possible to spend a lot of time on tasks that produce insufficient value for a customer, or no value at all. For example, writing a regular expression to split a comma-delimited array or, even worse, writing a CORBA application to access two methods on a remote server. There is no silver bullet that handles all prioritization cases, but a few tips can help:



  1. Ask the customer or manager for task ordering and prioritization first. Be sure to do this beforehand: not every customer will answer immediately.
  2. If someone else depends on a specific task, do it first.
  3. For equal tasks, set priorities by difficulty: ugliest tasks first.

Delegate when feasible


If you know people around you, who is available to take a part of your work, do not hesitate to delegate it. Give objectives, not procedures, require responsibility, accountability. Describe task clearly. Provide a "how to test" example.


The following rules can be used to decide whether to delegate a specific task:



  1. Will he or she do it better or more quickly than you? If yes, delegate it without a doubt.
  2. Would you entrust the task to somebody else because you have more important tasks to do? If yes, delegate it.
  3. Can the available person complete the work without your assistance while you are out of the office? If yes, delegate it.
  4. Of course, you can even delegate your work to your boss, but do not abuse this.

In a multi-project environment, the work of the whole team cannot be distributed equally among its members. Someone will have to do more and someone less. By Goldratt's Theory of Constraints, a project cannot be completed until the slowest member has completed his work. Thus delegation must be used inside a team, not only from manager to developer. This process can only be effective in teams with honest and open communication, such as XP teams.


Perfect is not better than good


When writing code, for example, it is more important to finish on time than to worry about naming a variable or achieving a perfect design. Get the job done; you can refactor later. On the other hand, poor code will lead to many problems later and to unnecessary time spent on fixing and debugging. Thus, try to find the happy medium. Unit tests are really helpful here: they let you worry less about perfect code and they simplify refactoring.


Split difficult tasks into bite-sized pieces


People usually avoid difficult tasks. Break them down into small steps. Complete the manageable chunks and soon you will notice that the problem is resolved. A very helpful approach is adding a "how to test" note to each task. This sets up a micro goal and makes it possible to determine when the task is complete. Of course, if these tests can be automated, this will reduce the time spent on repeating them.


Identify your time wasters


Usually we deal with the people around us: colleagues, friends, or family. They bother or gladden you in various ways. They can contact you directly or via phone, instant messaging, or email. This leads to interruptions as well as lost time. An interruption of 6 - 9 minutes takes an additional 4 - 5 minutes of recovery. Five interruptions will shoot an hour. You must reduce the frequency and length of interruptions. But you cannot firewall yourself off or ignore others. For example, ignoring your wife's phone calls will end really badly for you ;) The only way to reduce this lost time is to investigate your recurring time wasters. Once you know the whole picture, you can decide where you can save time and where you cannot. And rest assured, your boss is not a time waster in any case.
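
The claim that five interruptions shoot an hour can be checked with a small sketch. It assumes that each interruption costs its own length plus the recovery time, using the ranges quoted above; the function name `interruption_cost` is illustrative.

```python
def interruption_cost(count, interruption_min, recovery_min):
    """Minutes lost: each interruption costs its length plus recovery time."""
    return count * (interruption_min + recovery_min)


low = interruption_cost(5, 6, 4)   # shortest interruption, fastest recovery
high = interruption_cost(5, 9, 5)  # longest interruption, slowest recovery

print(f"Five interruptions cost {low}-{high} minutes")
```

Under that assumption the loss is 50 to 70 minutes, that is, roughly an hour.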


Plan times for relaxation and recreation


Keith Frayn, professor of human metabolism at Oxford University, told TV Plus: "Any normal person could survive for up to 60 days without food on just water." Without sleep, people break down much more quickly. In 1964 high school student Randy Gardner attempted to break the Guinness world record for the longest time awake - 260 hours. Stanley Coren describes the day-by-day impact on Randy in the book Sleep Thieves, as documented by John Ross of the US Navy Medical Neuropsychiatric Research Unit in San Diego. Randy had trouble focusing his eyes on day 2, hallucinations on day 4, and slurred speech and a short attention span by the last day.


Do not expect high productivity if you are tired. Sleep recharges our brains and helps us think more clearly. Plan your day adequately; do not save on sleep.


Developers usually sit for 8 hours a day or more at a workplace in front of a computer. This leads to emotional as well as physical ailments. Among our most exposed organs are the eyes. Looking at a monitor for a long time, even an expensive one, will ruin our eyesight. There are many eye training techniques that reduce this harmful effect. Type "training eyes" into Google and find a suitable routine. Schedule it daily, just before dinner or at any other convenient time.


Do not hesitate to ask friends or colleagues for helpful advice


Almost every IT project has risks; they can be hidden or visible at the beginning. Developers have to resolve them. Working on any risk, even a small one, can take days or even weeks. To avoid this loss of time, just ask your colleagues or friends for advice or help. I have many examples of how this rule reduced the time spent on difficult tasks and prevented project failure. An example from my practice: the customers of a recent project required extra protection of the application against cracking. One part of the protection was downloading a component from a server and loading this DLL into the application without writing it to disk. After two hours of research I had not found any useful information. I paused for a minute and tried to recall who could help me with it. I asked a friend who worked as a developer at another company, and he helped me: he sent me a link to the tutorial I was looking for. The problem was resolved.


Reward yourself


We all expect a reward or praise for completed work. A lack of reward kills our desire to work, which leads to reduced productivity. This is why we prefer working for others to doing something for ourselves. Promise yourself a reward for completing each task or finishing the whole job. For example, let yourself watch an interesting movie when you finish developing a page or a new feature.


Conclusion


This list of time management tips is just a starting point for a new, improved life. Following these principles from day to day will show the way to a successful career, robust health, and well-being.


My university teacher always told me that every detail is important. In most cases, if we did not achieve something, it was because of a small but important thing that we forgot or skipped. Help yourself reach your dreams. Avoid chaotic motion; plan and manage your lifetime.



2005年9月1日

Anaphora Resolution Research Friend

At 21:40, my MSN flickered. I found that one of my research friends had sent me a greeting. He was a Ph.D. candidate at Beijing Normal University.

I phoned him. He would graduate next year with research on anaphora resolution. Now he needed a little help. He invited me to visit Beijing Normal University someday, where we would discuss some topics in anaphora resolution. He said he had the book Anaphora Resolution, written by Mitkov. I knew it was very good for our research. He said he could lend it to me.

Yeah. When I discussed questions on anaphora resolution with my research friends, I felt very happy. I knew I had come to love this topic. I could do more on it.