September 20, 2005

Some experience with text processing

This morning my Giza++ run crashed. After checking, I found that the two files being compared had different numbers of lines: three lines were missing from one of them. Giza++ handled this robustly, and its error file told me what had gone wrong. So when I write a toolkit of my own someday, I should think about giving it robust error reporting too.
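
A check like this can also be run before starting a long job. Below is a minimal Python sketch, not anything Giza++ itself provides. The function name first_length_mismatch and the default encoding are my own assumptions. It reads the two files in lockstep and reports where the shorter one runs out.

```python
from itertools import zip_longest


def first_length_mismatch(path_a, path_b, encoding="utf-8"):
    """Compare the line counts of two files by reading them in lockstep.

    Returns None when both files have the same number of lines.
    Otherwise returns (line_number, shorter_path), where line_number is
    the 1-based number of the first line that the shorter file lacks.
    The encoding default is an assumption; pass the one your corpus uses.
    """
    with open(path_a, encoding=encoding) as fa, open(path_b, encoding=encoding) as fb:
        # zip_longest pads the exhausted file with None, so a None marks
        # the point where one file ended before the other.
        for number, (line_a, line_b) in enumerate(zip_longest(fa, fb), start=1):
            if line_a is None:
                return number, path_a
            if line_b is None:
                return number, path_b
    return None
```

This only tells you that the counts differ and which file is shorter. It cannot tell you which lines in the middle went missing.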

Here are some useful text processing techniques. I am recording them here to share with you.

1. Merging many files into one
You can use the classic DOS command COPY to do this. I first copied all the files I wanted to merge into one folder. Then
"copy *.txt final.txt"
does the job. You can also use a similar command, such as
"copy one_*.txt final.txt".
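
For machines without DOS, the same merge can be sketched in Python. This is only an illustration under my own assumptions: the function name merge_files is made up, files are merged in sorted name order, and a newline is added after any file that does not end with one. COPY does not add that newline. I add it so that the last line of one file is not glued to the first line of the next.

```python
import glob
import os


def merge_files(pattern, output_path):
    """Concatenate every file matching pattern into output_path.

    Files are merged in sorted name order. The output file itself is
    skipped if it happens to match the pattern. Returns the list of
    source paths that were merged.
    """
    output_abs = os.path.abspath(output_path)
    # Collect the sources before opening the output for writing.
    sources = [
        path for path in sorted(glob.glob(pattern))
        if os.path.abspath(path) != output_abs
    ]
    with open(output_path, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            # Keep files from running together on one line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return sources
```

Calling merge_files("one_*.txt", "final.txt") would correspond to the second COPY command above.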

2. Counting the number of lines in a text file
Somebody told me DOS has no such ability. But under Linux you can use the command "wc". It prints newline, word, and byte counts for each file, plus a total line if more than one file is specified. I used this command under Cygwin: "wc *.txt -l".
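
If Cygwin is not at hand, a rough equivalent of "wc -l" can be sketched in Python. The function names below are my own. Like wc, the sketch counts newline characters, so a last line with no newline at the end is not counted.

```python
def count_lines(path):
    """Count the newline characters in a file, the way "wc -l" does."""
    count = 0
    with open(path, "rb") as f:
        # Read in 1 MB chunks so large corpus files need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            count += chunk.count(b"\n")
    return count


def count_lines_many(paths):
    """Return ({path: line_count}, total) for several files."""
    counts = {path: count_lines(path) for path in paths}
    return counts, sum(counts.values())
```

Comparing the counts of the two parallel files before running an aligner would have caught my three missing lines.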

3. Batch processing
I had heard that one advantage of console programs is that you can use batch processing, but I did not understand it clearly until recently.
Maybe you have had an experience similar to mine. You write a console program that takes many parameters, which you type at the DOS prompt. After trying it once, you have to type the same parameters all over again. It is boring. Batch processing means you can list a sequence of commands in a text file named something like xx.bat, with all the parameters written into it. Then each time you can just run the .bat file directly. It is very convenient. I think of it as being like a macro in UltraEdit or Office.
Here are some related materials on batch processing:
Batch processing From Wikipedia
Batch processing command in detail
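
The same idea of writing the commands down once and replaying them can also be sketched in Python instead of a .bat file. This is only an illustration. The function name run_batch is my own, and any program names and parameters you put in the list are up to you.

```python
import subprocess


def run_batch(commands):
    """Run each command in order, like the lines of a .bat file.

    Each command is a list: the program name followed by its parameters.
    Stops at the first command that fails and returns its exit code;
    returns 0 if every command succeeded.
    """
    for command in commands:
        print("running:", " ".join(command))
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

For example, run_batch([["mytool", "-in", "a.txt", "-out", "b.txt"]]) would run a hypothetical program named mytool with its parameters, so they never have to be typed again. Unlike a plain .bat file, this sketch stops as soon as one step fails.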

It's never too late to learn.

For every researcher who works with text, UltraEdit is a very powerful tool. It has many nice features, such as macros, regular expression matching, format conversion, and so on.
