Chinese Article Evaluation Tool

The Chinese Article Evaluation Tool evaluates your article by counting the unique Chinese characters in it and how many of them fall outside the list of the 500 most common characters. Those 500 characters cover 72.1% of usage in classical and modern Chinese texts, so to learn Chinese effectively it is best to learn them first. The result tells you whether the article is easy enough for beginners. It is also a good Chinese character counting tool.

The stripped text, with punctuation marks removed, can be used as material for students to practice Chinese-style punctuation.
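
The counting described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the tool: the function names are made up, "Chinese character" is assumed to mean the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and SAMPLE_COMMON is a tiny hypothetical stand-in for the real 500-character list.

```python
import unicodedata


def is_chinese_char(ch):
    """Assumption: a Chinese character is anything in U+4E00..U+9FFF."""
    return "\u4e00" <= ch <= "\u9fff"


def strip_punctuation(text):
    """Drop every character whose Unicode category is a punctuation class (P*)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def evaluate_article(text, common_chars):
    """Count Chinese characters, unique ones, and unique ones outside common_chars."""
    chinese = [ch for ch in text if is_chinese_char(ch)]
    unique = set(chinese)
    uncommon = unique - set(common_chars)
    return {
        "total": len(chinese),
        "unique": len(unique),
        "uncommon": len(uncommon),
        "stripped": strip_punctuation(text),
    }


# Hypothetical stand-in for the real list of the 500 most common characters.
SAMPLE_COMMON = set("我是学生的一")

report = evaluate_article("我是学生,我爱学习。", SAMPLE_COMMON)
print(report)
```

A low "uncommon" count relative to "unique" would suggest an easier article; where exactly to draw the line is left open here, as it is in the description above.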

Version 0.2 may include unspecified updates, enhancements, or bug fixes. More enhancements are coming soon: word counting and a level-determining engine, which will make it easier for teachers and tutors to select appropriate reading materials for their students and to evaluate how well their students perform in their Chinese writing. The output format will be refined once I have a little more time.
Try it at www.dengsoft.com/pub/xuezhongwen/evaluate.php?enc=gb2312

———————————————–
For more information, visit http://china.sytes.net/forums

Wireless Setup

Setting up a network can be very tricky. One missed step can cost you a sleepless night of pulling your hair out. Here is what happened to me. I have set up wireless access points several times, and recently I thought adding another PC would be a snap. However, I forgot to add it to my access point’s access list, and after hours and hours I nearly fainted. 🙂

Here are some hints for adding another PC to your wireless network:

1. Add the PC to your router’s access list. Remember that this is the MAC address, sometimes called the physical address, not the IP address. Sometimes the “Properties” dialog shows something like 1a.20.1d.2a.32.a2. Make sure to change each . to :.

2. Copy ONLY the HEX key into your wireless adapter’s configuration (if this is a secured network, and I bet yours is), NOT the passphrase, at least on Windows XP. Otherwise you’ll have a difficult time connecting.
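
If you have several addresses to enter, the dot-to-colon change from hint 1 can be scripted. The following is a minimal Python sketch; the function name normalize_mac is made up for illustration, and it assumes the address is six two-digit hex groups separated by dots, dashes, colons, or spaces.

```python
import re


def normalize_mac(raw):
    """Return a MAC address as six lowercase hex pairs joined by colons."""
    # Split on the separators commonly seen in adapter property dialogs.
    parts = re.split(r"[.:\-\s]+", raw.strip())
    if len(parts) != 6 or not all(re.fullmatch(r"[0-9A-Fa-f]{2}", p) for p in parts):
        raise ValueError("not a 6-byte MAC address: %r" % raw)
    return ":".join(p.lower() for p in parts)


print(normalize_mac("1a.20.1d.2a.32.a2"))
```

Anything that does not look like six hex pairs raises an error rather than being guessed at, since a wrong entry in the access list is exactly the kind of mistake that costs hours.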

Footer

Next Research

I want to look into AJAX (Asynchronous JavaScript and XML), Uniserver and EasyPHP, and PHProxy, and see how they work and fit together.

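A Survey of Computational Linguistics in China – Compiled Resources (unfinished draft)

Humanities background:

Institute of Applied Linguistics, Shanghai Normal University

Computational Linguistics Research Office, Institute of Applied Linguistics, Ministry of Education

Applied Linguistics Research Office, Institute of Linguistics, Chinese Academy of Social Sciences

Computational Linguistics Research Office, Institute of Applied Linguistics, State Language Commission

Institute of Computational Linguistics, Peking University

Department of Language Science and Technology, Nanjing Normal University

IBM, Microsoft, Fujitsu, Toshiba, TRS, HIT Huitong (哈工大惠通)

Phonetics and Experimental Linguistics Research Office, Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences

Research Center for Chinese Minority Languages and Key Laboratory of Phonetics and Computational Linguistics

Pure science and engineering background:

School of Computer Science, Harbin Institute of Technology (Li Sheng)

Virtual Information Center for Computational Linguistics, Shanghai Jiao Tong University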

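Teaching:

Shanghai Normal University

Conferences:

Symposium on the Character-Based Theory (字本位) of Chinese

National Joint Conference on Computational Linguistics

National Student Workshop on Computational Linguistics (SWCL)

Societies:

Chinese Information Processing Society of China (中国中文信息学会)

Association for Computational Linguistics and Chinese Language Processing (中华民国计算语言学学会)

http://www.aclclp.org.tw/index_c.php

Journals:

Journal of Chinese Information Processing (《中文信息学报》)

People:

Lu Ruzhan (陆汝占)

Zhan Weidong (詹卫东) and his personal homepage, “Linguistics Cursor” (语言学光标)

Yu Shiwen (俞士汶)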

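Feng Zhiwei (冯志伟) was born on April 15, 1939, in Kunming, Yunnan Province. In 1957 he entered Peking University as an undergraduate in geochemistry, and in 1959 he transferred to the Chinese language program of the university’s Department of Chinese. In 1964 he was admitted as a graduate student in linguistics in the same department. After graduating in 1967 he changed careers and taught physics at Kunming No. 5 Middle School. In 1978 he entered the machine translation program of the Department of Information Science at the Graduate School of the University of Science and Technology of China, and was then sent to the Automatic Translation Center (CETA) of the Institute of Applied Mathematics (IMAG) at the Scientific and Medical University of Grenoble, France, where he studied mathematical linguistics and machine translation under Professor B. Vauquois, the noted French mathematician and chairman of the International Committee on Computational Linguistics. He returned to China in 1981 and led the machine translation research group at the computing center of the Institute of Scientific and Technical Information of China. In 1985 he moved to the Institute of Applied Linguistics of the State Language Commission as director of its Computational Linguistics Research Office. From 1986 to 1988 he was a visiting researcher at the Institute for New Information Technology and Communication Systems of the Fraunhofer Society (FhG) in Germany; from 1990 to 1993 he was a visiting professor at the University of Trier, Germany; and in 1996 he served as a technical consultant at the Center for International Terminology and Applied Linguistics (CiTaL) of the Konstanz University of Applied Sciences, Germany. He is a research fellow and doctoral supervisor at the Institute of Applied Linguistics (in a doctoral program established jointly with the Beijing Broadcasting Institute). He retired in May 1998. From October 1999 to August 2000 he was again a visiting professor at the University of Trier. From August 2000 to August 2001 he was a senior researcher at the Sangxia (桑夏) Natural Language Processing Research Institute. Since September 2001 he has been a professor in the Department of Electrical Engineering and Computer Science (EECS) at the Korea Advanced Institute of Science and Technology (KAIST).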

http://www.china-language.gov.cn/jgsz/jss/images/feng/feng.htm

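Selected literature in computational linguistics

Chen Liwei and Yuan Qi, eds. Chinese Information Processing Application Platform Engineering (《中文信息处理应用平台工程》). Beijing: Publishing House of Electronics Industry, 1995.

Yu Shiwen. On the classification of modern Chinese words by grammatical function.
Zhang Pu. On semantic fields. Also in: Advances in Machine Translation Research (《机器翻译研究进展》), Publishing House of Electronics Industry, August 1992.
Zhang Pu. Theory and methods of semantic analysis of modern Chinese for information processing. Also in: Journal of Chinese Information Processing (《中文信息学报》), 1991, Vol. 5, No. 3.
Chen Qunxiu and Zhang Pu. A semantic classification system of modern Chinese for information processing: attribute classification.
Chen Qunxiu and Zhang Pu. A preliminary design of the supporting environment for a modern Chinese semantic dictionary for information processing.
Chen Qunxiu. Several issues in research on semantic classification systems.
Lu Chuan. The semantic network of modern Chinese.

(http://www.hackchi.com/hnc/papers/compulin/paperml.htm)

A survey of research and applications in computational linguistics and natural language information processing (http://www.cass.net.cn/chinese/s18_yys/yingyong/courses/nlpbase.htm)

Yu Shiwen, Collected Papers on Computational Linguistics (4) (Institute of Computational Linguistics, Peking University)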

Web server sometimes failed to transfer files – a potential MTU issue

I was once puzzled for several months by an issue with my web server. It SOMETIMES could not serve even a mid-sized file, although small files worked fine. It was frustrating because some of my friends could read my files while others said they could not.

It took me a long time to figure out why, with lots of research and experiments. I exposed my web server outside of my firewall, reinstalled the web server, reset the modem/router, … none of it seemed to help. All of a sudden one day, I figured out that it was caused by an MTU issue. The MTU should be 1492 for DSL in the router settings. After I set the MTU value correctly in the router, the server worked perfectly.
If your web server sometimes works and sometimes does not, it is quite likely an MTU issue.
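
For anyone wondering where 1492 comes from, here is a small arithmetic sketch in Python. It assumes the DSL line uses PPPoE, which adds 8 bytes of overhead (a 6-byte PPPoE header plus a 2-byte PPP protocol field) inside a standard 1500-byte Ethernet frame; that assumption is mine and may not match every DSL setup. The function names are made up for illustration.

```python
ETHERNET_MTU = 1500   # standard Ethernet payload size in bytes
PPPOE_OVERHEAD = 8    # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20      # IPv4 header without options
TCP_HEADER = 20       # TCP header without options


def pppoe_mtu(link_mtu=ETHERNET_MTU):
    """Largest IP packet that fits in one frame on a PPPoE link."""
    return link_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu):
    """Largest TCP payload per packet for a given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits_unfragmented(payload_bytes, mtu):
    """True if a TCP payload of this size fits in a single packet."""
    return payload_bytes <= tcp_mss(mtu)


print(pppoe_mtu())                        # MTU on the PPPoE link
print(tcp_mss(1500), tcp_mss(1492))       # payload per packet at each MTU
print(fits_unfragmented(1460, 1492))      # a full 1500-MTU segment on a 1492 link
```

This would also be consistent with the symptom: a small file fits in a small packet whatever the MTU is, while a larger one is sent in full-sized packets that are 8 bytes too big for the link if the router still assumes 1500.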

I’ll detail this later, if I have more time and if I can still remember all the details.

MediaWiki – multiple installations

Back to multiple installations of the wiki. Here is what I did.
To install the second wiki, let’s say wiki2, I symlinked everything from the first installation except the file LocalSettings.php and the directory config. I created its own config directory, then installed the second MediaWiki using a different database.
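
The linking step can be sketched in Python as follows. This is an illustration under assumptions, not the exact commands I ran: the function name link_shared_files and the paths in the usage comment are hypothetical, and the script only creates an empty config directory, so whatever the installer needs in there still has to be put in place separately.

```python
from pathlib import Path

# Entries that each wiki must own rather than share.
EXCLUDED = {"LocalSettings.php", "config"}


def link_shared_files(first_install, second_install, excluded=EXCLUDED):
    """Symlink every top-level entry of the first install into the second,
    except the excluded names, and give the second its own config directory."""
    src = Path(first_install).resolve()
    dst = Path(second_install)
    dst.mkdir(parents=True, exist_ok=True)
    linked = []
    for entry in sorted(src.iterdir()):
        if entry.name in excluded:
            continue
        target = dst / entry.name
        # Skip names that already exist so a second run changes nothing.
        if not target.exists() and not target.is_symlink():
            target.symlink_to(entry, target_is_directory=entry.is_dir())
            linked.append(entry.name)
    (dst / "config").mkdir(exist_ok=True)
    return linked


# Hypothetical usage:
# link_shared_files("/var/www/wiki", "/var/www/wiki2")
```

Creating symlinks may need extra privileges on some systems (Windows in particular), so treat this as a sketch for a Unix-like server.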

The benefit is that you do not need to keep multiple copies of the shared files and directories.
Another interesting thing I found when playing with MediaWiki: how about two installations sharing the same database? NO PROBLEM. You gain a lot by sharing the common files and directories as well as the database. I checked that the MediaWiki database differentiates the data from the two places pretty well using unique IDs, so if you browse one wiki it will not show the wiki you configured in the other place. Nice feature.
However, if your wiki tends to get really big, you may choose to install it in a separate database.

Community building: a wiki or a forum?

If you are installing MediaWiki for the first time, you may end up installing several, just like me. The rich features of MediaWiki impressed me, and I decided to move some of my forums to MediaWiki. Why, you may ask? I think a wiki promotes a more interactive community than a forum does. How many of you are fed up with long, long forum threads, where you search through the posts trying to find a final answer? You may jump to the last several posts, but they are comments like “Thank you”, “That really helps”, “Please check another thread at bla bla bla”, etc. You just wanted the final answer to the issue in the thread but were overwhelmed by many long, unrelated posts. In a wiki, you are always presented with the final answer from the community, and all the changes are kept in the “History” if you are interested. Neat.

In forums, you can easily read who said what, and when. In a wiki, you always see the most recent version of the current discussion; the “who”, “when”, and even “what” are in the “History” section, which keeps all versions of the page.

MediaWiki Installation Tricks regarding MySQL database versions

When I installed MediaWiki version 1.5 for the first time and was prompted for database features, I used the default option, backward compatibility. I later guessed that this uses the features of MySQL prior to version 4.1. That caused the problems:

First, after installation, you get SQL error 1271 when hitting pages like “Recent Changes”. I later looked into the code and found the issue. Basically, if you run a query like this when the default character set of MySQL is utf-8, you get this SQL error.

select * from some_table where some_col = 'test'

The reason is that some_col has a utf-8 collation, while the string literal 'test' was treated as having a latin1 collation. So the comparison failed, prior to MySQL version 4.

I got around the problem by inserting something like this in includes/Database.php in the MediaWiki source code:

select * from some_table where convert(some_col using latin1) = 'test'

The SQL error disappeared. However, after that I had a second problem: my links and categories all showed red, regardless of whether the linked pages/categories were defined.

I suspect it is the same issue as the first, with comparisons failing due to collation conflicts. I didn’t want to spend too much time on it, so I went ahead and reinstalled, this time choosing database MySQL version 4/5 (NOT backward compatible).

All issues were resolved and MediaWiki worked like a charm.
