Analyzing Linguistic Structure

NTT Systems carried out exploratory research for Dr. Martin Taylor at DCIEM into the Finch-Chater algorithm, which combines a neural network with more traditional statistical analysis to group words into linguistic categories.
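The details of the Finch-Chater method are not reproduced on this page. As a rough illustration of the general idea of grouping words by the company they keep, the sketch below counts which words appear immediately before and after each word, then compares the resulting count vectors. This is a simplified stand-in, not the neural net approach itself: the names `VOCAB`, `DIM`, `count_contexts` and `similarity` are hypothetical, the vocabulary is a toy one, and the squared-cosine measure was chosen here only for simplicity.

```c
#include <assert.h>
#include <stddef.h>

#define VOCAB 4            /* hypothetical toy vocabulary: word IDs 0..3 */
#define DIM   (2 * VOCAB)  /* left-neighbour counts, then right-neighbour counts */

/* Fill counts[w][p] with how often word p directly preceded word w, and
   counts[w][VOCAB + q] with how often word q directly followed word w. */
void count_contexts(const int *tokens, size_t n, double counts[VOCAB][DIM])
{
    for (size_t w = 0; w < VOCAB; w++)
        for (size_t d = 0; d < DIM; d++)
            counts[w][d] = 0.0;

    for (size_t i = 0; i < n; i++) {
        assert(tokens[i] >= 0 && tokens[i] < VOCAB);
        if (i > 0)
            counts[tokens[i]][tokens[i - 1]] += 1.0;
        if (i + 1 < n)
            counts[tokens[i]][VOCAB + tokens[i + 1]] += 1.0;
    }
}

/* Squared cosine similarity of two count vectors. Counts are never
   negative, so squaring loses no sign information and avoids sqrt(). */
double similarity(const double *a, const double *b, size_t len)
{
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;

    for (size_t i = 0; i < len; i++) {
        dot    += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0)
        return 0.0;  /* a word with no recorded context matches nothing */
    return (dot * dot) / (norm_a * norm_b);
}
```

With the IDs 0 = "the", 1 = "a", 2 = "dog", 3 = "cat" and the token sequence "the dog a cat the cat a dog", the two determiners score higher against each other than either does against a noun. Words whose vectors are close would then be candidates for the same category.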

The eventual goal is to use a similar technique in a hypertext system (such as the World Wide Web) to find whole paragraphs, sections or pages that are semantically related. A user might ask, for instance, to see sections of text with a similar argumentative style.

So far, we have duplicated Finch and Chater's results and performed some initial experiments at the next level, in which each word is replaced by the category assigned to it at the first level. The work was implemented in C.
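The substitution step for the next level can be pictured as a table lookup. The following is a minimal sketch under stated assumptions, not the code used in the project: the table contents, the category numbers, and the names `WORD_TABLE`, `category_of` and `map_to_categories` are all hypothetical, and a real vocabulary would call for a hash table rather than a linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical word-to-category table. The numbers are arbitrary labels
   chosen for this example, not values produced by the analysis. */
struct word_category {
    const char *word;
    int category;
};

static const struct word_category WORD_TABLE[] = {
    {"the", 1}, {"my", 1}, {"a", 1},
    {"it", 2}, {"him", 2}, {"them", 2},
    {"price", 6}, {"method", 6}, {"tool", 6},
    {"bug", 7}, {"feature", 7},
};

#define UNKNOWN_CATEGORY (-1)

/* Linear search; returns UNKNOWN_CATEGORY for words not in the table. */
int category_of(const char *word)
{
    size_t n = sizeof WORD_TABLE / sizeof WORD_TABLE[0];

    assert(word != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(WORD_TABLE[i].word, word) == 0)
            return WORD_TABLE[i].category;
    }
    return UNKNOWN_CATEGORY;
}

/* Rewrite a token sequence as the matching sequence of category labels,
   which can then be analysed the same way the words were. */
void map_to_categories(const char *const *tokens, size_t count, int *out)
{
    for (size_t i = 0; i < count; i++)
        out[i] = category_of(tokens[i]);
}
```

Under these assumptions, the tokens "the tool fixed a bug" map to 1, 6, -1, 1, 7. Words outside the table are marked as unknown rather than guessed.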

Following is an example of some word categories from a standard Finch-Chater word-level analysis. Words side-by-side in this listing were also very close together in the original Finch-Chater analysis:

the      it          all   god        none  position      bug
my       this        both  jesus      lots  relationship  feature
your     me                saddam           behavior
their    them              bush             price
his      him               israel           solution
our      her               iraq             method
its      us                india            tool
a        myself            america          option
an       yourself          china            word
any      themselves        japan
some     itself            kuwait
several  himself           europe
another                    canada
every                      taiwan
these                      death
those                      sex
such                       abortion
each                       freedom
no                         peace
many                       birth
most                       democracy
certain
one's
god's

