NTT Systems carried out exploratory research for Dr. Martin Taylor at DCIEM into the Finch-Chater algorithm, which combines a neural network with more traditional statistical analysis to group words into linguistic categories.
The eventual goal is to use a similar technique in a hypertext system (such as the World Wide Web) to find whole paragraphs, sections or pages that are semantically related. A user might, for instance, ask to see sections of text with a similar argumentative style.
So far, we have reproduced Finch and Chater's results and performed some initial experiments at the next level, in which each word is replaced with the word category assigned to it at the first level. The work was implemented in C.
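The substitution step used for the next-level experiments can be pictured with a minimal C sketch. It assumes the first level has already produced a word-to-category table; the table contents, the category numbers, and the names `category_of` and `categorize` are hypothetical and are not taken from the original programs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical word-to-category table. The category numbers are
   illustrative only; they are not results from the original experiments. */
struct entry { const char *word; int category; };

static const struct entry table[] = {
    {"the", 0},   {"my", 0},     {"a", 0},
    {"it", 1},    {"him", 1},    {"them", 1},
    {"price", 2}, {"method", 2}, {"tool", 2},
};

#define UNKNOWN_CATEGORY (-1)

/* Return the category of a word, or UNKNOWN_CATEGORY if it is not listed. */
int category_of(const char *word)
{
    size_t i;
    assert(word != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].word, word) == 0)
            return table[i].category;
    return UNKNOWN_CATEGORY;
}

/* Replace each of the n words with its category number, writing to out. */
void categorize(const char *const words[], size_t n, int out[])
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = category_of(words[i]);
}
```

With this table, the word sequence "the price it" would become the category sequence 0 2 1, and that sequence would then be analysed in the same way as the original words. A linear search is used only to keep the sketch short; a real vocabulary would call for a hash table or sorted lookup.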
The following is an example of some word categories from a standard Finch-Chater word-level analysis. Words side-by-side in this listing were also very close together in the original Finch-Chater analysis:
the my your their his our its a an any some several another every these those such each no many most certain one's god's
it this me them him her us myself yourself themselves itself himself
all both
god jesus saddam bush israel iraq india america china japan kuwait europe canada taiwan
none lots
position relationship behavior price solution method tool option word
bug feature
death sex abortion freedom peace birth democracy
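To give a rough idea of what "close together" can mean for words, here is a generic distributional sketch in C. It is an assumption made for illustration, not the procedure used in this work: it omits the neural network entirely, and the similarity measure (a weighted Jaccard overlap of neighbour counts) is a simple stand-in chosen for brevity. The toy vocabulary, the word ids, and the names `count_contexts` and `similarity` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VOCAB 5            /* toy vocabulary size (assumption) */
#define DIMS  (2 * VOCAB)  /* left-neighbour slots, then right-neighbour slots */

/* For each word id w, count which words occur immediately before it
   (slots 0..VOCAB-1) and immediately after it (slots VOCAB..DIMS-1). */
void count_contexts(const int corpus[], size_t n, unsigned counts[VOCAB][DIMS])
{
    size_t i, j;

    /* Check every id first so the indexing below cannot go out of range. */
    for (i = 0; i < n; i++)
        assert(corpus[i] >= 0 && corpus[i] < VOCAB);

    for (i = 0; i < VOCAB; i++)
        for (j = 0; j < DIMS; j++)
            counts[i][j] = 0;

    for (i = 0; i < n; i++) {
        int w = corpus[i];
        if (i > 0)
            counts[w][corpus[i - 1]]++;          /* left neighbour */
        if (i + 1 < n)
            counts[w][VOCAB + corpus[i + 1]]++;  /* right neighbour */
    }
}

/* Weighted Jaccard overlap of two context vectors: sum of element-wise
   minima divided by sum of element-wise maxima. Returns 0 when both
   vectors are empty. */
double similarity(const unsigned a[DIMS], const unsigned b[DIMS])
{
    unsigned lo = 0, hi = 0;
    size_t j;
    for (j = 0; j < DIMS; j++) {
        lo += a[j] < b[j] ? a[j] : b[j];
        hi += a[j] > b[j] ? a[j] : b[j];
    }
    return hi == 0 ? 0.0 : (double)lo / (double)hi;
}
```

As a usage example, take the ids the=0, a=1, price=2, tool=3, is=4 and the toy corpus "the price is a tool is the tool is a price", encoded as 0 2 4 1 3 4 0 3 4 1 2. After `count_contexts`, "the" and "a" share most of their neighbours and score 0.75, as do "price" and "tool", while "the" and "price" share none and score 0. Words that score highly against each other are the ones that would end up side-by-side in a listing like the one above.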