A5 - Automatic classification of concept types

The goal of the project is to develop a method for automatically classifying nouns in text corpora by type, using statistical methods. The classification exploits the fact that nouns of the four relevant types (sortal, individual, relational and functional) differ in the grammatical contexts in which they occur. These differences follow from two semantic properties: inherent uniqueness (individual and functional concepts) and inherent relationality (relational and functional concepts). Relational nouns are used much more frequently with a possessor specification, while inherently unique nouns occur significantly more often with definite determination and in the singular. These uses can be recognized automatically by computational methods that require a certain degree of parsing but no semantic analysis. Distributional differences between the four types allow a given noun to be classified, provided it occurs with sufficient frequency. Refinement of the contextual criteria, a growing lexicon with relevant information, and advances in the statistical procedure itself will reduce the number of occurrences needed for classification. Part of the research is the development of appropriate tagging software that combines existing tools with supplementary programming. This work is supported by a theoretical analysis of the contexts and constructions in which the different types of nouns occur (a) when they are used in accordance with their underlying type and (b) when they undergo certain kinds of type shifts. The analysis of type shifts is important because such shifts occur quite frequently and tend to blur the distributional characteristics of the nouns being classified. The object language is German; the project goals will later be extended to French and English.
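
The distributional idea can be illustrated with a minimal sketch. It is not the project's actual statistical procedure: it assumes that a parser has already produced per-noun context counts, it collapses definiteness and singular use into a single count for brevity, and it uses simple rate thresholds. The names `NounCounts` and `classify`, the threshold values, and the minimum frequency are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NounCounts:
    """Context counts for one noun (assumed to come from a parsed corpus)."""
    total: int        # all occurrences of the noun
    possessed: int    # occurrences with a possessor specification
    definite_sg: int  # singular occurrences with definite determination


# Hypothetical values chosen for illustration only; real cut-offs would
# have to be estimated from annotated data.
MIN_OCCURRENCES = 50
POSSESSOR_THRESHOLD = 0.4
DEFINITE_SG_THRESHOLD = 0.6

# (inherently unique, inherently relational) -> concept type
TYPE_BY_FEATURES = {
    (False, False): "sortal",
    (True, False): "individual",
    (False, True): "relational",
    (True, True): "functional",
}


def classify(counts: NounCounts) -> Optional[str]:
    """Return a concept type, or None if the noun is too rare to classify."""
    if counts.total < MIN_OCCURRENCES:
        return None
    unique = counts.definite_sg / counts.total >= DEFINITE_SG_THRESHOLD
    relational = counts.possessed / counts.total >= POSSESSOR_THRESHOLD
    return TYPE_BY_FEATURES[(unique, relational)]
```

Under these assumptions, a noun that occurs mostly in the definite singular and mostly with a possessor would come out as functional, while a noun below the minimum frequency is left unclassified. Type shifts, which the project treats as a major source of noise, are not modelled in this sketch.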