We present two axiomatic and three conjectural conditions that a model inducing natural language categories should satisfy if it is to be considered “cognitively plausible”. The first axiomatic condition is that the model should involve a bootstrapping component. The second axiomatic condition is that it should be data-driven. The first conjectural condition demands that the model integrate surface features – related to prosody, phonology and morphology – somewhat more intensively than existing Markov-inspired models do. The second conjectural condition demands that, besides integrating symbolic and connectionist aspects, the model in question should exploit the global geometric and topological properties of the vector spaces upon which it operates. Finally, we argue that the model should facilitate qualitative evaluation, for example in the form of a POS-i-restricted Turing Test. To support our claims, we present a POS-induction model based on trivial k-way clustering of vectors representing suffixal and co-occurrence information present in parts of the Multext-East corpus. Even in the very initial stages of its development, the model outperforms some more complex probabilistic POS-induction models at a lower computational cost.
category construction, part-of-speech induction, surface features, vector spaces, bootstrapping, geometry of thought, categorization-oriented Turing Test, partitioning of grammatical feature space, K-means clustering, cognitive plausibility
Published in Proceedings of the 15th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), July 2014, Montpellier, France
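To make the idea of clustering suffixal and co-occurrence vectors concrete, the following is a minimal sketch and not the model evaluated in the paper. Everything in it is an assumption made for illustration: the toy sentences stand in for the Multext-East data, the feature scheme (word-final character n-grams plus immediate left and right neighbours) is one plausible reading of "suffixal and co-occurrence information", plain Lloyd-style k-means stands in for "trivial k-way clustering", and the names `build_vectors`, `to_dense` and `kmeans` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def build_vectors(sentences, max_suffix=3):
    """Map each word type to sparse counts of suffix and neighbour features."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            vec = vectors[word]
            # Suffixal features: word-final character n-grams (assumed scheme).
            for n in range(1, min(max_suffix, len(word)) + 1):
                vec["suf=" + word[-n:]] += 1
            # Co-occurrence features: immediate left and right neighbours.
            left = sent[i - 1] if i > 0 else "<s>"
            right = sent[i + 1] if i + 1 < len(sent) else "</s>"
            vec["L=" + left] += 1
            vec["R=" + right] += 1
    return vectors


def to_dense(vectors):
    """Turn sparse counts into L2-normalised dense rows, one per word type."""
    features = sorted({f for vec in vectors.values() for f in vec})
    words = sorted(vectors)
    rows = []
    for word in words:
        row = [float(vectors[word][f]) for f in features]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return words, rows


def kmeans(rows, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, row in enumerate(rows):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus (illustrative only, not drawn from Multext-East).
sentences = [
    ["the", "dog", "walked"],
    ["the", "cat", "jumped"],
    ["a", "dog", "jumped"],
    ["a", "cat", "walked"],
]
vectors = build_vectors(sentences)
words, rows = to_dense(vectors)
labels = kmeans(rows, k=3)

clusters = defaultdict(list)
for word, label in zip(words, labels):
    clusters[label].append(word)
for label in sorted(clusters):
    print(label, clusters[label])
```

On a corpus this small the resulting partition depends heavily on the random initialisation, so the printed clusters should be read as a demonstration of the pipeline rather than as evidence about the quality of the induced categories.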