We introduce a novel method for transforming texts into short binary vectors that can subsequently be compared by means of Hamming distance measurement. Similarly to other semantic hashing approaches, the objective is to perform radical dimensionality reduction by putting texts with similar meaning into the same or similar buckets while putting texts with dissimilar meaning into different and distant buckets. First, the method transforms the texts into a complete TF-IDF representation; it then applies Reflective Random Indexing in order to fold both the term and the document space into a low-dimensional space. Subsequently, every dimension of the resulting low-dimensional space is simply thresholded at its 50th percentile, so that each individual bit of the resulting hash cuts the whole input dataset into two subsets of equal cardinality. Without any parameter-tuning training phase whatsoever, the method attains results comparable to those obtained by much more complex deep-learning techniques, especially in the high-precision/low-recall region of the 20newsgroups text classification task.

Keywords: Random Indexing, Unsupervised Locality-Sensitive Hashing, Dimensionality Reduction, Hamming Distance, Nearest-Neighbor Search
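
The pipeline described in the abstract can be condensed into a few lines of Python. The following is a minimal illustrative sketch, assuming NumPy and scikit-learn; the target dimensionality, the random seed, the sparsity of the ternary index vectors and the single reflection cycle are assumptions chosen for illustration, not values taken from the paper.

# Hedged sketch of the pipeline from the abstract: TF-IDF ->
# Reflective Random Indexing -> per-dimension median thresholding.
# DIM, SEED, the ternary sparsity and the single reflection cycle
# are illustrative assumptions, not parameters reported in the paper.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

DIM = 128      # target dimensionality of the folded space (assumption)
SEED = 0       # fixed seed so the random index vectors are reproducible

def light_stochastic_binarization(texts, dim=DIM, seed=SEED):
    # 1. Complete TF-IDF representation of the corpus (docs x terms, sparse).
    tfidf = TfidfVectorizer().fit_transform(texts)

    # 2. Reflective Random Indexing: seed the terms with sparse ternary
    #    random index vectors, project the documents onto them, reflect
    #    the document space back onto the terms, and project once more.
    rng = np.random.default_rng(seed)
    n_terms = tfidf.shape[1]
    term_vecs = rng.choice([-1.0, 0.0, 1.0], size=(n_terms, dim),
                           p=[0.05, 0.90, 0.05])   # sparse ternary seeds
    doc_vecs = tfidf @ term_vecs                    # first projection
    term_vecs = tfidf.T @ doc_vecs                  # reflection onto terms
    doc_vecs = tfidf @ term_vecs                    # second projection

    # 3. Threshold every dimension at its 50th percentile so that each
    #    bit splits the whole corpus into two equally sized halves.
    medians = np.median(doc_vecs, axis=0)
    return (doc_vecs > medians).astype(np.uint8)    # docs x dim bit matrix

def hamming(a, b):
    # Hamming distance between two binary hashes.
    return int(np.count_nonzero(a != b))

A corpus would be hashed with hashes = light_stochastic_binarization(texts), after which the nearest neighbours of document i can be ranked by hamming(hashes[i], hashes[j]).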

BibTeX citation:

@inproceedings{hromada2014empiric,
  title={Empiric Introduction to Light Stochastic Binarization},
  author={Hromada, Daniel Devatman},
  booktitle={Text, Speech and Dialogue},
  pages={37--45},
  year={2014},
  organization={Springer}
}





Prospero      10.10.2014 - 09:25:36
http://wizzion.com/papers/2014/TSD-stochastic-binarization.pdf