
Edit distance is not the only way to calculate the distance between two character sequences. Strings can also be compared in somewhat subtler, geometric ways. A procedure inspired by Random Indexing can attribute a D-dimensional geometric coordinate to any character N-gram present in the corpus and can subsequently represent a word as the sum of the vectors of the N-gram fragments which the string contains. Thus, any word can be described as a point in a dense D-dimensional space, and the distance between words can be calculated with traditional measures of Euclidean space, such as the cosine. Within the Keats Hyperion corpus, a strong correlation exists between this cosine measure and Levenshtein distance. Overlaps between the centroid of the Levenshtein distance matrix space and the centroids of vector spaces generated by Random Projection were also observed. Contrary to the standard non-random “sparse” method of measuring cosine distances between two strings, the method based on Random Projection naturally tends to promote longer strings rather than the shortest ones. The geometric approach yields a finer output range than Levenshtein distance and, due to the limited dimensionality of the randomly projected space, retrieving the nearest neighbor of a text’s centroid could have lower complexity than with other vector methods.
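The procedure described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation from the paper: the dimensionality (100), the number of non-zero elements per N-gram vector (10), the use of bigrams, and the trick of seeding a random generator with the N-gram itself are all assumptions made for the example, and every function name is hypothetical.

```python
import math
import random


def ngrams(word, n=2):
    """Return all character n-grams of a word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_vector(gram, dim=100, nonzero=10):
    """Random sparse ternary vector for one n-gram (assumed parameters).

    Seeding the generator with the n-gram itself means the same n-gram
    always receives the same random coordinate.
    """
    rng = random.Random(gram)
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def word_vector(word, n=2, dim=100):
    """Represent a word as the sum of the vectors of its n-grams."""
    total = [0.0] * dim
    for gram in ngrams(word, n):
        for i, value in enumerate(ngram_vector(gram, dim)):
            total[i] += value
    return total


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def levenshtein(s, t):
    """Classic dynamic-programming edit distance, for comparison."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Compare both measures on a few word pairs.
    for a, b in [("hyperion", "hyperions"), ("hyperion", "saturn")]:
        cos = cosine_similarity(word_vector(a), word_vector(b))
        print(a, b, "cosine:", round(cos, 3), "levenshtein:", levenshtein(a, b))
```

Words that share many N-grams accumulate the same random components and therefore tend to point in similar directions, while the cosine takes continuous values rather than the integer steps of Levenshtein distance.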

Published in the Proceedings of the Student Workshop of the RANLP 2013 conference (Hissar, Bulgaria).





Prospero[Locked_OUT], 19.11.2013 - 22:27:21
http://aclweb.org/anthology//R/R13/R13-2012.pdf