Autor/Author: Daniel Hromada
Nazov/Title: Quantitative intercultural comparison by means of parallel pageranking of diverse national wikipedias
Velkost/size: 218 KiB
Popis/Comment: The aim of our study was to show that the distribution of hyperlinks within Wikipedia corpora implicitly contains information about the cultural preferences of their authors. We transformed Wikipedia corpora written in 27 different languages into graph structures whose vertices correspond to Wikipedia articles and whose edges correspond to hyperlinks between these articles. We then calculated a PageRank vector for each of these graphs, thus obtaining a so-called “intracultural importance list” for every linguistic community under study. Two data-mining experiments were performed on the resulting data. “The top country” study indicated that labels of articles about countries related to the linguistic community that created them are found in the top parts of their respective intracultural lists and, conversely, that the top parts of these lists could potentially be used as a stylometric method for identifying the community that created the corpus. “The world&corpus” study revealed that, in the majority of cases, the rankings of articles about reference countries within a given community's intracultural list correlate significantly with the geographic distance between the reference country and the presumed home country of the linguistic community. Both experiments indicated the presence of a morphism between the Wikipedia hyperlink graph and the real world of its authors.
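The graph-and-PageRank step described above can be illustrated with a minimal sketch. This is not the study's implementation: the damping factor of 0.85, the power-iteration solver, the function names `pagerank` and `importance_list`, and the four-article toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links: dict mapping an article title to the list of titles it links to.
    damping=0.85 is a conventional default, assumed here (not from the study).
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Articles without outgoing links spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Every article passes an equal share of its rank to each link target.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def importance_list(links):
    """Article titles sorted by descending PageRank (ties broken by title)."""
    scores = pagerank(links)
    return sorted(scores, key=lambda v: (-scores[v], v))


# Hypothetical toy corpus: vertices are articles, edges are hyperlinks.
toy_links = {
    "Slovakia": ["Europe", "Bratislava"],
    "Bratislava": ["Slovakia"],
    "Europe": ["Slovakia"],
    "Danube": ["Slovakia", "Europe"],
}

if __name__ == "__main__":
    for title in importance_list(toy_links):
        print(title)
```

In this toy corpus the most linked-to article ends up at the head of the list, which mirrors, on a very small scale, the kind of ordering the "top country" study inspects for each language edition.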