Parallel Retrieval of Dense Vectors in the Vector Space Model

keywords: Vector space model, symmetric multiprocessing, dense vector computations, message passing interface
Modern information retrieval systems use distributed and parallel algorithms to meet their operational requirements, and they commonly operate on sparse vectors. Dimensionality-reduction techniques, however, produce dense and relatively short feature vectors. Motivated by the relevance of such dense vectors, we have parallelized the vector space model for dense matrices and vectors. Our algorithm uses a hybrid partitioning that splits both documents and features, and it operates on a mesh of hosts that together hold a block-partitioned corpus matrix. We show that the theoretical speed-up is optimal. The empirical evaluation of an MPI-based implementation shows that we obtain a super-linear speed-up on a cluster using Nehalem Xeon CPUs.
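The hybrid partitioning described above can be illustrated with a small serial simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a hypothetical p × q mesh in which host (i, j) holds the corpus block formed by document range i and feature range j, together with slice j of the query vector. Each host computes partial inner products over its feature slice, and the partials of hosts in the same mesh row are summed to give the full document scores. The helper names `split_ranges` and `mesh_scores` are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def split_ranges(n: int, parts: int) -> List[range]:
    """Split indices 0..n-1 into `parts` contiguous, nearly equal ranges."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def mesh_scores(corpus: Matrix, query: List[float], p: int, q: int) -> List[float]:
    """Simulate a hypothetical p x q host mesh computing inner-product scores.

    Host (i, j) owns documents doc_blocks[i], features feat_blocks[j],
    and the matching query slice. Hosts run sequentially here.
    """
    doc_blocks = split_ranges(len(corpus), p)
    feat_blocks = split_ranges(len(query), q)
    scores = [0.0] * len(corpus)
    for i in range(p):
        for j in range(q):
            # Local work on host (i, j): partial dot products over its feature slice.
            partial = [sum(corpus[d][f] * query[f] for f in feat_blocks[j])
                       for d in doc_blocks[i]]
            # Row-wise reduction: partials of hosts (i, 0..q-1) are added up.
            # A real MPI code could do this with a reduction over a row communicator.
            for local_idx, d in enumerate(doc_blocks[i]):
                scores[d] += partial[local_idx]
    return scores


if __name__ == "__main__":
    corpus = [[1, 0, 2, 1], [0, 3, 1, 0], [2, 2, 0, 1], [1, 1, 1, 1], [0, 0, 4, 2]]
    query = [1, 2, 0, 1]
    print(mesh_scores(corpus, query, 2, 3))  # prints [2.0, 6.0, 7.0, 4.0, 2.0]
```

Because each document's score depends only on the hosts in its mesh row, the reduction stays local to that row. Splitting along features in addition to documents therefore lets the number of hosts grow beyond the number of document blocks.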
Mathematics Subject Classification 2000: 15-04, 65F99, 68P20
reference: Vol. 30, 2011, No. 2, pp. 247–265