I'm interested in designing new distributed and parallel algorithms, the distributed processing of big data, achieving fault-tolerance in networks, and secure distributed computing in dynamic environments such as peer-to-peer networks and mobile ad-hoc networks.


Publications tagged with "Big Data" (Show all)

  • Tight Bounds for Distributed Graph Computations PDF
    Gopal Pandurangan, Peter Robinson, Michele Scquizzato. (under review)
  • Fast Distributed Algorithms for Connectivity and MST in Large Graphs DOI
    Gopal Pandurangan, Peter Robinson, Michele Scquizzato. 28th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2016).
    Motivated by the increasing need to understand the algorithmic foundations of distributed large-scale graph computations, we study a number of fundamental graph problems in a message-passing model for distributed computing where $k \geq 2$ machines jointly perform computations on graphs with $n$ nodes (typically, $n \gg k$). The input graph is assumed to be initially randomly partitioned among the $k$ machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main result is an (almost) optimal distributed randomized algorithm for graph connectivity. Our algorithm runs in $\tilde{O}(n/k^2)$ rounds ($\tilde{O}$ notation hides a $\text{polylog}(n)$ factor and an additive $\text{polylog}(n)$ term). This improves over the best previously known bound of $\tilde{O}(n/k)$ [Klauck et al., SODA 2015], and is optimal (up to a polylogarithmic factor) in view of an existing lower bound of $\tilde{\Omega}(n/k^2)$. Our improved algorithm uses a bunch of techniques, including linear graph sketching, that prove useful in the design of efficient distributed graph algorithms. We then present fast randomized algorithms for computing minimum spanning trees, (approximate) min-cuts, and for many graph verification problems. All these algorithms take $\tilde{O}(n/k^2)$ rounds, and are optimal up to polylogarithmic factors. We also show an almost matching lower bound of $\tilde{\Omega}(n/k^2)$ for many graph verification problems using lower bounds in random-partition communication complexity.
  • Distributed Computation of Large-scale Graph Problems PDF DOI
    Hartmut Klauck, Danupon Nanongkai, Gopal Pandurangan, Peter Robinson. 26th ACM-SIAM Symposium on Discrete Algorithms (SODA 2015).
    Motivated by the increasing need for fast distributed processing of large-scale graphs such as the Web graph and various social networks, we study a number of fundamental graph problems in the message-passing model, where we have $k$ machines that jointly perform computation on an arbitrary $n$-node (typically, $n \gg k$) input graph. The graph is assumed to be randomly partitioned among the $k \geq 2$ machines (a common implementation in many real world systems). The communication is point-to-point, and the goal is to minimize the time complexity, i.e., the number of communication rounds, of solving various fundamental graph problems. We present lower bounds that quantify the fundamental time limitations of distributively solving graph problems. We first show a lower bound of $\Omega(n/k)$ rounds for computing a spanning tree (ST) of the input graph. This result also implies the same bound for other fundamental problems such as computing a minimum spanning tree (MST), breadth-first tree (BFS), and shortest paths tree (SPT). We also show an $\Omega(n/k^2)$ lower bound for connectivity, ST verification and other related problems. Our lower bounds develop and use new bounds in random-partition communication complexity. To complement our lower bounds, we also give algorithms for various fundamental graph problems, e.g., PageRank, MST, connectivity, ST verification, shortest paths, cuts, spanners, covering problems, densest subgraph, subgraph isomorphism, finding triangles, etc. We show that problems such as PageRank, MST, connectivity, and graph covering can be solved in $\tilde{O}(n/k)$ time (the notation $\tilde O$ hides $\text{polylog}(n)$ factors and an additive $\text{polylog}(n)$ term); this shows that one can achieve almost linear (in $k$) speedup, whereas for shortest paths, we present algorithms that run in $\tilde{O}(n/\sqrt{k})$ time (for $(1+\epsilon)$-factor approximation) and in $\tilde{O}(n/k)$ time (for $O(\log n)$-factor approximation) respectively. Our results are a step towards understanding the complexity of distributively solving large-scale graph problems.


I'm interested in parallel and distributed programming and related technologies such as software transactional memory. Below is a (non-comprehensive) list of software that I have written.
  • I extended Cabal, for using a "world" file to keep track of installed packages. (Now part of the main distribution.)
  • data dispersal: an implementation of an (m,n)-threshold information dispersal scheme that is space-optimal.
  • secret sharing: an implementation of a secret sharing scheme that provides information-theoretic security.
  • dice-entropy: a library that provides cryptographically secure dice rolls implemented by bit-efficient rejection sampling.
  • TSkipList: a data structure with range-query support for software transactional memory.
  • stm-io-hooks: An extension of Haskell's Software Transactional Memory (STM) monad with commit and retry IO hooks.
  • Mathgenealogy: Visualize your (academic) genealogy! A program for extracting data from the Mathematics Genealogy project.
  • In my master thesis I developed a system for automatically constructing events out of log files produced by various system programs. One of the core components of my work was a part-of-speech (POS) tagger, which assigns word classes (e.g. noun, verb) to the previously parsed tokens of the log file. To cope with noisy input data, I modeled the POS tagger as a hidden Markov model. I developed (and proved the correctness of) a variant of the maximum likelihood estimation algorithm for training the Markov model and smoothing the state transition distributions.


  • Conferences that I attended so far: PODC 2008 (Toronto, Canada); SSS 2008 (Detroit, USA); OPODIS 2009 (Nimes, France); ALGOSENSORS 2010 (Bordeaux, France); DISC 2010; (Boston, USA) IPDPS 2011 (Anchorage, USA); FOMC 2011 (San Jose, USA); SODA 2012 (Kyoto, Japan); SIROCCO 2012 (Reykjavik, Iceland); ICDCN 2013 (Mumbai, India); ICALP 2013 (Riga, Latvia); SPAA 2013 (Montreal, Canada); PODC 2013 (Montreal, Canada); Shonan Workshop (Shonan Village, Japan); DISC 2015 (Tokyo, Japan); ICDCN 2016 (Singapore); SPAA 2016 (Monterey, California); DISC 2016 (Paris, France).
  • Program committee membership: BGP 2017, ICDCN 2016, SPAA 2016, SIROCCO 2016, ICDCN 2015, SIROCCO 2014, FOMC 2014
  • DBLP entry.
  • Google Scholar profile.
  • Profile on StackExchange.