I'm interested in designing new distributed and parallel algorithms, the distributed processing of big data, achieving fault-tolerance in networks, and secure distributed computing in dynamic environments such as peer-to-peer networks and mobile ad-hoc networks.


Publications tagged with "Distributed Storage" (Show all)

  • Search and Storage in Dynamic Peer-to-Peer Networks PDF DOI
    John Augustine, Anisur Molla, Ehab Morsy, Gopal Pandurangan, Peter Robinson, Eli Upfal. 25th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2013).
    We study robust and efficient distributed algorithms for searching, storing, and maintaining data in dynamic Peer-to-Peer (P2P) networks. P2P networks are highly dynamic networks that experience heavy node churn (i.e., nodes join and leave the network continuously over time). Our goal is to guarantee, despite high node churn rate, that a large number of nodes in the network can store, retrieve, and maintain a large number of data items. Our main contributions are fast randomized distributed algorithms that guarantee the above with high probability even under high adversarial churn. In particular, we present the following main results: \begin{enumerate} \item A randomized distributed search algorithm that with high probability guarantees that searches from as many as $n - o(n)$ nodes ($n$ is the stable network size) succeed in ${O}(\log n )$-rounds despite ${O}(n/\log^{1+\delta} n)$ churn, for any small constant $\delta > 0$, per round. We assume that the churn is controlled by an oblivious adversary (that has complete knowledge and control of what nodes join and leave and at what time and has unlimited computational power, but is oblivious to the random choices made by the algorithm). \item A storage and maintenance algorithm that guarantees, with high probability, data items can be efficiently stored (with only $\Theta(\log{n})$ copies of each data item) and maintained in a dynamic P2P network with churn rate up to ${O}(n/\log^{1+\delta} n)$ per round. Our search algorithm together with our storage and maintenance algorithm guarantees that as many as $n - o(n)$ nodes can efficiently store, maintain, and search even under ${O}(n/\log^{1+\delta} n)$ churn per round. Our algorithms require only polylogarithmic in $n$ bits to be processed and sent (per round) by each node. \end{enumerate} To the best of our knowledge, our algorithms are the first-known, fully-distributed storage and search algorithms that provably work under highly dynamic settings (i.e., high churn rates per step). Furthermore, they are localized (i.e., do not require any global topological knowledge) and scalable. A technical contribution of this paper, which may be of independent interest, is showing how random walks can be provably used to derive scalable distributed algorithms in dynamic networks with adversarial node churn.


I'm interested in parallel and distributed programming and related technologies such as software transactional memory. Below is a (non-comprehensive) list of software that I have written.
  • I extended Cabal, for using a "world" file to keep track of installed packages. (Now part of the main distribution.)
  • data dispersal: an implementation of an (m,n)-threshold information dispersal scheme that is space-optimal.
  • secret sharing: an implementation of a secret sharing scheme that provides information-theoretic security.
  • dice-entropy: a library that provides cryptographically secure dice rolls implemented by bit-efficient rejection sampling.
  • TSkipList: a data structure with range-query support for software transactional memory.
  • stm-io-hooks: An extension of Haskell's Software Transactional Memory (STM) monad with commit and retry IO hooks.
  • Mathgenealogy: Visualize your (academic) genealogy! A program for extracting data from the Mathematics Genealogy project.
  • In my master thesis I developed a system for automatically constructing events out of log files produced by various system programs. One of the core components of my work was a part-of-speech (POS) tagger, which assigns word classes (e.g. noun, verb) to the previously parsed tokens of the log file. To cope with noisy input data, I modeled the POS tagger as a hidden Markov model. I developed (and proved the correctness of) a variant of the maximum likelihood estimation algorithm for training the Markov model and smoothing the state transition distributions.


  • Conferences that I attended so far: PODC 2008 (Toronto, Canada); SSS 2008 (Detroit, USA); OPODIS 2009 (Nimes, France); ALGOSENSORS 2010 (Bordeaux, France); DISC 2010; (Boston, USA) IPDPS 2011 (Anchorage, USA); FOMC 2011 (San Jose, USA); SODA 2012 (Kyoto, Japan); SIROCCO 2012 (Reykjavik, Iceland); ICDCN 2013 (Mumbai, India); ICALP 2013 (Riga, Latvia); SPAA 2013 (Montreal, Canada); PODC 2013 (Montreal, Canada); Shonan Workshop (Shonan Village, Japan); DISC 2015 (Tokyo, Japan); ICDCN 2016 (Singapore); SPAA 2016 (Monterey, California); DISC 2016 (Paris, France).
  • Program committee membership: BGP 2017, ICDCN 2016, SPAA 2016, SIROCCO 2016, ICDCN 2015, SIROCCO 2014, FOMC 2014
  • DBLP entry.
  • Google Scholar profile.
  • Profile on StackExchange.