I'm interested in designing new distributed and parallel algorithms, the distributed processing of big data, achieving fault tolerance in networks, and secure distributed computing in dynamic environments such as peer-to-peer networks and mobile ad hoc networks.

News

Publications tagged with "Machine Learning"

2006
  • Log File Processing by Machine Learning and Information Extraction
    Peter Robinson. Master's thesis, TU Vienna, Institute of Computer Languages, 2006. Nominated for the Distinguished Young Alumnus Award.
    Abstract:
    In today's computer network systems, large numbers of events are continuously written to log files. Unfortunately, there is no common standard defining the structure of these event messages, which are partly written in human-readable natural language. This lack of structure makes automatic processing considerably more difficult. This master's thesis describes the architecture and implementation of the LoP-System, a system that attempts to create machine-readable event structures from ordinary log file events by means of natural language processing. The thesis explains both the implementation details and the underlying theoretical concepts. The core of the system consists of a series of cascaded but independent components, some of which are enhanced with machine learning techniques. The raw input is first processed by a simple recursive descent parser that recognizes syntactic features (e.g., IP addresses); its output is then passed to a part-of-speech tagger based on a hidden Markov model. Regular expression patterns are applied to the tagged words to combine them into basic word groups (e.g., noun groups), which are subsequently analyzed semantically. In the final step, a rule-based event constructor builds the output events. All components are implemented in Haskell, a purely functional programming language. Some of the components developed in this thesis, especially the part-of-speech tagger, are general-purpose natural language processing tools that can be applied to other domains.
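The tagging stage of the pipeline described in the abstract can be illustrated with Viterbi decoding, the standard way to find the most likely tag sequence under a hidden Markov model. This is a minimal sketch in Python, not the LoP-System's Haskell code: the function names `viterbi` and `_log`, the two-tag set, and all probabilities are hypothetical toy values, and words never seen in training simply receive a small fixed emission probability `unk`.

```python
import math


def _log(p):
    # Log that maps a zero probability to -inf instead of raising ValueError.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Most probable tag sequence for `words` under a first-order HMM.

    start_p[t]    : P(first tag = t)
    trans_p[s][t] : P(next tag = t | current tag = s)
    emit_p[t][w]  : P(word w | tag t); unseen words fall back to `unk`
    """
    if not words:
        return []
    # best[t] = (log-prob of the best path ending in tag t, that path)
    best = {t: (_log(start_p[t]) + _log(emit_p[t].get(words[0], unk)), [t])
            for t in tags}
    for w in words[1:]:
        new_best = {}
        for t in tags:
            # Extend the best-scoring predecessor path s -> t.
            score, path = max((best[s][0] + _log(trans_p[s][t]), best[s][1])
                              for s in tags)
            new_best[t] = (score + _log(emit_p[t].get(w, unk)), path + [t])
        best = new_best
    # Return the path of the highest-scoring final state.
    return max(best.values())[1]


# Hypothetical toy model for log-style tokens.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.6, "VERB": 0.4}}
EMIT = {"NOUN": {"server": 0.5, "connection": 0.4, "restarted": 0.1},
        "VERB": {"restarted": 0.6, "failed": 0.4}}

print(viterbi(["server", "restarted"], TAGS, START, TRANS, EMIT))
# ['NOUN', 'VERB']
```

Working in log space avoids numeric underflow on long inputs; the `unk` fallback is a crude stand-in for the handling of noisy, unseen tokens.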

Code

I'm interested in parallel and distributed programming and related technologies such as software transactional memory. Below is a partial list of software that I have written.
  • I extended Cabal to use a "world" file that keeps track of installed packages (now part of the main distribution).
  • data dispersal: an implementation of an (m,n)-threshold information dispersal scheme that is space-optimal.
  • secret sharing: an implementation of a secret sharing scheme that provides information-theoretic security.
  • dice-entropy: a library that provides cryptographically secure dice rolls implemented by bit-efficient rejection sampling.
  • TSkipList: a data structure with range-query support for software transactional memory.
  • stm-io-hooks: an extension of Haskell's Software Transactional Memory (STM) monad with commit and retry IO hooks.
  • Mathgenealogy: visualize your (academic) genealogy! A program for extracting data from the Mathematics Genealogy Project.
  • In my master's thesis, I developed a system for automatically constructing events from log files produced by various system programs. One of the core components was a part-of-speech (POS) tagger, which assigns word classes (e.g., noun, verb) to the previously parsed tokens of the log file. To cope with noisy input data, I modeled the POS tagger as a hidden Markov model. I developed (and proved the correctness of) a variant of the maximum likelihood estimation algorithm for training the Markov model and smoothing the state transition distributions.
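The training step described in the master's thesis entry can be illustrated with maximum likelihood estimation of tag-to-tag transition probabilities. The thesis uses its own smoothing variant, which is not described here; this Python sketch instead substitutes plain additive (add-α, or Laplace) smoothing, and the function name `estimate_transitions`, the `alpha` parameter, and the toy corpus are hypothetical. With K tags, c(s, t) the number of times tag t follows tag s, and c(s) the number of transitions out of s, the smoothed estimate is P(t | s) = (c(s, t) + α) / (c(s) + αK); setting α = 0 would give the unsmoothed MLE c(s, t) / c(s).

```python
from collections import Counter


def estimate_transitions(tagged_sentences, tags, alpha=1.0):
    """Estimate P(t | s) from tagged sentences with add-alpha smoothing.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    alpha > 0 gives every transition, even an unseen one, nonzero mass
    and avoids division by zero for tags that never start a transition.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    pair_counts = Counter()  # c(s, t): how often tag t follows tag s
    from_counts = Counter()  # c(s): transitions leaving tag s
    for sentence in tagged_sentences:
        seq = [tag for _, tag in sentence]
        for s, t in zip(seq, seq[1:]):
            pair_counts[(s, t)] += 1
            from_counts[s] += 1
    k = len(tags)
    return {s: {t: (pair_counts[(s, t)] + alpha) / (from_counts[s] + alpha * k)
                for t in tags}
            for s in tags}


# Hypothetical two-sentence training corpus.
corpus = [[("server", "NOUN"), ("restarted", "VERB")],
          [("connection", "NOUN"), ("failed", "VERB")]]
trans = estimate_transitions(corpus, ["NOUN", "VERB"])
print(trans["NOUN"]["VERB"])  # 0.75
```

Here NOUN → VERB is observed twice out of two NOUN transitions, giving (2 + 1) / (2 + 2) = 0.75, while the never-observed NOUN → NOUN still receives 0.25, and VERB, which never starts a transition, gets a uniform row.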
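The dice-entropy entry relies on rejection sampling. As a minimal sketch of the basic idea, assuming a plain sampler rather than the library's bit-efficient one (whose details are not given here): draw k = ⌈log₂ n⌉ random bits and retry whenever the value is at least n, so every face has probability exactly 1/n. For a six-sided die, k = 3 and 6 of the 8 outcomes are accepted, so a roll costs on average 3 × 8/6 = 4 bits; more bit-efficient schemes reuse leftover entropy from rejected draws, which this sketch omits. The function `roll_die` and its injectable `randbits` parameter are hypothetical; `secrets.randbits` is from the Python standard library.

```python
import secrets


def roll_die(sides=6, randbits=secrets.randbits):
    """Uniform roll in 1..sides via simple rejection sampling.

    Draws the fewest bits k with 2**k >= sides and rejects out-of-range
    values; `randbits(k)` must return a uniform integer in [0, 2**k).
    """
    if sides < 1:
        raise ValueError("sides must be positive")
    if sides == 1:
        return 1  # no randomness needed
    k = (sides - 1).bit_length()  # 3 bits for a six-sided die
    while True:
        r = randbits(k)  # uniform in 0 .. 2**k - 1
        if r < sides:
            return r + 1  # accept: map 0..sides-1 to 1..sides
        # reject and draw fresh bits


print(roll_die())  # a cryptographically random value in 1..6
```

Injecting `randbits` keeps the sampler testable with a deterministic bit source while defaulting to the operating system's secure randomness.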

Misc

  • Conferences I have attended so far: PODC 2008 (Toronto, Canada); SSS 2008 (Detroit, USA); OPODIS 2009 (Nîmes, France); ALGOSENSORS 2010 (Bordeaux, France); DISC 2010 (Boston, USA); IPDPS 2011 (Anchorage, USA); FOMC 2011 (San Jose, USA); SODA 2012 (Kyoto, Japan); SIROCCO 2012 (Reykjavik, Iceland); ICDCN 2013 (Mumbai, India); ICALP 2013 (Riga, Latvia); SPAA 2013 (Montreal, Canada); PODC 2013 (Montreal, Canada); Shonan Workshop (Shonan Village, Japan); DISC 2015 (Tokyo, Japan); ICDCN 2016 (Singapore); SPAA 2016 (Monterey, California); DISC 2016 (Paris, France).
  • Program committee membership: BGP 2017, ICDCN 2016, SPAA 2016, SIROCCO 2016, ICDCN 2015, SIROCCO 2014, FOMC 2014
  • DBLP entry.
  • Google Scholar profile.
  • Profile on Stack Exchange.