Skip to content
Distributed Stream Processing
Branch: master
Clone or download
Latest commit 096cdc5 Jun 13, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.ci-dockerfiles/ci-standard Update CI docker and circleci config May 22, 2019
.circleci Update CI docker and circleci config May 22, 2019
.for_maintainers Update Documentation UI (#2744) Dec 21, 2018
.github Update PR template per Seans suggestion Oct 10, 2018
.release Update release bootstrap.sh for ponyc 0.28.0 Apr 8, 2019
book/getting-started Resolve merge conflicts between release-0.6.1 and release Dec 31, 2018
connectors Fix the use of socket server in connectors example (#2754) Dec 27, 2018
demos Fix worker commands and update README for demo (#2882) May 6, 2019
docker Update Docker image to support py3 via virtualenv Oct 24, 2018
docs/proposals Update ConnectorFramedSourceNotify to always require the ConnectorSou… Apr 3, 2019
documentation Add `--ponypin` to any pony command using `--ponypinasio` Apr 8, 2019
examples Log rotation - new tests + validations (#2885) May 8, 2019
giles Deprecate giles receiver in favor of data receiver (#2341) Oct 3, 2018
lib Ensure producers register as producer on construction Jun 13, 2019
machida Update test harness to pause ALO sources Apr 14, 2019
machida3 Code updates for Ponyc 0.27.0/0.28.0 Apr 8, 2019
misc Update default Wallaroo Up version to latest Apr 8, 2019
monitoring_hub pin phoenix js until path issue is resolved Jun 12, 2019
orchestration Terraform cluster name fix: . breaks aws.py inventory script; move hy… Apr 30, 2019
testing WIP Jun 13, 2019
travis Fix broken metrics UI AppImage creation during CI Apr 1, 2019
utils Update apps for new API Nov 3, 2018
vagrant Update version for 0.6.1 release Dec 31, 2018
.gitignore Log rotation - new tests + validations (#2885) May 8, 2019
.gitmodules Remove unused ananke git submodule Jan 23, 2019
.travis.yml Update travis CI distro Apr 8, 2019
CHANGELOG.md Add unreleased section to CHANGELOG Dec 31, 2018
CODE_OF_CONDUCT.md Fix formatting in COC Sep 7, 2017
CONTRIBUTING.md Update CONTRIBUTING.md Nov 29, 2018
CONTRIBUTORS Add latest contributors Nov 30, 2018
Dockerfile Update version for 0.6.1 release Dec 31, 2018
LICENSE Update license to Apache 2.0 (#2428) Sep 21, 2018
LIMITATIONS.md Update LIMITATIONS doc Sep 28, 2018
MONOREPO.md Update Documentation UI (#2744) Dec 21, 2018
Makefile Have `make test` run `make unexam` and `make integration-tests` Oct 2, 2017
README.md Place source_name as first arg for Configs Apr 3, 2019
ROADMAP.md Update docs to conform to new API Nov 15, 2018
SUPPORT.md Fix incorrect IRC link Nov 1, 2017
VERSION Update version for 0.6.1 release Dec 31, 2018
rules.mk Fix broken metrics UI AppImage creation during CI Apr 1, 2019
wallaroo-logo.png Bring the Wallaroo Logo file back to life. Apr 11, 2019

README.md

WallarooLabs logo

Build and scale real-time applications as easily as writing a script


CircleCI GitHub license GitHub version IRC Groups.io

A fast, stream-processing framework. Wallaroo makes it easy to react to data in real-time. By eliminating infrastructure complexity, going from prototype to production has never been simpler.

What is Wallaroo?

When we set out to build Wallaroo, we had several high-level goals in mind:

  • Create a dependable and resilient distributed computing framework
  • Take care of the complexities of distributed computing "plumbing," allowing developers to focus on their business logic
  • Provide high-performance & low-latency data processing
  • Be portable and deploy easily (i.e., run on-prem or any cloud)
  • Manage in-memory state for the application
  • Allow applications to scale as needed, even when they are live and up-and-running

You can learn more about Wallaroo from our "Hello Wallaroo!" blog post and the Wallaroo overview video.

What makes Wallaroo unique

Wallaroo is a little different than most stream processing tools. While most require the JVM, Wallaroo can be deployed as a separate binary. This means no more jar files. Wallaroo also isn't locked to just using Kafka as a source, use any source you like. Application logic can be written in Python 2, Python 3, or Pony.

Getting Started

Wallaroo can either be installed via Docker, Vagrant or (on Linux) via our handy Wallaroo Up command.

As easy as:

docker pull wallaroo-labs-docker-wallaroolabs.bintray.io/release/wallaroo:latest

Check out our installation options page to learn more.

Usage

Once you've installed Wallaroo, Take a look at some of our examples. A great place to start are our word_count or market spread examples in Python.

"""
This is a complete example application that receives lines of text and counts each word.
"""
import string
import struct
import wallaroo

def application_setup(args):
    in_name, in_host, in_port = wallaroo.tcp_parse_input_addrs(args)[0]
    out_host, out_port = wallaroo.tcp_parse_output_addrs(args)[0]

    lines = wallaroo.source("Split and Count",
                        wallaroo.TCPSourceConfig(in_name, in_host, in_port,
                            decode_line))
    pipeline = (lines
        .to(split)
        .key_by(extract_word)
        .to(count_word)
        .to_sink(wallaroo.TCPSinkConfig(out_host, out_port, 
            encode_word_count)))

    return wallaroo.build_application("Word Count Application", pipeline)

@wallaroo.computation_multi(name="split into words")
def split(data):
    punctuation = " !\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~"

    words = []

    for line in data.split("\n"):
        clean_line = line.lower().strip(punctuation)
        for word in clean_line.split(" "):
            clean_word = word.strip(punctuation)
            words.append(clean_word)

    return words

class WordTotal(object):
    count = 0

@wallaroo.state_computation(name="count word", state=WordTotal)
def count_word(word, word_total):
    word_total.count = word_total.count + 1
    return WordCount(word, word_total.count)

class WordCount(object):
    def __init__(self, word, count):
        self.word = word
        self.count = count

@wallaroo.key_extractor
def extract_word(word):
    return word

@wallaroo.decoder(header_length=4, length_fmt=">I")
def decode_line(bs):
    return bs.decode("utf-8")

@wallaroo.encoder
def encode_word_count(word_count):
    output = word_count.word + " => " + str(word_count.count) + "\n"
    return output.encode("utf-8")

Documentation

Are you the sort who just wants to get going? Dive right into our documentation then! It will get you up and running with Wallaroo.

More information is also on our blog. There you can find more insight into what we are working on and industry use-cases.

Wallaroo currently exists as a mono-repo. All the source that is Wallaroo is located in this repo. See application structure for more information.

Need Help?

Trying to figure out how to get started?

Contributing

We welcome contributions. Please see our Contribution Guide

For your pull request to be accepted you will need to accept our Contributor License Agreement

License

Wallaroo is licensed under the Apache version 2 license.

You can’t perform that action at this time.