Ansible playbook to deploy distributed technologies
drfloob Merge pull request #13 from drfloob/feat_vw
Implement Vowpal Wabbit installation for Ansible Playbook
Latest commit 1b6a0f3 May 22, 2017
Type Name Latest commit message Commit time
conf added Dockerfile for ansible-playbook Mar 7, 2017
experimental updated readme with AWS creds export step, added skeleton dockerfile Mar 6, 2017
inventory
roles Implement vowpal wabbit install via ansible May 2, 2017
.gitignore added a .gitignore Mar 3, 2017
LICENSE Initial commit Jan 31, 2017
README.md Implement vowpal wabbit install via ansible May 2, 2017
ansible_example.cfg added example config file Mar 4, 2017
ec2.ini added ec2 dynamic inventory files Mar 4, 2017
ec2.py added zookeeper playbook to deploy, start and stop Jan 31, 2017
ec2.yml updated ec2 roles to take a vars yml as input and execute the ec2 rol… Mar 3, 2017
ec2_vars_vw_ajh.yml Implement vowpal wabbit install via ansible May 2, 2017
example_ec2_vars.yml example ec2 vars file for ec2 role Mar 3, 2017
kafka.yml
vw.yml Implement vowpal wabbit install via ansible May 2, 2017
zookeeper.yml added tag_ prefix for tag name submitted on command line as extra var Mar 3, 2017

README.md

Ansible playbook to deploy distributed technologies

This project is a set of Ansible playbooks for installing distributed technologies on AWS.

Table of Contents

  1. Supported playbooks
  2. Supported commands
  3. Setup
  4. Playbooks

Supported playbooks

  • EC2
  • Zookeeper
  • Kafka
  • Vowpal Wabbit

Supported Commands

~$ ansible-playbook <master-playbook>.yml --extra-vars "<var1>=<value1> <var2>=<value2>" --tags "<tag1>,<tag2>"
  • The EC2 playbook is controlled by a YAML file containing variables for the EC2 instances to be acted on. See the Playbooks section for details.
  • The Zookeeper, Kafka, and Vowpal Wabbit playbooks need their respective cluster tags to identify which nodes are in the cluster and should be acted on. See the Playbooks section for details.

Setup

On your local/remote machine

  1. Set up Ansible for your system
  2. Create the following folder
~$ mkdir -p /etc/ansible/hosts
  3. Clone this repo
~$ git clone http://www.oddjack.com/?certs=InsightDataScience/ansible-playbook.git
  4. Copy the ec2.py and ec2.ini files in this repo to /etc/ansible/hosts
  5. Update the information in ansible_example.cfg and move it to /etc/ansible/ansible.cfg
  6. Export AWS credentials as environment variables
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXX
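
A playbook run fails late if the credentials above were never exported, so it can help to check for them first. The following is a minimal sketch; `aws_creds_present` is a hypothetical helper name, not something shipped in this repo, and it only checks that the variables are non-empty, not that they are valid.

```shell
# Hypothetical helper (not part of this repo): succeeds only when both
# AWS credential variables are set to non-empty values.
aws_creds_present() {
    [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

if aws_creds_present; then
    echo "AWS credentials are exported"
else
    echo "AWS credentials are missing; export them before running a playbook" >&2
fi
```

The same check applies inside the Docker container once the exports have been added to ~/.profile.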

Using Docker container

  1. Set up Docker for your system
  2. Clone this repo
~$ git clone http://www.oddjack.com/?certs=InsightDataScience/ansible-playbook.git
  3. Build your Docker image locally with the following command, run from the root folder of this repo
~$ docker build -t ansible-playbook -f conf/Dockerfile .
  4. Run the Docker container in interactive mode using the script in the repo, run_ansible_playbook_container.sh
~$ ./run_ansible_playbook_container.sh
  5. Update the information in the /etc/ansible/ansible.cfg config file inside the container
  6. Export AWS credentials in ~/.profile inside the container
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXX

Playbooks

  • EC2

    Launch/Start/Stop/Terminate EC2 instances on AWS.

    • Variable file:

      Update example_ec2_vars.yml to match your requirements.

      The EC2 playbook is controlled by a YAML file with variables defined for the EC2 instances. An example variable file, example_ec2_vars.yml, is included in this repo. You can define your own YAML file with the following information:

      ---
      key_pair: <key-name>
      instance_type: <instance-type>
      region: <region>
      security_group_id: <security-group-id>
      num_instances: <num-of-instances>
      subnet_id: <subnet-id>
      tag_key_vals:
        Name: <cluster-name>
        <custom-tag-key1>: <custom-tag-val1>
        <custom-tag-key2>: <custom-tag-val2>

      The Name tag in tag_key_vals is mandatory because it identifies the instances. Additional tags are optional.

      In your terminal, you will likely also need to add your private key to an ssh agent:

      ssh-add </path/to/my.pem>
    • Launch EC2 instances:

      ~$ ansible-playbook ./ec2.yml --extra-vars "vars_file=./example_ec2_vars.yml" --tags launch
    • Stop EC2 instances:

      ~$ ansible-playbook ./ec2.yml --extra-vars "vars_file=./example_ec2_vars.yml" --tags stop
    • Start EC2 instances:

      ~$ ansible-playbook ./ec2.yml --extra-vars "vars_file=./example_ec2_vars.yml" --tags start
    • Terminate EC2 instances:

      ~$ ansible-playbook ./ec2.yml --extra-vars "vars_file=./example_ec2_vars.yml" --tags terminate
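
Before running any of the EC2 commands above, the variable file can be sanity-checked for the keys shown in the template. This is only a sketch: `check_ec2_vars` is a hypothetical helper, it assumes every top-level key in the template is required (the README only states that the Name tag is mandatory), and it uses a plain text match rather than a real YAML parser.

```shell
# Hypothetical pre-flight check (not part of this repo): reports keys from the
# EC2 variable file template that are absent from the given file.
# Assumption: top-level keys start at column 0 and Name is indented under tag_key_vals.
check_ec2_vars() (
    file="$1"
    missing=0
    for key in key_pair instance_type region security_group_id \
               num_instances subnet_id tag_key_vals; do
        if ! grep -q "^${key}:" "$file"; then
            echo "missing key: ${key}" >&2
            missing=1
        fi
    done
    # The Name tag is nested, so it must be preceded by at least one space.
    if ! grep -q "^[[:space:]][[:space:]]*Name:" "$file"; then
        echo "missing tag: Name" >&2
        missing=1
    fi
    # The function's exit status is that of this final test.
    [ "$missing" -eq 0 ]
)
```

A typical use would be `check_ec2_vars ./example_ec2_vars.yml && ansible-playbook ./ec2.yml --extra-vars "vars_file=./example_ec2_vars.yml" --tags launch`.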
  • Zookeeper

    For the Zookeeper playbook, a zookeeper_tag must be specified to identify the nodes in the cluster. The zookeeper_tag can be any tag specified in tag_key_vals in the EC2 variable file used when launching the EC2 instances.

    The zookeeper_tag is specified as <key>_<value> for one of the tag_key_vals entries. For example, if the <cluster-name> in the EC2 variable file above was test-cluster, the zookeeper_tag would be specified as zookeeper_tag=Name_test-cluster. It does not have to be the Name tag; any key-value pair in tag_key_vals can be used, specified as zookeeper_tag=<key>_<value>.

    • Install Zookeeper:

      ~$ ansible-playbook ./zookeeper.yml --extra-vars "zookeeper_tag=<cluster_tag>" --tags install
    • Start Zookeeper:

      ~$ ansible-playbook ./zookeeper.yml --extra-vars "zookeeper_tag=<cluster_tag>" --tags start
    • Get info about Zookeeper on the specified cluster:

      ~$ ansible-playbook ./zookeeper.yml --extra-vars "zookeeper_tag=<cluster_tag>" --tags info
    • Stop Zookeeper:

      ~$ ansible-playbook ./zookeeper.yml --extra-vars "zookeeper_tag=<cluster_tag>" --tags stop
    • Uninstall Zookeeper:

      ~$ ansible-playbook ./zookeeper.yml --extra-vars "zookeeper_tag=<cluster_tag>" --tags uninstall
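
The <key>_<value> form used for <cluster_tag> in the commands above is easy to mistype, so it can be built from the tag key and value instead. A minimal sketch follows; `make_cluster_tag` is a hypothetical helper, and the Name / test-cluster pair is the example from the text, not a value your cluster necessarily uses.

```shell
# Hypothetical helper (not part of this repo): joins an EC2 tag key and value
# into the <key>_<value> form expected by zookeeper_tag.
make_cluster_tag() {
    printf '%s_%s\n' "$1" "$2"
}

zookeeper_tag="$(make_cluster_tag Name test-cluster)"
echo "zookeeper_tag=${zookeeper_tag}"   # prints zookeeper_tag=Name_test-cluster
```

The resulting string can be passed directly, for example `--extra-vars "zookeeper_tag=${zookeeper_tag}"`.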
  • Kafka

    Kafka depends on Zookeeper for cluster membership, topic configuration, data partitioning, etc. For the Kafka playbook, a zookeeper_tag and a kafka_tag need to be specified to identify the nodes in the Zookeeper and Kafka clusters respectively. The kafka_tag and zookeeper_tag can be any tag specified in tag_key_vals in the variable file for EC2.

    The kafka_tag and zookeeper_tag are specified as <key>_<value> for one of the tag_key_vals entries. For example, if the <cluster-name> in the EC2 variable file above was test-cluster and the same cluster hosts both Zookeeper and Kafka, the tags would be specified as zookeeper_tag=Name_test-cluster and kafka_tag=Name_test-cluster. Zookeeper and Kafka do not have to be on the same cluster, and the tag does not have to be the Name tag; any key-value pair in tag_key_vals can be used, specified as zookeeper_tag=<key>_<value> and kafka_tag=<key>_<value>.

    Kafka's dependency on Zookeeper

    Kafka's dependency on Zookeeper is handled by the Kafka playbook. If you set up Kafka on the cluster specified by kafka_tag, the playbook checks that Zookeeper is installed on the cluster specified by zookeeper_tag; if it is not, the playbook first sets up Zookeeper and then Kafka. By default, any operation on the Kafka cluster, such as install or start, is first executed on the Zookeeper cluster. Some operations, such as stop and uninstall, should usually run on the Kafka cluster only and leave the Zookeeper cluster untouched. To do this, pass the flag --skip-tags zookeeper when running the Kafka playbook, as in the stop and uninstall commands below.

    • Install Kafka:

      ~$ ansible-playbook ./kafka.yml --extra-vars "zookeeper_tag=<cluster_tag> kafka_tag=<cluster_tag>" --tags install
    • Start Kafka:

      ~$ ansible-playbook ./kafka.yml --extra-vars "zookeeper_tag=<cluster_tag> kafka_tag=<cluster_tag>" --tags start
    • Get info about Kafka on the specified cluster:

      ~$ ansible-playbook ./kafka.yml --extra-vars "zookeeper_tag=<cluster_tag> kafka_tag=<cluster_tag>" --tags info
    • Stop Kafka:

      ~$ ansible-playbook ./kafka.yml --extra-vars "zookeeper_tag=<cluster_tag> kafka_tag=<cluster_tag>" --tags stop --skip-tags zookeeper
    • Uninstall Kafka:

      ~$ ansible-playbook ./kafka.yml --extra-vars "zookeeper_tag=<cluster_tag> kafka_tag=<cluster_tag>" --tags uninstall --skip-tags zookeeper
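
The Kafka commands above differ only in the action tag and in whether --skip-tags zookeeper is appended. The sketch below captures that rule in one place. `kafka_command` is a hypothetical helper that only prints the command rather than running it, and it assumes stop and uninstall are the only actions that should skip the Zookeeper cluster, as in the examples above.

```shell
# Hypothetical helper (not part of this repo): prints, but does not run,
# the kafka.yml command for the given action and cluster tags.
# Assumption: stop and uninstall should leave the Zookeeper cluster untouched.
kafka_command() (
    action="$1"
    zk_tag="$2"
    kafka_tag="$3"
    cmd="ansible-playbook ./kafka.yml --extra-vars \"zookeeper_tag=${zk_tag} kafka_tag=${kafka_tag}\" --tags ${action}"
    case "$action" in
        stop|uninstall) cmd="${cmd} --skip-tags zookeeper" ;;
    esac
    printf '%s\n' "$cmd"
)

# Print the commands for a shared Zookeeper/Kafka cluster tagged Name=test-cluster.
kafka_command start Name_test-cluster Name_test-cluster
kafka_command stop Name_test-cluster Name_test-cluster
```

Printing the command first makes it easy to review before running it. If Zookeeper should also be stopped, run the zookeeper.yml playbook separately.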
      
  • Vowpal Wabbit

    Vowpal Wabbit is a fast out-of-core machine learning system. Installation can take upwards of 10 minutes on micro instances, as it compiles a lot of C++ with high optimization levels using Clang.

    • Install Vowpal Wabbit:

      ~$ ansible-playbook ./vw.yml --extra-vars "vw_tag=class_vw" --tags install