valeriansaliou Bump
Signed-off-by: Valerian Saliou <valerian@valeriansaliou.name>
Latest commit d62a0ee Jul 3, 2018
Vigil

Microservices Status Page. Monitors a distributed infrastructure and sends alerts to Slack.

Vigil is an open-source Status Page you can host on your infrastructure, used to monitor all your servers and apps, and visible to your users (on a domain of your choice, eg. status.example.com).

It is useful in microservices contexts to monitor both apps and backends. If a node goes down in your infrastructure, you receive a status change notification via Slack, email, Twilio SMS and/or XMPP.

👉 See a live demo of Vigil on Crisp Status Page.

📰 The Vigil project was announced in a post on my personal journal.

Who uses it?

Crisp

👋 Do you use Vigil and want to be listed here? Contact me.

Features

  • Automatically monitors your infrastructure services
  • Notifies you when a service goes down or comes back up (via a configured channel: Slack, Email, Twilio SMS and/or XMPP)
  • Generates a status page that you can host on your domain for your public users (eg. https://status.example.com)

How does it work?

Vigil monitors all your infrastructure services. You first need to configure target services to be monitored, and then Vigil does the rest for you.

There are two kinds of services Vigil can monitor:

  • HTTP / TCP services: Vigil frequently probes an HTTP or TCP target and checks for reachability
  • Application services: Install the Vigil Reporter library in your app (eg. a NodeJS app) and get reports when your app goes down, as well as when the host server system is overloaded

It is recommended to configure Vigil or Vigil Reporter to send frequent probe checks, so that you are quickly notified when a service goes down (and can thus reduce unexpected downtime on your services).

Hosted alternative to Vigil

Vigil needs to be hosted on your own systems, and maintained on your end. If you do not feel like managing yet another service, you may use Crisp Status instead. Crisp Status is a direct port of Vigil to the Crisp customer support platform.

Crisp Status hosts your status page on Crisp systems, and is able to do what Vigil does (and even more!). Crisp Status is integrated with other Crisp products (eg. Crisp Chatbox & Crisp Helpdesk). It warns your users over chatbox and helpdesk if your status page reports as dead for an extended period of time.

As an example of a status page running Crisp Status, check out Enrich Status Page.

How to use it?

Installation

Install from releases:

The best way to install Vigil is to pull the latest release from the Vigil releases page.

Make sure to pick the correct server architecture (eg. Intel 32 bits).

Install from Cargo:

If you prefer managing Vigil via Rust's Cargo, install it directly via cargo install:

cargo install vigil-server

Ensure that your $PATH includes the directory where Cargo installs binaries (~/.cargo/bin by default), and then run Vigil using the vigil command.

Install from source:

The last option is to pull the source code from Git and compile Vigil via cargo:

cargo build --release

You can find the built binaries in the ./target/release directory.

Install libssl-dev (ie. OpenSSL headers) and libstrophe-dev (ie. XMPP library headers; only if you need the XMPP notifier) before you compile Vigil. SSL dependencies are required for the HTTPS probes and email notifications.

Install from Docker Hub:

You might find it convenient to run Vigil via Docker. You can find the pre-built Vigil image on Docker Hub as valeriansaliou/vigil.

First, pull the valeriansaliou/vigil image:

docker pull valeriansaliou/vigil:v1.4.0

Then, provide it with a configuration file and run it (replace /path/to/your/vigil/config.cfg with the path to your configuration file):

docker run -p 8080:8080 -v /path/to/your/vigil/config.cfg:/etc/vigil.cfg valeriansaliou/vigil:v1.4.0

In the configuration file, ensure that:

  • server.inet is set to 0.0.0.0:8080 (this lets Vigil be reached from outside the container)
  • assets.path is set to ./res/assets/ (this refers to an internal path in the container, as the assets are contained there)

Vigil will be reachable from http://localhost:8080.
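As a rough illustration, the two settings listed above might look as follows in the configuration file. This is a minimal fragment, assuming the file follows TOML syntax (as section names such as [[probe.service]] suggest); every other option is left out here and still needs to be set as usual.

```toml
[server]
# Listen on all interfaces, so that the port published by Docker reaches Vigil
inet = "0.0.0.0:8080"

[assets]
# Path inside the container, not a path on the host
path = "./res/assets/"
```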

Configuration

Use the sample config.cfg configuration file and adjust it to your own environment.

Available configuration options are documented below, with allowed values:

[server]

  • log_level (type: string, allowed: debug, info, warn, error, default: warn): Verbosity of logging, set it to error in production
  • inet (type: string, allowed: IPv4 / IPv6 + port, default: [::1]:8080): Host and TCP port the Vigil public status page should listen on
  • workers (type: integer, allowed: any number, default: 4): Number of workers for the Vigil public status page to run on
  • reporter_token (type: string, allowed: secret token, no default): Reporter secret token (ie. secret password)

[assets]

  • path (type: string, allowed: UNIX path, default: ./res/assets/): Path to Vigil assets directory

[branding]

  • page_title (type: string, allowed: any string, default: Status Page): Status page title
  • page_url (type: string, allowed: URL, no default): Status page URL
  • company_name (type: string, allowed: any string, no default): Company name (ie. your company)
  • icon_color (type: string, allowed: hexadecimal color code, no default): Icon color (ie. your icon background color)
  • icon_url (type: string, allowed: URL, no default): Icon URL, the icon should be your squared logo, used as status page favicon (PNG format recommended)
  • logo_color (type: string, allowed: hexadecimal color code, no default): Logo color (ie. your logo primary color)
  • logo_url (type: string, allowed: URL, no default): Logo URL, the logo should be your full-width logo, used as status page header logo (SVG format recommended)
  • website_url (type: string, allowed: URL, no default): Website URL to be used in status page header
  • support_url (type: string, allowed: URL, no default): Support URL to be used in status page header (ie. where users can contact you if something is wrong)
  • custom_html (type: string, allowed: HTML, default: empty): Custom HTML to include in status page head (optional)

[metrics]

  • poll_interval (type: integer, allowed: seconds, default: 120): Interval at which to probe nodes in poll mode
  • poll_retry (type: integer, allowed: seconds, default: 2): Delay after which to probe a node in poll mode a second time (only when the first check fails)
  • poll_http_status_healthy_above (type: integer, allowed: HTTP status code, default: 200): HTTP status above which poll checks to HTTP replicas report as healthy
  • poll_http_status_healthy_below (type: integer, allowed: HTTP status code, default: 400): HTTP status under which poll checks to HTTP replicas report as healthy
  • poll_delay_dead (type: integer, allowed: seconds, default: 30): Delay after which a node in poll mode is to be considered dead (ie. check response delay)
  • poll_delay_sick (type: integer, allowed: seconds, default: 10): Delay after which a node in poll mode is to be considered sick (ie. check response delay)
  • push_delay_dead (type: integer, allowed: seconds, default: 20): Delay after which a node in push mode is to be considered dead (ie. time after which the node did not report)
  • push_system_cpu_sick_above (type: float, allowed: system CPU loads, default: 0.90): System load index for CPU above which to consider a node in push mode sick (ie. UNIX system load)
  • push_system_ram_sick_above (type: float, allowed: system RAM loads, default: 0.90): System load index for RAM above which to consider a node in push mode sick (ie. percent RAM used)
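To make the interplay of these thresholds concrete, here is a small Python sketch of how a check result could be mapped to a status. It is an interpretation of the option descriptions above, not Vigil's actual implementation: the helper names (classify_poll, classify_push) are hypothetical, and the boundary handling (lower status bound inclusive, upper bound exclusive, delays compared with >=) is an assumption suggested by the default values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Default values copied from the [metrics] options listed above."""
    poll_http_status_healthy_above: int = 200
    poll_http_status_healthy_below: int = 400
    poll_delay_dead: float = 30.0
    poll_delay_sick: float = 10.0
    push_delay_dead: float = 20.0
    push_system_cpu_sick_above: float = 0.90
    push_system_ram_sick_above: float = 0.90


def classify_poll(status_code, delay, m=Metrics()):
    """Hypothetical helper: status of an HTTP replica probed in poll mode.

    status_code is the HTTP status received, delay the response time in seconds.
    """
    # Assumption: lower bound inclusive, upper bound exclusive (200 <= code < 400)
    status_ok = (m.poll_http_status_healthy_above
                 <= status_code
                 < m.poll_http_status_healthy_below)
    if not status_ok or delay >= m.poll_delay_dead:
        return "dead"
    if delay >= m.poll_delay_sick:
        return "sick"
    return "healthy"


def classify_push(cpu, ram, since_last_report, m=Metrics()):
    """Hypothetical helper: status of a node that reports in push mode.

    cpu and ram are the reported loads, since_last_report is in seconds.
    """
    if since_last_report >= m.push_delay_dead:
        return "dead"
    if cpu > m.push_system_cpu_sick_above or ram > m.push_system_ram_sick_above:
        return "sick"
    return "healthy"
```

With the defaults, classify_poll(200, 0.2) gives "healthy", classify_poll(503, 0.2) gives "dead", classify_poll(200, 12.0) gives "sick", and classify_push(0.95, 0.50, 5) gives "sick".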

[plugins]

[plugins.rabbitmq]

  • api_url (type: string, allowed: URL, no default): RabbitMQ API URL (ie. http://127.0.0.1:15672)
  • auth_username (type: string, allowed: username, no default): RabbitMQ API authentication username
  • auth_password (type: string, allowed: password, no default): RabbitMQ API authentication password
  • virtualhost (type: string, allowed: virtual host, no default): RabbitMQ virtual host hosting the queues to be monitored
  • queue_ready_healthy_below (type: integer, allowed: any number, no default): Maximum number of payloads in RabbitMQ queue with status ready to consider node healthy.
  • queue_nack_healthy_below (type: integer, allowed: any number, no default): Maximum number of payloads in RabbitMQ queue with status nack to consider node healthy.

[notify]

[notify.email]

  • to (type: string, allowed: email address, no default): Email address to which to send emails
  • from (type: string, allowed: email address, no default): Email address from which to send emails
  • smtp_host (type: string, allowed: hostname, IPv4, IPv6, default: localhost): SMTP host to connect to
  • smtp_port (type: integer, allowed: TCP port, default: 587): SMTP TCP port to connect to
  • smtp_username (type: string, allowed: any string, no default): SMTP username to use for authentication (if any)
  • smtp_password (type: string, allowed: any string, no default): SMTP password to use for authentication (if any)
  • smtp_encrypt (type: boolean, allowed: true, false, default: true): Whether to encrypt SMTP connection with STARTTLS or not

[notify.twilio]

  • to (type: string, allowed: phone number, no default): Phone number to which to send text messages
  • from (type: string, allowed: phone number, no default): Phone number from which to send text messages (this number must be available for use in your Twilio account)
  • account_sid (type: string, allowed: any string, no default): Twilio account identifier (ie. Sid)
  • auth_token (type: string, allowed: any string, no default): Twilio authentication token (ie. AuthToken)

[notify.slack]

  • hook_url (type: string, allowed: URL, no default): Slack hook URL (ie. https://hooks.slack.com/[..])

[notify.xmpp]

Notice: the XMPP notifier requires libstrophe (libstrophe-dev package on Debian) to be available when compiling Vigil, with the feature notifier-xmpp enabled upon Cargo build.

  • to (type: string, allowed: Jabber ID, no default): Jabber ID (JID) to which to send messages
  • from (type: string, allowed: Jabber ID, no default): Jabber ID (JID) from which to send messages
  • xmpp_password (type: string, allowed: any string, no default): XMPP account password to use for authentication

[probe]

[[probe.service]]

  • id (type: string, allowed: any unique lowercase string, no default): Unique identifier of the probed service (not visible on the status page)
  • label (type: string, allowed: any string, no default): Name of the probed service (visible on the status page)

[[probe.service.node]]

  • id (type: string, allowed: any unique lowercase string, no default): Unique identifier of the probed service node (not visible on the status page)
  • label (type: string, allowed: any string, no default): Name of the probed service node (visible on the status page)
  • mode (type: string, allowed: poll, push, no default): Probe mode for this node (ie. poll is direct HTTP or TCP poll to the URLs set in replicas, while push is for Vigil Reporter nodes)
  • replicas (type: array[string], allowed: TCP or HTTP URLs, default: empty): Node replica URLs to be probed (only used if mode is poll)
  • http_body_healthy_match (type: string, allowed: regular expressions, no default): HTTP response body for which to report node replica as healthy (if the body does not match, the replica will be reported as dead, even if the status code check passes; the check uses a GET rather than the usual HEAD if this option is set)
  • rabbitmq_queue (type: string, allowed: RabbitMQ queue names, no default): RabbitMQ queue associated to node, which to check against for pending payloads via RabbitMQ API (this helps monitor unacked payloads accumulating in the queue)
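
As an illustration of how the probe options above fit together, here is a hedged configuration fragment declaring one service with a polled node and a pushed node. It assumes TOML syntax; all identifiers, labels, hostnames and the regular expression are made-up placeholders, not values from the sample config.cfg.

```toml
[[probe.service]]
id = "web"
label = "Web nodes"

# Polled node: Vigil probes each replica URL itself
[[probe.service.node]]
id = "website"
label = "Website"
mode = "poll"
replicas = ["https://web-1.example.com/health", "https://web-2.example.com/health"]
# Optional: the response body must match this regular expression (placeholder)
http_body_healthy_match = "OK"

# Pushed node: the app reports through Vigil Reporter, so no replica URLs are listed
[[probe.service.node]]
id = "api"
label = "API"
mode = "push"
```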

Run Vigil

Vigil can be run as follows:

./vigil -c /path/to/config.cfg

Usage recommendations

Consider the following recommendations when using Vigil:

  • Vigil should be hosted on a safe, separate server. This server should run on a different physical machine and network than your monitored infrastructure servers.
  • Make sure to whitelist the Vigil server public IP (both IPv4 and IPv6) on your monitored HTTP services; this applies if you use a bot protection service that challenges bot IPs, eg. Distil Networks or Cloudflare. Vigil will see the HTTP service as down if a bot challenge is raised.

What do status variants look like?

Vigil has 3 status variants: healthy (no ongoing issue), sick (services under high load) and dead (outage):

Healthy status variant

Status Healthy

Sick status variant

Status Sick

Dead status variant

Status Dead

What do alerts look like?

When a monitored backend or app goes down in your infrastructure, Vigil can let you know by Slack, Twilio SMS, Email and XMPP:

Vigil alert in Slack

You can also get nice realtime down and up alerts on your devices, eg. iPhone and Apple Watch:

Vigil down alert on iPhone (Slack) Vigil up alert on Apple Watch (Slack) Vigil alerts on iPhone (Twilio SMS)

How can I integrate Vigil Reporter in my code?

Vigil Reporter is used to actively submit health information to Vigil from your apps. Apps are best monitored via application probes, which are able to report detailed system information such as CPU and RAM load. This lets Vigil show if an application host system is under high load.

Vigil Reporter Libraries

👉 Cannot find the library for your programming language? Build your own and be referenced here! (contact me)

Manual reporting

In case you need to manually report node metrics to the Vigil endpoint, use the following HTTP configuration (adjust it to yours):

Endpoint URL:

HTTP POST https://status.example.com/reporter/<probe_id>/<node_id>/

Where:

  • node_id: The parent node of the reporting replica
  • probe_id: The parent probe of the node

Request headers:

  • Add an Authorization header with a Basic authentication where the password is your configured reporter_token.

Request data:

Adjust the request data to your replica context and send it as HTTP POST:

{
  "replica": "<replica_id>",
  "interval": 30,

  "load": {
    "cpu": 0.30,
    "ram": 0.80
  }
}

Where:

  • replica: The replica unique identifier (eg. the server LAN IP)
  • interval: The push interval (in seconds)
  • load.cpu: The general CPU load, from 0.00 to 1.00 (can be more than 1.00 if the CPU is overloaded)
  • load.ram: The general RAM load, from 0.00 to 1.00
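
The pieces above (endpoint URL, Authorization header and JSON body) can be assembled as in the following Python sketch, which uses only the standard library. It builds the request without sending it. The helper name build_report_request is hypothetical, and two details are assumptions not stated above: the Basic authentication username is left empty (only the password is specified), and a Content-Type: application/json header is added.

```python
import base64
import json
import urllib.request


def build_report_request(base_url, probe_id, node_id, replica_id,
                         reporter_token, interval, cpu, ram):
    """Hypothetical helper: build (but do not send) a manual report request."""
    # Endpoint: <base_url>/reporter/<probe_id>/<node_id>/
    url = f"{base_url.rstrip('/')}/reporter/{probe_id}/{node_id}/"

    # Request data, in the shape documented above
    body = json.dumps({
        "replica": replica_id,
        "interval": interval,
        "load": {"cpu": cpu, "ram": ram},
    }).encode("utf-8")

    # Assumption: empty username, reporter_token as the password
    credentials = base64.b64encode(
        f":{reporter_token}".encode("utf-8")
    ).decode("ascii")

    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            # Assumption: the endpoint expects a JSON content type
            "Content-Type": "application/json",
        },
    )
```

To send a report, pass the returned object to urllib.request.urlopen(request, timeout=10), for instance from a loop that sleeps for interval seconds between reports.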

🔥 Report A Vulnerability

If you find a vulnerability in Vigil, you are more than welcome to report it directly to @valeriansaliou by sending an encrypted email to valerian@valeriansaliou.name. Do not report vulnerabilities in public GitHub issues, as they may be exploited by malicious people to target production servers running an unpatched Vigil server.

⚠️ You must encrypt your email using the @valeriansaliou GPG public key: 🔑 valeriansaliou.gpg.pub.asc.

🎁 Based on the severity of the vulnerability, I may offer a $100 (US) bounty to whoever reported it.