List of libraries, tools and APIs for web scraping and data processing.
Updated Nov 17, 2018
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Updated Nov 14, 2018
Web Scraping Framework
Updated Sep 15, 2018
General Assembly's 2015 Data Science course in Washington, DC
Updated Apr 18, 2016
A framework for creating semi-automatic web content extractors
Updated May 1, 2018
Random User-Agent middleware based on fake-useragent
An unofficial API for Quora.
Updated Oct 9, 2016
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Updated Sep 1, 2018
Modern web scraping framework written in Ruby which works out of box with Headless Chromium/Firefox, PhantomJS, or si…
Updated Oct 12, 2018
Python scripts for building 'Short Jokes' dataset, featured on Kaggle
Updated Mar 22, 2017
ACHE is a web crawler for domain-specific search.
Simple Query Scraping with CSS and Go Reflection
Updated Oct 21, 2016
Scrapy crawler to collect data on the back catalog of songs listed for sale.
Updated May 24, 2017
Kantu for Chrome and Firefox - Modern Web Browser Automation plus Selenium IDE
Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
Updated Feb 12, 2017
MetaData html scraper and parser for Node.js (supports Promises and callback style)
Updated Oct 5, 2018
Android app for saving webpages for offline reading.
Updated Jun 22, 2018
Python bindings to Modest engine (fast HTML5 parser with CSS selectors).
Updated Sep 30, 2018
Zillow Scraper for Python using Selenium
Updated Nov 4, 2018
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Updated Apr 6, 2017
Tutorial: Web scraping in Python with Beautiful Soup
Updated Nov 18, 2018
Scrapy Training companion code
Updated Jul 19, 2017
Twitter Intelligence OSINT project performs tracking and analysis of Twitter
Updated Oct 31, 2018
Collection of scripts corresponding to LucidProgramming YouTube tutorials
Updated Nov 5, 2018
Web scraping and related analytics using Python tools
Updated Nov 17, 2018
Headless 'Chrome' Orchestration in R
Updated Nov 11, 2018
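Several entries above, such as the MetaData HTML scraper for Node.js and the Beautiful Soup tutorial, revolve around pulling structured metadata out of raw HTML. As a rough illustration of that idea only, here is a minimal Python sketch that uses nothing but the standard library's `html.parser`. It is not the API of any listed project. The names `MetaTagParser` and `extract_metadata` are hypothetical, and the sketch assumes the page is already downloaded as a string.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Hypothetical helper: collects <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) tuples
        if tag == "meta":
            # Open Graph tags use "property"; standard tags use "name".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title> is accumulated.
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Hypothetical helper: return a dict of meta tags plus the page title, if any."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    result = dict(parser.meta)
    if parser.title.strip():
        result["title"] = parser.title.strip()
    return result


if __name__ == "__main__":
    page = (
        '<html><head><title> Example </title>'
        '<meta name="description" content="A demo page">'
        '<meta property="og:title" content="Demo" /></head></html>'
    )
    print(extract_metadata(page))
```

Self-closing tags such as `<meta ... />` are routed through `handle_starttag` by `HTMLParser`'s default `handle_startendtag`, so both spellings are covered. A real scraper would also need to fetch pages, handle encodings, and tolerate malformed markup, which is where the dedicated libraries listed above come in.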