List of libraries, tools and APIs for web scraping and data processing.
Updated Sep 19, 2018
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Updated Sep 9, 2018
Web Scraping Framework
Updated Sep 15, 2018
General Assembly's 2015 Data Science course in Washington, DC
Updated Apr 18, 2016
A framework for creating semi-automatic web content extractors
Updated May 1, 2018
Random User-Agent middleware based on fake-useragent
An unofficial API for Quora.
Updated Oct 9, 2016
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Updated Sep 1, 2018
Python scripts for building 'Short Jokes' dataset, featured on Kaggle
Updated Mar 22, 2017
ACHE is a web crawler for domain-specific search.
Simple Query Scraping with CSS and Go Reflection
Updated Oct 21, 2016
Modern web scraping framework written in Ruby which works out of the box with Headless Chromium/Firefox, PhantomJS, or si…
Updated Sep 21, 2018
Scrapy crawler to collect data on the back catalog of songs listed for sale.
Updated May 24, 2017
Kantu for Chrome and Firefox - Modern Web Browser Automation plus Selenium IDE
Updated Sep 23, 2018
MetaData html scraper and parser for Node.js (supports Promises and callback style)
Updated Mar 1, 2018
Updated Jul 17, 2018
Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
Updated Feb 12, 2017
Android app for saving webpages for offline reading.
Updated Jun 22, 2018
Python bindings to Modest engine (fast HTML5 parser with CSS selectors).
Updated Aug 28, 2018
Scrapy Training companion code
Updated Jul 19, 2017
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Updated Apr 6, 2017
Tutorial: Web scraping in Python with Beautiful Soup
Updated Aug 26, 2017
Zillow Scraper for Python using Selenium
Updated Mar 23, 2018
Web scraping and related analytics using Python tools
Updated Aug 17, 2018
Updated Aug 14, 2018
Updated Sep 23, 2018
PHP Library for detecting CMS
Updated Jun 18, 2018
Headless 'Chrome' Orchestration in R
Updated Aug 23, 2018
Basic setup with random user agents and IP addresses for Python Scrapy Framework.
Updated Dec 19, 2017