Beginner's Guide To Web Scraping With A Raspberry Pi
Introduction
The internet is full of data. Lots and lots of data. Data I can haz. But manually going to a website and copying and pasting the data into a spreadsheet or database is tedious and time-consuming. Enter web scraping! This guide will show you how to get started scraping web data to your heart's content in 8 minutes!
Setup
I will be using a Raspberry Pi to do my web scraping because by default it has everything you need pre-installed to start programming. You're welcome to use other platforms for web scraping, but you will most likely have to install all of the programs and libraries for it to work, including:
Background
Web scraping is the act of programmatically gathering information from websites. Its precursor was called "crawling", and it's how search engines index the web. Web scraping didn't really become popular until a library called BeautifulSoup was developed. BeautifulSoup is a Python library for parsing HTML, so we will be using it with Python.
Legality Concerns
Is web scraping legal? I will go with the best non-answer answer... "it depends". Some websites don't allow web scraping, some are completely fine with it, and some allow it with stipulations. If you aren't sure whether a website allows scraping, you can check out its "robots.txt" file. A website's "robots.txt" file tells crawling/scraping robots which paths they may visit and which they may not. If the path you want to scrape is not disallowed there, then you should be in the clear. To make sure I'm not infringing on the law, I will be using quotes.toscrape.com, a website built intentionally to test web scrapers.
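If you'd rather not read a "robots.txt" file by eye, Python's standard library can do the check for you. Here is a minimal sketch using `urllib.robotparser`. The sample rules and the example.com URLs are made up for illustration, and `is_allowed` is just a helper name I picked, not something from a library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/page"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/quotes"))
```

To check a real site instead of the sample text, you can call `set_url()` with the address of the site's "robots.txt" and then `read()` on the parser before calling `can_fetch()`.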
The Code
Open up a Python editor (such as Thonny) and type in this code
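Here is a minimal sketch of a scraper for quotes.toscrape.com. It assumes the `bs4` (BeautifulSoup) package is installed, and it assumes each quote on the page sits in a `div` with class `quote` that contains a `span` with class `text` and a `small` with class `author`. Those class names are my assumption about the page's markup, so check them in your browser's "view source" if nothing comes back. The function names `parse_quotes` and `scrape` are ones I made up, and the page is fetched with the standard library's `urllib.request` to keep the dependencies down.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = "https://quotes.toscrape.com/"


def parse_quotes(html):
    """Return a list of (text, author) tuples found in the page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = []
    # Assumed markup: <div class="quote"> wraps each quote on the page.
    for block in soup.find_all("div", class_="quote"):
        text = block.find("span", class_="text")
        author = block.find("small", class_="author")
        # Skip blocks that are missing either piece.
        if text is not None and author is not None:
            quotes.append((text.get_text(strip=True), author.get_text(strip=True)))
    return quotes


def scrape(url=URL):
    """Download the page at url and return its quotes."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8")
    return parse_quotes(html)
```

To see the results, add `for text, author in scrape(): print(text, "-", author)` at the bottom of the file and run it. Keeping the parsing in its own function means you can also try it on a saved copy of the page without hitting the website again.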
Conclusion
This gives you a taste of what's possible with web scrapers. You could scrape movie info, sports stats, etc. Have fun! But be responsible. If you want a more advanced lesson in web scraping (to scrape behind logins or paginated pages), check out my more advanced tutorial!