I started learning Python because I wanted to expand my development career into new languages. I was told it would be the easiest way in, and that seems about right. By learning to program in Python, you become accustomed to many aspects of programming that carry over to a vast number of other programming languages.
Some people learn Python for the sake of tinkering with web frameworks, but then there are those, presumably you, who’re more into scraping things from the web and then making that data look beautiful for everyone else to enjoy. Python is often acclaimed as the perfect language to learn when it comes to quick and easy web scraping.
I recently published an article on web scraping tools, in which I discuss some of the most popular scraping apps and tools that have a GUI (graphical user interface), which makes them very accessible to beginners and less experienced developers. However, the feedback I received suggested that I should write another post dedicated specifically to tutorials on how to scrape in Python. So here we are, ready to explore some examples of how to scrape the web using a simple Python script.
Jake Austwick has put together a great tutorial on how to get started with scraping in Python. The whole tutorial is based mainly on two libraries: lxml and Requests. Jake guides you through the most common misconceptions and pitfalls that many new scrapers run into, and there is plenty of sound advice to be found. Remember, if a platform has an API, it’s probably best to use that for gathering info; building a separate scraper can be time-consuming!
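To give a feel for how Requests and lxml fit together, here is a minimal sketch of my own (not code from Jake's tutorial). Requests downloads the page and lxml parses it so that an XPath query can pull out every link. The function names `extract_links` and `scrape_links` are hypothetical, and any URL you pass in is your own choice.

```python
import requests
from lxml import html


def extract_links(page_source):
    """Return (text, href) pairs for every <a> tag that has an href."""
    tree = html.fromstring(page_source)
    return [
        (anchor.text_content().strip(), anchor.get("href"))
        for anchor in tree.xpath("//a[@href]")
    ]


def scrape_links(url):
    """Download a page with Requests and hand the body to lxml."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return extract_links(response.content)
```

Keeping the parsing step separate from the download means you can try `extract_links` on a saved HTML file before you ever hit a live site.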
Extracting NBA data from ESPN
Nothing teaches better than practice and tiny snippets! I feel this quick tutorial from Daniel Rodriguez is perfect for seeing how quickly you can build a scraper for almost anything you like. In this sample, Daniel scrapes NBA player information from ESPN, along with player stats, the teams currently playing in the NBA, and the game schedules.
In this Python scraping tutorial, Greg Reda teaches us how to use lxml and BeautifulSoup together. The tutorial is for Python 2.7 users, and it’s a fairly low-level introduction for those who want to see how to select HTML elements and how to put the data back together using database libraries.
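As a rough illustration of that workflow, here is a small sketch of my own, written for Python 3 rather than the 2.7 the tutorial targets. It uses BeautifulSoup with lxml as the parser to select elements, then stores the rows with the standard `sqlite3` module. The sample HTML, the `items` table, and the function names are all made up for the example, and `sqlite3` is my pick, not necessarily the database library Greg uses.

```python
import sqlite3

from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Widget</span><span class="price">9.5</span></li>
  <li class="item"><span class="name">Gadget</span><span class="price">12.0</span></li>
</ul>
"""


def parse_rows(page_source):
    """Select each list item and return (name, price) tuples."""
    soup = BeautifulSoup(page_source, "lxml")  # lxml does the parsing
    rows = []
    for item in soup.find_all("li", class_="item"):
        name = item.find("span", class_="name").get_text(strip=True)
        price = float(item.find("span", class_="price").get_text(strip=True))
        rows.append((name, price))
    return rows


def save_rows(connection, rows):
    """Write the scraped rows into a small SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS items (name TEXT, price REAL)"
    )
    connection.executemany("INSERT INTO items VALUES (?, ?)", rows)
    connection.commit()
```

Calling `save_rows(sqlite3.connect(":memory:"), parse_rows(SAMPLE_HTML))` is an easy way to try it without leaving a file behind.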
I really like this tutorial: it’s small, but complex at the same time. Daniel Forsyth gives us some insight into how to scrape famous ticket-selling websites for the latest tickets. Imagine being able to scrape tickets as soon as they become available! Surely you could outpace a human buyer, and perhaps even snag a ticket you’ve been after for so long? Either way, it’s a great tutorial on how simple Python can be.
“Python 3.4 added a new asynchronous I/O module named asyncio (formerly known as Tulip). The asyncio module provides a new infrastructure with a pluggable event loop, transport and protocol abstractions, a Future class (adapted for use within the event loop), coroutines, tasks, threadpool management, and synchronization primitives to simplify coding concurrent code.” (Dr Dobb’s)
Here, Georges Dubus takes us through the new Python module asyncio. The objective of his tutorial is to scrape a few torrents and then sort them by their magnet links. Whether you use the scraper yourself or not, it still has some value for those who’re just starting out.
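To show the general shape of an asyncio scraper, here is a short sketch of my own, not the code from Georges's tutorial. It fetches several pages concurrently with `asyncio.gather`, pulls out anything that looks like a magnet link, and returns the links sorted. To stay within the standard library I run a blocking `urllib` download in a worker thread via `asyncio.to_thread` (Python 3.9+) instead of using an async HTTP client. The function names and the regular expression are assumptions made for the example.

```python
import asyncio
import html
import re
import urllib.request

# Naive pattern for magnet links inside double-quoted href attributes.
MAGNET_RE = re.compile(r'href="(magnet:\?[^"]+)"')


def fetch_blocking(url):
    """Plain blocking download using only the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


async def fetch(url):
    """Run the blocking download in a thread so the event loop stays free."""
    return await asyncio.to_thread(fetch_blocking, url)


async def collect_magnets(urls, fetcher=fetch):
    """Fetch all pages concurrently and return the unique magnet links, sorted."""
    pages = await asyncio.gather(*(fetcher(url) for url in urls))
    magnets = set()
    for page in pages:
        for link in MAGNET_RE.findall(page):
            magnets.add(html.unescape(link))  # turn &amp; back into &
    return sorted(magnets)


# Usage (the URLs are yours to supply):
#   links = asyncio.run(collect_magnets(["https://example.com/page1"]))
```

The `fetcher` parameter is there so you can swap in a different download coroutine, for example a fake one while testing or a proper async HTTP client later on.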
I hope these tutorials on how to scrape the web with Python prove useful to you. I couldn’t find any more that were bigger in scope than a few lines of code. Do you know of any good scraping tutorials (in Python!) that I may have missed? Please look in your saved links and drop a comment with what you’ve got; I’m sure the community will appreciate more resources.