Let’s Scrape Twitter! (w/ Python)
Introduction:
This webpage contains materials for a simple introduction to scraping content from Twitter. I created it to present to colleagues and friends at Cornell in Spring 2021 (so if you are seeing this many moons in the future… brace yourself for this relic). A quick snippet about me: I am a PhD student in the Information Science Department whose main research interest is how activist and extremist ideologies are developed and manifested online, particularly on social media. Moreover, I’ve been particularly interested in how folks moralize acts of harm.
Overview
Why scrape?
Sometimes we are interested in the conversations, content, and context of interactions happening on the internet. Often, this involves finding ways to extract this information in organized and informative ways based on the source we are examining – Facebook, Twitter, Reddit, Gab, etc.
As a point of necessary discussion – some might find these ways to be legally or morally questionable. This is a conversation certainly worth exploring and I implore you to always keep this in mind.
Talk to your IRB, experts in the field, the actual website of interest, and/or Google.
Tell me the ways!
There are too many to name!
But some of my favorites involve:
People might also build automatic web scrapers or crawlers (spiders) unique to their needs, or use publicly available options such as:
- Scrapy
- Bathyscaphe (an interesting one I found for the dark web)
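To give a feel for what a tiny homemade scraper does under the hood, here is a minimal sketch that pulls every link out of a chunk of HTML using only Python's standard library. The `LinkCollector` class, the `extract_links` helper, and the `SAMPLE_PAGE` string are made up for illustration (they are not part of Scrapy or Bathyscaphe), and a real crawler would also need to fetch pages, respect the site's rules, and handle messy markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# A made-up page so the example runs without touching the network
SAMPLE_PAGE = """
<html><body>
  <a href="https://example.com/one">One</a>
  <p>No link here.</p>
  <a href="/two">Two</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

A crawler (or "spider") is roughly this idea in a loop: fetch a page, collect its links, then visit those links in turn.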
How are we going to do this?
Table of Contents:
Set Up Twitter API
Twitter API Demo
Requests and BeautifulSoup
BeautifulSoup Demo
Acknowledgments:
I only learned how to develop this page after seeing the magnificent course materials of Melanie Walsh, who themselves cite the works of Lauren Klein, David Mimno, and Allison Parrish. Many, many thanks for such a beautiful and informative site!
Second, many thanks to friends in the SOCIAL PERCEPTION AND INTERGROUP (IN)EQUALITY LAB, who motivated me to work on this! Otherwise, it might have been a PowerPoint.
Crowd boos in dissatisfaction.