Let’s Scrape Twitter! (w/ Python)

Author

A demo by Breanna E. Green // Powered by Quarto


Introduction:

This webpage contains materials for a simple introduction to scraping content from Twitter. I created it to present to colleagues and friends at Cornell in Spring 2021 (so if you are seeing this many moons in the future… brace yourself for this relic). A quick snippet about me: I am a PhD student in the Information Science Department whose main research interest is how activist and extremist ideologies are developed and manifested online, particularly on social media. I’m also especially interested in how folks moralize acts of harm.

Overview

Why scrape?

Sometimes we are interested in the conversations, content, and context of interactions happening on the internet. Often, this involves finding ways to extract that information in an organized and informative form, depending on the source we are examining: Facebook, Twitter, Reddit, Gab, etc.

As a necessary point of discussion: some might find these methods legally or morally questionable. This is a conversation certainly worth exploring, and I implore you to always keep it in mind.

Talk to your IRB, experts in the field, the actual website of interest, and/or Google.

Tell me the ways!

There are too many to name!

But some of my favorites involve:

People might also build automatic web scrapers or crawlers (spiders) unique to their needs, or use publicly available options such as:

How are we going to do this?

Table of Contents:

Set Up Twitter API

Twitter API Demo

Requests and BeautifulSoup

BeautifulSoup Demo

Acknowledgments:
