If you need help with deployment, ask me before Tuesday.
Here’s a sample app:
How weird is your household? In-class critique, pitting the New York Times data team vs a former Apple designer.
Data and some lessons here: congress-data-taster
Here’s something you should be able to do with the APIs: Find the least partisan congressmembers by voting record.
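One way to sketch that exercise: ProPublica's Congress API returns a `votes_with_party_pct` field for each member, so "least partisan" can be crudely approximated as "votes with their own party least often." The member records below are made up for illustration; in practice you'd fetch them from the API.

```python
# Sample member records, invented for illustration. A real script would
# populate this list from the ProPublica Congress API's members endpoint.
members = [
    {"name": "Member A", "party": "R", "votes_with_party_pct": 98.2},
    {"name": "Member B", "party": "D", "votes_with_party_pct": 71.5},
    {"name": "Member C", "party": "D", "votes_with_party_pct": 88.0},
]

# By this crude measure, members who vote with their party least often
# are the "least partisan":
least_partisan = sorted(members, key=lambda m: m["votes_with_party_pct"])
for m in least_partisan:
    print(m["name"], m["party"], m["votes_with_party_pct"])
```

This is only a first cut; you'd want to think about what "partisan" actually means before publishing a ranking.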
In class, consider how money spent is a proxy for competitive races.
Explore the FEC Expenditures with Python and answer these questions:
Then, apply the same process to candidate disbursement data. Pay particular attention to how different the fields are for independent expenditures.
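A minimal sketch of the kind of aggregation you'll be doing with the expenditure files: total spending per candidate from a CSV. The column names (`cand_name`, `exp_amo`, `sup_opp`) and the rows here are placeholders; check the actual header row of the FEC file you download, since the fields differ between the independent expenditure and candidate disbursement files.

```python
import csv
import io
from collections import defaultdict

# Placeholder data standing in for a downloaded FEC CSV. The column names
# are assumptions -- verify them against the real file's header row.
sample_csv = '''cand_name,exp_amo,sup_opp
"SMITH, JOHN",5000.00,S
"DOE, JANE",12000.50,O
"SMITH, JOHN",2500.00,S
'''

# Sum expenditure amounts per candidate
totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(sample_csv)):
    totals[row["cand_name"]] += float(row["exp_amo"])

for cand, amount in totals.items():
    print(cand, amount)
```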
For Wednesday: We’ll build an app in class. Sign up for the following APIs:
And before class (i.e. Tuesday night), email me with 10 interesting mashups of Congress data.
Homework: (due next Monday and Wednesday) - Mini-Project: A Flask App That Filters
Compare how the different organizations rank and filter data, and the pros and cons of each approach:
Send me two deliverables:
Build a Flask app similar to the news app described in NICAR’s First News App tutorial. Deploy it to Heroku.
Warning: the First News App tutorial contains a number of unnecessary steps that you don’t need to follow. It also omits the steps needed to deploy to Heroku.
You don’t have to use the LA Riots data, but I want you to be able to create an app that has at least these components:
Note: while we haven’t covered JavaScript explicitly, you should be able to create an interactive JavaScript map by following the tutorial and making adjustments as needed.
FWIW, here’s my hot take on the First News App. The main change is that I use the Google Street View API to display a picture of the incident address.
When you get to the Hello, JavaScript section of the First News App tutorial, it will have some example HTML for including the external JavaScript and style files for its Leaflet interactive maps:
<link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet/v0.7.7/leaflet.css" />
<script src="http://cdn.leafletjs.com/leaflet/v0.7.7/leaflet.js"></script>
Don’t copy that code – it will be non-functional on a Heroku site, because Heroku serves pages over https, and browsers such as Chrome will not load files from non-secure http URLs.
Instead, include this HTML snippet – it will import the same Leaflet code, but will actually work on Heroku:
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.css" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.js"></script>
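To see where those tags fit, here's a minimal Flask route that serves a page including the https Leaflet files. This is a bare-bones sketch, not the First News App itself: the page body is just an empty map `div`, and in a real app the HTML would live in a template file rather than a string.

```python
from flask import Flask

app = Flask(__name__)

# The <link>/<script> tags are the https CDN versions shown above;
# the rest of the page is a placeholder.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.css" />
  <script src="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.js"></script>
</head>
<body><div id="map" style="height: 400px;"></div></body>
</html>"""

@app.route("/")
def index():
    return PAGE

if __name__ == "__main__":
    app.run(debug=True)
```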
Relevant repos:
Clone the repo here: https://github.com/datademofun/matplotlibsampler and work your way through the lessons.
In your cj-2016 repo, create a folder named 3-charts
It should contain three charts:
Your cj-2016/3-charts folder should contain saved images of the charts (i.e. .png files) and the code to generate those files.
Also, please read:
Pre-built Flask apps to fork, clone, and improve
Worth reading on your own: What I Learned About the Washington Post From Four Years Collecting Data on Police Violence
David Yanofsky of Quartz will talk about his entrepreneurial work in data visualization and investigations.
Homework due April 20: Scrape and Count Webpages
We're going to pivot to web scraping and HTML parsing. The first lesson is here:
Collect the lists of White House press briefings
In class exercise on building a Flask app
Finish this series of exercises and make sure you can produce a simple Flask app:
Introduction to Simple Web Applications with Flask
We'll tentatively start on this series of Flask app-building lessons:
Introduction to Building Web Applications from Data
Practice examining NYPD Stop and Frisk Data using interactive Python.
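The flavor of that interactive exploration: read the CSV, then tally rows by a column with `Counter`. The `pct` (precinct) column and the rows below are invented placeholders; the real stop-and-frisk file has many more columns, so check its header first.

```python
import csv
import io
from collections import Counter

# Stand-in for a few rows of the stop-and-frisk CSV; column names are
# assumptions, not the real schema.
sample = """pct,frisked
14,Y
14,N
75,Y
"""

# Count stops per precinct
counts = Counter(row["pct"] for row in csv.DictReader(io.StringIO(sample)))
print(counts.most_common(1))  # precinct with the most stops in the sample
```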
Focuses on using data and algorithms to lower the cost of discovering stories or telling stories in more engaging and personalized ways. Project-based assignments based on real-world challenges faced in newsrooms. Prior experience in journalism or computational thinking is helpful. Prerequisite: Comm 273D, COMM 113/213, or consent of the instructor.
Dan Nguyen, dun@stanford.edu
There are no required books, but I'll likely make frequent references to:
We'll be using Python 3.5 and Github. You should be using a text editor for writing your programs: either Sublime Text 3 (3, not 2) or Atom will do.
A Github repository named cj-2016, i.e. you should have a Github URL that looks like:
https://github.com/whatevyourname/cj-2016
Due on Monday, via today's lesson plan:
Practice examining NYPD Stop and Frisk Data using interactive Python.
We'll continue reviewing the Python programming fundamentals, in the service of deserializing text into data structures, and, when necessary, turning data structures into text files, particularly formatted as CSV and JSON.
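A round-trip sketch of that workflow: deserialize JSON text into a list of dicts, then serialize the same records back out as CSV text. The records are made up.

```python
import csv
import io
import json

# Deserialize: JSON text -> Python data structure
raw = '[{"name": "Alice", "votes": 10}, {"name": "Bob", "votes": 7}]'
records = json.loads(raw)

# Serialize: data structure -> CSV text (here to a string buffer;
# swap in open("out.csv", "w") to write a real file)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "votes"])
writer.writeheader()
writer.writerows(records)
csv_text = out.getvalue()
print(csv_text)
```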
Our problem is not lack of information. It's lack of attention span. Data is not much good to us if we can't sort it the way we need it to be sorted. Hence, the need to scrape webpages and PDFs.
We'll use ProPublica's Dollars for Docs as a case study.
By now, we'll have written a fair amount of HTML. Web-scraping generally involves learning one more kind of text parser, such as lxml or BeautifulSoup, and writing the automated logic to navigate a website.
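As a taste of that parsing step, here's a link extractor using the standard library's `html.parser` (lxml and BeautifulSoup give you a much friendlier API for real scraping, but this shows the underlying idea). The page snippet is a made-up stand-in for a fetched webpage.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# A fake page snippet; a real scraper would download this with requests
page = '<ul><li><a href="/briefings/1">One</a></li><li><a href="/briefings/2">Two</a></li></ul>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)
```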
David Yanofsky of Quartz will talk about his entrepreneurial work in data visualization and investigations.
Building a better Recalls site: studying the Recalls dataset.
A walkthrough of HTML scraping and regexes
Introduction to Simple News Apps based on CPSC Recall Data
Homework: Build out the Recalls app as far as making it a table and adding product images.
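The table-plus-images part can be sketched like this. The recall records and field names here are hypothetical stand-ins for the real CPSC data, and the template is inline only for brevity; in your app it belongs in `templates/`.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical recall records; the real CPSC data has different fields.
RECALLS = [
    {"product": "Toy Truck", "image_url": "http://example.com/truck.jpg"},
    {"product": "Space Heater", "image_url": "http://example.com/heater.jpg"},
]

# Jinja template rendering one table row per recall, with product image
TEMPLATE = """<table>
{% for r in recalls %}
  <tr><td>{{ r.product }}</td><td><img src="{{ r.image_url }}"></td></tr>
{% endfor %}
</table>"""

@app.route("/")
def index():
    return render_template_string(TEMPLATE, recalls=RECALLS)
```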
Building multi-page Flask apps; Examples:
Studying both the technique and theory of effective data visualization, and how to use Python's matplotlib to efficiently produce charts.
Contrast/compare examples of real-world news applications and data portals, including ProPublica's Represent and Socrata.
Learn how to deploy a basic app to Heroku (and maybe AWS).
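For the Heroku part, the key piece beyond your code is a `Procfile` telling Heroku how to start the web process. This sketch assumes your app lives in `app.py` as a module-level variable named `app`, and that `gunicorn` is listed in your `requirements.txt`:

```
web: gunicorn app:app
```

Heroku reads this file at the root of your repo when it boots your dyno.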
Steps:
Study APIs and datasets focused on U.S. Congress, including:
The Itemizer (thescoop.org) by Derek Willis:
Why he made it:
There’s one thing that has always bugged me about how we reference campaign finance data online: the best that most of us can do when we link to a campaign filing is to link to a particular page, whether that’s a list of contributors or a summary page. Yet often we’re referencing a single transaction or line-item.
via Derek Willis: The Data-Driven Congressional Reporter (thescoop.org)
Maybe you don’t have time to read the Record every day; wouldn’t it be great if you could set some simple rules for things of interest and have a computer do it for you? Wouldn’t it make sense that a computer could find the exception to the rule among a series of House votes that occurred while you were out interviewing people?
Here are some screenshots from the NYT's internal Congress app that give an idea of the "views" into Congressional voting data that are interesting to New York Times political reporters:
In-class time to work on and share projects.