Computational Journalism, Spring 2016

Agenda

Final Project

Final requirements and deadlines

  • README.md due Monday, May 30
  • Last class on June 1 to work on projects and get feedback
  • Deployed project due Tuesday, June 7

If you need help with deployment, ask me before Tuesday.

Here’s a sample app:

Last week

How weird is your household? In-class critique, pitting the New York Times data team against a former Apple designer.

Data and some lessons here: congress-data-taster

Here’s something you should be able to do with the APIs: Find the least partisan congressmembers by voting record.
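Here's a minimal sketch of one approach, assuming you've already pulled each member's votes into a CSV. The filename and columns (member, party, position, party_position) are placeholders for whatever the API actually returns:

  import pandas as pd

  # Hypothetical file: one row per member per vote, with the member's vote
  # and their party's majority position on that vote
  df = pd.read_csv('votes.csv')

  # A member is partisan to the degree they vote with their party's majority
  df['votes_with_party'] = df['position'] == df['party_position']
  loyalty = df.groupby('member')['votes_with_party'].mean()

  # The least partisan members have the lowest party-loyalty rate
  print(loyalty.sort_values().head(10))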

May 16: Congressional data

In class, consider how money spent is a proxy for competitive races.

Explore the FEC Expenditures with Python and answer these questions (a starter sketch follows the list):

  • How much independent expenditure money was spent on Trump vs opposing Trump? How about Clinton?
  • Which candidates had the most oppo money spent against them?
  • How much money was spent on Facebook advertising in 2014 vs 2010?
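Here is a starter sketch for the first two questions. The filename and columns (cand_name, sup_opp, exp_amt) are placeholders; check the headers and data dictionary of the actual FEC file you download:

  import pandas as pd

  # Placeholder filename and column names; consult the FEC data dictionary
  df = pd.read_csv('independent_expenditures.csv')

  # Money spent supporting ('S') vs. opposing ('O') each candidate
  for name in ['TRUMP', 'CLINTON']:
      cand = df[df['cand_name'].str.contains(name, case=False, na=False)]
      print(name)
      print(cand.groupby('sup_opp')['exp_amt'].sum())

  # Candidates with the most oppo money spent against them
  oppo = df[df['sup_opp'] == 'O']
  print(oppo.groupby('cand_name')['exp_amt'].sum()
            .sort_values(ascending=False).head(10))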

Then, apply the same process to candidate disbursement data. Pay particular attention to how different its fields are from the fields for independent expenditures.

For Wednesday: We’ll build an app in class. Sign up for the following APIs:

Homework

And before class (i.e. Tuesday night), email me 10 interesting ideas for mashups of Congress data.

Last week, May 9

Homework (due next Monday and Wednesday): Mini-Project: A Flask App That Filters

Topic: The work of ranking and filtering data

Compare how the different organizations rank and filter data, and weigh the pros and cons:

College ranking sites

College Scorecard

US News Best Colleges


Last time, May 9

Deploying a web application on Heroku

Lessons

Homework due Wednesday, May 11

Readings

Deploy your own Flask news app

Send me two deliverables:

  • a URL to your live app on Heroku
  • a new GitHub repo (do not put it in cj-2016) named myfirstnewsapp that contains the code to your Flask app

Build a Flask app similar to the news app described in NICAR’s First News App tutorial. Deploy it to Heroku.

Warning: the First News App tutorial contains a number of unnecessary steps that you don’t need to follow. It also does not contain any of the steps needed to deploy to Heroku (see the note after the list below).

  • You should be able to skip most of the installation instructions.
  • Skip the instructions involving virtualenv and/or git – we don’t need the former, and you already know how to do the latter from the other tutorials.
  • Skip Act 5: Hello Internet, because it describes an alternate way of deploying to the Internet.
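For reference, getting onto Heroku mostly comes down to two extra files at the root of your repo. This is a minimal sketch, assuming your Flask instance is named app and lives in app.py:

Procfile (tells Heroku how to start your web process):

  web: gunicorn app:app

requirements.txt (tells Heroku which packages to pip install):

  Flask
  gunicorn

Then create the Heroku app and git push heroku master.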

You don’t have to use the LA Riots data, but I want you to be able to create an app that has at least these components:

Note: while we haven’t covered JavaScript explicitly, you should be able to create an interactive JavaScript map by following the tutorial and making adjustments as needed.

FWIW, here’s my hot take on the First News App. The main change is that I use the Google Street View API to display a picture of the incident address.

A note about including external files

When you get to the Hello, JavaScript section of the First News App tutorial, it will have some example HTML for including the external JavaScript and style files for its Leaflet interactive maps:

  <link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet/v0.7.7/leaflet.css" />
  <script src="http://cdn.leafletjs.com/leaflet/v0.7.7/leaflet.js"></script>

Don’t copy that code – it will be non-functional on a Heroku site, because Heroku serves pages over https, and browsers such as Chrome will not load files from insecure http URLs into an https page.

Instead, include this HTML snippet – it will import the same Leaflet code, but will actually work on Heroku:

  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.css" />
  <script src="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.js"></script>

Last week, May 4

Continuing with data wrangling and visualization using pandas and matplotlib

Relevant repos:

  • tweepy-congress-collector - a repo containing code to fetch data from the Twitter API and use pandas to join and analyze the data. It also contains all of the actual data fetched (300MB).
  • python-notebooks-data-wrangling - this repo contains several notebooks and several hundred MB of data. You can follow the lessons on how to fetch the data yourself, or clone the repo and access the data locally. Some lessons:

Last week, May 2

Visualization with matplotlib; data wrangling with pandas

Clone the repo here: https://github.com/datademofun/matplotlibsampler and work your way through the lessons.

Homework (due May 4): "3-charts"

In your cj-2016 repo, create a folder named 3-charts

It should contain three charts:

  • A line chart comparing the stock performance of various tech companies (see the data/stocks directory in the matplotlibsampler repo)
  • A scatterplot showing the relationship between two variables. Check out the data/schools or data/congress directory. The easiest example: are high SAT reading scores related to high SAT math scores? (duh) You don’t have to join two different datasets.
  • A stacked bar chart with categorical variables (any of the data files will work, but data/congress might be easiest). An example: number of Twitter followers by congressional party and gender

Your cj-2016/3-charts folder should contain saved images of the charts (i.e. .png files) and the code to generate those files.
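If you're not sure where to start on the line chart, here's a minimal sketch. The column names (date, close) are a guess; inspect the actual CSV headers in data/stocks first:

  import pandas as pd
  import matplotlib.pyplot as plt

  # Plot each company's closing price over time on one set of axes
  fig, ax = plt.subplots()
  for ticker in ['AAPL', 'GOOG', 'FB']:
      stock = pd.read_csv('data/stocks/%s.csv' % ticker)
      ax.plot(pd.to_datetime(stock['date']), stock['close'], label=ticker)
  ax.legend()
  ax.set_title('Tech stock performance')
  fig.savefig('3-charts/line-chart.png')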

Readings

Also, please read:


Last Week, April 27

Pre-built Flask apps to fork, clone, and improve

Worth reading on your own: What I Learned About the Washington Post From Four Years Collecting Data on Police Violence

Monday, April 25

Wednesday, April 20: Texas Web Scraping

Previous class

Monday, April 18: guest speaker

David Yanofsky of Quartz will talk about his entrepreneurial work in data visualization and investigations.

Homework due April 20: Scrape and Count Webpages

We're going to pivot to web scraping and HTML parsing. The first lesson is here:

Collect the lists of White House press briefings
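The basic fetch-and-count pattern looks like this. The URL and the link filter below are illustrative, so adapt them to the actual briefing index pages:

  import requests
  from bs4 import BeautifulSoup

  # Illustrative URL; the real briefings list spans many paginated pages
  url = 'https://www.whitehouse.gov/briefing-room/press-briefings'
  soup = BeautifulSoup(requests.get(url).text, 'html.parser')

  # Count the links that appear to point to individual briefings
  links = [a['href'] for a in soup.find_all('a', href=True)
           if 'press-briefing' in a['href']]
  print(len(links), 'briefing links found')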

In-class exercise on building a Flask app

Finish this series of exercises and make sure you can produce a simple Flask app:

Introduction to Simple Web Applications with Flask
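If you can get this running, you have a simple Flask app. Save it as app.py and run python app.py:

  from flask import Flask
  app = Flask(__name__)

  @app.route('/')
  def homepage():
      return '<h1>Hello, world!</h1>'

  if __name__ == '__main__':
      app.run(debug=True)  # serves at http://localhost:5000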

We'll tentatively start on this series of Flask app-building lessons:

Introduction to Building Web Applications from Data

Last Week

  • April 4 Lecture and Homework
  • Dollars for Docs prehistory
Two Weeks Ago

Practice examining NYPD Stop and Frisk Data using interactive Python.

COMM 177A/277A

Focuses on using data and algorithms to lower the cost of discovering stories or telling stories in more engaging and personalized ways. Project-based assignments drawn from real-world challenges faced in newsrooms. Prior experience in journalism or computational thinking is helpful. Prerequisite: COMM 273D, COMM 113/213, or the consent of the instructor.

Instructor

Dan Nguyen, dun@stanford.edu

Meeting times

Office hours

  • Mondays and Wednesdays, 1 PM to 3 PM, or by appointment
  • McClatchy Hall 342

Objectives

Grading

  • Attendance: 10%
  • Homework: 50%
  • Projects: 40%
  • There is no final exam.
  • Please let me know several days in advance if you cannot make class.
  • There is a final project that will consist of a public-facing web application. Here's a nice example from a student last year.
  • There will be 2 smaller projects, some of which will be worked on in-class and in groups.
  • There will be readings/case studies every week.
  • There will be challenges every week.

Books and Resources

There are no required books, but I'll likely make frequent references to:

We'll be using Python 3.5 and GitHub. You should be using a text editor for writing your programs: either Sublime Text 3 (3, not 2) or Atom will do.

Syllabus

Week 1

March 28

Homework

Due on Monday, via today's lesson plan:

Practice examining NYPD Stop and Frisk Data using interactive Python.
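The kind of quick, interactive tallying you'll practice looks something like this; the filename and the pct (precinct) column are placeholders for the actual stop-and-frisk extract:

  import csv
  from collections import Counter

  # Placeholder filename/column; the real CSV has dozens of columns
  with open('sqf-2015.csv') as f:
      rows = list(csv.DictReader(f))

  print(len(rows), 'stops')
  print(Counter(row['pct'] for row in rows).most_common(5))  # busiest precincts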

Week 2 - Text and Visualizing Text

April 4 / April 6

April 4 Lecture and Homework

We'll continue reviewing Python programming fundamentals in the service of deserializing text into data structures and, when necessary, serializing data structures into text files, particularly CSV and JSON.
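For example, round-tripping between raw text and data structures:

  import csv
  import json

  # Deserialize: JSON-formatted text into a Python data structure
  text = '[{"name": "Ava", "votes": 3}, {"name": "Bo", "votes": 5}]'
  records = json.loads(text)

  # Serialize: the same data structure out to a CSV file
  with open('votes.csv', 'w', newline='') as f:
      writer = csv.DictWriter(f, fieldnames=['name', 'votes'])
      writer.writeheader()
      writer.writerows(records)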

Week 3 - Filtering noise / Web scraping

April 11 / April 13

Our problem is not lack of information. It's lack of attention span. Data is not much good to us if we can't sort it the way we need it to be sorted. Hence, the need to scrape webpages and PDFs.

We'll use ProPublica's Dollars for Docs as a case study.

By now, we'll have written a fair amount of HTML. Web scraping generally involves learning one more kind of text parser, such as lxml or BeautifulSoup, and writing the automated logic to navigate a website.
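With BeautifulSoup, for instance, the parsing itself is only a few lines; the automated navigation logic is where the real work goes:

  from bs4 import BeautifulSoup

  html = '<ul><li><a href="/a.html">Doc A</a></li><li><a href="/b.html">Doc B</a></li></ul>'
  soup = BeautifulSoup(html, 'html.parser')
  for link in soup.find_all('a'):
      print(link['href'], link.text)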

Week 4 - APIs

April 18 / April 20

David Yanofsky of Quartz will talk about his entrepreneurial work in data visualization and investigations.

Building a better Recalls site: studying the Recalls dataset.

A walkthrough of HTML scraping and regexes

Introduction to Simple News Apps based on CPSC Recall Data

Homework: Build out the Recalls app, going as far as rendering the data as a table and adding product images.
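Rendering the recall records as a table follows a pattern like this minimal sketch; the RECALLS records and their fields are placeholders for however you end up structuring the CPSC data:

  from flask import Flask, render_template_string

  app = Flask(__name__)

  # Placeholder records; the real app would load the CPSC recalls dataset
  RECALLS = [{'product': 'Widget', 'hazard': 'Fire',
              'image_url': 'http://example.com/widget.jpg'}]

  TEMPLATE = """
  <table>
    {% for r in recalls %}
    <tr>
      <td>{{ r.product }}</td>
      <td>{{ r.hazard }}</td>
      <td><img src="{{ r.image_url }}" width="100"></td>
    </tr>
    {% endfor %}
  </table>
  """

  @app.route('/')
  def index():
      return render_template_string(TEMPLATE, recalls=RECALLS)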

Week 5 - Intermediate Flask App construction

April 25 / April 27

Building multi-page Flask apps; examples:

Week 6 - Data Visualization

May 2 / May 4

Studying both the technique and theory of effective data visualization, and how to use Python's matplotlib to efficiently produce charts.

Readings



Week 7 - News application critiques, Application Deployment

May 9 / May 11

Compare and contrast examples of real-world news applications and data portals, including ProPublica's Represent and Socrata.

Learn how to deploy a basic app to Heroku (and maybe AWS).

Steps:


Week 8 - Congressional and other Public Data

May 16 / May 18

Study APIs and datasets focused on U.S. Congress, including:

Readings

The Itemizer (thescoop.org) by Derek Willis:

Why he made it:

There’s one thing that has always bugged me about how we reference campaign finance data online: the best that most of us can do when we link to a campaign filing is to link to a particular page, whether that’s a list of contributors or a summary page. Yet often we’re referencing a single transaction or line-item.

via Derek Willis: The Data-Driven Congressional Reporter (thescoop.org)

Maybe you don’t have time to read the Record every day; wouldn’t it be great if you could set some simple rules for things of interest and have a computer do it for you? Wouldn’t it make sense that a computer could find the exception to the rule among a series of House votes that occurred while you were out interviewing people?

Here are some screenshots from the NYT's internal Congress app that give an idea of the "views" into Congressional voting data that are of interest to New York Times political reporters:

[Three screenshots of the NYT's internal Congress app]


Week 9 - News application critique and deployment (continued)

May 23 / May 25

Week 10 - Project work week

May 30 / June 1

In-class time to work on and share projects.