Spatial Data Science (GEO511)

geo511.wilsonlab.io

Today’s plan

  1. Who am I?
  2. Who are You?
  3. Course Overview
  4. What is R?
  5. Who uses R?

Introductions

Adam M. Wilson

Associate Professor of Global Environmental Change
Geography Department

I Use R:

  • GIS (with a little GRASS)
  • Statistics
  • Visualizations
  • HTML/Websites (including this one!)

Who are you?

  • Name
  • Degree and Department
    • e.g., Master’s in GIS in Geography
  • Something about you (< 1 minute, please)
    • Maybe:
      • Why you are taking this class
      • Something you want to learn / do
      • What you did over the summer

Course Overview

Spatial Data Science

– Wickham & Grolemund, R for Data Science, O’Reilly 2016

Course Learning Objectives

  1. Convert data from varied formats/structures to desired format for analysis and visualization
  2. Clean, transform, and merge data attributes/variables appropriately
  3. Effectively display and communicate meaning from spatial, temporal, and textual data
  4. Use current analysis, presentation, and collaboration tools in the spatial data science field

This course is NOT a statistics course (see GEO 505, etc.).

In other words:
become an R wizard!
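
As a rough illustration of objectives 1 and 2 above, the sketch below cleans a text column, converts a table from wide to long format, and merges two tables. The data frames (`sites`, `temps_wide`) and their values are hypothetical, and only base R is used so the example runs without extra packages; the course itself may use other tools for the same steps.

```r
# Hypothetical example data: two small tables that share a site ID.
sites <- data.frame(
  site_id = c("A", "B", "C"),
  elevation_m = c("120", "85", NA),  # stored as text, with one missing value
  stringsAsFactors = FALSE
)
temps_wide <- data.frame(
  site_id = c("A", "B", "C"),
  jan = c(-4.1, -3.2, -5.0),
  jul = c(21.5, 22.8, 20.1)
)

# Clean an attribute: convert elevation from text to numeric.
sites$elevation_m <- as.numeric(sites$elevation_m)

# Convert structure: wide (one column per month) to long (one row per site-month).
temps_long <- reshape(
  temps_wide,
  direction = "long",
  varying = c("jan", "jul"),
  v.names = "temp_c",
  timevar = "month",
  times = c("jan", "jul"),
  idvar = "site_id"
)

# Merge the two tables on their shared key.
combined <- merge(sites, temps_long, by = "site_id")
print(combined)
```

The long format has one row per site and month, which is usually the easier shape for plotting and grouped summaries.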

Tools

  • R
  • RStudio
  • Slack
  • DataCamp
  • Git (Version Control)

Course Structure

Time commitment

Plan to spend approximately 5-11 hours each week:

  • DataCamp courses: 3-5 hours
  • Weekly tasks: 1 hour
  • Case studies: 1-5 hours
  • The project will require more time near the end of the semester.

Course Schedule

Weekly Rhythm

  1. Preparation
    1. Read Assigned Material
    2. Work on class tasks and DataCamp assignments
    3. Submit questions for class discussion on Slack
  2. Class Time Tuesday
    1. Case Study Team Meetings, problem solving, etc.
  3. Continue Preparation
    1. Finish case studies and prepare to present
    2. Prepare Resource Presentation (once / semester)
  4. Class Time Thursday
    1. Updates & Questions from reading and daily class tasks [~10 minutes]
    2. Student Resource Presentation(s) [20 minutes]
    3. Case Study Presentation [30 minutes]
      • One group [randomly] selected to share solution
      • Other groups share other approaches / solutions
    4. Case Study Introduction (for following week) [20 minutes]
  5. Rinse and Repeat

Covid Protocol

  • Not able to socially distance in the classroom
  • Masks Optional
  • When working in groups (Tuesdays), feel free to move somewhere (outside?)
  • Please stay home if sick!

Tasks for this week

Resource Presentations

1st Case Study (“due” next week)

Groups in this course

  • 5 groups of ~3-4 students
  • Case Study #1 groups in UBlearns
  • Slack for communication (or UBlearns, etc.)
  • Objectives:
    • build community
    • work as a team on case studies

Group Leader Sign ups

Meet your group

  • 10 minute breakout session
  • Topics
    • Introduce yourselves again
    • Do you have any experience writing code (if so, in which languages)?
    • What do you hope to learn from this course?
    • Perhaps identify a leader for next week (who will be prepared to explain Case Study 1)
  • After ~10 minutes, we’ll return to a single group to summarize

Selecting the Case Study Presentation

What is R and who uses it?

R Project for Statistical Computing

  • Free and Open source
  • Data manipulation
  • Data analysis tools
  • Great graphics
  • Programming language
  • 6,000+ free, community-contributed packages
  • A supportive and growing user community

R is a dialect of the S language, developed at Bell Laboratories (formerly AT&T) by John Chambers et al. (the same laboratory that produced C and UNIX).

What is the R environment?

  • effective data handling and storage facility
  • suite of operators for (vectorized) calculations
  • large, coherent, integrated collection of tools for data analysis
  • graphical capabilities (screen or hardcopy)
  • well-developed, simple, and effective programming language which includes:
    • conditionals
    • loops
    • user defined functions
    • input and output facilities
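
The pieces listed above can be seen together in a few lines of base R. This is only a minimal sketch: the temperature values, the `classify_temp` function, and the 25 °C threshold are made up for illustration.

```r
# Vectorized calculation: the operators apply to every element at once.
temps_c <- c(12.5, 18.0, 25.3, 31.1)
temps_f <- temps_c * 9 / 5 + 32

# User-defined function containing a conditional.
classify_temp <- function(temp_c, threshold = 25) {
  if (temp_c >= threshold) "warm" else "cool"
}

# Loop: call the function on each element in turn.
labels <- character(length(temps_c))
for (i in seq_along(temps_c)) {
  labels[i] <- classify_temp(temps_c[i])
}

# Data handling: collect the results in a data frame.
obs <- data.frame(temp_c = temps_c, temp_f = temps_f, label = labels)

# Input and output: write to a temporary CSV file and read it back.
path <- tempfile(fileext = ".csv")
write.csv(obs, path, row.names = FALSE)
obs_in <- read.csv(path)
print(obs_in)
```

The loop is written out to show the syntax; in practice the same labels could be produced in one vectorized step with `ifelse(temps_c >= 25, "warm", "cool")`.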

Custom Visualizations

Spatial Visualizations

Spatial Data Processing in R

Packages: sf, sp, maptools, rgeos, raster, terra, ggmap

R can perform many GIS functions and workflows (often better than a GIS!).
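
As one possible example of a GIS workflow in R, the sketch below uses the sf package (assumed to be installed) to build points from a table, reproject them, buffer them, and measure distance. The city coordinates are approximate and purely illustrative, and the choice of EPSG:32617 (UTM zone 17N) is an assumption made for this example.

```r
library(sf)

# Hypothetical point data: approximate longitude/latitude (WGS84) of two cities.
cities <- data.frame(
  name = c("Buffalo", "Rochester"),
  lon = c(-78.88, -77.61),
  lat = c(42.89, 43.16)
)

# Convert the plain table into a spatial object with a coordinate reference system.
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

# Reproject to a projected CRS with units in metres (assumed: UTM zone 17N).
cities_utm <- st_transform(cities_sf, 32617)

# Two common GIS operations: a 10 km buffer and pairwise distances.
buffers <- st_buffer(cities_utm, dist = 10000)
dist_m <- st_distance(cities_utm)

print(st_area(buffers))
print(dist_m)
```

Because every step is a line of code, the same workflow can be rerun unchanged on a different set of points.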

Strengths & Limitations

  • Just-in-time compilation
    • Slower than compiled languages (C++, etc.)
    • Faster to write
  • Many available packages
  • Most operations conducted in RAM
    • RAM can be limiting and/or expensive
    • Error: cannot allocate vector of size X Mb
    • Various packages and clever programming can overcome this…
  • Free like beer AND speech!
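
The RAM limitation mentioned above can often be anticipated before an allocation error appears. The following is a small sketch that assumes the data are stored as double-precision numbers (8 bytes each); the 50,000 x 50,000 raster is a hypothetical example.

```r
# A double-precision number takes 8 bytes, so size can be estimated up front.
n <- 1e6
estimated_mb <- n * 8 / 1024^2

# Compare with what R actually allocates for a vector of that length.
x <- numeric(n)
actual_mb <- as.numeric(object.size(x)) / 1024^2
cat("Estimated:", round(estimated_mb, 2), "MB; actual:", round(actual_mb, 2), "MB\n")

# The same arithmetic for a hypothetical 50,000 x 50,000 cell raster.
raster_gb <- 50000 * 50000 * 8 / 1024^3
cat("Hypothetical raster:", round(raster_gb, 1), "GB\n")

# Remove large objects when finished and let R release the memory.
rm(x)
invisible(gc())
```

If the estimate exceeds the available RAM, the data need to be processed in chunks or with a package that works from disk.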

R Interface

But there are other options…

RStudio

Mac, Windows, Linux, and over the web…

Who uses R

Popularity of Data Science Software

Scholarly Articles

Number of scholarly articles found in the most recent complete year (2018) for the more popular data science software. To be included, software must be used in at least 750 scholarly articles.

Interface to R

R vs Python

Many ‘Cheatsheets’ available

Why code at all?

Why write code when you can click?

Graphical User Interfaces are useful, especially when you are learning…

Typical GUI Workflow

Organized and repeatable workflow

Learning a programming language can help you learn how to think logically.

A man who does not know a foreign language is ignorant of his own.
– Johann Wolfgang von Goethe (1749 - 1832)

Programming gives you access to more computer power.

The computer is incredibly fast, accurate, and stupid. Man is unbelievably slow, inaccurate, and brilliant. The marriage of the two is a force beyond calculation.
– Leo Cherne

From Graphical User Interface (GUI) to Scripting

Parallel Processing

For BIG jobs: multi-core processors / high performance computing with foreach.
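
A minimal sketch of a parallel loop with foreach is shown below. It assumes the foreach and doParallel packages are installed; doParallel is one of several possible backends and is an assumption of this example, as are the two workers and the toy calculation.

```r
library(foreach)
library(doParallel)

# Start a small cluster of worker processes (2 is an arbitrary choice here).
cl <- makeCluster(2)
registerDoParallel(cl)

# Hypothetical "big job": eight independent tasks, each run on a worker.
# .combine = c collects the individual results into one vector.
results <- foreach(i = 1:8, .combine = c) %dopar% {
  sum(seq_len(i)^2)
}

# Always shut the workers down when finished.
stopCluster(cl)

print(results)
```

Replacing `%dopar%` with `%do%` runs the same loop sequentially, which is a convenient way to debug before scaling up.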

Case Study 1

Resource Presentations

Case Study Presentations - Let’s pick a winner!

Case Study Motivations

  1. Best way to learn something is to (prepare to) teach it
  2. Develop confidence presenting your method/solution/approach
  3. Learn tools for transparent collaboration

The first ‘Case Study’

  • Formative assessments - not tests
  • Work with group mates but submit alone
  • No stupid questions!
  • If you finish early -
    • Ask about anything (related to the course)!
    • Start thinking about project?!?

Do’s and Don’ts

  1. Don’t be scared - this is a supportive environment.
  2. Do share stories about how you figured it out:
    • other things you tried
    • things that didn’t work
  3. Do mention your group mates and any contributions they made (if they did)