This is the GitHub repository for the workshop series called First Steps in R and RStudio, given at the Children's Hospital of Philadelphia (CHOP) by Arcus Education.
Welcome to First Steps in R and RStudio!
This series is intended to be a gentle introduction to using R and RStudio for people who interact with data and want to work in the R statistical programming language. This course is geared towards beginners who are comfortable doing basic tasks with data that comes in rows and columns (for example, organizing data in Excel) but have no programming background.
The workshop will cover how to get started using the R statistical programming language in your work. We'll talk about how to import data, transform data, and create data visualizations in R. To keep this workshop series short, our scope is limited, and we won't go into details that are specific to the conduct of research, like modeling and statistical tests. For that, we are planning a future Skills Series we're going to call Next Steps in R for Research. For this workshop series, we assume you know what R and RStudio are and have some ideas about why they're useful. If you don't know what R or RStudio are, we suggest you view the slides and recordings from Demystifying R and RStudio, or attend the next time that two-session workshop series is offered.
Before attending this series, you should be able to do most or all of the following. If you're not sure you can, check out our Demystifying R and RStudio Skills Series. The slideshows, which have ample speaker notes, and the recordings of the talks (if available at the time you're viewing this document) will be sufficient to help you acquire these skills. And don't worry, we won't quiz you!
- Describe the difference between R and RStudio
- Give one advantage of using scripts written in R for data analysis
- Know a little about how to get access to R and RStudio at CHOP
- Describe what makes programming "literate" (like a notebook)
- Explain the real-life consequences of irreproducible research
- Name one way Quarto documents can be helpful
Before attending a workshop session, we suggest that you do the following. It will make your experience of the workshop series smoother. If you don't get a chance to do this before attending a workshop, you will have time to do it during the session, but we won't necessarily be able to stop our presentation to help you if you get stuck.
- Create a free Posit.cloud account. We will use this as our training environment and you will have continued access to your code and materials after the workshop, through your account at Posit.cloud. Don't use this for any patient or other CHOP data, though!
- If you haven't already, please consider joining CHOP's R User Group. It's not necessary for the workshops but you might find it useful or even fun.
We suggest requesting these programs be installed on your CHOP device(s):
- R -- the language we use to clean, analyze, and visualize data
- RStudio Desktop -- an IDE for writing R
- Git -- version control software that will allow you to easily get the latest version of our course materials and will also be helpful for tracking changes in your own projects
- GitHub Desktop -- a helper, or "client" software that makes working with Git easier
Even though all of this software is free, you'll need a Cost Center (or grant fund) to add to your request. Get that from your manager, administrative staff, or other leadership within your area. There will be no charge, but DTS uses this information for tracking resource utilization.
You'll also need the MAC address of the device on which you need the software installed.
Having R, RStudio, Git, and GitHub Desktop installed locally on your CHOP-issued device is not the only way to work with R and RStudio, but it can be the most convenient, and will be compliant with the constraints of working with real CHOP data. You won't want to rely on RStudio on your personal computing device or in the cloud when it comes to working with real CHOP data!
On the day of your workshop
We suggest the following for virtual webinars:
- If available to you, use two monitors (or another two-screen setup such as a laptop and a tablet or two laptops). This Skills Series is hands-on, so you will want to have extra space for working on code while also looking at slides or the chat window.
Material in later sessions does build on work done in earlier sessions, but don't let missing a session keep you away from attending later sessions. We try to overlap material to help keep everyone caught up!
- Session 1: Review and Setup
  - Quick review of R and RStudio
  - R Markdown and Quarto: methods for "literate statistical programming"
  - Posit.cloud: our environment for this course
  - Git and GitHub: Out of scope but very useful!
  - Getting R and RStudio at CHOP
- Session 2: Projects and File Ingestion
  - File systems can be challenging to navigate
  - Projects in RStudio
  - Installing and loading packages
  - Tabular data ingestion from .csv files
- Session 3: Exploring Data Visually, Using ggplot2
  - ggplot2 syntax
  - Mapping Aesthetics
  - Setting Visuals
  - Color Palettes
- Session 4: Selecting Data Using dplyr
  - Selecting columns
  - Filtering rows
  - Creating new columns
- Session 5: Putting it All Together: Communicating
All of the material in this GitHub repository is licensed under the Creative Commons BY-SA 4.0 license to make the material easy to reuse. We encourage you to reuse it and adapt it for your own teaching as you like!