Purpose

This book contains my solutions and notes to Garrett Grolemund and Hadley Wickham’s excellent book, R for Data Science (Grolemund and Wickham 2017). R for Data Science (R4DS) is my go-to recommendation for people getting started in R programming, data science, or the “tidyverse”.

First and foremost, this book was set up as a resource and refresher for myself [1]. If you are looking for a reliable solutions manual to check your answers as you work through R4DS, I would recommend the solutions created and maintained by Jeffrey Arnold, R for Data Science: Exercise Solutions [2]. That said, feel free to use Yet another ‘R for Data Science’ study guide as another point of reference [3].

Origin

I first read R4DS and completed its exercises in early 2017, at the tail end of a Master’s in Analytics program. My second time going through R4DS came in early 2018, when Stephen Kimel and I organized an internal “R for Data Science” study group with our colleagues [4]. In June of 2019 I published my solutions and notes as this book.

Organization and features

Chapters start with the following:

  • A list of “Key exercises” deemed good for discussion in a study group
  • A list of functions (and sometimes notes) from the chapter [5]

Chapters also contain:

  • Solutions to exercises
    • Exercise subsections follow the same chapter -> section -> subsection hierarchy as the original book
    • Chapters, sections, and subsections without exercises are usually not included
    • The beginning of a section may occasionally contain additional notes, e.g. 3.8: Position Adjustment
  • The “Appendix” sections in chapters typically contain alternative solutions to problems or additional notes/thoughts pertaining to the chapter or a related topic
    • I use the numbering scheme {chapter}.{section}.{subsection}.{problem number} to refer to exercise solutions in “Appendix” sections
  • There are a few cautions with using this book [6]

Acknowledgements

Thank you:

  • Garrett Grolemund and Hadley Wickham for writing a phenomenal book!
  • The various tidyverse and RStudio developers for producing outstanding packages and products, as well as resources for learning
  • The R for Data Science Online Learning Community and #rstats communities for creating inspiring, safe places to post ideas, ask questions, and grow your R skills
  • Stephen Kimel, who has co-organized a data science study group with me and also provided feedback on my R4DS solutions. In many cases I changed my solution to an exercise to a method that mirrored his approach.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

References

Grolemund, Garrett, and Hadley Wickham. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 1st ed. O’Reilly Media.


  1. And as a chance to experiment with using bookdown.

  2. Jeffrey Arnold has done an excellent job of getting concise solutions and community feedback. Learn more about his project here.

  3. I worked through the problems independently, so for open-ended questions you’ll likely see slightly different solutions from Jeffrey Arnold’s.

  4. Here is part of an internal talk I gave plugging “tidy” data science, and implicitly, our R4DS study group.

  5. When functions show up in multiple locations I typically only note them the first time they appear.

  6. Cautions with this book:
    * Beyond basic formatting clean-up, I did not substantially update the solutions from my first time going through the book. Therefore, some of the solutions and syntax may differ from how I would approach a problem now (with a couple more years of coding experience).
    * “Appendix” sections in particular received only cursory edits.
    * Occasionally I use slightly different (or newer) methods than those shared in the book (e.g. using mutate_at(), mutate_if(), mutate_all() and not just mutate()). This is mostly confined to “Appendix” sections.
    * Some methods in functions may be (or may become) deprecated, e.g. using fun() within mutate_at() rather than ~.
    * The chapter and exercise numbers are hard-coded, so if R4DS exercise order changes, the exercise solutions will no longer correspond perfectly with the R4DS source.
    * Formatting is not always consistent between chapters, e.g. the first 14 chapters italicize or bold questions, whereas later chapters do not.
    * Notes containing functions are usually highlighted solely with backticks, e.g. foo, though they occasionally also have parentheses, e.g. foo() – there is no logic to these differences.
    * More formatting differences can be seen if inspecting the specific .Rmd files for each chapter.