Pandas Cheatsheet helps memory retention

A cheatsheet is like a reference page, except that it offers a quick lookup of the most common features. That is why I really like this pandas cheatsheet. When I haven't done data wrangling or cleaning in a while, it helps me find a quick answer. One other…

Hypothesis Generation vs. Hypothesis Confirmation

I'm just getting back into data science and am glad to have found Hadley Wickham's and Garrett Grolemund's book, R for Data Science. It looks like a good refresher for me. Reading the Introduction, I was immediately struck by how helpful this book is going to be. In…

Biodiversity and Data Science

From a quick review of measurements of biodiversity that I found on Wikipedia, there appear to be at least a couple of mathematical formulas for calculating it. Yet I don't think of biodiversity in such an abstract way. To me the term relates more to how natural our earth environment…
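
Two formulas that are widely cited for this are the Shannon index, H = -Σ p_i ln p_i, and Simpson's index, D = Σ p_i², where p_i is the proportion of individuals belonging to species i. Whether these are the exact formulas the post refers to is an assumption. The minimal Python sketch below computes both from a hypothetical list of species observations (the species names and counts are invented for illustration).

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)), skipping species with zero count."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson's index D = sum(p_i ** 2): the chance that two individuals drawn
    at random (with replacement) belong to the same species."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


# Hypothetical field survey: one entry per individual observed at a site.
observations = ["oak", "oak", "maple", "pine", "pine", "pine", "birch", "oak"]
counts = list(Counter(observations).values())

print("Shannon H:", round(shannon_index(counts), 3))
print("Simpson D:", round(simpson_index(counts), 3))
# 1 - D (Gini-Simpson) rises with diversity, which some find more intuitive.
print("Gini-Simpson 1 - D:", round(1 - simpson_index(counts), 3))
```

A higher H means more diversity. D works the other way round: it shrinks as diversity grows, so 1 - D is often reported instead.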

TIL: recodes and imputations

When I get a data set to explore, the first thing I think about is how clean the data is. If it's survey data, there may be inconsistencies in the collection method, especially if the data spans years of surveys. From reading the freely downloadable textbook, Think Stats, I discovered the…
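
To make the two terms concrete, here is a minimal pandas sketch. A recode replaces special "non-answer" codes with missing values and derives a new variable. An imputation then fills in the missing values. The column names, the sentinel codes 97/98/99, and the data are all hypothetical, chosen only to resemble survey data. Mean imputation is just one simple strategy and is not necessarily the one Think Stats uses.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses. 97, 98, 99 are assumed sentinel codes
# (e.g. "not ascertained", "refused", "don't know"), not real measurements.
df = pd.DataFrame({
    "year": [2002, 2002, 2006, 2006, 2010],
    "birthwgt_lb": [7, 98, 6, 8, 9],
    "birthwgt_oz": [4, 3, 8, 0, 97],
})

SENTINELS = [97, 98, 99]

# Recode 1: turn sentinel codes into NaN so they don't distort statistics.
for col in ["birthwgt_lb", "birthwgt_oz"]:
    df[col] = df[col].replace(SENTINELS, np.nan)

# Recode 2: derive a single weight in pounds (NaN propagates if either part is missing).
df["totalwgt_lb"] = df["birthwgt_lb"] + df["birthwgt_oz"] / 16.0

# Imputation: fill the remaining gaps with the column mean (a simple, crude strategy).
df["totalwgt_lb_imputed"] = df["totalwgt_lb"].fillna(df["totalwgt_lb"].mean())

print(df)
```

Keeping the imputed values in a separate column leaves the original missingness visible, which matters when comparing survey years.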

Posting Style

I've been getting back into exercising my data science muscles, and as I continue, I'd like to put together some interesting blog posts. But I don't want to wait until I've got a full data story together. So I'm going to start blogging shorter learnings in addition to longer…