by Ben | Apr 27, 2018 | R |

Working with dates and times in R can be frustrating! This isn’t R’s fault – dates and times are naturally complicated. One must consider time zones, leap years, leap seconds, daylight saving time, hundreds of potential date and time formats, and other...
by Ben | Aug 1, 2017 | R |

In this tutorial you will learn how to: define geometries (points, lines, polygons); plot those geometries; execute spatial joins (which points are contained in a polygon?); get the distance between a set of points; do all of the above within the context of geospatial data...
by Ben | Jul 19, 2016 | gradient-boosting, logistic-regression, Machine Learning, Python, R, random-forest |

The Problem: You sell software that helps stores manage their inventory. You collect leads on thousands of potential customers, and your strategy is to cold-call them and pitch your product. You can only make 100 phone calls per day, so you want to identify leads with...
by Ben | Apr 2, 2016 | R |

This guide helps bridge the gap between understanding what regular expressions are and understanding how to use them in R. If you’re brand new to regular expressions, I highly recommend checking out RegexOne. Hadley Wickham’s stringr package makes...
by Ben | Aug 31, 2014 | Decision Trees, Machine Learning, R |

The rpart package in R provides a powerful framework for growing classification and regression trees. To see how it works, let’s get started with a minimal example. First let’s define a problem. There’s a common scam amongst motorists where a person...
by Ben | Jul 26, 2014 | R |

Rolling joins are commonly used for analyzing data involving time. A simple example – suppose you have a table of product sales and a table of commercials. You might want to associate each product sale with the most recent commercial that aired prior to the...
by Ben | Jul 25, 2014 | R |

The data.table package in R provides fast methods for handling large tables of data with concise syntax. The following is an introduction to the basic join operations available using the data.table package. Suppose you have two data.tables – a table of...
by Ben | Jul 21, 2014 | R |

A factor variable (commonly called a categorical variable outside of R) is a variable that takes on a limited set of values. For example, days of the week {Sunday, Monday, etc.} or the set of colors {Red, Blue, Green} should be a factor. By contrast, a vector of...