# mltools

## About mltools

mltools is an R package with tools for

– data cleansing

– exploratory data analysis

– evaluating machine-learning models

**mltools** (published on CRAN in 2016) provides a variety of methods that help practitioners perform rapid, meaningful exploratory data analysis, particularly on high-dimensional data.

For example, raw data often contains columns that are masked duplicates (e.g. *StateID* and *StateName*) and columns with hierarchical relationships (e.g. *StateID* and *CityID*). These relationships are not always obvious, and they can be hard to find in data with hundreds of columns whose values may be encoded or anonymized. *mltools* can quickly recognize structures like these in a dataset.
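
To make the idea concrete, here is a minimal base-R sketch of one way such relationships can be detected, by comparing the number of distinct values in each column with the number of distinct value pairs. The helper `column_relationship` and the sample data are hypothetical and written only for illustration; this is not how mltools itself is implemented.

```r
# Hypothetical helper (not part of mltools): classify the relationship
# between two columns by counting distinct values and distinct pairs.
column_relationship <- function(x, y) {
  n_x     <- length(unique(x))
  n_y     <- length(unique(y))
  n_pairs <- nrow(unique(data.frame(x = x, y = y)))

  if (n_pairs == n_x && n_pairs == n_y) {
    "duplicate"           # one-to-one mapping between x and y
  } else if (n_pairs == n_y) {
    "x is parent of y"    # each y value belongs to exactly one x value
  } else if (n_pairs == n_x) {
    "y is parent of x"    # each x value belongs to exactly one y value
  } else {
    "none"
  }
}

# Made-up sample data for illustration
df <- data.frame(
  StateID   = c(1, 1, 1, 2, 2, 3),
  StateName = c("OH", "OH", "OH", "TX", "TX", "NY"),
  CityID    = c(10, 10, 11, 20, 21, 30),
  Amount    = c(5, 7, 5, 7, 9, 5),
  stringsAsFactors = FALSE
)

column_relationship(df$StateID, df$StateName)  # "duplicate"
column_relationship(df$StateID, df$CityID)     # "x is parent of y"
column_relationship(df$StateID, df$Amount)     # "none"
```

Note that a column whose values are unique in every row is trivially a child of every other column, so a real check would likely want to flag that case separately.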

Another common phenomenon is a small group accounting for a large portion of a dependent variable (e.g. one customer accounts for 5% of sales, or one exposure accounts for 10% of insurance losses). Finding such concentrations is important, but the procedure is tedious and often overlooked. *mltools* simplifies this analysis, which can pay dividends during the modeling phase.
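
As a rough illustration of the underlying calculation, the base-R sketch below computes each group's share of the total and the running cumulative share. The `sales` data and variable names are made up for this example, and the code does not use or reproduce any mltools function.

```r
# Made-up sample data: one customer dominates total sales
sales <- data.frame(
  customer = c("A", "B", "C", "D", "A", "B"),
  amount   = c(500, 20, 30, 25, 400, 25),
  stringsAsFactors = FALSE
)

# Total of the dependent variable per group, as a named numeric vector
totals <- vapply(split(sales$amount, sales$customer), sum, numeric(1))

# Each group's share of the grand total, largest first
shares <- sort(totals / sum(totals), decreasing = TRUE)

# Running share: how much the top k groups account for together
cumulative <- cumsum(shares)

shares[1]  # customer A accounts for 0.9 of all sales
```

Scanning the top of `shares` (or the first few entries of `cumulative`) is usually enough to spot a group that dominates the dependent variable.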

In addition to its exploratory methods, *mltools* offers a number of convenient machine-learning functions whose counterparts in other R packages are either unsatisfactory or missing. These include:

– **auc_roc**: A *fast* method for calculating Area Under the ROC Curve

– **roc_scores**: A method for ranking and evaluating cross-validated predictions of an ML model

– **sparsify**: A helper method that converts a data.table into a sparse matrix

– **relative_position**: A helper method for ranking a set of values, scaled between 0 and 1

– **exponential_weights**: Generates weights based on exponential decay
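
For readers unfamiliar with the first metric in this list, the following base-R sketch computes Area Under the ROC Curve using the rank-sum (Mann-Whitney) identity. The function `auc_by_ranks` is a hypothetical name used here for illustration; it shows what the metric measures and is not the package's own implementation.

```r
# Hypothetical helper (not part of mltools): AUC via the rank-sum identity.
# preds:   numeric scores, higher meaning "more likely positive"
# actuals: 0/1 labels of the same length
auc_by_ranks <- function(preds, actuals) {
  r     <- rank(preds)          # average ranks, so tied scores count as 0.5
  n_pos <- sum(actuals == 1)
  n_neg <- sum(actuals == 0)
  # Sum of positive ranks, minus the smallest value that sum could take,
  # divided by the number of positive/negative pairs
  (sum(r[actuals == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

preds   <- c(0.1, 0.3, 0.3, 0.9)
actuals <- c(0, 0, 1, 1)

auc_by_ranks(preds, actuals)  # 0.875
```

The result is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half. With the package installed, `auc_roc(preds, actuals)` should be the call to use on real data.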