David Smith

Statistics has many canonical data sets. For classification statistics, we have Fisher's iris data. For Big Data statistics, the canonical data set used in many examples is the Airlines data. And for dotplots, we have the barley data, first popularized by Bill Cleveland in the landmark 1993 text Visualizing Data. Cleveland's innovations in data visualization were hugely influential in the S language and (later) R's lattice and ggplot2 packages, and the panel chart of the barley data shown below is one of the best known. The chart above shows the yields for several different varieties of barley (Trebi, Glabron and so on) planted at each of six different sites in Minnesota (Duluth, Grand Rapids, etc.) in the years 1931 (pink) and 1932 (blue). The reason this data set has become legendary appears in the "Morris" panel, where unlike all other sites the yields in 193... (more)

July 22: Applications in R Webinar

Just a quick heads-up that I'll be presenting with Neera Talbert (VP Professional Services, Revolution Analytics) in a free webinar on Tuesday, July 22 on Applications in R: Success and Lessons Learned from the Marketplace. I'll describe several R applications from well-known companies (some of which can be seen in the presentation I gave at the China R User Conference), and Neera will present a few case studies of how the Revolution Analytics consulting group has helped companies using R in areas such as supply chain analytics, sensor data analysis, and R package validation and c... (more)

In case you missed it: June 2014 Roundup

In case you missed them, here are some articles from June of particular interest to R users: The useR! 2014 conference in Los Angeles opened with 16 tutorials. DataInformed published an article by David Smith on how various companies use R. Joe Rickert reviews the new book "Applied Predictive Modeling" by Max Kuhn and Kjell Johnson, which is rich with examples in R and the "caret" package. Hadley Wickham's new ggvis package features a new syntax to create interactive ggplot2-style graphics. Guest poster Wayne Smith reviews the R and Statistics presentations at the Intel Intern... (more)

Diving into H2O

by Joseph Rickert

One of the remarkable features of the R language is its adaptability. Motivated by R’s popularity and helped by R’s expressive power and transparency, developers working on other platforms display what looks like inexhaustible creativity in providing seamless interfaces to software that complements R’s strengths. The H2O R package that connects to 0xdata’s H2O software (Apache 2.0 License) is an example of this kind of creativity. According to the 0xdata website, H2O is “The Open Source In-Memory, Prediction Engine for Big Data Science”. Indeed, H2O offers an i... (more)

Dependencies of popular R packages

With the growing popularity of R, there is an associated increase in the popularity of online forums to ask questions. One of the most popular sites is StackOverflow, where more than 60 thousand questions have been asked and tagged as related to R. On the same page, you can also find related tags. Among the top 15 tags associated with R, several are also packages you can find on CRAN: ggplot2, data.table, plyr, knitr, shiny, xts and lattice. It is very easy to install these packages directly from CRAN using the R function install.packages(), but this will also install all these package d... (more)