This book tries to be as light on mathematics as possible, diving directly into practice with the R language.
After explaining the basics of R, the author introduces the two main categories of machine learning algorithms, supervised and unsupervised learning.
- Categorization: using a Bayesian classifier written in R, we learn to separate spam messages from good ones.
- Priority Ranking: in this chapter the author describes which features Google uses to assign a priority to received emails and provides the code for writing your own ranker based on these features.
- Linear Regression: this chapter explains one of the most powerful tools of machine learning, linear regression. The case study uses linear regression to build a system that predicts the page views of the top 1000 websites. The chapter also explains different measures of error that can be used to test how well the model works.
- Nonlinear Regression and Regularization: the first part of the chapter explains nonlinear regression, which applies when the prediction can't be expressed by the linear formula Prediction = Constant + X + Y + Z. The second part covers regularization, a way to prevent overfitting. Overfitting means that the model fits the training data very well but is not able to fit the test data in the long run. This happens when the fit to the training data is so precise that the model starts to capture the noise (outliers) in the training data as well.
- Optimization: once we have defined a measure of error for the model, we can tune the model by optimizing its parameters, for example by choosing the parameter values that minimize the error measure. The case study of this chapter is building a code-breaking system.
- PCA: this technique reduces the complexity of the data by reducing the number of dimensions of the problem, and is useful for extracting a vector that summarizes the whole data set.
- MDS: sometimes it is difficult to see the relations between different features without graphical help. The MDS algorithm makes it possible to cluster all the data and to plot these clusters in a meaningful way.
- k-nearest neighbors (kNN): sometimes it isn't possible to define a general model for the whole data set, so we make a prediction from the nearest available choices and give each one a score. The case study uses this approach to build a recommendation system that suggests packages based on the packages already installed.
- Social Graph: this chapter explains how social websites define their connectivity model and how to study the relationships, by building a "Who to follow" system for Twitter.
- SVM: the support vector machine is another powerful tool for machine learning and is widely used when it isn't possible to define a linear decision boundary. The case study of this chapter is a comparison of the prediction algorithms used in the previous chapters.
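
To make the idea behind the categorization chapter more concrete, here is a minimal sketch of a naive Bayes spam filter. It is written in Python rather than R and it is not the book's code: the toy messages, the "spam"/"ham" labels and the function names `tokenize`, `train` and `classify` are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a message into lowercase words (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(messages):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log-posterior score for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Log prior: the fraction of training messages carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing avoids taking the log of zero.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, far too small for a real filter.
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train(TRAINING)
print(classify(model, "win a cash prize"))
print(classify(model, "monday team meeting"))
```

Working with log probabilities instead of multiplying raw probabilities keeps the scores from underflowing on long messages. A real filter would need far more training data and a better tokenizer than this sketch.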
Since this is not an R programming book, the language is not clearly explained in the text, so some previous study of R is probably necessary. Alternatively, translating the code into a more common language such as Python will help in understanding the problems. I also noticed some inconsistencies between the text and the code: I ran into errors when using the code printed in the text, while the code on the CD-ROM works perfectly.
In conclusion, I would recommend this book to anyone who wants to learn the concepts of machine learning on their own, but I suggest using online courses to study the theory behind these algorithms.