# Tree Misclassification Error

years

sibsp: number of siblings or spouses aboard

parch: number of parents or children aboard

## Classification Error Rate Decision Tree

    library(rpart)
    library(rpart.plot)
    data(ptitanic)
    str(ptitanic)
    ## 'data.frame': 1309 obs. of 6 variables:
    ##  $ pclass  : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...
    ##  $ survived: Factor w/ 2 levels "died","survived": 2 2 1 1 1 2 2 1 2 1 ...

    ##  $ sex     : Factor w/ 2 levels "female","male": 1 2 1 2 1 2 1 2 1 2 ...
    ##  $ age     :Class 'labelled' atomic [1:1309] 29 0.917 2 30 25 ...
    ##   .. ..- attr(*, "units")= chr "Year"
    ##   .. ..- attr(*, "label")= chr "Age"
    ##  $ sibsp   :Class 'labelled' atomic [1:1309] 0 1 1 1 1 0 1 0 2 0 ...
    ##   .. ..- attr(*, "label")= chr "Number of Siblings/Spouses Aboard"
    ##  $ parch   :Class 'labelled' atomic [1:1309] 0 2 2 2 2 0 0 0 0 0 ...
    ##   .. ..- attr(*, "label")= chr "Number of Parents/Children Aboard"

## CART Modeling

Make sure all categorical variables are converted to factors. The function rpart fits a regression tree if the response variable is numeric and a classification tree if it is a factor. See here for a detailed introduction to tree-based modeling with the rpart package.

    # Step 1: Begin with a small cp.
    set.seed(123)
    tree <- rpart(survived ~ ., data = ptitanic, control = rpart.control(cp = 0.0001))

    # Step 2: Pick the tree size that minimizes misclassification rate (i.e. prediction error).
    # Prediction error rate in training data = Root node error * rel er
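The two steps can be sketched in R as follows. This is a minimal illustration, not necessarily the original author's code. It assumes that the relative-error columns of the fitted tree's `cptable` are scaled so that the root node equals 1, and that multiplying them by the root node error gives absolute error rates. The names `root_node_error`, `resub_error`, `direct_error`, `best_cp`, `pruned`, and `cv_error` are introduced here for illustration only.

```r
library(rpart)
library(rpart.plot)  # provides the ptitanic data set

data(ptitanic)

# Step 1: grow a deliberately large tree with a small cp.
set.seed(123)
tree <- rpart(survived ~ ., data = ptitanic,
              control = rpart.control(cp = 0.0001))

# Root node error: the misclassification rate obtained by assigning
# every passenger to the majority class.
root_node_error <- 1 - max(table(ptitanic$survived)) / nrow(ptitanic)

# One row per candidate tree size; "rel error" and "xerror" are
# expressed relative to the root node error.
cp_table <- tree$cptable

# Training (resubstitution) error of the full tree.
resub_error <- root_node_error * cp_table[nrow(cp_table), "rel error"]

# Cross-check: misclassification rate computed from the fitted classes.
pred <- predict(tree, type = "class")
direct_error <- mean(pred != ptitanic$survived)

# Step 2: pick the cp with the smallest cross-validated error and prune.
best_row <- which.min(cp_table[, "xerror"])
best_cp <- cp_table[best_row, "CP"]
pruned <- prune(tree, cp = best_cp)

# Cross-validated error rate of the selected tree size.
cv_error <- root_node_error * cp_table[best_row, "xerror"]
```

Under these assumptions, `resub_error` and `direct_error` should agree up to floating-point precision. `cv_error` is usually the more honest estimate of how the pruned tree would perform on new data. Calling `printcp(tree)` prints the same table along with the root node error.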
