{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DECISION TREE HYPER-PARAMETERS. TUNING DECISION TREES" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **max_depth : int or None, optional (default=None)**\n", "  The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. Ignored if max_leaf_nodes is not None.\n", "\n", "- **min_samples_split : int, optional (default=2)**\n", "  The minimum number of samples required to split an internal node.\n", "\n", "- There are more hyper-parameters; see:\n", "    - `help(\"sklearn.tree.DecisionTreeRegressor\")`\n", "    - `help(\"sklearn.tree.DecisionTreeClassifier\")`\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from sklearn.datasets import load_boston\n", "from sklearn import tree\n", "from scipy.stats import sem\n", "from sklearn.cross_validation import cross_val_score, KFold\n", "\n", "# Boston housing data: a regression problem\n", "boston = load_boston()\n", "X = boston.data\n", "y = boston.target\n", "\n", "# 10-fold cross-validation; shuffling with a fixed seed keeps the folds reproducible\n", "cv = KFold(X.shape[0], 10, shuffle=True, random_state=0)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's see what happens if we change the `max_depth` parameter.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "# Fix NumPy's global seed: the tree builder draws from it when random_state is not set\n", "np.random.seed(0)\n", "mds = range(2,16,2)  # candidate depths: 2, 4, ..., 14\n", "results = []\n", "for max_depth in mds:\n", "    clf = tree.DecisionTreeRegressor(max_depth=max_depth)\n", "    # This scorer returns the negated MSE, so flip the sign to get the MSE\n", "    scores = -cross_val_score(clf, \n", "                              X, y, \n", "                              scoring='mean_squared_error', \n", "                              cv = cv)\n", "    \n", "    results.append(scores.mean())\n", "    print (\"Max_depth={0:d} :Mean score: {1:.3f} (+/-{2:.3f})\").format(max_depth, scores.mean(), 
sem(scores))\n", "\n", "plt.plot(np.array(mds,dtype=float), results)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**We can see that the minimum value is obtained at max_depth = 10, so we should set the hyper-parameter to this value. However, if the random seed is changed from 0 to another value (try changing x in np.random.seed(x)), slightly different plots (and minima) are obtained, because the algorithm that builds decision trees is stochastic.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's now see what happens if we change the other hyper-parameter, min_samples_split.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "np.random.seed(0)\n", "mss = range(2,16,2)  # candidate min_samples_split values: 2, 4, ..., 14\n", "results = []\n", "for min_samples_split in mss:\n", "    clf = tree.DecisionTreeRegressor(min_samples_split=min_samples_split)\n", "    # This scorer returns the negated MSE, so flip the sign to get the MSE\n", "    scores = -cross_val_score(clf, \n", "                              X, y, \n", "                              scoring='mean_squared_error', \n", "                              cv = cv)\n", "    \n", "    results.append(scores.mean())\n", "    print (\"min_samples_split={0:d} :Mean score: {1:.3f} (+/-{2:.3f})\").format(min_samples_split, scores.mean(), sem(scores))\n", "\n", "plt.plot(np.array(mss,dtype=float), results)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The minimum for min_samples_split is obtained at 12, but this could change slightly if the random seed is altered, because decision tree construction is a stochastic process.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# GRID SEARCH\n", "**What if we want to find the best combination of hyper-parameters (and not individual parameters, as we did above)? The process that performs a cross-validation for every possible combination of two (or more) hyper-parameters is called *grid search*.**\n", "\n", "Note: in principle, n_jobs can be used to run the process in parallel. 
, "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }