{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SECOND ASSIGNMENT. MACHINE LEARNING WITH SCIKIT-LEARN. \n", "## PART I (1.5 POINTS)\n", "\n", "The aim of Part I of the Scikit-learn assignment is for you to learn on your own and get used to this Machine Learning tool. The main part (Part II) of the assignment will be explained next week (11/12). \n", "\n", "Here, you will learn to:\n", "\n", "- Perform cross-validation on the iris classification problem with decision trees (so far, we have only done regression)\n", "- Perform cross-validation on the iris classification problem **with KNN** (I haven't explained this; you will have to learn how to use it from the web)\n", "- Perform grid search to determine the best value for the hyper-parameter K\n", "\n", "You will also have to go through two notebooks I have prepared for you in order to see how cross-validation and hyper-parameter tuning are used in Scikit-learn.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Carry out the \"DECISION TREES WITH A TRAINING AND A TESTING SET AND CROSSVALIDATION\" notebook and understand the main ideas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Perform cross-validation on the iris classification problem with decision trees:\n", "\n", "**It is important to remember that for classification, you have to use:**\n", "- clf = tree.DecisionTreeClassifier() # for constructing the classifier\n", "- metrics.accuracy_score # for computing accuracy instead of error" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Write code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Perform cross-validation on the iris classification problem with KNN\n", "\n", "I haven't explained how to use KNN in Scikit-learn. 
You will have to read and obtain the relevant information [here](http://scikit-learn.org/stable/modules/neighbors.html)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Write code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Try different values for K (KNN), changing them by hand, and see if you obtain a better result than with the KNN default value. Always use cross-validation." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Write code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Carry out the \"DECISION TREE HYPER-PARAMETERS. TUNING DECISION TREES\" notebook and understand the main ideas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. USE GRID SEARCH AND RANDOMIZED SEARCH TO FIND THE OPTIMAL VALUE FOR K" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Write code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. OPTIONAL (you may get 0.25 extra points if you decide to do this). \n", "\n", "K is the main hyper-parameter of KNN. Find another hyper-parameter that you consider relevant, and try to optimize both K and the other hyper-parameter using grid search. Are you able to improve on the previous results?" 
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Write code here (optional)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }