{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [cbe67701-uncertainty-quantification](https://ndcbe.github.io/cbe67701-uncertainty-quantification);\n", "content is available [on Github](https://github.com/ndcbe/cbe67701-uncertainty-quantification.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [5.1 Ridge Regression](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.01-Contributed-Example.html) | [Contents](toc.html) | [5.3 Elastic Net Regression](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.03-Contributed-Example.html)" ] }, { "cell_type": "markdown", "metadata": { "nbpages": { "level": 1, "link": "[5.2 Lasso Regression](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.02-Contributed-Example.html#5.2-Lasso-Regression)", "section": "5.2 Lasso Regression" } }, "source": [ "# 5.2 Lasso Regression\n", "\n", "Created by Haimeng Wang (hwang22@nd.edu)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "nbpages": { "level": 1, "link": "[5.2 Lasso Regression](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.02-Contributed-Example.html#5.2-Lasso-Regression)", "section": "5.2 Lasso Regression" } }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.linear_model import Lasso\n", "from sklearn.linear_model import LassoCV\n", "from matplotlib.ticker import MultipleLocator, FormatStrFormatter" ] }, { "cell_type": "markdown", "metadata": { "nbpages": { "level": 1, "link": "[5.2 Lasso Regression](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.02-Contributed-Example.html#5.2-Lasso-Regression)", "section": "5.2 Lasso Regression" } }, "source": [ "This example was adapted from:\n", "\n", "McClarren, Ryan G (2018). Uncertainty Quantification and Predictive Computational Science: A Foundation for Physical Scientists and Engineers, Chapter 4: Local Sensitivity Analysis Based on Derivative Approximations, Springer, https://doi.org/10.1007/978-3-319-99525-0_4" ] }, { "cell_type": "markdown", "metadata": { "nbpages": { "level": 2, "link": "[5.2.1 Lasso Regression Basics](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.02-Contributed-Example.html#5.2.1-Lasso-Regression-Basics)", "section": "5.2.1 Lasso Regression Basics" } }, "source": [ "## 5.2.1 Lasso Regression Basics\n", "\n", "Lasso stands for the Least Absolute Shrinkage and Selection Operator. 
Unlike ridge regression, lasso regression uses the *1-norm* ($L_1$) of the coefficient vector as the penalty:\n", "\n", "$$\\hat{\\boldsymbol{\\beta}}_{lasso} = \\arg\\min_{\\boldsymbol{\\beta}} \\sum^{I}_{i=1}(y_i-\\boldsymbol{\\beta} \\cdot \\boldsymbol{X}_i)^2 + \\lambda \\Vert \\boldsymbol{\\beta} \\Vert_1 $$\n", "\n", "The $L_1$ penalty tends to drive some of the coefficients exactly to zero, which produces a sparse model. In a sparse model, many of the coefficients are zero or close to zero, and only a few are large. In other words, the model excludes variables that are not important (those with zero coefficients)." ] }, { "cell_type": "markdown", "metadata": { "nbpages": { "level": 2, "link": "[5.2.2 An example of the lasso regression ](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.02-Contributed-Example.html#5.2.2-An-example-of-the-lasso-regression)", "section": "5.2.2 An example of the lasso regression " } }, "source": [ "## 5.2.2 An example of the lasso regression ##\n", "This example is adapted from Section 5.3 of the book. Assume that we have a simulation with 200 input variables and that only 100 simulations can be afforded, so there are fewer samples than unknown coefficients. The *sklearn.linear_model* module is used to perform the fit.\n", "\n", "First, we use a model with 5 large non-zero coefficients (between 5 and 25); the remaining coefficients are small (between 0 and 0.1)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "nbpages": { "level": 2, "link": "[5.2.2 An example of the lasso regression ](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.02-Contributed-Example.html#5.2.2-An-example-of-the-lasso-regression)", "section": "5.2.2 An example of the lasso regression " }, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R^2 score of the fit\n", "0.9999789441481239\n", "Selected penalty weight (alpha)\n", "0.01873762187420794\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Generate some sparse data\n", "np.random.seed(711)\n", "\n", "# number of samples is 100 and the number of variables is 200\n", "n_samples, n_features = 100, 200\n", "X = np.random.randn(n_samples, n_features)\n", "\n", "# generate coefficients: a few large ones (5 to 25), the rest small (0 to 0.1)\n", "LargeCoef = 5\n", "SenCoef = np.zeros(n_features)\n", "for i in range(LargeCoef):\n", "    SenCoef[i] = 5 + 20*np.random.rand()\n", "for i in range(LargeCoef, n_features):\n", "    SenCoef[i] = 0.1*np.random.rand()\n", "\n", "# generate y with a small amount of Gaussian noise\n", "y = np.dot(X, SenCoef) + np.random.normal(loc=0, scale=0.001, size=n_samples)\n", "\n", "# choose the penalty weight by 50-fold cross-validation and fit\n", "reg = LassoCV(cv=50).fit(X, y)\n", "# score() returns the coefficient of determination R^2 on the training data\n", "print(\"R^2 score of the fit\")\n", "print(reg.score(X, y))\n", "# alpha_ is the penalty weight selected by cross-validation\n", "print(\"Selected penalty weight (alpha)\")\n", "print(reg.alpha_)\n", "\n", "# plot the non-zero fitted coefficients against the actual coefficients\n", "fig, ax = plt.subplots(figsize=(12, 9))\n", "fig.subplots_adjust(bottom=0.15, left=0.2)\n", "\n", "ax.scatter(np.where(reg.coef_)[0], reg.coef_[reg.coef_ != 0], label='Lasso coefficients', linewidth=4)\n", "ax.scatter(np.where(SenCoef)[0], SenCoef[SenCoef != 0], label='Actual coefficients', linewidth=4)\n", "\n", "# ax.set_xlim([0,n_features])\n", "# ax.set_ylim([0, 0])\n", "ax.set_xlabel('variables', fontsize=25)\n", "ax.set_ylabel('coefficients', fontsize=25)\n", "ax.legend(fontsize=24,loc='upper right',frameon=False)\n", "\n", "ax.tick_params(which = 'both',direction='in',colors='black',\n", "               bottom = True,top=True,left=True, right=True,pad=15)\n", "ax.tick_params(which = 'major',direction='in',length=15,labelsize=20,width=1.5)\n", "ax.tick_params(which = 'minor',direction='in',length=6,width = 1.5)\n", "\n", "for axis in ['top','bottom','left','right']:\n", "    ax.spines[axis].set_linewidth(2.0)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "nbpages": { "level": 2, "link": "[5.2.2 An example of the lasso regression 
](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.02-Contributed-Example.html#5.2.2-An-example-of-the-lasso-regression)", "section": "5.2.2 An example of the lasso regression " } }, "source": [ "Next, we use a model with 20 large non-zero coefficients (between 5 and 25); the remaining coefficients are small (between 0 and 0.1)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "nbpages": { "level": 2, "link": "[5.2.2 An example of the lasso regression ](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.02-Contributed-Example.html#5.2.2-An-example-of-the-lasso-regression)", "section": "5.2.2 An example of the lasso regression " } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R^2 score of the fit\n", "0.9999855603124613\n", "Selected penalty weight (alpha)\n", "0.033012951782999095\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Generate some sparse data\n", "np.random.seed(6250)\n", "\n", "# number of samples is 100 and the number of variables is 200\n", "n_samples, n_features = 100, 200\n", "X = np.random.randn(n_samples, n_features)\n", "\n", "# generate coefficients: 20 large ones (5 to 25), the rest small (0 to 0.1)\n", "LargeCoef = 20\n", "SenCoef = np.zeros(n_features)\n", "for i in range(LargeCoef):\n", "    SenCoef[i] = 5 + 20*np.random.rand()\n", "for i in range(LargeCoef, n_features):\n", "    SenCoef[i] = 0.1*np.random.rand()\n", "\n", "# generate y with a small amount of Gaussian noise\n", "y = np.dot(X, SenCoef) + np.random.normal(loc=0, scale=0.001, size=n_samples)\n", "\n", "# choose the penalty weight by 50-fold cross-validation and fit\n", "reg = LassoCV(cv=50).fit(X, y)\n", "# score() returns the coefficient of determination R^2 on the training data\n", "print(\"R^2 score of the fit\")\n", "print(reg.score(X, y))\n", "# alpha_ is the penalty weight selected by cross-validation\n", "print(\"Selected penalty weight (alpha)\")\n", "print(reg.alpha_)\n", "\n", "# plot the non-zero fitted coefficients against the actual coefficients\n", "fig, ax = plt.subplots(figsize=(12, 9))\n", "fig.subplots_adjust(bottom=0.15, left=0.2)\n", "\n", "ax.scatter(np.where(reg.coef_)[0], reg.coef_[reg.coef_ != 0], label='Lasso coefficients', linewidth=4)\n", "ax.scatter(np.where(SenCoef)[0], SenCoef[SenCoef != 0], label='Actual coefficients', linewidth=4)\n", "\n", "# ax.set_xlim([0,n_features])\n", "# ax.set_ylim([0, 0])\n", "ax.set_xlabel('variables', fontsize=25)\n", "ax.set_ylabel('coefficients', fontsize=25)\n", "ax.legend(fontsize=24,loc='upper right',frameon=False)\n", "\n", "ax.tick_params(which = 'both',direction='in',colors='black',\n", "               bottom = True,top=True,left=True, right=True,pad=15)\n", "ax.tick_params(which = 'major',direction='in',length=15,labelsize=20,width=1.5)\n", "ax.tick_params(which = 'minor',direction='in',length=6,width = 1.5)\n", "\n", "for axis in ['top','bottom','left','right']:\n", "    ax.spines[axis].set_linewidth(2.0)\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbpages": { "level": 2, "link": "[5.2.2 An example of the lasso regression 
](https://ndcbe.github.io/cbe67701-uncertainty-quantification/05.02-Contributed-Example.html#5.2.2-An-example-of-the-lasso-regression)", "section": "5.2.2 An example of the lasso regression " } }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }