Advanced Topics in Stochastic Programming#
Prepared by: Jiabao Nie (jnie2@nd.edu) and Jacob Mantooth (jmantoot@nd.edu)
In this project, we extend the farmer’s example, a classic optimization problem that highlights the challenges of resource allocation in agricultural decision-making. Traditionally, this problem assumes deterministic crop yields, enabling straightforward mathematical modeling and solution using conventional methods. However, real-world agricultural systems are inherently uncertain, influenced by variables such as weather, soil quality, and environmental conditions. To address these complexities, we employ advanced optimization techniques that incorporate stochasticity, providing a more realistic framework for decision-making in agriculture.
Learning Objectives
By the end of this project, students will be able to:
Formulate and solve optimization problems using real-world examples that incorporate uncertainty and stochasticity.
Develop proficiency in Python and Pyomo for modeling and solving complex optimization problems.
Understand and implement full-grid and sparse-grid discretization in stochastic optimization frameworks, including their respective advantages and applications.
Gain a deeper understanding of two-stage stochastic optimization, focusing on its application to agricultural systems under uncertainty.
Interpret and analyze optimization results, drawing meaningful conclusions about resource allocation and decision-making in uncertain environments.
Imports#
# Imports
import sys
if "google.colab" in sys.modules:
!wget "https://raw.githubusercontent.com/ndcbe/optimization/main/notebooks/helper.py"
import helper
helper.easy_install()
else:
sys.path.insert(0, '../')
import helper
helper.set_plotting_style()
import pyomo.environ as pyo
from pyomo.environ import (
ConcreteModel, Var, NonNegativeReals, Objective, minimize, ConstraintList,
    summation, SolverFactory, value, Param, Reals, Constraint
)
from pyomo.core.base.numvalue import value
from pyomo.common.collections import ComponentMap
import numpy as np
from itertools import product, combinations, combinations_with_replacement
import pandas as pd
import matplotlib.pyplot as plt
import math
Expanding the Farming Example#
The farmer problem is a classic optimization challenge that highlights the complexities of resource allocation in agricultural decision-making. Traditionally, this problem assumes deterministic crop yields, allowing for straightforward mathematical modeling and solution using conventional methods. However, real-world agricultural systems are inherently subject to uncertainty, influenced by factors such as weather, soil quality, and market conditions. These uncertainties necessitate a more sophisticated approach to accurately capture and address the variability in crop yields.
In this project, we extend the traditional farmer problem by incorporating a stochastic framework where crop yields are modeled as random variables following a normal distribution. This extension introduces the need for efficient evaluation of high-dimensional integrals, which we address using advanced numerical techniques such as sparse grids and Gaussian quadrature. These methods allow for greater computational efficiency and provide alternative approaches for solving the problem.
Additionally, we explore alternative modeling assumptions, such as treating crop yields as independent random variables. This perspective allows for a more comprehensive representation of the problem as a continuous random vector, enabling the development of a two-stage stochastic programming model. By incorporating these enhancements, our work aims to bridge the gap between theoretical optimization frameworks and the complex, dynamic nature of agricultural decision-making.
Our webpage is organized as follows:
Background: An overview of the farmer problem and its relevance
Classic Example: A detailed explanation of the classic farmer problem, including its mathematical formulation and solution using traditional methods.
Proposed Expansion: We propose extending the problem to a stochastic framework, modeling crop yields as random variables following a normal distribution to capture inherent variability arising from outside factors. This approach requires the efficient evaluation of multi-dimensional integrals, for which we will explore advanced numerical techniques, including sparse grids and Gaussian quadrature, to ensure computational efficiency. Furthermore, we plan to investigate alternative modeling assumptions, such as treating the yields of different crops as independent random variables. Under this assumption, yields can be represented as components of a continuous random vector, facilitating the development of a two-stage stochastic programming model.
Background#
The classic farmer problem is a good example in applied mathematics and numerical analysis, showcasing the interplay between optimization, resource management, and decision-making under constraints. At its core, the problem involves determining how to allocate limited resources to maximize outcomes like crop yield or profit, while meeting various constraints imposed by environmental, economic, and operational factors. These constraints often reflect real-world complexities, such as limited water supply, variable soil conditions, or fluctuating market demands.
Even in its traditional deterministic form, the farmer problem has far-reaching implications, serving as a foundational model for more sophisticated studies in agricultural planning, resource optimization, and risk management. Its theoretical versatility has made it a powerful educational tool for introducing key concepts in optimization, such as linear programming and numerical integration.
However, the deterministic assumptions underlying the classic problem fall short of capturing the unpredictable realities faced by farmers. In practice, agricultural systems are subject to inherent uncertainties driven by dynamic and often uncontrollable factors. Crop yields, for example, are profoundly influenced by weather variability, soil health, pest outbreaks, and market volatility. These unpredictable elements create significant challenges, as farmers must navigate decision-making in environments characterized by incomplete or stochastic information.
The growing need to address these uncertainties has propelled the development of stochastic extensions to the farmer problem. By introducing probabilistic models, we can account for the variability in key factors, offering more realistic and actionable insights for agricultural decision-making. Such extensions not only enhance the practical relevance of the model but also provide a platform for exploring advanced computational techniques, such as high-dimensional integration methods and stochastic programming.
In this context, the farmer problem becomes a dynamic framework for bridging theoretical optimization with real-world agricultural challenges, paving the way for robust decision-making tools in the face of uncertainty.
Classic Example#
This section recalls key concepts from class (refer to the lecture handout) and revisits previous work from the notebook originally created by Jialu Wang and revised by Maddie Watson at the University of Notre Dame.
The problem involves a European farmer managing a 500-acre farm, specializing in cultivating wheat, corn, and sugar beets. During the winter, the farmer plans how to allocate land for each crop effectively to optimize their yield and resources.
The farmer knows that at least 200 tons (T) of wheat and 240 T of corn are required for cattle feed. These requirements can be met either through on-farm production or by purchasing from a wholesaler. Any surplus production beyond the feeding requirements can be sold. Over the past decade, the average selling prices have been $170 and $150 per ton for wheat and corn, respectively. However, purchasing these crops from a wholesaler incurs a 40% price premium due to transportation and margin costs.
In addition to wheat and corn, sugar beet is another profitable crop. The farmer expects to sell sugar beets at $36 per ton. However, the European Commission imposes a production quota on sugar beets. Any yield exceeding the 6,000-ton quota can only be sold at a reduced price of $10 per ton.
Based on historical data, the farmer estimates average yields of 2.5 T per acre for wheat, 3 T per acre for corn, and 20 T per acre for sugar beets.
The following code builds the deterministic farmer model; it is adapted from the mpi-sppy documentation examples (see the docstring).
def build_model(yields):
'''
Code adapted from https://mpi-sppy.readthedocs.io/en/latest/examples.html#examples
Arguments:
yields: Yield information as a list, following the rank [wheat, corn, beets]
Return:
model: farmer problem model
'''
model = ConcreteModel()
# Define sets
all_crops = ["WHEAT", "CORN", "BEETS"]
purchase_crops = ["WHEAT", "CORN"]
sell_crops = ["WHEAT", "CORN", "BEETS_FAVORABLE", "BEETS_UNFAVORABLE"]
# Crops field allocation
model.X = Var(all_crops, within=NonNegativeReals)
# How many tons of crops to purchase
model.Y = Var(purchase_crops, within=NonNegativeReals)
# How many tons of crops to sell
model.W = Var(sell_crops,within=NonNegativeReals)
# Objective function
model.PLANTING_COST = 150 * model.X["WHEAT"] + 230 * model.X["CORN"] + 260 * model.X["BEETS"]
model.PURCHASE_COST = 238 * model.Y["WHEAT"] + 210 * model.Y["CORN"]
model.SALES_REVENUE = (
170 * model.W["WHEAT"] + 150 * model.W["CORN"]
+ 36 * model.W["BEETS_FAVORABLE"] + 10 * model.W["BEETS_UNFAVORABLE"]
)
    # Maximizing profit is equivalent to minimizing the negative of profit
model.OBJ = Objective(
expr=model.PLANTING_COST + model.PURCHASE_COST - model.SALES_REVENUE,
sense=minimize
)
# Constraints
model.CONSTR= ConstraintList()
model.CONSTR.add(summation(model.X) <= 500)
model.CONSTR.add(
yields[0] * model.X["WHEAT"] + model.Y["WHEAT"] - model.W["WHEAT"] >= 200
)
model.CONSTR.add(
yields[1] * model.X["CORN"] + model.Y["CORN"] - model.W["CORN"] >= 240
)
model.CONSTR.add(
yields[2] * model.X["BEETS"] - model.W["BEETS_FAVORABLE"] - model.W["BEETS_UNFAVORABLE"] >= 0
)
model.W["BEETS_FAVORABLE"].setub(6000)
return model
# Solve the optimization problem with perfect yields
yields_perfect = [2.5, 3, 20]
model = build_model(yields_perfect)
solver = SolverFactory("ipopt")
solver.solve(model)
# Define a function for printing the optimal solution
def print_opt_sol(model):
'''
Arguments:
model: solved farmer problem model
Return:
Prints the optimal solution
'''
print("===Optimal solutions based on perfect information===")
print('Culture. | ', 'Wheat |', 'Corn |', 'Sugar Beets |')
print('Surface (acres) | ', f'{value(model.X["WHEAT"]):.1f}', '|',
f'{value(model.X["CORN"]):.1f}', ' |',
f'{value(model.X["BEETS"]):.1f}',' |')
print('Yield (T) | ', f'{value(model.X["WHEAT"])*yields_perfect[0]:.1f}', '|',
f'{value(model.X["CORN"])*yields_perfect[1]:.1f}', '|',
f'{value(model.X["BEETS"])*yields_perfect[2]:.1f}','|')
print('Sales (T) | ', f'{value(model.W["WHEAT"]):.1f}', '|',
f'{value(model.W["CORN"]):.1f}', ' |',
f'{value(model.W["BEETS_FAVORABLE"]) + value(model.W["BEETS_UNFAVORABLE"]):.1f}','|')
print('Purchases (T) | ', f'{value(model.Y["WHEAT"]):.1f}', ' |',
f'{value(model.Y["CORN"]):.1f}', ' |',
'-',' |')
profit = -value(model.OBJ)
print('Overall profit: $',f"{profit:.1f}")
return profit
profit_perfect = print_opt_sol(model)
===Optimal solutions based on perfect information===
Culture. | Wheat | Corn | Sugar Beets |
Surface (acres) | 120.0 | 80.0 | 300.0 |
Yield (T) | 300.0 | 240.0 | 6000.0 |
Sales (T) | 100.0 | 0.0 | 6000.0 |
Purchases (T) | 0.0 | 0.0 | - |
Overall profit: $ 118600.0
The solution is straightforward and can be summarized as follows:
Prioritize Sugar Beets: Allocate enough land to sugar beet cultivation to produce exactly 6,000 tons, meeting the quota imposed by the European Commission. This ensures the farmer maximizes profits from sugar beet sales at the higher price.
Meet Feeding Requirements: Dedicate sufficient land to wheat and corn to fulfill the minimum feeding requirements of 200 tons and 240 tons, respectively. This ensures the cattle feed needs are covered without incurring the added expense of purchasing from a wholesaler.
Optimize Remaining Land: Use any remaining acreage for wheat production. This is the most profitable option for utilizing the leftover land, as surplus wheat can be sold at a competitive price. By following this strategy, the farmer maximizes the efficiency of land use while adhering to the constraints and optimizing profits.
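This strategy can be verified by hand from the problem data. The short check below (illustrative, not part of the original notebook) recomputes the reported profit from the optimal allocation:

```python
# Hand check of the perfect-information profit using the problem data:
planting_cost = 150 * 120 + 230 * 80 + 260 * 300   # wheat, corn, beet acreage costs
sales_revenue = 170 * 100 + 36 * 6000              # surplus wheat + quota beet sales
print(sales_revenue - planting_cost)               # 118600, matching the solver output
```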
Proposed Expansion#
We aim to enhance this project by addressing the inherent uncertainty in crop yields more comprehensively, empowering farmers to make more robust and realistic decisions. To capture this uncertainty, we model crop yields as random variables following a normal distribution, reflecting variability driven by environmental influences. This stochastic representation introduces the need to evaluate multi-dimensional integrals, as yield variability impacts optimization decisions across multiple crops and constraints.
To efficiently compute these integrals, we utilize two advanced numerical techniques: full grids and sparse grids, both of which leverage Gaussian quadrature to achieve high-accuracy integration. Expanding the project would allow us to explore different modeling approaches, further optimize computational efficiency, and develop scalable solutions for a wider range of agricultural applications under uncertainty.
An alternate approach involves assuming that crop yields across different crops are independent, allowing us to represent them as components of a continuous random vector. Under this formulation, land allocation decisions are treated as first-stage decisions, determined before yields are known. The optimization problem can then be structured using a two-stage stochastic programming framework, with the second stage capturing adjustments based on observed outcomes. Mathematically, the objective is expressed as:

$$\min_x \quad c^T x + E_{\xi}Q(x,\xi),$$

where \(c\) is the cost of initial planting and \(\xi = (\xi_1,\xi_2,\xi_3)\) is the scenario.
To solve this stochastic optimization problem efficiently, we employ Sample Average Approximation (SAA), a widely used statistical method in stochastic programming. SAA replaces the expected value in the stochastic objective function, \(E_{\xi}Q(x,\xi)\), with a sample average computed from a finite set of scenarios. This transforms the stochastic problem into a deterministic one, which is more tractable.
Looking ahead, we plan to extend the project by treating it as a constrained convex problem, which relates to the second part of this class.
Numerical Background#
Full grid
Full grid methods are a classical approach to numerical integration, where the computational domain is discretized into a Cartesian product of points. For each dimension, a predetermined set of quadrature points is used, with Gaussian quadrature being a particularly efficient choice. Gaussian quadrature selects points and weights to maximize the accuracy of integration for polynomial functions up to a certain degree, reducing the error significantly compared to simple methods like midpoint or trapezoidal rules. For one-dimensional integration, Gaussian quadrature efficiently computes:

$$\int_a^b f(x)\,dx \approx \sum_{i=1}^n w_i f(x_i),$$
where \(x_i\) are the quadrature points and \(w_i\) are the corresponding weights. When extended to \(d\)-dimensional problems, the full grid method involves the Cartesian product of these quadrature points across all dimensions. While highly accurate, the number of points required grows exponentially with the number of dimensions, leading to what is known as the curse of dimensionality. Specifically, the total number of points for \(d\) dimensions and \(n\) quadrature points per dimension is \(n^d\), making this approach computationally infeasible for high-dimensional problems.
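To make the one-dimensional rule concrete, the snippet below (illustrative, not part of the farmer model) integrates \(x^4\) on \([0,2]\) with a 3-point Gauss-Legendre rule, which is exact for polynomials up to degree \(2n-1 = 5\):

```python
import numpy as np

# A 3-point Gauss-Legendre rule is exact up to degree 5, so integrating
# f(x) = x**4 on [0, 2] should reproduce the analytic value 2**5 / 5 = 6.4.
a, b = 0.0, 2.0
pts, wts = np.polynomial.legendre.leggauss(3)   # nodes and weights on [-1, 1]
x = 0.5 * (b - a) * pts + 0.5 * (b + a)         # affine map of nodes to [a, b]
print(0.5 * (b - a) * np.sum(wts * x**4))       # ~6.4
```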
Sparse Grid
Sparse grids address the limitations of full grids by significantly reducing the number of points while maintaining comparable accuracy. The key idea is to focus on the most “informative” points in the integration domain, based on the hierarchical structure of the quadrature. Sparse grids achieve this by combining lower-dimensional full grids in a systematic way, effectively excluding redundant points.
The foundation of sparse grids lies in the Smolyak algorithm, which constructs a sparse grid approximation for a d-dimensional integral as:

$$\int_{[a,b]^d} f(x)\,dx \approx \sum_{|\mathbf{i}|<m} c_{\mathbf{i}} \prod_{k=1}^{d} \int_{a_k}^{b_k} f(x_k)\,dx_k,$$

where \(c_{\mathbf{i}}\) are coefficients determined by the hierarchical structure. By selectively including quadrature points based on their contribution to the approximation accuracy, sparse grids avoid the exponential growth in points observed with full grids. One of the primary advantages of sparse grids is their efficiency in handling high-dimensional integrals. However, sparse grids may exhibit slightly higher errors for functions with highly localized or non-smooth behavior due to the reduced number of points.
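For intuition on the growth that sparse grids are designed to avoid, the snippet below (illustrative) prints the full-grid point count \(n^d\) for a 7-point rule as the dimension increases:

```python
# Curse of dimensionality for full tensor grids: with n quadrature points per
# dimension, a full grid needs n**d points.
n = 7
for d in (1, 2, 3, 5, 10):
    print(f"d = {d:2d}: {n**d:,} full-grid points")
```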
Full Grid#
Unlike the classical deterministic farmer problem, where crop yields are assumed to be known with certainty and we have perfect information, real-world scenarios often involve significant uncertainty. In this model, we account for this uncertainty by assuming that the yields of crops (WHEAT, CORN, and BEETS) follow a normal distribution characterized by a mean and standard deviation. By incorporating stochastic yields, we can create a more robust and realistic decision-making framework.
To solve this problem, we employ a Full Grid Method using Gaussian quadrature to approximate the random variables. Gaussian quadrature allows us to efficiently represent the uncertainty in yields by selecting representative points (quadrature points) and associated weights, which capture the variability and likelihood of different yield outcomes. These points are combined to create scenarios that represent all possible combinations of crop yields, and scenario weights are calculated to reflect their probabilities.
This approach ensures that the model captures the complexity and uncertainty of agricultural planning while leveraging stochastic programming techniques to optimize planting decisions, purchasing strategies, and sales, given the variability in crop yields.
The choice to model crop yields with a normal distribution is grounded in practical and theoretical considerations. Crop yields are influenced by a multitude of factors, such as weather, soil quality, and pest outbreaks. Using a normal distribution to represent these uncertainties provides several advantages:
Central Limit Theorem (CLT): While the Central Limit Theorem (CLT) suggests that the sum of many independent random variables tends to approximate a normal distribution, real-world factors influencing crop yield—such as weather patterns and soil conditions—are often interdependent. Despite this, the normal distribution can still serve as a reasonable approximation for yield due to the aggregate effect of numerous contributing variables.
Simplicity and Analytical Convenience: The normal distribution is well understood and widely used in stochastic modeling
Tools like Gaussian quadrature can efficiently approximate expected values under normality, as was applied here using Hermite polynomials to generate points and weights.
Practicality: Many agricultural studies and datasets report crop yields in terms of means and standard deviations, aligning naturally with normal distribution assumptions.
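As a sanity check on this choice (a short illustration, not part of the original model), the same Gauss-Hermite change of variables used in the code below should reproduce the first two moments of a normal distribution:

```python
import numpy as np

# For X ~ N(mu, sigma**2), transformed Gauss-Hermite points and normalized
# weights should give sum(w) = 1, E[X] = mu, and E[X**2] = mu**2 + sigma**2.
mu, sigma = 3.0, 0.5                            # wheat mean and std from the model
pts, wts = np.polynomial.hermite.hermgauss(7)
x = pts * np.sqrt(2) * sigma + mu               # change of variables
w = wts / np.sqrt(np.pi)                        # normalize the weights
print(np.sum(w), np.sum(w * x), np.sum(w * x**2))   # ~1.0, ~3.0, ~9.25
```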
def build_model():
'''
Parameters:
-none
Return:
-farmer problem model
'''
model = ConcreteModel()
# Define sets
all_crops = ["WHEAT", "CORN", "BEETS"]
purchase_crops = ["WHEAT", "CORN"]
sell_crops = ["WHEAT", "CORN", "BEETS_FAVORABLE", "BEETS_UNFAVORABLE"]
# Define random variables (mean and standard deviation for yields)
yield_means = {"WHEAT": 3.0, "CORN": 3.6, "BEETS": 20.0} # Mean yields
yield_stds = {"WHEAT": 0.5, "CORN": 0.4, "BEETS": 1.0} # Standard deviations
# Use Gaussian quadrature points and weights
gauss_points, gauss_weights = {}, {}
num_points = 7 # Number of points for quadrature
for crop in all_crops:
points, weights = np.polynomial.hermite.hermgauss(num_points)
gauss_points[crop] = points * np.sqrt(2) * yield_stds[crop] + yield_means[crop]
gauss_weights[crop] = weights / np.sqrt(np.pi)
# Generate scenarios (Full Grid using product of points)
scenarios = list(product(*[gauss_points[crop] for crop in all_crops]))
scenario_weights = list(product(*[gauss_weights[crop] for crop in all_crops]))
scenario_weights = [np.prod(weights) for weights in scenario_weights]
# Define variables
model.X = Var(all_crops, within=NonNegativeReals) # Crops field allocation
model.Y = Var(purchase_crops, within=NonNegativeReals) # Tons of crops to purchase
model.W = Var(sell_crops, within=NonNegativeReals) # Tons of crops to sell
# Placeholder for scenario-based costs and revenues
model.ScenarioCosts = ComponentMap()
model.ScenarioRevenues = ComponentMap()
# Constraints for each scenario
model.CONSTR = ConstraintList()
for idx, scenario in enumerate(scenarios):
scenario_yields = {crop: scenario[i] for i, crop in enumerate(all_crops)}
scenario_weight = scenario_weights[idx]
# Calculate costs and revenues for this scenario
planting_cost = (
150 * model.X["WHEAT"] + 230 * model.X["CORN"] + 260 * model.X["BEETS"]
)
purchase_cost = 238 * model.Y["WHEAT"] + 210 * model.Y["CORN"]
sales_revenue = (
170 * model.W["WHEAT"] + 150 * model.W["CORN"]
+ 36 * model.W["BEETS_FAVORABLE"] + 10 * model.W["BEETS_UNFAVORABLE"]
)
# Store scenario contributions
model.ScenarioCosts[idx] = scenario_weight * (planting_cost + purchase_cost)
model.ScenarioRevenues[idx] = scenario_weight * sales_revenue
# Add constraints for this scenario
model.CONSTR.add(summation(model.X) <= 500)
model.CONSTR.add(
scenario_yields["WHEAT"] * model.X["WHEAT"] + model.Y["WHEAT"] - model.W["WHEAT"] >= 200
)
model.CONSTR.add(
scenario_yields["CORN"] * model.X["CORN"] + model.Y["CORN"] - model.W["CORN"] >= 240
)
model.CONSTR.add(
scenario_yields["BEETS"] * model.X["BEETS"] - model.W["BEETS_FAVORABLE"] - model.W["BEETS_UNFAVORABLE"] >= 0
)
# Set upper bound for BEETS_FAVORABLE
model.W["BEETS_FAVORABLE"].setub(6000)
# Objective function: Weighted expected cost minus revenue
model.OBJ = Objective(
expr=sum(model.ScenarioCosts[idx] - model.ScenarioRevenues[idx] for idx in range(len(scenarios))),
sense=minimize
)
return model
model = build_model()
solver = SolverFactory("ipopt")
solver.solve(model)
def print_opt_sol(model):
'''
Arguments:
model: solved farmer problem model
Return:
Prints the optimal solution
'''
print("=== Optimal solutions ===")
print()
# Table header
print("{:<15} | {:>8} | {:>8} | {:>12}".format("Culture", "Wheat", "Corn", "Sugar Beets"))
print("-" * 50)
# Surface (acres)
print("{:<15} | {:>8.1f} | {:>8.1f} | {:>12.1f}".format(
"Surface (acres)",
value(model.X["WHEAT"]),
value(model.X["CORN"]),
value(model.X["BEETS"])
))
# Yield (T)
print("{:<15} | {:>8.1f} | {:>8.1f} | {:>12.1f}".format(
"Yield (T)",
value(model.X["WHEAT"]) * yields_perfect[0],
value(model.X["CORN"]) * yields_perfect[1],
value(model.X["BEETS"]) * yields_perfect[2]
))
# Sales (T)
print("{:<15} | {:>8.1f} | {:>8.1f} | {:>12.1f}".format(
"Sales (T)",
value(model.W["WHEAT"]),
value(model.W["CORN"]),
value(model.W["BEETS_FAVORABLE"]) + value(model.W["BEETS_UNFAVORABLE"])
))
# Purchases (T)
print("{:<15} | {:>8.1f} | {:>8.1f} | {:>12}".format(
"Purchases (T)",
value(model.Y["WHEAT"]),
value(model.Y["CORN"]),
"-"
))
print("-" * 50)
# Profit
profit = -value(model.OBJ)
print("{:<15} {}".format("Overall profit:", f"${profit:.1f}"))
return profit
profit_perfect = print_opt_sol(model)
=== Optimal solutions ===
Culture | Wheat | Corn | Sugar Beets
--------------------------------------------------
Surface (acres) | 16.5 | 114.3 | 369.2
Yield (T) | 41.2 | 342.9 | 7384.8
Sales (T) | 0.0 | 0.0 | 6000.0
Purchases (T) | 181.5 | 0.0 | -
--------------------------------------------------
Overall profit: $48047.3
Out of the available land, 16.5 acres were allocated to wheat, 114.3 acres to corn, and 369.2 acres to sugar beets. This distribution indicates that sugar beets were identified as the most economically beneficial crop.
The yield results support this prioritization. Wheat and corn produced 41.2 tons and 342.9 tons, respectively (evaluated here at the deterministic yields used by the printing routine), while sugar beets yielded an impressive 7,384.8 tons. No sales were reported for wheat or corn; instead, 181.5 tons of wheat were purchased to cover the 200-ton feed requirement. In contrast, 6,000 tons of sugar beets were sold, making it evident that sugar beets were cultivated primarily for market sales and served as the primary revenue driver.
The overall profit achieved through this plan was $48,047, noticeably lower than the $118,600 obtained with perfect information. Because the purchase and sale decisions here must remain feasible across every yield scenario, the model hedges by relying on wheat purchases rather than on-farm wheat production.
In conclusion, the optimal plan demonstrates the importance of aligning crop selection with market dynamics and operational priorities. By prioritizing sugar beet production and focusing sales on the most profitable crop, the farm maximized its expected profit while maintaining resource efficiency.
Sparse Grid#
To solve this problem, we now employ a Sparse Grid Method using Gaussian quadrature to approximate the random variables. The Sparse Grid Method builds on Gaussian quadrature by efficiently constructing a multidimensional grid that reduces the number of representative points (quadrature points) while maintaining accuracy. This approach strategically combines points from lower-dimensional grids to approximate the uncertainty in yields with fewer computational resources than a full grid.
These quadrature points and their associated weights capture the variability and likelihood of different yield outcomes. The sparse grid approach generates a reduced set of scenarios spanning the combinations of crop yields in a computationally efficient manner, while scenario weights reflect the probabilities of these outcomes.
By leveraging the Sparse Grid Method, the model captures the complexity and uncertainty of agricultural planning while optimizing planting decisions, purchasing strategies, and sales under stochastic programming principles, ensuring a practical yet rigorous framework for handling variability in crop yields.
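For a sense of the savings, the implementation below defaults to a 3-point one-dimensional rule per crop, compared with the 7-point rule used by the full-grid model above; the illustrative count below compares the resulting scenario totals:

```python
# Scenario counts for the two discretizations in this notebook:
print("full grid   :", 7**3, "scenarios")   # 343
print("reduced grid:", 3**3, "scenarios")   # 27
```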
def sparse_grid_points_and_weights(d, m, all_crops, yield_means, yield_stds, num_points=3):
"""Construct sparse grid points and weights using Smolyak's algorithm.
Parameters:
- d
- m
- all_crops (list): List of all crops.
- yield_means (dict): Mean yields for each crop.
- yield_stds (dict): Standard deviations for each crop.
Returns:
- A tuple containing sparse grid points and weights.
"""
# Generate 1D Gaussian quadrature points and weights
points_1d, weights_1d = {}, {}
for crop in all_crops:
points, weights = np.polynomial.hermite.hermgauss(num_points)
points_1d[crop] = points * np.sqrt(2) * yield_stds[crop] + yield_means[crop]
weights_1d[crop] = weights / np.sqrt(np.pi)
# Combine 1D rules to form sparse grid
sparse_points = []
sparse_weights = []
index_set = list(range(num_points)) # Index sets for 1D quadrature
    for idx_comb in product(index_set, repeat=d):
        weight = np.prod([weights_1d[all_crops[k]][idx] for k, idx in enumerate(idx_comb)])
        point = tuple(points_1d[all_crops[k]][idx] for k, idx in enumerate(idx_comb))
        sparse_weights.append(weight)
        sparse_points.append(point)
return sparse_points, sparse_weights
def build_model():
'''
input: none
output: farmer problem model
'''
model = ConcreteModel()
# Define sets
all_crops = ["WHEAT", "CORN", "BEETS"]
purchase_crops = ["WHEAT", "CORN"]
sell_crops = ["WHEAT", "CORN", "BEETS_FAVORABLE", "BEETS_UNFAVORABLE"]
# Define random variables (mean and standard deviation for yields)
yield_means = {"WHEAT": 3.0, "CORN": 3.6, "BEETS": 20.0} # Mean yields
yield_stds = {"WHEAT": 0.5, "CORN": 0.4, "BEETS": 1.0} # Standard deviations
# Generate sparse grid scenarios and weights
sparse_points, sparse_weights = sparse_grid_points_and_weights(
d=len(all_crops), m=4, all_crops=all_crops, yield_means=yield_means, yield_stds=yield_stds
)
# Define variables
model.X = Var(all_crops, within=NonNegativeReals) # Crops field allocation
model.Y = Var(purchase_crops, within=NonNegativeReals) # Tons of crops to purchase
model.W = Var(sell_crops, within=NonNegativeReals) # Tons of crops to sell
# Placeholder for scenario-based costs and revenues
model.ScenarioCosts = {}
model.ScenarioRevenues = {}
# Constraints for each scenario
model.CONSTR = ConstraintList()
for idx, scenario in enumerate(sparse_points):
scenario_yields = {crop: scenario[i] for i, crop in enumerate(all_crops)}
scenario_weight = sparse_weights[idx]
# Calculate costs and revenues for this scenario
planting_cost = (
150 * model.X["WHEAT"] + 230 * model.X["CORN"] + 260 * model.X["BEETS"]
)
purchase_cost = 238 * model.Y["WHEAT"] + 210 * model.Y["CORN"]
sales_revenue = (
170 * model.W["WHEAT"] + 150 * model.W["CORN"]
+ 36 * model.W["BEETS_FAVORABLE"] + 10 * model.W["BEETS_UNFAVORABLE"]
)
# Store scenario contributions
model.ScenarioCosts[idx] = scenario_weight * (planting_cost + purchase_cost)
model.ScenarioRevenues[idx] = scenario_weight * sales_revenue
# Add constraints for this scenario
model.CONSTR.add(sum(model.X[crop] for crop in all_crops) <= 500)
model.CONSTR.add(
scenario_yields["WHEAT"] * model.X["WHEAT"] + model.Y["WHEAT"] - model.W["WHEAT"] >= 200
)
model.CONSTR.add(
scenario_yields["CORN"] * model.X["CORN"] + model.Y["CORN"] - model.W["CORN"] >= 240
)
model.CONSTR.add(
scenario_yields["BEETS"] * model.X["BEETS"] - model.W["BEETS_FAVORABLE"] - model.W["BEETS_UNFAVORABLE"] >= 0
)
# Set upper bound for BEETS_FAVORABLE
model.W["BEETS_FAVORABLE"].setub(6000)
# Objective function: Weighted expected cost minus revenue
model.OBJ = Objective(
expr=sum(model.ScenarioCosts[idx] - model.ScenarioRevenues[idx] for idx in range(len(sparse_points))),
sense=minimize
)
return model
model = build_model()
solver = SolverFactory("ipopt")
solver.solve(model)
profit_perfect = print_opt_sol(model)
=== Optimal solutions ===
Culture | Wheat | Corn | Sugar Beets
--------------------------------------------------
Surface (acres) | 89.0 | 82.6 | 328.4
Yield (T) | 222.5 | 247.7 | 6568.9
Sales (T) | 0.0 | 0.0 | 6000.0
Purchases (T) | 10.1 | 0.0 | -
--------------------------------------------------
Overall profit: $95869.5
Two-Stage Stochastic Programming#
Contrary to the assumption made in 4.1. Stochastic Programming, we may also assume that yields for the different crops are independent. In that case, we may as well consider a continuous random vector for the yields. Again, the decisions on land allocation are first-stage decisions because they are taken before knowledge of the yields. The second stage can again be described through \(E_{\xi}Q(x,\xi)\), where \(Q(x,\xi)\) is the value of the second stage for a given realization of the random vector. The optimization problem becomes:
$$\min_x \quad c^T x + E_{\xi}Q(x,\xi),$$

where \(c\) is the cost of initial planting and \(\xi = (\xi_1,\xi_2,\xi_3)\) is the scenario.
Now, in this particular example, the computation of \(Q(x,\xi)\) can be separated among the three crops due to independence of the random vector. We can then write:
$$E_{\xi}Q(x,\xi) = \sum_{i=1}^3 E_{\xi_i}Q_i(x_i,\xi_i),$$

where \(Q_i(x_i,\xi_i)\) is the optimal second-stage value of purchases and sales for crop \(i\).
The remaining question is how to compute \(E_{\xi_i}Q_i(x_i,\xi_i)\). This section introduces and implements the following methods to address this:
Sample Average Approximation
Exact Integral
To illustrate this, let us assume that the yield for each crop \(i\), denoted by \(t_i\), can be appropriately described by a uniform random variable on some range \([l_i,u_i]\). For the sake of comparison, we may take \(l_i\) to be 80% of the mean yield and \(u_i\) to be 120% of the mean yield, so that the expected yields match those in 4.1. Stochastic Programming (this will be needed later to calculate the value of the stochastic solution, VSS).
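For the exact-integral method, the expected recourse for a single crop can be integrated directly against the uniform yield density. The sketch below is an illustration under our own simplifying assumptions (the helper names are hypothetical, and the closed-form wheat recourse assumes it is optimal to buy at $238/T when short of the 200 T requirement and to sell surplus at $170/T):

```python
from scipy.integrate import quad

def Q_wheat(x, t):
    # Optimal second-stage value for wheat given acreage x and realized yield t:
    # buy the shortfall at 238 $/T, sell any surplus at 170 $/T.
    production = t * x
    return 238 * max(200 - production, 0) - 170 * max(production - 200, 0)

def expected_Q_wheat(x, l=2.0, u=3.0):
    # Yield t ~ U[l, u] with l = 0.8 * 2.5 and u = 1.2 * 2.5; density 1/(u - l).
    val, _ = quad(lambda t: Q_wheat(x, t), l, u)
    return val / (u - l)

print(expected_Q_wheat(120.0))   # expected second-stage value for 120 acres of wheat
```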
Sample Average Approximation (SAA)#
Sample Average Approximation (SAA) is a statistical method commonly used in stochastic programming and optimization to solve problems with uncertainty. The idea is to replace the expected values in a stochastic objective function (\(E_{\xi} Q(x,\xi)\), in our case) with sample averages, using a finite set of random samples (scenarios). This converts the original stochastic problem into a deterministic one, which is often easier to solve. Here is a basic breakdown of the steps involved:
Sample Generation: A finite number of random samples (or scenarios) are drawn from the probability distribution representing the uncertainty in the problem. These samples capture possible realizations of the uncertain parameters.
Objective Approximation: The expected objective function in the original stochastic problem is approximated by the sample average of the objective over these scenarios.
Deterministic Problem Formation: With the sample average objective function in place, the problem becomes deterministic and can be solved using standard optimization techniques.
Typically, a larger sample size improves the quality of the approximation but increases computational costs.
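To see the effect of sample size directly, the small illustrative experiment below estimates the mean of a uniform wheat yield on \([2.0, 3.0]\) for growing \(N\); the sample average settles toward the true mean of 2.5:

```python
import numpy as np

# SAA in miniature: sample averages of a uniform yield converge to its mean.
rng = np.random.default_rng(0)
for N in (10, 100, 1000, 10000):
    print(f"N = {N:5d}: sample mean = {rng.uniform(2.0, 3.0, N).mean():.4f}")
```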
In the farmers’ example, the SAA approach estimates \(E_{\xi_i}Q_i(x_i,\xi_i)\) as

$$E_{\xi_i}Q_i(x_i,\xi_i) \approx \frac{1}{N}\sum_{j=1}^{N} Q_i(x_i,\xi_{ij}),$$

where \(\xi_{i1}, \xi_{i2},\ldots,\xi_{iN}\) are independent samples of \(\xi_i\), and \(N\) is the number of scenarios chosen. Specifically, for a uniform random variable, each \(\xi_{ij}\) is drawn randomly from \(U[l_i,u_i]\).
Consequently, the optimization problem can be reformulated as the deterministic program

$$\min_x \quad c^T x + \frac{1}{N}\sum_{j=1}^{N} Q(x,\xi_j),$$

which the following code builds in Pyomo:
def SAA_uniform_model(N):
'''
Parameters:
    - N: Number of sampled yield scenarios.
Returns:
- A Pyomo ConcreteModel representing the SAA uniform model.
'''
# Define the data
all_crops = ["Wheat", "Corn", "SugarBeets"]
purchase_crops = ["Wheat", "Corn"]
sale_crops = ["Wheat", "Corn", "BEETS_FAVORABLE", "BEETS_UNFAVORABLE"]
land_capacity = 500
crop_costs = {"Wheat": 150, "Corn": 230, "SugarBeets": 260}
sale_price = {"Wheat": 170, "Corn": 150, "BEETS_FAVORABLE": 36, "BEETS_UNFAVORABLE": 10}
purchase_price = {"Wheat": 238, "Corn": 210}
min_require = {"Wheat": 200, "Corn": 240}
avg_yield = {"Wheat": 2.5, "Corn": 3.0, "SugarBeets": 20.0}
lower_yield = {crop: 0.8 * avg_yield[crop] for crop in all_crops}
upper_yield = {crop: 1.2 * avg_yield[crop] for crop in all_crops}
# Generate random yields for N scenarios
yields = {
crop: {n: np.random.uniform(lower_yield[crop], upper_yield[crop]) for n in range(N)}
for crop in all_crops
}
model = ConcreteModel()
model.x = Var(all_crops, domain=NonNegativeReals)
model.y = Var(purchase_crops, range(N), domain=NonNegativeReals)
model.w = Var(sale_crops, range(N), domain=NonNegativeReals)
# Parameter
model.yields = Param(
all_crops, range(N), initialize=lambda m, crop, n: yields[crop][n], within=Reals
)
# Objective function
def objective_rule(m):
first_stage_cost = sum(crop_costs[crop] * m.x[crop] for crop in all_crops)
second_stage_cost = (
(1 / N)
* sum(
-sale_price[sale] * m.w[sale, n] for sale in sale_crops for n in range(N)
)
+ (1 / N)
* sum(
purchase_price[purchase] * m.y[purchase, n]
for purchase in purchase_crops
for n in range(N)
)
)
return first_stage_cost + second_stage_cost
model.obj = Objective(rule=objective_rule, sense=minimize)
def land_constraint_rule(m):
return sum(m.x[crop] for crop in all_crops) <= land_capacity
model.land_constraint = Constraint(rule=land_constraint_rule)
def wheat_produce_rule(m,n):
return m.x["Wheat"]*m.yields["Wheat", n] + m.y["Wheat", n] - m.w["Wheat", n] >= min_require["Wheat"]
model.wheat_produce = Constraint(range(N), rule=wheat_produce_rule)
def corn_produce_rule(m,n):
return m.x["Corn"]*m.yields["Corn", n] + m.y["Corn", n] - m.w["Corn", n] >= min_require["Corn"]
model.corn_produce = Constraint(range(N), rule=corn_produce_rule)
def beets_produce_rule(m,n):
return m.w["BEETS_FAVORABLE", n] + m.w["BEETS_UNFAVORABLE", n] <= m.yields["SugarBeets", n]*m.x["SugarBeets"]
model.beets_produce = Constraint(range(N), rule=beets_produce_rule)
def beets_favorable_rule(m,n):
return m.w["BEETS_FAVORABLE", n] <= 6000
model.beets_favorable = Constraint(range(N), rule=beets_favorable_rule)
return model
# Example for N=10
model = SAA_uniform_model(10)
solver = SolverFactory("ipopt")
solver.solve(model,tee=True)
Ipopt 3.13.2:
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit http://projects.coin-or.org/Ipopt
This version of Ipopt was compiled from source code available at
https://github.com/IDAES/Ipopt as part of the Institute for the Design of
Advanced Energy Systems Process Systems Engineering Framework (IDAES PSE
Framework) Copyright (c) 2018-2019. See https://github.com/IDAES/idaes-pse.
This version of Ipopt was compiled using HSL, a collection of Fortran codes
for large-scale scientific computation. All technical papers, sales and
publicity material resulting from use of the HSL codes within IPOPT must
contain the following acknowledgement:
HSL, a collection of Fortran codes for large-scale scientific
computation. See http://www.hsl.rl.ac.uk.
******************************************************************************
This is Ipopt version 3.13.2, running with linear solver ma27.
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 103
Number of nonzeros in Lagrangian Hessian.............: 0
Total number of variables............................: 63
variables with only lower bounds: 63
variables with lower and upper bounds: 0
variables with only upper bounds: 0
Total number of equality constraints.................: 0
Total number of inequality constraints...............: 41
inequality constraints with only lower bounds: 20
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 21
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 7.2199928e+00 2.40e+02 6.01e+00 -1.0 0.00e+00 - 0.00e+00 0.00e+00 0
1 1.0961153e+02 2.39e+02 2.74e+01 -1.0 1.87e+02 - 4.58e-04 1.23e-02h 1
2 1.7737678e+02 2.39e+02 2.73e+01 -1.0 1.63e+02 - 2.06e-03 2.19e-03h 1
3 3.8485686e+02 2.37e+02 2.72e+01 -1.0 1.63e+02 - 1.03e-03 6.32e-03h 1
4 6.9645766e+02 2.35e+02 2.69e+01 -1.0 1.61e+02 - 3.41e-03 9.33e-03h 1
5 1.1190314e+03 2.32e+02 2.66e+01 -1.0 1.56e+02 - 8.44e-03 1.28e-02h 1
6 2.9315838e+03 2.19e+02 2.51e+01 -1.0 1.53e+02 - 3.53e-03 5.53e-02h 1
7 8.4633078e+03 1.79e+02 2.05e+01 -1.0 1.42e+02 - 1.13e-02 1.84e-01h 1
8 8.8850960e+03 1.74e+02 2.00e+01 -1.0 2.50e+02 - 2.03e-02 2.59e-02h 1
9 9.8115298e+03 1.66e+02 1.90e+01 -1.0 1.17e+02 - 9.64e-03 4.95e-02h 1
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
10 9.7998826e+03 1.65e+02 1.89e+01 -1.0 2.18e+02 - 2.04e-03 5.27e-03f 1
11 9.8824285e+03 1.64e+02 1.88e+01 -1.0 1.17e+02 - 1.59e-02 5.04e-03h 1
12 1.0069094e+04 1.56e+02 1.79e+01 -1.0 2.30e+02 - 3.73e-03 4.85e-02h 1
13 1.0063950e+04 1.56e+02 1.79e+01 -1.0 7.52e+02 - 1.21e-02 1.17e-04f 1
14 1.0873846e+04 1.47e+02 1.68e+01 -1.0 1.17e+02 - 3.86e-03 6.03e-02h 1
15 1.0642981e+04 1.46e+02 1.67e+01 -1.0 1.09e+03 - 1.51e-03 2.40e-03f 1
16 1.0675561e+04 1.46e+02 1.67e+01 -1.0 1.08e+02 - 3.66e-02 2.28e-03h 1
17 1.0959826e+04 1.36e+02 1.56e+01 -1.0 1.43e+02 - 3.87e-04 6.72e-02h 1
18 1.0803055e+04 1.36e+02 1.56e+01 -1.0 3.00e+03 - 1.09e-03 5.19e-04f 1
19 1.0395432e+04 1.36e+02 1.56e+01 -1.0 9.86e+03 - 5.02e-04 5.21e-04f 1
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
20 1.0394980e+04 1.36e+02 1.56e+01 -1.0 2.25e+02 - 3.73e-02 2.83e-04h 1
21 1.0467561e+04 1.34e+02 1.54e+01 -1.0 1.30e+02 - 4.00e-04 1.32e-02h 1
22 1.0471944e+04 1.34e+02 1.54e+01 -1.0 1.11e+02 - 1.74e-02 4.24e-04h 1
23 1.0502607e+04 1.33e+02 1.53e+01 -1.0 1.48e+02 - 5.79e-04 6.39e-03h 1
24 1.0623796e+04 1.32e+02 1.51e+01 -1.0 1.02e+02 - 2.32e-02 9.62e-03h 1
25 8.9957165e+03 1.05e+02 1.20e+01 -1.0 2.69e+02 - 2.48e-04 2.06e-01f 1
26 -9.2428407e+04 1.01e+02 1.16e+01 -1.0 3.06e+04 - 1.55e-03 3.65e-02f 1
27 -9.3192171e+04 1.00e+02 4.96e+01 -1.0 1.12e+03 - 5.15e-01 8.55e-03f 1
28 -7.7188992e+04 4.74e+01 6.51e+00 -1.0 1.92e+02 - 1.89e-01 5.27e-01h 1
29 -6.2991915e+04 0.00e+00 3.84e+01 -1.0 9.41e+01 - 5.72e-02 1.00e+00h 1
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
30 -7.9535160e+04 0.00e+00 3.79e+01 -1.0 2.08e+03 - 1.24e-02 1.00e+00f 1
31 -1.1030479e+05 0.00e+00 2.64e+01 -1.0 6.13e+03 - 3.05e-01 6.30e-01f 1
32 -1.1061324e+05 0.00e+00 1.82e+01 -1.0 2.75e+02 - 3.08e-01 1.40e-01f 1
33 -1.1076799e+05 0.00e+00 1.82e+01 -1.0 7.54e+01 - 3.36e-03 3.69e-01f 1
34 -1.1406306e+05 0.00e+00 9.01e-01 -1.0 1.08e+03 - 1.00e+00 5.70e-01f 1
35 -1.1409653e+05 0.00e+00 5.87e-01 -1.0 1.81e+01 - 6.16e-01 3.49e-01f 1
36 -1.1410890e+05 0.00e+00 3.15e-01 -1.0 9.49e+00 - 4.08e-01 4.63e-01f 1
37 -1.1412090e+05 0.00e+00 3.35e-01 -1.0 1.79e+01 - 2.17e-01 1.00e+00f 1
38 -1.1415302e+05 0.00e+00 8.53e-02 -1.0 6.38e+01 - 2.35e-01 7.43e-01f 1
39 -1.1415313e+05 0.00e+00 6.19e-04 -1.0 2.61e-01 - 9.96e-01 1.00e+00f 1
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
40 -1.1416825e+05 0.00e+00 3.71e-03 -2.5 2.52e-01 - 9.38e-01 9.66e-01f 1
41 -1.1416926e+05 0.00e+00 1.50e-09 -3.8 4.56e-01 - 1.00e+00 1.00e+00f 1
42 -1.1416928e+05 0.00e+00 1.85e-11 -5.7 8.08e-04 - 1.00e+00 1.00e+00f 1
43 -1.1416928e+05 0.00e+00 5.69e-14 -8.6 3.05e-05 - 1.00e+00 1.00e+00f 1
Number of Iterations....: 43
(scaled) (unscaled)
Objective...............: -4.3911262994527024e+04 -1.1416928378577025e+05
Dual infeasibility......: 5.6906757453431017e-14 1.4795756937892062e-13
Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity.........: 2.5383208668605393e-09 6.5996342538374021e-09
Overall NLP error.......: 2.5383208668605393e-09 6.5996342538374021e-09
Number of objective function evaluations = 44
Number of objective gradient evaluations = 44
Number of equality constraint evaluations = 0
Number of inequality constraint evaluations = 44
Number of equality constraint Jacobian evaluations = 0
Number of inequality constraint Jacobian evaluations = 44
Number of Lagrangian Hessian evaluations = 43
Total CPU secs in IPOPT (w/o function evaluations) = 0.021
Total CPU secs in NLP function evaluations = 0.001
EXIT: Optimal Solution Found.
{'Problem': [{'Lower bound': -inf, 'Upper bound': inf, 'Number of objectives': 1, 'Number of constraints': 41, 'Number of variables': 63, 'Sense': 'unknown'}], 'Solver': [{'Status': 'ok', 'Message': 'Ipopt 3.13.2\\x3a Optimal Solution Found', 'Termination condition': 'optimal', 'Id': 0, 'Error rc': 0, 'Time': 0.09827899932861328}], 'Solution': [OrderedDict([('number of solutions', 0), ('number of solutions displayed', 0)])]}
# Define a function to process and display crop results
def display_land_allocation_results(model, crops, units="acres", decimals=2):
"""
Display land allocation results with rounding and units.
Parameters:
- model: The Pyomo model containing decision variables and the objective function.
- crops: List of crop names to extract results for.
    - units: Units for land allocation (default is 'acres').
- decimals: Number of decimal places to round the results to (default is 2).
"""
# Extract and round the results
crop_results = {
crop: round(model.x[crop].value, decimals) for crop in crops
}
# Convert results to a DataFrame
crop_results_df = pd.DataFrame.from_dict(crop_results, orient="index", columns=["Land Allocation"])
crop_results_df["Units"] = units
# Display the results
print("Land Allocation Results:\n", crop_results_df)
# Calculate and display the profit
profit = -round(model.obj(), decimals)
print(f"Total profit: {profit} (dollars)")
# Example usage
crops = ["Wheat", "Corn", "SugarBeets"]
display_land_allocation_results(model, crops, units="acres", decimals=2)
Land Allocation Results:
Land Allocation Units
Wheat 131.62 acres
Corn 88.04 acres
SugarBeets 280.34 acres
Total profit: 114169.28 (dollars)
Statistical Inference#
When solving the optimization problem using the Sample Average Approximation (SAA) method, it is common to observe variability in the results across different samples, even if the sample size remains fixed. This variability arises due to the randomness inherent in the sampling process.
def solve_and_collect_results(N, N_samples, crops, solver_name="ipopt", units="acres", decimals=2):
"""
Solve the SAA_uniform_model multiple times and collect results.
Parameters:
    - N: Number of sampled yield scenarios per model instance.
- N_samples: Number of samples to solve.
- crops: List of crops to extract results for.
- solver_name: Name of the solver to use
    - units: Units for land allocation (default is 'acres').
- decimals: Number of decimal places to round results
Returns:
- A pandas DataFrame containing results for all samples.
"""
results_data = []
solver = SolverFactory(solver_name)
results_data2 = []
for i in range(N_samples):
model = SAA_uniform_model(N) # Create a new model instance
solver.solve(model) # Solve the model
# Collect results for each crop
crop_results = {
crop: round(model.x[crop].value, decimals) for crop in crops
}
# Calculate profit
profit = -round(model.obj(), decimals)
# Append the results for this sample
results_data.append({
"Sample": i + 1,
**{crop: f"{crop_results[crop]} {units}" for crop in crops}, # Include units
"Expected Profit": f"{profit} Dollars"
})
results_data2.append({
"Sample": i + 1,
"Wheat": crop_results["Wheat"],
"Corn": crop_results["Corn"],
"SugarBeets": crop_results["SugarBeets"],
"Expected Profit": profit
})
# Convert to a DataFrame
return pd.DataFrame(results_data),results_data2
# Example usage
N = 400
N_samples = 10
crops = ["Wheat", "Corn", "SugarBeets"]
results_df, results_data = solve_and_collect_results(N, N_samples, crops, solver_name="ipopt", units="acres", decimals=2)
# Display the results
print(results_df)
Sample Wheat Corn SugarBeets Expected Profit
0 1 134.58 acres 85.14 acres 280.28 acres 111191.8 Dollars
1 2 135.2 acres 84.62 acres 280.18 acres 110561.38 Dollars
2 3 133.42 acres 85.4 acres 281.18 acres 110005.75 Dollars
3 4 135.52 acres 85.86 acres 278.62 acres 110623.41 Dollars
4 5 137.89 acres 84.59 acres 277.52 acres 113029.91 Dollars
5 6 135.35 acres 84.91 acres 279.74 acres 111859.47 Dollars
6 7 131.91 acres 84.57 acres 283.52 acres 110146.44 Dollars
7 8 136.85 acres 85.52 acres 277.63 acres 111394.62 Dollars
8 9 139.89 acres 84.92 acres 275.19 acres 113085.93 Dollars
9 10 134.66 acres 85.18 acres 280.16 acres 110039.1 Dollars
To better understand and analyze this variability, we rely on statistical measures. These measures allow us to summarize the outcomes of multiple samples and quantify key characteristics such as:
The sample mean measures the central tendency of the dataset. Given a set of results for expected profits \(X = \left\{ x_1, x_2, ... , x_n \right\}\), the sample mean is defined as: \(\overline{x}= \frac{1}{n} \sum_{i=1}^n x_i\).
The sample variance quantifies the spread of the data around the mean: \(S^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2\). (The term n-1 is used in the denominator to account for Bessel’s correction, which corrects bias in the estimation of the population variance.)
The sample standard deviation is the square root of the variance: \(S=\sqrt{S^2}\).
To estimate the true mean of the population from the sample mean, we construct a confidence interval. The formula for a two-tailed confidence interval is:

$$\overline{x} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}},$$

where \(t_{\alpha/2,n-1}\) is the t-critical value for the given confidence level and degrees of freedom \(n-1\).
The margin of error is \(ME = t_{\alpha/2,\,n-1}\,S/\sqrt{n}\), and the confidence interval is then \(\left[\overline{x} - ME,\ \overline{x} + ME\right]\).
These statistical tools provide insight into the stability and reliability of the SAA method and allow us to draw meaningful conclusions based on the results.
from statistics import NormalDist

def t_critical_approx(alpha, df):
    '''
    Approximate the two-tailed t-critical value for a given alpha and degrees of freedom.
    Parameters:
    - alpha: Significance level (1 - confidence level).
    - df: Degrees of freedom.
    Returns:
    - The approximate t-critical value.
    '''
    # Start from the standard normal quantile and apply a first-order expansion
    # of the t quantile in 1/df; adequate for moderate degrees of freedom.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z + (z**3 + z) / (4 * df)
# Sample Mean
expected_profits = [d["Expected Profit"] for d in results_data]
sample_mean = sum(expected_profits) / len(expected_profits)
print(f"Sample Mean: {sample_mean}")
# Sample Variance & Standard Deviation
sample_variance = sum((x - sample_mean) ** 2 for x in expected_profits) / (len(expected_profits) - 1)
sample_standard_deviation = sample_variance ** 0.5
print(f"Sample Variance: {sample_variance}")
print(f"Sample Standard Deviation: {sample_standard_deviation}")
# Confidence Interval
confidence_level = 0.95
alpha = 1 - confidence_level
n = len(expected_profits)
t_critical = t_critical_approx(alpha, n-1) # t-critical value for 2-tailed test
margin_of_error = t_critical * (sample_standard_deviation / n**0.5)
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error
print(f"Confidence Interval ({confidence_level * 100}%): [{lower_bound}, {upper_bound}]")
Sample Mean: 111193.781
Sample Variance: 1333551.417476662
Sample Standard Deviation: 1154.7949677222628
Confidence Interval (95.0%): [110804.75248349282, 111582.80951650719]
Conclusions#
In the end, we were able to:
Formulate and solve optimization problems using real-world examples that incorporate uncertainty and stochasticity.
Develop proficiency in Python and Pyomo for modeling and solving complex optimization problems.
Understand and implement full-grid and sparse-grid discretization in stochastic optimization frameworks, including their respective advantages and applications.
Gain a deeper understanding of two-stage stochastic optimization, focusing on its application to agricultural systems under uncertainty.
Interpret and analyze optimization results, drawing meaningful conclusions about resource allocation and decision-making in uncertain environments.
A major takeaway from this work is the recognition that alternative numerical methods, such as the sparse-grid approach, can effectively address the “curse of dimensionality” associated with full-grid methods. By accounting for the reality that perfect information is rarely available, we demonstrated how modeling uncertainty using Gaussian distributions leads to more realistic scenarios. Finally, this project highlighted the significance of two-stage stochastic optimization in tackling real-world problems, emphasizing its role in enabling robust and informed decision-making under uncertainty.
References#
Birge, J. R., & Louveaux, F. (2011). Introduction to Stochastic Programming (2nd ed.). Springer.
Shapiro, A., Dentcheva, D., & Ruszczyński, A. (2014). Lectures on Stochastic Programming: Modeling and Theory (2nd ed.). SIAM.
Van der Laan, M. J., & Rubin, D. B. (2006). Statistical Models and Causal Inference: A Dialogue with the Social Sciences. Springer.
Chow, Y., & Pavone, M. (2013). A Uniform-Grid Discretization Algorithm for Stochastic Optimal Control with Risk Constraints. Proceedings of the IEEE Conference on Decision and Control.
Chen, M., Mehrotra, S., & Papp, D. (2015). Scenario Generation for Stochastic Optimization Problems via the Sparse Grid Method. Computational Optimization and Applications, 62, 669–692.
Kleywegt, A. J., Shapiro, A., & Homem-de-Mello, T. (2002). The Sample Average Approximation Method for Stochastic Discrete Optimization. SIAM Journal on Optimization, 12(2), 479–502.