Simulation
Note: This assignment must be submitted in github classroom.
Warming Up
Most statistical software packages have several built-in distributions to handle the most common situations statisticians come across when simulating values.
Remember to set your seeds for these exercises so that you get the same answer each time the code is evaluated.
Exponential Distribution
Simulate 1000 draws from the exponential distribution \[f(x) = \lambda e^{-\lambda x},\;\;\;\; x\geq0, \text{ with } \lambda=3\]
Create a density plot of your values. Add the PDF of the exponential(3) distribution to your plot in a different color. How do they compare?
How might you use your 1000 draws to estimate the probability that \(x > 1\)? Provide an estimate using only your 1000 draws, and compare this estimate to one derived from
pexp(R) orscipy.stats.expon.cdf(Python).What is the average error from using a \(n=1000\) simulation instead of a numerical calculation for the probability that \(x>1\)? Write a function to calculate the error from one simulation, and evaluate that function 100 times to calculate the expected error from using a simulation estimate rather than deterministic numerical integration.
Gamma Distribution
The Gamma distribution is an exponential family, so it has MLEs that are unbiased and efficient estimators, if somewhat challenging to calculate in closed form.
Luckily, there are functions in R and python to handle this:
fitdistr(...)in theMASSpackage (R) will estimate the parameters for a gamma distribution from sample data.scipy.statsdistribution functions have a.fit(data)method that will estimate distributional parameters
In this problem, you will work with the gamma distribution with parameters \(\alpha, \beta\) (note that R uses a different parameterization than Python) and PDF
\[f(x | \alpha, \beta) = \frac{\beta^\alpha x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}.\]
Write a function that will calculate the MLE \(\hat\alpha,\hat\beta\) from a vector \(x_{samp}\) of gamma-distributed samples. Your function should have required parameter \(x\) and optional parameters \(\alpha,\beta\) representing the true values. The function should return a list of \(\hat\alpha, \hat\beta\) and, if \(\alpha,\beta\) are provided, it should also return \(\alpha_{err} = \alpha-\hat\alpha\) and \(\beta_{err} = \beta-\hat\beta\).
For a grid of \(\alpha \times \beta=\{0.25, 0.50, 1.00, 2.00, 4.00\}\times\{0.25, 0.50, 1.00, 2.00, 4.00\}\), sample 100 values from the \(\text{gamma}(\alpha,\beta)\) distribution. Provide the true parameter values and plot your error values in an appropriate plot. Take care to choose your plot mappings to support your discussion of the following: What trends (if any) do you notice in the estimation error?
Now, let’s vary the sample size. For \(n=\{20, 30, 50, 100\}\), use your function to evaluate the error for \(\alpha=1, \beta=2\). How does the error change as \(n\) increases? Provide an appropriate plot as well as a description of the changes for increasing \(n\).