MLE_thetahat is a random variable, Fisher information, Cramer-Rao Notes

Dec 3, 2023

2024-04-04 update

MLE_theta_hat is a RV because X is a RV. One realization of the data gives one point estimate.
It's OK that we only get one realization of MLE_theta. Theoretically, we are studying how MLE_theta would behave if we repeated the experiment, so the quantity of interest is E(MLE_theta). In the example below (the mean of a normal distribution), bias(MLE_theta) = E(MLE_theta - true theta) = E(MLE_theta) - true theta = 0, so we are comfortable with the MLE_theta from a single realization: on average, the estimator recovers the true theta. Note that an MLE is not unbiased in general (for example, the MLE of a normal variance is biased); under regularity conditions it is consistent and asymptotically unbiased.
See mle.R in the static/R folder.

2024-04-11 update

Fisher information is defined from the log-likelihood of theta for a single observation. Because X is a RV, the log-likelihood function changes with X: a different X gives a different log-likelihood curve, and hence a different score (the first derivative with respect to theta) at every theta. Fisher information is the expectation of (score)^2 over X, with theta held fixed (i.e., theta is treated as a constant). We can therefore compute the Fisher information at different values of theta, for example FI(theta=1) and FI(theta=2). Observed information, by contrast, is computed from one realization of the data: it is the negative second derivative of the log-likelihood at the observed sample.
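As a rough check of the "expectation of (score)^2 at a fixed theta" idea, here is a minimal Monte Carlo sketch. It assumes a Poisson(theta) model, which is not part of the notes above and is chosen only because its Fisher information, 1/theta, changes with theta. The helper names score_pois and fisher_info_mc are made up for this illustration.

```r
set.seed(1)

## Assumption for illustration: X ~ Poisson(theta).
## log f(x; theta) = x * log(theta) - theta - log(x!)
## so the score of one observation is x / theta - 1
score_pois <- function(x, theta) x / theta - 1

## Monte Carlo estimate of FI(theta) = E[score^2], with theta held fixed
fisher_info_mc <- function(theta, n_draws = 1e5) {
  x <- rpois(n_draws, lambda = theta)  ## X is the only random part
  mean(score_pois(x, theta)^2)
}

fi_1 <- fisher_info_mc(1)  ## theory: 1 / theta = 1
fi_2 <- fisher_info_mc(2)  ## theory: 1 / theta = 0.5
c(FI_theta_1 = fi_1, FI_theta_2 = fi_2)
```

The two estimates should land close to 1 and 0.5, up to simulation noise. For the normal-mean model with known variance 1, the same calculation gives FI = 1 at every theta.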
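The Cramer-Rao inequality gives a lower bound on the variance of any unbiased estimator: with n iid observations, Var(theta_hat) >= 1 / (n * FI(theta)). Under regularity conditions the MLE attains this bound asymptotically, which is why it is called asymptotically efficient.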
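The bound can be checked by simulation. The sketch below assumes the normal-mean setting with known variance 1, where the MLE of mu is the sample mean and the Fisher information of one observation is 1, so the Cramer-Rao bound is 1/n. The names mle_draws, crlb, and emp_var are chosen for this illustration only.

```r
set.seed(2)
mu <- 1       ## true mean
n <- 100      ## sample size per experiment
n_rep <- 5000 ## number of repeated experiments

## For N(mu, 1) the MLE of mu is the sample mean
mle_draws <- replicate(n_rep, mean(rnorm(n, mean = mu)))

## Fisher information of one N(mu, 1) observation is 1,
## so the Cramer-Rao lower bound is 1 / (n * 1)
crlb <- 1 / (n * 1)
emp_var <- var(mle_draws)

c(empirical_variance = emp_var, cramer_rao_bound = crlb)
```

In this model the sample mean attains the bound exactly, so the empirical variance should sit near 0.01 and differ from it only by simulation noise.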

set.seed(1017)
## population mu =1, one sample x
mu = 1; n =100; x = rnorm(n,mean = mu)
library(dplyr)
## mu candidates
mu_list =seq(-2,2,0.01)

## this is the likelihood value when observed data = x
likelihood_value <- sapply(mu_list,FUN = function(mu_cand){
  ### likelihood value under different mu
  dnorm(x, ## this is observed data
        mean= mu_cand ## x-axis
        ) %>%
    prod() ## x1, x2, ...xn are iid, so multiply all pdf to get the total likelihood,
})


likelihood_dat <- data.frame(
  theta = mu_list,
  likelihood_value =likelihood_value
)

## theta is mu
MLE_theta <- likelihood_dat$theta[which.max(likelihood_dat$likelihood_value)]

library(ggplot2)

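## plot the likelihood curve for this one sample
likelihood_dat %>%
  ggplot(aes(x = theta,y = likelihood_value))+
  geom_point(size=0.5)+
  theme_bw() ## theme_bw is a function and must be called with ()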

# This plot shows the likelihood, and hence MLE_theta, under **one**
# experiment/realization (one random sample X1, X2, ..., Xn).
# Since X is a RV, repeating the random sampling process
# gives a distribution of MLE_theta. This is very important:
# in practice we only get one sample (one row of X1, X2, ..., Xn),
# i.e., a single draw from the distribution of MLE_theta.



## Let's repeat the experiment to
## get the  distribution of MLE_theta
set.seed(0629)
MLE_thetas <- sapply(1:1000,FUN = function(i){
  x <- rnorm(n,mean = mu) ## sample from the true population
  ## calculate the likelihood = prod of pdf under different mu_cand
  likelihood_values <- sapply(mu_list,
                              FUN = function(mu_cand){

    dnorm(x,mean = mu_cand) %>% prod()
  })

  ## which mu_cand has the max. likelihood?

  mu_list[which.max(likelihood_values)]

})

## we repeated the experiment 1000 times, so we now have
## an empirical approximation of the distribution of MLE_theta

density(MLE_thetas) %>% plot()

## here the MLE of a normal mean is unbiased (not true of every MLE),
## so on average, E(MLE_theta) = true theta (bias=0)
mean(MLE_thetas)

#> mean(MLE_thetas)
# [1] 0.99903
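One practical caveat with the grid search above: multiplying n densities works at n = 100, but the product underflows to zero for larger samples. A common alternative is to sum log-densities instead. The sketch below is an illustration under that assumption, not the original mle.R; it uses n = 5000, and the names lik, loglik, and mle_grid are chosen here.

```r
set.seed(1017)
mu <- 1
n <- 5000 ## much larger than the n = 100 used above
x <- rnorm(n, mean = mu)
mu_list <- seq(-2, 2, 0.01)

## product of densities: underflows to 0 at this sample size
lik <- sapply(mu_list, function(mu_cand) prod(dnorm(x, mean = mu_cand)))

## sum of log-densities: stays finite, and has the same argmax
## in exact arithmetic
loglik <- sapply(mu_list, function(mu_cand) {
  sum(dnorm(x, mean = mu_cand, log = TRUE))
})

all(lik == 0) ## the raw likelihood carries no information here
mle_grid <- mu_list[which.max(loglik)]
c(grid_mle = mle_grid, sample_mean = mean(x))
```

Because the normal log-likelihood is quadratic in mu_cand and peaks at the sample mean, the grid MLE is the grid point nearest to mean(x).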

MLE, Fisher information, and Cramer-Rao notes: Reference 1 and Reference 2