Welcome to Vigges Developer Community - Open, Learning, Share

r - Manual calculation - instrumental variable with a tobit distribution in the 2nd stage, different results with robust errors

Cross Posted at CrossValidated.

I am trying to correct my standard errors when using OLS in the first stage and a tobit model in the second. For some reason, I get different estimates after the correction, and I cannot figure out why.

A couple of things to make clear. In this example, the coefficient on the instrumented variable is only about 0.05 off (-2.204 versus -2.155). In my actual data, it goes from 14% to 22%. I also see that the intercept and the logSigma are very different. I am not sure to what extent that matters, but I thought I should point it out.

The Data

set.seed(2)

a    <- 2    # structural parameter of interest
b    <- 1    # strength of instrument
rho  <- 0.5  # degree of endogeneity

N    <- 1000
z    <- rnorm(N)
res1 <- rnorm(N)
res2 <- res1*rho + sqrt(1-rho*rho)*rnorm(N)
x    <- z*b + res1
ys   <- x*a + res2
d    <- (ys>0) #dummy variable
y    <- round(10-(d*ys))
random_variable <- rnorm(100, mean = 0, sd = 1)

library(data.table)
DT_1 <- data.frame(y,x,z, random_variable)
DT_2 <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 
45, 46, 47, 48, 49, 50), year = c(1995, 1995, 1995, 1995, 1995, 
1995, 1995, 1995, 1995, 1995, 2000, 2000, 2000, 2000, 2000, 2000, 
2000, 2000, 2000, 2000, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 
2010, 2010, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015), Group = c("A", "A", "A", "A", "B", "B", "B", "B", "C", 
"C", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "A", "A", 
"A", "A", "B", "B", "B", "B", "C", "C", "A", "A", "A", "A", "B", 
"B", "B", "B", "C", "C", "A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C"), event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), win_or_lose = c(-1, 
-1, -1, -1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, -1, 1, 1, 1, 1, 0, 0, 
-1, -1, -1, -1, 1, 1, 1, 1, 0, 0)), row.names = c(NA, -50L), class = c("tbl_df", 
"tbl", "data.frame"))
DT_1 <- setDT(DT_1)
DT_2 <- setDT(DT_2)
DT_2 <- rbind(DT_2 , DT_2 [rep(1:50, 19), ])
sandboxA <- cbind(DT_1, DT_2)
sandboxB <- cbind(DT_1, DT_2)
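Since the second stage below is a tobit with limits 0 and 10, it may help to check how much of y actually sits at those limits. The following is a minimal sketch that re-creates y with the same seed and parameters as above and reports the shares; the names share_at_upper and share_at_lower are introduced here only for illustration.

```r
# Re-create the outcome from the simulation above (same seed, same draw order)
set.seed(2)
a <- 2; b <- 1; rho <- 0.5
N    <- 1000
z    <- rnorm(N)
res1 <- rnorm(N)
res2 <- res1 * rho + sqrt(1 - rho * rho) * rnorm(N)
x    <- z * b + res1
ys   <- x * a + res2
d    <- (ys > 0)
y    <- round(10 - (d * ys))

# Share of observations at or beyond each censoring limit used later (0 and 10)
share_at_upper <- mean(y >= 10)
share_at_lower <- mean(y <= 0)
c(upper = share_at_upper, lower = share_at_lower, below_zero = mean(y < 0))
```

A large share at the upper limit and very few observations at the lower one would suggest that almost all of the censoring in this example comes from right = 10.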

The Regression

library(AER)
library(censReg)

# First stage: OLS of the endogenous regressor x on the instrument z and the controls
first_stage_ols <- lm(x ~ z + random_variable + year, data=sandboxA)

# Keep the first-stage fitted values as a plain numeric vector and attach them to the data
yhat <- as.numeric(fitted(first_stage_ols))
dataset <- cbind(sandboxA, yhat)

# Second stage: tobit of y on the fitted values, censored at 0 and 10,
# estimated once with AER::tobit and once with censReg for comparison
form_2st_yhat <- as.formula("y ~ yhat + random_variable + year")
second_stage_tobit <- AER::tobit(form_2st_yhat, left=0, right=10, data=dataset, na.action = na.exclude)
second_stage_tobit_b <- censReg(form_2st_yhat, left=0, right=10, data=dataset)
summary(second_stage_tobit)
summary(second_stage_tobit_b)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     33.34972   31.49314   1.059    0.290    
yhat            -2.20394    0.12052 -18.287   <2e-16 ***
random_variable -0.03412    0.11147  -0.306    0.760    
year            -0.01146    0.01571  -0.730    0.466    
Log(scale)       1.08955    0.03628  30.035   <2e-16 ***

                Estimate Std. error t value Pr(> t)    
(Intercept)     33.34972   31.49313   1.059   0.290    
yhat            -2.20394    0.12052 -18.287  <2e-16 ***
random_variable -0.03412    0.11147  -0.306   0.760    
year            -0.01146    0.01571  -0.730   0.466    
logSigma         1.08955    0.03628  30.035  <2e-16 ***

Correcting Standard Errors (Link)

reduced.form <- lm(x ~ z + random_variable + year, data=sandboxB)
summary(reduced.form)
    
consistent.tobit <- censReg(y~fitted(reduced.form)+residuals(reduced.form), left=0, right=10, data=sandboxB)
summary(consistent.tobit)
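One thing that may account for part of the gap, independent of the standard errors, is that the two second-stage models are not the same specification. second_stage_tobit regresses y on yhat, random_variable and year, while consistent.tobit regresses y on the first-stage fitted values and residuals and leaves out random_variable and year. In a tobit, adding the residuals as a regressor generally shifts the other coefficients and shrinks sigma. The sketch below illustrates this on a small simulated data set; the data frame dat, the columns xhat and vhat, the clamping of y to [0, 10] and the seed are assumptions made for the illustration, not part of the code above.

```r
library(censReg)

# Simulated data in the spirit of the question (hypothetical names and seed)
set.seed(1)
N     <- 1000
z     <- rnorm(N)
res1  <- rnorm(N)
res2  <- 0.5 * res1 + sqrt(1 - 0.5^2) * rnorm(N)
x     <- z + res1                      # endogenous regressor
ystar <- 10 - (2 * x + res2)           # latent outcome
dat   <- data.frame(y = pmin(pmax(ystar, 0), 10), x = x, z = z)

# First stage: keep both fitted values and residuals
fs <- lm(x ~ z, data = dat)
dat$xhat <- fitted(fs)
dat$vhat <- residuals(fs)

# (1) plug-in second stage: fitted values only
plug_in <- censReg(y ~ xhat, left = 0, right = 10, data = dat)
# (2) control-function second stage: fitted values plus first-stage residuals
ctrl_fn <- censReg(y ~ xhat + vhat, left = 0, right = 10, data = dat)

# Side-by-side point estimates of the shared parameters
shared <- c("(Intercept)", "xhat", "logSigma")
cbind(plug_in = plug_in$estimate[shared], ctrl_fn = ctrl_fn$estimate[shared])
```

If the two columns differ here as well, the difference in the question would be expected from the change in regressors alone, before any correction of the standard errors.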


FUN <- function(x) {
  reduced.form <- lm(x ~ z + random_variable + year, data=x)
  censReg(y ~ fitted(reduced.form) + residuals(reduced.form))$estimate
}

library(censReg)
set.seed(42)
R <- 200
res <- t(replicate(R, FUN(sandboxB[sample(nrow(sandboxB), nrow(sandboxB), replace=TRUE), ])))

library(matrixStats)
b <- consistent.tobit$estimate
SE <- colSds(res)
z <- consistent.tobit$estimate/SE
p <- 2 * pt(-abs(z), df = Inf)
ci <- colQuantiles(res, probs=c(.025, .975))
res <- signif(cbind(b, SE, z, p, ci), 4)
res

                               b        SE       z p     2.5%  97.5%
(Intercept)             10.26000 0.0055910 1835.00 0  8.54300 8.5690
fitted(reduced.form)    -2.15500 0.0560100  -38.48 0 -0.09655 0.1241
residuals(reduced.form) -2.71400 0.0689700  -39.35 0 -0.12450 0.1522
logSigma                 0.05015 0.0009665   51.88 0  0.72270 0.7259
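In the table above, the percentile intervals do not bracket the point estimates (for example, the intercept is 10.26 while its interval is 8.543 to 8.569). A possible reason is that the censReg call inside FUN has no data, left or right arguments, so it would pick up y from the global environment rather than from the resampled rows and would use censReg's default limits. The following is a minimal sketch of a bootstrap in which both stages use only the resampled data; the function two_step, the data frame dat, the columns xhat and vhat, and the small R are assumptions for illustration.

```r
library(censReg)

# Illustrative data (hypothetical setup, not the question's data)
set.seed(1)
N     <- 500
z     <- rnorm(N)
res1  <- rnorm(N)
res2  <- 0.5 * res1 + sqrt(1 - 0.5^2) * rnorm(N)
x     <- z + res1
ystar <- 10 - (2 * x + res2)
dat   <- data.frame(y = pmin(pmax(ystar, 0), 10), x = x, z = z)

# One complete two-step fit; every variable is taken from the data set `d`
two_step <- function(d) {
  fs <- lm(x ~ z, data = d)
  d$xhat <- fitted(fs)
  d$vhat <- residuals(fs)
  censReg(y ~ xhat + vhat, left = 0, right = 10, data = d)$estimate
}

b <- two_step(dat)                     # point estimates on the full sample

set.seed(42)
R <- 50                                # kept small here; the question uses 200
boot <- t(replicate(R, two_step(dat[sample(nrow(dat), replace = TRUE), ])))

SE   <- apply(boot, 2, sd)             # bootstrap standard errors
zval <- b / SE
p    <- 2 * pnorm(-abs(zval))
ci   <- t(apply(boot, 2, quantile, probs = c(0.025, 0.975)))
signif(cbind(b, SE, z = zval, p, ci), 4)
```

Because the first stage is re-estimated inside every replicate, the bootstrap standard errors also reflect the sampling variation of the generated regressors, which is the point of the correction.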
