Autopsy of NHST and Bayesian Models (part 1)
Chapter 2 of Statistical Rethinking: A Bayesian Course with Examples in R and Stan introduces a water tossing example for the demonstration how Bayesain model run through the data based on the researcher’s hypothesis. This chapter defines a three stage process that are used in the coming chapters. We start from a narrated Data Story, then update the models by filling data in, and finally evaluate all the upated models. This literated process shows a clear picture for the learners who have yet stuied statistics before read this book. The readers who have studied statistics, like me, will have a hole in the mind what are the differences between Bayesian methods and the null hypothesis significance testing (NHST) in this process. After read Michael Clark’s Bayesian Basics: A Conceptual Introduction with Application in R and Stan on Stan official site, I require the key to understand the difference between the two golems.
The critical difference is which type of conditional probability the statistical method is used to evaluate the model. Once we collected the data based on some hypothesis, we have the distribution of hypothesis
Here is a pseudo experiment I want to know if a test is success based on the expect value smaller than 5 (Obersvations are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10). I completed an experiment of 25 observations and an experiment 36 observations. In use of NHST with a critical value (p < .05), the criterion change with the sample size. The simulation
set.seed(1)
OBV <- 1:10
Dist25 <- NULL
Dist36 <- NULL
count = 100
while(count > 0){Dist25 <- c(Dist25,mean(sample(OBV, 25,replace = TRUE) ) ); count <- count - 1}
Dist25_Density <- data.frame(Theta = density(Dist25, kernel = "gaussian")$x, Density = density(Dist25, kernel = "gaussian")$y)
CL25 <- max(Dist25_Density$Theta[Dist25_Density$Theta < 5 & Dist25_Density$Density < .05])
print(CL25) ## The smallest parameter that could make judgement
## [1] 4.286263
count = 100
while(count > 0){Dist36 <- c(Dist36,mean(sample(OBV, 36,replace = TRUE) ) ); count <- count - 1}
Dist36_Density <- data.frame(Theta = density(Dist36, kernel = "gaussian")$x, Density = density(Dist36, kernel = "gaussian")$y)
CL36 <- max(Dist36_Density$Theta[Dist36_Density$Theta < 5 & Dist36_Density$Density < .05])
print(CL36) ## The smallest parameter that could make judgement
## [1] 4.345472
When this experiment outputs a avarage value between 4.15 and 4.42, would you like to collect the data more than 36 participants? This is the opportunity many researchers have to struggle in their study. Researchers who are educated as like me used to collect the observation till the p value is smaller than .05. Most researchers used to stop the experiment till the 36th participant finished the experiment. However, this treatment lacks of the serious scientific thinking. If the hypothesis has set up the conditions to have the average, the sample size should be appointed before the start of experiment.
The appointment of sample size could be loose when the possible average is uncertain. When the experiment is conducted at first time in the world, NHST without the appointment of sample size could draw the range of averages that might obey the hypothesis. On the other hand, the following studies better has the appointment of sample size. In this case, NHST has to accompany the other statistical evaluation, such as power, effect size, to find the appropriate sample size. Therefore, NHST will perform well when the researcher conducted the first study or have the preparation as well as the first study. This will be the key in my autopsy of Bayesian method.
!登入個人github帳號就能留言!