Z and T-tests from Scratch

The Z and T-tests are both used for hypothesis testing. With a quick block of code it is easy to generate results from these tests, but what is going on behind the scenes? In this blog, I will discuss the math and assumptions behind each of these tests. To keep the focus on the math, I will assume you understand the basics of running a hypothesis test.

One-Sample Z-Test

A Z-test is used when we want to measure if a sample comes from a specified population. In other words, is the sample different from what we expect from a given population?

When we conduct a Z-test we first need a sample. In a Z-test, our sample is generally large (greater than or equal to 30). We also need to ensure that our sample is randomly selected and that the selection process is independent. The other thing we need to know in the Z-test is the population's standard deviation.

Once the above assumptions are met, we can start running the Z-test. To begin we calculate the Z-statistic. The Z-statistic is the standardization of your sample's mean to the standard normal distribution. As a reminder, a standard normal distribution has a mean of 0 and a standard deviation of 1. So if your Z-statistic is 0.75, your sample mean is 0.75 standard errors (the standard deviation of the sample mean) above the population mean. The formula to calculate your Z-statistic is:

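$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Calculate the Z score for a sample, where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation, and $n$ is the sample size.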
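As a rough sketch, the Z-statistic can be computed in plain Python by dividing the gap between the sample mean and the population mean by the population standard deviation over the square root of the sample size. The function name `z_statistic` and the data below are made up for illustration.

```python
import math

def z_statistic(sample, pop_mean, pop_std):
    """Standardize the sample mean using a known population mean and standard deviation."""
    n = len(sample)
    sample_mean = sum(sample) / n
    # Standard error: how much the sample mean is expected to vary for samples of size n.
    standard_error = pop_std / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical data: 36 observations (six values repeated), invented for this example.
sample = [102, 98, 105, 110, 99, 101] * 6
z = z_statistic(sample, pop_mean=100, pop_std=15)
print(z)
```

Here the sample mean is 102.5 and the standard error is 15 / 6 = 2.5, so the sample mean sits one standard error above the population mean.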

Once you have the Z-statistic, we look at the Z table. A Z table lists the area under the standard normal curve to the left of a given Z value. To use the table, find your Z-statistic, which gives you a decimal value that leads to your p-value. To find your p-value we need to take into account the alternative hypothesis. If your alternative hypothesis states:

  • the sample mean is greater than the population mean: subtract this value from 1 to get the p-value
  • the sample mean is less than the population mean: use this value as your p-value
  • the sample mean is not equal to the population mean: use whichever of the two methods above gives the smaller value (the first for a positive Z-statistic, the second for a negative one) and multiply by two. Alternatively, you can skip the multiplication of the p-value and halve your alpha.
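The three cases above can be sketched in code. Instead of a printed Z table, this example computes the left-tail area with the error function from Python's `math` module. The names `normal_cdf` and `z_p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are my own choices for illustration.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z (what a Z table lists)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_p_value(z, alternative):
    """Turn a Z-statistic into a p-value for the given alternative hypothesis."""
    left_area = normal_cdf(z)
    if alternative == "greater":    # sample mean > population mean
        return 1.0 - left_area
    if alternative == "less":       # sample mean < population mean
        return left_area
    if alternative == "two-sided":  # sample mean != population mean
        return 2.0 * min(left_area, 1.0 - left_area)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# A Z-statistic of 1.96 is the familiar cutoff for a two-sided test at alpha = 0.05.
print(z_p_value(1.96, "two-sided"))
```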

If your p-value is less than your alpha value, then you can reject the null hypothesis and state that there is a statistically significant difference. If not, you fail to reject the null hypothesis and cannot say that the sample is different from the population.
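Putting the steps together, a two-sided one-sample Z-test might look like the sketch below. It uses `NormalDist` from Python's standard `statistics` module in place of a Z table. The function name `one_sample_z_test`, the default alpha of 0.05, and the data are assumptions made for this example.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, pop_mean, pop_std, alpha=0.05):
    """Two-sided one-sample Z-test; returns (z, p_value, reject_null)."""
    z = (mean(sample) - pop_mean) / (pop_std / math.sqrt(len(sample)))
    left_area = NormalDist().cdf(z)              # area to the left of z
    p_value = 2 * min(left_area, 1 - left_area)  # double the smaller tail
    return z, p_value, p_value < alpha

# Hypothetical data: 36 observations with a sample mean of 106.
sample = [108, 112, 97, 105, 110, 104] * 6
z, p_value, reject_null = one_sample_z_test(sample, pop_mean=100, pop_std=15)
print(z, p_value, reject_null)
```

With these made-up numbers the Z-statistic is 2.4 and the p-value falls below 0.05, so the null hypothesis is rejected.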

One-Sample T-Test

In general, a T-test measures whether two groups are different from one another. In the one-sample case, one of these groups is the population: we compare the sample mean to a hypothesized population mean, and we use a T-test in place of a Z-test when the population's standard deviation is unknown.

There are a lot of similar assumptions to the Z-test. The sample must be randomly and independently selected, and drawn from a normally distributed population. The values should also be numeric and continuous. The sample size does not necessarily have to be large.

We then calculate the T-statistic. This looks very similar to the Z-statistic except for the denominator, in which we use the sample's standard deviation instead of the population's. The other number we need to calculate is the degrees of freedom, which is n - 1.

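$$T = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad df = n - 1$$

Here $s$ is the sample standard deviation, $\bar{x}$ is the sample mean, $\mu$ is the population mean, and $n$ is the sample size.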
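A minimal sketch of the T-statistic and degrees of freedom in plain Python follows. The sample standard deviation here divides by n - 1, which is the usual convention for a sample. The function name `t_statistic` and the data are made up for illustration.

```python
import math

def t_statistic(sample, pop_mean):
    """Return (t, degrees_of_freedom) for a one-sample T-test."""
    n = len(sample)
    sample_mean = sum(sample) / n
    # Sample variance divides by n - 1 rather than n.
    variance = sum((x - sample_mean) ** 2 for x in sample) / (n - 1)
    sample_std = math.sqrt(variance)
    t = (sample_mean - pop_mean) / (sample_std / math.sqrt(n))
    return t, n - 1

# Hypothetical small sample of 8 observations, tested against a population mean of 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t, df = t_statistic(sample, pop_mean=4)
print(t, df)
```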

We then use a T distribution table to find the critical value. The table requires your alpha value and your degrees of freedom. This table gives the area under one tail, so for a two-tailed test look up alpha divided by two. If your T-statistic (in absolute value, for a two-tailed test) is less than this critical value, then you fail to reject the null hypothesis. If it is greater than the critical value, then you can reject the null hypothesis and conclude there is a significant difference between the sample and the population.
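For readers without a T table at hand, the critical value can be approximated from scratch. The sketch below is one possible approach, not the only one: it integrates the T distribution's density with Simpson's rule to get the upper-tail area, then bisects to find the value whose tail area matches alpha. All function names, the step count, and the example numbers are assumptions for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=2000):
    """Area to the right of t >= 0: 0.5 minus the Simpson's-rule integral from 0 to t."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df, two_sided=True):
    """Approximate the critical value that a T table would list."""
    tail = alpha / 2 if two_sided else alpha
    lo, hi = 0.0, 1.0
    while upper_tail_area(hi, df) > tail:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):                    # bisection on the tail area
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical test: a T-statistic of 1.323 with 7 degrees of freedom, alpha = 0.05.
t_stat, df, alpha = 1.323, 7, 0.05
critical = t_critical(alpha, df)
reject_null = abs(t_stat) > critical
print(critical, reject_null)
```

For 7 degrees of freedom and a two-tailed alpha of 0.05, a standard T table lists a critical value of about 2.365, so the hypothetical T-statistic of 1.323 fails to reject the null hypothesis.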

Conclusion

The Z and T-tests are important tests to measure if a difference between a sample and a population is significant. While the formulas are similar, the choice of test depends on the sample size and on whether you know the population standard deviation. If you know the population standard deviation and the sample size is large enough, then we want to use a Z-test. If we don't know the population standard deviation, we want to use a T-test. Additionally, regardless of the status of the population standard deviation, if our sample size is small we want to use a T-test.
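The decision rule above can be written as a small helper. The function name `choose_test` and the cutoff of 30 (taken from the sample-size guideline earlier in this post) are illustrative assumptions.

```python
def choose_test(pop_std_known, sample_size, large_sample_cutoff=30):
    """Pick a test following the rule of thumb described in this post."""
    if pop_std_known and sample_size >= large_sample_cutoff:
        return "z-test"
    # Unknown population standard deviation, or a small sample: use a T-test.
    return "t-test"

print(choose_test(pop_std_known=True, sample_size=50))
print(choose_test(pop_std_known=False, sample_size=50))
print(choose_test(pop_std_known=True, sample_size=12))
```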
