The z table gives detailed correspondences of P(Z > z) for values of z from 0 to 3. The (one-tailed) probabilities are inside the table, and the critical values of z are in the first column and top row. The t-table is presented differently, with separate rows for each df, with columns representing the two-tailed probability, and with the critical value in the inside of the table. The t-table also provides much less detail; all the information in the z-table is summarized in the last row of the t-table, indexed by df = ∞. So, if we look at the last row for z = 1.96 and follow up to the top row, we find that the two-tailed probability is 0.05, that is, P(|Z| > 1.96) = 0.05.

Exercise: What is the critical value associated with a two-tailed probability of 0.01?

Now, suppose that we want to know the probability that Z is more extreme than 2.00. In the last row of the t-table, 2.00 falls between the critical values for two-tailed probabilities of 0.05 (1.96) and 0.02 (2.326). So, all we can say is that P(|Z| > 2.00) is between 2% and 5%, probably closer to 5%! Using the z-table, we found that it was exactly 4.56%.

In the previous example we drew a sample of n=16 from a population with μ=20 and σ=5. We found that the probability that the sample mean is greater than 22 is P(x̄ > 22) = 0.0548. Suppose that σ is unknown and we need to use s to estimate it. Then we calculate t, which follows a t-distribution with df = (n-1) = 24. From the tables we see that the two-tailed probability is between 0.01 and 0.05. To obtain the one-tailed probability, divide the two-tailed probability by 2. So the probability that the sample mean is greater than 22 is between 0.005 and 0.025 (or between 0.5% and 2.5%).

The spread of a distribution is also referred to as dispersion and variability. All three terms mean the extent to which values in a distribution differ from one another.

The standard deviation (SD) is the most commonly used measure of the spread of values in a distribution. SD is calculated as the square root of the variance (the average squared deviation from the mean). Variance is usually estimated from a sample drawn from a population. The unbiased estimate of population variance calculated from a sample is:

s² = Σ(xᵢ − x̄)² / (n − 1)

SD is the best measure of spread of an approximately normal distribution. This is not the case when there are extreme values in a distribution or when the distribution is skewed; in these situations the interquartile range or semi-interquartile range are preferred measures of spread. Interquartile range is the difference between the 25th and 75th centiles. Semi-interquartile range is half of the difference between the 25th and 75th centiles. For any symmetrical (not skewed) distribution, half of its values will lie within one semi-interquartile range either side of the median, i.e. in the interval median ± semi-interquartile range. When distributions are approximately normal, SD is a better measure of spread because it is less susceptible to sampling fluctuation than the (semi-)interquartile range.

If a variable y is a linear transformation of x (y = a + bx), then the variance of y is b² times the variance of x and the standard deviation of y is |b| times the standard deviation of x.

A sampling distribution is the distribution of a statistic (such as the mean) that we calculate from a sample. For a sample proportion, the standard error is:

SE_p = sqrt( p(1 − p) / n )

where p is the proportion of successes in the sample and n is the number of observations in the sample.
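The n − 1 variance estimate and the linear-transformation rule for variance and SD can be checked numerically. The sketch below is only an illustration: the data values, the constants a and b, and the helper name unbiased_variance are assumptions made up for the example, not taken from the text.

```python
import math
import statistics


def unbiased_variance(values):
    """Sample variance using the n - 1 denominator (unbiased estimate)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Made-up illustrative data and arbitrary constants (assumptions).
x = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
a, b = 3.0, -2.0
y = [a + b * v for v in x]  # linear transformation y = a + bx

var_x = unbiased_variance(x)
var_y = unbiased_variance(y)
sd_x = math.sqrt(var_x)
sd_y = math.sqrt(var_y)

# The hand-written estimate agrees with the standard library's n - 1 version.
assert math.isclose(var_x, statistics.variance(x))

print("variance of x:", var_x, " variance of y:", var_y)
print("SD of x:", sd_x, " SD of y:", sd_y)
print("b^2 * var_x:", b ** 2 * var_x, " |b| * sd_x:", abs(b) * sd_x)
```

Because b is negative here, the example also shows why the SD scales by |b| rather than b: a standard deviation cannot be negative.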
The statistic used to estimate the mean of a population, μ, is the sample mean, x̄. If X has a distribution with mean μ and standard deviation σ, and X is approximately normally distributed or n is large, then x̄ is approximately normally distributed with mean μ and standard error σ/√n.

If the standard deviation, σ, is known, we can transform x̄ to an approximately standard normal variable, Z:

Z = (x̄ − μ) / (σ/√n)

From the previous example, μ=20 and σ=5. Suppose we draw a sample of size n=16 from this population and want to know how likely we are to see a sample average greater than 22, that is, P(x̄ > 22). Here Z = (22 − 20) / (5/√16) = 1.6, so the probability that the sample mean will be > 22 is the probability that Z is > 1.6. We use the Z table to determine this: P(x̄ > 22) = P(Z > 1.6) = 0.0548.

Exercise: Suppose we were to select a sample of size 49 in the example above instead of n=16. How will this affect the standard error of the mean? How do you think this will affect the probability that the sample mean will be > 22? Use the Z table to determine the probability.

If the standard deviation, σ, is unknown, we cannot transform to standard normal. However, we can estimate σ using the sample standard deviation, s, and transform to a variable with a similar distribution, the t distribution. There are actually many t distributions, indexed by degrees of freedom (df). As the degrees of freedom increase, the t distribution approaches the standard normal distribution.

If X is approximately normally distributed, then

t = (x̄ − μ) / (s/√n)

has a t distribution with (n-1) degrees of freedom (df).

Note: If n is large, then t is approximately normally distributed.
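The example with μ=20, σ=5 and n=16 can be reproduced without a printed Z table. This is a minimal sketch that assumes σ is known and uses Python's standard-library NormalDist in place of the table; the function name prob_sample_mean_exceeds is hypothetical, chosen for illustration.

```python
import math
from statistics import NormalDist


def prob_sample_mean_exceeds(mu, sigma, n, threshold):
    """Return (z, P(sample mean > threshold)) when sigma is known."""
    standard_error = sigma / math.sqrt(n)     # sigma / sqrt(n)
    z = (threshold - mu) / standard_error     # standardize the threshold
    upper_tail = 1.0 - NormalDist().cdf(z)    # one-tailed P(Z > z)
    return z, upper_tail


z, p = prob_sample_mean_exceeds(mu=20, sigma=5, n=16, threshold=22)
print(f"Z = {z:.2f}, P(sample mean > 22) = {p:.4f}")
# prints: Z = 1.60, P(sample mean > 22) = 0.0548
```

Calling the same function with a different n shows how the standard error, and with it the tail probability, changes with sample size.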