Probability and Hypothesis Testing
Reference: Probability and Hypothesis Testing. No running head please.
Author: (Jackson, S. L. (2017). Statistics plain and simple. (4th ed.). Boston, MA: Cengage Learning.) Please use this reference.
Question to be discussed: Discuss, elaborate on, and give an example for the topic or question below.
****Define and share an example of a null hypothesis and an alternative hypothesis****
Probability and the Standard Normal Distribution
Critical Thinking Check Answers
Module 8: Hypothesis Testing and Inferential Statistics
Null and Alternative Hypotheses
Two-Tailed and One-Tailed Hypothesis Tests
Type I and Type II Errors in Hypothesis Testing
Probability, Statistical Significance, and Errors
Critical Thinking Check Answers
In this chapter you will be introduced to the concepts of probability and hypothesis testing. Probability is the study of likelihood and uncertainty. Most decisions that we make are probabilistic in nature. Thus, probability plays a critical role in most professions and in our everyday decisions. We will discuss basic probability concepts along with how to compute probabilities and the use of the standard normal curve in making probabilistic decisions.
probability The study of likelihood and uncertainty; the number of ways a particular outcome can occur, divided by the total number of outcomes.
Hypothesis testing is the process of determining whether a hypothesis is supported by the results of a research project. Our introduction to hypothesis testing will include a discussion of the null and alternative hypotheses, Type I and Type II errors, and one- and two-tailed tests of hypotheses as well as an introduction to statistical significance and probability as they relate to inferential statistics.
hypothesis testing The process of determining whether a hypothesis is supported by the results of a research study.
MODULE 7
Learning Objectives
•Understand how probability is used in everyday life.
•Know how to compute a probability.
•Understand and be able to apply the multiplication rule.
•Understand and be able to apply the addition rule.
•Understand the relationship between the standard normal curve and probability.
In order to better understand the nature of probabilistic decisions, consider the following court case of The People v. Collins, 1968. In this case, the robbery victim was unable to identify his assailant. All that the victim could recall was that the assailant was female with a blonde pony tail. In addition, he remembered that she fled the scene in a yellow convertible that was driven by an African American male who had a full beard. The suspect in the case fit the description given by the victim, so the question was “Could the jury be sure, beyond a reasonable doubt, that the woman on trial was the robber?” The evidence against her was as follows: She was blonde and often wore her hair in a pony tail; her codefendant friend was an African American male with a moustache, beard, and a yellow convertible. The attorney for the defense stressed the fact that the victim could not identify this woman as the woman who robbed him, and that therefore there should be reasonable doubt on the part of the jury.
The prosecutor, on the other hand, called an expert in probability theory who testified to the following: The probability of all of the above conditions (being blonde and often having a pony tail and having an African American male friend and his having a full beard, and his owning a yellow convertible) co-occurring when these characteristics are independent was 1 in 12 million. The expert further testified that the combination of characteristics was so unusual that the jury could in fact be certain “beyond a reasonable doubt” that the woman was the robber. The jury returned a verdict of “guilty” (Arkes & Hammond, 1986; Halpern, 1996).
As can be seen in the previous example, the legal system operates on probability and recognizes that we can never be absolutely certain when deciding whether an individual is guilty. Thus, the standard of “beyond a reasonable doubt” was established and jurors base their decisions on probability, whether they realize it or not. Most decisions that we make on a daily basis are, in fact, based on probabilities. Diagnoses made by doctors, verdicts produced by juries, decisions made by business executives regarding expansion and what products to carry, decisions regarding whether individuals are admitted to colleges, and most everyday decisions all involve using probability. In addition, all games of chance (for example, cards, horse racing, the stock market) involve probability.
If you think about it, there is very little in life that is certain. Therefore, most of our decisions are probabilistic and having a better understanding of probability will help you with those decisions. In addition, because probability also plays an important role in science, that is another important reason for us to have an understanding of it. As we will see in later modules, the laws of probability are critical in the interpretation of research findings.
Probability refers to the number of ways a particular outcome (event) can occur divided by the total number of outcomes (events). (Please note that the words outcome and event will be used interchangeably in this module.) Probabilities are often presented or expressed as proportions. Proportions vary between 0.0 and 1.0, where a probability of 0.0 means the event certainly will not occur and a probability of 1.0 means that the event is certain to occur. Thus, any probability between 0.0 and 1.0 represents an event with some degree of uncertainty to it. How much uncertainty depends on the exact probability with which we are dealing. For example, a probability close to 0.0 represents an event that is almost certain not to occur, and a probability close to 1.0 represents an event that is almost certain to occur. On the other hand, a probability of .50 represents maximum uncertainty. In addition, keep in mind that probabilities tell us about the likelihood of events in the long run, not the short run.
Let’s start with a simplistic example of probability. What is the probability of getting a “head” when tossing a coin? In this example, we have to consider how many ways there are to get a “head” on a coin toss (there is only one way, the coin lands heads up) and how many possible outcomes there are (there are two possible outcomes, either a “head” or a “tail”). So, the probability of a “head” in a coin toss is:
$$p(\text{head}) = \frac{\text{Number of ways to get a head}}{\text{Number of possible outcomes}} = \frac{1}{2} = .50$$
This means that in the long run, we can expect a coin to land heads up 50% of the time.
Let’s consider some other examples. How likely would it be for an individual to roll a 2 in one roll of a die? Once again, let’s put this into basic probability terms. There is only one way to roll a 2, the die lands with the 2 side up. How many possible outcomes are there in a single roll of a die? There are six possible outcomes (any number between 1 and 6 could appear on the die). Hence, the probability of rolling a 2 on a single roll of a die would be 1/6, or about .17. Representing this in a formula as we did for the previous example:
$$p(2) = \frac{\text{Number of ways to get a 2}}{\text{Number of possible outcomes}} = \frac{1}{6} = .17$$
Let’s make it a little more difficult. What is the probability of rolling an odd number in a single roll of a die? Well, there are three odd numbers on any single die (1, 3, and 5). Thus, there are three ways that an odd number can occur. Once again, how many possible outcomes are there in a single roll of a die? Six (any number between 1 and 6). Therefore, the probability of rolling an odd number on a single roll is 3/6, or .50. Represented as a formula this would be:
$$p(\text{odd number}) = \frac{\text{Number of ways to get an odd number}}{\text{Number of possible outcomes}} = \frac{3}{6} = .50$$
What if I asked you what the probability of rolling a single-digit number is in a single roll of a die? A die has six numbers on it, and each is a single-digit number. Thus, there are six ways to get a single-digit number. How many possible outcomes are there in a single roll of a die? Once again, six. Hence, the probability of rolling a single-digit number is 6/6, or 1.0. If someone asked you to place a bet on this occurring, you could not lose on this bet! Once again, as a formula this would be:
$$p(\text{single-digit number}) = \frac{\text{Number of ways to get a single-digit number}}{\text{Number of possible outcomes}} = \frac{6}{6} = 1.0$$
Now that you have a basic idea of where probabilities come from, let’s talk a little bit more about how we use probabilities. Keep in mind that probabilities tell us something about what will happen in the long run. Therefore, when we think about using some of the probabilities that we just calculated, we have to think about using them in the long run. For example, we determined that the probability of rolling a 2 on a single roll of a die was .17. This means that over many rolls of the die, it will land with the 2 side up about 17% of the time. We cannot predict what will happen on any single roll of the die, but over many rolls of the die, we will roll a 2 with a probability of .17. This means that with a very large number of trials, we can predict with great accuracy what proportion of the rolls will end up as 2s. However, we cannot predict which particular rolls will yield a 2. So when we think about using probabilities, we need to think about using them for predictions in the long run, not the short run.
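The coin and die calculations above can be checked with a few lines of code. This is only a minimal sketch: the helper name `classical_probability` and the way outcomes are listed are assumptions made for illustration, not something taken from the text.

```python
from fractions import Fraction

def classical_probability(is_event, outcomes):
    """Number of ways the event can occur divided by the total number of outcomes."""
    outcomes = list(outcomes)
    ways = sum(1 for outcome in outcomes if is_event(outcome))
    return Fraction(ways, len(outcomes))

coin = ["head", "tail"]   # two equally likely outcomes
die = range(1, 7)         # six equally likely outcomes: 1 through 6

p_head = classical_probability(lambda o: o == "head", coin)
p_two = classical_probability(lambda o: o == 2, die)
p_odd = classical_probability(lambda o: o % 2 == 1, die)
p_single_digit = classical_probability(lambda o: o < 10, die)

# Print each probability as a fraction and as a proportion rounded to two places.
for label, p in [("head", p_head), ("2", p_two),
                 ("odd number", p_odd), ("single-digit number", p_single_digit)]:
    print(f"p({label}) = {p} = {float(p):.2f}")
```

Exact fractions are used so that 1/6 is rounded to .17 only when it is displayed, which mirrors how the module reports it.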
1.What is the probability of pulling a king from a standard (52-card) deck of playing cards?
2.What is the probability of pulling a spade from a standard deck of playing cards?
3.What is the probability of rolling an even number on a single roll of a die?
4.Imagine that you have a bag that contains 4 black poker chips and 7 red poker chips. What is the probability of pulling a black poker chip from the bag?
Often we are concerned with the probability of two or more events occurring and not just the probability of a single event occurring. For example, what is the probability of rolling at least one 4 in two rolls of a die, or what is the probability of getting two tails in two flips of a coin?
FIGURE 7.1 Tree diagram of possible coin toss outcomes
Let’s use the coin-toss example to determine the probability of two tails occurring in two flips of a coin. Based on what we discussed in the previous section, we know that the probability of a tail on one flip of a coin is 1/2, or .50. The same is true for the second toss: the probability of a tail is 1/2, or .50. However, let’s think about the possible outcomes for two tosses of a coin. One outcome is a head on the first toss and a head on the second toss (HH). The other outcomes would be a head followed by a tail (HT), a tail followed by a head (TH), and a tail followed by a tail (TT). These four possible outcomes are illustrated in the tree diagram in Figure 7.1.
Notice that the probability of two tails or any one of the other three possible outcomes is 1/4, or .25. But how are these probabilities calculated? The general rule that we apply here is known as the multiplication rule, or the and rule. When the events are independent and we want to know the probability of one event “and” another event, we use this rule.
The multiplication rule says that the probability of a series of outcomes occurring on successive trials is the product of their individual probabilities, when the events are independent (do not impact one another). Thus, when using the multiplication rule, we multiply the probability of the first event by the probability of the second event. Therefore, for the present problem, the probability of a tail in the first toss is 1/2, or .50, and the probability of a tail in the second toss is 1/2, or .50. When we multiply these two probabilities, we have .50 × .50 = .25. This should make some sense to you because the probability of both events occurring should be less than that of either event alone. We can represent the problem as follows:
multiplication rule A probability rule stating that the probability of a series of outcomes occurring on successive trials is the product of their individual probabilities, when the sequence of outcomes is independent.
$$p(\text{tail on first toss and tail on second toss}) = p(\text{tail on first toss}) \times p(\text{tail on second toss})$$
Let’s try another example. Assuming that the probabilities of having a girl and having a boy are both .50 for single-child births, what is the probability that a couple planning a family of three children would have the children in the following order: girl, girl, boy?
You can see in the tree diagram in Figure 7.2 that the probability of girl, girl, boy is .125. Let’s use the and rule to double-check this probability. The probability of a girl as the first child is 1/2, or .50. The same is true for the probability of a girl as the second child (.50) and the probability of a boy as the third child (.50). In order to determine the probability of this sequence of births, we multiply: .50 × .50 × .50 = .125.
FIGURE 7.2 Tree diagram of possible birth orders
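As a rough check on the multiplication rule, the sketch below multiplies the individual probabilities and then compares the result with a direct count of every equally likely sequence, which is what the tree diagrams lay out. The function name `sequence_probability` is a hypothetical helper, and the .50 probabilities are the ones assumed in the examples above.

```python
from fractions import Fraction
from itertools import product

def sequence_probability(sequence, p_each):
    """Multiplication (and) rule: multiply the probabilities of independent outcomes."""
    result = Fraction(1)
    for outcome in sequence:
        result *= p_each[outcome]
    return result

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
birth = {"girl": Fraction(1, 2), "boy": Fraction(1, 2)}

p_two_tails = sequence_probability(("T", "T"), coin)
p_girl_girl_boy = sequence_probability(("girl", "girl", "boy"), birth)

# Cross-check by listing every equally likely sequence, as a tree diagram does.
two_tosses = list(product(coin, repeat=2))      # HH, HT, TH, TT
three_births = list(product(birth, repeat=3))   # all possible orders of three births

count_two_tails = Fraction(two_tosses.count(("T", "T")), len(two_tosses))
count_ggb = Fraction(three_births.count(("girl", "girl", "boy")), len(three_births))

print(float(p_two_tails), float(count_two_tails))
print(float(p_girl_girl_boy), float(count_ggb))
```

The two approaches agree only because every branch of the tree is equally likely and the trials are independent, which are the conditions the rule requires.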
In addition to being able to calculate probabilities based on a series of independent events (as in the preceding examples), we can also calculate the probability of one event or another event occurring on a single trial when the events are mutually exclusive. Mutually exclusive means that only one of the events can occur on a single trial. For example, a coin toss can either be heads or tails on a given trial, but not both. When dealing with mutually exclusive events, we apply what is known as the addition rule, which states that the probability of one outcome or the other outcome occurring on a particular trial is the sum of their individual probabilities. In other words, we are adding the two probabilities together. Thus, the probability of having either a girl or a boy when giving birth would be:
addition rule A probability rule stating that the probability of one outcome or another outcome occurring on a particular trial is the sum of their individual probabilities, when the outcomes are mutually exclusive.
$$p(\text{girl or boy}) = p(\text{girl}) + p(\text{boy}) = .50 + .50 = 1.00$$
This is sometimes referred to as the or rule because we are determining the probability of one event or the other event.
Let’s try another problem using the or rule. What is the probability of drawing either a club or a heart when drawing one card from a deck of cards? The probability of drawing a club is 13/52, or .25. The same holds for drawing a heart (13/52 = .25). Thus, the probability of drawing either a club or a heart card on a single draw would be .25 + .25 = .50.
$$p(\text{club or heart}) = p(\text{club}) + p(\text{heart}) = .25 + .25 = .50$$
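The club-or-heart problem can be sketched in code by building a 52-card deck and comparing the addition rule with a direct count. The deck representation and the helper `p_suit` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

def p_suit(suit):
    """Probability of drawing a card of the given suit on a single draw."""
    return Fraction(sum(1 for _, s in deck if s == suit), len(deck))

# Addition (or) rule: valid here because one card cannot be both a club and a heart.
p_club_or_heart = p_suit("clubs") + p_suit("hearts")

# Direct count of the cards that are a club or a heart, as a cross-check.
direct_count = Fraction(sum(1 for _, s in deck if s in ("clubs", "hearts")), len(deck))

print(p_club_or_heart, float(p_club_or_heart), direct_count == p_club_or_heart)
```

If the two events could occur together (for example, drawing a club or a king), simply adding the probabilities would count the king of clubs twice, which is why the rule is stated for mutually exclusive outcomes.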
THE RULES OF PROBABILITY
Rule | Explanation | Example |
The Multiplication Rule | The probability of a series of independent outcomes occurring on successive trials is the product of their individual probabilities. This is also known as the and rule because we want to know the probability of one event and another event. | In order to determine the probability of one coin toss of a head followed by (and) another coin toss of a head, we multiply the probability of each individual event: .50 × .50 = .25 |
The Addition Rule | The probability of one outcome or another outcome occurring on a particular trial is the sum of their individual probabilities when the two outcomes are mutually exclusive. This is also known as the or rule because we want to know the probability of one event or another event. | In order to determine the probability of tossing a head or a tail on a single coin toss, we sum the probability of each individual event: .50 + .50 = 1.0 |
1.Which rule, the multiplication rule or the addition rule, would be applied in each of the following situations?
a.What is the probability of a couple having a girl as their first child followed by a boy as their second child?
b.What is the probability of pulling a spade or a diamond from a standard deck of cards on a single trial?
c.What is the probability of pulling a spade (and then putting it back in the deck) followed by pulling a diamond from a standard deck of cards?
d.What is the probability of pulling a jack or a queen from a standard deck of cards on a single trial?
2.Determine the probability for each of the examples in exercise 1.
Probability and the Standard Normal Distribution
As you might remember from Chapter 3, z scores can be used to determine proportions under the standard normal curve. In that chapter, we used z scores and the area under the standard normal curve to determine percentile ranks. We will now use z scores and the area under the standard normal curve (Table A.1) to determine probabilities. As you might remember, the standard normal curve has a mean of 0 and a standard deviation of 1. In addition, as discussed in Chapter 3, the standard normal curve is symmetrical and bell-shaped and the mean, median, and mode are all the same. Take a look at Figure 7.3, which represents the area under the standard normal curve in terms of standard deviations. We looked at this figure in the previous module (Figure 6.1), and based on this figure, we see that approximately 68% of the observations in the distribution fall between −1.0 and + 1.0 standard deviations from the mean. This approximate percentage holds for all data that are normally distributed. Notice also that approximately 13.5% of the observations fall between −1.0 and −2.0 and another 13.5% between +1.0 and +2.0, and that approximately 2% of the observations fall between − 2.0 and − 3.0 and another 2% between + 2.0 and + 3.0. Only .13% of the scores are beyond a z score of either ±3.0. If you sum the percentages in Figure 7.3, you will have 100%—all of the area under the curve, representing everybody in the distribution. If you sum half of the curve, you will have 50%—half of the distribution.
FIGURE 7.3 Area under the standard normal curve
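The approximate percentages described for the standard normal curve can be reproduced numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper `area_between` and the variable names are assumptions for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def area_between(lower, upper):
    """Proportion of the standard normal curve between two z scores."""
    return standard_normal.cdf(upper) - standard_normal.cdf(lower)

within_1 = area_between(-1.0, 1.0)       # about 68%
band_1_to_2 = area_between(1.0, 2.0)     # about 13.5% on each side
band_2_to_3 = area_between(2.0, 3.0)     # about 2% on each side
beyond_3 = 1.0 - standard_normal.cdf(3.0)  # about .13% in each tail

# The central band plus both sides of each outer band should cover the whole curve.
total = within_1 + 2 * (band_1_to_2 + band_2_to_3 + beyond_3)

print(f"{within_1:.4f} {band_1_to_2:.4f} {band_2_to_3:.4f} {beyond_3:.4f} {total:.4f}")
```

Because the curve is symmetrical, each outer band is computed once on the positive side and doubled.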
We can use the areas under the standard normal curve to determine the probability that an observation falls within a certain area under the curve. Let’s use a distribution that is normal to illustrate what we mean. Intelligence test scores are normally distributed with a mean of 100 and a standard deviation of 15. We could use the standard normal curve to determine the probability of randomly selecting someone from the population who had an intelligence score as high as or higher than a certain amount. For example, if a school psychologist wanted to know the probability of selecting a student from the general population who had an intelligence test score of 119 or higher, we could use the area under the standard normal curve to determine this. First we have to convert the intelligence test score to a z score. As you might remember, the formula for a z score is:
$$z = \frac{X - \mu}{\sigma}$$
where X represents the individual’s score on the intelligence test, μ represents the population mean, and σ represents the population standard deviation. Using this formula, we can calculate the individual’s z score as follows:
$$z = \frac{X - \mu}{\sigma} = \frac{119 - 100}{15} = \frac{19}{15} = +1.27$$
FIGURE 7.4 Standard normal curve with z = +1.27 indicated
Thus, we know that this individual’s z score falls +1.27 standard deviations above the mean. As in Chapter 3, it is helpful to represent this on a figure where the z score of + 1.27 is indicated. This is illustrated in Figure 7.4.
Now, in order to determine the probability of selecting a student with an intelligence test score of 119 or higher, we need to turn to Table A.1 in Appendix A. We begin by looking up a z score of 1.27 and find that for this score, a proportion of .39797 of the scores fall between the score and the mean of the distribution and a proportion of .10203 of the scores fall beyond the score. Referring to Figure 7.4, we see that the proportion of the curve in which we are interested is the area beyond the score, or .10203. This means that the probability of randomly selecting a student with an intelligence test score of 119 or higher is .10203, or just slightly higher than 10%. We can represent this problem in standard probability format as follows:
$$p(X \geq 119) = .10203$$
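The table lookup for a score of 119 or higher can be approximated in code. This sketch rounds the z score to two decimal places, as a printed table does, and then takes the area beyond it. The function names are hypothetical, and a computed area may differ from a table entry in the last decimal place because of rounding.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """z = (X - mu) / sigma."""
    return (x - mu) / sigma

def p_at_or_above(x, mu, sigma, table_digits=2):
    """Area under the standard normal curve beyond the z score for x."""
    z = round(z_score(x, mu, sigma), table_digits)   # round as a printed z table would
    return 1.0 - NormalDist().cdf(z)

z = z_score(119, mu=100, sigma=15)
p = p_at_or_above(119, mu=100, sigma=15)

print(f"z = {z:+.2f}, p(X >= 119) = {p:.5f}")
```

Skipping the rounding step and using the unrounded z score gives a slightly different probability, so the rounding is kept here only to stay close to the table-based procedure.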
Let’s try a couple more probability problems using the intelligence test score distribution. First, what is the probability of the school psychologist randomly selecting a student with an intelligence test score of 85 or higher? Secondly, what is the probability of the school psychologist selecting a student with an intelligence test score of 70 or lower?
Let’s begin with the first problem. We need to convert the intelligence test score to a z score and then consult Table A.1.
$$z = \frac{X - \mu}{\sigma} = \frac{85 - 100}{15} = \frac{-15}{15} = -1.0$$
When we consult Table A.1, we find that for a z score of −1.0, .15866 of the scores fall below this score and .34134 of the scores fall between this score and the mean of the distribution. This z score is illustrated in Figure 7.5 along with the area in which we are interested: the probability of a student with an intelligence test score of 85 or higher being selected.
FIGURE 7.5 Standard normal curve with z = −1.0 indicated
In order to determine the probability of selecting a student with an intelligence test score this high or higher, we take the area between the mean and the z score (.34134) and add the .50 from the other half of the distribution to it. Hence, the probability of selecting a student with an intelligence test score of 85 or higher is .84134, or approximately 84%. You should see that the probability of this happening is fairly high because when we look at Figure 7.5 we are talking about a large proportion of people who fit this description. This can be represented as follows:
$$p(X \geq 85) = .84134$$
Let’s work the second problem, the probability of selecting a student with an intelligence test score of 70 or lower. Once again we begin by converting this score into a z score.
$$z = \frac{X - \mu}{\sigma} = \frac{70 - 100}{15} = \frac{-30}{15} = -2.0$$
Next, we represent this on a figure with the z score indicated along with the area in which we are interested (anyone with this score or a lower score). This is illustrated in Figure 7.6.
Consulting Table A.1, we find that for a z score of −2.0, .02275 of the scores are below the score (beyond it) and .47725 of the scores are between the score and the mean of the distribution. We are interested in the probability of selecting a student with an intelligence test score of 70 or lower. Can you figure out what that would be? If you answered .02275, you are correct. Therefore, there is slightly more than a 2% chance of selecting a student with an intelligence test score this low or lower—a fairly low probability event. This can be represented as follows:
$$p(X \leq 70) = .02275$$
FIGURE 7.6 Standard normal curve with z = −2.0 indicated
Let’s apply what we have learned about using the standard normal curve to calculate probabilities together with the addition rule from earlier in the module to determine the probability of selecting a child whose intelligence test score is 70 or lower or 119 or higher. We have already determined the z scores for each of these intelligence test scores in our previous problems. The intelligence test score of 70 converts to a z score of −2.0 and the intelligence test score of 119 converts to a z score of + 1.27. Moreover, we have already determined that the probability of selecting a student with a score of 70 or lower is .02275 and that the probability of selecting a student with an intelligence test score of 119 or higher is .10203. Thus, applying the addition rule, the probability of selecting a student with a score that is 70 or lower or 119 or higher would be the sum of these two probabilities, or .02275 + .10203. These two probabilities sum to .12478, or just about 12.5%. This can be represented as follows:
$$p(X \leq 70 \text{ or } X \geq 119) = p(X \leq 70) + p(X \geq 119) = .02275 + .10203 = .12478$$
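The same result can be sketched by adding the two tail areas directly. The helper `table_z` (which rounds z to two decimals to mimic a printed table) and the variable names are assumptions; the mean of 100 and standard deviation of 15 come from the example.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # intelligence test score distribution
standard_normal = NormalDist()

def table_z(x, dist):
    """z score for x, rounded to two decimals as in a printed z table."""
    return round((x - dist.mean) / dist.stdev, 2)

p_low = standard_normal.cdf(table_z(70, iq))            # p(X <= 70), lower tail
p_high = 1.0 - standard_normal.cdf(table_z(119, iq))    # p(X >= 119), upper tail

# Addition rule: one student cannot score both 70 or lower and 119 or higher.
p_either = p_low + p_high

print(f"{p_low:.5f} + {p_high:.5f} = {p_either:.5f}")
```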
Now let’s turn to using the multiplication rule, discussed earlier in the module, with the area under the standard normal curve (Table A.1). In this case, we want to determine the probability of selecting two students who fit different descriptions. For example, what is the probability of selecting one student with an intelligence test score equal to or below 80, followed by another student with an intelligence test score equal to or above 125? Once again, we begin by converting the scores to z scores.
$$z = \frac{X - \mu}{\sigma} = \frac{80 - 100}{15} = \frac{-20}{15} = -1.33$$
$$z = \frac{X - \mu}{\sigma} = \frac{125 - 100}{15} = \frac{25}{15} = +1.67$$
Consequently, the intelligence test scores convert to z scores of −1.33 and +1.67, respectively. Next we use Table A.1 to determine the probability of each of these events. Consulting Table A.1, we find that the probability of selecting a student with a score of 80 or lower is .09175 and the probability of selecting a student with a score of 125 or higher is .04745. These z scores and proportions are represented in Figure 7.7.
We now apply the multiplication rule to determine the probability of selecting the first person followed by the second person. Thus, we multiply the first probability by the second probability, or .09175 × .04745 = .00435. This can be represented as follows:
$$p(X \leq 80 \text{ and } X \geq 125) = p(X \leq 80) \times p(X \geq 125) = (.09175)(.04745) = .00435$$
FIGURE 7.7 Standard normal curve with z = −1.33 and z = +1.67 indicated
Thus, the probability of the first event followed by the second event is very low: less than 1%.
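A short sketch of this two-student problem follows. It assumes the two selections are independent (for example, each student is drawn at random from a very large population), which is what the multiplication rule requires. The names are illustrative, and computed tail areas may differ from table entries in the last decimal place.

```python
from statistics import NormalDist

MU, SIGMA = 100, 15              # intelligence test mean and standard deviation
standard_normal = NormalDist()

def table_z(x):
    """z score for x, rounded to two decimals as in a printed z table."""
    return round((x - MU) / SIGMA, 2)

p_first = standard_normal.cdf(table_z(80))            # p(X <= 80)
p_second = 1.0 - standard_normal.cdf(table_z(125))    # p(X >= 125)

# Multiplication rule for two independent selections, one after the other.
p_both = p_first * p_second

print(f"{p_first:.5f} x {p_second:.5f} = {p_both:.5f}")
```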
1.If SAT scores are normally distributed with a mean of 1,000 and a standard deviation of 200, what is the probability of a student scoring 1,100 or higher on the SAT?
2.For this hypothetical SAT distribution, what is the probability of a student scoring 910 or lower on the SAT?
3.For this hypothetical SAT distribution, what is the probability of a student scoring 910 or lower or 1,100 or higher?
4.For this hypothetical SAT distribution, what is the probability of selecting a student who scored 910 or lower followed by a student who scored 1,100 or higher on the SAT?
addition rule
hypothesis testing
multiplication rule
probability
(Answers to odd-numbered questions appear in Appendix B.)
1.Imagine that I have a jar that contains 50 blue marbles and 20 red marbles.
a.What is the probability of selecting a red marble from the jar?
b.What is the probability of selecting a blue marble from the jar?
c.What is the probability of selecting either a red or a blue marble from the jar?
d.What is the probability of selecting a red marble (with replacement) followed by a blue marble?
2.What is the probability of a couple having children in the following birth order: boy, boy, boy, boy?
3.What is the probability of selecting either a 2 or a 4 (of any suit) from a standard deck of cards?
4.If height is normally distributed with a mean of 68 inches and a standard deviation of 5 inches, what is the probability of selecting someone who is 70 inches or taller?
5.For the distribution described in exercise 4, what is the probability of selecting someone who is 64 inches or shorter?
6.For the distribution described in exercise 4, what is the probability of selecting someone who is 70 inches or taller or 64 inches or shorter?
7.For the distribution described in exercise 4, what is the probability of selecting someone who is 70 inches or taller followed by someone who is 64 inches or shorter?
CRITICAL THINKING CHECK ANSWERS
Critical Thinking Check 7.1
1. $\frac{4}{52} = .077$
2. $\frac{13}{52} = .25$
3. $\frac{3}{6} = .50$
4. $\frac{4}{11} = .36$
Critical Thinking Check 7.2
1.
(a)Multiplication rule
(b)Addition rule
(c)Multiplication rule
(d)Addition rule
2.
(a).50 × .50 = .25
(b) $\frac{13}{52} + \frac{13}{52} = .25 + .25 = .50$
(c) $\frac{13}{52} \times \frac{13}{52} = .25 \times .25 = .0625$
(d) $\frac{4}{52} + \frac{4}{52} = .077 + .077 = .154$
Critical Thinking Check 7.3
1. z = +.50, p = .30854
2. z = −.45, p = .32634
3. .32634 + .30854 = .63488
4. .32634 × .30854 = .101
MODULE 8
Learning Objectives
•Differentiate null and alternative hypotheses.
•Differentiate one- and two-tailed hypothesis tests.
•Explain how Type I and Type II errors are related to hypothesis testing.
•Explain what statistical significance means.
•Explain the difference between a parametric test and a nonparametric test.
Research is usually designed to answer a specific question, for example, “Do science majors score higher on tests of intelligence than students in the general population?” The process of determining whether this statement is supported by the results of the research project is referred to as hypothesis testing.
Suppose a researcher wants to examine the relationship between type of after-school program attended by a child and intelligence level. The researcher is interested in whether students who attend an after-school program that is academically oriented (math, writing, computer use) score differently on an intelligence test than students who do not attend such programs. The researcher will form a hypothesis. The hypothesis might be that children in academic after-school programs will have different IQ scores than children in the general population. Because most intelligence tests are standardized with a mean score (μ) of 100 and a standard deviation (σ) of 15, the students in the academic after-school program would have to score higher or lower than 100 for the hypothesis to be supported.
Null and Alternative Hypotheses
Most of the time, researchers are interested in demonstrating the truth of some statement. In other words, they are interested in supporting their hypothesis. It is impossible statistically, however, to demonstrate that something is true. In fact, statistical techniques are much better at demonstrating that something is not true. This presents a dilemma for researchers. They want to support their hypotheses, but the techniques available to them are better for showing that something is false. What are they to do? The logical route is to propose exactly the opposite of what they want to demonstrate to be true, then disprove or falsify that hypothesis. What is left (the initial hypothesis) must then be true (Kranzler, 2007).
Let’s use our sample hypothesis to demonstrate what we mean. We want to show that children who attend academic after-school programs have different IQ scores from those who do not. We understand that statistics cannot demonstrate the truth of this statement. We therefore construct what is known as a null hypothesis (symbol H0). Whatever the research topic, the null hypothesis always predicts that there is no difference between the groups being compared. This is typically what the researcher does not expect to find. Think about the meaning of null—nothing or zero. The null hypothesis means you have found nothing—no difference between the groups.
null hypothesis The hypothesis predicting that no difference exists between the groups being compared.
For the sample study, the null hypothesis would be that children who attend academic after-school programs are of the same intelligence level as other children. Remember, we said that statistics allow us to disprove or falsify a hypothesis. Therefore, if the null hypothesis is not supported, our original hypothesis—that children who attend academic after-school programs have different IQs from other children—is all that is left. In statistical notation, the null hypothesis for this study would be:
H0: μ0 = μ1, or μacademic program = μgeneral population
The purpose of the study, then, is to decide whether H0 is probably true or probably false.
The hypothesis that the researcher wants to support is known as the alternative hypothesis (Ha), or the research hypothesis (H1). The statistical notation for Ha is:
Ha: μ0 ≠ μ1, or μacademic program ≠ μgeneral population
alternative hypothesis (research hypothesis) The hypothesis that the researcher wants to support, predicting that a significant difference exists between the groups being compared.
When we use inferential statistics, we are trying to reject H0, which means that Ha is supported.
Two-Tailed and One-Tailed Hypothesis Tests
The manner in which the previous alternative hypothesis (Ha) was stated reflects what is known statistically as a two-tailed hypothesis, or a nondirectional hypothesis: an alternative hypothesis in which the researcher expects to find differences between the groups but is unsure what the differences will be. In this case, the researcher would predict a difference in IQ scores between children in academic after-school programs and those in the general population, but the direction of the difference would not be predicted. Those in academic programs would be expected to have either higher or lower IQs but not the same IQs as the general population of children. The statistical notation for a two-tailed test is as follows:
H0: μ0 = μ1, or μacademic program = μgeneral population
Ha: μ0 ≠ μ1, or μacademic program ≠ μgeneral population
two-tailed hypothesis (nondirectional hypothesis) An alternative hypothesis in which the researcher predicts that the groups being compared differ but does not predict the direction of the difference.
The alternative to a two-tailed or nondirectional test is a one-tailed hypothesis, or a directional hypothesis: an alternative hypothesis in which the researcher predicts the direction of the expected difference between the groups. In our example, the researcher would predict the direction of the difference, namely, that children in academic after-school programs will be more intelligent than children in the general population. Thus, the alternative hypothesis would be written as follows:
Ha: μ0 > μ1, or μacademic program > μgeneral population
one-tailed hypothesis (directional hypothesis) An alternative hypothesis in which the researcher predicts the direction of the expected difference between the groups.
When we use a directional alternative hypothesis, the null hypothesis is also, in some sense, directional. If the alternative hypothesis is that children in academic after-school programs will have higher intelligence test scores, then the null hypothesis is that being in academic after-school programs either will have no effect on intelligence test scores or will decrease intelligence test scores. Thus, the null hypothesis for the one-tailed directional test might more appropriately be written as follows:
H0: μ0 ≤ μ1, or μacademic program ≤ μgeneral population
In other words, if the alternative hypothesis for a one-tailed test is μ0 > μ1, then the null hypothesis is μ0 ≤ μ1, and to reject H0, the children in academic after-school programs have to have intelligence test scores higher than those in the general population. In our example, the one-tailed hypothesis makes more sense.
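The practical difference between the two kinds of test can be sketched numerically. The short Python example below is only an illustration: it assumes the conventional .05 significance level and a test statistic that follows the standard normal curve (a z score), and the variable names are our own. It shows that a one-tailed test places the entire rejection region in one tail, so its cutoff is less extreme than the cutoffs of a two-tailed test.

```python
from statistics import NormalDist

ALPHA = 0.05  # assumed significance level (the conventional .05 level)
std_normal = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Two-tailed test (Ha: mu0 != mu1): alpha is split evenly between both tails.
two_tailed_cutoff = std_normal.inv_cdf(1 - ALPHA / 2)

# One-tailed test (Ha: mu0 > mu1): all of alpha sits in the upper tail.
one_tailed_cutoff = std_normal.inv_cdf(1 - ALPHA)

print(f"two-tailed: reject H0 if z <= {-two_tailed_cutoff:.3f} or z >= {two_tailed_cutoff:.3f}")
print(f"one-tailed: reject H0 if z >= {one_tailed_cutoff:.3f}")
```

Under these assumptions the two-tailed cutoffs are about ±1.96, whereas the one-tailed cutoff is about 1.645, so a difference in the predicted direction is easier to detect with a one-tailed test.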
Assume that the researcher has selected a random sample of children from academic after-school programs to compare their IQs with the IQs of children in the general population (as noted previously, we know that the mean IQ for the population is 100). If we collected data and found that the mean intelligence level of the children in academic after-school programs is “significantly” (a term that will be discussed shortly) higher than the mean intelligence level for the general population, we could reject the null hypothesis. Remember that the null hypothesis states that no difference exists between the sample and the population. Thus, the researcher concludes that the null hypothesis—that there is no difference or that children performed more poorly—is not supported. When the null hypothesis is rejected, the alternative hypothesis—that those in academic programs have higher IQ scores than those in the general population—is supported. We can say that the evidence suggests that the sample of children in academic after-school programs represents a specific population that scores higher on the IQ test than the general population.
If, on the other hand, the mean IQ scores of the children in academic after-school programs are not significantly different from the population mean score, then the researcher has failed to reject the null hypothesis and, by default, has failed to support the alternative hypothesis. In this case, the alternative hypothesis—that the children in academic programs have higher IQs than the general population—is not supported.
Type I and Type II Errors in Hypothesis Testing
Any time we make a decision using statistics, there are four possible outcomes (see Table 8.1). Two of the outcomes represent correct decisions, whereas two represent errors. Let’s use our example to illustrate these possibilities.
TABLE 8.1 The four possible outcomes in statistical decision making
THE RESEARCHER’S DECISION | THE TRUTH (UNKNOWN TO THE RESEARCHER): H0 is true | THE TRUTH (UNKNOWN TO THE RESEARCHER): H0 is false |
Reject H0 (say it is false) | Type I error | Correct decision |
Fail to reject H0 (say it is true) | Correct decision | Type II error |
If we reject the null hypothesis (the hypothesis stating that there is no IQ difference between groups), we may be correct in our decision, or we may be incorrect. If our decision to reject H0 is correct, that means there truly is a difference in IQ between children in academic after-school programs and the general population of children. However, our decision could be incorrect. The result may have been due to chance. Even though we observed a significant difference in IQ between the children in our study and the general population, the result might have been a fluke—maybe the children in our sample just happened to guess correctly on a lot of the questions. In this case, we have made what is known as a Type I error —we rejected H0 when in reality we should have failed to reject it (it is true that there really is no IQ difference between the sample and population). Type I errors can be thought of as false alarms—we said there was a difference, but in reality there is no difference.
Type I error An error in hypothesis testing in which the null hypothesis is rejected when it is true.
What if our decision is to not reject H0, meaning we conclude that there is no difference in IQ between the children in the academic after-school program and children in the general population? This decision could be correct, meaning that in reality there is no IQ difference between the sample and the population. However, it could also be incorrect. In this case, we would be making a Type II error —saying there is no difference between groups when in reality there is a difference. Somehow we have missed the difference that really exists and have failed to reject the null hypothesis when it is false. All of these possibilities are summarized in Table 8.1.
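The four outcomes just described can also be written as a small lookup. The Python sketch below is our own illustration (the function name and arguments are hypothetical, not part of any statistical package); it simply returns the cell of the decision table that corresponds to a given combination of truth and decision.

```python
def classify_decision(h0_is_true: bool, rejected_h0: bool) -> str:
    """Return the outcome for one combination of truth and researcher decision."""
    if rejected_h0:
        # Rejecting a true H0 is a false alarm; rejecting a false H0 is correct.
        return "Type I error" if h0_is_true else "Correct decision"
    # Failing to reject a true H0 is correct; failing to reject a false H0 is a miss.
    return "Correct decision" if h0_is_true else "Type II error"


for h0_is_true in (True, False):
    for rejected_h0 in (True, False):
        truth = "H0 true" if h0_is_true else "H0 false"
        decision = "reject H0" if rejected_h0 else "fail to reject H0"
        print(f"{truth:8} | {decision:17} | {classify_decision(h0_is_true, rejected_h0)}")
```

In a real study the first argument is never known to the researcher, which is exactly why either kind of error remains possible.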
Type II error An error in hypothesis testing in which there is a failure to reject the null hypothesis when it is false.
Probability, Statistical Significance, and Errors
Suppose we actually did the study on IQ level and academic after-school programs. In addition, suppose we found that there was a difference between the IQ levels of children in academic after-school programs and children in the general population (those in the academic programs scored higher). Lastly, suppose that this difference is statistically significant at the .05 (or the 5%) level (also known as the .05 alpha level). To say that a result has statistical significance at the .05 level means that a difference as big as or bigger than what we observed between the sample and the population could have occurred by chance only 5 times or fewer out of 100. In other words, the likelihood that this result is due to chance is small. If the result is not due to chance, then it is most likely due to a true or real difference between the groups. If our result were statistically significant, we would reject the null hypothesis and conclude that we have observed a significant difference in IQ scores between the sample and the population.
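To make the idea of significance at the .05 level concrete, the following Python sketch works through the IQ example with invented numbers. The population values (μ = 100, σ = 15) come from the text; the sample mean of 105 and the sample size of 75 are hypothetical, chosen only for illustration. The calculation borrows the logic of the z test, which the text introduces formally in a later chapter, so it should be read as a preview under those assumptions rather than as the procedure for every study.

```python
from math import sqrt
from statistics import NormalDist

POP_MEAN = 100.0  # population mean IQ (from the text)
POP_SD = 15.0     # population standard deviation of IQ scores (from the text)
ALPHA = 0.05      # the .05 alpha level

# Hypothetical sample results, invented for illustration only.
sample_mean = 105.0
sample_size = 75

# How far the sample mean lies from the population mean,
# measured in units of the standard deviation of sample means.
standard_error = POP_SD / sqrt(sample_size)
z = (sample_mean - POP_MEAN) / standard_error

# One-tailed probability (Ha: mu_academic > mu_general) of a difference
# this large or larger arising by chance if H0 is true.
p_value = 1 - NormalDist().cdf(z)
reject_h0 = p_value <= ALPHA

print(f"z = {z:.2f}, one-tailed p = {p_value:.4f}, reject H0: {reject_h0}")
```

With these made-up numbers the probability falls well below .05, so the difference would be called statistically significant and H0 would be rejected.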
statistical significance An observed difference between two descriptive statistics (such as means) that is unlikely to have occurred by chance.
Remember, however, that when we reject the null hypothesis, we could be correct in our decision, or we could be making a Type I error. Maybe the null hypothesis is true, and this is one of those 5 or fewer times out of 100 when the observed differences between the sample and the population did occur by chance. This means that when we adopt the .05 level of significance (the .05 alpha level), as often as 5 times out of 100 we could make a Type I error when the null hypothesis is true. The .05 level, then, is the probability of making a Type I error. It is compared with the p value (probability value) obtained from the data, which is the probability of observing a difference at least as large as ours if the null hypothesis is true; a result is statistically significant when its p value is at or below alpha. In the social and behavioral sciences, alpha is typically set at .05 (as opposed to .01 or .08 or anything else). This means that researchers in these areas are willing to accept up to a 5% risk of making a Type I error.
What if you want to reduce your risk of making a Type I error and decide to use the .01 alpha level, reducing the risk of a Type I error to 1 out of 100 times? This seems simple enough: Simply reduce alpha to .01, and you have reduced your chance of making a Type I error. By doing this, however, you have now increased your chance of making a Type II error. Do you see why? If I reduce my risk of making a false alarm—saying a difference is there when it really is not—I increase my risk of missing a difference that really is there. When we reduce the alpha level, we have insisted on more stringent conditions for accepting our research hypothesis, making it more likely that we could miss a significant difference when it is present. We will return to Type I and Type II errors in the next module when we cover statistical power and discuss alternative ways of addressing this problem.
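The trade-off between the two kinds of error can be sketched with a short calculation. The Python example below is an illustration under stated assumptions: a one-tailed test based on the standard normal curve, the IQ population values from the text (μ = 100, σ = 15), and two hypothetical quantities of our own choosing, a true mean of 105 for children in academic programs and a sample of 30 children. The function name is also our own.

```python
from math import sqrt
from statistics import NormalDist

POP_MEAN, POP_SD = 100.0, 15.0  # IQ population values from the text
TRUE_MEAN = 105.0               # hypothetical true mean for the program population
N = 30                          # hypothetical sample size


def type_ii_error_rate(alpha: float) -> float:
    """Probability of failing to reject H0 when the true mean is TRUE_MEAN."""
    std_normal = NormalDist()
    z_cutoff = std_normal.inv_cdf(1 - alpha)  # one-tailed rejection cutoff
    # Size of the true difference in units of the standard deviation of sample means.
    shift = (TRUE_MEAN - POP_MEAN) / (POP_SD / sqrt(N))
    # A miss occurs whenever the test statistic lands below the cutoff.
    return std_normal.cdf(z_cutoff - shift)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> Type II error rate = {type_ii_error_rate(alpha):.2f}")
```

Under these assumptions, tightening alpha from .05 to .01 raises the chance of a Type II error from about .43 to about .69, which is the trade-off described above.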
Which type of error, Type I or Type II, do you think is considered more serious by researchers? Most researchers consider a Type I error more serious. They would rather miss a result (Type II error) than conclude that there is a meaningful difference when there really is not (Type I error). What about in other arenas—for example, in the courtroom? A jury could make a correct decision in a case (find guilty when truly guilty, or find innocent when truly innocent). They could also make either a Type I error (say guilty when innocent) or Type II error (say innocent when guilty). Which is more serious here? Most people believe that a Type I error is worse in this situation also. How about in the medical profession? Imagine a doctor attempting to determine whether or not a patient has cancer. Here again, the doctor could make one of the two correct decisions or could make one of the two types of errors. What would the Type I error be? This would be saying that cancer is present when in fact it is not. What about the Type II error? This would be saying that there is no cancer when in fact there is. In this situation, most people would consider a Type II error to be more serious.
HYPOTHESIS TESTING
Concept | Description | Example |
Null Hypothesis | The hypothesis stating that the independent variable has no effect and that there will be no difference between the two groups | H0: μ0 = μ1 (two-tailed); H0: μ0 ≤ μ1 (one-tailed); H0: μ0 ≥ μ1 (one-tailed) |
Alternative Hypothesis or Research Hypothesis | The hypothesis stating that the independent variable has an effect and that there will be a difference between the two groups | Ha: μ0 ≠ μ1 (two-tailed); Ha: μ0 > μ1 (one-tailed); Ha: μ0 < μ1 (one-tailed) |
Two-Tailed or Nondirectional Test | An alternative hypothesis stating that a difference is expected between the groups, but there is no prediction as to which group will perform better or worse | The mean of the sample will be different from or unequal to the mean of the general population |
One-Tailed or Directional Test | An alternative hypothesis stating that a difference is expected between the groups, and it is expected to occur in a specific direction | The mean of the sample will be greater than the mean of the population, or the mean of the sample will be less than the mean of the population |
Type I Error | The error of rejecting H0 when we should have failed to reject it | This error in hypothesis testing is equivalent to a “false alarm,” saying that there is a difference when in reality there is no difference between the groups |
Type II Error | The error of failing to reject H0 when we should have rejected it | This error in hypothesis testing is equivalent to a “miss,” saying that there is not a difference between the groups when in reality there is a difference between the groups |
Statistical Significance | When the probability of a Type I error is low (.05 or less) | The difference between the groups is so large that we conclude it is due to something other than chance |
Critical Thinking Check 8.1
1. A researcher hypothesizes that children from the South weigh less (because they spend more time outside) than the national average. Identify H0 and Ha. Is this a one- or two-tailed test?
2. A researcher collects data on children’s weights from a random sample of children in the South and concludes that children from the South weigh less than the national average. The researcher, however, did not realize that the sample included many children who were small for their age and that in reality there is no difference in weight between children in the South and the national average. What type of error was made?
3. If a researcher decides to use the .10 level rather than using the conventional .05 level of significance, what type of error is more likely to be made? Why? If the .01 level is used, what type of error is more likely? Why?
Now that we have an understanding of the concept of hypothesis testing, we can begin to discuss how hypothesis testing is used. The simplest type of study involves only one group and is known as a single-group design. The single-group design lacks a comparison group; there is no control group of any sort. We can, however, compare the performance of the group (the sample) to the performance of the population (assuming that population data are available).
single-group design A research study in which there is only one group of participants.
Earlier in the module, we illustrated hypothesis testing using a single-group design—comparing the IQ scores of children in academic after-school programs (the sample) to the IQ scores of children in the general population. The null and alternative hypotheses for this study were:
H0: μ0 ≤ μ1, or μacademic program ≤ μgeneral population
Ha: μ0 > μ1, or μacademic program > μgeneral population
To compare the performance of the sample to that of the population, we need to know the population mean (μ) and the population standard deviation (σ). We know that for IQ tests, μ = 100 and σ = 15. We also need to decide who will be in the sample. Random selection will increase our chances of getting a representative sample of children enrolled in academic after-school programs. How many children do we need in the sample? We will see in later modules that the larger the sample, the greater the power of the study. We will also see that one of the assumptions of the statistical procedure we will be using to test our hypothesis is a sample size of 30 or more.
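The claim that larger samples give a study more power can be previewed with a rough calculation. In the Python sketch below, power means the probability of correctly rejecting a false null hypothesis. The population values (μ = 100, σ = 15) come from the text; the true program mean of 105, the one-tailed .05 test based on the standard normal curve, and the three sample sizes are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import NormalDist

POP_MEAN, POP_SD = 100.0, 15.0  # IQ population values from the text
TRUE_MEAN = 105.0               # hypothetical true mean for the program population
std_normal = NormalDist()


def power(n: int, alpha: float = 0.05) -> float:
    """Probability of rejecting H0 (one-tailed) when the true mean is TRUE_MEAN."""
    z_cutoff = std_normal.inv_cdf(1 - alpha)
    # Larger samples shrink the spread of sample means, so the same
    # 5-point difference counts as a larger shift.
    shift = (TRUE_MEAN - POP_MEAN) / (POP_SD / sqrt(n))
    return 1 - std_normal.cdf(z_cutoff - shift)


for n in (30, 60, 120):
    print(f"n = {n:3d} -> power = {power(n):.2f}")
```

Under these assumptions, power rises steadily as the sample grows, because the same difference between means becomes easier to distinguish from chance.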
Once we have chosen our sample, we need to collect the data. To collect IQ score data, we could either administer an intelligence test to the children or look at their academic files to see whether they had already taken such a test.
Once the data are collected, we can begin to analyze them using inferential statistics: procedures for drawing conclusions about a population based on data collected from a sample. Inferential statistics go beyond the scores collected in a research study to make inferences about the population from which the sample was drawn. In the following chapter (Chapter 5) we will describe two inferential statistical tests, the z test and the t test. Both of these are parametric tests, that is, tests that require us to make certain assumptions about estimates of population characteristics, or parameters. These assumptions typically involve knowing the mean (μ) and standard deviation (σ) of the population and that the population distribution is normal. Parametric tests are generally used with interval or ratio data. The alternative to a parametric test is a nonparametric test, which does not involve the use of any population parameters. In other words, μ and σ are not needed, and the underlying distribution does not have to be normal. Nonparametric tests are most often used with ordinal or nominal data and will be discussed more fully in Chapter 10.
inferential statistics Procedures for drawing conclusions about a population based on data collected from a sample.
parametric test A statistical test that involves making assumptions about estimates of population characteristics, or parameters.
nonparametric test A statistical test that does not involve the use of any population parameters; μ and σ are not needed, and the underlying distribution does not have to be normal.
SINGLE-SAMPLE RESEARCH AND INFERENTIAL STATISTICS
Concept | Description | Examples |
Parametric Inferential Statistics | Inferential statistical procedures that require certain assumptions about the parameters of the population represented by the sample data, such as knowing μ and σ and that the distribution is normal. Most often used with interval or ratio data | z test; t test (discussed in Chapter 5) |
Nonparametric Inferential Statistics | Inferential procedures that do not require assumptions about the parameters of the population represented by the sample data; μ and σ are not needed, and the underlying distribution does not have to be normal. Most often used with ordinal or nominal data | Chi-square tests; Wilcoxon tests (discussed in Chapter 10) |
Critical Thinking Check 8.2
1. How do inferential statistics differ from descriptive statistics?
2. How does single-sample research involve the use of hypothesis testing? In other words, in a single-group design, what hypothesis is tested?
alternative hypothesis (research hypothesis)
inferential statistics
nonparametric test
null hypothesis
one-tailed hypothesis (directional hypothesis)
parametric test
single-group design
statistical significance
two-tailed hypothesis (nondirectional hypothesis)
Type I error
Type II error
(Answers to odd-numbered questions appear in Appendix B.)
1. The admissions counselors at Brainy University believe that the freshman class they have just recruited is the brightest yet. If they wanted to test this belief (that the freshmen are brighter than the other classes), what would the null and alternative hypotheses be? Is this a one- or two-tailed hypothesis test?
2. To test the hypothesis in exercise 1, the admissions counselors select a random sample of freshmen and compare their scores on the SAT to those of the population of upper-classmen. They find that the freshmen do in fact have a higher mean SAT score. However, what they are unaware of is that the sample of freshmen was not representative of all freshmen at Brainy University. In fact, the sample over-represented those with high scores and under-represented those with low scores. What type of error (Type I or Type II) did the counselors make?
3. A researcher believes that family size has increased in the last decade in comparison to the previous decade; that is, people are now having more children than they were before. What would the null and alternative hypotheses be in a study designed to assess this? Is this a one- or two-tailed hypothesis test?
4. What are the appropriate H0 and Ha for each of the following research studies? In addition, note whether the hypothesis test is one- or two-tailed.
a. A study in which researchers want to test whether there is a difference in spatial ability between left- and right-handed people
b. A study in which researchers want to test whether nurses who work 8-hour shifts deliver higher-quality work than those who work 12-hour shifts
c. A study in which researchers want to determine whether crate-training puppies is superior to training without a crate
5. Assume that each of the following conclusions represents an error in hypothesis testing. Indicate whether each of the statements is a Type I or II error.
a. Based on the data, the null hypothesis was rejected.
b. There was no significant difference in quality of work between nurses who work 8- and 12-hour shifts.
c. There was a significant difference between right- and left-handers in their ability to perform a spatial task.
d. The researcher failed to reject the null hypothesis based on these data.
6. Explain the difference between parametric and nonparametric statistics.
CRITICAL THINKING CHECK ANSWERS
Critical Thinking Check 8.1
1. H0: μSouthern children ≥ μchildren in general
Ha: μSouthern children < μgeneral population
This is a one-tailed test.
2. The researcher concluded that there was a difference when in reality there was no difference between the sample and the population. This is a Type I error.
3. With the .10 level of significance, the researcher is willing to accept a higher probability that the result may be due to chance. Therefore, a Type I error is more likely to be made than if the researcher used the more traditional .05 level of significance. With a .01 level of significance, the researcher is willing to accept only a .01 probability that the result may be due to chance. In this case, a true result is more likely to be missed, meaning that a Type II error is more likely.
Critical Thinking Check 8.2
1. Inferential statistics allow researchers to make inferences about a population based on sample data. Descriptive statistics simply describe a data set.
2. Single-sample research allows researchers to compare sample data to population data. The hypothesis tested is whether the sample performs similarly to the population or whether the sample differs significantly from the population and, thus, represents a different population.
CHAPTER FOUR SUMMARY AND REVIEW |