To test claims adequately, researchers must include systematic observations in their studies, that is, observations made according to a plan. The specifics of the plan depend upon the claim being tested. For example, college aptitude tests (such as the SAT) are used to predict how well a person will do in college, typically in terms of grade-point average (GPA). If we want to test the claim that a new aptitude test predicts future GPAs, then our systematic observations must include the following:
If a strong positive correlation is found between scores and GPA, then we can conclude that the claim is supported by our results (that is, we can feel more certain that the claim is correct). Other claims will require different plans for systematic observations.

You learned in an earlier section that the goal of research is to derive generalizations: to derive (infer) a general statement about something from a number of specific observations. For example, let's say that five studies found a positive correlation between aptitude-test scores and cumulative GPAs. Based on these observations, we could generalize by concluding that scores on aptitude tests are good predictors of which students will do well in their college courses (on average). By deriving this generalization from the observations made in the five studies, we are claiming that the aptitude-test scores of students who did not participate in any of these five studies will still predict well the GPAs they will tend to get in college.

A fundamental problem with generalizations, however, is that they are based on a limited number of observations. To increase the likelihood that a generalization is accurate, the observations on which it is based must be sufficient in terms of both their relevance and their number. Observations are relevant when they are appropriate to the generalization one wishes to make. For example, a pollster (a person who conducts or analyzes opinion polls) would not generalize about which of two political candidates was more likely to win based on interviews of a group of children. It is best to interview a group of registered voters (especially those considered most likely to vote). Furthermore, an adequate number of such voters would need to be interviewed: a pollster would not generalize about which of two political candidates was more likely to win after interviewing only five registered voters.
It is essential that the group interviewed contain a large number of people from the particular population to which the pollster wishes to generalize.

Representative Samples

In deciding which of two candidates is more likely to win an election, the people interviewed (that is, the sample) would need to be similar to all people likely to vote in the election (the population of all voters). A sample is the set of observations made by researchers. The sample is selected from a population, which is the full set of relevant observations that could be made if researchers had unlimited time and resources. In most studies, the population is too large for researchers to observe each individual. Instead, they must select a sample from the population to observe.

If researchers are to make an accurate generalization about what the population does or will do, they must select a representative sample: a sample that is similar to the population with respect to essential characteristics. In predicting the outcome of an election, the sample of registered voters interviewed must be similar to the population in terms of age, race, ethnicity, gender, political affiliations, and so on. If the sample deviates significantly from the population on one or more of these characteristics, then the study's results are based on a biased sample. A biased sample will lead to generalizations about the population that are inaccurate: the more the sample deviates from the average characteristics of the population, the more inaccurate the generalizations will be.

A famous example of this occurred just before the 1936 U.S. presidential election. Franklin Roosevelt was the incumbent Democratic president running against a Republican challenger by the name of Alfred Landon. Landon was supported primarily by those who had survived the initial economic losses of the Great Depression and were still relatively well-off financially.
Roosevelt was supported primarily by people hit hard by the economic collapse.

In predicting the outcome of the election, a magazine called Literary Digest sent questionnaires to about 10,000,000 Americans (American Social History Project, 2006; Goodwin, 1995). Their sample included subscribers to the magazine as well as a large number of people selected from phone books and motor-vehicle registration records. The pollsters received responses from about 2.5 million people, which is an extraordinarily large number of observations, especially considering that the population of the United States was only about 130 million in 1936. Almost 60% of the respondents stated that they were going to vote for Landon, whereas only about 40% stated that they were going to vote for Roosevelt. Based on this finding, the pollsters predicted that Landon would win in a landslide. The actual results of the election were reversed: Roosevelt won in a landslide (he received about 60% of the popular vote).

What went wrong with the magazine's polling? It may not be immediately obvious today, a time in which virtually everyone has at least a couple of telephones (land line and cell) and virtually every family owns at least one car. But in the middle of the Depression, car and telephone ownership were much less common: many people couldn't afford them. Thus, wealthier people were much more likely to appear in telephone books and car-registration records. What the Literary Digest study had done was poll primarily the well-off and Republican in a country that was primarily poor and Democratic.

Another problem was that only about 25% of the original questionnaires were returned. It seems likely that there is a difference between the minority who would take the time to fill out a questionnaire and send it back, and the majority who probably tossed it in the garbage.
So, even collecting a sample of 2.5 million people does not guarantee that the results will provide an accurate picture of the population. This very large number of people still made up a biased sample.

Controlling Extraneous Variables

Most of you probably have wondered how much you need to study in order to do well in your courses. Some of you may have heard the rule of thumb that states that you should study two (sometimes, three) hours outside of class for every hour you spend in class. We'll refer to this as the "2-for-1 Rule." The rule claims that, if you go to class for three hours each week, you should study six hours outside of class each week in order to do well in the course.

How could you test this claim for its accuracy? Perhaps you could test it by remembering courses you have taken in the past. You might remember that you took an American history course last semester and received an A even though you rarely opened the textbook. Rather, you simply listened carefully in class and took good notes, which you reviewed just before each test. In fact, you now recall that you received all As and Bs last semester with very little studying outside of class.

Do these observations show that the 2-for-1 Rule is wrong? Not necessarily. It may be that the courses you took last semester were not a representative sample of courses offered at the college. They may have been less demanding than most other courses. Or it could be that you are misremembering how much you actually studied for your courses. In other words, you were not making systematic observations when you simply tried to recall what happened in a few courses that you took last semester.

Let's look at a fictional research study that includes systematic observations capable of testing the 2-for-1 Rule.
In our study, let's say that we asked a sample of 80 students to take a week-long course that met every day (Monday through Thursday) for one hour, for a total of four hours of class time, with a test on the last day (Friday). Two variables were measured: the number of hours spent studying and test scores. The students were split into four groups (20 students in each group), and each group was asked to study a different number of hours outside of class for the test (see Table 1; adapted from Goodwin, 1995, pp. 135-36).
Table 1. Design of an experiment for testing the 2-for-1 Rule

Now, let's say that we discovered that Group 4, which had studied two hours for every hour spent in class, did best on Friday's test, Group 3 was next, Group 2 followed them, and Group 1 did the worst on the test. We then could conclude that the more hours one spends studying, the better one will do on tests. Is this a reasonable generalization to make?

Although it may seem as if the study included systematic observations that support the generalization, you may have noticed a problem with the study. The four groups of students differed not only in the total number of hours that they studied, but also in the number of days between the last time that they studied and the time that they took the test (which is called the retention interval). Because of this, we cannot know whether the differences observed among the groups in test scores were due to the different amounts of time spent studying, the different retention intervals, or both.

When making systematic observations to test a causal claim (such as the claim that two hours spent studying for every hour spent in class causes better test performance), the most important component of the plan is the need to "control for" the effects of extraneous variables. In our example, we were unable to generalize about the effect on test scores of the amount of time spent studying because we did not control for the effect of retention interval. When we control a research situation, we want to be left with only one possible explanation for the results of a study: an explanation that involves only the factor being investigated. In our study, however, there are three possible explanations for the results, none of which can be ruled out: the amount of time spent studying, the retention interval, or both.
To observe systematically, we need to control for the extraneous variable of retention interval. How could we have controlled for the effects of retention interval? Perhaps we could have had the groups study according to the schedule in Table 2.
Table 2. Design of a second experiment for testing the 2-for-1 Rule

In this case, each group would study only the day before the test, which would control for the extraneous variable of retention interval. But would this schedule allow us to achieve our goal of observing in a systematic manner? No, because it would introduce another extraneous variable: anyone who studies for eight hours on one day will suffer much more fatigue and, thus, have more trouble learning the material than someone who studies only two hours. In our study, we need to control for the extraneous variables of fatigue and retention interval at the same time. The schedule in Table 3 would allow us to do this.
Table 3. Design of a third experiment for testing the 2-for-1 Rule

If we now find that the students in Group 4 receive the highest average test scores, Group 3 the second highest, Group 2 the third highest, and Group 1 the lowest, we can conclude that spending more time studying causes students to receive higher test scores.

Nevertheless, no matter how much care researchers take to control for the effects of extraneous variables, it is always possible that they will miss one or more extraneous variables because they did not think of them. For example, if you hadn't already had a great deal of experience with studying, it may never have occurred to you that a person who studies eight hours in one day might become more fatigued than a person who studies only four hours. This is why it is so important for researchers to describe their procedures carefully when publishing their studies. This allows other researchers to more easily detect the possible influence of unsuspected extraneous variables and then attempt to replicate the results with better-controlled studies of their own.
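The problem with the first design (study time and retention interval changing together) can be illustrated with a small sketch. Everything numeric here is an assumption made for illustration: the study hours of 2, 4, 6, and 8 follow the range mentioned in the text, but the retention intervals and the two scoring formulas are hypothetical and are not taken from the tables. The sketch compares two rival explanations, one in which only study time matters and one in which only the retention interval matters, and checks whether a given design lets us tell them apart.

```python
# Hypothetical designs. Study hours follow the range mentioned in the text;
# the retention intervals (days between last study session and test) are assumed.
confounded = {"hours": [2, 4, 6, 8], "interval_days": [4, 3, 2, 1]}
controlled = {"hours": [2, 4, 6, 8], "interval_days": [1, 1, 1, 1]}


def scores_if_hours_matter(design):
    """Hypothetical model: only the amount of study time affects the score."""
    return [50 + 5 * h for h in design["hours"]]


def scores_if_interval_matters(design):
    """Hypothetical model: only the retention interval affects the score."""
    return [100 - 10 * d for d in design["interval_days"]]


def explanations_distinguishable(design):
    """True if the two rival models predict different group scores."""
    return scores_if_hours_matter(design) != scores_if_interval_matters(design)


if __name__ == "__main__":
    for name, design in [("confounded", confounded), ("controlled", controlled)]:
        print(name)
        print("  hours-only model:   ", scores_if_hours_matter(design))
        print("  interval-only model:", scores_if_interval_matters(design))
        print("  distinguishable:    ", explanations_distinguishable(design))
```

Under the confounded design, both made-up models predict exactly the same group scores, so the results could not tell us which explanation is correct. Once the retention interval is held constant, the two models predict different patterns, and only one of them can match the data. The sketch ignores fatigue, which is the further extraneous variable the third design addresses.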
This site was developed and is maintained by Jeffry Ricker
Contact Person: Jeffry Ricker
This site is hosted on Scottsdale Community College's server. Please read their disclaimer.