Research Methodology: A Toolkit of Sampling and Data Analysis Techniques for Quantitative Research
2.1 Why sampling and not a census?
2.2 Methods of sampling
2.2.1 Random sampling methods
2.2.1.1 Simple random sampling
2.2.1.2 Stratified random sampling
2.2.1.3 Systematic sampling
2.2.1.4 Cluster sampling
2.2.2 Non-random sampling methods
2.2.2.1 Convenience sampling
2.2.2.2 Judgement sampling
2.2.2.3 Quota sampling
2.2.2.4 Snowball sampling
2.3 Sampling errors
2.4 Non-sampling errors
3.0 Data analysis
3.1 Data analysis techniques to explore relationships among variables
3.1.2 Partial correlation
3.1.3 Multiple regression
3.1.4 Factor analysis
3.2 Data analysis techniques to compare groups
3.2.1 Non-parametric data analysis techniques
3.2.1.1 Chi-square test for goodness-of-fit
3.2.1.2 Chi-square test for independence
3.2.1.3 Kappa measure of agreement
3.2.1.4 Mann-Whitney U test
3.2.1.5 Kruskal-Wallis test
3.2.2 Parametric data analysis techniques
3.2.2.1 One-way analysis of variance
3.2.2.2 Two-way between-groups analysis of variance
3.2.2.3 Mixed between-within subjects analysis of variance
3.2.2.4 Multivariate analysis of variance
This book explores the different types of sampling methods and data analysis techniques that researchers can adopt as part of their research methodology. For sampling, the authors aim to make the case for opting for a sample rather than a census, to describe the sampling avenues that a researcher can use to obtain data (including the advantages and disadvantages of each method), and to discuss the sampling and non-sampling errors that can occur in a research study. The data analysis techniques explored in this book, on the other hand, are quantitative. Here, the authors aim to present the different types of quantitative data analysis methods and to distinguish the results that can be expected from each technique. The authors hope that this book will help researchers gain a more comprehensive understanding of the different sampling and data analysis techniques available for quantitative research.
Sampling is widely used in academic research as a means of gathering useful information about a population (Thompson, 1992). A population is any complete group that shares a common set of characteristics (Hajek, 1981), such as homosexuals or metrosexuals. According to Zikmund, Ward, Lowe and Winzar (2007), sampling involves using a small number of items or parts of the population to draw conclusions about the whole population. For example, when a researcher generalizes about a certain group of people or subjects (e.g. metrosexuals are hot!), he or she is doing so on the basis of a small sample of observations rather than all observations. A sample can therefore be characterized as a subset of a particular population (Raj, 1968), and sampling enables the researcher to estimate some unknown characteristic of a target population (Lohr, 1999). A sample often gives researchers a reasonable means of gathering useful decision-making information that might otherwise be unattainable or unaffordable (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009).
2.1 Why sampling and not a census?
There are several reasons why a sample is more appropriate than a complete census in most studies.
Firstly, a sample can be considerably less expensive to acquire than a census for the same set of questions (Krishnaiah and Rao, 1988). For example, if a researcher opts to conduct a fifteen-minute telephone interview on the factors influencing the purchase of ultra-speed wireless broadband services, interviewing a sample of 100 consumers is obviously less expensive than interviewing a population of 100,000 consumers. In addition to the noticeable cost savings, the far smaller number of interviews usually requires less total time (Henry, 1990). Hence, if the researcher needs results urgently, sampling can provide the required data more quickly. Given the volatility of some markets and the constant barrage of new competition and new ideas, sampling holds a strong advantage over a census in terms of research turnaround time (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009).
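The time saving in this example can be checked with simple arithmetic. The Python sketch below is illustrative only and assumes that just the fifteen minutes of interviewing count, ignoring dialling, call-backs and refusals.

```python
# Interviewing time for the fifteen-minute telephone interview example.
# Assumption: only the interview itself is counted (no dialling, call-backs or refusals).
INTERVIEW_MINUTES = 15

def total_interview_hours(respondents: int, minutes_each: int = INTERVIEW_MINUTES) -> float:
    """Total hours of interviewing needed for the given number of respondents."""
    return respondents * minutes_each / 60

sample_hours = total_interview_hours(100)       # sample of 100 consumers
census_hours = total_interview_hours(100_000)   # census of 100,000 consumers

print(f"Sample: {sample_hours:,.0f} hours")            # Sample: 25 hours
print(f"Census: {census_hours:,.0f} hours")            # Census: 25,000 hours
print(f"Ratio: {census_hours / sample_hours:,.0f}x")   # Ratio: 1,000x
```

Under these assumptions the census needs a thousand times as many interviewer-hours as the sample, before any travel, training or data-processing costs are added.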
Most research projects are allocated only a limited amount of resources (Salzer and Salafasky, 2003). As such, Levy and Lemeshow (1991) concluded that researchers should opt for sampling instead of a census, as sampling facilitates the collection of more detailed information. According to Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris (2009), with resources concentrated on fewer individuals or items, the study can be broadened in scope to allow for more specific and specialized questions. These authors present the example of an organization that budgeted $100,000 for a study and opted to take a census by mail survey rather than a sample. The organization would mass-mail thousands of copies of a computer-carded questionnaire containing 20 closed-ended questions, which respondents could answer yes or no by punching out a perforated hole; the only information obtained would be the percentage of respondents answering yes and no to each of the 20 questions. For the same amount of money, the organization could have taken a random sample of the population, conducted interactive one-on-one sessions using highly trained interviewers, and gathered more detailed information about the process being studied. By spending the money on a sample, the researcher could have spent significantly more time with each respondent and thus increased the potential for gathering useful information.
Besides that, many research projects, especially those in quality control testing, require the destruction of the items being tested (Zikmund, Ward, Lowe and Winzar, 2007). For example, if a cigarette manufacturer wished to find out whether the taste of every unit of a new cigarette was up to standard before release on the market, there would be no cigarettes left after testing.
In such cases, the test units are destroyed for the purpose of the research project. If a census were conducted for this type of research, there would be no products left to sell (Thompson and Seber, 1996). Hence, taking a sample is the only realistic option for testing such products.
Finally, a researcher may face a situation in which the population is virtually impossible to access for research (Black, 2007). For instance, some people may refuse to answer sensitive questions, such as questions about a Malay person’s experience of pornography in Malaysia, while the owners of some items of interest, such as the limited edition Honda Civic Mugen RR, are so scattered that locating all of them would be an extremely difficult task. Sampling is therefore the only option when the population is inaccessible (Hafiz, 2010) for these or similar reasons.
Although this discussion aims to provide a general understanding of why most studies opt for sampling instead of a census, it should be noted that in some circumstances taking a census makes more sense than using a sample.
One such circumstance arises when a researcher wants to eliminate the possibility that a randomly selected sample might, by chance, not be representative of the population (Krishnaiah and Rao, 1988); Black (2007) notes that a sample may be non-representative even when all proper sampling techniques are implemented. For instance, if the population of interest is all Ute owners in far north Queensland, a random sample of owners could yield mostly farmers when in fact many of the Ute owners in this region are urban dwellers (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009). This situation might typically arise if the sampling frame covered only one geographic location in north Queensland, whereas a census would include the whole region.
Another circumstance is when there is no obvious reason not to take a census (Zikmund, Ward, Lowe and Winzar, 2007). For example, an organization that wants to assess its salespeople’s satisfaction with its new color printing machine might have no pragmatic reason not to circulate a questionnaire to all 10 of its salespeople. There may also be situations in which the client of a study does not appreciate the value of random sampling (Goavendano, 2010) and feels more comfortable with a census (Black, 2007).
Nevertheless, these reasons for taking a census assume that enough time and money are available to conduct one (Black, 2007; Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009; Goavendano, 2010; Krishnaiah and Rao, 1988; Zikmund, Ward, Lowe and Winzar, 2007).
2.2 Methods of sampling
There are several alternatives available to researchers for acquiring a sample. The main sampling plans may be grouped into two categories: random sampling techniques and non-random sampling techniques.
With a random sampling technique, every element in the population has a known, nonzero probability of being selected into the sample (Cochran, 1977). Much of the statistical literature refers to this as probability-based sampling, because such a sample can be analyzed using probability theory and statistical theory (Barnett, 1991; Black, 2007). In essence, random sampling means that chance governs the selection process (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009). For example, most gamblers believe that lottery winners are selected by a random draw of numbers, in which each selection is made by chance.
On the other hand, with a non-random sampling technique, the probability of any particular member of the population being chosen for the sample is unknown (Deming, 1960), as the members of non-random samples are not selected by chance (Brewer and Hanif, 1983). For example, members might be selected because they know the researcher conducting the study or simply because they are in the right place at the right time. The selection of sampling units using non-random techniques is therefore quite arbitrary, as it relies heavily on the researcher’s personal judgment. In addition, there are no appropriate statistical techniques for measuring random sampling error from non-random samples, and thus projecting the data beyond the sample is statistically inappropriate (Zikmund, Ward, Lowe and Winzar, 2007). Nevertheless, there are occasions when non-random samples are best suited to the researcher’s purpose, which will be discussed in later sections of this book.
In general, researchers prefer random sampling techniques to non-random sampling techniques in many studies, as random selection is more likely to result in samples that are more accurate and less biased (Krishnaiah and Rao, 1988). Nevertheless, this relies heavily on the researcher’s ability to obtain an accurate sampling frame (Foreman, 1991), which is very difficult in many research studies (Levy and Lemeshow, 1991). At some sacrifice of accuracy, however, Zikmund, Ward, Lowe and Winzar (2007) suggest that non-random sampling methods, which are usually less expensive, less time-consuming and easier to implement than random sampling methods, are often more practical when no sampling frame or list exists.
2.2.1 Random sampling methods
All random sampling techniques are based on a chance selection process, whereby participants are selected randomly from a sampling frame, thus eliminating error related to the researcher’s judgement. This section of the book presents the various random sampling methods, including the advantages and disadvantages of each.
2.2.1.1 Simple random sampling
The most elementary random sampling technique is simple random sampling (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009), as it can be viewed as the basis for the other random sampling techniques (Black, 2007). In this procedure, each element is chosen randomly and entirely by chance, such that each element has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals (Yates, Moore and Starnes, 2008). In small populations, and often in large ones, such sampling is typically done without replacement, whereby the researcher deliberately avoids choosing any member of the population more than once (Henry, 1990; Jensen, 1978). Although simple random sampling can be conducted with replacement, this is less common and would normally be described more fully as simple random sampling with replacement (Henry, 1990). Draws made without replacement are no longer independent, but they remain exchangeable, so many of the results still hold (Yates, Moore and Starnes, 2008). Furthermore, for a small sample from a large population, sampling without replacement is approximately the same as sampling with replacement, as the chance of choosing the same member twice is very low (Henry, 1990). Nevertheless, researchers should keep in mind that, because a sample is not a census, there can be no guarantee that any particular sample is a perfect representation of the population (Yates, Moore and Starnes, 2008); an unbiased random selection of individuals is therefore essential so that, in the long run, the sample represents the population as truly as possible (Lohr, 1999). In this way, simple random sampling allows the researcher to draw externally valid conclusions about the entire population based on the sample.
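A minimal Python sketch of the two variants follows, using the standard library's random.sample (without replacement, so every subset of size n is equally likely) and random.choices (with replacement). The helper prob_any_repeat and the frame of consumer IDs are illustrative assumptions, not taken from the cited sources; the helper computes how likely it is that sampling with replacement picks some member twice, which is what makes the two schemes differ.

```python
import random
from math import prod

def srs_without_replacement(frame, n, seed=None):
    """Simple random sample of n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def srs_with_replacement(frame, n, seed=None):
    """n independent draws in which the same unit may be selected more than once."""
    rng = random.Random(seed)
    return rng.choices(frame, k=n)

def prob_any_repeat(N, n):
    """Chance that n draws with replacement from N units select at least one unit twice."""
    if n > N:
        return 1.0  # pigeonhole: a repeat is certain
    # P(all n draws distinct) = (N/N) * ((N-1)/N) * ... * ((N-n+1)/N)
    p_all_distinct = prod((N - i) / N for i in range(n))
    return 1 - p_all_distinct

# Hypothetical frame of 100,000 consumer IDs, echoing the broadband example earlier.
frame = list(range(1, 100_001))
sample = srs_without_replacement(frame, 100, seed=1)
print(len(sample), len(set(sample)))  # 100 100: no unit appears twice

for n in (10, 100, 1_000):
    print(f"n = {n:>5}: P(repeat with replacement) = {prob_any_repeat(len(frame), n):.4f}")
```

Running the loop shows the chance of a repeat shrinking as n/N shrinks, which is why the two schemes become practically indistinguishable for a small sample drawn from a large population.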
Conceptually, simple random sampling is the simplest of all the available random sampling methods (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009). However, this method requires a complete sampling frame, which may not be available or feasible to construct for large populations (Yates, Moore and Starnes, 2008). This is consistent with Black (2007), who suggested that simple random sampling is easier to perform on small populations than on large ones, because numbering all the members of the population and selecting items is cumbersome for large populations, although computers can make the process somewhat easier. According to Yates, Moore and Starnes (2008), even if a complete frame is available, more efficient approaches may be possible if other useful information about the units in the population is available.
The advantages of this sampling method are that (i) it is free from classification error (Raj, 1968) and (ii) it requires minimal advance knowledge of the population other than its frame (Thompson, 1992). Its simplicity also makes it relatively easy for researchers to interpret data collected from simple random sampling. As such, this method is best suited to situations where little information is available about the population and data collection can be efficiently conducted on randomly distributed items, or where the cost of sampling is small enough to make efficiency less important than simplicity (Yates, Moore and Starnes, 2008). If these conditions do not hold, the researcher may opt for other random sampling methods, such as stratified sampling or cluster sampling.
2.2.1.2 Stratified random sampling
Another type of random sampling is stratified random sampling, in which the members of the population are grouped into relatively homogeneous subpopulations, called strata, before sampling (Fowler, 1993). The strata should be mutually exclusive, so that every member of the population is assigned to one and only one stratum (Saunders, Lewis and Thornhill, 2003). The researcher then draws a simple random sample from each subpopulation. This often improves the representativeness of the sample, as it can reduce sampling error (Black, 2007). Because portions of the total sample are taken from different population subgroups, stratified random sampling can match the sample to the population more closely than simple random sampling (Malhotra, 2004). In particular, it can produce a weighted mean that has less variability than the arithmetic mean of a simple random sample of the population (Yates, Moore and Starnes, 2008). However, stratified random sampling is generally more costly than simple random sampling, as each member of the population must be assigned to a stratum before the random selection process begins (Moore, 2001).
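The procedure and the weighted mean described above can be sketched in Python as follows. The population, stratum names and per-stratum sample sizes are hypothetical, chosen so that each stratum is internally homogeneous and the strata contrast with each other. The function stratified_mean is an illustrative helper that weights each stratum's sample mean by that stratum's population share, N_h / N.

```python
import random
from statistics import mean

def stratified_sample(population_by_stratum, n_per_stratum, seed=None):
    """Draw a simple random sample without replacement inside each stratum."""
    rng = random.Random(seed)
    return {
        stratum: rng.sample(units, n_per_stratum[stratum])
        for stratum, units in population_by_stratum.items()
    }

def stratified_mean(sample_by_stratum, stratum_sizes):
    """Weighted mean: sum over strata of (N_h / N) * (sample mean of stratum h)."""
    N = sum(stratum_sizes.values())
    return sum(
        stratum_sizes[h] / N * mean(values) for h, values in sample_by_stratum.items()
    )

# Hypothetical population: two internally homogeneous strata that contrast with each other.
population = {
    "stratum_A": [8, 9, 10, 11, 12] * 120,   # 600 units, mean 10
    "stratum_B": [28, 29, 30, 31, 32] * 80,  # 400 units, mean 30
}
sizes = {h: len(units) for h, units in population.items()}  # N_A = 600, N_B = 400

sample = stratified_sample(population, {"stratum_A": 30, "stratum_B": 20}, seed=7)
estimate = stratified_mean(sample, sizes)
print(f"Stratified estimate: {estimate:.2f} (population mean is 18.00)")
```

Because each stratum's sample mean is confined to a narrow range, this estimate can only fall between 16 and 20, whereas the mean of a simple random sample of 50 units from the pooled population could in principle lie anywhere between 8 and 32.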
The strata are usually selected on the basis of available information, which may have been gleaned from previous censuses or surveys (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009). The benefits of stratification increase as the strata differ more (Black, 2007): internally, each stratum should be relatively homogeneous, while externally, the strata should contrast with each other (Barnett, 1991; Black, 2007). In most studies, stratification is done using demographic variables such as gender, age, income, education and religion (Ferguson, 1996). For example, in the radio market, listener age is an important determinant of the type of programme a station should air. Stratifying listeners by age (20-29, 30-39 and 40-59 years old) reflects the tendency of members of a particular age group to prefer the same type of programming, which differs from that preferred by listeners in other age groups. Within each age subgroup, homogeneity (alikeness) is present, whereas between each pair of subgroups, heterogeneity (difference) is present.
Stratified random sampling can be either proportionate or disproportionate. Proportionate stratified random sampling occurs when the percentage of the sample taken from each stratum is proportionate to the percentage that each stratum represents in the whole population (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009). For instance, suppose a survey of voters in Hulu Selangor is stratified by race into Malay, Chinese, Indian and others. If Hulu Selangor’s population is 68% Malay and a sample of 1000 voters is taken, the sample would need to include 680 Malays to achieve proportionate stratification, and the sample proportions of the other races would also have to follow their population percentages. In contrast, if the proportions of the strata in the sample differ from the proportions of the strata in the population (e.g. the population is 68% Malay but the sample of 1000 includes 700 Malays), disproportionate stratified random sampling occurs. Sampling more or less heavily in a given stratum than its relative population size warrants is not a problem if the primary purpose of the research is to estimate some characteristic separately for each stratum and if researchers are concerned with assessing the differences among strata (Zikmund, Ward, Lowe and Winzar, 2007). The logic behind this procedure relates to the general argument for sample size: as variability increases, sample size must increase to provide accurate estimates. Thus, the strata that exhibit the greatest variability are sampled more heavily to increase sample efficiency, that is, to produce smaller random sampling error (Black, 2007). Complex formulae have been developed to determine the sample size for each stratum; a simple rule of thumb for understanding the concept of optimal allocation is that the stratum sample size increases for larger strata and for strata with greater relative variability (Zikmund, Ward, Lowe and Winzar, 2007).
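A minimal Python sketch of both allocation rules is given below. Proportionate allocation sets each stratum's sample size to n times its population share. For the optimal rule, the sketch assumes Neyman allocation (stratum sample size proportional to stratum size times stratum standard deviation), a standard formula consistent with the rule of thumb above but not named by the cited authors. Only the 68% Malay share comes from the Hulu Selangor example; the other races are collapsed into one group for brevity, and the stratum sizes and standard deviations in the Neyman call are hypothetical. The largest-remainder rounding helper is an illustrative design choice that keeps the stratum sizes summing to n.

```python
from math import floor

def _round_to_total(raw, n):
    """Round allocations down, then hand leftover units to the largest remainders."""
    base = {h: floor(x) for h, x in raw.items()}
    leftover = n - sum(base.values())
    by_remainder = sorted(raw, key=lambda h: raw[h] - base[h], reverse=True)
    for h in by_remainder[:leftover]:
        base[h] += 1
    return base

def proportionate_allocation(stratum_shares, n):
    """n_h = n * W_h, where W_h is stratum h's share of the population."""
    total = sum(stratum_shares.values())
    return _round_to_total({h: n * w / total for h, w in stratum_shares.items()}, n)

def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Neyman (optimal) allocation: n_h proportional to N_h * S_h."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(weights.values())
    return _round_to_total({h: n * w / total for h, w in weights.items()}, n)

# Hulu Selangor example: 68% Malay; all other races combined here for brevity.
print(proportionate_allocation({"Malay": 68, "Other races": 32}, 1000))
# {'Malay': 680, 'Other races': 320}

# Hypothetical strata: B is mid-sized but far more variable than A or C.
sizes = {"A": 5000, "B": 3000, "C": 2000}
sds = {"A": 2.0, "B": 10.0, "C": 5.0}
print(proportionate_allocation(sizes, 500))  # {'A': 250, 'B': 150, 'C': 100}
print(neyman_allocation(sizes, sds, 500))    # {'A': 100, 'B': 300, 'C': 100}
```

In the hypothetical case, stratum B is the most variable and receives 300 of the 500 interviews under Neyman allocation, compared with 150 under proportionate allocation, which illustrates the disproportionate sampling of highly variable strata described above.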
However, a weakness of this method is that it is not useful when there are no homogeneous subgroups (Cochran, 1977). In addition, stratified random sampling is more complex than methods such as simple random sampling and systematic sampling, as it requires greater effort to obtain the sample: the strata must be carefully defined (Black, 2007; Henry, 1990).
2.2.1.3 Systematic sampling
Systematic sampling is a random sampling method that selects elements from an ordered sampling frame (Black, 2004). Unlike stratified random sampling, systematic sampling is not used to reduce sampling error (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009); rather, it is used because of its convenience and relative ease of administration (Black, 2007). In this procedure, every kth item is selected to produce a sample of size n from a population of size N (Black, Asafu-Adjaye, Khan, Perera, Edwards and Harris, 2009). The value of k, sometimes termed the sampling cycle, is determined by the formula k = N / n, where k is the size of the selection interval, N is the population size and n is the sample size. If k is not an integer, its whole-number part should be used.
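The procedure above can be sketched in Python as shown below. One assumption goes beyond the text: the first unit is picked at random from the first k positions of the frame, which is a common convention for systematic sampling. The customer frame is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit of an ordered frame, where k is the whole-number part of N / n.

    Assumption: the first unit is picked at random from the first k positions,
    a common convention that the paragraph above does not spell out.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    k = N // n                                   # sampling cycle
    start = random.Random(seed).randrange(k)     # random start among positions 0..k-1
    # start + (n - 1) * k <= n * k - 1 <= N - 1, so every index is inside the frame
    return [frame[start + i * k] for i in range(n)]

# Hypothetical ordered frame of 1,000 customer records; n = 50 gives k = 1000 / 50 = 20.
frame = [f"customer_{i:04d}" for i in range(1, 1001)]
sample = systematic_sample(frame, 50, seed=3)
print(sample[:3])  # three consecutive picks, 20 records apart
```

Once the random start is fixed, the whole sample is determined. This is what makes the method convenient to administer, for example when working down a printed list.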
With this procedure, each element in the population has a known and equal probability of selection (Hague and Harris, 1993). This makes systematic sampling functionally similar to simple random sampling, but it can be more efficient when the variance within the systematic sample is greater than the variance of the population (Harris, 2000).
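The equal-probability property can be checked by enumeration. The Python sketch below assumes a random start among the first k units, so each of the k possible systematic samples is equally likely. It lists every possible sample and computes each unit's exact selection probability using fractions.Fraction. The function name is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction

def inclusion_probabilities(N, n):
    """Exact selection probability of each position 0..N-1 under systematic sampling.

    Assumes a random start among the first k = N // n positions, so each of the
    k possible samples occurs with probability 1/k.
    """
    k = N // n
    times_selected = Counter()
    for start in range(k):          # enumerate every possible systematic sample
        for i in range(n):
            times_selected[start + i * k] += 1
    return [Fraction(times_selected[u], k) for u in range(N)]

# N a multiple of n: every unit has the same probability, 1/k = n/N.
print(set(inclusion_probabilities(1000, 50)))   # {Fraction(1, 20)}

# N not a multiple of n: k = 1050 // 100 = 10, so the last 50 units are never drawn.
print(set(inclusion_probabilities(1050, 100)))
```

When N is a multiple of n, every unit's probability is n/N, the same as under simple random sampling. When k must be truncated and exactly n units are taken, the last few units of the frame can never be reached. In that case the equal-probability property holds only approximately.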