List of Abbreviations
List of Figures
List of Tables
1.2 Initially supplied conceptual model
1.3 Research question
1.4 Research context: processes of the public sector
1.5 Structure of thesis
2 Theoretical foundation: methodology of scale development
2.1 Introduction to scale development
2.2 A generalized scale development procedure
2.2.1 Derivation of discrete procedure aspects
2.2.2 Indicator creation
2.2.3 Preliminary instrument design choices
2.2.4 Reliability and validity requirements
2.2.5 Repeated analysis and refinement of instrument
2.2.5.1 Qualitative analysis and refinement
2.2.5.2 Quantitative analysis and refinement
2.2.5.2.1 First generation techniques
2.2.5.2.2 Second generation techniques
2.3 Scale development procedure of Hensley (1999)
2.4 Scale development procedure of DeVellis (2003)
2.5 Scale development procedure of Homburg and Giering (1996)
3 Practical implementation: applied scale development procedure
3.1 Derivation of applied procedure
3.2 Preliminary scale development (phase 1)
3.2.1 Clear definition and description of constructs
3.2.2 Item pool generation
3.2.3 Control variables
3.3 Qualitative development of survey instrument (phase 2)
3.3.1 Instrument design choices
3.3.2 Qualitative pre-test of instrument
3.3.2.1 Round 1: procedure and results
3.3.2.2 Round 2: procedure and results
3.3.2.3 Round 3: procedure and results
3.4 Quantitative development of final scales (phase 3)
3.4.1 Quantitative survey and data collection
3.4.2 Quantitative data analysis and evaluation
3.4.2.1 Data preparation
3.4.2.2 Sample description
3.4.2.3 Choice of analysis techniques and corresponding evaluation procedure
3.4.2.4 Separate evaluation of measurement models
3.4.2.4.1 Reflective measurement models
3.4.2.4.2 Formative measurement models
3.4.2.5 Joint evaluation of measurement models
4.1 Proposal of sound measurement models
4.3 Further research
1.1 Questionnaire of quantitative survey (German)
1.2 Final questionnaire (German)
1.3 Final control variables (German)
1.4 Final questionnaire (English)
1.5 Final control variables (English)
2 SPSS evaluation Results
2.1 Separate evaluation of measurement models
2.1.1 Cronbach’s alpha and item to total-correlations
2.1.2 Exploratory factor analysis
2.1.3 Multicollinearity tests
2.2 Joint evaluation of measurement models
2.2.1 Exploratory factor analysis
2.2.2 Multicollinearity tests
3 SmartPLS evaluation results
3.1 Separate evaluation of measurement models
3.2 Joint evaluation of measurement models
The following diploma thesis is a scale development study in the context of the public sector. A set of sound measurement models had to be developed for measuring an initially supplied, not yet quantitatively evaluated conceptual model of Barth and Veit (2011). As the individual parts of the conceptual model are not directly observable, scale development techniques had to be used to make them empirically manifest (e.g. DeVellis, 2003; Gerbing and Anderson, 1988). Because the development of reliable and valid scales is not trivial, a stringent procedure based on literature recommendations and best practices was followed. Therefore, a generalized scale development procedure and its corresponding subparts were derived from a large number of publications from different fields of research such as information systems, operations management and psychometrics. Two specific processes of the public sector, namely the civil marriage and the residency change, were used as context for phrasing meaningful sentences and capturing a broad range of citizens’ resistance towards conducting a public process virtually. Ultimately, the measurement models should make it easy to apply the conceptual model to ideally any process of the public sector.
After the creation of an initial pool of items, the resulting preliminary scales were qualitatively tested and refined in multiple steps, ranging from expert opinions to comprehensiveness checks and particular sorting procedures. Notably, 62 distinct qualitative interviews with citizens were conducted to reduce the set of indicators from 150 to 64. In a following step, a quantitative survey was employed, which yielded around 350 responses as a basis for data preparation and statistical analyses. This statistical evaluation procedure was twofold. First generation techniques like exploratory factor analysis or Cronbach’s α were discussed and applied. Afterwards, the developed scales were tested in a confirmatory manner using a multivariate second generation structural equation modeling approach. The two most common analysis methods for structural equation modeling, covariance-based via LISREL and variance-based via partial least squares, were compared, before the latter was applied due to theoretical considerations. Special attention had to be paid to some of the conceptual model’s constructs, because their method of measurement was designed formatively. This holds especially true for the statistical analyses, as formative measures are grounded on very distinct assumptions and analysis consequences (e.g. Petter et al., 2007; Bollen and Lennox, 1991), such as undesired multicollinearity (e.g. Jarvis et al., 2003; Henseler et al., 2009). Quality criteria of the measurement models were analyzed to assess how appropriate the measurement of the conceptual model actually is, hence how good the estimation of the latent variables’ true but unobservable values is. As a result, all of these models comply with generally acknowledged evaluation criteria with regard to reliability and validity. Finally, the study closes with a discussion of the findings, their limitations and future research implications.
Confirmatory factor analysis (CFA) Factor analytical technique, which examines the provided indicators with regard to their underlying structure of constructs (or factors), and in which explicit, prespecified hypotheses about the relationships between indicators and constructs are tested (Jöreskog, 1966).
Construct Theoretical, abstract entity representing the "nonobservable state or nature of a phenomenon" (Bagozzi and Phillips, 1982, p. 24). In this context, the conceptual model consists of distinct constructs, which possess causal interdependencies.
Exploratory factor analysis (EFA) Factor analytical technique for reflective constructs, which examines the provided indicators with regard to their underlying structure of constructs (or factors), in which no prespecified hypotheses about the relationships between indicators and constructs are established (e.g. Worthington and Whittaker, 2006).
Formative indicator Indicator that accounts for the cause of the latent variable (Bollen and Lennox, 1991). A set of formative indicators represents conceptually different dimensions of the theoretical construct within a scale (Chin, 1998a) and thus cannot be easily removed without affecting the ability to capture the underlying domain of content (e.g. MacKenzie et al., 2005). See also: indicator.
Indicator Variable, which is directly observable and used (as part of a scale) to estimate the "true value" of a latent variable (DeVellis, 2003). Also called manifest variable, item, observable variable or measure. See also: latent variable.
Latent variable Variable, which is not directly observable (Edwards and Bagozzi, 2000). Thereby, the true (but unknown) value needs to be estimated by the use of manifest variables (Gerbing and Anderson, 1988).
Moderating effect Variable, which affects the strength of causal interdependencies between latent variables in a conceptual model (Carte and Russell, 2003).
Partial Least Squares (PLS) Second generation, distribution free regression algorithm based on variance analysis which is used for model evaluation with regards to structural equation modeling. See also: structural equation modeling.
Pre-Test Recommended step in scale development procedure. Trial and qualitative refinement of certain aspects of the preliminary survey instrument (e.g. questionnaire) before larger quantitative data collection and analyses are conducted.
Reflective indicator Values of reflective indicators are caused (i.e. reflected) by the latent variable (Bollen and Lennox, 1991). Reflective indicators represent conceptually equal dimensions of the theoretical construct and are supposed to be highly intercorrelated within a scale (MacKenzie et al., 2005).
Reliability Extent to which a variable produces consistent measurements or, statistically expressed, "the amount of variance attributable to the true score of the latent variable" (DeVellis, 2003, p. 27). Influences a scale’s ability to adequately measure the phenomenon of interest (Homburg and Giering, 1996).
Scale A set of indicators, measuring the same latent variable.
Scale Development Multi-step procedure to develop sound scales. Used, if no ready-to-use measures exist (DeVellis, 2003).
Structural equation modeling (SEM) Second generation causal analytical technique for statistically estimating and testing causal interdependencies of conceptual models among multiple independent and dependent variables simultaneously (Gerbing and Anderson, 1988). A structural equation model consists of a structural model and a measurement model part (Gefen et al., 2000) and works best in a confirmatory manner (Chin, 1998b).
Validity "[T]he adequacy of a scale as a measure of a specific variable [...] is an issue of validity" (DeVellis, 2003, p. 49), thereby influencing a scale’s ability to adequately measure the phenomenon of interest (Homburg and Giering, 1996).
List of Abbreviations
[Not included in this excerpt]
List of Figures
1.1 Conceptual model as basis for scale development within this diploma thesis (based on Barth and Veit (2011))
2.1 Reflective and formative method of measurement (based on MacKenzie et al. (2005); Chin (1998a))
2.2 Structural equation model with latent variables and measures (based on Bollen and Lennox (1991); Backhaus et al. (2011))
2.3 Identified scale development process by Hensley (1999)
2.4 Recommended scale development procedure by Homburg and Giering (1996)
3.1 Applied scale development procedure in study
3.2 Basic indicator creation process of Chin and Newsted (1999)
3.3 Illustration of conducted pre-test
3.4 Arithmetic mean of ARRs over all constructs in pre-test
3.5 Histogram of the frequency distribution of indicator F
3.6 Pattern matrix of joint evaluation of reflective measurement models (round 4)
List of Tables
2.1 Scale development procedure by DeVellis (2003)
3.1 Constructs of the conceptual model of Barth and Veit (2011), including moderating effects
3.2 Construct definitions of the conceptual model of Barth and Veit (2011), including moderating effects
3.3 Formative dimensions with regard to construct definitions
3.4 Constructs and their corresponding literature for item creation
3.5 Control variable aspects of potential influence on a citizen’s resistance
3.6 Pre-test round one: average rejection rates of C
3.7 Pre-test round one: average rejection rates of C
3.8 Pre-test round one: average rejection rates of all constructs
3.9 Pre-test round two: average rejection rates of all constructs
3.10 Definitions of advanced moderating effects
3.11 Pre-test round three: average rejection rates of all constructs, including variations
3.12 Pre-test round three: count of indicators per construct
3.13 Data preparation steps for enhancing data quality
3.14 Contradiction rates for reverse-coded items in quantitative sample
3.15 Sample representativeness in terms of all Internet users (IU) in Germany (based on the Federal Statistical Office of Germany (2010))
3.16 CBSEM or PLS: decision criteria based on literature
3.17 Quality criteria and corresponding applied methods used for separate evaluation of reflective measurement models
3.18 Separate assessment of reflective measurement models: final values for Cronbach’s alpha
3.19 Separate assessment of reflective measurement models: AVE and composite reliability results
3.20 Quality criteria and corresponding applied methods used for separate evaluation of formative measurement models
3.21 Separate assessment of formative measurement models: tolerance and VIF results
3.22 Discriminant validity assessment: squared latent variable correlations with AVE in the diagonal
4.1 Proposal of sound, reflective measurement models
4.2 Proposal of sound, formative measurement models
The following chapter is an introduction to this diploma thesis. It presents a short overview of what scale development is about and why it is needed, leading over to the study’s underlying conceptual model, for which the research is conducted. Subsequently, the research question is derived, before a brief outline of the document structure is given to ease the reader’s understanding of this thesis.
Many phenomena of science are of theoretical nature. The resulting theoretical constructs examined by scientists are representations of something that is neither directly measurable nor directly observable (e.g. Schriesheim et al., 1993; Edwards and Bagozzi, 2000; Petter et al., 2007). Making those phenomena measurable by inferring the unobservable from the observable (Chin, 1998b) is an important activity in (empirical) research for a broad range of subjects (DeVellis, 2003). A theory is defined by Bacharach (1989, p. 498) as "a statement of relationships between units [, which are] observed or approximated in the empirical world". Furthermore, the author denotes these "observed units" as being operationalized by measurement. Consequently, these measures are necessary to make the underlying theory empirically tangible, hence they "provide an empirical estimate of each theoretical construct of interest" (Gerbing and Anderson, 1988, p. 186). Thus, empirical means can be used to evaluate (e.g. falsify) a theoretical framework’s propositions and causal interdependence relationships (Bacharach, 1989). Conceptual models like the one the present study relates to can be quantitatively applied in practice after adequate measures have been developed (Homburg and Giering, 1996). This is why Schoenfeldt (1984, p. 78) states that "[t]he construction of the measuring devices is perhaps the most important segment of any study. Many well-conceived research studies have never seen the light of day because of flawed measures". Unfortunately, those measures are exclusively bound to specific constructs (e.g. DeVellis, 2003; Straub, 1989), which implies the need to develop new measures if no appropriate, existing ones are available. Hence, scale development techniques have to be used to create such measures for directly unobservable pieces of theory, whereby a set of measures corresponding to the same underlying piece is called a scale (e.g. Gerbing and Anderson, 1988), or a measurement model with regard to the causal analytic models which are presented later in this study. Although a vast number of publications concerning scale development can be considered as a template for application, it is still not trivial, and several partly different approaches exist (e.g. Boudreau et al., 2001; Hensley, 1999). However, developing scales is generally more about following stringent, literature-based procedures and methodological advice than about seeking theoretical advancements (e.g. DeVellis, 2003).
This diploma thesis is a scale development study in the public sector context. As a foundation, a conceptual model from the e-government field of research was initially supplied by Barth and Veit (2011). As noted above, no measurement models, i.e. no scales, have been developed for this conceptual model yet. Therefore, the present study aims at proposing the necessary, sound measurement models. The scale development procedure was derived from literature and includes a thorough preliminary, qualitative refinement of initially developed scales as well as a quantitative survey for an empirical examination. The subsequent data analyses and model evaluation approaches were twofold: conventional first generation and recent multivariate second generation approaches were employed to achieve optimal insights and results. The final scales have to conform to high requirements with respect to reliability and validity. At this point it shall be made clear that neither the (further) development of the conceptual model nor its underlying theory, propositions or causal interdependencies are subject to this diploma thesis.
1.2 Initially supplied conceptual model
Figure 1.1 shows the initially supplied conceptual model of this diploma thesis. It belongs to the e-government field of research and has been proposed by Barth and Veit (2011). The authors’ intention is to model and understand the resistance of citizens towards conducting a specific process of the public sector virtually, in comparison to the offline counterpart within government agencies. Thus, this resistance depicts the model’s dependent variable. The model is grounded on the process virtualization theory of Overby (2008b). This theory was extended and transferred from the private to the public sector to gain knowledge about which processes are more likely to be virtually accepted among citizens and which reasons for a varying resistance can be identified. For this transfer, further research was incorporated by Barth and Veit, for instance relevant e-commerce literature (e.g. Chang et al., 2005), relevant information systems (IS) literature (e.g. Venkatesh et al., 2003), relevant communications research literature (e.g. Daft and Lengel, 1986) and relevant public administration literature (e.g. Thomas and Streib, 2003).
[Figure not included in this excerpt]
Figure 1.1: Conceptual model as basis for scale development within this diploma thesis (based on Barth and Veit (2011))
In compliance with Overby (2008b, p. 278), a virtual process is defined as "a process in which physical interaction between people and/or objects has been removed". Besides the dependent variable (C10), Barth and Veit model nine additional latent variables (C1-C9). These nine constructs depict process properties, which are supposed to influence a citizen’s resistance towards a process virtualization (i.e. C10). According to the model’s propositions P1-P9, all of them are supposed to positively influence the dependent variable. Hence, the greater the value of the independent variables, the greater the value of the dependent variable is predicted to be. Further, Barth and Veit propose to conceptually check for moderating effects in the future, especially "general risk tolerance" (mod1) and "permanent lack of time" (mod2). These are presumed to moderate the strength of causal interdependencies (Carte and Russell, 2003), which are depicted by arrows in figure 1.1. This assumption was added because citizens’ resistances proved to differ within the qualitative interviews, although the situation with regard to the other constructs seemed quite identical. However, moderating effects are not discussed in detail within the publication of Barth and Veit, which is also why none of these relationships are depicted in the figure above. The necessary conceptual research is not yet accomplished at this stage of development. So far, Barth and Veit (2011) focus on the non-moderating, causal interdependence relationships. As part of the ongoing research, developing measures for the proposed moderating effects is also pursued in this thesis. However, the idea of a construct related to "general risk tolerance" was later replaced within this study, as it did not seem possible to accurately measure someone’s general risk tolerance across all domains with only a few measures.
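To make the notion of "moderating the strength" of a causal interdependency more tangible, the following minimal sketch simulates one independent variable, one moderator and one dependent variable and recovers the moderation as an interaction term in an ordinary least squares regression. It is not part of the study, which explicitly does not evaluate structural relationships: the variable names, coefficients, noise level and sample size are assumptions chosen purely for illustration, and the study itself relies on different estimation techniques.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # hypothetical sample size

x = rng.normal(size=n)  # stand-in for one process property (one of C1-C9)
m = rng.normal(size=n)  # stand-in for a moderator, e.g. "permanent lack of time"

# Assumed data-generating process: the effect of x on y is 0.5 + 0.3 * m,
# i.e. the moderator changes the strength of the relationship.
y = 0.5 * x + 0.3 * x * m + rng.normal(scale=0.5, size=n)

# Regression with an interaction term x * m.
design = np.column_stack([np.ones(n), x, m, x * m])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b_0, b_x, b_m, b_xm = coef

print(f"main effect of x: {b_x:.2f}, interaction x*m: {b_xm:.2f}")
```

A clearly non-zero interaction coefficient indicates that the slope of the independent variable depends on the level of the moderator, which is what a moderating effect means in this context.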
1.3 Research question
The derivation of the underlying research question is straightforward. The conceptual model of Barth and Veit (2011) has not been quantitatively evaluated yet and thus needs suitable measurement models for doing so. This diploma thesis proposes several measurement models with regard to the theoretical constructs provided. Consequently, the research question reads as follows:
"How does a set of sound measurement models for the supplied conceptual model and its corresponding constructs look like?"
To answer the research question above, scale development techniques were used to develop measures, which were then tested for their appropriateness. In this context, it should be noted once again that neither the structural evaluation of the conceptual model’s causal interdependencies nor further theory development is subject to this study.
1.4 Research context: processes of the public sector
As stated earlier, constructs one to nine are supposed to depict process properties which, according to Barth and Veit (2011), influence a citizen’s resistance towards conducting a process virtually. To be able to develop measures, and thus a questionnaire for a quantitative survey, the constructs need to refer to a specific process of the public sector. Moderating effects (C11-C12) do not depict process properties and thus do not need to be adapted to a certain process, because they represent citizens’ general attitudes which are presumed to moderate the importance of some process properties or causal relationships. Different public processes are expected to account for statistical variance in respondents’ average resistance. The resulting measures of this study can be used to apply the conceptual model to virtually every public process in the future.
The two processes of the public sector chosen as measurement context are: (1) civil marriage (CV) and (2) residency change (RC). It is noteworthy that these public processes were selected for being very complementary, as described in the following. It is assumed that whereas the average citizen favors the virtual version of the residency change, this citizen is not willing to conduct a civil marriage online. On the one hand, the civil marriage is expected to have a high perceived relevance to citizens and is therefore a very special event. On the other hand, the residency change is a relatively brief task that is necessary to comply with applicable law. These assumptions were based on common knowledge and have not been explicitly proven, though they might be supported by the actual application of the model itself, once it is fully developed. However, it is not necessary to determine whether these assumptions really hold true, because the chosen processes shall only act as a skeleton for future explorations of other processes (i.e. the adaptation of the measurement items to a specific process). By selecting processes which probably yield a high variance in resistance, the measures were tested on two "extreme examples", to see whether they work well in at least two different and distinct situations. In contrast, the selection of two closely interrelated processes might have shown a good fit for that specific domain, but no conclusion could have been drawn about a good fit in general. This is in line with the fact that "scales can be developed to be relatively broad or narrow with respect to the situations to which they apply" (DeVellis, 2003, pp. 63). A side objective is to find measures for the conceptual model which can easily be adapted from one process to another by changing only very few words but not the meaning. If the meaning were changed, the measures would have to be retested and evaluated again (Straub, 1989), which would interfere with the objective of developing measures for a generally applicable conceptual model. This obviously requires some precautions to ensure that the measures in both processes are really as equal as possible. Thus, a great amount of work arises from the fact that a change to a measure of one process immediately triggers a change to the corresponding measure of the other process, such that the assumption of equality is not violated.
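The requirement that a change to one process-specific measure must propagate to its counterpart can be sketched as a single wording template per item from which both variants are generated. This is a minimal illustration under stated assumptions, not the procedure used in the study: the item identifier and the item wording below are hypothetical and are not taken from the questionnaire; only the process abbreviations CV and RC come from the text.

```python
from string import Template

# Hypothetical item wording; NOT an item of the study's questionnaire.
ITEM_TEMPLATES = {
    "item_01": Template("Conducting $process online would feel impersonal to me."),
}

# Process abbreviations as used in the text.
PROCESSES = {
    "CV": "a civil marriage",
    "RC": "a residency change",
}


def render_items(templates, processes):
    """Generate one questionnaire variant per process from shared templates."""
    return {
        code: {
            item_id: template.substitute(process=label)
            for item_id, template in templates.items()
        }
        for code, label in processes.items()
    }


items = render_items(ITEM_TEMPLATES, PROCESSES)
for code, variant in items.items():
    print(code, variant["item_01"])
```

Because both variants stem from the same template, editing the wording once changes both processes at the same time, so the two versions of an item can differ only in the process name.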
1.5 Structure of thesis
The present study commences with an introduction to the diploma thesis’ field of research in chapter 1, its underlying conceptual model and the necessary context and selection of public sector processes. The subsequent chapter 2 is a theoretical discussion and review of literature recommendations. It starts with a broad introduction to scale development, which leads over to the derivation of a generalized scale development procedure in section 2.2. Based on this generalized procedure, the section demonstrates the procedure’s major discrete aspects or elements and explains them in a detailed fashion. Necessary requirements in terms of reliability and validity are examined and a thorough introduction is given with respect to the foundation of data analysis techniques. Coming to an end, chapter 2 furthermore presents three distinct scale development procedures, which were notably important for this study.
Chapter 3 displays the applied scale development procedure and its derivation from theoretical foundations and literature recommendations exposed in chapter 2. Section 3.2 examines the creation of preliminary scales, section 3.3 guides the reader along the conducted qualitative pre-test rounds and the preceding instrument design choices and lastly section 3.4 depicts data collection and analysis techniques used for statistically evaluating the developed measures.
The study closes with chapter 4, a discussion of the developed measurement models and the findings, which also names research limitations and future challenges, before a final conclusion is drawn in chapter 5.
2 Theoretical foundation: methodology of scale development
The following chapter will introduce the reader to the theoretical foundations of scale development. Section 2.1 formally introduces scale development and its procedures, before section 2.2 derives some elements of a typical or generalized scale development procedure. These elements are identified and explained within this section. Lastly, three scale development procedures are discussed in more detail, as they significantly influence the procedures used in this study.
2.1 Introduction to scale development
As stated in section 1.1, many phenomena of science are of theoretical nature. Thus, most theoretical pieces examined by scientists are represented by abstract entities which are not directly measurable (e.g. Edwards and Bagozzi, 2000; Petter et al., 2007; DeVellis, 2003), e.g. personal beliefs or consumer behavior (Homburg and Giering, 1996). But this even holds true for less abstract things like temperature, which also cannot be observed directly. Instead, thermometers use physical effects like the expansion of liquids when the temperature increases. Those abstract entities, called constructs, represent the "true, non-observable state or nature of a phenomenon" (Bagozzi and Phillips, 1982, p. 24). Therefore, "[a] construct is an abstract theoretical (hypothetical) variable" (Schriesheim et al., 1993, p. 358), which is invented by scientists (Kerlinger, 1986). According to Homburg and Giering (1996), complex theoretical constructs are still not correctly captured by many publications, although they definitely need to be adequately conceptualized before any thoughts concerning measurement can be made. This involves a thorough identification of distinct factors or dimensions of the theoretical construct. Furthermore, the implied constructs have to be defined unambiguously and in detail (Hughes and Kwon, 1990; Worthington and Whittaker, 2006). Although this might not be extensively necessary for simple constructs, it definitely is for complex or very abstract ones, as they are especially difficult to capture (Nunnally, 1978). Subsequently, the step of operationalization deals with the development of measures referring to theoretical constructs, thus making them empirically manifest (Bacharach, 1989; Homburg and Giering, 1996), enabling them to "provide an empirical estimate of each theoretical construct of interest" (Gerbing and Anderson, 1988, p. 186). This is an important activity in (empirical) research (e.g. Gerbing and Anderson, 1988), because the unobservable often needs to be inferred from the observable (Chin, 1998b). By developing observable measures during the constructs’ operationalization, the true value of the resulting variables can be predicted or at least be estimated (DeVellis, 2003). By the nature of measurement, variables are also subject to measurement error, that is, the discrepancy between the true and the predicted value (Gerbing and Anderson, 1988), which should be assessed by the researcher. If no ready-to-use measures exist, scale development techniques are used for their creation (DeVellis, 2003). The measurable (i.e. manifest or observable) variables are often referred to as indicators (e.g. Homburg and Giering, 1996; Bollen and Lennox, 1991). The values of these observable variables are related to the true but unknown values of the latent variables in one of two directions: they may either be caused by the true value of the latent variable, in which case they are called reflective indicators, or cause the value of the latent variable, in which case they are called formative indicators (DeVellis, 2003).
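The two directions of causality between indicators and latent variable can be illustrated with a small simulation. The following is a minimal sketch and not part of the study: the loadings, weights, noise levels and sample size are arbitrary assumptions. In the reflective case the latent variable generates the indicators, so they end up highly intercorrelated; in the formative case conceptually distinct (here: independent) indicators generate the latent variable and need not correlate at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # hypothetical sample size

# Reflective: the latent variable causes each indicator (plus measurement error).
eta_reflective = rng.normal(size=n)        # true but unobservable value
loadings = np.array([0.8, 0.7, 0.9])       # assumed loadings
reflective = eta_reflective[:, None] * loadings + rng.normal(scale=0.5, size=(n, 3))

# Formative: distinct indicators jointly cause the latent variable.
formative = rng.normal(size=(n, 3))        # independent by construction
weights = np.array([0.5, 0.3, 0.4])        # assumed weights
eta_formative = formative @ weights + rng.normal(scale=0.2, size=n)


def mean_offdiag_corr(x):
    """Average pairwise correlation between the columns of x."""
    c = np.corrcoef(x, rowvar=False)
    k = c.shape[0]
    return (c.sum() - k) / (k * (k - 1))


r_reflective = mean_offdiag_corr(reflective)
r_formative = mean_offdiag_corr(formative)
print(f"mean inter-indicator correlation, reflective: {r_reflective:.2f}")
print(f"mean inter-indicator correlation, formative:  {r_formative:.2f}")
```

Under these assumptions the reflective indicators correlate strongly with each other, whereas the formative ones do not, which is why criteria based on internal consistency are meaningful only for reflective scales.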
A set of indicators sharing a common linkage to an underlying construct is called a scale (Gerbing and Anderson, 1988), which makes scale development the creation of measures that are typically not available ready-to-use (e.g. from previous research) (DeVellis, 2003). Although scale development is very time-consuming (Schmitt and Klimoski, 1991), special attention should be paid to it because of the importance of sound measurement (e.g. Schoenfeldt, 1984). Obviously, scale development can only be reasonably conducted with at least a broad theoretical setting in mind. Whereas a purely exploratory type of research usually a priori defines "possible relationships in only the most general form" (Boudreau et al., 2001, p. 3), confirmatory studies seek to confirm prespecified relationships (Hair et al., 1998). Accordingly, this diploma thesis is neither a purely exploratory nor a conclusively confirmatory research approach. It is noteworthy that the development of measures is also necessary for testing the causal dependence relationships of a conceptual model, as they are unobservable (Homburg and Giering, 1996). Hence, developed measures are not only used to prove the appropriateness of the latent variables’ measurement, but also to actually judge a conceptual model. Without these measures, there would be no possibility of empirically evaluating a conceptual model due to the latent nature of its variables.
Generally, it is very important to ensure a high degree of reliability and validity of scales (e.g. DeVellis, 2003). Homburg and Giering (1996) clearly state that the ability and quality of capturing a theoretical construct is mainly influenced by those two aspects. To enhance the probability of sound scales, many researchers agree with Fowler (1994), who demands that scales should be tested and refined (i.e. by qualitative means) before they are evaluated quantitatively (e.g. Boudreau et al., 2001; Hinkin, 1995).
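One widely used reliability criterion for reflective scales, which is also applied later in this study, is Cronbach’s α. As a minimal sketch, it can be computed from the variances of the individual items and the variance of the summed scale score. The function name and the response data below are hypothetical and serve only to illustrate the calculation; they are not taken from the survey.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: 2-D array-like, rows = respondents, columns = indicators.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of indicators
    item_variances = items.var(axis=0, ddof=1)       # variance of each indicator
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


# Hypothetical answers of five respondents to three items of one scale.
responses = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
])

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

The more strongly the items of a scale covary, the larger the variance of the sum score becomes relative to the sum of the item variances, and the closer α gets to 1.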
Although scale development techniques are widely used across a broad range of areas, not all scales are developed carefully (DeVellis, 2003). Furthermore, DeVellis argues that flawed measures can lead to erroneous conclusions if assumptions about what a scale measures are not appropriate or valid. Therefore, the costs of working with bad measures might exceed any benefit attained. He demands that researchers should at least recognize when measurement procedures are flawed. This statement is in line with others who also see difficulties in valid scale development research (e.g. Schriesheim et al., 1993; Hinkin, 1995; Hensley, 1999). At least, the number of properly validated instruments seems to have risen constantly in the information systems (IS) discipline (Boudreau et al., 2001).
2.2 A generalized scale development procedure
The following section identifies elements or discrete aspects of a generalized scale development process, which are derived from literature. Though many scale development studies exist, the procedures used to obtain sound measures are not always identical. Nevertheless, generalized elements can be identified which make up a typical scale development approach. Consequently, the section presents these and further issues in scale development, as well as the corresponding challenges to the researcher.7
2.2.1 Derivation of discrete procedure aspects
There is a wide range of approaches to scale development procedures in the literature (e.g. Hensley, 1999; Boudreau et al., 2001; Worthington and Whittaker, 2006; Hinkin, 1995), which is no surprise considering their high relevance in empirical research (Schoenfeldt, 1984). As stated in section 2.1, sound measurement is very important, because the amount of collected data (e.g. by the use of questionnaires) is irrelevant if measures are inappropriately associated with the true value of a latent variable (DeVellis, 2003). As a result, many frameworks have been suggested during the last decades. For instance, Likert (1969) proposes three general steps that establish a scale development procedure: initial survey design (which is mainly about generating measures), questionnaire development (which is defining and repeatedly refining the survey instrument) and data analysis (which is analyzing the collected data to determine the quality criteria of measures). Similarly, Schwab (1980) also suggests three distinct steps: item generation, scale development and scale evaluation. Hensley (1999) likewise identifies three very similar major aspects of scale development: initial survey design, questionnaire development, and data collection and analysis. Spector (1992) introduces a framework of five steps: define the theoretical constructs, build the scales, pre-test the scales, administer the scales and statistically evaluate the scales. DeVellis (2003) even names eight steps to guide a researcher’s scale development: define the variables, generate items, determine the format of measurement, expert-reviews, validation items, administer items, evaluate the items and optimize scale length.
Lastly, Homburg and Giering (1996) as well as Churchill (1979) propose even more granular steps, which deal with aspects very similar to the ones above: a conceptualization and item pool generation, a preliminary test or refinement of the indicators and finally one or more data collections with corresponding analyses. Due to space restrictions a lengthy comparison cannot be presented at this point.8 Looking at scale development literature in general, and especially at the steps listed above, one can identify an extensive overlap between these approaches. A generalized scale development procedure is thereby characterized by: (1) an initial generation of indicators to form scales, preceded by a clear definition of constructs, (2) the development of an instrument (e.g. a questionnaire), incorporating design choices like scaling and sampling, and (3) an iterative testing and refinement of measures to obtain validity and reliability (e.g. by the use of qualitative and quantitative data collections and analyses). These three elements are used throughout this chapter to present and discuss arising issues and literature recommendations for developing scales. Please refer to subsection 2.2.2 for point (1), subsection 2.2.3 for point (2) and subsection 2.2.5 for point (3). Subsection 2.2.4 presents requirements for reliability and validity, which are very important and incorporated in all of the above procedures.
The scale development approaches of Hensley (1999), DeVellis (2003) and Homburg and Giering (1996) are introduced in further detail at the end of this chapter, as they significantly influenced the procedure applied in this study.9
2.2.2 Indicator creation
The transition from conceptualization to operationalization of constructs is concerned with making theory empirically manifest, that is observable (Homburg and Giering, 1996). Consequently, this incorporates the creation of measures for latent variables. Although "listing all the things that make an item [i.e. indicator] good or bad is an impossible task" (DeVellis, 2003, p. 67), the following paragraphs are a summary of issues encountered when initially creating indicators for this study. Typical for empirical surveys, the indicators are incorporated into a questionnaire for quantitative data collection.10
Straub (1989) suggests that, if available, scientists should use already developed and established scales based on literature. However, if available scales do not fit the new purpose, they need to be adjusted or reassembled in a new composition, either with existing or newly developed items. This is not an uncommon task in new theory development, because measures have to comply closely with their underlying theoretical constructs (Straub, 1989). This means that if the definition of a construct is new or has changed compared to the original one used to develop the already existing scales, the new scale needs to be revalidated in terms of validity and reliability (Boudreau et al., 2001). Building on Schwab (1980), Hinkin (1995, p. 980) puts it as follows: "It is also recommended that when items are added or deleted [...], the ’new’ scale should then be administered to another independent sample". Furthermore, Hinkin (1995) and Schriesheim et al. (1993) point out that certainly not all existing measures in literature are flawlessly developed. For instance, the differentiation between reflective and formative indicators is often not explicitly considered (Diamantopoulos and Siguaw, 2006b; Cenfetelli and Bassellier, 2009; Petter et al., 2007). When theory or the understanding of the real processes is still weak, practice-oriented and academic procedures (i.e. qualitative interviews) should be combined before scales are developed further, to minimize refinement steps (Hensley, 1999; Flynn et al., 1994). As a consequence, questions also need to be explicitly examined with regard to what they are really supposed to measure, in order to accurately capture the intended domain (Bohrnstedt, 1970). This is called content validity, as described by Hinkin (1995, p. 969): "capture specific domain of interest, yet contain no extraneous content".
It is the degree to which a construct’s indicators represent the content or semantic domain fully and adequately (Churchill, 1979) and should be ensured in early stages of research, before a lot of time is spent on inadequate items (Schriesheim et al., 1993). For instance, this can be accomplished by literature reviews, expert panels or sorting procedures, where sorting means the arrangement of items in groups by external people according to their understanding of construct definitions (Boudreau et al., 2001). Following from the above, ambiguities in the relation between indicators and a latent variable shall be avoided, since otherwise items tend to measure several things rather than exclusively their original purpose. Thus, an indicator is supposed to be a measure of a single latent variable only, because anything else would have unwanted influences on the reliability and validity of the measurement (DeVellis, 2003; Gerbing and Anderson, 1988). In contrast, redundancy is not a hindrance during the development of scales, DeVellis concludes, because different variations of items can be used to test for optimal results, with only the best performing items surviving the iterative refinement process. In most cases, though, he demands a more significant variation than changing a single word. However, one needs to be careful at this point and differentiate between the development stages and the final instrument. If a formative method of measurement is used, multicollinearity (i.e. high intercorrelations) among (final scale) items is unwanted (e.g. Henseler et al., 2009; Jarvis et al., 2003; Cenfetelli and Bassellier, 2009). This is not the case for reflective scales, where a high correlation among the indicators also implies a high correlation with the latent variable (Chin, 1998a). The method of measurement will be introduced below in this subsection, after mentioning some other issues which have to be considered.
When phrasing items for a questionnaire, the length of sentences and the language used need to be considered. As a rule of thumb, sentences should be as long as necessary to transmit the content, but as short as possible. Longer items usually cause higher complexity and decrease clarity, which might increase the required reading difficulty level (DeVellis, 2003). The same holds true for the language used, which should be adapted to the survey takers’ style of language, but still follow generally accepted rules of grammar. Similarly, technical or foreign-language terms that are not explicitly necessary should be avoided (Anastasi, 1988). In general, poorly phrased sentences reduce the correlation between these items and the corresponding latent variable (Worthington and Whittaker, 2006; Quintana and Minami, 2006). Furthermore, items should not express more than one idea (so-called double-barreled items), because the influences cannot be easily separated (DeVellis, 2003): a survey taker’s answer could be caused by either of the ideas.
Great care should be taken with regard to negatively worded items. In general, positively phrased sentences are easier to understand, but may suffer from agreement bias, which occurs if people tend to agree with items, no matter what their content is about (DeVellis, 2003). Moreover, reverse-coded (or reverse-scored) items can be used not only to recognize agreement bias, but also to detect survey takers who create contradictions in their responses by giving very similar or identical answers to every question (Spector, 1992). Additionally, Spector states that respondents would be more alert in completing the survey if reverse-coded measures are used, thus decreasing the response bias. Technically, reverse-coding of an item means that its polarity opposes the definition of the latent variable it refers to (which could, but need not, result in a negatively worded sentence). For instance, in this study the dependent variable depicts a citizen’s resistance towards conducting a process virtually. Thus, an item like "I plan to conduct my civil marriage online via the Internet" is reverse-coded in comparison to the latent variable. In other words, it expresses agreement instead of the originally defined resistance. This requires a reversed expression of the item’s meaning compared to the underlying latent variable, but not necessarily the usage of negatively phrased sentences, for instance including "not", "never", etc. Some authors specifically highlight such indicators, e.g. by underlining them (e.g. Moore and Benbasat, 1991). On the other hand, errors may arise and validity may be reduced by including reverse-coded items (Schriesheim et al., 1993; Hinkin, 1995).11 Sentences should furthermore not be phrased very loosely, but be strictly specific (Anastasi, 1988). Loosely phrased items could lead to a very low variance in participants’ answers, which reduces the statistical explanatory power.
Finally, DeVellis (2003) demands to take care of (unintended) social desirability issues in scales, because these might bias the responses to be compliant with what is socially desired (e.g. being a person who cares for others).12
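The mirroring of a reverse-coded item can be illustrated with a minimal sketch. It assumes responses on an integer rating scale running from `low` to `high` (here 1 to 7); the function name `reverse_code` and the example responses are hypothetical and not taken from the study.

```python
def reverse_code(response: int, low: int = 1, high: int = 7) -> int:
    """Mirror a rating around the scale midpoint (assumed integer scale low..high)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} lies outside the scale {low}..{high}")
    # The lowest point maps to the highest and vice versa; the midpoint stays put.
    return low + high - response


# Hypothetical answers to a reverse-coded item on a seven-point scale.
responses = [7, 6, 2, 4]
mirrored = [reverse_code(r) for r in responses]
```

After mirroring, a high value expresses the same direction as the latent variable (in this study, resistance), so the item can be analyzed together with the regularly coded items.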
The method of measurement is of great importance in scale development, as not all constructs are suited to reflective measurement (Petter et al., 2007).13 Ignoring this fact can lead to misinterpretations and flawed analyses (Chin, 1998a; Petter et al., 2007). Jarvis et al. (2003) as well as Edwards and Bagozzi (2000) state that researchers still focus more on their conceptual model than on appropriate measurement. The indicators’ method of measurement can be of either reflective or formative type, and the two involve different assumptions and analysis consequences (e.g. Petter et al., 2007; Bollen and Lennox, 1991; Barki et al., 2007; Cenfetelli and Bassellier, 2009; Chin, 1998a; Diamantopoulos and Winklhofer, 2001). Different names exist: formative indicators are also known as causal indicators or composite indicators, and reflective indicators as effect indicators (Bollen and Lennox, 1991; Edwards and Bagozzi, 2000). Sometimes even latent constructs are named differently. Chin (1998a), for instance, refers to formatively measured constructs as emergent constructs, whereas Jarvis et al. (2003) call them composite constructs. Some authors even use the term scale development only with regard to reflective indicators, whereas the counterpart for formative measures is called index construction (Diamantopoulos and Siguaw, 2006b).14 Although the two are often treated as equivalent, it should be noted that, from a strict point of view, it is not the constructs that are measured reflectively or formatively, but the indicators that are modeled either reflectively or formatively (Diamantopoulos and Siguaw, 2006b). A basic distinction between both methods is the different direction of dependency with respect to the latent variable (e.g. Chin, 1998b,a; Bollen and Lennox, 1991; Cenfetelli and Bassellier, 2009; Diamantopoulos and Winklhofer, 2001; Edwards and Bagozzi, 2000). The authors express that whereas reflective indicators are actually caused by the reflected value of the latent variable, formative indicators themselves cause the latent variable’s value, which is shown in figure 2.1. The different causal relations are indicated by the direction of the arrows between a latent variable and its measures. Consequently, arrows point away from the latent variable towards reflective indicators, and from formative indicators towards the latent variable. The figure shows the same latent variable "drunkenness" (i.e. the circle), once measured by reflective and once by formative indicators (i.e. rectangles in both cases). The operationalization of drunkenness is a well-known example to illustrate the method of measurement.
[Figure not included in this excerpt]
Figure 2.1: Reflective and formative method of measurement (based on MacKenzie et al. (2005); Chin (1998a))
For the reflective version of figure 2.1, drunkenness is reflected by the "blood alcohol level", the "sense of balance" and the "speed of reaction", which means that they are caused by the true value (i.e. drunkenness) of somebody. If someone is more or less drunk, the three indicators will correlate positively. Of course, these indicators need to be reliable (e.g. produce identical results in each measurement process) and as free of other influences as possible. The speed of reaction may theoretically be influenced by other factors such as pharmaceuticals. Nevertheless, the causal direction is clear: the drunkenness causes the indicators’ values. Mathematically, Petter et al. (2007) (based on Bollen and Lennox (1991)) depict the relationship of each indicator $Y_i$ to its latent variable $X_1$ as the regression equation:
$$Y_i = \beta_{i1} X_1 + \epsilon_i$$
where $Y_i$ is the $i$th indicator, $\beta_{i1}$ the regression coefficient of the effect of $X_1$ on $Y_i$ (called loading) and $\epsilon_i$ the measurement error of indicator $i$. Hence, each indicator is represented by its own regression equation, having its own measurement error only related to itself. Consequently, the assumption is that single error terms are uncorrelated among themselves (i.e. $COV(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$) and with the latent variable $X_1$, and that the expected value of an error term is 0 (i.e. $E(\epsilon_i) = 0$) (Bollen and Lennox, 1991). Reflective indicators are usually intended to be highly positively correlated among themselves (MacKenzie et al., 2005), which also leads to a high correlation with the latent variable (DeVellis, 2003). Therefore, reverse-coded items need to be mirrored so that all indicators reflect the same direction with respect to reliability analyses (Bollen and Lennox, 1991). They shall be unidimensional, thereby only loading onto one construct (i.e. nearly uncorrelated with any other constructs in the model) (Gerbing and Anderson, 1988). The importance of unidimensionality is argued by Hattie (1985, p. 49): "That a set of [reflective] items forming an instrument all measure just one thing in common is a most critical and basic assumption of measurement theory". DeVellis (2003) furthermore states the assumption that they are randomly selected from within a universe of possible reflective indicators, hence they can easily be exchanged for others. However, the count of items within a scale cannot easily be changed, as seen in section 2.2.3. Accordingly, the item count cannot be changed without affecting validity and reliability in general (Nunnally, 1978). Consequently, this may lead to lengthier sentences, as all necessary information forming a construct needs to be included in every (reflective) sentence, especially if the construct is better suited to formative measurement.
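The reflective model described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the loadings, the error standard deviation, the sample size and the seed are made-up values, and the helper names `pearson` and `simulate_reflective` are hypothetical. Each indicator is generated as loading times latent value plus its own independent error term with expected value zero.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_reflective(loadings, n, seed=0, error_sd=0.5):
    """Generate indicators as loading * latent value + independent error."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    # One list per indicator; every indicator draws its own error terms.
    indicators = [[b * x + rng.gauss(0, error_sd) for x in latent] for b in loadings]
    return latent, indicators


# Assumed loadings for three reflective indicators of one latent variable.
latent, (y1, y2, y3) = simulate_reflective([0.9, 0.8, 0.7], n=2000)
r12 = pearson(y1, y2)      # correlation between two indicators
r1x = pearson(y1, latent)  # correlation of one indicator with the latent variable
```

Because all indicators share the latent variable as their common cause, they correlate positively with each other and with the latent variable, which is the pattern reflective scales are expected to show.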
For the formative type, in contrast, drunkenness itself is caused by the consumption of "beer", "wine" and "liquor". This causal relationship is depicted by arrows towards the latent variable. As this example shows, formative indicators are not easily exchangeable from within a universe of randomly selectable items, as they depict conceptually different dimensions of the same construct and do not need to be weighted equally towards the latent variable (Chin, 1998a). Authors argue that if a formative item is removed, the meaning of the latent variable changes, whereas this is not the case with reflective indicators (because each represents the same distinct dimension of the construct) (Bollen and Lennox, 1991; Petter et al., 2007; Chin, 1998a). Therefore, discarding a formative indicator is equivalent to restricting the content domain of the construct (MacKenzie et al., 2005). Thus, all indicators that form a content domain need to be included, as long as they are not highly correlated, which is not desired in developing a formative measurement model (Bollen and Lennox, 1991). Such high correlation is called multicollinearity (i.e. the degree of linear dependency) and negatively influences reliability and validity of formative constructs (Jarvis et al., 2003). MacKenzie et al. (2005, p. 712) note that "[i]ndeed, it would be entirely consistent with this [formative] measurement model for the indicators to be completely uncorrelated". Actually, correlations among formative indicators can be positive, negative or zero and still yield satisfying results (Diamantopoulos and Winklhofer, 2001). As said, every (unique) dimension of the phenomenon that forms the construct has to be included in the scale (Petter et al., 2007), but not in each indicator. Mathematically, Petter et al. (2007) (based on Bollen and Lennox (1991)) depict the relationship of formative measures for a construct $Y_i$ as a linear combination to be estimated as:
$$Y_i = \beta_{i1} X_{i1} + \beta_{i2} X_{i2} + \dots + \beta_{in} X_{in} + \zeta_i$$
where $\beta_{in}$ are the regression coefficients (called weights) for the formative indicators $n$ of construct $Y_i$, $X_{in}$ the item observations (values) of construct $Y_i$ and $\zeta_i$ the error or disturbance term of $Y_i$. The error term $\zeta_i$ is only related to the construct, but not to the single indicators, which means that formatively measured constructs have their error ($\zeta_i$) at construct level, but not at indicator level like reflective ones do (Petter et al., 2007). Following from this, the error term $\zeta_i$ for construct $i$ is uncorrelated with the formative indicators (i.e. $COV(X_{in}, \zeta_i) = 0$ for all $n$) and the expected value of the error term is zero (i.e. $E(\zeta_i) = 0$) (Bollen and Lennox, 1991).
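The formative model can be sketched in the same spirit. The weights, the disturbance standard deviation, the sample size and the seed below are assumptions chosen purely for illustration, and the names `pearson` and `formative_construct` are hypothetical. The three indicators are drawn independently of each other, so they are (nearly) uncorrelated, yet together they still form the construct value as a weighted sum plus a disturbance term at construct level.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def formative_construct(weights, observations, disturbance=0.0):
    """Weighted sum of indicator values plus a construct-level disturbance."""
    return sum(w * x for w, x in zip(weights, observations)) + disturbance


rng = random.Random(1)
n = 2000
# Independently drawn (standardized) consumption values, assumed for illustration.
beer = [rng.gauss(0, 1) for _ in range(n)]
wine = [rng.gauss(0, 1) for _ in range(n)]
liquor = [rng.gauss(0, 1) for _ in range(n)]

weights = [0.5, 0.3, 0.8]  # assumed, deliberately unequal weights
drunkenness = [
    formative_construct(weights, obs, rng.gauss(0, 0.2))
    for obs in zip(beer, wine, liquor)
]

r_beer_wine = pearson(beer, wine)                 # indicators need not correlate
r_liquor_construct = pearson(liquor, drunkenness)  # yet each contributes to the construct
```

In this sketch, dropping one indicator (e.g. liquor) would change what the weighted sum represents, which mirrors the argument that removing a formative indicator restricts the content domain of the construct.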
Generally, formative constructs can be seen as "an extreme example of multidimensional constructs" (Petter et al., 2007, p. 627). Multidimensional constructs may be used if a theoretical construct is rather complex or abstract. They are also referred to as higher-order constructs if multiple constructs are conceptualized in an interdependent way. Referring to Homburg and Giering (1996), an example of such a construct could be customer proximity, which has three different dimensions (quality, flexibility and interaction), each of which in turn has certain subconstructs. Within such a multidimensional construct, each distinct measurement model can theoretically be either reflective or formative on its own (e.g. Jarvis et al., 2003), still depending on the construct’s nature. A misspecification is dangerous with regard to the empirical interpretation, which generally holds true for the method of measurement (MacKenzie et al., 2005). Moreover, Petter et al. (2007) state that one should be careful with respect to unidimensionality, because a (complex) construct can only be considered unidimensional if all indicators actually measure the same aspect. The multidimensional modeling approach leads to a higher abstraction level and could, for example, be used if the main topic of a certain study is considered to be relatively complex (Petter et al., 2007).15
The adoption of formative as an alternative to reflective measurement substantially increased with the progress and propagation of second generation multivariate analysis techniques like SEM (Chin, 1998a; Petter et al., 2007). Formative measures were eventually introduced to IS literature by Chin (1995, 1998b), as stated by Cenfetelli and Bassellier (2009). Thanks to analysis techniques like PLS path modeling, formative measurement models can be evaluated at much lower cost and effort than several years ago (Henseler et al., 2009; Chin, 1998a).16 Still, reflective measures are by far more common than formative ones (Cenfetelli and Bassellier, 2009), possibly fostered by the fact that measuring only one latent variable in a formative way already makes the model more complex regarding interpretation and data analysis compared to models measured only reflectively (Petter et al., 2007; Henseler et al., 2009). Some authors argue that in specific fields of research (e.g. marketing) a reflective indicator is more appropriate, because its measurement error is at indicator level (e.g. Homburg and Giering, 1996). However, Petter et al. (2007) point out that not all constructs can be adequately measured reflectively. This is in line with Diamantopoulos and Winklhofer (2001, p. 274), who state: "[we] believe that several marketing constructs currently operationalized by means of reflective indicators would be better captured if approached from a formative perspective". Where, for example, the construct cannot be fully captured or analyzed by formative measures, a third, mixed indicator model approach may be chosen, in which both formative and reflective indicators are present to measure the same construct (Bollen and Kwok-Fai, 2000). A well-known example of such a mixed indicator model is the multiple effect indicators for multiple causes (MIMIC) model (e.g. Hauser and Goldberger, 1971; Diamantopoulos and Winklhofer, 2001).
Such measurement models are often used to explicitly determine errors of formative measurements, which is not as simple as for reflective indicators (Henseler et al., 2009). It should be stressed that the selected method of measurement drastically changes the final indicators (Diamantopoulos and Siguaw, 2006b). Ideally, researchers should additionally try to anticipate their data analysis methods, because formatively measured constructs in the model might have quite a large impact on the available methods and techniques (e.g. Petter et al., 2007; Cenfetelli and Bassellier, 2009). Qureshi and Compeau (2009) furthermore mention that not all of these techniques are equally well suited to deal with formative measures. For instance, covariance-based SEM approaches are less recommended for analyzing models including formative indicators, because they aim to account for all the covariances among measures (Chin, 1998b), which often leads to identification problems (Henseler et al., 2009). According to Chin (1998b), such algorithms often assume that "the correlations among indicators for a particular LV [i.e. latent variable] are caused by that LV" (p. ix), which is the case for reflective indicators as shown above. Hence, an alteration of the underlying model might be necessary prior to data collection (Petter et al., 2007). Although there are some circumstances in which covariance-based SEM techniques may still be used in conjunction with formative measures, their employment may cause further difficulties for the researcher (e.g. Chin, 1998b). In contrast, PLS-based analysis approaches are suited for both types of indicators (Henseler et al., 2009).17
To actually determine the method of measurement for a given construct, Jarvis et al. (2003) illustrate four basic decision aspects to consider. First, the direction of the causal relationship should be determined, that is, whether an indicator is caused by a latent variable or causes the variable itself. Secondly, the interchangeability of the indicators should be examined. Thirdly, the covariation among the items should be checked, that is, whether a change in one indicator is expected to coincide with a change in another indicator. Lastly, the researcher has to focus on the question of whether the indicators are thought to have the same antecedents and consequences, which is expected for reflective but not for formative measures (Petter et al., 2007). In conclusion, Jarvis et al. (2003) set up criteria that closely match the characteristics discussed in this section.
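The four decision aspects can be written down as a simple checklist. The following sketch is a deliberate simplification and not a procedure proposed by Jarvis et al. (2003): the class `IndicatorAssessment`, the function `suggest_measurement_method` and the rule of requiring all four answers to agree are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class IndicatorAssessment:
    """A researcher's yes/no judgement on the four decision aspects."""
    caused_by_construct: bool                 # aspect 1: construct causes the indicators
    interchangeable: bool                     # aspect 2: indicators can be exchanged
    expected_to_covary: bool                  # aspect 3: indicators are expected to covary
    same_antecedents_and_consequences: bool   # aspect 4


def suggest_measurement_method(a: IndicatorAssessment) -> str:
    """Return a tentative suggestion; mixed answers are flagged, not resolved."""
    answers = [
        a.caused_by_construct,
        a.interchangeable,
        a.expected_to_covary,
        a.same_antecedents_and_consequences,
    ]
    if all(answers):
        return "reflective"
    if not any(answers):
        return "formative"
    return "unclear"


# The drunkenness example: effects of drunkenness versus consumed drinks.
effects = IndicatorAssessment(True, True, True, True)
drinks = IndicatorAssessment(False, False, False, False)
```

An "unclear" result would signal that the construct definition or the indicators need a closer conceptual look before a method of measurement is fixed; the sketch does not replace that judgement.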
2.2.3 Preliminary instrument design choices
After the measures have been created, the preliminary survey instrument can be completed. DeVellis (2003) and Hinkin (1995) state that important points, whether addressed before or after measure creation, concern item scaling, sampling and the count of items within a scale. Similarly, Hensley (1999) also explicitly names the placement of scale items within a questionnaire as an issue.
Determining the scaling to use for items of a questionnaire, also referred to as scale type (Peterson, 1994), should ideally be conducted simultaneously with measure creation, because both need to be "compatible" (DeVellis, 2003, p. 71). Moreover, DeVellis points out that many different scalings exist, like Thurstone scales, Guttman scales, etc.18 For statistical analyses of acquired data, the scales need to be able to generate sufficient variance among respondents (Stone, 1978). Likert scales have been frequently used within different fields of research for item scaling in questionnaires (Spector, 1992; Cook et al., 1981). Furthermore, they have proven to be the most useful scaling in behavioral research (Kerlinger, 1986). As the name indicates, Likert scales were initially developed by Likert (1932) for complex, multi-item scales. According to Spector (1992), a Likert scale is a summated rating scale that is characterized by four features. Firstly, such a scale consists of multiple items, which have to be used in conjunction. Secondly, it measures something that can vary quantitatively. Thirdly, there is "no ’right’ answer" (p. 1), but opinions, beliefs, and so on. Hence, Likert scales "cannot be used to test knowledge or ability" (p. 1). Lastly, each indicator represents a statement rather than an actual question. In the review of scale development practices by Hinkin (1995), Likert scales range from three to ten points. It has been shown that reliability drastically increases if at least five points are used (Lissitz and Green, 1975). However, Lissitz and Green (1975) and Cicchetti et al. (1985) also discovered that reliability increases at far lower rates when more than five points are chosen. As suggested by Masters (1974), a seven point Likert scale may still (even if only to a small extent) increase reliability and response variance compared to a lower number of points.
It is conceivable, though, that not every respondent finds it easy, or even possible, to differentiate between seven distinct points on each potential scale. This is also in line with Lehmann and Hulbert (1972), who suggest using a five to seven point scale when focusing on individual behavior. It should be noted that an even scaling (e.g. 6 points) is not recommended and was not intended when developed by Likert (1932), as there is no neutral mid-element for respondents, which might lead to undesired results, such as respondents quitting the survey because they are pressed to decide for a specific part of the scale.
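The idea of a summated rating scale and of a neutral mid-element can be made concrete with a short sketch. It assumes that scale points are coded as integers from 1 to `points`; the function names `neutral_midpoint` and `summated_score` as well as the example answers are hypothetical.

```python
from typing import Optional, Sequence


def neutral_midpoint(points: int) -> Optional[int]:
    """Return the neutral mid-element of an odd scaling, or None for an even one."""
    if points < 2:
        raise ValueError("a rating scale needs at least two points")
    return (points + 1) // 2 if points % 2 == 1 else None


def summated_score(responses: Sequence[int], points: int = 7) -> int:
    """Sum one respondent's answers to all items of a multi-item scale."""
    for response in responses:
        if not 1 <= response <= points:
            raise ValueError(f"response {response} lies outside 1..{points}")
    return sum(responses)


# One respondent's (made-up) answers to a four-item scale with seven points.
score = summated_score([5, 6, 7, 4], points=7)
```

Reverse-coded items would have to be mirrored before summation so that all items point in the same direction.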
Another issue to consider is the count of indicators in the arranged scales, or the length of a questionnaire, which needs to be balanced between two opposing effects.19 When too few items are included in a (final) scale, the measurement may generally lack content and construct validity as well as internal consistency (Nunnally, 1978).20 Though it may be true that respondents prefer shorter questionnaires and that scale length thus possibly affects responses (Roznowski, 1989), DeVellis (2003) does not recommend keeping questionnaires too short. DeVellis furthermore argues that the amount of collected responses may be of no value and may lead to false conclusions if no correct interpretation of the data is possible. Based on Cook et al.
1 All variables are further discussed in section 3.2.1.
2 More information about this issue is presented in section 3.3.2.
3 Please note, that the civil marriage process is not affiliated with the church wedding. The residency change is also referred to as residence (re-)registration.
4 Obviously, there are characteristics of certain processes that cannot be exactly matched with another process. Whereas for example a registrar is present during the civil marriage, there is no registrar involved in the residency change. The degree of equality can be reviewed in the final questionnaire presented in appendix 1.2.
1 Following the view of Homburg and Giering (1996), this study will refer to the term (theoretical) "construct" if the underlying theory is explicitly focused, in contrast to "(latent) variables", which represent the operationalized measurement entities of actual constructs. However, as the latent variables are used to measure constructs, literature frequently uses both terms in a redundant and not entirely distinct way. Most of the time, however, this does not change any meaning.
2 For a discussion on the method of measurement please refer to section 2.2.2.
3 Some researchers like Diamantopoulos and Siguaw (2006b) refer to scale development only for developing reflective measures, whereas they denote the development of formative indicators as index construction. These names are caused by very different assumptions and theory underlying reflective and formative indicators. The different names are delimited in section
4 Despite the different focus of this study, the variance of only two public processes would most likely not suffice to adequately test the interrelated dependence relationships of the conceptual model confirmatory.
5 See section 2.2.4 for more information on reliability and validity.
6 Several steps which should furthermore be considered on the way to reliable and valid instruments, like the actual phrasing and creation of indicators, the count of items in a scale as well as other distinct development procedures like a pre-test or the data analysis are discussed throughout this chapter.
7 However, due to the vast amount of publications, those challenges are not meant to be exhaustive.
8 These mentioned procedures are by far not meant to be exhaustive.
9 Though, this does not necessarily imply, that either one of the procedures is better suited for scale development nor more appropriate than any other.
10 The actual quantitative data collection is presented in section 3.4.1. It is noteworthy, that whereas survey takers get "questioned", the items or phrases themselves are often no formal questions, depending on the used scaling. However, the reader should not be confused by the two terms scale (i.e. a set of indicators) and scaling (i.e. the type of answer possibilities as for instance Likert scales).
11 For reflective constructs, an item of comparatively low quality (for whatever reason) would lead to a lower item to total-correlation, which indicates a candidate for elimination as it decreases scale reliability and somehow validity (DeVellis, 2003). A formative item like this would mainly negatively influence the adequateness of capturing the domain of interest (i.e. content validity), which is even more unwanted for formative in contrast to reflective indicators (MacKenzie et al., 2005). Following common sense, low quality items are always worse than items of a higher quality, but the quality criteria is different for both methods
12 For more information on social desirability, please refer to the description of his scale devel- opment procedure in section 2.4.
13 The method of measurement is referred to by other authors as mode of measurement (e.g. Gudergan et al., 2008) or measurement perspective (e.g. Diamantopoulos and Siguaw, 2006b).
14 For the sake of simplicity, this diploma thesis uses the terms reflective and formative for both indicators and latent constructs, regardless of whether the construct is measured reflectively or formatively. According to Petter et al. (2007), this is also consistent with IS literature. Furthermore, for clarity, the thesis will not adopt the term index construction, even when referring to formative indicators.
15 Considering the related study of Barth and Veit (2011), none of the constructs is especially complex on its own. Consequently, no further introduction to multidimensional constructs is given.
16 For an introduction to structural equation modeling, which was also used in this study, please refer to section 2.2.5.2.2.
17 For a thorough comparison of variance-based and covariance-based analysis techniques, please refer to section 2.2.5.2.2.
18 Refer to DeVellis (2003) for more information about available scalings. Due to space restrictions and the appropriateness of Likert scales for this study, as shown in this section, no other scalings are examined further.
19 This and the next paragraph are mainly based on Hinkin (1995) and Hensley (1999). The length of a questionnaire refers here to its final length at the end of development. The intention is to create more measures at the beginning so that only well-performing items are retained during instrument refinement (Homburg and Giering, 1996). DeVellis (2003) recommends that about half of the items should make it into the final questionnaire.
20 For a closer examination of these not yet formally introduced reliability and validity requirements see section 2.2.4.
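
The item-to-total correlation mentioned in footnote 11 can be illustrated with a minimal sketch. It assumes the common "corrected" variant, in which each item is correlated with the sum of the remaining items of the same reflective scale. The response matrix, the function names and the choice of the corrected variant are assumptions made for illustration only and are not taken from the thesis or its data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corrected_item_total(responses):
    """Correlate each item with the sum of all other items of the scale.

    responses: list of rows, one row of item scores per respondent.
    Returns one correlation per item (column).
    """
    n_items = len(responses[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total without item j
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical 5-point Likert answers of five respondents to four items.
# The fourth item is deliberately unrelated to the other three.
responses = [
    [5, 4, 5, 2],
    [4, 4, 4, 5],
    [2, 1, 2, 4],
    [1, 2, 1, 3],
    [3, 3, 3, 1],
]

for index, r in enumerate(corrected_item_total(responses), start=1):
    print(f"item {index}: corrected item-to-total correlation = {r:.2f}")
```

In this made-up example the fourth item correlates far less with the rest of the scale than the other three items, which is the pattern that would flag a reflective item as a candidate for elimination. For formative indicators this criterion does not apply in the same way, as footnote 11 points out.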