# RES601 MODULE 1 CASE, SLP AND DISCUSSION

**Module 1 – Home**

**REGRESSION AND STATISTICAL EFFECTS**

**Modular Learning Outcomes**

Upon successful completion of this module, the student will be able to satisfy the following outcomes:

- Case
  - Explain the limiting nature of assumptions required by various statistical procedures, and the effects that violations of these assumptions have on the validity of results.
  - Explain a variety of effect size measures and their relationship to statistical significance.
  - Understand, apply, and interpret simple and multiple regression models, and explain their limitations.

- SLP
  - Conduct and interpret a variety of regression analyses.

- Discussion
  - Discuss the nature of causality in behavioral science, and its relationship to what is detected by statistical analysis.

**Module Overview**

Aside from exploratory qualitative research, most research comes down to examining relationships among theoretical constructs by measuring variables that correspond to those constructs and looking at statistical correspondences among the variables. “Constructs” are theoretically interesting ideas about how behavioral phenomena are lumped together. They can’t be measured directly, but they are assumed to have a variety of consequences that show up in attitudes and/or behaviors that can be measured. Theories usually propose relationships among constructs rather than among concrete, measurable phenomena. But we don’t have tools for measuring relationships among constructs directly. All we have is statistics, and statistics has to work with variables, which are mathematically represented images of those constructs.

Statistics is, as we have said, a great tool for untangling relationships among relatively well-defined phenomena. That qualifier matters, because almost all statistical procedures come with fine print attached in the form of assumptions, most specifically about the nature of the data to which they are applied. These assumptions mainly concern the distribution of the variables, the degree of relationship among them, and the nature of the sampling procedures.

Almost all of the dissertations prepared by Trident students will use some variety of regression analysis on their data, sometimes in the form of structural equation modeling or other advanced regression-based techniques. There’s no question that regression models, or more properly the various forms of the general linear model, are the most flexible and general-purpose statistical techniques: applicable to almost any situation where there is variation in the data, easy to learn and explain, useful in terms of their output, and, most importantly, capable of being interpreted as evidence supporting a causal relationship between variables (a set of procedures referred to as “path analysis”). Although the technique does require certain assumptions about the distribution and other properties of the data, it can withstand a fair degree of stress on those assumptions and still produce useful results; statisticians call it a “robust” technique. Thus, regression seems to have almost everything going for it in the stat tools sweepstakes.

The thing that makes regression most attractive to data analysts is probably the inference of causality that it can provide. Although we ceremonially say “correlation does not equal causation”, we also use terminology such as “predictor” and “criterion”, interpret certain coefficients as “effect sizes”, and use the technique as evidence to support what are in effect causal models. The language is all about causation, because that’s where the interest is. Co-variation of two phenomena is a pretty weak property; all it means is that two things change at the same time, which is not very hard to do. The really interesting thing for us humans is to have things change because we want them to; implicitly or explicitly, the purpose of most of the research we do is to figure out ways to make that change process work better or more efficiently.

Now it is very important for the researcher in training to acquire mastery over these powerful statistical tools. That’s all well and good, and in varying forms regression is likely to be your statistic of choice for at least the remainder of your doctoral training if not beyond. Regression is an excellent way to test formal hypotheses, assuming that the data meet the required assumptions and that the problem is formulated in a way that maps data onto the hypothesis appropriately. Even when some of the assumptions are not met, regression remains a useful procedure. Indeed, it’s some of that very usefulness that causes it to be pushed into service to support causal inferences beyond its immediate ability.

But—and now we come to the real heart of the issue—even when used appropriately and within the bounds of good theory, **regression remains merely a test of association.**
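A small simulation makes this concrete. It is an illustrative sketch using only Python's standard library and made-up data: a hidden variable `z` drives both `x` and `y`, and `y` never depends on `x` at all. Regressing `y` on `x` still yields a clearly positive slope (close to 1.0 with these settings). Only when `z` is partialled out, here via the Frisch-Waugh-Lovell residual-on-residual approach (equivalent to entering `z` as a second predictor), does the coefficient on `x` fall to roughly zero. In real research the common cause is usually unmeasured, which is why a significant coefficient cannot by itself establish causation.

```python
import random


def ols_slope(x, y):
    """Least-squares slope b in y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals of y after a simple regression on x."""
    b = ols_slope(x, y)
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


rng = random.Random(42)                      # fixed seed so the run is reproducible
n_cases = 5000
z = [rng.gauss(0, 1) for _ in range(n_cases)]    # hidden common cause
x = [zi + rng.gauss(0, 1) for zi in z]           # x depends on z only
y = [2 * zi + rng.gauss(0, 1) for zi in z]       # y depends on z only, never on x

naive = ols_slope(x, y)                                # y regressed on x alone
partial = ols_slope(residuals(x, z), residuals(y, z))  # coefficient on x, controlling for z

print(f"slope of y on x, ignoring z:    {naive:.2f}")
print(f"slope of y on x, controlling z: {partial:.2f}")
```

The regression itself is doing nothing wrong in either line; it faithfully reports an association. Whether that association reflects cause depends on what was, and was not, measured.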

The purpose of this module, then, is not only to introduce you to the practicalities of simple and multiple regression, but also to get you thinking about why this technique is so popular and so commonly applied; specifically, the degree to which we can use it to serve our inherent need to understand causality.

# Module 1 – Background

## REGRESSION AND STATISTICAL EFFECTS

### Required Reading

Porter, A., Connolly, T., Heikes, R. G., & Park, C. Y. (1981). Misleading indicators: The limitations of multiple linear regression in formulation of policy recommendations. *Policy Sciences*, *13*, 397-418.

Phau, I., & Teah, M. (2009). Devil wears (counterfeit) Prada: A study of antecedents and outcomes of attitudes towards counterfeits of luxury brands. *The Journal of Consumer Marketing*, *26*(1), 15-27.

Evans, A. L. (2008). Portfolio manager ownership and mutual fund performance. *Financial Management*, *37*(3), 513-535.

Banker, R., Hu, N., Pavlou, P. A., & Luftman, J. (2011). CIO reporting structure, strategic positioning, and firm performance. *MIS Quarterly*, *35*(2), 487-504.

Guzman, I. R., & Stanton, J. M. (2009). IT occupational culture: The cultural fit and commitment of new information technologists. *Information Technology & People*, *22*(2), 157-187.

### Required Videos

CPG Orlando. (2013, July 15). *How to clean SPSS data* [Video]. YouTube. *https://www.youtube.com/watch?v=Ik4Dyn8e8vA*

Cromer, K. (2020, August 16). *RES601 mod 1 SLP cleaning datasets* [Video]. YouTube. *https://youtu.be/RxmQNC1S7i8*

Grande, T. (2015, April 22). *Computing variables in SPSS* [Video]. YouTube. *https://www.youtube.com/watch?v=xTstSbkP8Fg*

how2stats. (2016, April 20). *How to detect outliers in SPSS* [Video]. YouTube. *https://www.youtube.com/watch?v=qQqF6HZo0Gc*

how2stats. (2011, September 8). *The right way to detect outliers: The outlier labeling rule (part 3)* [Video]. YouTube. *https://www.youtube.com/watch?v=bRdC1u9veg8*

### Optional Reading

Decision 411 (n.d.). Interpreting plots of predicted values and residuals. Duke University. Retrieved from *http://www.duke.edu/~rnau/regnotes.htm#plots*

Hopkins, W. (2000). Log transformation for better fits. A New View of Statistics. Retrieved from *http://www.sportsci.org/resource/stats/logtrans.html*

Lane, D. (n.d.). Prediction. HyperStat Online. Retrieved from *http://davidmlane.com/hyperstat/prediction.html*

Trochim, W. (2006). Regression. Research Methods Knowledge Base. Retrieved from *http://www.socialresearchmethods.net/kb/genlin.php*

# Module 1 – Case

## REGRESSION AND STATISTICAL EFFECTS

### Assignment Overview

The first step in understanding how to use regression effectively in your research is to understand just what it is, what sorts of estimates it produces, and what its assumptions and limitations are. It is a very complex topic, and there is no way you will do more than scratch the surface at this point in your training. Here are some useful links to help you become familiar with regression analysis.

Introduction to regression. Retrieved from *http://dss.princeton.edu/online_help/analysis/regression_intro.htm*

Cottrell, A. (2011, Sep. 2). Regression analysis: Basic concepts. Retrieved from *http://users.wfu.edu/cottrell/ecn215/regress.pdf*

Regression analysis. Retrieved from *http://elsa.berkeley.edu/sst/regression.html*

When you have a reasonable conceptual familiarity with regression, please turn your attention to a most interesting critique of the technique. While specifically anchored within the policy sciences context, the critique applies equally well to all managerial research that purports to offer guidance on action based on findings from regression analyses. It’s important to understand the nature of the authors’ reservations about the approach.

Porter, A., Connolly, T., Heikes, R. G., & Park, C. Y. (1981). Misleading indicators: The limitations of multiple linear regression in formulation of policy recommendations. *Policy Sciences*, *13*, 397-418. Available in the Trident Online Library.

Now that you know what the problems are, it’s time to take a look at some sample research and see if it holds up under the critique. You are to pick one respectable study in an area of interest to you that uses multiple regression as a primary analytical strategy. The choices are listed below by field; pick one (of course, nothing’s stopping you from reading the others as well, but it’s not necessary for the purpose of this exercise):

**Marketing:**

Phau, I., & Teah, M. (2009). Devil wears (counterfeit) Prada: A study of antecedents and outcomes of attitudes towards counterfeits of luxury brands. *The Journal of Consumer Marketing*, *26*(1), 15-27.

**Finance:**

Evans, A. L. (2008). Portfolio manager ownership and mutual fund performance. *Financial Management*, *37*(3), 513-535.

**Information Systems:**

Banker, R., Hu, N., Pavlou, P. A., & Luftman, J. (2011). CIO reporting structure, strategic positioning, and firm performance. *MIS Quarterly*, *35*(2), 487-504.

Guzman, I. R., & Stanton, J. M. (2009). IT occupational culture: The cultural fit and commitment of new information technologists. *Information Technology & People*, *22*(2), 157-187.

### Case Assignment

Read these articles, supplemented if you wish with material from the optional readings, the supplementary background, and any other outside reading you find useful. Then write a 5- to 12-page critique of the use of multiple regression analysis in one of the papers listed above.

### Assignment Expectations

A critique is a review and commentary on a particular article or piece of research. It is not necessarily critical in the negative sense, although you may need to comment negatively on some aspects; both positive and negative aspects should be treated. Just because something appears in print, even in an A-list journal, does not make it free from possible errors or beyond criticism; nothing should be taken at face value. Your informed commentary and analysis are as important as your summary of the material in the article, since simply repeating what the article says does not constitute an adequate critique. You are also expected to use the terminology of regression correctly and clearly.

In this case, your critique should address at least the following issues, as well as any other points that you find relevant and worthy of comment:

- A brief summary of the paper: its purposes, methods, and reported findings
- The use of regression in the data analysis, and its relation (if any) to other kinds of analytical and/or statistical methods
- The nature of the data used, and the degree to which the data met the requirements for regression as described by Garson (n.d.) and Porter et al. (1981)
- The appropriateness of the interpretations of coefficients developed in the analysis
- The overall applicability of Porter et al.’s (1981) critique to this study: does their approach call it into question, or does the study manage to evade their critique? How, in either case?
- Your overall assessment of the utility of regression as an analytical strategy in the kind of research you are contemplating for your dissertation and beyond, and your ideas for overcoming the problems Porter et al. raise for this strategy

Remember, this is an applied statistics course. Thus, explaining the statistical tools, interpreting coefficients, and understanding the properties of the data analysis are particularly important, and need your careful thought and comment, not just general or generic observations.

You are expected to present your critique in appropriate academic form and language, with citations to the readings where needed.

# Module 1 – SLP

## REGRESSION AND STATISTICAL EFFECTS

In the Session Long Project (SLP) this session, you will conduct data analysis to test a hypothesis. In your SLP study, your research question is:

RQ1: What is the effect of engagement attitude on engagement behavior?

You already conducted an extensive literature review, using engagement theory and reasoning to develop a single hypothesis.

H1: Engagement Attitude is positively related to Engagement Behavior.

Your independent variable, Engagement Attitude (EA), is a construct that the literature and prior research suggest is best reflected in two variables, Organizational Commitment (OC) and Job Satisfaction (JS). Your dependent variable, Engagement Behavior (EB), is a construct that the literature and prior research suggest is best reflected in two variables, Task Performance (TP) and Organizational Citizenship Behavior (OCB).

In addition, you will check to see if two control variables, Age and Gender, interact with the relationship in the model.

This is your research model.
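Written as an equation, the model described above might look like the following. This is a sketch under stated assumptions: EA and EB are the composite means of their component variables, Gender is entered as a 0/1 dummy, and Age and Gender appear as ordinary control terms. H1 then corresponds to a positive slope on EA.

$$
\mathrm{EB}_i = \beta_0 + \beta_1\,\mathrm{EA}_i + \beta_2\,\mathrm{Age}_i + \beta_3\,\mathrm{Gender}_i + \varepsilon_i,
\qquad \mathrm{EA}_i = \tfrac{1}{2}\left(\mathrm{OC}_i + \mathrm{JS}_i\right),
\qquad \mathrm{EB}_i = \tfrac{1}{2}\left(\mathrm{TP}_i + \mathrm{OCB}_i\right)
$$

$$
H_1:\ \beta_1 > 0
$$

If the intent is to test whether Age and Gender change the strength of the EA to EB relationship (moderation, the usual statistical meaning of "interact"), product terms such as $\beta_4(\mathrm{EA}_i \times \mathrm{Age}_i)$ would be added; the form above treats them only as controls.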

# Module 1 – Outcomes

## REGRESSION AND STATISTICAL EFFECTS

- Module
  - Explain a variety of effect size measures and their relationship to statistical significance.
  - Explain the limiting nature of assumptions required by various statistical procedures, and the effects that violations of these assumptions have on the validity of results.
  - Understand, apply, and interpret simple and multiple regression models, and explain their limitations.

- Case
  - Explain the limiting nature of assumptions required by various statistical procedures, and the effects that violations of these assumptions have on the validity of results.
  - Explain a variety of effect size measures and their relationship to statistical significance.
  - Understand, apply, and interpret simple and multiple regression models, and explain their limitations.

- SLP
  - Conduct and interpret a variety of regression analyses.

- Discussion
  - Discuss the nature of causality in behavioral science, and its relationship to what is detected by statistical analysis.

# Discussion 1

How can regression be used to address a research question within your area of interest? Discuss pros and cons.

You already collected your data using a questionnaire. Now you are ready to complete the statistical analysis. Make sure you save a clean copy of the dataset at the end of each module in case you need to start over.

The SLP dataset is modified for use at Trident University International from a raw dataset collected by Dr. Kenneth Cromer, Faculty, Glenn R. Jones College of Business, Trident University International. You will be using this dataset for the SLP in each module of RES601.

First, open the *dataset* in Excel and get familiar with it. There are 421 cases, each reflecting the survey responses of one participant. There are 33 items in the questionnaire: 10 items measuring Organizational Commitment (OC), 5 items measuring Job Satisfaction (JS), 8 items measuring Task Performance (TP), 8 items measuring Organizational Citizenship Behavior (OCB), and 2 items measuring the demographic control variables Age and Gender.
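The item layout described above can be written down as a mapping from each scale to its items, which also makes the later variable-creation step mechanical. This is a minimal sketch: the column names (`OC1` to `OC10`, `JS1` to `JS5`, and so on) are hypothetical, since the actual headers in the dataset may differ, and each case is assumed to be a dictionary of numeric responses with no remaining missing values. EA is scored from OC and JS, and EB from TP and OCB, following the research-model description.

```python
from statistics import mean

# Hypothetical column names; rename to match the actual dataset headers.
SCALES = {
    "OC": [f"OC{i}" for i in range(1, 11)],    # Organizational Commitment, 10 items
    "JS": [f"JS{i}" for i in range(1, 6)],     # Job Satisfaction, 5 items
    "TP": [f"TP{i}" for i in range(1, 9)],     # Task Performance, 8 items
    "OCB": [f"OCB{i}" for i in range(1, 9)],   # Org. Citizenship Behavior, 8 items
}
DEMOGRAPHICS = ["Age", "Gender"]

# Sanity check against the questionnaire description: 10 + 5 + 8 + 8 + 2 = 33 items.
assert sum(len(items) for items in SCALES.values()) + len(DEMOGRAPHICS) == 33


def score_case(case):
    """Return the four scale means plus the EA and EB composites for one participant."""
    scores = {name: mean(case[item] for item in items) for name, items in SCALES.items()}
    scores["EA"] = mean([scores["OC"], scores["JS"]])   # Engagement Attitude
    scores["EB"] = mean([scores["TP"], scores["OCB"]])  # Engagement Behavior
    return scores


# Example: one made-up participant who answered every item of a scale identically.
example = {}
for name, value in {"OC": 3, "JS": 4, "TP": 5, "OCB": 4}.items():
    example.update({item: value for item in SCALES[name]})
example.update({"Age": 35, "Gender": 1})
print(score_case(example))
```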

**Module 1 SLP: Data Cleaning**

A raw dataset is not ready for statistical analysis until it is prepared for use. Most datasets contain cases that are incomplete or contain missing data. Your assignment in SLP1 is to get the dataset ready for analysis. This explanatory video will walk you through the assignment:

RES601 Mod 1 SLP Cleaning Datasets *https://youtu.be/RxmQNC1S7i8*
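Before composites can be computed, incomplete cases have to be dropped and remaining gaps filled. The sketch below shows one way to do both steps outside SPSS, assuming each case is a dictionary with `None` for a skipped item. Two choices are assumptions for illustration rather than course requirements: the cut-off `MAX_MISSING = 3` stands in for "more than a few items," and remaining gaps are filled with the item's mean across the retained cases (item-mean substitution, similar in spirit to a series-mean replacement in SPSS). The item names `Q1` to `Q4` are hypothetical. The function also returns the counts you need to report how the dataset changed.

```python
from statistics import mean

MAX_MISSING = 3  # hypothetical cut-off standing in for "more than a few items"


def clean(cases, items, max_missing=MAX_MISSING):
    """Drop cases missing more than max_missing items, then fill remaining gaps
    with the item mean of the retained cases.

    Returns (cleaned_cases, n_cases_dropped, n_values_substituted).
    """
    kept = [c for c in cases if sum(c[i] is None for i in items) <= max_missing]

    # Item means over retained cases only; raises StatisticsError if every
    # retained case skipped an item.
    item_means = {i: mean(c[i] for c in kept if c[i] is not None) for i in items}

    cleaned, substituted = [], 0
    for case in kept:
        filled = dict(case)
        for i in items:
            if filled[i] is None:
                filled[i] = item_means[i]
                substituted += 1
        cleaned.append(filled)
    return cleaned, len(cases) - len(kept), substituted


# Made-up responses: the second case skipped every item and will be dropped.
items = ["Q1", "Q2", "Q3", "Q4"]
cases = [
    {"Q1": 5, "Q2": 4, "Q3": None, "Q4": 3},
    {"Q1": None, "Q2": None, "Q3": None, "Q4": None},
    {"Q1": 3, "Q2": 2, "Q3": 4, "Q4": 5},
]
cleaned, dropped, substituted = clean(cases, items)
print(f"dropped {dropped} case(s), substituted {substituted} value(s)")
```

Item-mean substitution is simple but shrinks item variances slightly; whichever method you use in SPSS, it is worth naming it in your essay.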

- Delete incomplete cases (cases missing responses to more than a few items), then import your dataset into SPSS and handle the remaining missing items. Please review this video for information on how to clean data; you can ignore the second half, as there are no items to reverse score:
  *https://www.youtube.com/watch?v=Ik4Dyn8e8vA*
- Create the variables (OC, JS, TP, OCB, EA, and EB). Make OC, JS, TP, and OCB the statistical mean of all items in the respective scale. Make EA the statistical mean of OC and JS, and make EB the statistical mean of TP and OCB. A good resource for this is
  *https://www.youtube.com/watch?v=xTstSbkP8Fg*
- Remove any significant outliers from the created variables. My suggestion is to first use the 1.5 × IQR box plot rule in SPSS discussed in this video,
  *https://www.youtube.com/watch?v=qQqF6HZo0Gc*
  and then, if an outlier is identified, confirm it with the 2.2 multiplier (the outlier labeling rule) discussed in this video:
  *https://www.youtube.com/watch?v=bRdC1u9veg8*
  If you find a significant outlier, the best course of action is to remove the case from the dataset.
- Write an essay in which you describe your experiences in each step and report how the dataset changed with each step in the process. Include properly formatted tables reflecting the number of missing values after your substitution. Upload the assignment to the SLP Module 1 Dropbox.
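The two-stage outlier check can be sketched with Python's standard library. Both rules use fences of the form Q1 - k × IQR and Q3 + k × IQR: k = 1.5 gives the familiar box-plot screen, and k = 2.2 gives the wider fences of the outlier labeling rule used for confirmation. Because the 2.2 fences are wider, every confirmed case is also flagged by the 1.5 screen. Quartiles here come from `statistics.quantiles` with `method="exclusive"`; quartile definitions vary between packages, so fence values may differ slightly from SPSS output. The scores are made up, and in practice the check would be run on each created variable, removing the whole case when it is confirmed.

```python
from statistics import quantiles


def fences(values, k):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k):
    """Indices of values lying outside the k-multiplier fences."""
    lower, upper = fences(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Made-up composite scores (e.g., OC) for 11 cases.
oc_scores = [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 1.0, 2.0]

screened = flag_outliers(oc_scores, 1.5)   # box-plot screen
confirmed = flag_outliers(oc_scores, 2.2)  # outlier labeling rule
cleaned = [v for i, v in enumerate(oc_scores) if i not in confirmed]

print("flagged by 1.5 rule:", screened)
print("confirmed by 2.2 rule:", confirmed)
```

With these numbers Q1 = 3.0 and Q3 = 3.6, so the 1.5 screen flags the scores 1.0 and 2.0, but only 1.0 falls outside the 2.2 fences; only that case would be removed.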

**SLP Assignment Expectations**

Your assignment will be graded using the following criteria:

- **Assignment-driven Criteria:** Expressing quantitative analysis of data to support the discussion, showing what evidence is used and how it is contextualized.
- **Interpretation:** Explaining information presented in mathematical terms (e.g., equations, graphs, diagrams, tables, words).
- **Presentation:** Ability to convert relevant information into various mathematical terms (e.g., equations, graphs, diagrams, tables, words).
- **Conclusions:** Drawing appropriate conclusions based on the analysis of data.
- **Timeliness and Professionalism:** Student demonstrates excellence in taking responsibility for learning and adhering to course requirement policies and expectations. Assignment is submitted on time, or the student has collaborated with the professor on an approved extension of the due date.