Criminal Justice 405
Graduate Statistics:
Simple and Multiple Regression and Beyond
All materials (c) 1999 by R. B. Taylor unless otherwise indicated
on the web at:
http://www.rbtaylor.net

Syllabus
Spring 1999

UPDATE NOTES TO STUDENTS

1/24/00
The US ecological data file used in the examples discussed in the note files is named usdanu12.sav. Download it, then print out and look at the frequencies for that file.

TOMORROW, in our session starting at 5:45 PM, we will review NOT9905 and go back over some of the basics of simple regression to make sure we have all those pieces in place. I just fixed a broken link in NOT9905.



Memos for distribution:
MEMO0312 Scatterplots
MEMO0322
MEMO0324
MEMO0403 (modified 4/6)

MEMO0422 (includes homework)


Downloads for labs and assignments:
LAB0303
LAB0318
LAB0325

LAB0325 - commands in separate file; optional download
LAB0408 - lab commands and such


Note files (links removed at this time because the class is not currently being offered):
NOT9901B
NOT9902
NOT9903
NOT9905
NOT9906
NOT9907
NOT9908

 

 

BASICS

Instructor: R. B. Taylor

Time: Thurs. 6 - 8:30 (may be adjusted forward)

One-hour lab: tentatively Tues., probably either 3-4 or 5

Office: 539 Gladfelter

Office hours: TuTh 11:30 - 1:00; 2:40 - 4 (for updates/cancellations see http://blue.temple.edu/~ralph)

Lab: 5th floor Gladfelter; available when the building is open; swipe ID card

Contact: 215.204.7169; 610.446.9023 (fax); ralph@blue.temple.edu

Purpose

This is a course in statistics for MA and beginning PhD students in criminal justice. The course has two interlocking general goals. The first is that students can run simple and multiple regression analyses, interpret the output intelligently, and know how to "check out" various problems that may be inherent in the data or the analysis. The second is that students can read and understand criminal justice research articles that use multivariate statistical analyses based on simple or multiple regression, or related techniques, with enough expertise to grasp the details of the results easily.

 

Most of the quantitative methods used in these articles rely on the general linear model (GLM). GLMs include simple correlation, simple regression, partial correlation, ANOVA, ANCOVA, and multiple regression. Variations on the GLM include logit, probit, tobit, log-linear models, principal components analysis, factor analysis, discriminant function analysis, and so on.

We will be devoting the bulk of the semester to learning about the basic ideas behind such techniques.
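As a small illustration of why these techniques belong to one family, the sketch below uses plain Python and made-up numbers (hypothetical, not from the course datasets). It regresses an outcome on a 0/1 dummy variable and shows that the slope B equals the difference between the two group means, which is exactly what a two-group t test or one-way ANOVA compares. It also checks the zero-order identity B = r(s_y / s_x). The helper names `ols` and `pearson_r` are illustrative only; they are not SPSS procedures.

```python
from statistics import mean, stdev

def ols(x, y):
    """Least-squares intercept A and slope B for the line y = A + B*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def pearson_r(x, y):
    """Zero-order (Pearson) correlation between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical toy data: an outcome for two groups coded 0/1 (a dummy variable).
group = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2.0, 3.0, 4.0, 3.0, 6.0, 5.0, 7.0, 6.0]

a, b = ols(group, y)
mean0 = mean([v for g, v in zip(group, y) if g == 0])
mean1 = mean([v for g, v in zip(group, y) if g == 1])

# Intercept A is the mean of the group coded 0; slope B is the difference in means.
print(f"A = {a}, B = {b}, mean difference = {mean1 - mean0}")
# The same slope also follows from r and the two standard deviations.
print(f"r * s_y / s_x = {pearson_r(group, y) * stdev(y) / stdev(group):.6f}")
```

With these numbers the group means are 3 and 6, so the regression gives A = 3 and B = 3: the "regression" and the "difference of means" are the same model viewed two ways.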

 

I want you to understand the reasoning behind statistical analyses. Yes, there may be some formulas to memorize, but not many (fewer than 60). I also want you to be able to think intelligently about the output that statistical analyses produce.

 

THEMES

In teaching this course I stress two general themes, themes you may find useful wherever you go after completing your work in this program.

 

It is always important to find out how the data are arranged. There is no substitute for binocular inspection - eyeballing the data.

 

Researchers starting analyses with a new set of data may be tempted to "skip" preliminary exploratory analyses and go right to univariate or multivariate statistical tests. I strongly urge you not to do this, for reasons that should become clear over the course of the semester.

 

Weird cases can make a world of difference in your results and interpretation.

 

Statistical data processing has become more interactive and iterative over the last decade. It is now easy to look at your results and rerun the analysis without a case or two that may be strongly influencing them. Editors and reviewers increasingly expect us to do exactly this. Therefore, in this course we spend considerable time on regression "diagnostics" (indicators telling us whether a case may be having an unusually strong impact on the results) and on redoing analyses in light of those diagnostics.
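A minimal sketch of what such a diagnostic does, assuming a simple (one-predictor) regression and hypothetical data. For each case it computes the leverage h_i = 1/n + (x_i - mean of x)^2 / Σ(x - mean of x)^2 and Cook's distance D_i = e_i^2 / (p · MSE) · h_i / (1 - h_i)^2, with p = 2 estimated coefficients. It then refits the line without the most influential case. In the course SPSS computes these statistics for you; the function names here are made up for illustration.

```python
from statistics import mean

def fit(x, y):
    """Return (A, B) for the least-squares line y = A + B*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def cooks_distance(x, y):
    """Leverage h_i and Cook's D_i for each case in a simple regression (p = 2)."""
    n, p = len(x), 2
    a, b = fit(x, y)
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)       # residual mean square
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]  # distance from mean of x
    cook = [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return lev, cook

# Hypothetical data: nine well-behaved cases plus one "weird" case (x = 20, y = 3).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2, 4, 5, 4, 6, 7, 8, 9, 10, 3]

lev, cook = cooks_distance(x, y)
worst = max(range(len(x)), key=lambda i: cook[i])   # most influential case
_, b_all = fit(x, y)
_, b_drop = fit(x[:worst] + x[worst + 1:], y[:worst] + y[worst + 1:])
print(f"case {worst}: leverage={lev[worst]:.2f}, Cook's D={cook[worst]:.2f}")
print(f"slope with case: {b_all:.3f}; without it: {b_drop:.3f}")
```

With these made-up numbers the tenth case (index 9, x = 20) has by far the largest Cook's D, and dropping it raises the slope from roughly 0.05 to roughly 0.93: one weird case changes the substantive story. Cutoffs for a "large" D vary from author to author, so treat any single threshold as a rule of thumb.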

 

SOFTWARE

The course may help you become literate, or more literate, in microcomputer-based statistical computing. In past years (back in the Cenozoic era), students relied solely on hand calculators. They spent too much time trying to get the numbers to come out right and too little time thinking about what the numbers meant. They also found that their hand calculator skills were not in great demand outside the classroom. I hope that the microcomputer experience you gain in this course will serve you well on the job or as a research assistant. This is the fifth year that I have taught this course using microcomputers.

 

You have three software options.

FREE. If you don't want to spend any money, you can use SPSS FOR WINDOWS VERSION 6.x, which is available in the 5th floor lab, the GH 107 lab, and the main Anderson lab.

CHEAP. If you need SPSS software to run at home because you cannot spend your time in the lab here on campus, and you want the cheapest version available, get SPSS FOR WINDOWS STUDENT VERSION. This will cost somewhere between $60 and $90. For purchasing information, check the bookstore or contact PRENTICE HALL.

Be aware that this program is limited.

* It cannot write command (also called syntax) files. I think being able to write command files is an extremely important capability.

* It can only work with data files that have fewer than 50 variables and fewer than 1,500 cases.

* There may be other limitations as well, but I think these are the most important.

COSTLY. If you want to spend more money, you have two options:

SPSS GRAD PACK. This will cost you about $200. It is a pretty full-featured program, although again, there are limitations. If you are going on in CJ BUT WILL NOT BE TAKING CJ 605, this is probably a good all-around program.

SPSS GRAD PACK FOR BUSINESS. This is similar to the above, but it DOES include time series procedures, which you will PROBABLY need for CJ 605. However, it lacks some of the other multivariate techniques you would like to have.

 

You can get more details on each of these by contacting SPSS directly at

www.spss.com

The Bookstore refused to stock copies of either of the latter packages, so you will need to deal directly with SPSS.

Software works only with the requisite hardware. Be sure you have enough processor speed and disk space to run these programs. If you are a patient person, you probably want at least a 486DX2. Disk space requirements start at 4 MB for the STUDENT VERSION and go up from there. You want at least 16 MB of RAM.

 

SOFTWARE VERSIONS PROBLEM

The SPSS version in the lab is SPSS for WINDOWS 6.x. If you buy SPSS now, you will get version 8, 8.5, or 9. Some commands differ from version to version. Most importantly, the versions write output files and create charts differently. Beware.

 

When I give you example command files and directions on how to handle output, they will probably be in version 6.x format, to guide those working in the lab.

 

Of course, any command files and data files written with the 6.x command set can be read by versions 8 and up. I am not so sure about the reverse.

 



COMMENT ON LOAD

Students in past years have typically reported that this course takes anywhere from 25% to 100% more time than other graduate courses. The extra effort is required because you are learning two different things: how to run computer programs, and how to think about the results. Try to plan ahead and allocate more time to this course, particularly if you do not consider yourself "computer literate."

Further, given the volume of material covered every week, it is essential that you be here for every class. If you absolutely must miss a class, please let me know in advance.

 



ASSUMPTIONS

There are three working assumptions behind how I have set up this course.

 

My first assumption is that you have had an undergraduate course in social statistics and remember some of it, and/or are willing to spend significant time reviewing that material.

 

By basic statistics I refer to the following concepts:

 

frequency distributions

measures of central tendency

measures of dispersion

the normal curve

areas under the normal curve

the logic of hypothesis testing

probability theory

t test
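For a quick self-check on several of these basics (central tendency, dispersion, z scores, and the t test), the sketch below computes them in plain Python for two small hypothetical groups (not course data). The `pooled_t` helper is an illustrative name; it implements the classic Student's t for independent samples, which assumes equal population variances.

```python
from math import sqrt
from statistics import mean, median, stdev, variance

def z_score(value, data):
    """Distance of a value from the sample mean, in sample standard deviations."""
    return (value - mean(data)) / stdev(data)

def pooled_t(g1, g2):
    """Independent-samples t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(g1), len(g2)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the mean difference
    return (mean(g1) - mean(g2)) / se, n1 + n2 - 2

# Hypothetical scores for two small groups.
a = [12, 15, 11, 14, 13]
b = [9, 10, 12, 8, 11]

print("group a: mean", mean(a), "median", median(a), "sd", round(stdev(a), 3))
print("z of 15 within group a:", round(z_score(15, a), 3))
t, df = pooled_t(a, b)
print(f"t = {t:.2f} on {df} df")
```

For these numbers both groups have variance 2.5 and means of 13 and 10, so the pooled standard error is 1 and t is 3 on 8 degrees of freedom.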

 

To help you get back up to speed on these basics:

 

* We will spend a little time early in the semester reviewing some of these concepts.

 

* You will probably need to spend some time outside of class re-familiarizing yourself with some of these materials.

 

* During the first two or three weeks I will be willing to hold tutorial sessions with students in groups of four or more if several of you feel you have big gaps that would not be easily remedied by serious, intensive self-directed study, or if the computer stuff is driving you totally bonkers. You should network with each other and let me know if there is interest. If there is sizable interest (at least 3 people), we can schedule a specific session and let everyone know about it.

 

* If you feel you need more in-depth therapy, ask Dr. Avakame if you can sit in on his statistics course throughout the semester. You may want to consider doing that before going ahead with this course.

 

My second assumption is that you are somewhat familiar with microcomputers. I assume that you know how to turn them on, how to insert floppies, how to find your way around hard disks, how to use basic WINDOWS and DOS commands such as copy, delete, dir, and so on. If you do not have some basic computer literacy, please prevail upon one of your more knowledgeable friends to help you out as soon as possible.

 

Every time I teach this course, at least one student loses mission-critical files. The more you know about Windows and DOS commands, the less likely you are to lose important information. I recommend WINDOWS FOR DUMMIES. You also could get and read DOS FOR DUMMIES or Van Wolverton's RUNNING MS-DOS.

 



It also is true that every time I have taught this course at least one student has had a floppy fail or has suffered from a virus. Keep all mission-critical files (data, command files, listing files, and papers) on at least two disks.

 

My third assumption is that you have access to a Windows 3.1-capable, IBM-compatible computer for running the software OR that you are willing to spend time in the Gladfelter lab running your analyses. If you are not sure what your hardware can do, see me. If you are having trouble getting access to the appropriate type of microcomputer, please let me know immediately.

 



DATASETS

We will be working with two different datasets throughout the semester. One is an ecological dataset using information from 50 states. The second consists of questions from a recent national survey on gun ownership, conducted by Phil Cook and Jens Ludwig for the Police Foundation. I use two different datasets so we can get used to thinking about theory at different levels, and so we can see some of the differences between individual and ecological data.

 



CLASS STRUCTURE

You will have a homework assignment almost every week. To complete most assignments you will run one or more statistical analyses on a dataset, interpret the results, and write them up. In some weeks the homework may instead involve reading an article and writing up its findings in your own words.

 

You should bring two copies of your homework assignment to class (an original and a copy). You will hand in one copy of that week's assignment at the beginning of class. That way you can keep the second copy and make notes on it as the class discussion unfolds.

 

During class I will present conceptual material, review readings, answer questions, and go over homework. You should be prepared to answer questions on any aspect of the readings and/or the homework assignment for that week. In short, come to class prepared to talk about a few or many aspects of the work you have completed.

 

WEEKLY LAB

We are scheduling a one-hour lab in addition to the regular 2.5-hour class, as approved in 1995 by the department's Graduate Committee. Students in past years have strongly recommended holding the lab on a SEPARATE night to help them better absorb the vast mountain of material covered in this course.

 

In that lab we will run through the procedures needed to complete the homework assignment for the following week, and we may explore additional issues. The lab is scheduled on a separate night to avoid total meltdown. It will be held in the 5th floor lab. We will need to talk about when to schedule it. The only times I am currently available are Tuesday afternoon or early evening, or sometime early on Friday. Obviously, a Tuesday lab creates problems for homework assignments that you have only a week to complete.



CLASS GRADING

Your grade in this class will be based on the following:

70%: Average grade on handed-in homework assignments. I will drop your worst grade from the average. Each assignment will either (a) ask you to run a problem and interpret the results or (b) ask you to read an article and describe detailed results in your own words. Toward the end of the semester I may announce that a limited number of homework assignments can be redone.

20%: Final examination, to be held at the end of the semester. This will probably be an in-class, no-notes exam. The exam will take place THURSDAY MAY 6 AT THE USUAL TIME.

10%: In-class participation. Participation may take several forms: answering questions, completing in-class group work, or completing in-class individual assignments.
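The weighting above can be written as a small function. This is a sketch under stated assumptions: all scores are on a 0-100 scale (the syllabus does not state the scale), exactly one homework grade (the lowest) is dropped, and late penalties and redone assignments are not modeled. The function name `course_grade` is hypothetical.

```python
def course_grade(homework, final_exam, participation):
    """Weighted grade: 70% homework (lowest dropped), 20% final, 10% participation.

    Assumes every score is on a 0-100 scale; late penalties and redone
    assignments are not modeled.
    """
    if len(homework) < 2:
        raise ValueError("need at least two homework scores to drop the lowest")
    kept = sorted(homework)[1:]          # drop the single worst homework grade
    hw_avg = sum(kept) / len(kept)
    return 0.70 * hw_avg + 0.20 * final_exam + 0.10 * participation

# Hypothetical student: the worst homework (60) is dropped.
print(round(course_grade([80, 90, 60, 85], final_exam=78, participation=95), 1))
```

Dropping the 60 leaves a homework average of 85, so this hypothetical student's grade is 0.70(85) + 0.20(78) + 0.10(95) = 84.6.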

GRADING POLICIES

 
GUIDELINES ON AVOIDING ACADEMIC MISCONDUCT

We will discuss in class the nature of academic misconduct, including plagiarism. You are responsible for understanding the different varieties of academic misconduct. If I encounter solid evidence of academic misconduct, I will discuss the matter with you and then deliver the consequence I deem appropriate. Possible consequences include failure on the assignment in question (i.e., a 0), a failing grade for the course, or an attempt to have you expelled from Temple University. Should you wish to contest a decision I make on academic misconduct, I will inform you of the procedures to follow. The department and the college have fully specified grievance procedures.

 

Makeup Policy.

There will be no makeups for a missed final exam unless

* you notify me before the missed exam

* and you have a reason for missing the exam that I find valid (e.g., a car accident). I no longer accept excuses like your friend's grandmother dying.

* and I have something in writing, for my records, verifying the nature of the problem.

 

 Late Assignments.

Assignments are due on the date indicated. I reserve the right to lower the grade for assignments that are handed in late, and the longer the delay, the larger the reduction. Depending on the assignment, the grade may be lowered 1% to 10% a day.

If you have an excuse for a late assignment, I will take it into account only if you notify me beforehand about the problem, I find your excuse for the delay to be valid, and I have something in writing. Again, a friend's grandfather's death may be questionable.

 

Regrading policy.

You have the right to submit any assignment for regrading. If you wish to submit an assignment for regrading proceed as follows:

 Prepare a written statement explaining why the assignment should be regraded. This applies to written assignments, essay exams, and multiple choice exam questions where you think there was more than one correct answer.

 On a cover sheet print your name, SSN, name of the assignment or test, date of the assignment or test, and the date you submitted the assignment for regrading.

Staple the cover sheet to your written rationale and the original assignment.

I will review your request for regrading. I will consult with other faculty if I deem that appropriate. As a result of your request for regrading the grade on your original assignment may stay the same, or it may go up, or it may go down.

 

Guidelines for Papers

You should type each written assignment, double spaced. You also should proofread your written work carefully. Misspelled words and flagrantly poor grammar will reduce your grade. On your papers I usually take off one point for every misspelled word and one point for every flagrant grammatical error. Needless to say, this can add up after a while. I urge you to:

* always run the spell checker

* always run a grammar checker

* proofread carefully and, if possible, get someone else to proofread for you as well.

Many students find that their writing improves if they consult some books on writing like Strunk & White's The Elements of Style or Provost's 100 Ways to Improve Your Writing. You can find copies of these in the bookstore under my undergraduate course CJ 160.

 

Once more: I strongly urge you to proofread, spell check, and grammar check every paper carefully.

 

CLASSROOM EXPECTATIONS

- Please arrive on time for class. If you have a special circumstance and know you cannot make it to class on time, please let me know.

 

- If you must leave class early, please let me know.

 

- If you must miss class, please let me know beforehand (see above).

 

- Do not bring food or drinks into the lab; you can be thrown out for it. Pepsi and floppies do not mix well.

 



TEXTS AND MATERIALS

Hamilton, L. C. (1992). Regression with graphics: A second course in applied statistics. Monterey: Wadsworth.

 

We will use this as our main text on regression. I chose this book because (1) students have been unhappy with every other regression book I have chosen; (2) it makes extensive use of graphical displays for understanding data, an approach used extensively in this course; (3) Hamilton deals "up front" with non-normal data and how to handle it, an important issue in criminal justice research; and (4) although they will be covered only lightly in this course, the volume contains information on important recent developments in the general linear model (e.g., bootstrapping, structural equation modeling) that you may need to know about in the future. In short, I think it will hold up well as a general reference book on the topic. Unfortunately, Hamilton's examples all come from environmental science, which some students find less than enthralling.

 

If you feel that you need another text in this area, here are some that I have used in the past and are basically pretty good. Students, of course, have differed with me in their assessments.

 

Taylor, R. B. (1999). Various notes on statistics and regression.

 

You will link to my website and print these notes out or save them to your own hard disk. I will do all I can to ensure that each file is downloadable. If you encounter any problems whatsoever, let me know as soon as possible. I used to put all this in a student copy pack, but the copy center charges a ton.

 

These notes deal mostly with the conceptual material we cover throughout the course. I will tell you which set of notes we are covering each week. Read these notes thoroughly and carefully before coming to class in the week they are assigned.

 

To get to these notes go to:

 

http://blue.temple.edu/~ralph

 

There are a couple of related topics that we will try to get to at the end of the semester. One deals with regression for ordinal or nominal outcomes. The text for this topic is:

 

Aldrich, J., and Nelson, F. D. (1984). Linear probability, logit, and probit models. Newbury Park: Sage.

 

The other topic, factor analysis, addresses multiple dependent variables or data reduction for independent variables. The text we hope to read is:

 

Kline, P. (1994). An Easy Guide to Factor Analysis. London: Routledge.

 

The above two texts "should" be in the bookstore. Of course, there is always AMAZON.COM. The texts below are NOT in the bookstore.

 

RECOMMENDED TEXT: Porkess, R. (1991). The Harper Collins Dictionary of Statistics. New York: Harper.

 

Porkess is a useful guide to some basics - when you want to review variance or skewness or the normal distribution.

 



SOME OTHER TEXTS YOU MIGHT FIND HELPFUL:

 

Darlington, R. (1992). Regression and linear models. New York: McGraw Hill.

 

Advantages: this book also makes extensive use of graphics. Darlington also introduces logit and probit transforms, which may be important for those of you who want to do log-linear modeling. Disadvantages: all its examples come from psychology; example runs come from either SYSTAT or SAS, neither of which we are using; and students in the past have said they feel Darlington is "talking down" to them.

 

Cohen, J., and Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

 

This book uses a set theory approach to explaining multiple regression that many students seemed to like. Be warned, however: the Cohens use a notational system that takes some getting used to.

 

Blalock, H. M. (1979). Social statistics (Rev. 2nd ed.). New York: McGraw Hill.

 

This is an excellent and widely-revered volume. But it is also closely written; it requires careful reading.

 

CAS LAB MATERIALS

 

Every week, before lab, I will ATTEMPT to distribute a "lab guide."

 





SEQUENCE OF TOPICS, ASSIGNMENTS AND READINGS

(subject to possible revision at a later date depending upon a host of factors)

Week 1, 1/18
Read: NOT9901B; Hamilton, pp. 1-23; Taylor (1993), Research methods in criminal justice (New York: McGraw Hill), Chapter 10, "Sampling," pp. 183-192. TEXT IS ON RESERVE IN PALEY FOR CJ 160.
Topics: review syllabi; descriptive data displays for single variables: histogram, box and whisker, stem-and-leaf, P-P, Q-Q.
Lab: generate and interpret univariate graphical displays; use explore; decide whether a variable is normal.

Week 2, 1/25
Read: NOT9902; GRST8706.
Topics: the logic of hypothesis testing; z test; Student's t test.
Lab: t test: carry out independent and dependent t tests; interpret results.

Week 3, 2/1
Read: NOT9904; NOT9905; Hamilton, pp. 29-42, 51-53, 289-294.
Topics: understanding covariance; transforms and functional relationships; looking carefully at scatterplots.

Week 4, 2/8
Read: NOT9906.
Topics: simple regression and zero-order correlation: B, A, r; looking at scatterplots and regression lines.

Week 5, 2/15
Read: Hamilton, pp. 42-49.
Topics: hypotheses we test in simple regression: B and r.

Week 6, 2/22
Read: NOT9907; Hamilton, pp. 124-133.
Topics: residuals in regression, error, and assumptions.
WE MAY NEED TO RESCHEDULE THIS CLASS DUE TO AN UNAVOIDABLE CONFLICT.

Week 7, 3/1
Read: NOT9908; Hamilton, pp. 65-72.
Topics: residual diagnostics.
WE MAY NEED TO RESCHEDULE THIS CLASS DUE TO AN UNAVOIDABLE CONFLICT.

BREAK: GO AWAY

Week 8, 3/15
Topics: catch-up.

Week 9, 3/22
Read: NOT9909; Hamilton, pp. 77-82.
Topics: partial correlation and multiple regression.

Week 10, 3/29
Read: NOT9910; NOT9910B; NOT9910C; Hamilton, pp. 109-133.
Topics: going back to assumptions: a checklist approach; measures of influence, leverage, and general deviance.

Week 11, 4/5
Read: NOT9911; NOT9914; Hamilton, pp. 53-59, 84-88.
Topics: dummy variables; interaction terms; test of R-squared increment.

Week 12, 4/12
Read: NOT9915.
Topics: path analysis.

Week 13, 4/19
Read: Hamilton, Ch. 8; Aldrich and Nelson.
Topics: logit and probit.

Week 14, 4/26
Read: Hamilton, Ch. 9; Kline.
Topics: factor/principal components analysis.