WEEK 4 DISCUSSION: AREAS UNDER THE CURVE

Let’s pull things together. So far, for data sets, you have calculated MEANS, VARIANCES and STANDARD DEVIATIONS as well as QUARTILES, and used those to determine if any data are “UNUSUAL”. These would be data that are more than 2 standard deviations above or below the mean OR data that are more than 1.5 x IQR above or below the mean.

You have calculated FREQUENCY/RELATIVE FREQUENCY/CUMULATIVE RELATIVE FREQUENCY TABLES, and with those you can determine how much of your data (or the probability that your data) are above or below a certain range (e.g., 21-30 or 51-60, etc.).

Currently, we are STANDARDIZING our raw data (x-values) to get z-values, which are the number of standard deviations a value lies above or below the standardized mean of zero (the STANDARD NORMAL DISTRIBUTION). We can use these z-values in our TABLE to determine the probability of data being BELOW (always to the LEFT of) a specific data value. Then, by subtracting that probability from 1.0000, we get the probability of data being ABOVE (to the right of) that data value. We can then see if any of the extreme data points (low end or high end) have a probability (from the Table) that puts them beyond one of our “critical” ±z-values (1%, 5% or 10%), which would make that data value “UNUSUAL”.
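As a worked illustration of the standardizing step, the block below writes out the conversion and the two areas. The symbols \bar{x} (mean), s (standard deviation) and \Phi (area to the left from the z-table) are notation assumed here, and the numbers are a made-up example, not data from this assignment.

```latex
z = \frac{x - \bar{x}}{s}, \qquad
P(X \le x) = \Phi(z), \qquad
P(X > x) = 1 - \Phi(z)

% Hypothetical example: \bar{x} = 50, s = 20, x = 90
z = \frac{90 - 50}{20} = 2.00, \qquad
\Phi(2.00) = 0.9772, \qquad
1 - 0.9772 = 0.0228
```

In this example the area to the right, 0.0228, is smaller than 5% and 10% but not smaller than 1%.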

SO, SHOW WHAT YOU KNOW. Do the required calculations (SOFTWARE is fine for the calculation BUT USE THE ACTUAL TABLES FOR THE Z-VALUE PROBABILITIES) and fill in the Tables below.

1) Write down 11 numbers between 1 and 100. These can be whole numbers but we will assume this is CONTINUOUS data (not DISCRETE). Rank order them.

|          | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| x-values |   |   |   |   |   |   |   |   |   |    |    |

MEAN: ____, VARIANCE: ____, STD DEV: ____, Q1: ____, Q2: ____, Q3: ____, IQR: ____
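If you use software for these statistics, a minimal Python sketch might look like the following. The 11 values are hypothetical, and two choices are assumptions rather than anything this worksheet specifies: the SAMPLE variance (dividing by n - 1) and the "exclusive" quartile method, which for 11 ranked values lands exactly on the 3rd, 6th and 9th values.

```python
import statistics

# Hypothetical data set: 11 made-up values between 1 and 100, rank ordered.
data = sorted([12, 23, 35, 41, 48, 52, 57, 63, 70, 84, 95])

mean = statistics.mean(data)
variance = statistics.variance(data)  # sample variance (n - 1); an assumption
std_dev = statistics.stdev(data)      # sample standard deviation

# method="exclusive" places Q1, Q2, Q3 at positions (n + 1) * p in the ranked
# list. With n = 11 those are exactly the 3rd, 6th and 9th values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(f"MEAN: {mean:.2f}, VARIANCE: {variance:.2f}, STD DEV: {std_dev:.2f}")
print(f"Q1: {q1}, Q2: {q2}, Q3: {q3}, IQR: {iqr}")
```

If your course uses the population formulas instead, `statistics.pvariance` and `statistics.pstdev` are the counterparts.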

2) Use those statistics to determine if any of your data values are “UNUSUAL”.

(a) Mean ± 2 standard deviations = ____ and ____. Unusual data values? ____________

(b) Mean ± 1.5 * IQR = _____ and _____. Unusual data values? _______________
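The two checks in this step can be sketched as below. The data values and the helper names `band` and `unusual` are hypothetical. The sketch centers the 1.5 * IQR band on the mean because that is how this worksheet states the rule; many textbooks measure that band from Q1 and Q3 instead, so check which convention your course expects.

```python
import statistics

def band(center, half_width):
    """Return the (low, high) cutoffs center - half_width and center + half_width."""
    return center - half_width, center + half_width

def unusual(values, low, high):
    """Return the values that fall outside the interval from low to high."""
    return [x for x in values if x < low or x > high]

# Hypothetical data set: 11 made-up values between 1 and 100.
data = [12, 23, 35, 41, 48, 52, 57, 63, 70, 84, 95]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample SD (n - 1); an assumption
q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

sd_low, sd_high = band(mean, 2 * sd)        # (a) mean +/- 2 standard deviations
iqr_low, iqr_high = band(mean, 1.5 * iqr)   # (b) mean +/- 1.5 * IQR

print(f"(a) {sd_low:.2f} and {sd_high:.2f}: {unusual(data, sd_low, sd_high)}")
print(f"(b) {iqr_low:.2f} and {iqr_high:.2f}: {unusual(data, iqr_low, iqr_high)}")
```

An empty list means no value falls outside that pair of cutoffs.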

3) Fill in this Frequency Table by putting your data points into the ranges given.

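| RANGE  | FREQUENCY | RELATIVE FREQ. | CUMULATIVE RELATIVE FREQ. |
| 1-10   |           |                |                           |
| 11-20  |           |                |                           |
| 21-30  |           |                |                           |
| 31-40  |           |                |                           |
| 41-50  |           |                |                           |
| 51-60  |           |                |                           |
| 61-70  |           |                |                           |
| 71-80  |           |                |                           |
| 81-90  |           |                |                           |
| 91-100 |           |                |                           |

(TOTAL MUST EQUAL 1.0 OR BE VERY CLOSE)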

Using the above table:

(a) What percent of your data values are at or below 50: _______.

(b) What percent of your data are at or below 61: ______ and at or below 90: _______

(c) So, what percent are between 61 and 90? ______________
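For checking a hand-filled table, here is a rough Python sketch of the same three columns. The data values and the helper `cum_at` are hypothetical, and the sketch assumes whole-number data so that every value falls inside one of the ten ranges. Because the ranges end at 50, 60 and 90, the table can only be read at those upper limits; the share between 61 and 90 is taken as the cumulative value at 90 minus the cumulative value at 60.

```python
# Hypothetical data set: 11 made-up whole numbers between 1 and 100.
data = [12, 23, 35, 41, 48, 52, 57, 63, 70, 84, 95]

# Ranges 1-10, 11-20, ..., 91-100 as (low, high) pairs.
ranges = [(lo, lo + 9) for lo in range(1, 100, 10)]

n = len(data)
freq = [sum(1 for x in data if lo <= x <= hi) for lo, hi in ranges]
rel = [f / n for f in freq]

# Running total of the relative frequencies.
cum = []
running = 0.0
for r in rel:
    running += r
    cum.append(running)

for (lo, hi), f, r, c in zip(ranges, freq, rel, cum):
    print(f"{lo:>3}-{hi:<3} {f:>9} {r:>14.4f} {c:>14.4f}")

def cum_at(upper):
    """Cumulative relative frequency of the range whose upper limit is `upper`."""
    return cum[[hi for _, hi in ranges].index(upper)]

at_or_below_50 = cum_at(50)
between_61_and_90 = cum_at(90) - cum_at(60)
print(f"At or below 50: {at_or_below_50:.2%}")
print(f"Between 61 and 90: {between_61_and_90:.2%}")
```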

4) Lastly, let’s STANDARDIZE (z-values) the data (x-values).

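|                       | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| x-values              |   |   |   |   |   |   |   |   |   |    |    |
| z-values              |   |   |   |   |   |   |   |   |   |    |    |
| Probability to LEFT*  |   |   |   |   |   |   |   |   |   |    |    |
| Probability to RIGHT* |   |   |   |   |   |   |   |   |   |    |    |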

* From the z-TABLES (NOT SOFTWARE), determine the area to the LEFT of each standardized x-value. This is the PROBABILITY that our data are less than or equal to that data point. Subtract that area from 1.0000 to get the probability that our data are greater than that data point. Obviously, these two probabilities MUST add up to 1.0000 or 100%, which accounts for all of our data.
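The assignment asks for the areas to come from the printed z-tables, so the sketch below is only meant as a way to double-check your table lookups afterward. The data values and the function name `area_left` are hypothetical; the function computes the standard normal area to the left of z from the error function, and z is rounded to two decimals to mimic how a z-table is indexed.

```python
import math
import statistics

def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical data set: 11 made-up values between 1 and 100.
data = [12, 23, 35, 41, 48, 52, 57, 63, 70, 84, 95]
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample SD (n - 1); an assumption

rows = []
for x in data:
    z = round((x - mean) / sd, 2)  # z-tables are indexed to two decimals
    left = area_left(z)            # probability to the LEFT
    right = 1.0 - left             # probability to the RIGHT
    rows.append((x, z, left, right))

for x, z, left, right in rows:
    print(f"x = {x:>3}  z = {z:+.2f}  left = {left:.4f}  right = {right:.4f}")
```

Small differences in the fourth decimal place between this output and a printed table are normal rounding effects.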

OK, let’s see how probabilities determined from the z-values compare to those determined from the Frequency tables. We have the percent of data at or below 50 and the percent of data between 61 and 90.

(a) STANDARDIZE “50, 61 and 90” using your data set’s statistics (i.e., mean and SD).

| x-values             | 50 | 61 | 90 |
| z-values             |    |    |    |
| PROBABILITY*         |    |    |    |
| from FREQUENCY TABLE |    |    |    |

* From the z-Table these are the areas (probabilities) to the left of these data points.

(b) Subtracting the area (probability) to the left of 61 from the probability to the left of 90 gives us the probability of data being between 61 and 90. How do these probabilities compare to the Cumulative Relative Frequencies? _____________________
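One way to see the comparison side by side is sketched below, again with hypothetical data and hypothetical helper names. It standardizes 50, 61 and 90, takes the normal-curve area to the left of each, and sets that next to the fraction of the data actually at or below each value.

```python
import math
import statistics

def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fraction_at_or_below(values, x):
    """Observed share of the data that is less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)

# Hypothetical data set: 11 made-up values between 1 and 100.
data = [12, 23, 35, 41, 48, 52, 57, 63, 70, 84, 95]
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample SD (n - 1); an assumption

cutoffs = (50, 61, 90)
normal = {x: area_left((x - mean) / sd) for x in cutoffs}
observed = {x: fraction_at_or_below(data, x) for x in cutoffs}

for x in cutoffs:
    print(f"x = {x}: z-based = {normal[x]:.4f}, observed = {observed[x]:.4f}")

normal_between = normal[90] - normal[61]
observed_between = observed[90] - observed[61]
print(f"Between 61 and 90: z-based = {normal_between:.4f}, "
      f"observed = {observed_between:.4f}")
```

With only 11 data points the two sets of numbers should be expected to agree only roughly, since the z-based areas assume the data follow a normal curve.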

FOR REVIEWS: I REALIZE THAT YOU MAY NOT BE 100% SURE OF YOUR OWN CALCULATIONS HENCE NOT CONFIDENT IN PROVIDING GUIDA