Comparative Statistics
The main goal of this report is to determine whether the crime statistics of Texas are related to the crime statistics of the entire United States. It also aims to determine the difference between the two sets of statistics. For this purpose, the statistical methods used are the Paired Samples T-Test and correlation analysis. The correlation coefficient quantifies the strength of the linear association between two variables.
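Given a set of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, the correlation coefficient is computed as:

$$r_{xy} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Where:
$r_{xy}$ = correlation between X and Y
$\sum X$ = sum of variable X
$\sum Y$ = sum of variable Y
$\sum XY$ = sum of the products of X and Y
$N$ = number of cases
$\sum X^2$ = sum of squared X scores
$\sum Y^2$ = sum of squared Y scores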
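The raw-score formula translates directly into code. The sketch below accumulates exactly the sums defined above (N, ΣX, ΣY, ΣXY, ΣX², ΣY²). The helper name `pearson_r` and the toy data are hypothetical and are not taken from this report.

```python
import math
from collections.abc import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson r via the raw-score formula built from N, ΣX, ΣY, ΣXY, ΣX², ΣY²."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 cases")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # A constant variable makes this zero, so the division would fail.
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical toy data, not the crime figures.
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
```

For very large values, such as national crime counts, the subtractions in the numerator and denominator can lose floating-point precision; computing deviations from the means first gives the same coefficient and is numerically safer.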
Furthermore, the correlation coefficient always takes a value between -1 and 1, with 1 or -1 indicating perfect correlation (in this case all points would lie along a straight line). A positive correlation indicates a positive association between the variables (increasing values in one variable correspond to increasing values in the other), while a negative correlation indicates a negative association (increasing values in one variable correspond to decreasing values in the other). A correlation value close to 0 indicates no linear association between the variables.
Since the formula for calculating the correlation coefficient standardizes the variables, changes in scale or units of measurement will not affect its value. For this reason, the correlation coefficient is often more useful than a graphical depiction in determining the strength of the association between two variables.
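The invariance claim can be checked numerically. This sketch uses the seven offense counts reported in Table 1 (Murder through Vehicle Theft) and computes r on the raw counts, on the Texas counts expressed in thousands, and on shifted and negated Texas counts. The helper `pearson` is our own illustration, not a library function. A positive change of scale or a shift of origin leaves r unchanged; multiplying by a negative number flips only its sign.

```python
import math
import statistics

# Offense counts from Table 1: Murder, Forcible Rape, Robbery, Aggravated Assault,
# Burglary, Larceny-Theft, Vehicle Theft.
texas = [1_407, 8_511, 35_790, 75_383, 219_828, 677_042, 93_423]
us = [16_692, 93_934, 417_122, 862_947, 2_154_126, 6_776_807, 1_235_226]


def pearson(xs, ys):
    """Pearson r computed from deviations about the means."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_counts = pearson(texas, us)
r_thousands = pearson([x / 1000 for x in texas], us)  # change of units
r_shifted = pearson([x + 500 for x in texas], us)     # change of origin
r_negated = pearson([-x for x in texas], us)          # negative scale flips the sign
print(r_counts, r_thousands, r_shifted, r_negated)
```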
In addition, if the computed $r_{xy}$ does not indicate a perfect correlation, the following categorization is suggested for interpreting its strength (Guilford, J.P. and B. Fruchter, 1973):
| $r_{xy}$ | Indication |
|---|---|
| ±0.80 to ±1.00 | High Correlation |
| ±0.60 to ±0.79 | Moderately High Correlation |
| ±0.40 to ±0.59 | Moderate Correlation |
| ±0.20 to ±0.39 | Low Correlation |
| ±0.01 to ±0.19 | Negligible Correlation |
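The categorization can be applied mechanically. In this sketch the function name `guilford_label` is hypothetical, and assigning a value that falls between two listed bands (for example 0.795) to the lower band is our assumption, since the table lists only two-decimal limits.

```python
def guilford_label(r: float) -> str:
    """Map a correlation coefficient to the Guilford and Fruchter (1973) category."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)  # the bands are symmetric in sign
    # Assumption: a value between two listed bands goes to the lower band.
    if size >= 0.80:
        return "High Correlation"
    if size >= 0.60:
        return "Moderately High Correlation"
    if size >= 0.40:
        return "Moderate Correlation"
    if size >= 0.20:
        return "Low Correlation"
    if size >= 0.01:
        return "Negligible Correlation"
    return "Below the table's range (|r| < 0.01)"


print(guilford_label(0.999430))
```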
The Paired Samples T-Test compares the means of two variables (here, the crime statistics of Texas and of the United States). It computes the difference between the two variables for each case and tests whether the average difference is significantly different from zero.
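The steps of the paired test can be sketched with the standard library: take the per-case differences, then divide their mean by its standard error (the sample standard deviation of the differences over √n). The function name `paired_t` is hypothetical, and only the t statistic and its n − 1 degrees of freedom are computed; a p-value needs the t distribution, which a statistics library such as SciPy (`scipy.stats.ttest_rel`) provides.

```python
import math
import statistics


def paired_t(xs, ys):
    """Paired-samples t statistic for the mean of the differences ys - xs.

    Returns (t, df) with df = n - 1 for n pairs.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [y - x for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences (divisor n - 1)
    t = mean_d / (sd_d / math.sqrt(n))  # fails if every difference is identical
    return t, n - 1


print(paired_t([1, 2, 3], [2, 4, 6]))
```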
The following table shows the crime statistics of Texas and the United States in 2005.
Table 1
Crime Statistics 2005

| Category | Texas | United States |
|---|---:|---:|
| Population | 22,859,968 | 296,410,404 |
| Index | 1,111,384 | 11,556,854 |
| Violent | 121,091 | 1,390,695 |
| Property | 990,293 | 10,166,159 |
| Murder | 1,407 | 16,692 |
| Forcible Rape | 8,511 | 93,934 |
| Robbery | 35,790 | 417,122 |
| Aggravated Assault | 75,383 | 862,947 |
| Burglary | 219,828 | 2,154,126 |
| Larceny-Theft | 677,042 | 6,776,807 |
| Vehicle Theft | 93,423 | 1,235,226 |
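The rows of Table 1 read as nested totals: Index appears to be Violent plus Property, and each of those the sum of its offense categories. This is our interpretation, but the figures bear it out, as this quick arithmetic check shows. The names `check_subtotals`, `VIOLENT_PARTS`, and `PROPERTY_PARTS` are illustrative.

```python
# Figures copied from Table 1 (2005); Population is omitted.
texas = {
    "Violent": 121_091, "Property": 990_293, "Index": 1_111_384,
    "Murder": 1_407, "Forcible Rape": 8_511, "Robbery": 35_790,
    "Aggravated Assault": 75_383, "Burglary": 219_828,
    "Larceny-Theft": 677_042, "Vehicle Theft": 93_423,
}
us = {
    "Violent": 1_390_695, "Property": 10_166_159, "Index": 11_556_854,
    "Murder": 16_692, "Forcible Rape": 93_934, "Robbery": 417_122,
    "Aggravated Assault": 862_947, "Burglary": 2_154_126,
    "Larceny-Theft": 6_776_807, "Vehicle Theft": 1_235_226,
}

VIOLENT_PARTS = ("Murder", "Forcible Rape", "Robbery", "Aggravated Assault")
PROPERTY_PARTS = ("Burglary", "Larceny-Theft", "Vehicle Theft")


def check_subtotals(table: dict[str, int]) -> bool:
    """True if Violent, Property and Index equal the sums of their parts."""
    violent_ok = sum(table[k] for k in VIOLENT_PARTS) == table["Violent"]
    property_ok = sum(table[k] for k in PROPERTY_PARTS) == table["Property"]
    index_ok = table["Violent"] + table["Property"] == table["Index"]
    return violent_ok and property_ok and index_ok


print(check_subtotals(texas), check_subtotals(us))
```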
Table 2
Data Analysis

| Statistic | Texas | United States |
|---|---:|---:|
| Mean | 246974.222222 | 2.56819e+6 |
| Variance | 1.07650e+11 | 1.09937e+13 |
| Standard Deviation | 328101.080628 | 3.31568e+6 |
| Observations | 9 | 9 |

| Statistic (Texas vs. United States) | Value |
|---|---:|
| Correlation | 0.999430 |
| t statistic | 78.312437 |
| Critical 2-sided t value (5%) | 2.365000 |
| 2-sided p-value | 0.000000 |
| Critical 1-sided t value (5%) | 1.895000 |
| 1-sided p-value | 0.000000 |
| Degrees of Freedom | 7 |
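Table 2 does not state which nine observations were used. Taking the nine Table 1 rows Violent, Property, and the seven individual offenses is our assumption, chosen because it reproduces both reported means exactly. Under that assumption, the sketch below recomputes the descriptive statistics and the correlation. The reported variances match the population formula (divisor N), and the row after Variance equals the square root of that variance, that is, a standard deviation. Helper names are illustrative.

```python
import math
import statistics

# Assumed nine observations: Violent, Property, then the seven offense categories.
texas = [121_091, 990_293, 1_407, 8_511, 35_790, 75_383, 219_828, 677_042, 93_423]
us = [1_390_695, 10_166_159, 16_692, 93_934, 417_122,
      862_947, 2_154_126, 6_776_807, 1_235_226]


def describe(values):
    """Mean, population variance (divisor N) and its square root."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    return mean, var, math.sqrt(var)


def pearson_centered(xs, ys):
    """Pearson r from deviations about the means (safer for large counts)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


tx_mean, tx_var, tx_sd = describe(texas)
us_mean, us_var, us_sd = describe(us)
r = pearson_centered(texas, us)
print(f"Texas: mean={tx_mean:.6f} var={tx_var:.5e} sd={tx_sd:.6f}")
print(f"US:    mean={us_mean:.5e} var={us_var:.5e} sd={us_sd:.5e}")
print(f"r = {r:.6f}")
```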
Figure 1
The correlations table displays the Pearson correlation coefficients, their significance values, and the number of cases with non-missing values. The Pearson correlation coefficient measures the linear association between two variables; its significance test assumes the data are approximately normally distributed.
The values of the correlation coefficient range from -1 to 1. The sign of the coefficient indicates the direction of the relationship (positive or negative), and its absolute value indicates the strength, with larger absolute values indicating stronger relationships. The coefficients on the main diagonal are always 1.0, because each variable has a perfect positive linear relationship with itself, and the correlations above the main diagonal mirror those below it.
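Analysis shows that the crime statistics of Texas and the United States have a strong positive correlation. This means that offense categories with high counts in Texas also have high counts nationwide. Correlation alone does not show that crime in Texas drives crime in the rest of the country, particularly because the national figures include the Texas counts. For the t-test, the t statistic, degrees of freedom, and significance are computed.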
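The computed t value is 78.312, far above the critical two-sided t value of 2.365 at the 5% level.
There are 7 degrees of freedom, that is, N − 2 for the nine observations.
The significance (p-value) is 0.000000 to six decimal places.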
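The reported t value and its 7 degrees of freedom (N − 2 for nine cases) are consistent with the standard t test of whether a correlation coefficient differs from zero, $t = r_{xy}\sqrt{N-2}/\sqrt{1-r_{xy}^2}$. A minimal sketch using the rounded r from Table 2 follows; the function name `correlation_t` is hypothetical, and the critical value 2.365 is copied from Table 2 rather than computed.

```python
import math


def correlation_t(r: float, n: int) -> tuple[float, int]:
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 cases")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


CRITICAL_T_TWO_SIDED = 2.365  # 5% two-sided critical value for df = 7, from Table 2

t, df = correlation_t(0.999430, 9)
print(f"t = {t:.2f}, df = {df}, significant: {abs(t) > CRITICAL_T_TWO_SIDED}")
```

With r rounded to six decimals this gives t of about 78.3. The small gap from the reported 78.312437 is expected, because near r = 1 the statistic is very sensitive to rounding in r.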
The significance of each correlation coefficient is also displayed in the correlations table. The significance value (p-value) is the probability of obtaining a result at least as extreme as the one observed if there were actually no correlation in the population. If the p-value is small (less than 0.05), the correlation is significant and there is evidence that the two variables are linearly related. If the p-value is relatively large, for example 0.50, the correlation is not significant, and the data provide no evidence that the two variables are linearly related.
The correlation coefficient between Texas (independent) and the United States (dependent) is 0.999. The p-value is 0.000, which indicates a very high level of significance. This low p-value indicates that the crime statistics of Texas and the crime statistics of the United States are significantly positively correlated.
Credit: ivythesis.typepad.com