Data analysis

Statistical Tests: ANOVA, χ², r

keepgroovin' 2016. 3. 19. 16:13


1. P-value
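- Probability of a Type I error: the number of times out of 100 we would be wrong in rejecting the null hypothesis when it is actually true. More precisely, the p-value is the probability of seeing a result at least as extreme as the observed one if the null hypothesis holds.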

- For an explanation of statistical inference, see the well-written post below
http://m.blog.naver.com/hyear1004/220093860974
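
The "times out of 100" reading of a Type I error can be checked with a small simulation. The sketch below is in Python rather than SAS, and everything in it is an assumption made for illustration: the function names, the sample size, the number of trials, and the use of a z-test with a known standard deviation. Both samples come from the same population, so every rejection is a false one.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known common sigma."""
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_one_error_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of trials that reject H0 even though H0 is true by construction."""
    rng = random.Random(seed)
    false_rejections = 0
    for _ in range(trials):
        # Both samples are drawn from the same N(0, 1) population.
        sample_a = [rng.gauss(0, 1) for _ in range(n)]
        sample_b = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p_value(sample_a, sample_b) < alpha:
            false_rejections += 1
    return false_rejections / trials


rate = type_one_error_rate()
print(f"False rejection rate under a true null: {rate:.3f}")
```

With alpha = 0.05 the printed rate should land near 0.05, that is, about 5 wrong rejections out of 100.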


2. Statistical Test
- Bivariate statistical tools
1) ANOVA (F-test) : analysis of variance
2) χ² : Chi-square test of independence
3) r : correlation coefficient

- Which test to use depends on the types of the x and y variables
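
As a rough illustration of choosing a test by variable type, here is a minimal Python sketch. The mapping is the usual textbook one (categorical x with quantitative y goes to ANOVA, categorical with categorical to chi-square, quantitative with quantitative to Pearson r), which is an assumption here since the notes do not spell it out. The function name `choose_bivariate_test` and the type labels are hypothetical.

```python
def choose_bivariate_test(x_type, y_type):
    """Return the bivariate test for an explanatory (x) and response (y) variable type."""
    # Assumed textbook mapping; the type labels are illustrative strings.
    tests = {
        ("categorical", "quantitative"): "ANOVA (F-test)",
        ("categorical", "categorical"): "Chi-square test of independence",
        ("quantitative", "quantitative"): "Pearson correlation (r)",
    }
    key = (x_type, y_type)
    if key not in tests:
        raise ValueError(f"No test listed in these notes for x={x_type}, y={y_type}")
    return tests[key]


# Example: major (categorical) vs. frustration score (quantitative)
print(choose_bivariate_test("categorical", "quantitative"))
```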




3. ANOVA F-test
- Are the differences among sample means due to true differences among the population means, or merely due to sampling variability?

- e.g.) Four majors and the frustration level for each major
H0 : (x and y are unrelated) μ1 = μ2 = μ3 = μ4
Ha : Not all the μ are equal

F = variation among sample means / variation within groups

- In ANOVA, if the estimate of between-groups variance is about the same as the estimate of within-groups variance, then F is close to 1 and any difference between sample means is probably due to random sampling error
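
The F ratio can be worked through by hand. The Python sketch below assumes the standard one-way ANOVA definitions (between-groups mean square divided by within-groups mean square). The frustration scores for the four majors are made-up numbers, and the function name is hypothetical.

```python
from statistics import mean


def anova_f_statistic(groups):
    """One-way ANOVA F: between-groups mean square over within-groups mean square."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation among sample means, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Made-up frustration scores for four majors (illustration only)
frustration_by_major = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
f_value = anova_f_statistic(frustration_by_major)
print(f"F = {f_value:.2f}")  # F = 5.00
```

Here the group means (2, 3, 4, 5) spread out more than the scores do inside each group, so F is well above 1. Turning F into a p-value needs the F distribution with (k - 1, N - k) degrees of freedom, which this sketch leaves out.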

4. Pearson Correlation Coefficient (r)

- H0 : ρ = 0 (ρ, rho, is the population correlation; r is its sample estimate)

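/* Standard SAS variable names cannot contain periods; substitute the dataset's actual names if they differ */
PROC CORR;
  VAR urban.rate income.per.person internet.use.rate;
RUN;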

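(Interpretation) The p-value between the variables comes out as < .0001, so the correlations are statistically significant
The closer |r| is to 1, the stronger the linear relationship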
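
To show what sits behind the r value that PROC CORR reports, here is a minimal Python sketch of the standard Pearson formula (sum of co-deviations divided by the square root of the product of the squared deviations). The function name and the small data set are invented for illustration and are not taken from the original analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return co_deviation / sqrt(ss_x * ss_y)


# Invented data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

This sketch stops at r itself. The p-value for H0: ρ = 0 comes from a t distribution with n - 2 degrees of freedom and is not computed here.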


- r-square : the fraction of the variability of one variable that can be predicted by the other

e.g.) For urban.rate and internet.use.rate, r = 0.61 -> r-square = 0.37
Urban rate predicts 37% of the variability in internet use rate; the remaining 63% of the variability is not explained by urbanization rate
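
The arithmetic behind the 37% / 63% split is just squaring r. A short Python check, using the r = 0.61 reported above (the variable names are illustrative):

```python
r = 0.61                       # correlation reported above
r_squared = r ** 2             # fraction of variability explained

explained_percent = round(r_squared * 100)
unexplained_percent = 100 - explained_percent

print(f"r-square = {r_squared:.2f}")
print(f"explained: {explained_percent}%, unexplained: {unexplained_percent}%")
```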