> norm.sample <- rnorm(1000, mean=5.0, sd=1.0) > sum((norm.sample - mean(norm.sample))^2/(length(norm.sample)))
R言語では分散の計算には var が準備されており、不偏分散を算出する。 不偏分散については、それらの平均を母集団すべてについて計算すると母分散と一致する性質を持つ。
> var(norm.sample)
2変数の直線的な共変動を数値で代表させ、2変数の関係の強さの指標となる。すなわち2変数に正の相関があれば共分散は正の値をとり、逆に負の相関があれば共分散も負の値となる。
なお共分散にも不偏共分散があり、分散と同等の不偏性を持つ。> cov(norm.sample)
var に引数として行列を与えれば共分散行列が計算される。
> matrix.sample <- matrix(c(
9,8,6,8,
10,7,8,10,
5,5,6,6,
8,9,7,8,
7,7,7,8,
10,8,8,8,
8,6,7,6,
7,7,6,7,
9,8,8,9,
10,9,9,8,
4,3,4,3,
10,10,10,10,
9,9,7,7,
6,7,6,9,
10,9,8,9,
9,9,10,8,
7,8,6,7,
7,7,8,7,
4,6,5,4,
6,6,7,7), ncol=4, byrow=T)
> var(matrix.sample)
[,1] [,2] [,3] [,4]
[1,] 3.986842 2.684211 2.407895 2.750000
[2,] 2.684211 2.778947 1.884211 2.178947
[3,] 2.407895 1.884211 2.344737 1.928947
[4,] 2.750000 2.178947 1.928947 3.102632
> norm.sample <- rnorm(1000, mean=5.0, sd=1.0) > sd(norm.sample)
> norm.sample <- rnorm(1000, mean=5.0, sd=1.0) > sqrt((var(norm.sample)^2)/length(norm.sample)) # あるいは sd(norm.sample) / sqrt(length(norm.sample))
> norm.pop <- rnorm(200, mean=5.0, sd=1.0) > hist(norm.pop) > dev.print(png, file="norm-pop.png", width=480, height=480)
> control.sample <- sample(norm.pop,10) > spaghetti.sample <- sample(norm.pop,10) > steak.sample <- sample(norm.pop,10) > fruit.sample <- sample(norm.pop,10)
> temp <- matrix(nrow=40, ncol=2)
> temp.dataframe <- data.frame(temp[,1],temp[,2])
> names(temp.dataframe) <- c("diet","cardiac.index")
> temp.dataframe$diet <- append(rep("control",10),append(rep("spaghetti",10),append(rep("steak",10),rep("fruit",10))))
> temp.dataframe$cardiac.index <- append(control.sample,append(spaghetti.sample,append(steak.sample,fruit.sample)))
> temp.dataframe$diet <- factor(temp.dataframe$diet)
> anova(lm(temp.dataframe$cardiac.index~temp.dataframe$diet))
Analysis of Variance Table
Response: temp.dataframe$cardiac.index
Df Sum Sq Mean Sq F value Pr(>F)
temp.dataframe$diet 3 0.929 0.310 0.3221 0.8093
Residuals 36 34.611 0.961
> var.between <- standard.error(c(control.sample,spaghetti.sample,steak.sample,fruit.sample)) * (40-1) > var.within <- var(control.sample) + var(spaghetti.sample) + var(steak.sample) + var(fruit.sample) / (4-1)
a は水準数、Nは全体の標本数とすると、分散分析表は次のようになる。
| 要因 | 平方和 | 自由度 | 平均平方和 | F値 |
|---|---|---|---|---|
| A(因子) | SSa | Va=a-1 | Va = SSa/Va | F=Va/Ve |
| E(誤差) | SSe | Ve=N-a | Ve = SSe/Ve | |
| 全体 | SS | v=N-1 |
a は水準数、Nは全体の標本数とすると、分散分析表は次のようになる。
| 要因 | 平方和 | 自由度 | 平均平方和 | F値 |
|---|---|---|---|---|
| A(因子) | SSa | Va=a-1 | Va = SSa/Va | Fa=Va/Ve |
| B(ブロック) | SSb | Va=b-1 | Vb = SSb/Vb | Fb=Vb/Ve |
| E(誤差) | SSe | Ve=(a-1)(b-1) | Ve = SSe/Ve | |
| 全体 | SS | v=N-1 |
| 心筋梗塞 | 対照群 | 合計 | |
|---|---|---|---|
| 経口避妊薬暴露 | 10 | 5 | 15 |
| 非暴露 | 10 | 35 | 45 |
| 合計 | 20 | 40 | 60 |
| 心筋梗塞 | 対照群 | 合計 | |
|---|---|---|---|
| 経口避妊薬暴露 | 5 | 10 | 15 |
| 非暴露 | 15 | 30 | 45 |
| 合計 | 20 | 40 | 60 |
> data(Cancer) > attach(Cancer)
> pairs(Cancer, gap=0, cex.labels=0.9) > dev.print(png, file="cancer-scatterd-plot.png", width=480, height=480)
> cor(Cancer)
calorie meat milk liquor colon.cancer
calorie 1.0000000 0.7730921 0.7154085 0.6341641 0.7454581
meat 0.7730921 1.0000000 0.7227997 0.4515933 0.8541583
milk 0.7154085 0.7227997 1.0000000 0.2321334 0.5780637
liquor 0.6341641 0.4515933 0.2321334 1.0000000 0.5898522
colon.cancer 0.7454581 0.8541583 0.5780637 0.5898522 1.0000000
rectal.cancer 0.8101346 0.6718607 0.4785180 0.6095649 0.7553246
rectal.cancer
calorie 0.8101346
meat 0.6718607
milk 0.4785180
liquor 0.6095649
colon.cancer 0.7553246
rectal.cancer 1.0000000
> cor(cbind(calorie,milk,rectal.cancer))
calorie milk rectal.cancer
calorie 1.0000000 0.7154085 0.8101346
milk 0.7154085 1.0000000 0.4785180
rectal.cancer 0.8101346 0.4785180 1.0000000
> summary(lm(rectal.cancer ~ calorie + milk))
Call:
lm(formula = rectal.cancer ~ calorie + milk)
Residuals:
Min 1Q Median 3Q Max
-4.63258 -1.08817 -0.02667 0.95456 3.73896
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.097e+01 1.831e+00 -5.993 3.46e-07 \*\*\*
calorie 5.938e-03 7.595e-04 7.817 7.39e-10 \*\*\*
milk -4.028e-03 2.385e-03 -1.689 0.0983 .
Signif. codes: 0 `\*\*\*' 0.001 `\*\*' 0.01 `\*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 1.82 on 44 degrees of freedom
Multiple R-Squared: 0.6772, Adjusted R-squared: 0.6626
F-statistic: 46.16 on 2 and 44 DF, p-value: 1.568e-11
> standard.rectal.cancer <- standardize(rectal.cancer) > standard.calorie <- standardize(calorie) > standard.milk <- standardize(milk)
> summary(lm(standard.rectal.cancer ~ standard.calorie + standard.milk))
Call:
lm(formula = standard.rectal.cancer ~ standard.calorie + standard.milk)
Residuals:
Min 1Q Median 3Q Max
-1.47860 -0.34732 -0.00851 0.30467 1.19337
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.842e-16 8.473e-02 3.35e-15 1.0000
standard.calorie 9.582e-01 1.226e-01 7.817 7.4e-10 \*\*\*
standard.milk -2.070e-01 1.226e-01 -1.689 0.0983 .
Signif. codes: 0 `\*\*\*' 0.001 `\*\*' 0.01 `\*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.5809 on 44 degrees of freedom
Multiple R-Squared: 0.6772, Adjusted R-squared: 0.6626
F-statistic: 46.16 on 2 and 44 DF, p-value: 1.568e-11
3次元散布図を描くには scatterplot3d パッケージをインストールしておく。
> library(scatterplot3d) > attach(Cancer) > scatterplot3d(calorie,liquor,rectal.cancer, angle=24)
xgobiを使うと3次元散布図を立体的に見ることができる。事前に xgobi パッケージをインストールしておく。
> library(xgobi) > xgobi(cbind(calorie,liquor,rectal.cancer))
R言語では factanal パッケージを用いる。
> library(stats)
> factanal(cbind(calorie,milk,meat,liquor,rectal.cancer,colon.cancer), factors = 2)
Call:
factanal(x = cbind(calorie, milk, meat, liquor, rectal.cancer, colon.cancer), factors = 2)
Uniquenesses:
calorie milk meat liquor rectal.cancer
0.180 0.472 0.005 0.477 0.186
colon.cancer
0.196
Loadings:
Factor1 Factor2
calorie 0.567 0.706
milk 0.664 0.294
meat 0.936 0.344
liquor 0.230 0.685
rectal.cancer 0.425 0.795
colon.cancer 0.710 0.547
Factor1 Factor2
SS loadings 2.378 2.105
Proportion Var 0.396 0.351
Cumulative Var 0.396 0.747
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 21.44 on 4 degrees of freedom.
The p-value is 0.000259
> Bone <- read.table("data/ch17.dat",header=T)
> attach(Bone)
> factanal(cbind(BA1,BA2,BA3,BC1,BC2,BC3,BMD1,BMD2,BMD3,BA,BC), factors=2)
Call:
factanal(x = cbind(BA1, BA2, BA3, BC1, BC2, BC3, BMD1, BMD2, BMD3, BA, BC), factors = 2)
Uniquenesses:
BA1 BA2 BA3 BC1 BC2 BC3 BMD1 BMD2 BMD3 BA BC
0.007 0.051 0.121 0.005 0.030 0.094 0.005 0.005 0.029 0.011 0.010
Loadings:
Factor1 Factor2
BA1 0.996
BA2 0.973
BA3 0.932
BC1 0.763 0.644
BC2 0.753 0.634
BC3 0.680 0.666
BMD1 0.997
BMD2 0.997
BMD3 0.983
BA 0.994
BC 0.766 0.635
Factor1 Factor2
SS loadings 6.003 4.632
Proportion Var 0.546 0.421
Cumulative Var 0.546 0.967
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 777.43 on 34 degrees of freedom.
The p-value is 2.07e-141
| 構造変数 | 構造変数 | 誤差変数 | 誤差変数 | |
|---|---|---|---|---|
| 内生変数 | 外生変数 | 内生変数 | 外生変数 | |
| 観測変数 | x | (x)* | - | - |
| 潜在変数 | eta | xi | - | epsilon,zeta |
| 測定方程式 | 構造方程式 | |
|---|---|---|
| 外生変数からの因果係数 | λ | γ |
| 内生変数からの因果係数 | κ | β |
R言語では semパッケージを用いる。
digraph "cancer.path" {
rankdir=LR;
size="8,8";
node [fontname="Helvetica" fontsize=14 shape=box];
edge [fontname="Helvetica" fontsize=10];
center=1;
{rank=min "RIQ" "RSES" "RParAsp" "FParAsp" "FSES" "FIQ"}
{rank=max "ROccAsp" "REdAsp" "FEdAsp" "FOccAsp"}
"Western.food" [shape=ellipse]
"Cancer.trait" [shape=ellipse]
"Western.food" -> "calorie" [label="lam11"];
"Western.food" -> "meat" [label="lam21"];
"Western.food" -> "milk" [label="lam31"];
"Western.food" -> "liquor" [label="gam41"];
"RSES" -> "RGenAsp" [label="gam13"];
"FSES" -> "RGenAsp" [label="gam14"];
"RSES" -> "FGenAsp" [label="gam23"];
"FSES" -> "FGenAsp" [label="gam24"];
"FIQ" -> "FGenAsp" [label="gam25"];
"FParAsp" -> "FGenAsp" [label="gam26"];
"FGenAsp" -> "RGenAsp" [label="beta12"];
"RGenAsp" -> "FGenAsp" [label="beta21"];
"Western.food" -> "Cancer.trait" [label=""];
}
% dot -Tpng cancer-path.dot > cancer-path.png % eeyes sem.dhp.png
> data(Cancer) > attach(Cancer) > library(sem)
> cor.cancer
calorie meat milk liquor colon.cancer
calorie 1.0000000 0.7730921 0.7154085 0.6341641 0.7454581
meat 0.7730921 1.0000000 0.7227997 0.4515933 0.8541583
milk 0.7154085 0.7227997 1.0000000 0.2321334 0.5780637
liquor 0.6341641 0.4515933 0.2321334 1.0000000 0.5898522
colon.cancer 0.7454581 0.8541583 0.5780637 0.5898522 1.0000000
rectal.cancer 0.8101346 0.6718607 0.4785180 0.6095649 0.7553246
rectal.cancer
calorie 0.8101346
meat 0.6718607
milk 0.4785180
liquor 0.6095649
colon.cancer 0.7553246
rectal.cancer 1.0000000
cor.half.cancer <- matrix(c(
1.0000000, 0, 0, 0, 0, 0,
0.7730921, 1.0000000, 0, 0, 0, 0,
0.7154085, 0.7227997, 1.0000000, 0, 0, 0,
0.6341641, 0.4515933, 0.2321334, 1.0000000, 0, 0,
0.7454581, 0.8541583, 0.5780637, 0.5898522, 1.0000000, 0,
0.8101346, 0.6718607, 0.4785180, 0.6095649, 0.7553246, 1.0000000), ncol=6, byrow=T)
> rownames(cor.half.cancer) <- colnames(cor.half.cancer) <- c('calorie', 'meat', 'milk', 'liquor',
'colon.cancer', 'rectal.cancer')
> ram.cancer <- matrix(c( 'Western.food -> calorie', 'lam11', NA, 'Western.food -> meat', 'lam21', NA, 'Western.food -> milk', 'lam31', NA, 'Western.food -> liquor', 'lam41', NA, 'Western.food -> Cancer.trait', 'beta1', NA, 'Cancer.trait -> rectal.cancer', 'lam12', NA, 'Cancer.trait -> colon.cancer', 'lam22', NA), ncol=3, byrow=T) > ram.cancer <- matrix(c( 'calorie -> Western.food', 'lam11', NA, 'meat -> Western.food', 'lam21', NA, 'milk -> Western.food', 'lam31', NA, 'liquor -> Western.food', 'lam41', NA, 'Western.food -> Cancer.trait', 'beta1', NA, 'Cancer.trait -> rectal.cancer', 'lam12', NA, 'Cancer.trait -> colon.cancer', 'lam22', NA), ncol=3, byrow=T)
> sem.cancer <- sem(ram.cancer, cor.half.cancer, N=47, fixed.x=c('rectal.cancer', 'colon.cancer'))
> path.diagram(sem.cancer, min.rank='RIQ, RSES, RParAsp, FParAsp, FSES, FIQ', max.rank='ROccAsp, REdAsp, FEdAsp, FOccAsp')
model.cancer<-matrix(c( 'Cancer.trait -> rectal.cancer' ,'lam12' ,NA, 'Cancer.trait -> colon.cancer' ,'lam22' ,NA, 'Western.food -> calorie' ,'lam11' ,NA, 'Western.food -> meat' ,'lam21' ,NA, 'Western.food -> milk' ,'lam31' ,NA, 'Western.food -> liquor' ,'lam41' ,NA, 'Western.food -> Cancer.trait' ,'beta' ,NA, ),ncol=3,byrow=T)
| year | c | p | wp | i | k.lag | x | wg | g | t |
|---|---|---|---|---|---|---|---|---|---|
| 年代 | 消費 | 個人収益 | 個人賃金 | 投資 | 資本? | 均衡需要? | 政府収益 | ? | 税金 |
> library(sem) > data(Klein) > attach(Klein)
> p.lag <- c(NA, p[-length(p)]) # データを見ればわかるが、p欄の値を1年ずつずらしている。 > x.lag <- c(NA, x[-length(p)]) > a <- year - 1931 > cbind(year, a, p, p.lag, x, x.lag)
> eqn.1 <- tsls(c ~ p + p.lag + I(wp + wg), instruments= ~ g + t + wg + a + p.lag + k.lag + x.lag) > summary(eqn.1) > eqn.2 <- tsls(i ~ p + p.lag + k.lag, instruments= ~ g + t + wg + a + p.lag + k.lag + x.lag) > summary(eqn.2) > eqn.3 <- tsls(wp ~ x + x.lag + a, instruments= ~ g + t + wg + a + p.lag + k.lag + x.lag) > summary(eqn.3)
# The Duncan, Haller, and Portes Peer-Influences Model
> R.DHP <- matrix(c(
1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
.6247, 1, 0, 0, 0, 0, 0, 0, 0, 0,
.3269, .3669, 1, 0, 0, 0, 0, 0, 0, 0,
.4216, .3275, .6404, 1, 0, 0, 0, 0, 0, 0,
.2137, .2742, .1124, .0839, 1, 0, 0, 0, 0, 0,
.4105, .4043, .2903, .2598, .1839, 1, 0, 0, 0, 0,
.3240, .4047, .3054, .2786, .0489, .2220, 1, 0, 0, 0,
.2930, .2407, .4105, .3607, .0186, .1861, .2707, 1, 0, 0,
.2995, .2863, .5191, .5007, .0782, .3355, .2302, .2950, 1, 0,
.0760, .0702, .2784, .1988, .1147, .1021, .0931, -.0438, .2087, 1
), ncol=10, byrow=TRUE)
> model.dhp <- matrix(c(
'RParAsp -> RGenAsp', 'gam11', NA,
'RIQ -> RGenAsp', 'gam12', NA,
'RSES -> RGenAsp', 'gam13', NA,
'FSES -> RGenAsp', 'gam14', NA,
'RSES -> FGenAsp', 'gam23', NA,
'FSES -> FGenAsp', 'gam24', NA,
'FIQ -> FGenAsp', 'gam25', NA,
'FParAsp -> FGenAsp', 'gam26', NA,
'FGenAsp -> RGenAsp', 'beta12', NA,
'RGenAsp -> FGenAsp', 'beta21', NA,
'RGenAsp -> ROccAsp', NA, 1,
'RGenAsp -> REdAsp', 'lam21', NA,
'FGenAsp -> FOccAsp', NA, 1,
'FGenAsp -> FEdAsp', 'lam42', NA,
'RGenAsp <-> RGenAsp', 'ps11', NA,
'FGenAsp <-> FGenAsp', 'ps22', NA,
'RGenAsp <-> FGenAsp', 'ps12', NA,
'ROccAsp <-> ROccAsp', 'theta1', NA,
'REdAsp <-> REdAsp', 'theta2', NA,
'FOccAsp <-> FOccAsp', 'theta3', NA,
'FEdAsp <-> FEdAsp', 'theta4', NA),
ncol=3, byrow=TRUE)
> rownames(R.DHP) <- colnames(R.DHP) <- c('ROccAsp', 'REdAsp', 'FOccAsp', 'FEdAsp', 'RParAsp', 'RIQ', 'RSES', 'FSES', 'FIQ', 'FParAsp')
> sem.dhp <- sem(model.dhp, R.DHP, 329, fixed.x=c('RParAsp', 'RIQ', 'RSES', 'FSES', 'FIQ', 'FParAsp'))
> path.diagram(sem.dhp, min.rank='RIQ, RSES, RParAsp, FParAsp, FSES, FIQ', max.rank='ROccAsp, REdAsp, FEdAsp, FOccAsp')
digraph "sem.dhp" {
rankdir=LR;
size="8,8";
node [fontname="Helvetica" fontsize=14 shape=box];
edge [fontname="Helvetica" fontsize=10];
center=1;
{rank=min "RIQ" "RSES" "RParAsp" "FParAsp" "FSES" "FIQ"}
{rank=max "ROccAsp" "REdAsp" "FEdAsp" "FOccAsp"}
"RGenAsp" [shape=ellipse]
"FGenAsp" [shape=ellipse]
"RParAsp" -> "RGenAsp" [label="gam11"];
"RIQ" -> "RGenAsp" [label="gam12"];
"RSES" -> "RGenAsp" [label="gam13"];
"FSES" -> "RGenAsp" [label="gam14"];
"RSES" -> "FGenAsp" [label="gam23"];
"FSES" -> "FGenAsp" [label="gam24"];
"FIQ" -> "FGenAsp" [label="gam25"];
"FParAsp" -> "FGenAsp" [label="gam26"];
"FGenAsp" -> "RGenAsp" [label="beta12"];
"RGenAsp" -> "FGenAsp" [label="beta21"];
"RGenAsp" -> "ROccAsp" [label=""];
"RGenAsp" -> "REdAsp" [label="lam21"];
"FGenAsp" -> "FOccAsp" [label=""];
"FGenAsp" -> "FEdAsp" [label="lam42"];
}
% dot -Tpng sem.dhp.dot > sem.dhp.png % eeyes sem.dhp.png
nnetライブラリのmultinomを用いるのが簡便である。
Nを項目の数、rを項目間の相関係数の平均とすると次のように計算される。
> data(cronbach) > N <- length(cronbach)
write.table 関数を用いる。
> cov.matrix <- var(matrix.sample) > write.table(cov.matrix, file = "/tmp/covmat.txt")
"X1" "X2" "X3" "X4" "1" 3.98684210526316 2.68421052631579 2.40789473684211 2.75 "2" 2.68421052631579 2.77894736842105 1.88421052631579 2.17894736842105 "3" 2.40789473684211 1.88421052631579 2.34473684210526 1.92894736842105 "4" 2.75 2.17894736842105 1.92894736842105 3.10263157894737
> write.table(cov.matrix, file = "/tmp/covmat.txt", quote = FALSE)
X1 X2 X3 X4 1 3.98684210526316 2.68421052631579 2.40789473684211 2.75 2 2.68421052631579 2.77894736842105 1.88421052631579 2.17894736842105 3 2.40789473684211 1.88421052631579 2.34473684210526 1.92894736842105 4 2.75 2.17894736842105 1.92894736842105 3.10263157894737