本文讲解coplot的使用, 这个函数可以用于绘制多元图, 例如按省份调查身高和体重.
省份是factor, z
身高是x
体重是y
对应用法 coplot(y ~ x | z)
数据例如 :
z <- factor(rep('浙江', 6), rep('江西', 6), rep('北京', 6), rep('广州', 5) )
表示浙江,江西,北京有6组数据, 广州有5组数据. 一共23组数据
对应数据为
y <- c(55, 65, 73, 63, 58, 66, ...........)
x <- c(160, 165, 170, 171, 158, 167, ......)
xy的前6个point对应浙江的数据, xy的接下来的6个point对应江西的采样数据, 以此类推.
绘制4幅图.
测试 :
绘制三幅图. 每幅图对应的点数和factor对应的level内的元素个数一致.
每幅图的点数, 对应factor的每个段的元素个数, 这里是abc各3,5,2个点.
> set.seed(1 ); x <- sample(c(1:50),10)
> set.seed(2); y <- sample(c(1:50),10)
> f <- as.factor(c(rep('a',3),rep('b',5),rep('c',2)))
> x
[1] 14 19 28 43 10 41 42 29 27 3
> y
[1] 10 35 28 8 44 43 6 36 20 23
> f
[1] a a a b b b b b c c
Levels: a b c
> coplot(y ~ x | f)
图的顺序:
y是因变量, 把上图顺时针旋转90度后, 数据分段描述顺序为 左上, 左下, 右上.
如果不旋转来看就是从下往上, 从左往右.
因此 :
下左图对应数据 :
a,a,a
10,14; 35,19; 28,28;
下右对应数据 :
b,b,b,b,b
8,43; 44,10; 43,41; 6,42; 36,29;
上左对应数据 :
c,c
20,27; 23,3;
当factor有4个水平(level)时, 绘制4幅图 :
每幅图的点数, 对应factor的每个段的元素个数, 这里是abcd各3,5,1,1个.
> f <- as.factor(c(rep('a',3),rep('b',5),rep('c',1),rep('d',1)))
> f
[1] a a a b b b b b c d
Levels: a b c d
> coplot(y ~ x | f)
图的顺序:
y是因变量, 把上图顺时针旋转90度后, 数据描述顺序为 左上, 左下, 右上, 右下.
如果f是一个向量的话, coplot对这个向量自动分段, 然后绘制x,y的二元图.
按照每个分段内的元素个数, 提取对应个数的xy数据点.
例如 :
> x <- 1:12
> y <- 100:111
> z <- 300:311
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> y
[1] 100 101 102 103 104 105 106 107 108 109 110 111
> z
[1] 300 301 302 303 304 305 306 307 308 309 310 311
> coplot(y ~ x | z)
自动将z分成6段, 注意分段有重叠部分, 从z分段来看包含几个点.
第1段包含3个值, 300,301,302, 对应xy的第1,2,3组值.
第2段包含3个值, 302,303,304, 对应xy的第3,4,5组值.
第3段包含4个值, 303,304,305,306, 对应xy的第4,5,6,7组值.
第4段包含4个值, 305,306,307,308, 对应xy的第6,7,8,9组值.
第5段包含3个值, 307,308,309, 对应xy的第8,9,10组值.
第6段包含3个值, 309,310,311, 对应xy的第10,11,12组值.
相对pairs()函数只能显示双向关系,coplot()函数能够说明三向甚至四向关系,它特别于适合观察当给定其他预测变量时,反应变量如何根据一个预测变量变化。
## 单个条件变量的条件图
> mtcars$wt
[1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
[13] 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840
[25] 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780
> mtcars$mpg
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
> as.factor(mtcars$cyl)
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Levels: 4 6 8
coplot(mtcars$wt ~ mtcars$mpg | as.factor(mtcars$cyl), main="", xlab="", ylab="", pch=19)
## 两个条件变量的条件图
> as.factor(mtcars$vs)
[1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1
Levels: 0 1
coplot(mtcars$wt ~ mtcars$mpg | as.factor(mtcars$cyl) * as.factor(mtcars$vs), main="", xlab="", ylab="", pch=19)
[参考]
1. help(coplot)
coplot package:graphics R Documentation
Conditioning Plots
Description:
This function produces two variants of the *co*nditioning plots
discussed in the reference below.
Usage:
coplot(formula, data, given.values, panel = points, rows, columns,
show.given = TRUE, col = par("fg"), pch = par("pch"),
bar.bg = c(num = gray(0.8), fac = gray(0.95)),
xlab = c(x.name, paste("Given :", a.name)),
ylab = c(y.name, paste("Given :", b.name)),
subscripts = FALSE,
axlabels = function(f) abbreviate(levels(f)),
number = 6, overlap = 0.5, xlim, ylim, ...)
co.intervals(x, number = 6, overlap = 0.5)
Arguments:
formula: a formula describing the form of conditioning plot. A
formula of the form ‘y ~ x | a’ indicates that plots of ‘y’
versus ‘x’ should be produced conditional on the variable
‘a’. A formula of the form ‘y ~ x| a * b’ indicates that
plots of ‘y’ versus ‘x’ should be produced conditional on the
two variables ‘a’ and ‘b’.
All three or four variables may be either numeric or factors.
When ‘x’ or ‘y’ are factors, the result is almost as if
‘as.numeric()’ was applied, whereas for factor ‘a’ or ‘b’,
the conditioning (and its graphics if ‘show.given’ is true)
are adapted.
data: a data frame containing values for any variables in the
formula. By default the environment where ‘coplot’ was
called from is used.
given.values: a value or list of two values which determine how the
conditioning on ‘a’ and ‘b’ is to take place.
When there is no ‘b’ (i.e., conditioning only on ‘a’),
usually this is a matrix with two columns each row of which
gives an interval, to be conditioned on, but is can also be a
single vector of numbers or a set of factor levels (if the
variable being conditioned on is a factor). In this case (no
‘b’), the result of ‘co.intervals’ can be used directly as
‘given.values’ argument.
panel: a ‘function(x, y, col, pch, ...)’ which gives the action to
be carried out in each panel of the display. The default is
‘points’.
rows: the panels of the plot are laid out in a ‘rows’ by ‘columns’
array. ‘rows’ gives the number of rows in the array.
columns: the number of columns in the panel layout array.
show.given: logical (possibly of length 2 for 2 conditioning
variables): should conditioning plots be shown for the
corresponding conditioning variables (default ‘TRUE’).
col: a vector of colors to be used to plot the points. If too
short, the values are recycled.
pch: a vector of plotting symbols or characters. If too short,
the values are recycled.
bar.bg: a named vector with components ‘"num"’ and ‘"fac"’ giving the
background colors for the (shingle) bars, for *num*eric and
*fac*tor conditioning variables respectively.
xlab: character; labels to use for the x axis and the first
conditioning variable. If only one label is given, it is
used for the x axis and the default label is used for the
conditioning variable.
ylab: character; labels to use for the y axis and any second
conditioning variable.
subscripts: logical: if true the panel function is given an additional
(third) argument ‘subscripts’ giving the subscripts of the
data passed to that panel.
axlabels: function for creating axis (tick) labels when x or y are
factors.
number: integer; the number of conditioning intervals, for a and b,
possibly of length 2. It is only used if the corresponding
conditioning variable is not a ‘factor’.
overlap: numeric < 1; the fraction of overlap of the conditioning
variables, possibly of length 2 for x and y direction. When
overlap < 0, there will be _gaps_ between the data slices.
xlim: the range for the x axis.
ylim: the range for the y axis.
...: additional arguments to the panel function.
x: a numeric vector.
Details:
In the case of a single conditioning variable ‘a’, when both
‘rows’ and ‘columns’ are unspecified, a ‘close to square’ layout
is chosen with ‘columns >= rows’.
In the case of multiple ‘rows’, the _order_ of the panel plots is
from the bottom and from the left (corresponding to increasing
‘a’, typically).
A panel function should not attempt to start a new plot, but just
plot within a given coordinate system: thus ‘plot’ and ‘boxplot’
are not panel functions.
The rendering of arguments ‘xlab’ and ‘ylab’ is not controlled by
‘par’ arguments ‘cex.lab’ and ‘font.lab’ even though they are
plotted by ‘mtext’ rather than ‘title’.
Value:
‘co.intervals(., number, .)’ returns a (‘number’ x 2) ‘matrix’,
say ‘ci’, where ‘ci[k,]’ is the ‘range’ of ‘x’ values for the
‘k’-th interval.
References:
Chambers, J. M. (1992) _Data for models._ Chapter 3 of
_Statistical Models in S_ eds J. M. Chambers and T. J. Hastie,
Wadsworth & Brooks/Cole.
Cleveland, W. S. (1993) _Visualizing Data._ New Jersey: Summit
Press.
See Also:
‘pairs’, ‘panel.smooth’, ‘points’.
Examples:
## Tonga Trench Earthquakes
coplot(lat ~ long | depth, data = quakes)
given.depth <- co.intervals(quakes$depth, number = 4, overlap = .1)
coplot(lat ~ long | depth, data = quakes, given.v = given.depth, rows = 1)
## Conditioning on 2 variables:
ll.dm <- lat ~ long | depth * mag
coplot(ll.dm, data = quakes)
coplot(ll.dm, data = quakes, number = c(4, 7), show.given = c(TRUE, FALSE))
coplot(ll.dm, data = quakes, number = c(3, 7),
overlap = c(-.5, .1)) # negative overlap DROPS values
## given two factors
Index <- seq(length = nrow(warpbreaks)) # to get nicer default labels
coplot(breaks ~ Index | wool * tension, data = warpbreaks,
show.given = 0:1)
coplot(breaks ~ Index | wool * tension, data = warpbreaks,
col = "red", bg = "pink", pch = 21,
bar.bg = c(fac = "light blue"))
## Example with empty panels:
with(data.frame(state.x77), {
coplot(Life.Exp ~ Income | Illiteracy * state.region, number = 3,
panel = function(x, y, ...) panel.smooth(x, y, span = .8, ...))
## y ~ factor -- not really sensible, but 'show off':
coplot(Life.Exp ~ state.region | Income * state.division,
panel = panel.smooth)
})