About SAS's PROC GLIMMIX
April 14, 2010
I believe some of you have used SAS's GLIMMIX:
data test1;
  set test1;
  prop = percentage / 100;  /* the beta response must lie strictly in (0, 1) */
run;

proc glimmix data=test1;
  class drug;
  model prop = drug age drug*age / dist=beta;  /* logit link is the default for beta */
run;
This PROC is quite powerful; it can do everything I need (except Bayesian analysis).
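For readers on the R side, a rough equivalent can be had with the betareg package (a minimal sketch; the test1 data with its drug, age, and percentage columns is hypothetical):

library(betareg)  # install.packages("betareg") if needed

## The beta response must lie strictly in (0, 1)
test1$prop <- test1$percentage / 100

## Main effects plus interaction; logit link by default
fit <- betareg(prop ~ drug + age + drug:age, data = test1)
summary(fit)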
(to be continued)
If SAS is not your cup of tea
February 28, 2010
SAS is the world's largest statistical software company (although, if you count data mining, sorting, and searching as part of the statistics business, then the largest statistical software company is now called Google). I fully affirm the quality of their products and am highly satisfied with it. But SAS was not originally designed for "statistical computing" people. Judging from the names of its many PROCs, what it aims to offer users is a set of standard operating procedures rather than a set of functions. SAS does have IML, but that is still not the same as a function language that lets you do everything yourself. More important, every statistic it produces needs a statistician to interpret it; for this gift of food and clothing, many statistics practitioners must surely be grateful to SAS from the bottom of their hearts.
My first encounter with SAS was not the least bit romantic. It was a stormy night, and I shared the room with my PC (at that hour the department lab held only me and the person on duty; counted that way, the person on duty was equipment, not people). With my heart going pit-a-pat, I clicked SAS open to see just how powerful this acronym I kept hearing about really was.

Mamma mia, what is this?

A program editor, an object browser, and that was it.

Just trying to generate a normal random sample cost me more than an hour.
But for those who have not yet started working, or who cannot afford SAS, that rice bowl seems within sight yet out of reach. Thanks to the open-source movement, we have R.

By comparison, R suits the taste of those who grew up writing C, C++, or Java, such as yours truly.
(to be continued)
A data-preparation function for R's coxph
December 8, 2009
When using coxph from R's survival package to fit a Cox proportional hazards model with time-dependent explanatory variables,

\[ h_i(t) = h_0(t)\exp[\mathbf X_i(t)\boldsymbol\beta], \]

we find that handling such variables is difficult; the data cannot be fed to coxph directly.
The following R function helps solve this problem:
expand.breakpoints <- function(dataset, index = "patnum", status = "status",
                               tevent = "time", Zvar = F, breakpoints = NULL)
{
  ## Expand so that each individual has a row for each
  ## unique failure time that is <= his/her/its own failure time.
  ##
  ## Original version written by Kathy Russell, for
  ## Kenneth Hess, Department of Biomathematics,
  ## University of Texas M. D. Anderson Cancer Centre.
  ##
  ## Substantially modified by JHM, Nov 23, 1998.
  ##
  ## ERROR checking
  onceonly <- paste("must be a character string matching exactly",
                    "one name in the first parameter, dataset")
  ## Require dataset to be of type data.frame
  if((missing(dataset) || !is.data.frame(dataset)))
    stop("\n Parameter, dataset, must be a data frame")
  varset <- names(dataset)
  covset <- varset
  lvset <- 1:length(varset)
  ## Require dataset to have unique names
  if(any(duplicated(varset))) {
    stop(paste("\n Parameter, dataset, must have uniquely defined",
               "column names"))
  }
  ## Require index to match exactly 1 dataset name
  if(length((indexloc <- lvset[varset == index])) != 1)
    stop(paste("\n Parameter, index,", onceonly))
  covset <- covset[covset != index]
  ## Require status to match exactly 1 dataset name
  if(length((statusloc <- lvset[varset == status])) != 1)
    stop(paste("\n Parameter, status,", onceonly))
  covset <- covset[covset != status]
  ## Require tevent to match exactly 1 dataset name
  if(length((teventloc <- lvset[varset == tevent])) != 1)
    stop(paste("\n Parameter, tevent,", onceonly))
  covset <- covset[covset != tevent]
  ## -----------------------------
  ## Form vector of breakpoints, if necessary:
  ## default to the distinct observed failure times
  if(is.null(breakpoints)) {
    times <- dataset[, tevent]
    breakpoints <- sort(unique(times[dataset[, status] == 1]))
  }
  if((is.null(breakpoints)) || (!(is.numeric(breakpoints))) ||
     (!(is.vector(breakpoints))) || (length(breakpoints) < 1))
    stop(paste("\n Parameter, breakpoints, must be a numeric vector",
               "with at least 1 element"))
  ##*****************
  ## Begin
  ##*****************
  n <- nrow(dataset)
  ## Interval endpoints; the final stop time is a placeholder, since each
  ## subject's last epoch is overwritten with his own event time below
  temp.stop <- c(breakpoints, max(dataset[, tevent]))
  if(min(breakpoints) > 0)
    temp.start <- c(0, breakpoints)
  else temp.start <- c(-1, breakpoints)
  n.break <- length(breakpoints) + 1
  t.event <- dataset[, tevent]
  ## ---------------------------
  ## Begin calculate n.epochs
  n.epochs <- sapply(1:n, function(m, breaks, times)
                            sum(breaks < times[m]) + 1,
                     breaks = breakpoints, times = t.event)
  ## End n.epochs
  id <- rep(1:n, n.epochs)        ## Index into supplied dataset
  last.epoch <- cumsum(n.epochs)
  epoch <- unlist(sapply(n.epochs, function(x) 1:x))
  ## Index into vectors of interval start & stop points
  if(Zvar) {
    Zmat <- diag(rep(1, n.break))[epoch, ]
    Z.name <- paste("Z", seq(1, n.break), sep = "")
  }
  Tstop <- temp.stop[epoch]
  Tstop[last.epoch] <- dataset[, tevent]
  status2 <- rep(0, length(epoch))
  status2[last.epoch] <- dataset[, status]
  new.dataset <- data.frame(dataset[id, index, drop = F],
                            temp.start[epoch], Tstop, status2, epoch,
                            dataset[id, covset, drop = F])
  if(Zvar)
    new.dataset <- data.frame(new.dataset, Zmat)
  nam <- c(index, "Tstart", "Tstop", status, "epoch", covset)
  if(Zvar)
    nam <- c(nam, Z.name)
  dimnames(new.dataset) <- list(1:length(epoch), nam)
  return(new.dataset)
}
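A minimal usage sketch (the toy data below are made up for illustration; the column names match the function's defaults):

library(survival)

## Hypothetical toy data: one row per subject
d <- data.frame(patnum = 1:4,
                time   = c(5, 8, 12, 20),
                status = c(1, 0, 1, 1),
                age    = c(50, 61, 47, 55))

## Split each subject's follow-up at the chosen breakpoints
d.long <- expand.breakpoints(d, index = "patnum", status = "status",
                             tevent = "time", breakpoints = c(6, 10))

## Counting-process form: covariate values may now differ across epochs
fit <- coxph(Surv(Tstart, Tstop, status) ~ age, data = d.long)

After the expansion, a time-dependent covariate is just a column whose value you update within each (Tstart, Tstop] interval before calling coxph.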
In SAS, by contrast, this is very simple; see the documentation for phreg.
Read ERIM Data: from SAS to R
November 8, 2009
After you have obtained the data from ERIM, you can use the following code to export it from SAS's .sas7bdat files to CSV files, which are handier for R users:
The SAS code:
libname erim 'c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\saslib';

filename erim_ke 'c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\saslib\erim_ke';
filename erim_ma 'c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\saslib\erim_ma';
filename erim_pb 'c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\saslib\erim_pb';
filename erim_su 'c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\saslib\erim_su';
filename erim_ti 'c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\saslib\erim_ti';
filename erim_tu 'c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\saslib\erim_tu';

proc cimport library=erim infile=erim_ke; run;
proc cimport library=erim infile=erim_ma; run;
proc cimport library=erim infile=erim_pb; run;
proc cimport library=erim infile=erim_su; run;
proc cimport library=erim infile=erim_ti; run;
proc cimport library=erim infile=erim_tu; run;

PROC EXPORT DATA=erim.kchp_hp1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Ketchup\kchp_hp1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.kchp_hp2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Ketchup\kchp_hp2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.kchp_pf1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Ketchup\kchp_pf1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.kchp_pf2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Ketchup\kchp_pf2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.kchp_upc OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Ketchup\kchp_upc.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.marg_hp1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Margarine\marg_hp1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.marg_hp2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Margarine\marg_hp2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.stk_pf1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Margarine\stk_pf1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.stk_pf2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Margarine\stk_pf2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tub_pf1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Margarine\tub_pf1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tub_pf2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Margarine\tub_pf2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.marg_upc OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Margarine\marg_upc.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.pbtr_hp1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Peanut Butter\pbtr_hp1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.pbtr_hp2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Peanut Butter\pbtr_hp2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.pbtr_pf1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Peanut Butter\pbtr_pf1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.pbtr_pf2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Peanut Butter\pbtr_pf2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.pbtr_upc OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Peanut Butter\pbtr_upc.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.sug_hp1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Sugar\sug_hp1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.sug_hp2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Sugar\sug_hp2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.sug_pf1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Sugar\sug_pf1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.sug_pf2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Sugar\sug_pf2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.sug_upc OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Sugar\sug_upc.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tiss_hp1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Toilet Tissue\tiss_hp1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tiss_hp2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Toilet Tissue\tiss_hp2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tiss_pf1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Toilet Tissue\tiss_pf1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tiss_pf2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Toilet Tissue\tiss_pf2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tiss_upc OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Toilet Tissue\tiss_upc.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tuna_hp1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Canned Tuna\tuna_hp1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tuna_hp2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Canned Tuna\tuna_hp2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.lite_pf1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Canned Tuna\lite_pf1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.lite_pf2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Canned Tuna\lite_pf2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.wht_pf1 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Canned Tuna\wht_pf1.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.wht_pf2 OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Canned Tuna\wht_pf2.csv' DBMS=CSV REPLACE; RUN;
PROC EXPORT DATA=erim.tuna_upc OUTFILE='c:\users\gongyi\My Documents\My SAS Files\9.1\Lab\ERIM\Canned Tuna\tuna_upc.csv' DBMS=CSV REPLACE; RUN;
The R code:
dataDirectory <- "../Data/ERIM"
dataDirectory.ketchup <- paste(dataDirectory, "/Ketchup", sep='')
dataDirectory.margarine <- paste(dataDirectory, "/Margarine", sep='')
dataDirectory.peanutButter <- paste(dataDirectory, "/Peanut Butter", sep='')
dataDirectory.sugar <- paste(dataDirectory, "/Sugar", sep='')
dataDirectory.toiletTissue <- paste(dataDirectory, "/Toilet Tissue", sep='')
dataDirectory.cannedTuna <- paste(dataDirectory, "/Canned Tuna", sep='')

kchp.hp1 <- read.csv(paste(dataDirectory.ketchup, "/kchp_hp1.csv", sep=""), header=T)
kchp.hp2 <- read.csv(paste(dataDirectory.ketchup, "/kchp_hp2.csv", sep=""), header=T)
kchp.pf1 <- read.csv(paste(dataDirectory.ketchup, "/kchp_pf1.csv", sep=""), header=T)
kchp.pf2 <- read.csv(paste(dataDirectory.ketchup, "/kchp_pf2.csv", sep=""), header=T)
kchp.upc <- read.csv(paste(dataDirectory.ketchup, "/kchp_upc.csv", sep=""), header=T)
marg.hp1 <- read.csv(paste(dataDirectory.margarine, "/marg_hp1.csv", sep=""), header=T)
marg.hp2 <- read.csv(paste(dataDirectory.margarine, "/marg_hp2.csv", sep=""), header=T)
stk.pf1 <- read.csv(paste(dataDirectory.margarine, "/stk_pf1.csv", sep=""), header=T)
stk.pf2 <- read.csv(paste(dataDirectory.margarine, "/stk_pf2.csv", sep=""), header=T)
tub.pf1 <- read.csv(paste(dataDirectory.margarine, "/tub_pf1.csv", sep=""), header=T)
tub.pf2 <- read.csv(paste(dataDirectory.margarine, "/tub_pf2.csv", sep=""), header=T)
marg.upc <- read.csv(paste(dataDirectory.margarine, "/marg_upc.csv", sep=""), header=T)
pbtr.hp1 <- read.csv(paste(dataDirectory.peanutButter, "/pbtr_hp1.csv", sep=""), header=T)
pbtr.hp2 <- read.csv(paste(dataDirectory.peanutButter, "/pbtr_hp2.csv", sep=""), header=T)
pbtr.pf1 <- read.csv(paste(dataDirectory.peanutButter, "/pbtr_pf1.csv", sep=""), header=T)
pbtr.pf2 <- read.csv(paste(dataDirectory.peanutButter, "/pbtr_pf2.csv", sep=""), header=T)
pbtr.upc <- read.csv(paste(dataDirectory.peanutButter, "/pbtr_upc.csv", sep=""), header=T)
sug.hp1 <- read.csv(paste(dataDirectory.sugar, "/sug_hp1.csv", sep=""), header=T)
sug.hp2 <- read.csv(paste(dataDirectory.sugar, "/sug_hp2.csv", sep=""), header=T)
sug.pf1 <- read.csv(paste(dataDirectory.sugar, "/sug_pf1.csv", sep=""), header=T)
sug.pf2 <- read.csv(paste(dataDirectory.sugar, "/sug_pf2.csv", sep=""), header=T)
sug.upc <- read.csv(paste(dataDirectory.sugar, "/sug_upc.csv", sep=""), header=T)
tiss.hp1 <- read.csv(paste(dataDirectory.toiletTissue, "/tiss_hp1.csv", sep=""), header=T)
tiss.hp2 <- read.csv(paste(dataDirectory.toiletTissue, "/tiss_hp2.csv", sep=""), header=T)
tiss.pf1 <- read.csv(paste(dataDirectory.toiletTissue, "/tiss_pf1.csv", sep=""), header=T)
tiss.pf2 <- read.csv(paste(dataDirectory.toiletTissue, "/tiss_pf2.csv", sep=""), header=T)
tiss.upc <- read.csv(paste(dataDirectory.toiletTissue, "/tiss_upc.csv", sep=""), header=T)
tuna.hp1 <- read.csv(paste(dataDirectory.cannedTuna, "/tuna_hp1.csv", sep=""), header=T)
tuna.hp2 <- read.csv(paste(dataDirectory.cannedTuna, "/tuna_hp2.csv", sep=""), header=T)
wht.pf1 <- read.csv(paste(dataDirectory.cannedTuna, "/wht_pf1.csv", sep=""), header=T)
wht.pf2 <- read.csv(paste(dataDirectory.cannedTuna, "/wht_pf2.csv", sep=""), header=T)
lite.pf1 <- read.csv(paste(dataDirectory.cannedTuna, "/lite_pf1.csv", sep=""), header=T)
lite.pf2 <- read.csv(paste(dataDirectory.cannedTuna, "/lite_pf2.csv", sep=""), header=T)
tuna.upc <- read.csv(paste(dataDirectory.cannedTuna, "/tuna_upc.csv", sep=""), header=T)
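All that repetition invites a loop. A sketch under the same directory layout (read.erim is a hypothetical helper, not part of the original script); it reads the same files as the explicit calls above:

## Read every CSV listed for one category directory into a named list
read.erim <- function(dir, files) {
  out <- lapply(files, function(f)
    read.csv(file.path(dir, paste(f, ".csv", sep = "")), header = TRUE))
  names(out) <- files
  out
}

ketchup <- read.erim(dataDirectory.ketchup,
                     c("kchp_hp1", "kchp_hp2", "kchp_pf1", "kchp_pf2", "kchp_upc"))
## e.g. ketchup$kchp_hp1 is the first data frame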
Equivalent R code for Chen's lecture note
November 8, 2009
R code:
require(survival)
options(contrasts=c('contr.SAS', 'contr.SAS'))
larynx <- read.table("larynx.data", header=T)
larynx$stage <- as.factor(larynx$stage)
for (i in 2:length(unique(larynx$stage))) {
  tmp1 <- !(larynx$stage == i)
  tmp2 <- !(larynx$stage == (i-1))
  larynx <- cbind(larynx, tmp1, tmp2)
  colnames(larynx)[ncol(larynx)-1] <- paste("Z", i-1, sep='')
  colnames(larynx)[ncol(larynx)] <- paste("stage", i-1, sep='')
}
larynx.survReg1 <- survreg(Surv(stime, censor) ~ Z1 + Z2 + Z3 + age,
                           data=larynx, dist="weibull")
larynx.survReg2 <- survreg(Surv(stime, censor) ~ stage + age,
                           data=larynx, dist="weibull")
larynx.survReg3 <- survreg(Surv(stime, censor) ~ stage1 + stage2 + stage3 + age,
                           data=larynx, dist="weibull")
Digest: Crowder (1996)
November 5, 2009
According to Crowder (1996)1, suppose i.i.d. random variables $Y_1,Y_2,\ldots, Y_n$ have the following survival function:
\[S(y) = \exp\left\{\frac{\tau}{\alpha}\left[\kappa^\alpha - (\kappa + H(y))^\alpha\right]\right\}\]
Then, writing $Y_{(1)} = \min\{Y_i\}_{i=1}^n$ and $Y_{(n)} = \max\{Y_i\}_{i=1}^n$, the density of the minimum is

\[ f_{Y_{(1)}}(t) = n\,\tau\,H'(t)\,[\kappa+H(t)]^{\alpha-1}\,[S(t)]^n,\]

and

\[ F_{Y_{(n)}}(t) =[F(t)]^n, \]

\[ S_{Y_{(1)}}(t) =[S(t)]^n.\]
For $\kappa=1$ and $H(t)=\exp(t)-1$, it becomes the extreme value distribution:

\[ S(t) = \exp\left\{\frac{\tau}{\alpha}[1 - \exp(\alpha t)]\right\}\]
For $\kappa=0$ and $H(t)=t$, writing $\theta = \frac{\tau}{\alpha}$, it becomes the Weibull:

\[S(t) = \exp(-\theta t^\alpha)\]
For $\kappa=0$, $\alpha=1$, and $H(t)=t$, it becomes the exponential:

\[S(t) = \exp(-\tau t)\]
As $\alpha\rightarrow 0$ (with $\kappa>0$), it becomes a compound exponential:

\[ \lim_{\alpha\rightarrow 0}S(t) = \left(1 +\frac{H(t)}{\kappa}\right)^{-\tau}\]
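To check this limit, expand $x^\alpha = \exp(\alpha\log x) \approx 1 + \alpha\log x$ for small $\alpha$:

\[ \frac{\tau}{\alpha}\left[\kappa^\alpha - (\kappa+H(t))^\alpha\right] \rightarrow \tau\left[\log\kappa - \log(\kappa+H(t))\right] = -\tau\log\left(1+\frac{H(t)}{\kappa}\right),\]

so that $S(t)\rightarrow \left(1+H(t)/\kappa\right)^{-\tau}$.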
1. Martin Crowder (1996). Some Tests Based on Extreme Values for a Parametric Survival Model. Journal of the Royal Statistical Society, Series B, 58(2), 417-424.
Some Emacs + runtime problems
October 18, 2009
I have been using Emacs for almost ten years now, and the more I use it, the more I appreciate its power.

But powerful as it is, a few things still annoy me, and the longer I use it the more they burn:
- Buffer lock: users of Emacs + ESS + R will know this pain especially well. Every time you run a large program, Emacs gets locked in that buffer; no buffer-switching key works, and you have to wait until the run finishes before you can switch to another buffer.
- Remote execution: sometimes you need to send a job to a cluster to compute, but the moment the ssh connection hiccups, you are stuck dead in that buffer and can do nothing but kill Emacs and restart. EShell's clean does not help at all. Emacsclient was designed for mime, and it is hard to bend into the shape you want; try emacsclient -t -a 'emacs23 -f eshell' and you will see what I mean.
- More to be added as they come to mind.
That is all.
Basic SAS <-> R random number functions
October 4, 2009
RANBIN -> rbinom
RANCAU -> rcauchy
RANEXP -> rexp
RANGAM -> rgamma
RANNOR -> rnorm
RANPOI -> rpois
RANUNI -> runif
A fair and well-reasoned article
September 18, 2009
I am a bit late in writing this one. In this article, Jason Burke, an R&D engineer at SAS (head of life and health sciences R&D), published a piece on R's applications in business and SAS's attitude toward R; it is very fair and balanced. As a daily R user, I find many of Burke's views in it quite apt. It is well worth a read.
random number generation: R vs. SAS
September 18, 2009
In SAS:
data randomSample;
  do _n_ = 1 to 1000;
    ranNum = rannor(-1);
    output;
  end;
run;
In R:
ranSample <- rnorm(1000)
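If reproducibility matters, seed R's generator first (set.seed is base R; the seed value here is arbitrary):

set.seed(20090918)        # any fixed seed reproduces the same draw
ranSample <- rnorm(1000)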