I honestly have no book on R programming. In fact I have not a single book on programming at all (my coding proves that ;x). I am pretty sure that I am gonna order (just did!) that book.

You can get a look of Matloff’s text here (= pdf for ya)
Posts tagged with code
I honestly have no book on R programming. In fact I have not a single book on programming at all (my coding proves that ;x). I am pretty sure that I am gonna order (just did!) that book.

You can get a look of Matloff’s text here (= pdf for ya)
From the book website :
IPSUR stands for Introduction to Probability and Statistics Using R,
ISBN: 978-0-557-24979-4, which is a textbook written for an
undergraduate course in probability and statistics. The approximate
prerequisites are two or three semesters of calculus and some linear
algebra in a few places. Attendees of the class include mathematics,
engineering, and computer science majors.
Now, there is a new way to read R books (anyway, new to me!)
1 2 3 | install.packages("IPSUR", repos="http://cran.r-project.org") library(IPSUR) read(IPSUR) |
Recently I was in need of testing a mean vector. I wrote a few lines of code in R and had it done perfectly. Hotelling test is one of the least interesting test to me. never really figured out why…
At that time I had some time to search more about it. One of the most common things to search for a test is a robust version of it (at least that’s what I search for!). A little search in the 3rd page of google results leads to the following :
One-sample and two-sample robust Hotelling tests with fast and robust bootstrap
The classical Hotelling test for testing if the mean equals a certain value or if two means are equal is modified into a robust one through substitution of the empirical estimates by the MM-estimates of location and scatter. The MM-estimator, using Tukey’s biweight function, is tuned by default to have a breakdown point of 50% and 95% location efficiency. This could be changed through the control argument if desired.
Robust Hotelling T2 test
Performs one and two sample Hotelling T2 tests as well as robust one-sample Hotelling T2 test.
The first uses MM and S estimators while the latter a Minimum Covariance Determinant one. You can get info on those on the links in the end of the post. What might be crucial to you is that MM/S estimators would be more time comsuming compared to MCD. A little demonstation is the following.. Read the rest of this entry »
Inspired from a mail that came along the previous random generation post the following question rised :
How to draw random variates from the Von Mises distribution?
First of all let’s check the pdf of the probability rule, it is $$ f(x):=\frac{e^{b \text{Cos}[y-a]}}{2 \pi \text{BesselI}[0,b]}$$, for $$-\pi \leq x\leq \pi $$.
Ok, I admit that Bessels functions can be a bit frightening, but there is a work around we can do. The solution is a Metropolis algorithm simulation. It is not necessary to know the normalizing constant, because it will cancel in the computation of the ratio. The following code is adapted from James Gentle’s notes on Mathematical Statistics .
n <- 1000
x <- rep(NA,n)
a <-1
c <-3
yi <-3
j <-0
i<-2
while (i < n) {
i<-i+1
yip1 <- yi + 2*a*runif(1)- 1
if (yip1 < pi & yip1 > - pi) {
if (exp(c*(cos(yip1)-cos(yi))) > runif(1)) yi <- yip1
else yi <- x[i-1]
x[i] <- yip1
}
}
hist(x,probability=TRUE,fg = gray(0.7), bty="7")
lines(density(x,na.rm=TRUE),col="red",lwd=2)
There was a post here about obtaining non-standard p-values for testing the correlation coefficient. The R-library
SuppDists
deals with this problem efficiently.
library(SuppDists)
plot(function(x)dPearson(x,N=23,rho=0.7),-1,1,ylim=c(0,10),ylab="density")
plot(function(x)dPearson(x,N=23,rho=0),-1,1,add=TRUE,col="steelblue")
plot(function(x)dPearson(x,N=23,rho=-.2),-1,1,add=TRUE,col="green")
plot(function(x)dPearson(x,N=23,rho=.9),-1,1,add=TRUE,col="red");grid()
legend("topleft", col=c("black","steelblue","red","green"),lty=1,
legend=c("rho=0.7","rho=0","rho=-.2","rho=.9"))</pre>
This is how it looks like,

Now, let’s construct a table of critical values for some arbitrary or not significance levels.
q=c(.025,.05,.075,.1,.15,.2)
xtabs(qPearson(p=q, N=23, rho = 0, lower.tail = FALSE, log.p = FALSE) ~ q )
# q
# 0.025 0.05 0.075 0.1 0.15 0.2
# 0.4130710 0.3514298 0.3099236 0.2773518 0.2258566 0.1842217
We can calculate p-values as usual too…
1-pPearson(.41307,N=23,rho=0)
# [1] 0.0250003
It’s π-day today so we gonna have a little fun today with Buffon’s needle and of course R. A well known approximation to the value of $latex \pi$ is the experiment tha Buffon performed using a needle of length,$latex l$. What I do in the next is only to copy from the following file the function estPi and to use an ergodic sample plot… Lame,huh?
estPi<- function(n, l=1, t=2) {
m <- 0
for (i in 1:n) {
x <- runif(1)
theta <- runif(1, min=0, max=pi/2)
if (x < l/2 * sin(theta)) {
m <- m +1
}
}
return(2*l*n/(t*m))
}
So, an estimate would be…
Read the rest of this entry »
If you’re (and you should) interested in principal components then take a good look at this. The linked post will take you by hand to do everything from scratch. If you’re not in the mood then the dollowing R functions will help you.
An example.
# Generates sample matrix of five discrete clusters that have
# very different mean and standard deviation values.
z1 <- rnorm(10000, mean=1, sd=1);
z2 <- rnorm(10000, mean=3, sd=3);
z3 <- rnorm(10000, mean=5, sd=5);
z4 <- rnorm(10000, mean=7, sd=7);
z5 <- rnorm(10000, mean=9, sd=9);
mydata <- matrix(c(z1, z2, z3, z4, z5), 2500, 20, byrow=T,
dimnames=list(paste("R", 1:2500, sep=""), paste("C", 1:20, sep="")))
# Performs principal component analysis after scaling the data.
# It returns a list with class "prcomp" that contains five components:
# (1) the standard deviations (sdev) of the principal components,
# (2) the matrix of eigenvectors (rotation),
# (3) the principal component data (x),
# (4) the centering (center) and
# (5) scaling (scale) used.
pca <- prcomp(mydata, scale=T)
Read the rest of this entry »
A common model for counts data is the Poisson. There are cases however that we only record positive counts, ie there is a truncation of 0. This is the truncated Poisson model.
To study this model we only need the total counts and the sample size. This comes from the sufficient statistic principle as the likelihood is
$$ logL=-n-\frac{e^{-\lambda } n}{1-e^{-\lambda }}+\frac{T}{\lambda }$$,
where $$ T=\sum _X$$. Let’s set $$ T=160$$ and $$ n=50$$.
sum.x=160
n=50
library(maxLik)
loglik <- function(theta, n, sum.x) {
- n* log(exp(theta) - 1) + sum.x * log(theta)
}
The gradient is
$$ \frac{e^{-2 \lambda } n}{\left(1-e^{-\lambda }\right)^2}+\frac{e^{-\lambda } n}{1-e^{-\lambda }}-\frac{T}{\lambda ^2}$$.
Didn’t know this…
a<-structure(c(25,34,12,5),.Names=c("0","2","4","7+"))
> data
0 2 4 7+
25 34 12 5
It’s becoming clear that I have learned R in the most unstructured way…I always do it in two stages :ashamed:
> data<-c(25,34,12,5)
> names(data)<-c("0","2","4","7+")
> data
0 2 4 7+
25 34 12 5
It’s really useful to wrap it all in a single function.
Attribute Specification Description: ‘structure’ returns the given object with further attributes set. Usage: structure(.Data, ...) Arguments: .Data: an object which will have various attributes attached to it. ...: attributes, specified in ‘tag=value’ form, which will be attached to data. Details: Adding a class ‘"factor"’ will ensure that numeric codes are given integer storage mode. For historical reasons (these names are used when deparsing), attributes ‘".Dim"’, ‘".Dimnames"’, ‘".Names"’, ‘".Tsp"’ and ‘".Label"’ are renamed to ‘"dim"’, ‘"dimnames"’, ‘"names"’, ‘"tsp"’ and ‘"levels"’. It is possible to give the same tag more than once, in which case the last value assigned wins. As with other ways of assigning attributes, using ‘tag=NULL’ removes attribute ‘tag’ from ‘.Data’ if it is present.
Christmas coming & time for fun! Wolfram’s demonstrations give you a sense of the holiday season along some nice demonstrations. I particulary like the “Ornamental Holiday Decoration”
Manipulate[
Module[{level0, level1, level2},
level0 = C[spikey, 0];
level1 =
Flatten[daughterPolyhedra[C[spikey, 0], {d1, ω1, s1}, ρ s1]];
level2 =
Flatten[daughterPolyhedra[#, {d2, ω2, s2}, ρ s1 s2] & /@
Cases[level1, _C]];
Graphics3D[{EdgeForm[],
{Red, egc[level0]}, gg1 = {col1, egc[level1]},
gg2 = {col2, ControlActive[{}, egc[level2]]},
Directive[GrayLevel[0.2], Specularity[colc, 12]],
ecc[{level1, level2}]}, Boxed -> False,
ImageSize -> {400, 400}]],
"layer 1:",
{{d1, 1.5, "distance"}, -3, 3, ImageSize -> Tiny},
{{ω1, 0, "rotation"}, -Pi, Pi, ImageSize -> Tiny},
{{s1, 1/2, "size"}, 0, 1, ImageSize -> Tiny},
{{col1, Yellow, "color"}, Blue, ControlType -> None},
Delimiter,
"layer 2:",
{{d2, 1, "distance"}, -3, 3, ImageSize -> Tiny},
{{ω2, 0, "rotation"}, -Pi, Pi, ImageSize -> Tiny},
{{s2, 1/2, "size"}, 0, 1, ImageSize -> Tiny},
{{col2, Green, "color"}, Green, ControlType -> None},
Delimiter,
"connectors:",
{{ρ, 0.3, "radius"}, 0, 1, ImageSize -> Tiny},
{{colc, Brown, "color"}, Yellow, ControlType -> None},
AutorunSequencing -> {1, 3, 5, 7},
Initialization :> {
spikey =
MapAt[Developer`ToPackedArray,
MapAt[Developer`ToPackedArray, N[PolyhedronData["Spikey"]][[1]],
1], {2, 1}];
mirrorRotateAndShift[gc_GraphicsComplex,
n_, {distance_, angle_, size_}, ρ_] :=
With[{aux =
mirrorRotateAndShiftCF[gc[[1]], gc[[1, n]], distance, angle,
size]}, {C[GraphicsComplex[aux, gc[[2]]], n],
Cylinder[{gc[[1, n]], aux[[n]]}, ρ]}];
mirrorRotateAndShiftCF =
Compile[{{vertices, _Real, 2}, {rPoint, _Real, 1}, distance,
angle, size}, Read the rest of this entry »