Posts tagged with code

0 comments

Book shoppin’…

I honestly have no book on R programming. In fact I have not a single book on programming at all (my coding proves that ;x). I am pretty sure that I am gonna order (just did!) that book.

You can get a look of Matloff’s text here (= pdf for ya)

From the book website :

IPSUR stands for Introduction to Probability and Statistics Using R,
ISBN: 978-0-557-24979-4, which is a textbook written for an
undergraduate course in probability and statistics. The approximate
prerequisites are two or three semesters of calculus and some linear
algebra in a few places. Attendees of the class include mathematics,
engineering, and computer science majors.

Now, there is a new way to read R books (anyway, new to me!)

1
2
3
install.packages("IPSUR", repos="http://cran.r-project.org")
library(IPSUR)
read(IPSUR)

Recently I was in need of testing a mean vector. I wrote a few lines of code in R and had it done perfectly. Hotelling test is one of the least interesting test to me. never really figured out why…

At that time I had some time to search more about it. One of the most common things to search for a test is a robust version of it (at least that’s what I search for!). A little search in the 3rd page of google results leads to the following :

One-sample and two-sample robust Hotelling tests with fast and robust bootstrap

The classical Hotelling test for testing if the mean equals a certain value or if two means are equal is modified into a robust one through substitution of the empirical estimates by the MM-estimates of location and scatter. The MM-estimator, using Tukey’s biweight function, is tuned by default to have a breakdown point of 50% and 95% location efficiency. This could be changed through the control argument if desired.

Robust Hotelling T2 test

Performs one and two sample Hotelling T2 tests as well as robust one-sample Hotelling T2 test.

The first uses MM and S estimators while the latter a Minimum Covariance Determinant one. You can get info on those on the links in the end of the post. What might be crucial to you is that MM/S estimators would be more time comsuming compared to MCD. A little demonstation is the following.. Read the rest of this entry »

Inspired from a mail that came along the previous random generation post the following question rised :

How to draw random variates from the Von Mises distribution?

First of all let’s check the pdf of the probability rule, it is $$ f(x):=\frac{e^{b \text{Cos}[y-a]}}{2 \pi \text{BesselI}[0,b]}$$, for $$-\pi \leq x\leq \pi $$.

Ok, I admit that Bessels functions can be a bit frightening, but there is a work around we can do. The solution is a Metropolis algorithm simulation. It is not necessary to know the normalizing constant, because it will cancel in the computation of the ratio. The following code is adapted from James Gentle’s notes on Mathematical Statistics .

n <- 1000
x <- rep(NA,n)
a <-1
c <-3
yi <-3
j <-0

i<-2
while (i < n) {
	i<-i+1
	yip1 <- yi + 2*a*runif(1)- 1
	if (yip1 < pi & yip1 > - pi) {
		if (exp(c*(cos(yip1)-cos(yi))) > runif(1)) yi <- yip1
		else yi <- x[i-1]
		x[i] <- yip1
	}
}
hist(x,probability=TRUE,fg = gray(0.7), bty="7")
lines(density(x,na.rm=TRUE),col="red",lwd=2)

There was a post here about obtaining non-standard p-values for testing the correlation coefficient. The R-library

SuppDists

deals with this problem efficiently.

library(SuppDists)

plot(function(x)dPearson(x,N=23,rho=0.7),-1,1,ylim=c(0,10),ylab="density")
plot(function(x)dPearson(x,N=23,rho=0),-1,1,add=TRUE,col="steelblue")
plot(function(x)dPearson(x,N=23,rho=-.2),-1,1,add=TRUE,col="green")
plot(function(x)dPearson(x,N=23,rho=.9),-1,1,add=TRUE,col="red");grid()

legend("topleft", col=c("black","steelblue","red","green"),lty=1,
		legend=c("rho=0.7","rho=0","rho=-.2","rho=.9"))</pre>

This is how it looks like,


Now, let’s construct a table of critical values for some arbitrary or not significance levels.

q=c(.025,.05,.075,.1,.15,.2)
xtabs(qPearson(p=q, N=23, rho = 0, lower.tail = FALSE, log.p = FALSE) ~ q )
# q
#     0.025      0.05     0.075       0.1      0.15       0.2
# 0.4130710 0.3514298 0.3099236 0.2773518 0.2258566 0.1842217

We can calculate p-values as usual too…

1-pPearson(.41307,N=23,rho=0)
# [1] 0.0250003
0 comments

\pi day!

It’s π-day today so we gonna have a little fun today with Buffon’s needle and of course R. A well known approximation to the value of $latex \pi$ is the experiment tha Buffon performed using a needle of length,$latex l$. What I do in the next is only to copy from the following file the function estPi and to use an ergodic sample plot… Lame,huh?

estPi<- function(n, l=1, t=2) {
 m <- 0
 for (i in 1:n) {
 x <- runif(1)
 theta <- runif(1, min=0, max=pi/2)
 if (x < l/2 * sin(theta)) {
 m <- m +1
 }
 }
 return(2*l*n/(t*m))
}

So, an estimate would be…
Read the rest of this entry »

1 comments

A quicky..

If you’re (and you should) interested in principal components then take a good look at this. The linked post will take you by hand to do everything from scratch. If you’re not in the mood then the dollowing R functions will help you.

An example.

# Generates sample matrix of five discrete clusters that have
# very different mean and standard deviation values.
z1 <- rnorm(10000, mean=1, sd=1);
z2 <- rnorm(10000, mean=3, sd=3);
z3 <- rnorm(10000, mean=5, sd=5);
z4 <- rnorm(10000, mean=7, sd=7);
z5 <- rnorm(10000, mean=9, sd=9);
mydata <- matrix(c(z1, z2, z3, z4, z5), 2500, 20, byrow=T,
dimnames=list(paste("R", 1:2500, sep=""), paste("C", 1:20, sep="")))

# Performs principal component analysis after scaling the data.
# It returns a list with class "prcomp" that contains five components:
#   (1) the standard deviations (sdev) of the principal components,
#   (2) the matrix of eigenvectors (rotation),
#   (3) the principal component data (x),
#   (4) the centering (center) and
#   (5) scaling (scale) used.
pca <- prcomp(mydata, scale=T)
 Read the rest of this entry »
1 comments

The truncated Poisson

A common model for counts data is the Poisson. There are cases however that we only record positive counts, ie there is a truncation of 0. This is the truncated Poisson model.

To study this model we only need the total counts and the sample size. This comes from the sufficient statistic principle as the likelihood is

$$ logL=-n-\frac{e^{-\lambda } n}{1-e^{-\lambda }}+\frac{T}{\lambda }$$,

where $$ T=\sum _X$$. Let’s set $$ T=160$$ and  $$ n=50$$.

sum.x=160
n=50
library(maxLik)

loglik <- function(theta, n, sum.x) {
- n* log(exp(theta) - 1) + sum.x * log(theta)
}

The gradient is

$$ \frac{e^{-2 \lambda } n}{\left(1-e^{-\lambda }\right)^2}+\frac{e^{-\lambda } n}{1-e^{-\lambda }}-\frac{T}{\lambda ^2}$$.

Read the rest of this entry »

14 comments

Uh!

Didn’t know this…

a<-structure(c(25,34,12,5),.Names=c("0","2","4","7+"))
> data
 0  2  4 7+
25 34 12  5

It’s becoming clear that I have learned R in the most unstructured way…I always do it in two stages :ashamed:

> data<-c(25,34,12,5)
> names(data)<-c("0","2","4","7+")
> data
 0  2  4 7+
25 34 12  5

It’s really useful to wrap it all in a single function.

Attribute Specification

Description:

 ‘structure’ returns the given object with further attributes
 set.

Usage:

 structure(.Data, ...)

Arguments:

 .Data: an object which will have various attributes attached to it.

 ...: attributes, specified in ‘tag=value’ form, which will be
 attached to data.

Details:

 Adding a class ‘"factor"’ will ensure that numeric codes are
 given integer storage mode.

 For historical reasons (these names are used when deparsing),
 attributes ‘".Dim"’, ‘".Dimnames"’, ‘".Names"’,
 ‘".Tsp"’ and ‘".Label"’ are renamed to ‘"dim"’,
 ‘"dimnames"’, ‘"names"’, ‘"tsp"’ and ‘"levels"’.

 It is possible to give the same tag more than once, in which case
 the last value assigned wins.  As with other ways of assigning
 attributes, using ‘tag=NULL’ removes attribute ‘tag’ from
 ‘.Data’ if it is present.
0 comments

A merry wolfram xmas!

Christmas coming & time for fun!  Wolfram’s demonstrations give you a sense of the holiday season along some nice demonstrations. I particulary like the “Ornamental Holiday Decoration

Manipulate[
 Module[{level0, level1, level2},
 level0 = C[spikey, 0];
 level1 =
 Flatten[daughterPolyhedra[C[spikey, 0], {d1, ω1, s1}, ρ s1]];
 level2 =
 Flatten[daughterPolyhedra[#, {d2, ω2, s2}, ρ  s1 s2] & /@
 Cases[level1, _C]];
 Graphics3D[{EdgeForm[],
 {Red, egc[level0]}, gg1 = {col1, egc[level1]},
 gg2 = {col2, ControlActive[{}, egc[level2]]},
 Directive[GrayLevel[0.2], Specularity[colc, 12]],
 ecc[{level1, level2}]}, Boxed -> False,
 ImageSize -> {400, 400}]],
 "layer 1:",
 {{d1, 1.5, "distance"}, -3, 3, ImageSize -> Tiny},
 {{ω1, 0, "rotation"}, -Pi, Pi, ImageSize -> Tiny},
 {{s1, 1/2, "size"}, 0, 1, ImageSize -> Tiny},
 {{col1, Yellow, "color"}, Blue, ControlType -> None},
 Delimiter,
 "layer 2:",
 {{d2, 1, "distance"}, -3, 3, ImageSize -> Tiny},
 {{ω2, 0, "rotation"}, -Pi, Pi, ImageSize -> Tiny},
 {{s2, 1/2, "size"}, 0, 1, ImageSize -> Tiny},
 {{col2, Green, "color"}, Green, ControlType -> None},
 Delimiter,
 "connectors:",
 {{ρ, 0.3, "radius"}, 0, 1, ImageSize -> Tiny},
 {{colc, Brown, "color"}, Yellow, ControlType -> None},
 AutorunSequencing -> {1, 3, 5, 7},
 Initialization :> {
 spikey =
 MapAt[Developer`ToPackedArray,
 MapAt[Developer`ToPackedArray, N[PolyhedronData["Spikey"]][[1]],
 1], {2, 1}];
 mirrorRotateAndShift[gc_GraphicsComplex,
 n_, {distance_, angle_, size_}, ρ_] :=
 With[{aux =
 mirrorRotateAndShiftCF[gc[[1]], gc[[1, n]], distance, angle,
 size]}, {C[GraphicsComplex[aux, gc[[2]]], n],
 Cylinder[{gc[[1, n]], aux[[n]]}, ρ]}];
 mirrorRotateAndShiftCF =
 Compile[{{vertices, _Real, 2}, {rPoint, _Real, 1}, distance,
 angle, size}, Read the rest of this entry »
Real Time Web Analytics