Stats raving mad

The blog

Poor man’s pairs trading…

by M. Parzakonis on April 11, 2010

There is a central notion in Time Series Econometrics, cointegration. Loosely it refers to finding the long run equilibrium of two non-stationary series. As the most know non-stationary series examples comes from finance, cointegration is nowadays a tool for traders (not a common one though!). They use it as the theory behind pairs trading (aka Statistical Arbitrage).

In the following lines we use a simple pairs trading technique, studying the ratio of the two price evolution series. We use the New York versions of two of the greatest players in ATHEX, NBG and OTE.

stock <- "NBG"
stock1 <- "OTE"
start.date <- "2003-10-20"
end.date <- Sys.Date()
quote <- paste("http://ichart.finance.yahoo.com/table.csv?s=",
 stock,
 "&a=", substr(start.date,6,7),
 "&b=", substr(start.date, 9, 10),
 "&c=", substr(start.date, 1,4),
 "&d=", substr(end.date,6,7),
 "&e=", substr(end.date, 9, 10),
 "&f=", substr(end.date, 1,4),
 "&g=d&ignore=.csv", sep="")
quote1 <- paste("http://ichart.finance.yahoo.com/table.csv?s=",
 stock1,
 "&a=", substr(start.date,6,7),
 "&b=", substr(start.date, 9, 10),
 "&c=", substr(start.date, 1,4),
 "&d=", substr(end.date,6,7),
 "&e=", substr(end.date, 9, 10),
 "&f=", substr(end.date, 1,4),
 "&g=d&ignore=.csv", sep="")
dataNBG.l <- read.csv(quote, as.is=TRUE)
dataOTE.l <- read.csv(quote1, as.is=TRUE)
X2=dataOTE.l[order(dataOTE.l$Date),];Y2=dataNBG.l[order(dataNBG.l$Date),]

(more →)

Probability manipulations with Mathematica…

by M. Parzakonis on April 1, 2010

There comes a time that a statistician needs to do some ananytic calculations. There more than a bunch of tools to use but I usually prefer Mathematica or Maple. Today, I’m gonna use Mathematica to do a simple exhibition.

Let’s set this example upon the  $$ U(2 \theta _1-\theta _2\leq x\leq 2 \theta _1+\theta _2) $$  distribution.

pfun = PDF[UniformDistribution[{2*Subscript[θ, 1] - Subscript[θ, 2],
2*Subscript[θ, 1] + Subscript[θ, 2]}], x]

$$ \begin{cases}
\frac{1}{2 \theta _2} & 2 \theta _1-\theta _2\leq x\leq 2 \theta _1+\theta _2 \\
0 & \text{True}
\end{cases} $$

One of the most intensive calculations is the characteristic function (eq. the moment generating function). This is straightforward to derive.

cfun=CharacteristicFunction[UniformDistribution[
{2*Subscript[θ, 1]-Subscript[θ, 2],2*Subscript[θ, 1]+Subscript[θ, 2]}],x]

$$ -\frac{i \left(-e^{i x \left(2 \theta _1-\theta _2\right)}+e^{i x \left(2 \theta _1+\theta _2\right)}\right)}{2 x \theta _2} $$.

The Table[] command calculates for us the raw moments for our distribution.

Table[Limit[D[cfun, {x, n}], x -> 0]/I^n, {n, 4}]

$$ \left\{2 \theta _1,\frac{1}{3} \left(12 \theta _1^2+\theta _2^2\right),2 \theta _1 \left(4 \theta _1^2+\theta _2^2\right),16 \theta _1^4+8 \theta _1^2 \theta _2^2+\frac{\theta _2^4}{5}\right\} $$.

Calculate the sample statistics.

T=List[8.23,6.9,1.05,4.8,2.03,6.95];
{Mean[T],Variance[T]}

$$ \{4.99333,8.46171\} $$.

Now, we can use a simple moment matching technique to get estimates for the parameters.

Solve[{Mean[T]-2*Subscript[θ, 1]==0,-(2*Subscript[θ, 1])^2+
1/3 (12 Subscript[θ, 1]^2+\!\*SubsuperscriptBox[\(θ\), \(2\), \(2\)])-
Variance[T]==0},{Subscript[θ, 2],Subscript[θ, 1]}]

$$\left\{\left\{\theta _1\to 2.49667,\theta _2\to -5.03836\right\},\left\{\theta _1\to 2.49667,\theta _2\to 5.03836\right\}\right\} $$.

Check the true value for the $$ \theta _2$$.

Reduce[2 Subscript[θ, 1]-Subscript[θ, 2]<=2 Subscript[θ, 1]+Subscript[θ, 2],
Subscript[θ, 2]]

$$ \theta _1\in \text{Reals}\&\&\theta _2\geq 0 $$.

Then, $$\left\{\left\{\theta _1\to 2.49667,\theta _2\to 5.03836\right\}\right\} $$.

A von Mises variate…

by M. Parzakonis on March 25, 2010

Inspired from a mail that came along the previous random generation post the following question rised :

How to draw random variates from the Von Mises distribution?

First of all let’s check the pdf of the probability rule, it is $$ f(x):=\frac{e^{b \text{Cos}[y-a]}}{2 \pi \text{BesselI}[0,b]}$$, for $$-\pi \leq x\leq \pi $$.

Ok, I admit that Bessels functions can be a bit frightening, but there is a work around we can do. The solution is a Metropolis algorithm simulation. It is not necessary to know the normalizing constant, because it will cancel in the computation of the ratio. The following code is adapted from James Gentle’s notes on Mathematical Statistics .

n <- 1000
x <- rep(NA,n)
a <-1
c <-3
yi <-3
j <-0

i<-2
while (i < n) {
	i<-i+1
	yip1 <- yi + 2*a*runif(1)- 1
	if (yip1 < pi & yip1 > - pi) {
		if (exp(c*(cos(yip1)-cos(yi))) > runif(1)) yi <- yip1
		else yi <- x[i-1]
		x[i] <- yip1
	}
}
hist(x,probability=TRUE,fg = gray(0.7), bty="7")
lines(density(x,na.rm=TRUE),col="red",lwd=2)

R 2.11.0 due date

by M. Parzakonis on March 24, 2010

This is the announcement as posted in the mailing list :

This is to announce that we plan to release R version 2.11.0 on Thursday,
April 22, 2010.

Those directly involved should review the generic schedule at
http://developer.r-project.org/release-checklist.html

The source tarballs will be made available daily (barring build
troubles) via
http://cran.r-project.org/src/base-prerelease/

For the R Core Team
Peter Dalgaard

The distribution of rho…

by M. Parzakonis on March 21, 2010

There was a post here about obtaining non-standard p-values for testing the correlation coefficient. The R-library

SuppDists

deals with this problem efficiently.

library(SuppDists)

plot(function(x)dPearson(x,N=23,rho=0.7),-1,1,ylim=c(0,10),ylab="density")
plot(function(x)dPearson(x,N=23,rho=0),-1,1,add=TRUE,col="steelblue")
plot(function(x)dPearson(x,N=23,rho=-.2),-1,1,add=TRUE,col="green")
plot(function(x)dPearson(x,N=23,rho=.9),-1,1,add=TRUE,col="red");grid()

legend("topleft", col=c("black","steelblue","red","green"),lty=1,
		legend=c("rho=0.7","rho=0","rho=-.2","rho=.9"))</pre>

This is how it looks like,


Now, let’s construct a table of critical values for some arbitrary or not significance levels.

q=c(.025,.05,.075,.1,.15,.2)
xtabs(qPearson(p=q, N=23, rho = 0, lower.tail = FALSE, log.p = FALSE) ~ q )
# q
#     0.025      0.05     0.075       0.1      0.15       0.2
# 0.4130710 0.3514298 0.3099236 0.2773518 0.2258566 0.1842217

We can calculate p-values as usual too…

1-pPearson(.41307,N=23,rho=0)
# [1] 0.0250003

Oh, mr counselor!

by M. Parzakonis on March 19, 2010

Xm…

There were two men trying to decide what to do for a living.  They went to see a counselor, and he decided that they had good problem solving skills.

He tried a test to narrow the area of specialty.  He put each man in a room with a stove, a table, and a pot of water on the table.  He said “Boil the water”.  Both men moved the pot from the table to the stove and turned on the burner to boil the water.  Next, he put them into a room with a stove, a table, and a pot of water on the floor.  Again, he said “Boil the water”.  The first man put the pot on the stove and turned on the burner.  The counselor told him to be an Engineer, because he could solve each problem individually.  The second man moved the pot from the floor to the table, and then moved the pot from the table to the stove and turned on the burner.  The counselor told him to be a mathematician because he reduced the problem to a previously solved problem.

- From The Mathematician, The Physicist and The Engineer (and Others)

In search of a random gamma variate…

by M. Parzakonis on March 16, 2010

One of the most common exersices given to Statistical Computing,Simulation or relevant classes is the generation of random numbers from a gamma distribution. At first this might seem straightforward in terms of the lifesaving relation that exponential and gamma random variables share. So, it’s easy to get a gamma random variate using the fact that

$$ {{X}_{i}}\tilde{\ }Exp(\lambda )\Rightarrow \sum\limits_{i}{{{X}_{i}}}\tilde{\ }Ga(k,\lambda )$$.

The code to do this is the following

rexp1 <- function(lambda, n) {
  u <- runif(n)
  x <- -log(u)/lambda
  }

rgamma1 <- function(k, lambda) {
  sum(rexp1(lambda, k))
}

This works unfortunately only for the case $$ k\in \mathbb{N}$$.
(more →)