Stats raving mad

The blog

Book shoppin’…

by M. Parzakonis on November 24, 2011

I honestly have no book on R programming. In fact, I don't have a single book on programming at all (my coding proves that ;x). I am pretty sure that I am gonna order (just did!) that book.

You can take a look at Matloff’s text here (= pdf for ya)

Data is everywhere!

by M. Parzakonis on November 19, 2011

I was writing earlier today that I am getting really fed up with using the same datasets over and over again. Of course, using the same data over time with different methods (e.g. look at this) works really well for comparisons, but we can still use other data in a web world. For example, you can get interesting sets from BuzzData (or similar services).

Manos Parzakonis  -  12:31 PM  -  Public

Plz don’t make me type

data(BostonHousing)

ever again. I hate this dataset ;x
#rstats #rant

PS: Of course you can add me to your Google+ circles…

Blogs to visit…

by M. Parzakonis on August 23, 2010

Peter Skomoroch came up with a list of blogs with “data (analysis)” as their core subject. There you go… I post the first few; you can follow the link to the original post to get more. Btw, I must get my blogroll/links list ready at some point…

[source]

Bookshelf remodelling

by M. Parzakonis on August 18, 2010

I found the time to read Gelman and Hill’s “Data Analysis Using Regression and Multilevel/Hierarchical Models”… Now, please do yourself a favour and get it (of course the paperback version ;) ). Even for experienced or intermediate readers (like myself) this will be a treat for your eyes and neurons.

PS : (Confession) I didn’t like the Bayesian Data Analysis book.

Who’s joking?

by M. Parzakonis on August 12, 2010

A little joke from Gary Ramseyer’s collection of statistical jokes

A one-way anova shouted at a two-way anova: “stop! Turn around – you are going the wrong way!”
The two-way anova yelled back: “sorry! I will turn when I see an interaction!”

[source]

Read a new book…

by M. Parzakonis on July 31, 2010

From the book website :

IPSUR stands for Introduction to Probability and Statistics Using R,
ISBN: 978-0-557-24979-4, which is a textbook written for an
undergraduate course in probability and statistics. The approximate
prerequisites are two or three semesters of calculus and some linear
algebra in a few places. Attendees of the class include mathematics,
engineering, and computer science majors.

Now, there is a new way to read R books (anyway, new to me!)

install.packages("IPSUR", repos="http://cran.r-project.org")
library(IPSUR)
read(IPSUR)

Guidance for the young ones…

by M. Parzakonis on July 13, 2010

I was flicking thru some papers I have printed over the last few years, and Breiman’s is definitely one of my favorites. A pragmatic and insightful read for a new statistician (or data analyst if you prefer ;)).

As I left consulting to go back to the university, these were the perceptions I had about working with data to find answers to problems:

(a) Focus on finding a good solution—that’s what consultants get paid for.
(b) Live with the data before you plunge into modeling.
(c) Search for a model that gives a good solution, either algorithmic or data.
(d) Predictive accuracy on test sets is the criterion for how good the model is.
(e) Computers are an indispensable partner.
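
Point (d) is easy to try out. Here is a minimal sketch, assuming a simple random 70/30 split of the built-in mtcars data and a plain linear model; the dataset, the split and the predictors are my own picks for illustration, not anything from Breiman's paper.

```r
# Hold out a test set and judge the model by its accuracy there.
set.seed(42)                                   # reproducible split
n <- nrow(mtcars)
test_idx <- sample(n, size = round(0.3 * n))   # ~30% of rows go to the test set

train <- mtcars[-test_idx, ]
test  <- mtcars[test_idx, ]

# Fit on the training rows only
fit <- lm(mpg ~ wt + hp, data = train)

# Predict the held-out rows and compute the root mean squared error
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```

The number that matters is the RMSE on the rows the model never saw, not the fit on the training data.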

Read more

Leo Breiman (2001), Statistical modeling: The two cultures, Statistical Science, 16:199-231 [pdf]

A robust Hotelling test…

by M. Parzakonis on July 12, 2010

Recently I needed to test a mean vector. I wrote a few lines of code in R and had it done perfectly. The Hotelling test is one of the least interesting tests to me; I never really figured out why…
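
For the record, a classical one-sample Hotelling T2 test really does fit in a few lines of base R. What follows is a minimal sketch, not the code I used back then; hotelling_one_sample is a made-up helper name and the data are simulated. It assumes multivariate normal data and more observations than variables.

```r
# One-sample Hotelling T^2 test of H0: mean vector == mu0.
# hotelling_one_sample is a hypothetical helper, not from any package.
hotelling_one_sample <- function(x, mu0) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  stopifnot(length(mu0) == p, n > p)

  d  <- colMeans(x) - mu0                   # sample mean minus hypothesised mean
  S  <- cov(x)                              # sample covariance matrix
  T2 <- n * drop(t(d) %*% solve(S, d))      # T^2 = n * d' S^{-1} d

  # Under H0 (and normality) this rescaled statistic follows F(p, n - p)
  Fstat <- (n - p) / (p * (n - 1)) * T2
  list(T2 = T2, F = Fstat, df = c(p, n - p),
       p.value = pf(Fstat, p, n - p, lower.tail = FALSE))
}

set.seed(1)
x <- matrix(rnorm(60), nrow = 30, ncol = 2)  # simulated bivariate sample
res <- hotelling_one_sample(x, mu0 = c(0, 0))
print(res)
```

With a single variable, T2 reduces to the square of the usual one-sample t statistic, which makes for a quick sanity check.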

At that point I had some time to dig a bit deeper. One of the most common things to look for in a test is a robust version of it (at least that’s what I search for!). A little digging down to the 3rd page of Google results leads to the following:

One-sample and two-sample robust Hotelling tests with fast and robust bootstrap

The classical Hotelling test for testing if the mean equals a certain value or if two means are equal is modified into a robust one through substitution of the empirical estimates by the MM-estimates of location and scatter. The MM-estimator, using Tukey’s biweight function, is tuned by default to have a breakdown point of 50% and 95% location efficiency. This could be changed through the control argument if desired.

Robust Hotelling T2 test

Performs one and two sample Hotelling T2 tests as well as robust one-sample Hotelling T2 test.

The first uses MM and S estimators while the latter uses a Minimum Covariance Determinant (MCD) one. You can get info on those from the links at the end of the post. What might be crucial to you is that the MM/S estimators are more time-consuming than MCD. A little demonstration follows.. (more →)