Archive for August, 2007

The internet is Awesome

Thursday, August 30th, 2007

There was a stand-up comedian who once made a joke about how we often misuse the word ‘awesome’. For instance, he said, we say things like, “This hot dog is awesome!” Awesome in the sense that it causes awe and wonder?

Anyhow, I want to say that the internet is truly awesome in its ability to deliver information and communication quickly, easily and exactly to whom it concerns.

Right at this moment I am watching a live lecture on Machine Learning by a German professor in Tuebingen, Germany. The awesome thing about it is that the picture and the audio are both very clear. Furthermore there is almost no skipping, pausing or buffering. It is like being in the next room and watching it via closed circuit tv.

Amazing!

This live lecture comes from a site, http://www.videolectures.net which has quite a few video lectures on a range of subjects, though the majority are on computer science/machine learning topics.

How many licks does it take….

Tuesday, August 28th, 2007

I’ve been doing a little moonlighting these past couple of days while Malia’s been out of town. Actually it would better be described as hermitting…

While I’ve been in hiding I’ve spent a little (ok a lot) of time dusting off my stock prediction software and revamping it. After working on it I’ve come across an interesting problem.

(more…)

What are we Really Afraid of?

Tuesday, August 28th, 2007
[Image: scary robot picture]

In the rash of movies and books published these days concerning the future, robots and Artificial Intelligence, there is a common theme: super advanced computers become self-aware and resentful of their human “Masters”, becoming either deranged or justly embittered towards us. In some movies the robots become so irate that they go to war against us (The Matrix, I, Robot, 2001: A Space Odyssey).

Is it even possible for computers to become self-aware? Evolutionists say, “Yes!” After all, our brain is an incredible computer made up of roughly 86 billion interacting neurons and nothing experimentally or verifiably more. The creationists say, “No!” After all, we are infused with life by the “breath of God”, and “made in His image”. Our brains, in the creationist view, are a platform for our “self” which has no experimental/physical properties.

We may never know whether computers can become intelligent or self-aware. Instead I’d like to touch on something else: why we keep making movies and books about murderous robots and villains who can’t escape their own crimes.

(more…)

quepash

Thursday, August 23rd, 2007

From deep in the heart of Texas, Kirsten writes short stories and prose that is both personal and spiritual. Very enjoyable reading.

Python Code for Downloading and Processing Stock Information

Wednesday, August 22nd, 2007

When I first started working on the stock market I needed a way to obtain information about stocks that I wanted to make predictions on. I knew that I could go to a site like http://finance.yahoo.com and download the individual .csv files which contain day by day records of the high price, low price, closing price, opening price and volume one by one for a selected date range. However downloading 100 separate files by hand once a day would get really old really quickly.

So with a little help from my friend Ray, and the powers of Python, I put together the following script to do that automatically for me! Since I wrote this I found that there are a number of programs that do the same thing which are freely available…but I still like my script for its flexibility.

(more…)

R-Code for Simulating a Confidence Interval for the Difference in Two Binomial Random Variables

Wednesday, August 22nd, 2007

Hypothesis testing is quite common in statistics. Usually a hypothesis comes in the form: “Random Variable (i.e. statistic/measure) A is not different from Random Variable B at the p=.05 level”. We start from the hypothesis that A is “not” different from B because, when dealing with random variables, we can never prove that A is the same as B; we can only find evidence against it. As my boss says, “Association does not imply causation.” However, if we can reject the hypothesis that A is not different from B then we can say that there is a statistically significant difference between A and B, which is what we ultimately want to show.

“Statistically significant” does not mean “a whole lot different from”, or “proved different”; it means, “different enough that, using the properties of the distribution of A and the distribution of B, I can’t conclude from the given evidence that A and B have the same distribution or are generated by the same function.”

Who knew that “significant” could have such a nuanced meaning? The phrase “significant other” really is much more meaningful under this connotation…i.e. my girlfriend/boyfriend is not generated by the same function as me (read “we ain’t brother and sister”)…

Anyhow, usually one will test a hypothesis by first assuming which distribution both A and B come from, working out the necessary math for their differential distribution, marking out a test statistic and, badda bing, arriving at a confidence interval. (For a deeper explanation see http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#hypothtest ) The confidence interval is itself a random variable and is usually interpreted as an interval which contains the true measure x% of the time.
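The “contains the true measure x% of the time” reading can be checked by simulation. The sketch below is only an illustration: the true proportion, the sample size, the number of repetitions and the choice of a normal-approximation interval are all assumptions made up for the example, not something taken from the discussion above.

```r
# Illustrative coverage check; p.true, n and reps are made-up values.
set.seed(1)
p.true = 0.3
n = 500
reps = 5000

# Draw reps samples of size n and compute each sample proportion.
p.hat = rbinom(reps, n, p.true) / n

# Normal-approximation 95% interval around each sample proportion.
half.width = qnorm(0.975) * sqrt(p.hat * (1 - p.hat) / n)
lower = p.hat - half.width
upper = p.hat + half.width

# Fraction of the simulated intervals that contain the true proportion.
coverage = mean(lower <= p.true & p.true <= upper)
print(coverage)
```

If the interpretation holds, the printed fraction should land close to .95, though it will wobble a little from run to run with a different seed.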

However, in most real cases the distributions of A and B are such that doing the required math for the differential distribution is rather cumbersome. Thus, being mathematicians (in my case an applied mathematician), we might look for an approximation or a shortcut.

For instance, in Bracoo’s post on Athletes and Lawlessness, he states that the proportion of NBA basketball players with a criminal record is 40% while the proportion of average US citizens with a criminal record is only 21%. It seems that this is a striking difference, but is it? Furthermore, we can look at the lift in NBA criminal records as compared to the average US citizen, which will give us a percentage difference between the two:

percentgain = (NBA-US)/US = (.40-.21)/.21 = 90.5 %

Evaluating the claim that there is a 90.5% greater likelihood of an NBA basketball player committing a crime than a US citizen could get quite sticky if we tried to work out the theoretical answer via mathematical statistics.

Using R we can quickly simulate, or approximate, a confidence interval and a hypothesis test.

First, let’s assume that the population of basketball players is 360 (30 teams with 12 players each) and that we have sampled them all and gotten honest responses. Furthermore we have sampled a truly random selection of US citizens, and gotten 360 true responses. We know that population percentages are distributed according to a binomial distribution.

NBA ~ iid Binomial(p=.4) with N=360

US ~ iid Binomial(p=.21) with N=360

Since Binomial r.v.’s look a lot like Normal distributions when n is high (i.e. greater than ~ 30) we can approximate our above distributions with:

Binomial(p) with N ~ Normal(p, p(1-p)/N)
NBA ~ iid Normal(.4, .4(1-.4)/360)
US ~ iid Normal(.21, .21(1-.21)/360)
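As a rough sanity check on the approximation, one can simulate sample proportions straight from the binomial and compare their mean and spread with the normal parameters above. This is just a sketch: the number of draws and the seed are arbitrary choices, and only the NBA case (p=.4, N=360) is shown.

```r
# Compare simulated binomial sample proportions with the normal approximation.
set.seed(1)
N = 360
p = 0.4
draws = 10000   # number of simulated samples (an arbitrary choice)

# Simulated sample proportions from the exact binomial model.
prop.binom = rbinom(draws, N, p) / N

# Standard deviation implied by Normal(p, p(1-p)/N).
sd.normal = sqrt(p * (1 - p) / N)

# Mean and standard deviation: simulated value first, approximation second.
print(c(mean(prop.binom), p))
print(c(sd(prop.binom), sd.normal))
```

The two numbers on each printed line should be close to each other if the normal approximation is reasonable at this sample size.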

The following R-code will quickly produce a .95 confidence interval (significance at the p=.05 level)


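# Simulate 1000 draws of each sample proportion from its normal approximation
NBA = rnorm(1000, .4, sqrt(.4*(1-.4)/360))
US = rnorm(1000, .21, sqrt(.21*(1-.21)/360))

# Lift of the NBA rate over the US rate for each simulated pair
lift = (NBA-US)/US

# The middle 95% of the simulated lifts is the confidence interval
confidence.95 = quantile(lift, c(.025, .975))
print(confidence.95)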

If our test statistic falls outside the interval bounded by the two numbers given, then we can say that the lift between NBA criminal record rates and US citizen rates is statistically significantly different from the test statistic.

In this case, our confidence interval for our lift metric runs from 52% to 143%. Since our test statistic (0 = (NBA-US)/US => they are the same) falls outside the interval (52%, 143%), we can say that at the p=.05 level, we cannot ascribe the difference between NBA crime record rates and US general population rates to random variance in the population.

The advantage of simulating confidence intervals is that, with relatively low error and little time, we can get a good estimate of a distribution that would otherwise be very difficult and time consuming to calculate by hand. If we wanted to publish these results, we would likely need to do the real math. But for a quick result the above method will usually suffice.
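For a quick cross-check against “the real math”, R’s built-in prop.test can compare the two proportions directly. This is only a sketch under stated assumptions: the counts below are back-calculated from the percentages (144 of 360 for the NBA, and 76 of 360 for the US sample, which is 75.6 rounded up), and prop.test reports an interval for the difference NBA-US, not for the lift.

```r
# Counts implied by the stated rates; 76 is an assumed rounding of .21 * 360.
with.record = c(144, 76)   # NBA, US
sampled = c(360, 360)

# Two-sample test of equal proportions (chi-squared, continuity corrected).
check = prop.test(with.record, sampled)

# p-value for "the two rates are the same" and the 95% interval
# for the difference in proportions (NBA minus US).
print(check$p.value)
print(check$conf.int)
```

If the simulation above is on the right track, this test should also reject equality at the p=.05 level, and its interval for the difference should exclude 0.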

R-Code for Generating a Cumulative Distribution Function

Wednesday, August 22nd, 2007

I was looking for a way to create a cumulative distribution function (CDF) in R today and for once, it doesn’t have something I’m looking for! Actually it’s more likely that I just wasn’t looking for the right thing. Anyways, I figured out how to produce a nice little plot of the CDF. I’m sure that you could generate a nicer one with some interpolation and such but in case you need a quick one here goes.

Update: A nice visitor to the blog showed me that indeed I had overlooked a very simple way to do what I will illustrate in less elegant code immediately after this note.

# x is a vector of items that you wish to find the CDF for
plot(ecdf(x))

End Update

And now for the less elegant way of doing it.

# x is a vector of items that you wish to find the CDF for
x.hist = hist(x, plot=FALSE, breaks=100)
x.counts = x.hist$counts
x.mids = x.hist$mids
x.cdf = cumsum(x.counts)/sum(x.counts)
#and plot it
plot(x.mids, x.cdf, type="s", main="title", xlab="value", ylab="cumulative probabilities")

You could define your histogram of x to be more or less detailed (breaks=n) and it will still plot correctly.
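As a hedged check that the two approaches agree, the sketch below runs both on made-up data (1000 standard normal draws, chosen only for illustration) and compares them at the right edge of each histogram bin, which is where the cumulative counts are exact.

```r
# Compare the histogram-based CDF with ecdf() on made-up example data.
set.seed(1)
x = rnorm(1000)   # example data; any numeric vector should work

x.hist = hist(x, plot=FALSE, breaks=100)
x.cdf = cumsum(x.hist$counts)/sum(x.hist$counts)

# ecdf(x) returns a function; evaluate it at the right edge of each bin.
right.edges = x.hist$breaks[-1]
x.ecdf = ecdf(x)(right.edges)

# Largest gap between the two curves at the bin edges.
print(max(abs(x.cdf - x.ecdf)))
```

Since the cumulative values belong to the right bin edges, plotting them against x.mids shifts the curve left by half a bin width; with many breaks the difference is hard to see.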

(more…)

Mystical Mathematics

Tuesday, August 21st, 2007

Pythagoras, yes the guy who made the Pythagorean theorem, was one of the first well known mathematicians to fall in love with Mathematics and form a cult. Yes a cult (see this very fascinating article on it http://en.wikipedia.org/wiki/Pythagoras#Pythagoreans). The Pythagoreans were well known for their mixing of religion and science, and avoiding beans. You could say that these guys were part of the original “Geek Squad”.

The Pythagoreans generally get credit for the theory about triangles: [tex=Pythagorean Theorem]c^{2}=a^{2}+b^{2}[/tex] where a and b are the legs of a right triangle and c is its hypotenuse. The Egyptians also knew about it…so did the Mayans. How else could they have built their incredibly square pyramids and temples? Pythagoras also believed that reality was numbers! He even believed that the planets moved according to equations and that music was simply mathematics. If one discovered these equations one would discover a beautiful planetary symphony!

Any student of mathematics has at one point been tempted to become a Mathematical Mystic. Many religious people have been tempted to use the “powers” of numbers to predict the future (like the Bible code people http://www.nmsr.org/biblecod.htm). Can anyone really love Math this much?

(more…)

Fun with Math

Monday, August 6th, 2007

Hello everyone.

And now…the moment you have been waiting for.

[tex=latex]\LaTeX{}[/tex] is now available for use in your posts.

To write a mathematical formula you simply use LaTeX markup and enclose it in tags, as on the following page.

With latex, you can make things like

[tex=sum formula]\sum_{i=1}^{n}i=\frac{n(n+1)}{2}[/tex]

look really pretty!

Enjoy! I know I will.