Archive for the ‘Code’ Category

Problem with ‘Table Full’ for MyISAM DB?

Monday, November 3rd, 2008

If your MyISAM table has hit its size limit because you simply have too much data, this is a nice quick tutorial on raising that limit.

http://jeremy.zawodny.com/blog/archives/000796.html

Basically, do the following:

mysql> alter table your_table max_rows = 200000000000 avg_row_length = 50;

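MySQL uses the product of max_rows and avg_row_length (here 200,000,000,000 × 50 bytes, roughly 10 TB) to decide how large the table is allowed to grow, so this raises the size limit. Keep in mind that alter table rebuilds the table, which can take a while on a big one. Afterwards you can check the new limit in the Max_data_length column of SHOW TABLE STATUS LIKE 'your_table'.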

Ode to Linux (and Open Source)

Tuesday, September 16th, 2008

The trusty ol’ T43 got sick this weekend with the “Windows Anti-Virus 2008” virus.  Try as I might, I could not get it off of Windows XP.  I tried every trick and every piece of software I could find, to no avail.  Luckily I didn’t decide to buy the “anti-virus” software they were pitching to solve the “viruses” they had detected on my computer (this particular virus is actually malware that tries to convince you that you have viruses and to buy their software to fix them).

I read tales of woe of those who “bought” the software only to have their accounts drained.

But as I was giving up and cursing Windows for being such an insecure operating system (really? Windows, you couldn’t have prevented such an attack?), I remembered that ntaylor0909 had just given me a copy of Ubuntu Server 8.04….

So I wiped ‘er clean and am now up on Ubuntu Linux.

I have to say so far I’m quite impressed.  It’s free… it works… all the programs for it are free… and they work too… and I hear that it’s even quite secure.

Thank you Linux!

Fun with heliostats

Wednesday, September 3rd, 2008

While I was in Africa this summer I worked out how I could aim several mirrors at my old water heater.  I had already stripped off half the casing and foam insulation on the water heater and painted it black, knowing that I wanted to achieve some sort of solar water heating, but I had never really worked out how to position the mirrors to make it really hot!

So, being a mathematician I set about working out the requisite formulas which would help me to space out the mirrors and determine at which angle I should keep them.

[Picture: 4x solar concentration]

It turns out that instead of tracking the sun it’s actually quite simple to place the water heater perpendicular to the sun, and array the mirrors facing north.  Aligning the mirrors in the center with the long edge running perpendicular to the sun means that at midday there will be an additional 8 sq. ft of mirror area reflecting sunlight onto the water that will eventually be contained in the heater.
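For anyone who wants to play with the geometry, here is a minimal sketch in R of the one rule every fixed mirror has to obey: a flat mirror sends sunlight to the target when its normal points halfway between the direction to the sun and the direction to the target. These are not the formulas worked out for this project; the function names and the example numbers are made up for illustration, and the tilt helper only covers the 2-D case (one horizontal coordinate, one vertical).

```r
# Unit normal a flat mirror needs so that light arriving from `to_sun`
# is reflected toward `to_target`. Both arguments are direction vectors
# pointing away from the mirror; they must not be exactly opposite.
mirror_normal <- function(to_sun, to_target) {
  s <- to_sun / sqrt(sum(to_sun^2))
  tg <- to_target / sqrt(sum(to_target^2))
  n <- s + tg
  n / sqrt(sum(n^2))
}

# Tilt of the mirror face away from lying flat, in degrees
# (2-D case: first coordinate horizontal, second coordinate up).
mirror_tilt_deg <- function(to_sun, to_target) {
  n <- mirror_normal(to_sun, to_target)
  atan2(n[1], n[2]) * 180 / pi
}

# Made-up example: sun directly overhead, heater level with the mirror.
tilt <- mirror_tilt_deg(to_sun = c(0, 1), to_target = c(1, 0))
print(tilt)
```

With the sun overhead and the target off to the side, the mirror ends up tilted halfway between the two, which is a handy sanity check before trying real sun angles.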

[Picture: the middle two mirrors]

In the first picture we can also see that there are 2 additional mirrors placed at 45 degrees in either direction from the middle mirrors.  These mirrors are testing my hypothesis that at about 11 am (a little past quarter day) and 3 pm there will be some additional sunlight that could be caught.

Once I get a little more funding for this project I’ll add a few more mirrors (4-10) and also start work on a smaller concentrator for my home office.

Later I’ll connect up some actual water pipes, and circulate the hot water into the house for our new radiant heating system that I’m hoping to install this fall.

If you’d like the code for your own project, please contact me.

R code for crawling the web, and a BASH Unix script for stripping links from a document.

Sunday, December 16th, 2007

Suppose you have a document with a link on each line of the document.

Now suppose you want to download every link on that document and put them in a directory.

Then finally you may wish to take each document and strip the links from the document, retrieve each of those links etc. ad infinitum (or until your hard drive crashes).

The following two pieces of code should accomplish your tasks. You can download the R script which contains some extra NLP functions for R here, and the Unix BASH script here.
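The scripts themselves are in the downloads mentioned above. As a rough illustration of the idea only (this is not the code from those downloads), a bare-bones version in base R might look like the sketch below. The helper names extract_links and download_links are hypothetical, the pattern only catches double-quoted href attributes, and nothing here limits how far the crawl goes.

```r
# Hypothetical helper: pull the href targets out of the text of an HTML page.
extract_links <- function(html) {
  matches <- regmatches(html, gregexpr('href="[^"]+"', html))
  links <- unlist(matches)
  # drop the leading href=" and the trailing quote
  unique(sub('^href="', "", sub('"$', "", links)))
}

# Hypothetical helper: download every link listed one per line in link_file.
download_links <- function(link_file, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE)
  links <- readLines(link_file)
  for (i in seq_along(links)) {
    dest <- file.path(dest_dir, paste0("page_", i, ".html"))
    # try() keeps one dead link from stopping the whole crawl
    try(download.file(links[i], dest, quiet = TRUE))
  }
}

# Small self-contained check of the link stripping on a made-up page.
page <- '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>'
found <- extract_links(page)
print(found)
```

Feeding the output of extract_links back into a new link file and calling download_links again gives the "ad infinitum" loop described above.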

(more…)

Setting Up a MySQL connection in R and Windows XP

Wednesday, November 14th, 2007

This past weekend I was doing some work on Scroggles, building an automatic spam classifier and post categorizer. The blogging platform that Scroggles! uses runs on PHP and MySQL. A web-based (PHP) classifier wouldn’t be as easy or quick to build, or work as well, as one built in R, which has many machine learning packages readily available, so I decided to build something in R that interfaces with the MySQL database. It took some time to figure out how to get everything (R, my server, and MySQL) working together, so I thought I would put together a how-to.

(more…)

R-code for reading in the last n lines of a csv file

Monday, October 29th, 2007

In R there are several ways to read a data file. There are ways to skip the first k lines…and only read n lines, but there seems to be no easy way to read the last n lines of a file. I faced this problem while doing some work this weekend and found that the following solution, which invokes some UNIX features, does the job.


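file.name = "yourfile.csv" # .txt can work too
# count the lines in the file; reading from stdin makes wc print only the number
file.rows = as.numeric(system(paste("wc -l <", shQuote(file.name)), intern=TRUE))

n = 30 # number of rows from the end we want
# read the header line separately, since skip= would otherwise jump past it
col.names = names(read.csv(file=file.name, header=TRUE, nrows=1))
data = read.csv(file=file.name, header=FALSE, col.names=col.names, skip=(file.rows-n), nrows=n)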

This approach can save time and memory, especially if the file is large and the part you actually need is small.
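On a machine without wc (Windows, for instance), a pure R fallback is possible. The sketch below is an assumption-laden alternative rather than a replacement: the helper name read_last_rows is made up, it reads the whole file into memory first (so it gives up the memory savings), and it assumes no quoted field contains a line break.

```r
# Hypothetical helper: read the last n data rows of a CSV file using only
# base R, keeping the header line so the columns are named properly.
read_last_rows <- function(path, n) {
  lines <- readLines(path)
  header <- lines[1]
  body <- lines[-1]
  last <- tail(body, n)
  read.csv(text = c(header, last), header = TRUE)
}

# Made-up data to try it on: 100 rows of id and id squared.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:100, value = (1:100)^2), tmp, row.names = FALSE)
last_rows <- read_last_rows(tmp, 3)
print(last_rows)
```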

R-Code for Generating a Cumulative Distribution Function

Wednesday, August 22nd, 2007

I was looking for a way to create a cumulative distribution function (CDF) in R today and for once, it doesn’t have something I’m looking for! Actually it’s more likely that I just wasn’t looking for the right thing. Anyways, I figured out how to produce a nice little plot of the CDF. I’m sure that you could generate a nicer one with some interpolation and such but in case you need a quick one here goes.

Update: A nice visitor to the blog showed me that indeed I had overlooked a very simple way to do what I will illustrate in less elegant code immediately after this note.

# x is a vector of items that you wish to find the CDF for
plot(ecdf(x))

End Update

And now for the less elegant way of doing it.

# x is a vector of items that you wish to find the CDF for
x.hist = hist(x, plot=FALSE, breaks=100)
x.counts = x.hist$counts
x.mids = x.hist$mids
x.cdf = cumsum(x.counts)/sum(x.counts)
#and plot it
plot(x.mids, x.cdf, type="s", main="title", xlab="value", ylab="cumulative probabilities")

You could define your histogram of x to be more or less detailed (breaks=n) and it will still plot correctly.
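If the binning bothers you, the same cumulative-sum idea works without a histogram at all. This is a small sketch on made-up data, not part of the original code: sort the values and let the i-th smallest one get cumulative probability i/n. The comparison with ecdf at the end assumes there are no tied values in x.

```r
set.seed(1)
x <- rnorm(200)  # made-up sample data, stands in for your own vector

# exact empirical CDF: the i-th smallest value has cumulative probability i/n
x.sorted <- sort(x)
x.cdf.exact <- seq_along(x.sorted) / length(x.sorted)

# sanity check against the built-in ecdf() evaluated at the same points
matches.ecdf <- isTRUE(all.equal(x.cdf.exact, ecdf(x)(x.sorted)))
print(matches.ecdf)
```

Passing x.sorted and x.cdf.exact to the same plot() call as above, in place of x.mids and x.cdf, draws the step function.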

(more…)