Using Stata on a Cluster

So I use Stata for things. Sometimes I have to (my job, other programs I get), and sometimes I like to (survival, LDA, mixed effects models). I also like running things on clusters, sometimes in parallel. These don't always mix, but if you put this line in your Stata code

local taskid : env SGE_TASK_ID

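then you can do (embarrassingly) parallel computing! On a Sun Grid Engine (SGE) cluster, each task of an array job gets its own SGE_TASK_ID, from 1 up to the number of tasks, and that line stores it in the local macro taskid, so each task can work on a different piece of the problem.

Submit the do-file with a shell script like this one, where ZZ is the number of tasks you want to run: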


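#!/bin/bash
# Array job: one task for each ID from 1 to ZZ
#$ -t 1-ZZ
# Run Stata in batch mode; yourfile.do is a placeholder for your do-file
stata -b do yourfile.do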
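If you want to check the task-ID logic before submitting anything, one rough option (an assumption about your workflow, not something the scheduler requires) is to emulate the array job on your own machine. The sketch below gives each child process its own SGE_TASK_ID in its environment, as Grid Engine does for each task of "#$ -t 1-ZZ", but runs the tasks one after another rather than in parallel. The function name emulate_array_job and the echo command standing in for Stata are hypothetical.

```shell
#!/bin/sh
# Sketch: emulate an SGE array job locally, without the scheduler.
# Each task runs as a child process with its own SGE_TASK_ID (1..ZZ) in
# its environment. The echo is a hypothetical stand-in for the Stata call;
# inside the do-file, "local taskid : env SGE_TASK_ID" reads this variable.
# Tasks run sequentially here, not in parallel.
emulate_array_job() {
  zz=$1                      # number of tasks, the ZZ in "#$ -t 1-ZZ"
  i=1
  while [ "$i" -le "$zz" ]; do
    # The prefix assignment puts SGE_TASK_ID in the child's environment only
    SGE_TASK_ID=$i sh -c 'echo "running task $SGE_TASK_ID"'
    i=$((i + 1))
  done
}

emulate_array_job 3
```

On the cluster, each real task would see a different taskid in Stata, which you could use, for example, to set a different seed or select a different subset of the data per task.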


My Gmail Meter Results

Here are the results from my Gmail Meter. (I didn't post a link to the full report because some threads are sensitive in nature. Surprise party, maybe?) Overall, I send most of my email between noon and 6 PM, with a slight bump after midnight, and I don't send much in the morning (classes take up a lot of that time).

Also, I send the majority of my email on Monday; my responsiveness then declines through the week until almost nothing goes out on Friday (general meeting day), with only marginal email activity on weekends.

This is really cool, but I wish I had more metrics to work with (like breaking all of this down by label, or by work email vs. everything else). I guess one option is to have a separate Gmail account for each, but I like one big mess so I don't have to check all over.

So it didn't send me crazy spam. I know some people were put off by the big red warning, so I wanted to show that it didn't frag my machine or my email (well, not yet anyway).

Reading in 311 Data for Baltimore

So I was looking around at some Baltimore data and found that you can read it into R and see the most recent posts to the 311 system.

I was trying to look at how fast complaints are closed, or how many complaints there are on average by neighborhood. The data are good, but only give you the 100 most recent complaints. I would love to know whether you can get access to the larger set (though it may be too large, and you might have to use SQL or jQuery to get at it).
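As a rough sketch of those two summaries (this swaps in awk and is not the R approach from this post), suppose the complaints were exported to a CSV with the hypothetical columns neighborhood and days_to_close, where days_to_close is blank while a complaint is still open. The column layout, the function name summarize_311, and the sample rows below are all made up for illustration; the real 311 feed may look quite different.

```shell
#!/bin/sh
# Sketch only: summarize 311 complaints by neighborhood with awk.
# ASSUMPTION: a hypothetical CSV with the header
#   neighborhood,days_to_close
# where days_to_close is blank for complaints that are still open.
# Prints one line per neighborhood: name,total complaints,mean days to close
# (NA if none of that neighborhood's complaints have been closed yet).
summarize_311() {
  awk -F',' '
    NR == 1 { next }                    # skip the header row
    {
      total[$1]++                       # every complaint counts toward the total
      if ($2 != "") {                   # only closed complaints have a duration
        closed[$1]++
        days[$1] += $2
      }
    }
    END {
      for (hood in total) {
        mean = (closed[hood] > 0) ? sprintf("%.1f", days[hood] / closed[hood]) : "NA"
        print hood "," total[hood] "," mean
      }
    }' "$@" | sort                      # awk array order is unspecified
}

# Example with made-up rows (not real Baltimore data):
summarize_311 <<'EOF'
neighborhood,days_to_close
Canton,2
Canton,4
Hampden,
Hampden,1
Canton,
EOF
```

With the made-up rows above, this prints Canton,3,3.0 and Hampden,2,1.0. Note that only complaints that have closed contribute to the mean, so with just the 100 most recent complaints the averages would lean toward complaints that get closed quickly.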