Using Stata on a Cluster

Posted on February 21, 2013 by strictlystat

So I use Stata for things. Sometimes I have to (my job, other programs I get), sometimes I like to (survival, LDA, mixed effects models). I also like doing things on clusters, and sometimes in parallel. These don’t always mix, but if you put this line in your Stata code

local taskid : env SGE_TASK_ID

then you can do (embarrassingly) parallel computing!

Pair it with a shell script like this one, where ZZ is the number of tasks you want to run:

#!/bin/bash

#$ -t 1-ZZ

stata -b do Stata_Script.do
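
If each task should work on a different file, one way to turn the task ID into a unit of work is sketched below. This is only an illustration, not part of the original script: the helper pick_input and the file names are made up, and the default of 1 is there only so the snippet runs outside SGE, where SGE_TASK_ID is not set.

```shell
#!/bin/bash
#$ -t 1-3

# Hypothetical helper (not part of SGE or Stata): print the item at a
# 1-based position from the list of arguments that follows the position.
pick_input() {
    task_index="$1"
    shift                          # drop the index; "$@" is now the item list
    shift $((task_index - 1))      # skip the items before the requested one
    printf '%s\n' "$1"
}

# SGE sets SGE_TASK_ID for array jobs; default to 1 for a local dry run.
task_id="${SGE_TASK_ID:-1}"

# Made-up file names, one per task.
input_file="$(pick_input "$task_id" site_a.dta site_b.dta site_c.dta)"
echo "Task ${task_id} would process ${input_file}"
```

Inside the do-file, the `local taskid : env SGE_TASK_ID` line gives Stata the same number, so the same kind of mapping could be done there instead.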

Share LaTeX

Posted on February 19, 2013 by strictlystat

Share LaTeX

Here’s an interesting and improved tool for collaborating on LaTeX online (like Google Docs).

Good Open Source Statistics Book

Posted on February 14, 2013 by strictlystat

An old colleague of mine co-founded and co-wrote a great site and book that introduces statistics, with examples and code! If you’ve never seen it, it’s worth checking out at http://www.openintro.org/

@hspter inspired me

Posted on February 14, 2013 by strictlystat

So after a discussion with @hspter and 10 seconds of googling, I decided to maybe switch to WordPress. We’ll see how the transition goes (if at all) in the next few days.

Now with Comments!

Posted on February 14, 2013 by strictlystat

Thanks Leo! Apparently Disqus is the easiest commenter ever.

Posted on February 13, 2013 by strictlystat

Here are the results from my Gmail Meter. (I didn’t post the link to my report because some threads are sensitive in nature – surprise party maybe?) Overall, I’m sending most email out after noon up until 6PM with a slight increase after midnight, but don’t really send out much in the morning (classes take care of that a lot).

Also, the majority of my emails are sent out on Monday, tapering off through the week to almost none on Friday (general meeting day), with only marginal email response on weekends.

This is really cool, but I wish I had more metrics that I could work with (like breaking all of this down by label, work email vs. not, and such). I guess one option is to have separate Gmail accounts for each, but I like one big mess so I don’t have to check all over.

So it didn’t send me crazy spam, but I know some people were put off by the big red warning, so I wanted to show it didn’t frag my machine or email (well not yet at least anyway).

Gmail Meter – Mail Metrics

Posted on February 12, 2013 by strictlystat

I was looking around, trying to figure out an estimator for the data in my inbox (thinking maybe a Cox PH model for “Time to Read Email” or a Kaplan-Meier curve), and came across this:
http://www.gmailmeter.com/

Looks pretty good; building my report now. It may take 30-90 minutes to do so! I hope they allow you to export some of the data to play with yourself.

Reading in 311 Data for Baltimore

Posted on February 12, 2013 by strictlystat

So I was looking around at some Baltimore data and found you can read it into R and see the most recent posts on the 311 system.

I was trying to look at how fast complaints are closed, or how many complaints there are on average by neighborhood. The data is good, but it only gives you the 100 most recent complaints. I would love to know if you can get access to the larger set (though it may be too large, and you might have to use SQL or jQuery to get at it).

Code

A HopStat and Jump Away

Trying to at least Doggie Paddle through the Sea of Data, Contributor to http://bmorebiostat.com

Monthly Archives: February 2013
