Abstract/Summary
This blog post is about options for making dynamic documents in Stata using Markdown, covering StatWeave and knitr.do, a do file written by a user. I will discuss some of the capacities of these options and show options for custom use if you know how to use RMarkdown.
Knitr: Dynamic Documents
If you use R, or even if you don't, you may have heard of the phrases “dynamic documents”, “reproducible reports”, “markdown”, “Rmarkdown”, or more specifically “knitting/knitr”. For pronunciation: according to the knitr documentation in R:
The pronunciation of knitr is similar to neater (neater than what?) or you can think of knitter (but it is single t). The name comes from knit + R (while Sweave = S + weave).
Now, if you haven't heard it, well I guess now you have. But more importantly, do some research on knitr. It's awesome, and there's even a book from the author of knitr, Yihui Xie, and a corresponding GitHub repository. Also, you may want to read why you should care about reproducible results.
Overall, knitr is a system that allows for dynamic documents, which I will define as files that contain code and prose/text/words/comments/notes.
Knitr Languages
Why am I talking about Stata? Well, I use Stata. Also, if you're using SAS, Python, CoffeeScript or some other languages, then knitr has already incorporated these into the R system: http://yihui.name/knitr/demo/engines/.
Let's just list some resources for doing some knitting in Stata:
- There is a post on the Stata list-serv: http://www.stata.com/statalist/archive/2012-07/msg00323.html
- And the corresponding GitHub repo from that post (link fixed): https://github.com/amarder/stata-tutorial, which is intended to be run completely in Stata, and which I will refer to as knitr.do.
- I tried that out, and a discussion sprang up on Stack Overflow after someone asked me about it: http://stackoverflow.com/questions/20539177/markdown-in-other-statistics-packages-than-r/20573096
- StatWeave (a precursor to knitr) is still a viable option from Russell Lenth from U Iowa: http://homepage.cs.uiowa.edu/~rlenth/StatWeave/.
Now, I highly suggest taking a look at the GitHub repo and knitr.do and StatWeave. Actually, no. Stop reading this post and check it out. I can wait. Go. I'm going to talk about how to do this within R.
And… We're back
So these options are good and are mainly options to create a markdown document that Stata will run/manipulate. This is vital for someone who doesn't know R. Here are some notes I have:
- knitr has a lot of good options already made and is expanding. No reinventing the wheel with respect to highlighting/parsing/etc. Also, a large community uses it.
- I want to know one syntax for markdown. OK, maybe two, one for html, the other for LaTeX.
- knitr.do uses parsing based on indenting from what I can see. I like demarcating code; I feel like it's safer/more robust. This could easily be changed if the user wanted it. StatWeave allows code demarcation by \begin{Statacode} and \end{Statacode}.
- knitr.do didn't seem to have an inline code option. StatWeave allows you to add inline text. For example, stating that the dataset had 100 rows and the maximum age was 30. StatWeave uses the Sweave syntax, but uses \Stataexpr instead of \Sexpr, so that you could fill in that 30 by using \Stataexpr{max(age)} instead of writing 30. This is a huge capability for truly dynamic documents.
- StatWeave is maintained mainly, I believe, by one person (Russell Lenth). This is how knitr started in some capacity before it became more popular, but it was built upon a community-used system, R, that had pre-existing software that was similar (Sweave). Hence, I think knitr has more longevity and more learning capital compared to either option. Also, StatWeave (or its functionality) may be integrated into knitr.
- StatWeave can only be written in LaTeX syntax (since an OpenOffice bug precludes it from making odt docs). knitr.do can do markdown, which can be converted to pdf, docx, html, or many other formats using pandoc.
- Neither option allows for automatically saving and naming plots in any system I can see. This must be done in Stata code using normal graph saving methods, e.g. graph export.
- knitr.do inherently uses logs. I can't really determine what StatWeave uses because it's written in Java.
Now, I'm going to assume you know how to use knitr and see how we could do some reporting using knitr.
99 Problems and they're Stata problems
If you are running knitr from R, again, Yihui has incorporated a lot of other languages to process. What are some potential problems with processing the same way in Stata?
- Stata is inherently just a command line, but when you call it, it calls a GUI if you don't have Stata(console). More on Stata(console) later.
- In order to use Stata from the command line, you probably need to put the path to Stata in your PATH variable: http://www.stata.com/support/faqs/mac/advanced-topics/. For example, the path /Applications/Stata/Stata.app/Contents/MacOS/ is in my PATH, so that I can go to the Terminal and type Stata. (Side note: this is the way to start multiple Stata sessions on a Mac.) Let's assume you didn't do this though.
Let's just make a test .do file:
cat Stata_Markdown.do
clear
disp "hello world!"
exit, clear
Now how to run it? Let's use bash, which is supported by knitr. So I just have in my knitr code chunk options, engine='bash'. Don't forget comment="" if you don't want ## to be printed in front of the output (which is the default comment prefix).
stata -b Stata_Markdown.do
echo $? ### print last return
127
Since echo $? is supposed to print 0 if there is no error, there was an error (127 is the status bash returns when it cannot find the command). Worse, it was a silent error in the sense that no error message was printed as output for bash. This error occurs because my bash doesn't have a stata or Stata command. We can either make aliases in .bash_profile or .bashrc or again put Stata in my path, but let's just be explicit about the Stata command by using the full path: for me, it's /Applications/Stata/Stata.app/Contents/MacOS/stata. We also don't see anything from the log file, which makes sense because nothing happened.
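If you'd rather have the chunk fail loudly than print a bare 127, here is a minimal sketch of a lookup helper. The function name find_stata and the STATA_BIN variable are made up for illustration, and the fallback path is simply the one used in this post; your install will likely differ.

```shell
# Hypothetical helper: print the first usable Stata executable, or fail.
# STATA_BIN and the fallback path are assumptions; adjust for your install.
find_stata () {
  for candidate in "${STATA_BIN:-}" stata Stata \
      /Applications/Stata/Stata.app/Contents/MacOS/stata; do
    # Skip the empty entry left behind when STATA_BIN is unset.
    if [ -z "$candidate" ]; then
      continue
    fi
    # command -v succeeds for names on PATH and for executable full paths.
    if command -v "$candidate" >/dev/null 2>&1; then
      command -v "$candidate"
      return 0
    fi
  done
  echo "Stata executable not found; set STATA_BIN or fix PATH" >&2
  return 127   # same status the shell uses for "command not found"
}
```

In a bash chunk you could then write something like stata_bin=$(find_stata) || exit 1 and call "$stata_bin" -b Stata_Markdown.do, so that a missing executable stops the chunk with a message.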
- But a real problem is the Stata log file is not made in a “timely” manner in this process. Let's rerun the code with the full path for Stata:
/Applications/Stata/Stata.app/Contents/MacOS/stata -b "Stata_Markdown.do"
echo $?
cat Stata_Markdown.log
0

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   11.2   Copyright 1985-2009 StataCorp LP
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)

35-student Stata lab perpetual license:
       Serial number:  30110513240
         Licensed to:  Biostat
                       Johns Hopkins University

Notes:
      1.  10.00 MB allocated to data
      2.  Stata running in batch mode

. do Stata_Markdown.do

. clear

. disp "hello world!"
hello world!

. exit, clear

end of do-file
- Success! Well, it worked in the sense that the exit status was 0, but not really a “success” as nothing was printed. So what does this code for running Stata mean?
  - /Applications/Stata/Stata.app/Contents/MacOS/stata says “run stata”
  - -b says I want to run in “batch mode”, which is much different than “beast mode”
  - Stata_Markdown.do is the filename I want to run
Now, if there was a space in the path to Stata, it needs to be quoted with ". But IMPORTANTLY, the Stata console came up and I had to hit “OK”, INTERACTIVELY!! Not very automated, but we'll fix this in a moment.
- But what about the cat Stata_Markdown.log? That log is auto-generated by the Stata command. Was the log empty?
cat Stata_Markdown.log

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   11.2   Copyright 1985-2009 StataCorp LP
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)

35-student Stata lab perpetual license:
       Serial number:  30110513240
         Licensed to:  Biostat
                       Johns Hopkins University

Notes:
      1.  10.00 MB allocated to data
      2.  Stata running in batch mode

. do Stata_Markdown.do

. clear

. disp "hello world!"
hello world!

. exit, clear

end of do-file
WHAT? Running the command again gives us what we want? Now, we can either use 2 code chunks or, if we set the results='hold' option in knitr, things work fine in one chunk.
- You can get around this unwanted “interactivity” using the console version of Stata, but I didn't set it up and Stata for Mac says:
> Can I display graphs with Stata(console)?
> No. Stata(console) is a text-based application and has no graphical display capabilities. However, it can generate and save Stata graphs, which can then be viewed with Stata(GUI). Stata(console) can also convert Stata graphs to PostScript and save them as files.
Also, Stata(console) for Mac needs Stata/SE or Stata/MP (aka more costly Stata) according to Section C.4 Stata(console) for Mac OS X. So for most users you'd have to buy a different Stata.
- Another way of getting around this interaction would be having Stata auto-exit; let's do that. Exiting Stata without any interaction is possible with a specific option when you exit: exit, clear STATA. Let's look at our new script Stata_Markdown_Exit.do:
cat Stata_Markdown_Exit.do
clear
disp "hello world!"
exit, clear STATA
Now let's run it.
/Applications/Stata/Stata.app/Contents/MacOS/stata -b "Stata_Markdown_Exit.do"
echo $?
cat Stata_Markdown_Exit.log
0

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   11.2   Copyright 1985-2009 StataCorp LP
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)

35-student Stata lab perpetual license:
       Serial number:  30110513240
         Licensed to:  Biostat
                       Johns Hopkins University

Notes:
      1.  10.00 MB allocated to data
      2.  Stata running in batch mode

. do Stata_Markdown_Exit.do

. clear

. disp "hello world!"
hello world!

. exit, clear STATA
It looks the same as before with no output, but I did not have to interact with Stata. Note: if you use & at the end of the command, the echo $? will come up zero, because bash will see it as a background process.
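For the log that shows up late, one way around it is to poll for the file before printing it. Here is a minimal sketch; the name wait_for_log and the 10-second default are assumptions of mine, and it only checks that the file exists and is non-empty, not that Stata has finished writing to it.

```shell
# Hypothetical helper: wait until a log file exists and is non-empty.
# Usage: wait_for_log FILE [TIMEOUT_SECONDS]   (default timeout: 10)
wait_for_log () {
  file=$1
  timeout=${2:-10}
  waited=0
  # -s is true only when the file exists and has a size greater than zero.
  while [ ! -s "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1   # gave up: the log never appeared
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

In a single bash chunk this could sit between the Stata call and the cat, e.g. wait_for_log Stata_Markdown.log 10 && cat Stata_Markdown.log.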
But I don't want to show the whole script all the time
You may notice that I printed the entire log that was created with Stata using cat. Honestly, I don't like Stata logs. They seem like a nuisance. I have a script and can make outputs, so why do I need a log? But here, it seems useful. But what happens when you want to show parts of a script at different points? You can obviously make a series of .do files. Not really a good solution.
What's a better solution? Create logs in your Stata code and then cat them to different code chunks. Here's an example:
cat Stata_Markdown_logs.do
clear
log using print_hello.log, replace
disp "hello world!"
log close
log using run_summ.log, replace
set obs 100
gen x = rnormal(100)
summ x
log close
exit, clear STATA
/Applications/Stata/Stata.app/Contents/MacOS/stata -b "Stata_Markdown_logs.do"
Now, since print_hello.log and run_summ.log were created, I can just do:
cat print_hello.log
--------------------------------------------------------------------------------------------------------
      name:  <unnamed>
       log:  /Users/muschellij2/Dropbox/Public/WordPress_Hopstat/Stata_Markdown/print_hello.log
  log type:  text
 opened on:  11 Jan 2014, 18:20:29

. disp "hello world!"
hello world!

. log close
      name:  <unnamed>
       log:  /Users/muschellij2/Dropbox/Public/WordPress_Hopstat/Stata_Markdown/print_hello.log
  log type:  text
 closed on:  11 Jan 2014, 18:20:29
--------------------------------------------------------------------------------------------------------
and then later print:
cat run_summ.log
--------------------------------------------------------------------------------------------------------
      name:  <unnamed>
       log:  /Users/muschellij2/Dropbox/Public/WordPress_Hopstat/Stata_Markdown/run_summ.log
  log type:  text
 opened on:  11 Jan 2014, 18:20:29

. set obs 100
obs was 0, now 100

. gen x = rnormal(100)

. summ x

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |       100    100.0006    1.061928   97.11491   101.8377

. log close
      name:  <unnamed>
       log:  /Users/muschellij2/Dropbox/Public/WordPress_Hopstat/Stata_Markdown/run_summ.log
  log type:  text
 closed on:  11 Jan 2014, 18:20:29
--------------------------------------------------------------------------------------------------------
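If typing log using / log close around every chunk gets tedious, the wrapping can be generated. This is only a sketch: wrap_chunk is a hypothetical name, and it assumes each chunk's Stata commands arrive on standard input and that the chunk name is safe to use as a file name.

```shell
# Hypothetical helper: wrap Stata commands read from stdin in a named log,
# mirroring the "log using NAME.log, replace ... log close" pattern above.
wrap_chunk () {
  printf 'log using %s.log, replace\n' "$1"
  cat               # pass the chunk's Stata commands through unchanged
  printf 'log close\n'
}

# Example: print the do-file fragment for the run_summ chunk.
printf 'set obs 100\ngen x = rnormal(100)\nsumm x\n' | wrap_chunk run_summ
```

Redirecting a few of these fragments into one .do file (with clear at the top and exit, clear STATA at the bottom) gives the same layout as Stata_Markdown_logs.do.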
No header/footer from log
This works, but you have a header and footer that you probably can't delete with some simple option. Now, obviously you can read them in R and do string manipulation and then print them back out, but that's a little convoluted. Regardless, I wrote a simple function in R that will do it (R code):
catlog <- function(filename, runcat = TRUE, comment = "") {
    # read the log and drop the 6-line header and 6-line footer
    x = readLines(filename)
    lenx = length(x)
    x = x[7:(lenx - 6)]
    # overwrite the log with the trimmed version, so call this once per log
    writeLines(x, filename)
    if (runcat) cat(x, sep = "\n")
}
catlog("run_summ.log")

. set obs 100
obs was 0, now 100

. gen x = rnormal(100)

. summ x

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |       100    100.0006    1.061928   97.11491   101.8377
which simply drops the first 6 and last 6 lines of the log. Thus, you can print it entirely from R or, since the trimmed log was saved back to the file, print it using cat from bash:
cat run_summ.log
. set obs 100
obs was 0, now 100

. gen x = rnormal(100)

. summ x

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |       100    100.0006    1.061928   97.11491   101.8377
or in bash, one example would be:
nlines=`awk 'END{print NR}' print_hello.log`
nhead=`expr $nlines - 6`
ntail=`expr $nlines - 12`
head -$nhead print_hello.log | tail -$ntail

. disp "hello world!"
hello world!
or even better, let's make a function for bash that will do it:
catlog () {
  # quote $1 so log paths with spaces still work
  nlines=`awk 'END{print NR}' "$1"`
  nhead=`expr $nlines - 6`
  ntail=`expr $nlines - 12`
  head -$nhead "$1" | tail -$ntail
}
catlog print_hello.log

. disp "hello world!"
hello world!
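The head/tail arithmetic above goes wrong if the log has fewer than 13 lines, since the counts turn negative. As a hedged alternative, here is a single-pass awk sketch; trim_log is a hypothetical name, the default of 6 lines comes from the logs shown in this post, and it prints nothing for a log that is too short instead of failing.

```shell
# Hypothetical helper: print a log without its first and last N lines.
# Usage: trim_log FILE [N]   (N defaults to 6, the header/footer size above)
trim_log () {
  awk -v n="${2:-6}" '
    { line[NR] = $0 }                 # remember every line by its number
    END {
      for (i = n + 1; i <= NR - n; i++) print line[i]
    }
  ' "$1"
}
```

Unlike the R version, this does not overwrite the log, so it can be called on the same file as often as needed, e.g. trim_log print_hello.log.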
OK, I can see the allure of using StatWeave in some capacity at this point. But still, if you use knitr, this may make sense, or be the way you want to do it, without going to StatWeave.
Cleanup
You can just do some .log clean up using:
rm *.log
if you want to delete all logs in your folder (assuming you never changed directories).
Thoughts
You can do “markdown” in Stata. My thoughts:
1. It's complicated.
2. The knitr.do file is a good start, and it's great that it's totally within Stata (you still need a Markdown converter), but it doesn't have code demarcation. It also doesn't do inline commands, which are a requirement for a dynamic doc, so that you don't have to fill in the numbers by hand and can do it dynamically with code.
3. StatWeave has more functionality than knitr.do, including inline functions, but uses added software (a Java program) and can't do general markdown; the user needs to understand LaTeX.
4. Plotting hasn't really been integrated. You can always do a graph export myplot.pdf, as(pdf) (on Mac) or whatever and then just put <img src="myplot.pdf"> in your html, or \includegraphics{myplot.pdf} in LaTeX, but that's not as integrated as it is in other systems.
5. If you make it to a Markdown document, you can use the great pandoc to potentially then just make it a Word .doc.
6. It will likely be integrated in the future. The question is how close is that “future”?
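Since plots have to be exported by hand, the links to them can at least be generated. A rough sketch, assuming the plots were saved as PNG files from Stata with graph export (which formats are available depends on your Stata flavor and platform); plot_links is a made-up name.

```shell
# Hypothetical helper: print one Markdown image line per PNG in a directory.
# Usage: plot_links [DIR]   (defaults to the current directory)
plot_links () {
  dir=${1:-.}
  for f in "$dir"/*.png; do
    # With no matches the glob stays literal, so skip anything that is not a file.
    if [ ! -e "$f" ]; then
      continue
    fi
    name=$(basename "$f" .png)
    printf '![%s](%s)\n' "$name" "$f"
  done
}
```

Run from a bash chunk with results='asis', the printed lines would be picked up as images when the Markdown is converted.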
Conclusion
I like both options for their respective pieces, but my main concern with either option is putting in a lot of time and then having them become obsolete with knitr integration. That's not a big problem for me, since I know knitr, but it's something to think about for someone who doesn't. My recommendation: if you know and want to use LaTeX or need inline numbers, go with StatWeave. Otherwise knitr.do may do the trick. Also, I've given you some directions on “growing your own”, which is the most customizable for you but even worse with respect to time, reinventing the wheel, and no support from others.
Anyway, those are the current options I know about when doing Markdown with Stata.
I was just thinking about a similar thing the other day! I think, at least.
Thanks for taking the time to put this all together.
I’m still a bit of a newbie to all of this, so I apologise if my question seems a bit silly. It seems that if I’m looking for a way to write reproducible documents that call both R and STATA code, then StatWeave will be the best option – but it seems possible (maybe) that knitr will have STATA functionality in the future?
Cheers!
It seems possible, and I just asked Yihui how to implement an engine = “bash” in knitr.
For now, incorporating both sets of code I feel as though there are a few options:
1. Statweave
2. Use knitr and have a .do file that does everything you need to in Stata and either use logs or file write (Stata commands) to a generic text file that you can read into R. Call this Stata file from bash in R.
3. Use a Makefile-type of system (most advanced).
Before undertaking any of these options, obviously make sure you need to incorporate Stata.
Good luck.
John
Thanks for your reply John,
I think in the mean time I’ll be using Statweave, as we really need the dynamic documentation. I just need to sink some time into using Statweave.
Hopefully STATA will be added into the engines that knitr can run…it would save a great deal of time!
slightly off topic, but can you recommend any good tutorials for estout, outreg2, or tabout?
Cheers,
Nick.
Nick – at the moment I can’t think of any.
I tend to use estout (actually esttab, which wraps estout) and estpost functions to do my tabling in Stata to LaTeX, but tend to tweak them a lot still. I would likely pick just one and go with it.
What are you trying to do? If I know, I can write a quick tutorial post on it.
Cheers,
John
how about markdoc, ketchup, and weaver ssc package for dynamic documents?
and rsource ssc packages for R integration?
You might find https://pypi.python.org/pypi/Statpipe/ useful… it takes some of the pain away from calling Stata with the right params to avoid popups etc.
Knitr now has a stata engine so that you can run your stata code from within RStudio just as you would with R! However, it runs Stata in batch mode so figures would have to be saved to png or whatever and, perhaps more awkwardly, stata “forgets” everything that has happened in previous chunks (I haven’t yet tried out caching though…) And because of that, I doubt whether the inline text would work to create truly dynamic documents…. But in any case, it’s an exciting development I think….
You might want to look at the Stata command markstat, available through the SSC archive and fully documented at http://data.princeton.edu/stata/markdown. You prepare a script that has a Markdown narrative and Stata commands, much like rmarkdown combines Markdown and R. It can produce HTML as well as PDF via LaTeX from the same script. A lot of the power comes from using Pandoc under the hood. Upcoming updates will generate presentations and will also run R code.