Be Careful with Using Model Design in R

Posted on June 26, 2014 by strictlystat

In R, two useful functions for making design matrices are model.frame and model.matrix. I will discuss some of the differences in behavior across and within the two functions. I also have an example where I ran into this problem and it cost me time.

Using `model.frame` for a design matrix

Whenever I use the word “design” I mean the systematic part of a model; in this case, linear models. For example, if you say

$\displaystyle Y = X\beta + \varepsilon$

I'm referring to the $X$ as the design.

model.frame creates a design data.frame of the covariates given, keeping any factor variables as factors with the same levels. Let's create a toy data.frame called df, where Y is a normal random variable linearly related to two variables in the dataset:

n = 100
df = data.frame(X1 = rnorm(n), 
                X2 = rpois(n, lambda = 5), 
                X3 = rnorm(n, mean = 4, sd = 2), 
                Sex = factor(rep(c("Male", "Female"), each = n/2)))
df$Y = with(df, X1 + 3*X2 + rnorm(n, sd = 10))

Now, if Y is included on the left hand side of the formula, then it is included in the output of model.frame as such:

model.df = model.frame(Y ~ X1 + X2 + X3 + Sex, data=df)
head(model.df, 2)

       Y      X1 X2    X3  Sex
1  9.223  0.3849  2 5.960 Male
2 12.467 -0.5061  5 1.651 Male

This gives you a data.frame with the outcome and the covariates fitting that outcome (not including an intercept).

If Y is not included on the left hand side of the formula:

model.df2 = model.frame(~ X1 + X2 + X3 + Sex, data=df)
head(model.df2, 2)

       X1 X2    X3  Sex
1  0.3849  2 5.960 Male
2 -0.5061  5 1.651 Male

then we see that Y is not included in the output of model.frame. Thus, if you want to create a “design data.frame”, then you likely will want to remove Y from the formula.

Note, in both cases, we see that there is no intercept term added to the data.frame and nothing is done to factor variables.
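Both points are easy to check by inspecting the result. The snippet below is a minimal sketch on a small made-up data.frame; the name `toy` and its values are my own illustration, not part of the example above.

```r
# Small illustrative data set; `toy` is a hypothetical stand-in for df
toy = data.frame(X1 = c(0.5, -1.2, 0.3, 2.0),
                 Sex = factor(c("Male", "Female", "Male", "Female")))
toy.mf = model.frame(~ X1 + Sex, data = toy)

# Column classes are untouched: the factor is still a factor
col.classes = sapply(toy.mf, class)
print(col.classes)

# No intercept column has been added
has.intercept = "(Intercept)" %in% colnames(toy.mf)
print(has.intercept)
```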

Using `model.matrix`

Most of the time I build model design elements with model.matrix so that I can use matrix multiplication to make procedures faster or do “smarter” (i.e. fewer) computations. I will discuss the differences between model.frame and model.matrix using our toy dataset and also discuss one gotcha when using model.matrix with lm.

Let's use model.matrix with and without Y on the left hand side of the formula.

model.mat = model.matrix(Y ~ X1 + X2 + X3 + Sex, data=df)
model.mat2 = model.matrix(~ X1 + X2 + X3 + Sex, data=df)
all.equal(model.mat, model.mat2)

[1] TRUE

We see that using any element on the left hand side doesn't affect the output of model.matrix. Difference #1 from model.frame.

Let's look at the output from model.matrix.

head(model.mat, 3)

  (Intercept)      X1 X2    X3 SexMale
1           1  0.3849  2 5.960       1
2           1 -0.5061  5 1.651       1
3           1 -1.3739  3 3.197       1

We see a column named (Intercept) was added, a column of ones for the $\beta_0$ usually in a model. Difference #2 from model.frame. Also, we see that our factor Sex was converted to an indicator (numeric) variable. Difference #3 from model.frame. We only have 2 levels in Sex in this example. In general, with an intercept in the formula and R's default treatment contrasts, a factor with L levels will generate L - 1 indicator variables using model.matrix; the first level serves as the reference.
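To see the indicator coding with more than two levels, here is a small sketch with a made-up three-level factor (the name `Group` and its levels are assumptions for illustration). It uses R's default treatment contrasts and also shows what happens when the intercept is dropped from the formula.

```r
# Hypothetical three-level factor
toy = data.frame(Group = factor(c("A", "B", "C", "A", "B", "C")))

# With an intercept: L - 1 = 2 indicator columns, level "A" is the reference
mm = model.matrix(~ Group, data = toy)
print(colnames(mm))

# Without an intercept: all L = 3 levels get their own indicator column
mm.noint = model.matrix(~ Group - 1, data = toy)
print(colnames(mm.noint))
```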

Review over, how did this affect me?

I wanted to discuss the differences above to note them if you haven't seen them before. Also, I want to show that using model.matrix and a -1 or 0 in a formula can affect how some of your results are calculated using linear models with lm. Running the model with our now-ready model matrix:

mod = lm(df$Y ~ model.mat)
summary(mod)

Call:
lm(formula = df$Y ~ model.mat)

Residuals:
    Min      1Q  Median      3Q     Max 
-25.644  -8.617   0.448   7.648  30.245 

Coefficients: (1 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -3.1871     3.9533   -0.81     0.42    
model.mat(Intercept)       NA         NA      NA       NA    
model.matX1            1.1894     1.0987    1.08     0.28    
model.matX2            3.6243     0.5790    6.26  1.1e-08 ***
model.matX3            0.0164     0.5422    0.03     0.98    
model.matSexMale       2.6174     2.2726    1.15     0.25    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.2 on 95 degrees of freedom
Multiple R-squared:  0.312, Adjusted R-squared:  0.283 
F-statistic: 10.8 on 4 and 95 DF,  p-value: 3.03e-07

We see that the intercept term created in model.matrix was made NA because it's identical to the intercept term inherently generated by R and is linearly dependent. This is also seen with the warning: “(1 not defined because of singularities)”. This is good to know, but not revelatory or new; just be aware.

When `model.matrix` goes … differently

Well model.mat already has an intercept, so why not just take out the intercept term with a -1? The model should be the same, right? I would assume this is the case, but let's do it:

mod.noint = lm(df$Y ~ model.mat - 1)
summary(mod.noint)

Call:
lm(formula = df$Y ~ model.mat - 1)

Residuals:
    Min      1Q  Median      3Q     Max 
-25.644  -8.617   0.448   7.648  30.245 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
model.mat(Intercept)  -3.1871     3.9533   -0.81     0.42    
model.matX1            1.1894     1.0987    1.08     0.28    
model.matX2            3.6243     0.5790    6.26  1.1e-08 ***
model.matX3            0.0164     0.5422    0.03     0.98    
model.matSexMale       2.6174     2.2726    1.15     0.25    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.2 on 95 degrees of freedom
Multiple R-squared:  0.717, Adjusted R-squared:  0.702 
F-statistic: 48.1 on 5 and 95 DF,  p-value: <2e-16

We see the coefficient estimates look exactly the same (except we have removed the NA row). But note the r.squared, adjusted.r.squared and F-statistic values!

Let's focus on $R^2$ :

summary(mod)$r.squared

[1] 0.3121

summary(mod.noint)$r.squared

[1] 0.7168

These are different – way different – which seems off. Why? If you look into the summary.lm code, you will notice that some of the statements involve the expression:

attr(z$terms, "intercept")

and calculate quantities differently depending on whether it flags that test as TRUE or FALSE.

Let's look again at our two models built from model.matrix:

attr(mod$terms, "intercept")

[1] 1

attr(mod.noint$terms, "intercept")

[1] 0

We see that when you construct the intercept yourself, this attribute is 0, so the test evaluates to FALSE, even though the model has an “intercept”. The model has an intercept, but R hasn't assigned it that attribute. This affects the calculation of the model sum of squares (from summary.lm):

mss <- if (attr(z$terms, "intercept")) 
sum((f - mean(f))^2)
else sum(f^2)

as well as others. So be aware of this behavior.
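To make the effect concrete, the sketch below reproduces both $R^2$ values by hand from the same fitted values, using the ratio mss/(mss + rss) that summary.lm computes. The simulated data and variable names are my own, chosen only for illustration.

```r
set.seed(1)
n = 50
x = rnorm(n)
y = 5 + 2 * x + rnorm(n)
X = model.matrix(~ x)

fit.int   = lm(y ~ x)       # R adds the intercept; the "intercept" attribute is 1
fit.noint = lm(y ~ X - 1)   # same fitted values; the "intercept" attribute is 0

f   = fitted(fit.int)
rss = sum(resid(fit.int)^2)

# Centered vs. uncentered model sum of squares
mss.centered   = sum((f - mean(f))^2)
mss.uncentered = sum(f^2)

r2.centered   = mss.centered / (mss.centered + rss)
r2.uncentered = mss.uncentered / (mss.uncentered + rss)

# Each hand-computed value should match the corresponding summary
print(c(by.hand = r2.centered,   summary.lm = summary(fit.int)$r.squared))
print(c(by.hand = r2.uncentered, summary.lm = summary(fit.noint)$r.squared))
```

The fit is identical in both cases; only the baseline the fit is compared against changes (the mean of the fitted values versus zero).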

Conclusion

I was writing something for a linear model that allowed me to compute a large number of regressions (> 1,000,000) on a matrix of outcomes with a fixed adjustment matrix and changing 1 piece of the design matrix. I was doing a voxel-on-scalar regression with covariate adjustment, but also wanted to incorporate the ability to compute the results on a matrix of permutations of Y.

Either way, I ran into a problem checking my results against the output from lm and it took a while to see why the r.squared values were different but all other elements the same. I realized that this was because I was constructing my own design matrix using model.matrix and was using -1 when running lm and those results were not being calculated correctly. Hope you don't run into this problem ever.
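One way to sidestep the issue when you already hold a matrix from model.matrix, sketched here under my own assumptions (simulated data, made-up names), is to drop the (Intercept) column and let lm add its own intercept. The terms object then carries the intercept attribute and the summary statistics agree with the ordinary formula fit.

```r
set.seed(2)
n = 50
toy = data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y = 1 + 2 * toy$x1 - toy$x2 + rnorm(n)

X = model.matrix(~ x1 + x2, data = toy)

# Reference fit using the formula interface
fit.formula = lm(y ~ x1 + x2, data = toy)

# Drop the "(Intercept)" column (column 1) and let lm supply the intercept
fit.matrix = lm(toy$y ~ X[, -1])

r2.formula = summary(fit.formula)$r.squared
r2.matrix  = summary(fit.matrix)$r.squared
print(c(formula = r2.formula, matrix = r2.matrix))
```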

Aside: What I wanted to do

Just to be precise, my model was:

$\displaystyle Y = X\beta + Z\theta + \varepsilon$

where $X$ was $n \times V$ ($n=100$, $V=100,000$), $Z$ was $n \times p$ ($p=5$) and $Y$ was $n\times 1$, but the model needed to be run $1000$ times with different permutations of $Y$. If I simply wanted p-values, I could switch $X$ and $Y$ to get those and run $1000$ lm commands versus running $100,000$ lm commands. (Running that many lm commands is not efficient – matrix inversions are time-consuming and should not be repeated needlessly.) I wanted $R^2$ values and $\beta$ coefficients as well, so I needed something more powerful. I know packages such as vows or limma can do these regressions – but they are usually meant for a design that is fixed rather than changing for every voxel, and usually the end result is a p-value.
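For readers curious how such a computation can be vectorized, here is one possible sketch. It is not necessarily what I implemented; it swaps in the Frisch-Waugh-Lovell result: residualize $Y$ and every column of $X$ on $Z$ once, and each voxel's $\beta$ is then a simple ratio of sums. The dimensions are shrunk and all names are made up for illustration.

```r
set.seed(3)
n = 100; V = 200; p = 3
X = matrix(rnorm(n * V), n, V)                       # one column per "voxel"
Z = cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))   # adjustment matrix, incl. intercept
y = rnorm(n) + 0.5 * X[, 1]

# Residualize y and all columns of X on Z with a single QR decomposition
qrZ = qr(Z)
ry  = qr.resid(qrZ, y)
RX  = qr.resid(qrZ, X)

# Adjusted slope for every voxel at once (ry is recycled down each column)
beta = colSums(RX * ry) / colSums(RX^2)

# Check the first voxel against lm
check = unname(coef(lm(y ~ X[, 1] + Z - 1))[1])
print(c(vectorized = beta[1], lm = check))

# The same idea for a matrix of permuted outcomes: a V x nperm matrix of slopes
Yperm = sapply(1:5, function(i) sample(y))
RY = qr.resid(qrZ, Yperm)
beta.perm = crossprod(RX, RY) / colSums(RX^2)
print(dim(beta.perm))
```

The appeal of this design is that the expensive step (the decomposition of $Z$) happens once, and everything else is matrix multiplication.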

I have it working and may release it into the wild soon. Let me know if you know of anything that will do this, including covariate adjustment and where you can run for a matrix of permuted $Y$ values.

fslr: An R Package Interfacing with FSL for Neuroimaging Analysis

Posted on June 17, 2014 by strictlystat

I use a set of neuroimaging tools, but my language of choice is R. FSL, which is from the University of Oxford's Functional MRI of the Brain (FMRIB), and stands for FMRIB Software Library, is one tool I commonly use. I wrote some wrapper functions into an R package called fslr and wanted to discuss some of the functions.

FSL – what is it?

FSL has a command line interface with shell-type syntax as well as a GUI (which I generally don't use). It has a lot of functions that are good for general imaging purposes – fslstats and fslmaths have loads of functionality. Using FSL during your pipeline is fine, but I don't like switching between shell and R too much in an analysis and like to have scripts that don't jump around too much, so I created fslr.

One of the main problems I have is that I read an image into R, manipulate it in some way, and then want to use an FSL function on that manipulated image. This presents a problem because I have an R object and not a NIfTI file, which FSL expects. I also want the result back in R, not in a file that I have to then find. fslr bridges the gap between R and FSL.

fslr functionality

fslr is implicitly linked to the oro.nifti package by Brandon Whitcher. The workflow is this:
1. Read NIfTI data into R using readNIfTI from oro.nifti. This is now an R object of class nifti.
2. Manipulate the data in some way. Maybe you applied a mask or changed some values.
3. Pass this nifti object to a function in fslr, which will write the object to a temporary NIfTI file, run the FSL command, and read the result into an R nifti object again using readNIfTI.

From the user's perspective, it's how it always is in R: pass in R object to R function and get out R object.

Alternatively, if the user specifies a filename of a NIfTI file instead of passing in an R object, fslr commands will simply run the FSL command without the file ever having to be read into R, yet the result can still be returned in R as an R object. This method may be much faster, especially if the image for the function to operate on is on the hard disk (otherwise it would be read in and written out before an FSL command was run).

`outfile` and `retimg` options

Any function that has an image output should have two arguments: outfile and retimg. The argument outfile defaults to NULL, which creates a temporary file using tempfile() that will be deleted when your R session is terminated. Alternatively, you can specify a path to the output file if you want it saved to disk.

The retimg argument is a logical indicator for whether you want the output read into R and returned (get it – return image). If retimg=FALSE and outfile=NULL, then the function will error: the outfile would be a temporary file that is later deleted and no image would be returned, so you would never be able to use the output image. I assume that combination is only ever specified by mistake.
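The argument handling described above can be sketched in base R. This is a hypothetical wrapper, not fslr's actual code: `run_tool` and `double_it` are made-up names, and an R function that reads one file and writes another stands in for the external FSL command.

```r
# Hypothetical wrapper illustrating the outfile/retimg pattern (not fslr code).
# `tool` stands in for an external command: it takes an input and an output path.
run_tool = function(x, tool, outfile = NULL, retimg = FALSE) {
  if (is.null(outfile) && !retimg) {
    stop("outfile is NULL and retimg is FALSE: the result would be lost")
  }
  # Write the R object to a temporary file for the tool to read
  infile = tempfile(fileext = ".rds")
  saveRDS(x, infile)
  # Fall back to a temporary output file when none is given
  if (is.null(outfile)) outfile = tempfile(fileext = ".rds")
  tool(infile, outfile)
  # Either read the result back into R or just report where it was written
  if (retimg) readRDS(outfile) else invisible(outfile)
}

# Stand-in "tool" that doubles whatever it reads
double_it = function(infile, outfile) saveRDS(readRDS(infile) * 2, outfile)

res = run_tool(1:3, double_it, retimg = TRUE)
print(res)
```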

Function call-outs

All functions use the system command in R to execute FSL commands. If you are using an R GUI instead of the command line in a shell, then you will want to specify options(fsl.path=). If you are using the command line and FSL is in your PATH, the path to FSL will be found using Sys.getenv("FSLDIR") as your shell environment variables will be available. The function have.fsl() provides a logical check as to whether you can run an fslr command. I have not tested this on a Windows machine. I did not use system2 because I ran into some problems, but may want to change this in a future release for portability. Let's look at an example of using fslr.

Example of `fslr` commands

Let's check to make sure we have FSL in our PATH:

library(fslr)
options(fsl.path="/usr/local/fsl")
have.fsl()

## [1] TRUE
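The kind of lookup described earlier (an R option, otherwise the FSLDIR environment variable) can be sketched in base R. The helper name `find_fsl_dir` and the order of the checks are assumptions for illustration; this is not the fslr source.

```r
# Hypothetical helper, not part of fslr: check the R option first,
# then fall back to the FSLDIR environment variable.
find_fsl_dir = function() {
  path = getOption("fsl.path")
  if (is.null(path) || !nzchar(path)) path = Sys.getenv("FSLDIR")
  # Sys.getenv returns "" when the variable is unset
  if (nzchar(path)) path else NA_character_
}

print(find_fsl_dir())
```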

Similarly to fsl.path, you should specify an output type, usually NIFTI_GZ. See here for more information about output type.

options(fsl.outputtype = "NIFTI_GZ")

Reading in Data

Here I'm going to read in a template T1 MRI brain image (no skull), with 1mm resolution and visualize it.

fname = file.path( getOption("fsl.path"), "data", "standard", "MNI152_T1_1mm_brain.nii.gz")
img = readNIfTI(fname)
print(img)

## NIfTI-1 format
##   Type            : nifti
##   Data Type       : 4 (INT16)
##   Bits per Pixel  : 16
##   Slice Code      : 0 (Unknown)
##   Intent Code     : 0 (None)
##   Qform Code      : 4 (MNI_152)
##   Sform Code      : 4 (MNI_152)
##   Dimension       : 182 x 218 x 182
##   Pixel Dimension : 1 x 1 x 1
##   Voxel Units     : mm
##   Time Units      : sec

orthographic(img)

plot of chunk readin

Smoothing

Let's smooth the data using a 5mm Gaussian kernel and view the results:

smooth = fslsmooth(img, sigma = 5, retimg=TRUE)
orthographic(smooth)

plot of chunk smooth

That example is a little boring in that this could be done much more easily in FSL than using fslr. If the data is manipulated beforehand, and in an exploratory way, it may be easier to see fslr's use. Let's Z-score the image and then keep the z-scores above 2.

Z-score and threshold

thresh.img = img
thresh.img = (thresh.img - mean(thresh.img))/sd(thresh.img)
thresh.img[thresh.img < 2] = NA
orthographic(thresh.img)

plot of chunk theshimg

What? An empty picture? That doesn't make sense. Looking at the histogram of thresh.img we see there is data:

hist(thresh.img)

plot of chunk thresh_hist

So what gives? The orthographic function from oro.nifti uses the slots cal_min and cal_max to determine the grayscale for the picture. fslr also has some helper functions to make it easier to rescale these values so that you can visualize the images again.

Setting `cal_min` and `cal_max`

thresh.img = cal_img(thresh.img)
orthographic(thresh.img)

plot of chunk thresh_ortho

There we go. Looks like some white matter regions. Now let's smooth these z-scores using a 4mm Gaussian kernel.

smooth.thresh = fslsmooth(thresh.img, sigma = 4, retimg=TRUE)
orthographic(smooth.thresh)

plot of chunk smooth.thresh

Now you can do all your fun statistics and be able to call FSL and keep everything in R! Also for most of the functions, there is a FUNCTION.help function that will print out FSL's help file:

fslhd.help()

## 
## Usage: fslhd [-x] <input>
##        -x : instead print an XML-style NIFTI header

But other packages exist!!

Some may think “Hmm I had a package with that functionality decades ago”. Other packages in R exist and have functions for images. I know this. But I believe:

R should be able to integrate other established and available neuroimaging software. Python is doing this everywhere and seems to be helpful for the Python part of the community.
Some of the functions FSL has are not implemented in R. What package do I call for skull-stripping an image like BET from FSL? You can call fslbet in fslr, by the way.
Some functions are faster in FSL than their counterparts in R. Large 3D Gaussian smoothers take a long time in some R packages.
It allows you to still have your pipeline written in R even if it calls FSL or other languages.
If someone wrote it before, I don't want to write it again.

I am writing up a more comprehensive white paper hopefully in the coming weeks. Don't forget to check out other packages like ANTsR which is a wrapper for ANTS in R. I'm excited to start using it, especially for its MRI inhomogeneity correction.

Work to be done

I haven't implemented much of the software for FEAT, which is part of FSL's fMRI analysis pipeline. I hope to do that in the future, but would love feedback if people would want that integration. Always – feedback is welcome on other parts as well.

How to Write a Lot

Posted on June 15, 2014 by strictlystat

I recently just finished reading How to Write a Lot: A Practical Guide to Productive Academic Writing by Paul Silvia. Hilary Parker had recommended this book a few years ago and I just got around to reading it. I highly recommend it: it's not expensive ($10) on Amazon and it's free if you swing by my office and borrow my copy. In this post I wanted to summarize some of the key points and reflect on my experience after trying the strategies recommended.

Make a Writing Schedule and Stick to It

If you didn't read the section header, let me reiterate:

Make a Writing Schedule and Stick to It

Silvia argues that making a schedule and sticking to it is the only strategy that works for writing. Though this one statement summarizes the book's message, you should still read it. The title of the book denotes it as “A Practical Guide to Productive Academic Writing”. The book tells you not only that making a writing schedule and sticking to it is what you must do to write, but how to do it. In addition, chapter 2 – my favorite – lists specious barriers to writing (aka excuses) that people (including me) make and that stop them from even starting to write. This chapter helps you realize that no thing or person other than you is stopping you from writing.

Outline Your Writing

One of the things I've heard about writing since grade school, and that I am still not good at doing, is outlining. You wouldn't build a car, toy, or building without a schematic, concept, or blueprint. Write an outline first – it can change later – before the full text.

Make Concrete Goals and Track your progress

As a biostatistician I'm trained to look at data – all kinds of data.

Whenever someone makes a claim, I reflexively think “Where is the data to back up that claim?”. When I say I'm going to write more, I need data. Therefore, I have to track it, even if only for myself. Silvia promotes a database or Excel spreadsheet for tracking. Though I tend to discourage Excel for data collection, Excel is not a bad option for this single-user single-use purpose. Pick something that's easy for you to use for tracking and format the data so it can be analyzed with statistical software. I will track my progress and may report the results in another post.

Similar to the outline stage for your manuscript, tracking your progress takes planning. At the beginning of a session, you must set your goals (plan) and record if you met those goals or not. The goals must be concrete. “Write X paper” is not concrete; “write 100 words on paper X” is. You don't have to (and probably shouldn't) write 10,000 words in a session. Goals don't have to be actual “writing”; doing a literature review, editing a paper, incorporating comments, or formatting are all part of the writing process. Your scheduled time is when you should do all of these parts of writing.

Start an Agraphia (Writing) Group

Writing is hard – friends help. Silvia calls the writing group an agraphia group as agraphia is the loss of the ability to communicate through writing. Peer pressure exists, even in graduate school; use it to your advantage. You are not alone in your fear/disdain of writing and your fellow grad students are a valuable resource. Having a bunch of “Not met” goals on your progress sheet is different than telling someone that you didn't meet your goals 3 weeks in a row. No one wants to feel like a failure, so this positive peer pressure will push you to perform.

Also, editing papers can be boring; you've been with the topic and paper for so long it's no longer exciting to you. To others, it's usually novel and easily seen as great work. Mistakes and unclear thoughts can be corrected. You may think something is clear, but fresh eyes can determine that for sure. Use your group to peer edit. You can use this editing to find out what your classmates/colleagues are doing in their research as well.

You (and the Rest of Us) will get Rejected

Your paper will likely get rejected. Now you can submit to the Journal of Universal Rejection, and you'll have 100% guarantee of rejection. For some journals, that may not be much higher than their actual rejection rates. If you get rejected, you're in the majority. Silvia notes that getting a paper back for revision is a good thing – it passed the level of flat-out rejection. I didn't always see it that way before. Moreover, Silvia says to write assuming your paper will be rejected. He says that this will make your writing less defensive and better. So you'll get rejected, but remember:

Take the criticism constructively – most reviewers want to make your paper better. Realize that.
Be quick and methodical with revisions. Revisions are higher priority than first drafts. Make sure you respond to all comments or explain why you haven't incorporated some reviewer's comments.
Don't let mean reviewers get you down. One quote I remember from a friend when I was younger that stuck with me: “I gotta be doing something right – cause I got HATERS!”. Let them fuel your hate fire. If you've ever heard the phrase “dust your shoulders off” and didn't know where it came from, read this. If you can revise, incorporate their comments. Getting angry or writing angry letters just wastes time you could spend doing more writing on one of your topics.
If it wasn't clear to the reviewers, it's not clear to the readers.

NB: What I could find for statistical journal acceptance rates: http://www.hindawi.com/journals/jps/, http://imstat.org/officials/reports/AnnualReports2010.pdf, and http://www.hsph.harvard.edu/bcoull/ENARJrWorkshop/XLPub2006.pdf.

Reflections

Time is a Zero-sum Game (or is it a flat circle?)

The time in a day is fixed and finite; each day is as long as the others. One of my friends and fellow Biostat grad students Alyssa Frazee likes to say frequently that “Time is a zero-sum game” in the sense that the activities we do now take up time that could be used for other activities.

As a result, many times I ask myself “When is it okay to relax?”. This feeling is common when I am writing a paper. Scheduling relieves much of the stress of when I am supposed to write. Meeting the goals for the days allows me to let go more easily and feel that it is OK to relax if my duties are done. There are fringe benefits to making a schedule.

I Write More

Again – I've only done it for about 2 weeks, but I feel as though I'm getting more done for my papers and writing more. The data will tell, and I don't know if I have a good comparison sample.

Don't Stop Writing

At one point, Silvia notes that you should reward yourself when meeting goals but that reward should NOT be skipping a writing session. He likens it to rewarding yourself with a cigarette after successfully not smoking for a period of time.

Conclusion

Creating a writing schedule is easy; sticking to it is hard. Try it for yourself and read his book. I think you'll be writing more than you think with this strategy. Let me know how things turn out!

Extra Links for Writing

Research Tips
Another outline of the book http://rfmh.nyspi.org/KAD/Portals/16/File/How%20to%20Write%20a%20Lot.pdf
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/web/how-to.html

Making Back-to-Back Histograms

Posted on June 10, 2014 by strictlystat

A colleague of mine asked me how to do back-to-back histograms (instead of on top of each other). I feel as though there should be a function for this, like the violin plot from the vioplot package. Violin plots are good for displaying data, but the left and right (or top and bottom) halves of the violin come from the same distribution, and therefore the violin is symmetrical. Many times people want to compare two distributions.

Cookbook for R shows how to overlay histograms (or densities) on top of each other, so go there if that's what you want. (NB: that is the way I tend to compare distributions, especially more than 2. I provide the code below because some have different preferences.)

ggplot implementation:

library(ggplot2)
df = data.frame(x = rnorm(100), x2 = rnorm(100, mean=2))

g = ggplot(df, aes(x)) + geom_histogram( aes(x = x, y = ..density..),
                                             binwidth = diff(range(df$x))/30, fill="blue") + 
  geom_histogram( aes(x = x2, y = -..density..), binwidth = diff(range(df$x))/30, fill= "green")
print(g)

## Warning: Stacking not well defined when ymin != 0

plot of chunk unnamed-chunk-1

I simply simulated 2 normal distributions of 100 points and then plotted them. Note the ..density.. call in the aes for the histograms. This just scales the histogram to a density and not a count. The -..density.. flips the second histogram around zero so that they are back-to-back. We see that ggplot doesn't like stacking when you have negative data, but it's ok for this example since the two histograms don't overlap.

print(g + coord_flip())

## Warning: Stacking not well defined when ymin != 0

plot of chunk unnamed-chunk-2

Using coord_flip plots back-to-back histograms horizontally. This code can easily be extended using geom_density and actually a volcano plot version is in the help for stat_density.

Base implementation

Not everyone likes ggplot2 so I figured I would provide an implementation in base graphics.

## using base
h1 = hist(df$x, plot=FALSE)
h2 = hist(df$x2, plot=FALSE)
h2$counts = - h2$counts
hmax = max(h1$counts)
hmin = min(h2$counts)
X = c(h1$breaks, h2$breaks)
xmax = max(X)
xmin = min(X)
# h1 (x) in blue and h2 (x2) in green, matching the ggplot version above
plot(h1, ylim=c(hmin, hmax), col="blue", xlim=c(xmin, xmax))
lines(h2, col="green")

plot of chunk unnamed-chunk-3

The code calculates the histograms for each distribution and stores the information. I simply take the negative number of counts to flip the histogram over the x-axis.

Go forth and prosper

You can adjust the axes to positive numbers, make more implementations with densities/etc, but this is a simple graphic I've seen people use. Hope this helps someone out.
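As one example of the density variant mentioned above, here is a minimal base-graphics sketch. The data are re-simulated so the block runs on its own; the variable names and plot labels are my own choices.

```r
set.seed(4)
x  = rnorm(100)
x2 = rnorm(100, mean = 2)

# Kernel density estimates for each sample
d1 = density(x)
d2 = density(x2)

# Axis limits that cover both curves, with the second one flipped below zero
xlim = range(c(d1$x, d2$x))
ylim = c(-max(d2$y), max(d1$y))

plot(d1$x, d1$y, type = "l", col = "blue", xlim = xlim, ylim = ylim,
     xlab = "Value", ylab = "Density", main = "Back-to-back densities")
lines(d2$x, -d2$y, col = "green")   # negate the density to flip it over the x-axis
abline(h = 0)
```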

Typinator: Text is Better Expanded

Posted on June 2, 2014 by strictlystat

Last year, Aaron Fisher spoke at a computing club about a text expander named Typinator. In the past year, I have used it for the majority of my LaTeX and math writing and wanted to discuss a bit why I use Typinator.

Seeing your Math symbols

The main reason I use Typinator is to expand text to unicode – symbols such as β instead of writing \beta in LaTeX. By “expand text” I mean that I type a string that I set in Typinator and it replaces that string with the symbol or phrase that I designated as the replacement. I type :alpha and out comes an α symbol.

Why should you care

Writing \alpha or :alpha saves no time – it's the same number of characters. I like using unicode because I like reading in the LaTeX:

Y = X β + ε

instead of

Y = X \beta + \varepsilon

and “the β estimate is” versus “the $\beta$ estimate is”. I think it's cleaner, easier to read, and easier to edit. One problem is: unicode doesn't work with LaTeX right off.

`pdflatex` doesn't show my characters! Use XeLaTeX

Running pdflatex on your LaTeX document will not render these unicode symbols out of the box, depending on your encoding. Using the LaTeX package inputenc with a command such as \usepackage[utf8x]{inputenc} can incorporate unicode (according to this StackExchange Post), but I have not used this so I cannot confirm this.

I use XeLaTeX, which has inherent unicode support. In my preamble I have

\usepackage{ifxetex}
\ifxetex
  \usepackage{unicode-math}
  \setmathfont{[Asana-Math]}
\fi

to tell the compiler that I want this font for my math. I then run the xelatex command on the document and the unicode α symbol appears in the PDF and all is right with the world.

You can also incorporate xelatex in your knitr documents in RStudio by going to RStudio -> Preferences -> Sweave Tab -> Typeset LaTeX into PDF using and changing this option to XeLaTeX. Now you're ready to knit with unicode!

Other uses for Unicode than LaTeX

If you don't use LaTeX, this information above is not relevant to you but Unicode can be used in other settings than LaTeX. Here are some instances where I use Unicode other than LaTeX:

Twitter. Using β or ↑/↓ can be helpful in conveying information as well as saving characters or writing things such as 𝜃̂.
E-mail. Using symbols such as σ versus \sigma are helpful within Gmail or when emailing a class (such as in CoursePlus) for conveying information within the email compared to attaching a LaTeX'd PDF.
Word Documents. I don't like the Microsoft Word Equation Editor. By don't like I mean get angry with and then stop using. Inserting symbols is more straightforward and using a text expansion is easier than clicking them on the symbol keyboard.
Grading. When annotating PDFs for grading assignments, many times I use the same comment – people tend to make the same errors. I make a grading typeset where I can write a small key such as :missCLT for missing the Central Limit Theorem in a proof so that I type less and grade faster. Who doesn't want to grade faster?
Setting Variables. I don't do this nor do I recommend it, but technically in R you can use unicode to set a variable:

σ = 5
print(σ)

## [1] 5

My Typinator sets.

My set of Typinator keys that you can download and import into Typinator are located here.

Math Symbols for Greek and other math-related symbols. (This was my first typeset so not well organized.)
Bars for making bars on letters such as 𝑥̄.
Hats for making hats on letters such as 𝜃̂.
Arrows just ↑ and ↓ for now.

NB: GitHub thinks the .tyset file is a folder and not an object, so the .txt files are here for Math Symbols, Bars, Hats, and Arrows, which can be imported into Typinator.

If you comment, be sure to use a Unicode symbol.

A HopStat and Jump Away

Trying to at least Doggie Paddle through the Sea of Data, Contributor to http://bmorebiostat.com
