I've been doing some classification with logistic regression in brain imaging recently. I have been using the ROCR package, which is helpful at estimating performance measures and plotting these measures over a range of cutoffs.

The `prediction`

and `performance`

functions are the workhorses of most of the analyses in ROCR I've been doing. For those who haven't used `ROCR`

before, the format of the `prediction`

function is:

prediction(predictions, labels, label.ordering = NULL)

where `predictions`

are some predicted measure (usually continuous) for the “truth”, which are the `labels`

. In many applications, `predictions`

are estimated probabilities (or log odds) and the `labels`

are binary values. Both arguments can take a vector, matrix, or data.frame for prediction, but `dim(predictions)`

must equal `dim(labels)`

.

In this post, I'll go through creating `prediction`

and `performance`

objects and extracting the results.

## Prediction objects

### Simple example: one set of prediction and labels

Let's show a simple example from the `prediction`

help file, that uses a prediction and label vector (i.e. not a matrix). We see the data is some continuous prediction and binary label:

library(ROCR) data(ROCR.simple) head(cbind(ROCR.simple$predictions, ROCR.simple$labels), 5)

[,1] [,2] [1,] 0.6125478 1 [2,] 0.3642710 1 [3,] 0.4321361 0 [4,] 0.1402911 0 [5,] 0.3848959 0

Now, let's makde the prediction object and show its contents:

pred <- prediction(ROCR.simple$predictions,ROCR.simple$labels) class(pred)

[1] "prediction" attr(,"package") [1] "ROCR"

slotNames(pred)

[1] "predictions" "labels" "cutoffs" "fp" "tp" [6] "tn" "fn" "n.pos" "n.neg" "n.pos.pred" [11] "n.neg.pred"

We see the the returned result of `prediction`

is an object of class `prediction`

, which an S4 object with a series of slots. Let's look at the length of each slot and the class:

sn = slotNames(pred) sapply(sn, function(x) length(slot(pred, x)))

predictions labels cutoffs fp tp tn 1 1 1 1 1 1 fn n.pos n.neg n.pos.pred n.neg.pred 1 1 1 1 1

sapply(sn, function(x) class(slot(pred, x)))

predictions labels cutoffs fp tp tn "list" "list" "list" "list" "list" "list" fn n.pos n.neg n.pos.pred n.neg.pred "list" "list" "list" "list" "list"

We see that each slot has length 1 and is a list.

### Example: multiple sets of prediction and labels

Let's use the `ROCR.hiv`

dataset to show how this works if more than one set of predictions and labels are supplied. Here we pass a list of \(10\) predictions and a list of labels to the `prediction`

function:

data(ROCR.hiv) manypred = prediction(ROCR.hiv$hiv.nn$predictions, ROCR.hiv$hiv.nn$labels) sapply(sn, function(x) length(slot(manypred, x)))

predictions labels cutoffs fp tp tn 10 10 10 10 10 10 fn n.pos n.neg n.pos.pred n.neg.pred 10 10 10 10 10

sapply(sn, function(x) class(slot(manypred, x)))

predictions labels cutoffs fp tp tn "list" "list" "list" "list" "list" "list" fn n.pos n.neg n.pos.pred n.neg.pred "list" "list" "list" "list" "list"

We see that all the slots are still lists, but now they have length \(10\), corresponding to the \(10\) predictions/labels. We would get the same result if the 2 arguments were matrices, but that would require all predictions and labels to have the same length. Using a list of predictions/labels is a bit more flexible.

## Performance objects

From the help file of `performance`

, the syntax for this function is:

performance(prediction.obj, measure, x.measure="cutoff", ...)

We see that the first argument is a `prediction`

object, and the second is a `measure`

. If you run `?performance`

, you can see all the performance measures implemented.

We will do example of some commonly estimated measures: receiver operating characteristic (ROC) curves, accuracy, area under the curve (AUC), and partial AUC (pAUC).

### ROC Curve

### Simple example: one set of prediction and labels

We will do an ROC curve, which plots the false positive rate (FPR) on the x-axis and the true positive rate (TPR) on the y-axis:

roc.perf = performance(pred, measure = "tpr", x.measure = "fpr") plot(roc.perf) abline(a=0, b= 1)

At every cutoff, the TPR and FPR are calculated and plotted. The smoother the graph, the more cutoffs the predictions have. We also plotted a 45-degree line, which represents, on average, the performance of a Uniform(0, 1) random variable. The further away from the diagonal line, the better. Overall, we see that we see gains in sensitivity (true positive rate, > 80%), trading off a false positive rate (1- specificity), up until about 15% FPR. After an FPR of 15%, we don't see significant gains in TPR for a tradeoff of increased FPR.

### Example: multiple sets of prediction and labels

The same can be done if you have many predictions and labels:

many.roc.perf = performance(manypred, measure = "tpr", x.measure = "fpr") plot(many.roc.perf, col=1:10) abline(a=0, b= 1)

Essentially, the `plot`

function on a `performance`

object with multiple predictions and labels will loop over the lists and plot the ROC for each one.

Overall, we see the performance of each prediction is similar. The pROC package, described in the conclusion, can test the performance between ROC curves.

** Advanced:** If you want to see how performance objects are plotted, use

`getMethod("plot", signature = c(x="performance",y="missing"))`

and `ROCR:::.plot.performance`

.### Limiting to a FPR: partial ROC curve

You may only want to accept a false positive rate of a certain level, let's say 10%. The function `pROC`

below will only keep values less than or equal to the FPR you set.

pROC = function(pred, fpr.stop){ perf <- performance(pred,"tpr","fpr") for (iperf in seq_along(perf@x.values)){ ind = which(perf@x.values[[iperf]] <= fpr.stop) perf@y.values[[iperf]] = perf@y.values[[iperf]][ind] perf@x.values[[iperf]] = perf@x.values[[iperf]][ind] } return(perf) }

Let's use this on the simple cases and plot the partial ROC curve:

proc.perf = pROC(pred, fpr.stop=0.1) plot(proc.perf) abline(a=0, b= 1)

Thus, if we can only accept a FPR of 10%, the model is only giving 50% sensitivity (TPR) at 10% FPR (1-specificity).

### Getting an “optimal” cut point

In some applications of ROC curves, you want the point closest to the TPR of \(1\) and FPR of \(0\). This cut point is “optimal” in the sense it weighs both sensitivity and specificity equally. To deterimine this cutoff, you can use the code below. The code takes in BOTH the `performance`

object and `prediction`

object and gives the optimal cutoff value of your predictions:

opt.cut = function(perf, pred){ cut.ind = mapply(FUN=function(x, y, p){ d = (x - 0)^2 + (y-1)^2 ind = which(d == min(d)) c(sensitivity = y[[ind]], specificity = 1-x[[ind]], cutoff = p[[ind]]) }, perf@x.values, perf@y.values, pred@cutoffs) } print(opt.cut(roc.perf, pred))

[,1] sensitivity 0.8494624 specificity 0.8504673 cutoff 0.5014893

Now, there is a `cost`

measure in the ROCR package that you can use to create a `performance`

object. If you use it to find the minimum cost, then it will give you the same cutoff as `opt.cut`

, but not give you the sensitivity and specificity.

cost.perf = performance(pred, "cost") pred@cutoffs[[1]][which.min(cost.perf@y.values[[1]])]

[1] 0.5014893

### Different costs for FP and FN

The output from `opt.cut`

and a `performance`

object with measure `cost`

are NOT equivalent if false positives and false negatives are not weighted equally. The `cost.fn`

and `cost.fp`

arguments can be passed to `performance`

, corresponding to the cost of a false negative and false positive, respectively. Let's say false positives are twice as costly as false negatives, and let's get a cut point:

cost.perf = performance(pred, "cost", cost.fp = 2, cost.fn = 1) pred@cutoffs[[1]][which.min(cost.perf@y.values[[1]])]

[1] 0.5294022

Thus, we have a different “optimal” cut point with this changed cost function. In many real-life applications of biomarkers, the cost of a false positive and false negative are not the same. For example, missing someone with a disease based on a test may cost a hospital $1,000,000 in lawsuits, but treating someone who did not have the disease may cost $100,000 in treatments. In that case, the cost of a false negative is 10 times that of a false positive, strictly in monetary measures. No cost analysis is this simple and is usually based on many factors, but most analyses do not have equal cost for a false positive versus a false negative.

The code is the same for the optimal cutoff for the multiple prediction data:

print(opt.cut(many.roc.perf, manypred))

[,1] [,2] [,3] [,4] [,5] sensitivity 0.8076923 0.8205128 0.7692308 0.8205128 0.7564103 specificity 0.7902622 0.7827715 0.8501873 0.8164794 0.8464419 cutoff -0.5749773 -0.5640632 -0.4311301 -0.5336958 -0.4863360 [,6] [,7] [,8] [,9] [,10] sensitivity 0.7820513 0.7948718 0.7820513 0.7435897 0.7435897 specificity 0.8089888 0.8314607 0.8089888 0.8352060 0.8501873 cutoff -0.5364402 -0.4816705 -0.5388664 -0.4777073 -0.4714354

## Accuracy

### Simple example: one set of prediction and labels

Another cost measure that is popular is overall accuracy. This measure optimizes the correct results, but may be skewed if there are many more negatives than positives, or vice versa. Let's get the overall accuracy for the simple predictions and plot it:

acc.perf = performance(pred, measure = "acc") plot(acc.perf)

What if we actually want to extract the maximum accuracy and the cutoff corresponding to that? In the `performance`

object, we have the slot `x.values`

, which corresponds to the `cutoff`

in this case, and `y.values`

, which corresponds to the accuracy of each cutoff. We'll grab the index for maximum accuracy and then grab the corresponding cutoff:

ind = which.max( slot(acc.perf, "y.values")[[1]] ) acc = slot(acc.perf, "y.values")[[1]][ind] cutoff = slot(acc.perf, "x.values")[[1]][ind] print(c(accuracy= acc, cutoff = cutoff))

accuracy cutoff 0.8500000 0.5014893

Hooray! Then you can go forth and threshold your model using the `cutoff`

for (in hopes) maximum accuracy in your test data.

### Example: multiple sets of prediction and labels

Again, we will do the same with many predictions and labels, but must iterate over the results (using a `mapply`

statement):

many.acc.perf = performance(manypred, measure = "acc") sapply(manypred@labels, function(x) mean(x == 1))

[1] 0.226087 0.226087 0.226087 0.226087 0.226087 0.226087 0.226087 [8] 0.226087 0.226087 0.226087

mapply(function(x, y){ ind = which.max( y ) acc = y[ind] cutoff = x[ind] return(c(accuracy= acc, cutoff = cutoff)) }, slot(many.acc.perf, "x.values"), slot(many.acc.perf, "y.values"))

[,1] [,2] [,3] [,4] [,5] [,6] accuracy 0.86376812 0.881159420 0.8666667 0.8724638 0.8724638 0.8753623 cutoff 0.02461465 -0.006091327 0.2303707 -0.1758013 0.1251976 -0.2153779 [,7] [,8] [,9] [,10] accuracy 0.8753623 0.8724638 0.8637681 0.86376812 cutoff -0.2066697 0.1506282 0.2880392 0.06536471

We see that these cutoffs are not the same as those using the `opt.cut`

from above. This is due to the the fact that the proportion of positive cases is much less than 50%.

## Area under the curve (AUC) and partial AUC (pAUC)

### Simple example: one set of prediction and labels

The area under curve summarizes the ROC curve just by taking the area between the curve and the x-axis. the Let's get the area under the curve for the simple predictions:

auc.perf = performance(pred, measure = "auc") auc.perf@y.values

[[1]] [1] 0.8341875

As you can see, the result is a scalar number, the area under the curve (AUC). This number ranges from \(0\) to \(1\) with \(1\) indicating 100% specificity and 100% sensitivity.

As before, if you only want to accept a fixed FPR, we can calculate a partial AUC, using the `fpr.stop`

argument:

pauc.perf = performance(pred, measure = "auc", fpr.stop=0.1) pauc.perf@y.values

[[1]] [1] 0.02780625

Now, we see the pAUC to be **much** lower. It is of note that this value can range from \(0\) to whatever `fpr.stop`

is. In order to standardize it to \(1\), you can divide it by `fpr.stop`

to give a \([0, 1]\) measure:

pauc.perf@y.values = lapply(pauc.perf@y.values, function(x) x / 0.1) pauc.perf@y.values

[[1]] [1] 0.2780625

Although this measure is more comparable to the full AUC measure, it is still low. Note, there is no “one” cutoff for AUC or pAUC, as it measures the performance over all cutoffs. Also, plotting functions for scalar outcome measures (such as AUC) do not work for `performance`

objects. The code for the multiple predictions is the same.

manypauc.perf = performance(manypred, measure = "auc", fpr.stop=0.1) manypauc.perf@y.values = lapply(manypauc.perf@y.values, function(x) x / 0.1) manypauc.perf@y.values

[[1]] [1] 0.500048 [[2]] [1] 0.5692404 [[3]] [1] 0.5182944 [[4]] [1] 0.5622299 [[5]] [1] 0.5379814 [[6]] [1] 0.5408624 [[7]] [1] 0.5509939 [[8]] [1] 0.5334678 [[9]] [1] 0.4979353 [[10]] [1] 0.4870354

Note, use `sapply`

instead of `lapply`

if you want the result to be a vector.

## Conclusion

For ROC analysis the ROCR package has good methods and many built in measures. Other packages, such as the pROC package, can be useful for many functions and analyses, especially testing the difference between ROC and pROC curves. In some ways, you may want to use `proc`

admissibly over ROCR, especially because (when I checked Dec 18, 2014) the ROCR package was orphaned. But if you are working in ROCR, I hope this give you some examples of how to fit the objects and extract the results.

I have used the ROCR package for my master thesis in 2009 for creating ROC curves to visualize my classifiers. * Oh, the Memories *

Pingback: Adding Cost Functions to ROCR performance objects | A HopStat and Jump Away

Pingback: Condensed News | Data Analytics & R

Thanks for the tutorial on ROC as well as ROCR. And I thought it would be a shame to let it go to waste as orphaned so decided to be a package maintainer for it. It is no longer orphaned as of now.

anyone know how to use the label.ordering call in pred()? Have read the help function and tried multiple ways of changing it, but nothing works.

My ROC-curve is inverted and haven’t been able to redefine the labels using label.ordering.

Thanks.

O.

You can put your labels in as a factor, with factor(labels, levels = your_inverted_levels) so that it is correct.

Thanks, but that doesn’t work.

I’ve tried the following (say you want b>a, instead of a>b in a data.frame called df)

pred <- prediction(df$label, df$predictor, label.ordering = factor(df$label, levels=c("b","a")))

Error in prediction(df$label, df$predictor, label.ordering = factor(df$label… : Format of predictions is invalid.

I also tried this:

labels = factor(df$label, levels=c("b","a"))

pred <- prediction(labels, predictor)

Error in prediction(labels, predictor) : Format of predictions is invalid.

And I've tried this

df$label sessionInfo()

R version 3.1.2 (2014-10-31)

Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] ROCR_1.0-7 reshape2_1.4.1 pheatmap_1.0.2 RColorBrewer_1.1-2 Heatplus_2.12.0

[6] gplots_2.16.0 ggplot2_1.0.1 dplyr_0.4.1 plyr_1.8.1

Sorry seems to have deleted a part of my post,

what I also tried was this:

df$label <- relevel(as.factor(df$label), ref="b")

but that doesn't work either.

Can you provide a minimal working example (http://stackoverflow.com/help/mcve)

Sure. Below’s a data.frame with a predictor and a label. Try flipping them around. It wasn’t possible for me using all the commands I typed above.

I will note, that I have re-coded the orignial labelling of my labels to “a”,”b” (compared to the original data.frame) which seems to plot it in the right direction, but that doesn’t really matter, because I want to know how to be able to flip the ROC-curve anyways.

Thanks for your help.

df’s here:

https://www.dropbox.com/s/60go59xpwhw3kq2/SO_Example_ROCR.txt?dl=0

Your code has the arguments for prediction in the wrong order see ?prediction:

library(ROCR)

x = read.delim2(“~/Downloads/SO_Example_ROCR.txt”)

x$predictor = as.numeric(as.character(x$predictor))

pred = prediction(x$predictor, x$label) ############# see ?prediction

perf = performance(pred, “tpr”, “fpr”)

plot(perf)

yeah, sorry I saw that later as well. My mistake. Had to anonymise everything as I’m not allowed to share any data that would make you know what I’m working on (paranoid boss etc.).

But still, how do you reorder the labels within prediction() using the label.ordering() argument? Are you able to flip the current ROC curve using the df I gave?

and by the way – why do you use as.numeric(as.charcter..) ? Never seen that before.

Cheers

O.

Please run the exact code as I have above. The ROC is not flipped.

Again, note in my code:

pred = prediction(x$predictor, x$label) ############# see ?prediction

but in yours:

pred <- prediction(df$label, df$predictor, label.ordering = factor(df$label, levels=c("b","a")))

Do you see that the order label and predictor are given are switched in yours? I understand if you have to anonymize, that's why I asked for a minimal working example (please read this before posting again: http://stackoverflow.com/help/mcve)

Thanks mate, yes I posted the label and predictor the wrong way round in the beginning, which was a stupid mistake I made, but I was trying to stay general and for a moment forgot the right order. But that’s not the key issue of my question.

I know the rules of min working examples. Use SO regularly. That’s why I said that the ROC-curve using the provided data isn’t flipped, but I instead asked if you know how to deliberately flip it, as using the original data, the ROC is plotted the wrong way around.

But anyways, here’s enough for you to be able to replicate the problem I believe.

### dput structure of df

dput(df)

structure(list(Gene = c(9.477109132, 9.771966558, 9.557933243,

4.823233754, 5.991528152, 7.681446349, 3.0173589, 2.794981716,

2.079188485, 5.706217288, 2.596545245, 5.55731649, 6.841288155,

0.510677874, 13.65813306, 5.496814819, 6.226265519, 4.930416726,

10.41393032, 7.468527793, 16.13103733, 7.151691428, 8.898309137,

8.235777529, 0.618672308, 7.456318772, 3.965642387, 7.293784571,

1.656592013, 1.215279806, 4.401111195, 7.763950095, 7.557975677

), Dis1 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,

1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c(“Norm”, “Disease”), class = “factor”)), row.names = c(NA,

-33L), class = “data.frame”, .Names = c(“Gene”, “Dis1”))

pred <- prediction(df$Gene, df$Dis1)

perf <- performance(pred, "tpr", "fpr")

plot(perf)

### plots the inverted ROC-Curve

Thanks for your help.

Well it’s not the “wrong way” necessarily. But either way here are 4 examples (3 that work). It’s not sufficient that the labels are a factor, but they are an ORDERED factor.

pred <- prediction(df$Gene, df$Dis1,

label.ordering = c("Disease", "Norm"))

perf <- performance(pred, "tpr", "fpr")

plot(perf) # inverted curve like you said

############

# This uses label.ordering like you asked

############

pred <- prediction(df$Gene, df$Dis1, label.ordering = c("Norm", "Disease"))

perf <- performance(pred, "tpr", "fpr")

plot(perf) # "non-inverted" curve

##########################

# Create an ORDERED factor

#################

df$Dis2 = as.ordered(df$Dis1)

pred <- prediction(df$Gene, df$Dis2)

perf <- performance(pred, "tpr", "fpr")

plot(perf) # "non-inverted" curve

############

# I like this route as well

############

df$Y = as.numeric(df$Dis1 == "Disease")

pred <- prediction(df$Gene, df$Y)

perf <- performance(pred, "tpr", "fpr")

plot(perf) # "non-inverted" curve

Thank you, that’s very kind of you and I appreciate your help greatly. It’s interesting. There must be something with how I’ve labelled the disease entities, that prediction() doesn’t like. I tried your methods, that work on the anonymised set (Disease vs. Norm) but it doesn’t work with the original labelling. It doesn’t matter, I’ll just relevel the factors.

By the way have you tried the new ggROC package? It inherits from ROCR but plots within ggplot2 (which I use a lot). Only issue is that in the vignette, it’s not quite clear what you have to plot, as the vignette code is incomplete. I’ll be emailing the author.

This reflects what’s noted in the documentation for prediction in the details section:

Since scoring classifiers give relative tendencies towards a

negative (low scores) or positive (high scores) class, it has to

be declared which class label denotes the negative, and which the

positive class. Ideally, labels should be supplied as ordered

factor(s), the lower level corresponding to the negative class,

the upper level to the positive class. If the labels are factors

(unordered), numeric, logical or characters, ordering of the

labels is inferred from R’s built-in ‘<’ relation (e.g. 0 < 1, -1

< 1, 'a' < 'b', FALSE < TRUE). Use ‘label.ordering’ to override

this default ordering. Please note that the ordering can be

locale-dependent e.g. for character labels '-1' and '1'.

Pingback: ROC Curves | exploring r

Link exchange is nothing else but it is simply placing the other person’s blog

link on your page at proper place and other person will also do same for

you.

Hello, Thanks so much for the helpful tutorial. I am having a problem and would be glad if you helped me out.

I am working with the penalized and ROCR packages in R and have a prediction S4 object containing values from 200 cross validations.I figured how to see the names of the slots, as below:

names(getSlots(class(c.pred)))

1 “predictions” “labels” “cutoffs” “fp” “tp” “tn” “fn” “n.pos” “n.neg” “n.pos.pred” “n.neg.pred”

and also checked their length.

However, I cannot figure out how to extract the mean/median for each of these slots..like for fp,tp. I basically would like to get a value for sens,spec,npv,ppv from these cross validations and I suppose I would have to get the fp,tp and then calculate? I thought I would have to unlist each slot and get the summary, but am stuck ;(

Appreciate your help.

Thanks.

You can get the slots by doing c.pred@predictions, or slot(c.pred, “predictions”).

I would recommend using the ROCR::performance function:

sens = performance(c.pred, “sens”)

spec = performance(c.pred, “spec”)

npv = performance(c.pred, “npv”)

ppv = performance(c.pred, “ppv”)

Hey thanks! But what is the best way to get sens spec values from 200 cross validations?Should get the corresponding values for the median AUC I retrieve? Or just take the median or mean of the 200 values? Also, is it enpugh to split the data to train for 10 fold cross validation and test it? or should i have split the data into 3 parts for validation? Many thanks again.

It depends on what you want to do. You can take the maximum/minimum/mean sensitivity/specificity for the 200 cross-validations. You can do AUC for each one and do median/mean.

Hey Hi again…I was wondering if the procedure to extract the optimal points for multiple sets of predictions should be a bit different?

Thanks.

Sorry I had another doubt…when we have multiple sets of predictions and we get all the values for accuracy and cutoof and NPV,PPV with cutoffs in the same way, what is the best way to iterate over the structure and get measures like median etc?

Pingback: A guide for using ROCR package to plot ROC and determine optimal cut-off | go2analytics

Hello. Does anyone know how to make a ROC curve appear as a smooth line, instead of stepped straight lines?

Pingback: r - Decidere di soglia per glm modello di regressione logistica in R

Pingback: r - Décider de seuil pour glm modèle de régression logistique dans la R