Saturday, March 26, 2011

How to backtest a strategy in R

This is the third post in the Backtesting in Excel and R series and it will show how to backtest a simple strategy in R.  It will follow the 4 steps Damian outlined in his post on how to backtest a simple strategy in Excel.

Step 1: Get the data
The getSymbols function in quantmod makes this step easy if you can use daily data from Yahoo Finance.  There are also "methods" (not in the strict sense) to pull data from other sources (FRED, Google, Oanda, R save files, databases, etc.).  You could also use them as a template to write a custom function for a particular vendor you use.

# run the command below if quantmod isn't already installed
# install.packages("quantmod")
# use the quantmod package (loads TTR, xts, and zoo)
require(quantmod)
# pull SPX data from Yahoo (getSymbols returns an xts object)
getSymbols("^GSPC")

Step 2: Create your indicator
The TTR package contains a multitude of indicators.  The indicators are written to make it easy to combine them in creative and unconventional ways.  Starting with revision 106 on R-forge, TTR has a DVI indicator.

# calculate DVI indicator
dvi <- DVI(Cl(GSPC))  # Cl() extracts the close price column

Step 3: Construct your trading rule
Since this trading rule is simple--we're long 100% if the DVI is below 0.5 and short 100% otherwise--it can be written in a single line.  More elaborate rules and/or position sizings can be done as well, but require more code (RSI(2) with Position Sizing is an example of more complex position sizing rules).  Also notice that the signal vector is lagged, which avoids look-ahead bias.

# create signal: (long (short) if DVI is below (above) 0.5)
# lag so yesterday's signal is applied to today's returns
sig <- Lag(ifelse(dvi$dvi < 0.5, 1, -1))
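
To see what the lag does, here is a minimal base-R sketch on made-up numbers. The vector names and indicator values are hypothetical, and the manual shift only stands in for Lag() on a plain vector; it is not how quantmod implements it.

```r
# Made-up indicator values observed at the close of days 1 to 5
indicator <- c(0.30, 0.60, 0.40, 0.70, 0.20)

# Raw signal: long (+1) if the indicator is below 0.5, short (-1) otherwise
raw_sig <- ifelse(indicator < 0.5, 1, -1)

# Shift the signal forward one day, so the position held on day t
# is the signal computed from day t-1; day 1 has no prior signal (NA)
lagged_sig <- c(NA, head(raw_sig, -1))

print(cbind(indicator, raw_sig, lagged_sig))
```

The first row has no position because there is no earlier signal to act on; every later row trades on information that was already known the day before.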

Step 4: The trading rules/equity curve
As in Damian's example, the code below is a simplified approach that is frictionless and does not account for slippage.  The code below takes today's percentage return and multiplies it by yesterday's signal / position size (always +/- 100% in this example).  I also subset the system returns to match the results in the Excel file.

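# calculate signal-based returns
ret <- ROC(Cl(GSPC))*sig
# subset returns to match data in Excel file
ret <- ret['2009-06-02/2010-09-07']
# ROC() returns log returns by default, so compound them with exp(cumsum())
eq <- exp(cumsum(ret))
plot(eq)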

Step 5: Evaluate strategy performance
Damian mentioned the importance of evaluating your strategy.  Fortunately for R users, the PerformanceAnalytics package makes this easy.  With a few lines of code we can view the drawdowns, downside risks, and a performance summary.

# run the command below if PerformanceAnalytics isn't already installed
# install.packages("PerformanceAnalytics")
# use the PerformanceAnalytics package
require(PerformanceAnalytics)
# create table showing drawdown statistics
table.Drawdowns(ret, top=10)
# create table of downside risk estimates
table.DownsideRisk(ret)
# chart equity curve, daily performance, and drawdowns
charts.PerformanceSummary(ret)
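
For intuition about what a drawdown measures, here is a rough base-R sketch on made-up log returns. It is only an illustration of the idea under my own assumptions (peak measured from the first observation), not the calculation PerformanceAnalytics performs internally.

```r
# Made-up daily log returns for a strategy
log_ret <- c(0.05, -0.10, 0.02, 0.08, -0.03)

# Equity curve from log returns
equity <- exp(cumsum(log_ret))

# Drawdown: how far equity sits below its running peak (0 at a new high)
drawdown <- equity / cummax(equity) - 1

# Worst peak-to-trough decline over the sample
max_dd <- min(drawdown)
print(round(cbind(equity, drawdown), 4))
```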

That's all there is to backtesting a simple strategy in R.  It wasn't that intimidating, was it?  Please leave feedback if you're moving your backtesting from Excel to R and there's something you're hung up on or you have an awesome tip you'd like to share.

Here's a succinct version of the code in the above post if you want to be able to copy / paste it all in one block:

require(quantmod)
require(PerformanceAnalytics)

# Step 1: Get the data
getSymbols("^GSPC")

# Step 2: Create your indicator
dvi <- DVI(Cl(GSPC))

# Step 3: Construct your trading rule
sig <- Lag(ifelse(dvi$dvi < 0.5, 1, -1))

# Step 4: The trading rules/equity curve
ret <- ROC(Cl(GSPC))*sig
ret <- ret['2009-06-02/2010-09-07']
eq <- exp(cumsum(ret))
plot(eq)

# Step 5: Evaluate strategy performance
table.Drawdowns(ret, top=10)
table.DownsideRisk(ret)
charts.PerformanceSummary(ret)

43 comments:

jscn said...

Thanks for the tutorial.

However, when I ran it, I got this error:

Error: could not find function "DVI"

Zachary said...

I can't seem to get the DVI indicator out of TTR. I've tried installing the package from CRAN, using install.packages('TTR',dependencies=TRUE) and from r-forge, using install.packages("TTR",repos="http://r-forge.r-project.org"). In both cases it installs fine, and other indicators work, but there's no DVI indicator.

Thank you!

Zachary said...

By the way, CRAN installs 'TTR_0.20-2.tgz' and R-forge installs 'TTR_0.20-3.tgz'

Zachary said...

Hmm, OK, I removed TTR and re-installed from R-Forge. Now I get this error when I try to start it:

"Loading required package: xts
Loading required package: zoo
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/Library/Frameworks/R.framework/Versions/2.12/Resources/library/TTR/libs/x86_64/TTR.so':
dlopen(/Library/Frameworks/R.framework/Versions/2.12/Resources/library/TTR/libs/x86_64/TTR.so, 6): Library not loaded: /usr/local/lib/libgfortran.2.dylib
Referenced from: /Library/Frameworks/R.framework/Versions/2.12/Resources/library/TTR/libs/x86_64/TTR.so
Reason: image not found
Error: package/namespace load failed for 'TTR'"

I'm on a Mac, if that helps.

Zachary said...

I installed the Fortran compilers for the mac (http://r.research.att.com/gfortran-4.2.3.dmg) and the installation works. Weirdly, when I type ?DVI I see the help page for it, but I can't actually access the function. If you have any advice, I'd appreciate it, otherwise I'll just keep banging away at this on my own.

Joshua Ulrich said...

Hi Zachary,

CRAN houses the stable version of TTR, while R-forge contains the latest development versions.

The reason you can see ?DVI but can't use it is that I forgot to export DVI. I exported it last night, but the daily builds haven't run on my latest changes yet. You have a few options: use "TTR:::DVI" instead of "DVI", check out the latest revision and build from source, or wait until the daily builds complete.

Zachary said...

Thanks Josh! Sorry to clog your comments-- I didn't realize until a minute ago that you were also the package maintainer.

Kvantitativ Analys said...

R seems like a good choice if you're just backtesting standard indicators. But why would I use it over Excel for proprietary indicators? The main reason I use Excel is that it's easy to test any idea without much limitation.

How about computing power? My problem with Excel is that when the ideas get more complex, the computing speed becomes insufficient. For example, computing an RSI(2) strategy 100 000 times takes a looooong time. How much faster is R?

Thanks for a great blog!

Dekalog said...

Re: Kvantitativ Analys

You could always code your proprietary indicators in R; I think you will find things much quicker than if you stuck with Excel. It was my disappointment with Excel in this regard that prompted me to move to a combination of Octave and R.

Joshua Ulrich said...

Kvantitativ Analys,

You can create proprietary indicators in R just like you can in Excel. The TTR package provides all the source code for the indicators in it. You can use those as templates for your proprietary indicators.

It's difficult to discuss relative performance in vague terms like this. My intuition is that R would be faster, but it's impossible to say "how much faster" without a specific example. You can always move the slowest parts to compiled code, which is much easier to do in R than Excel (even via XLW).

teramonagi said...

Hi.
This article is very interesting!
I would like to translate it into Japanese.
Would you allow me to do that?

Subhash said...

Joshua

Thanks for the post. I am teaching a Course in Financial Modeling with R. I use some of the packages that you and Jeff Ryan have authored. Thanks for that.

I'd be glad if you could direct me to other resources like your blog.
Thanks

Joshua Ulrich said...

jscn, you probably had the same issue as Zachary. Try re-installing TTR from R-forge.

teramonagi, you're welcome to translate this post (and any others) to Japanese. Just please give me credit. ;-)

Subhash, you're welcome. I'm glad you've found our work helpful. I don't know of any resources other than what's in my blogroll. You may find the Rmetrics suite helpful too.

vedaranyam said...

Subhash, is it possible to post the link to the course that you are teaching?

Milk Trader said...

The signal vector is quite elegant. I see you can take three positions(long, short, flat) by simply nesting the ifelse statement.
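
A minimal sketch of that nesting, in base R on made-up values. The 0.25 and 0.75 thresholds are hypothetical and chosen only to show the three states; they are not part of the strategy in the post.

```r
# Made-up indicator values
indicator <- c(0.10, 0.40, 0.60, 0.90, 0.50)

# Long (+1) below 0.25, short (-1) above 0.75, flat (0) in between
position <- ifelse(indicator < 0.25, 1,
                   ifelse(indicator > 0.75, -1, 0))

print(position)
```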

To my fellow Mac R users: I can attest that I downloaded TTR_0.20-3.tar.gz from https://r-forge.r-project.org/src/contrib/ and, after navigating to my ~/Downloads directory, running R CMD INSTALL TTR_0.20-3.tar.gz from the command line installed the latest/greatest TTR, complete with DVI. (You do need Xcode and the AT&T Fortran compiler (found at http://r.research.att.com/tools/) installed.)

Heikolino said...

Joshua,

do you also have an example of how to implement pending orders (stop loss orders, for example), and scaling positions in and out, using ifelse()?
That is, how to implement trading decisions that depend on trade and equity history over a certain past period, and trades that are triggered at a certain price? Example: Open a position with an attached stop loss order according to some entry criterion, open another position only once the first one is in profit, and so on until a certain position size is reached. Then scale part of the position out and build the position up again as above.
Now, while such strategies can be implemented fairly easily using nested for-loops and the like, doing so would make backtests far too slow. I'm looking for ways to code it in a vectorized way, such as by using the ifelse() construct. I would be grateful for any suggestions.

Joshua Ulrich said...

Heikolino,

You can't implement path-dependent rules with simple ifelse() function calls. Much of what you want to do can be accomplished with the quantstrat package. I plan to write an "Introduction to quantstrat" post in the near future.
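
To illustrate why such rules need state, here is a small base-R sketch of a hypothetical stop-loss rule on made-up prices. The rule, the 5% stop, and all variable names are assumptions for illustration only; this is not quantstrat code. Each day's position depends on the entry price and on whether the stop has already fired, which a single vectorized ifelse() cannot track.

```r
# Made-up closing prices and a made-up "want to be long" entry signal
prices    <- c(100, 98, 94, 96, 101, 103)
want_long <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE)
stop_pct  <- 0.05                      # hypothetical 5% stop below entry

position    <- numeric(length(prices)) # position after each day's close
entry_price <- NA_real_                # state: price at which we entered
stopped     <- FALSE                   # state: stop fired, waiting to re-arm

for (i in seq_along(prices)) {
  if (!want_long[i]) {
    # Signal is off: stay flat and re-arm for the next entry
    entry_price <- NA_real_
    stopped     <- FALSE
  } else if (!stopped) {
    if (is.na(entry_price)) entry_price <- prices[i]  # open a new position
    if (prices[i] <= entry_price * (1 - stop_pct)) {
      stopped     <- TRUE              # stop hit: exit and stay out
      entry_price <- NA_real_
    } else {
      position[i] <- 1                 # still long
    }
  }
}

print(position)
```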

Milk Trader said...

I noticed that the ROC() function is very similar to quantmod::Delt and quantmod::dailyReturn except that you can select a discrete calculation (which returns values corresponding to the other quantmod functions) or you can select continuous, which is the default value. Can you explain why you chose to make this distinction and why you chose to use type="continuous" to calculate your returns?

Joshua Ulrich said...

Milk Trader,

The distinction is between continuous compounding and discrete compounding. I made "continuous" the default because that's what I was using in my research when I originally wrote the package in 2006.
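
As a small illustration of the two conventions, here is a base-R sketch on made-up prices. It mirrors the discrete and continuous calculations described above without calling ROC() itself; the variable names are hypothetical.

```r
# Made-up closing prices
prices <- c(100, 102, 99, 101)

# Discrete (simple) returns: P_t / P_{t-1} - 1
discrete <- diff(prices) / head(prices, -1)

# Continuous (log) returns: log(P_t) - log(P_{t-1})
continuous <- diff(log(prices))

# The two are linked by continuous = log(1 + discrete)
print(cbind(discrete, continuous, check = log(1 + discrete)))

# Log returns add up across time to the total log return
print(sum(continuous))
```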

veriae666 said...

Great tutorial, thank you for taking the time!

eric said...

I'm confused by the equity calculation: eq <- cumprod(1+ret).

Since ret is based on ROC and the default is continuous (log) returns, then wouldn't eq = exp(cumsum(ret)) ?

Joshua Ulrich said...

Eric, you are correct.

Mateo said...

Thanks for the post, it's excellent.

One question: when you say the signal is lagged, that means we have to apply the +/-1 signal for the next day, correct? Otherwise we would get a signal when all information is known. Am I missing something here? I am new to the subject (although eager to learn).

Cheers!

Joshua Ulrich said...

Mateo,

You're correct. If we use today's data to determine the signal, we would be snooping/peeking if we applied that signal to today's return. We lag the signal so it will be applied to tomorrow's return.

Juan L.P. said...

Thank you for the great tutorial.

I'm fresh to trading yet find this easy to follow.

The only question I have is why in step 4, we evaluated the cumsum and then exp it? I get the cumsum but am not sure about the exp.

Thank you.

Joshua Ulrich said...

Hi Juan,

I'm glad you found this post helpful.

The cumsum function calculates the cumulative return at each point in time. But we want the equity value at each point in time. The call to exp gives us the equity value (in effect, compounding the returns).

See Pat Burns' A tale of two returns for a good comparison between arithmetic and log returns.
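
A quick base-R check of this on made-up log returns (the values and names are hypothetical). It also shows the small error introduced by compounding log returns as if they were arithmetic returns.

```r
# Made-up log returns that sum to zero
log_ret <- c(0.02, -0.03, 0.01)

# Correct for log returns: sum them, then exponentiate
eq <- exp(cumsum(log_ret))

# Incorrect here: treats the log returns as arithmetic returns
eq_wrong <- cumprod(1 + log_ret)

print(cbind(eq, eq_wrong))
```

Because the log returns sum to zero, the equity value should end back at its starting level; the cumprod(1 + ret) version ends slightly below it.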

Dave said...

I'm new to R, and I'm hoping someone can provide some pointers on three backtest topics:

1. I currently trade two rotational strategies (one weekly and one monthly) that use two moving averages and volatility to produce a rank for each of the 10 ETFs in my portfolio. Based on the rank, the top three ETFs are chosen. I've seen the rotational system on the Systematic Investor blog, but was wondering how to do this with quantstrat.

2. Are there any R packages that provide the capability to create a database of stock history from Yahoo, and then update this DB nightly? As the number of symbols that I'm following grows, the responses from my getSymbols calls are taking longer.

3. Are there any R packages that provide the capability to append the current/intraday stock price to an xts dataseries resulting from a getSymbols call? I enter MOC orders with both of my rotational systems and I use the stock price about 30 minutes from the close to generate my signals.

Thanks in advance,
Dave

Joshua Ulrich said...

Hi Dave,

1) Those types of rotational momentum strategies are very simple, so I would be inclined to run it without quantstrat. I'll ping Brian about how to go about doing it with quantstrat though.

2) I wouldn't use a database. Pull the data you need and write it to a binary R file using save(). For daily data, you could easily store data for hundreds of symbols in one file. When you want to update the data, just pull the days you don't have and append it to the saved objects (see 3).

3) Just use quantmod and xts. quantmod::getQuote will get the "current" (delayed) quote. Then you just have to xts::rbind it to your xts object that contains the historical data.
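
The save-and-append workflow from (2) and (3) might look roughly like this. This is a sketch under assumptions: a plain data.frame with made-up prices stands in for the xts object, and the file path and object names are hypothetical.

```r
# A temporary file stands in for wherever you keep your data
hist_file <- tempfile(fileext = ".RData")

# First session: store the history you have (made-up values)
history <- data.frame(date  = as.Date(c("2011-03-23", "2011-03-24")),
                      close = c(100, 101))
save(history, file = hist_file)
rm(history)

# Later session: restore the object, then append only the days you lack
load(hist_file)   # recreates `history`
new_rows <- data.frame(date  = as.Date(c("2011-03-24", "2011-03-25")),
                       close = c(101, 102))
history <- rbind(history, new_rows[new_rows$date > max(history$date), ])
save(history, file = hist_file)

print(history)
```

Filtering on the last stored date keeps overlapping days from being appended twice.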

rittysbud said...

We liked quantmod so much we extended support for live trading quantmod strategies through another open source platform tradelink :

http://tradelink.org

http://www.pracplay.com/landings/bridge_r

cells200 said...

Joshua,
I am new to R. How can we place orders automatically using R with, e.g., TD Ameritrade or E*TRADE? Or does it all depend on the API of each brokerage firm?
Simon

Joshua Ulrich said...

Simon,

That is completely dependent on the API provided by the specific firm and whether or not someone has built R functions to interact with the API.

Unknown said...

Hi Joshua, very eye-opening post. I have the same questions as Dave. I wish to develop my own portfolio backtesting system using R. Where should I start?
Also, how do I get traditional metrics like the Sharpe ratio from the PerformanceAnalytics package?
Lastly, are there good manuals for the quant* packages?

CandleForex said...

Interesting. Never knew about the TTR software.

Sean Fallon said...

I'm new to R and still learning. I hit an error when I came to table.Drawdowns(ret, top=10). It said: "Error in dimnames(cd) <- list(as.character(index(x)), colnames(x)) : 'dimnames' applied to non-array". quantmod and PerformanceAnalytics are both loaded. Am I missing something??

Sean Fallon said...

Silly me?? :p I missed two lines of code in between. Apologies. It's working now.

Bapi Vinnakota said...

Hello,

A noob question for an excellent post.

I replaced the DVI signal with an SMA signal from TTR.

After that, I replaced dvi$dvi with sma$sma in the signal generation line.

I see the following error:

Error in `$.zoo`(sma, sma) :
only possible for zoo series with column names

Regards
Bapi

Joshua Ulrich said...

Hi Bapi,

The DVI function returns an object with 3 columns, named dvi.mag, dvi.str, and dvi. The SMA function only returns one (unnamed) column.

You can use this code if you want to name the one column of the SMA output:

sma <- setNames(SMA(Cl(GSPC)), "sma")

Best,
Joshua

Bapi Vinnakota said...

Joshua,

Thank you.

Bapi

Graeme Walsh said...

Hey Joshua,

Fantastic blog! I really enjoy reading your posts.

In this post I think I may have picked up on a slight error (not in the code, though!).

In the text you say: "we're long 100% if the DVI is below 0.5 and short 100% otherwise"

and then you go on to say: "create signal: (long (short) if DVI is above (below) 0.5)"

In the first case you say that we're long if the DVI is below 0.5, but in the second case you say that we're long if the DVI is above 0.5!

Maybe you could iron out this typo to avoid confusion?

Cheers,
GW

Joshua Ulrich said...

Graeme,

The copy is correct and the code comment is wrong. I've now corrected the comment to align with the copy. Thanks for catching this typo!

XE X X said...

Thanks for this info. Coming from Excel, and completely new to R. Is there a way to see what each line does? Like in your lag(), can we see what arguments lag takes and what output it has?