Monday, November 4, 2013

quantstrat is slow

The complaint I hear most frequently about quantstrat is that it's slow, especially for large data.  Some of this slow performance is due to quantstrat treating all strategies as path-dependent by default.  Path dependence requires rules to be re-evaluated for each timestamp with a signal.  More signals equate to longer run-times.

If your strategy is not path-dependent, you can get a fairly substantial performance improvement by turning path-dependence off.  If your strategy truly is path-dependent, keep reading...

I started working with Brian Peterson in late August of this year, and we've been working on a series of very large backtests over the last several weeks.  Each backtest consisted of ~7 months of 5-second data on 72 instruments with 15-20 different configurations for each.

These backtests really pushed quantstrat to its limits.  The longest-running job took 32 hours.  I had some time while they were running, so I decided to profile quantstrat.  I was able to make some substantial improvements, so I thought I'd write a post to tell you what's changed and highlight some of the performance gains we're seeing.

The biggest issue was how the internal function ruleProc was constructing and evaluating the rule function call for each signal.  Basically, the code that constructed the call was evaluating the mktdata object before calling the rule function.  This meant that the rule function call looked like this:
ruleFunction(c(50.04, 50.23, 50.42, 50.37, 50.24, 50.13, 50.12, 50.42, 50.42, 50.37, 50.24, 50.22, 49.95, 50.23, 50.26, 50.22, 50.11, 49.99, 50.12, 50.4, 50.33, 50.33, 50.18, 49.99), ...)
instead of this:
ruleFunction(mktdata, ...)
You can only imagine how large that first call would be for a 10-million-row mktdata object.  Changing the rule function call to use the name (symbol) of the mktdata object instead of the mktdata object itself helped the 32-hour job finish in just over 2 hours.
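
To make the difference concrete, here is a minimal base-R sketch.  It is not quantstrat's actual ruleProc code: ruleFunction is a toy function and mktdata is a six-element numeric vector standing in for a real market data object.  It builds the call both ways and compares what each call carries around.

```r
# Hypothetical stand-ins: 'mktdata' is a plain numeric vector and
# 'ruleFunction' is a toy function, not quantstrat's actual rule code.
mktdata <- c(50.04, 50.23, 50.42, 50.37, 50.24, 50.13)
ruleFunction <- function(mktdata, ...) length(mktdata)

# Call built from the evaluated object: the data itself is embedded in the call.
call_with_data <- as.call(list(as.name("ruleFunction"), mktdata))

# Call built from the object's name: only a symbol is stored in the call.
call_with_name <- as.call(list(as.name("ruleFunction"), as.name("mktdata")))

print(call_with_data)  # shows the full vector inside the call
print(call_with_name)  # shows ruleFunction(mktdata)

# Both calls evaluate to the same result; only their size differs.
result_data <- eval(call_with_data)
result_name <- eval(call_with_name)

# Number of characters needed to write each call out as text.
size_data <- nchar(paste(deparse(call_with_data), collapse = ""))
size_name <- nchar(paste(deparse(call_with_name), collapse = ""))
```

The call built from the symbol stays the same size no matter how many rows mktdata has, because the data is only looked up when the call is evaluated.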

If you think I would be happy enough with that, you don't know me.  Several other changes helped get that 2-hour run down to under 30 minutes.
  • We now calculate periodicity(mktdata) in applyRules and pass that value to ruleOrderProc.  This avoids re-calculating that value for every order, since mktdata doesn't change inside applyRules.
  • We also pass current row index value to ruleOrderProc and ruleSignal because xts subsetting via an integer is much faster than via a POSIXct object.
  • applyStrategy only accumulates values returned from applyIndicators, applySignals, and applyRules if debug=TRUE.  This saves a little time, but can save a lot of memory for large mktdata objects.
  • The dimension reduction algorithm has to look for the first time the price crosses the limit order price.  We were doing that with a call to which(sigThreshold(...))[1].  The relational operators (<, <=, >, >=, and ==) and which operate on the entire vector, but we only need the first value, so I replaced that code with a C-based .firstThreshold function that stops as soon as it finds the first cross.
All these changes are most significant for large data sets.  The small demo strategies included with quantstrat are also faster, but the net performance gains increase as the size of the data, the number of signals (and therefore the number of rule evaluations), and the number of instruments increase.
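
The early-exit idea behind the last change can be sketched in plain R.  This is only an illustration: first_cross and its arguments are hypothetical names, and the real .firstThreshold is written in C with a signature not shown here.  An R-level loop like this one will not beat the vectorized version; the speed-up comes from doing the same thing in compiled code.

```r
# Hypothetical R-level stand-in for the idea behind .firstThreshold.
# Returns the index of the first price that crosses 'threshold', or NA.
first_cross <- function(prices, threshold, relationship = c("gte", "lte")) {
  relationship <- match.arg(relationship)
  for (i in seq_along(prices)) {
    p <- prices[i]
    if (is.na(p)) next
    crossed <- if (relationship == "gte") p >= threshold else p <= threshold
    if (crossed) return(i)  # stop scanning at the first cross
  }
  NA_integer_
}

prices <- c(50.04, 50.23, 50.42, 50.37, 50.24, 50.13)

# Vectorized approach: compares every element, then keeps only the first hit.
idx_vectorized <- which(prices >= 50.40)[1]

# Early-exit approach: stops after inspecting the third element.
idx_early_exit <- first_cross(prices, 50.40, "gte")
```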

You're still reading?  What are you waiting for?  Go install the latest from R-Forge and try it for yourself!

Thursday, October 17, 2013

R/Finance 2014 Call for Papers

We're getting ready for this year's R/Finance conference.  Here's the call for papers.  I hope to see you there!

R/Finance 2014: Applied Finance with R
May 16 and 17, 2014
University of Illinois at Chicago

The sixth annual R/Finance conference for applied finance using R will be held on May 16 and 17, 2014 in Chicago, IL, USA at the University of Illinois at Chicago.  The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past five years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2014.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format), although more complete papers are preferred.  We welcome submissions for both full talks and abbreviated "lightning talks".  Both academic and practitioner proposals related to R are encouraged.

Presenters are strongly encouraged to provide working R code to accompany the presentation/paper.  Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

The conference will award two (or more) $1000 prizes for best papers.  A submission must be a full paper to be eligible for a best paper award. Extended abstracts, even if a full paper is provided by conference time, are not eligible for a best paper award.  Financial assistance for travel and accommodation may be available to presenters at the discretion of the conference committee. Requests for assistance should be made at the time of submission.

Please submit your papers or abstracts online at: goo.gl/OmKnu7.  The submission deadline is January 31, 2014.  Submitters will be notified via email by February 28, 2014 of acceptance, presentation length, and decisions on requested funding.

Additional details will be announced via the conference website www.RinFinance.com as they become available. Information on previous years' presenters and their presentations is also available on the conference website.

For the program committee:
Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
  Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

Tuesday, May 28, 2013

R/Finance 2013 Review

It's been one week since the 5th Annual R/Finance conference, and I finally feel sufficiently recovered to share my thoughts. The conference is a two-day whirlwind of applied quantitative finance, fantastic networking, and general geekery.

The comments below are based on my personal experience.  If I don't comment on a seminar or presentation, it doesn't mean I didn't like it or it wasn't good; it may have been over my head or I may have been distracted with my duties as a committee member.  All the slides released so far are available on the conference website.
Friday morning seminar:
I went to (and live-tweeted) Jeff Ryan's seminar because I wanted to learn more about how he uses mmap+indexing with options data.  There I realized that POSIXlt components use a zero-based index because they mirror the underlying tm struct, and that mmap+indexing files can be shared across cores and you can read them from other languages (e.g. Python).
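
The zero-based POSIXlt components are easy to check from the console.  The short example below uses only base R; the date is the first day of the 2013 conference, chosen purely for illustration.

```r
# as.POSIXlt exposes the broken-down time components as list elements.
lt <- as.POSIXlt("2013-05-17 09:30:00", tz = "UTC")

lt$mon    # months since January (0-11), so May is 4
lt$year   # years since 1900, so 2013 is 113
lt$wday   # days since Sunday (0-6), so Friday is 5
lt$yday   # days since January 1 (0-365)
lt$mday   # day of the month is the exception: it is 1-based

# Converting to calendar values means adding the offsets back.
month_number  <- lt$mon + 1
calendar_year <- lt$year + 1900
```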

Friday talks:
The first presentation was by keynote Ryan Sheftel, who talked about how he uses R on his bond trading desk.  David Ardia showed how expected returns can be estimated via the covariance matrix.  Ronald Hochreiter gave an overview of modeling optimization via his modopt package.  Tammer Kamel gave a live demo of the Quandl package and said, "Quandl hopes to do to Bloomberg what Wikipedia did to Britannica."

I had the pleasure of introducing both Doug Martin, who talked about robust covariance estimation, and Giles Heywood, who discussed several ways of estimating and forecasting covariance, and proposed an "open source equity risk and backtest system" as a means of matching talent with capital.

Ruey Tsay was the next keynote, and spoke about using principal volatility components to simplify multivariate volatility modeling.  Alexios Ghalanos spoke about modeling multivariate time-varying skewness and kurtosis.  Unfortunately, I missed both Kris Boudt's and David Matteson's presentations, but I did get to see Winston Chang's live demo of Shiny.

Friday food/networking:
The two-hour conference reception at UIC was a great time to have a drink, talk with speakers, and say hello to people I had never met in person.  Next was the (optional) dinner at The Terrace at Trump.  Unfortunately, it was cold and windy, so we only spent 15-20 minutes on the terrace before moving inside.  The food was fantastic, but the conversations were even better.

Saturday talks:
I missed the first block of lightning talks.  Samantha Azzarello discussed her work with Blu Putnam, which used a dynamic linear model to evaluate the Fed's performance vis-a-vis the Taylor Rule.  Jiahan Li used constrained least squares on 4 economic fundamentals to forecast foreign exchange rates.  Thomas Harte talked about regulatory requirements of foreign exchange pricing (and wins the award for most slides, 270); basically documentation is important, Sweave to the rescue!

Sanjiv Das gave a keynote on 4 applications: 1) network analysis on SEC and FDIC filings to determine banks that pose systemic risk, 2) determining which home mortgage modification is optimal, 3) portfolio optimization with mental accounting, 4) venture capital communities.

I had the pleasure of introducing the following speakers: Dirk Eddelbuettel showed how it's easy to write fast linear algebra code with RcppArmadillo.  Klaus Spanderen showed how to use QuantLib from R, and even how to call C++ from R from C++.  Bryan Lewis talked about SciDB and the scidb package (SciDB contains fast linear algebra routines that operate on the database!).  Matthew Dowle gave an introduction to data.table (in addition to a seminar).

Attilio Meucci gave his keynote on visualizing advanced risk management and portfolio optimization.  Immediately following, Brian Peterson gave a lightning talk on implementing Meucci's work in R (Attilio works in Matlab), which was part of a Google Summer of Code project last year.

Thomas Hanson presented his work with Don Chance (and others) on computational issues in estimating the volatility smile.  Jeffrey Ryan showed how to manipulate options data in R with the greeks package.

The conference wrapped up by giving away three books, generously donated by Springer, to three random people who submitted feedback surveys.  I performed the random drawing live on stage, using my patent-pending TUMC method (I tossed the papers up in the air).

The committee also presented the awards for best papers.  The winners were: 
  • Regime switches in volatility and correlation of financial institutions, Boudt et al.
  • A Bayesian interpretation of the Federal Reserve's dual mandate and the Taylor Rule, Putnam & Azzarello
  • Nonparametric Estimation of Stationarity and Change Points in Finance, Matteson et al.
  • Estimating High Dimensional Covariance Matrix Using a Factor Model, Sun (best student paper)
Saturday food/networking:
The whirlwind came to a close at Jaks Tap.  I was finally able to ask speed-obsessed Matthew Dowle about potential implementations of a multi-type xts object (a Google Summer of Code project this year).  I also spoke to a few people about how to add options strategy backtesting to quantstrat.

Last, but not least: none of this would be possible without the support of fantastic sponsors: International Center for Futures and Derivatives at UIC, Revolution Analytics, MS-Computational Finance at University of Washington, Google, lemnica, OpenGamma, OneMarketData, and RStudio.

Friday, March 29, 2013

R/Finance 2013 Registration Open

The registration for R/Finance 2013 -- which will take place May 17 and 18 in Chicago -- is NOW OPEN!

Building on the success of the previous conferences in 2009, 2010, 2011 and 2012, we expect more than 250 attendees from around the world. R users from industry, academia, and government will be joining 30+ presenters covering all areas of finance with R.

We are very excited about the four keynotes by Sanjiv Das, Attilio Meucci, Ryan Sheftel, and Ruey Tsay. The main agenda (currently) includes seventeen full presentations and fifteen shorter "lightning talks". We are also excited to offer five optional pre-conference seminars on Friday morning.

To celebrate the fifth year of the conference in style, the dinner will be held at The Terrace at Trump Hotel. Overlooking the Chicago river and skyline, it is a perfect venue to continue conversations while dining and drinking.

More details of the agenda are available at:
http://www.RinFinance.com/agenda/

Registration information is available at:
http://www.RinFinance.com/register/

and can also be directly accessed by going to:
http://www.regonline.com/RFinance2013

We would like to thank our 2013 Sponsors for the continued support enabling us to host such an exciting conference:

International Center for Futures and Derivatives at UIC

Revolution Analytics
MS-Computational Finance at University of Washington

Google
lemnica
OpenGamma
OneMarketData
RStudio

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson, Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

Monday, March 18, 2013

TTR_0.22-0 on CRAN


An updated version of TTR is now on CRAN.  The biggest changes to be aware of are that all moving averages attempt to set colnames, CCI returns an object with colnames, and the initial gap for SAR is no longer hard-coded at 0.01.  There are also some much-needed bug fixes, most notably to Yang Zhang volatility, MACD, SAR, EMA/EVWMA, and adjRatios.

There are some exciting new features, including a rolling single-factor model function (rollSFM, based on a prototype from James Toll), a runPercentRank function from Charlie Friedemann, and a faster aroon function.  In addition, stoch and WPR now return 0.5 instead of NaN when there's insufficient price movement.
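
For readers unfamiliar with the idea, a rolling single-factor model is just a regression of one return series on another, re-estimated over a moving window.  The base-R sketch below shows the calculation under that assumption; roll_sfm is a hypothetical helper written for this illustration, not TTR's implementation, and the simulated returns are made up.

```r
# Hypothetical helper, not TTR's code: rolling OLS of asset returns (Ra)
# on a single factor's returns (Rb) over a window of n observations.
roll_sfm <- function(Ra, Rb, n) {
  stopifnot(length(Ra) == length(Rb), n >= 2, n <= length(Ra))
  out <- matrix(NA_real_, nrow = length(Ra), ncol = 3,
                dimnames = list(NULL, c("alpha", "beta", "r.squared")))
  for (i in n:length(Ra)) {
    idx   <- (i - n + 1):i                            # current window
    beta  <- cov(Ra[idx], Rb[idx]) / var(Rb[idx])     # OLS slope
    alpha <- mean(Ra[idx]) - beta * mean(Rb[idx])     # OLS intercept
    out[i, ] <- c(alpha, beta, cor(Ra[idx], Rb[idx])^2)
  }
  out
}

# Simulated returns, for illustration only.
set.seed(42)
Rb <- rnorm(60, sd = 0.01)
Ra <- 0.001 + 1.5 * Rb + rnorm(60, sd = 0.002)

sfm <- roll_sfm(Ra, Rb, n = 20)
tail(sfm, 3)
```

The first n-1 rows are NA because a full window isn't available yet.  Each later row should match what lm would report for the same window.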

Here are all of the updates (from the CHANGES file):

#-#-#-#-#-#-#-#-#-#    Changes in TTR version 0.22-0    #-#-#-#-#-#-#-#-#-#

SIGNIFICANT USER-VISIBLE CHANGES
  • CCI now returns an object with colnames ("cci").
  • All moving average functions now attempt to set colnames.
  • Added clarification on the displaced nature of DPO.
  • SAR now sets the initial gap based on the standard deviation of the high-low range instead of hard-coding it at 0.01.
NEW FEATURES
  • Added rollSFM function that calculates alpha, beta, and R-squared for a single-factor model, thanks to James Toll for the prototype.
  • Added runPercentRank function, thanks to Charlie Friedemann.
  • Moved slowest portion of aroon to C.
  • DonchianChannel gains an 'include.lag=FALSE' argument, which includes the current period's data in the calculation. Setting it to TRUE replicates the original calculation. Thanks to Garrett See and John Bollinger.
  • The Stochastic Oscillator and Williams' %R now return 0.5 (instead of NaN) when a security's price doesn't change over a sufficient period.
  • All moving average functions gain '...'.
  • Users can now change alpha in Yang Zhang volatility calculation.
BUG FIXES
  • Fixed MACD when maType is a list. Now mavg.slow=maType[[2]] and mavg.fast=maType[[1]], as users expected based on the order of the nFast and nSlow arguments. Thanks to Phani Nukala and Jonathan Roy.
  • Fixed bug in lags function, thanks to Michael Weylandt.
  • Corrected error in Yang Zhang volatility calculation, thanks to several people for identifying this error.
  • Correction to SAR extreme point calculations, thanks to Vamsi Galigutta.
  • adjRatios now ensures all inputs are univariate, thanks to Garrett See.
  • EMA and EVWMA now ensure n < number of non-NA values, thanks to Roger Bos.
  • Fix to BBands docs, thanks to Evelyn Mitchell.
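
The NaN in the Stochastic Oscillator change listed above comes from a 0/0 division: when the highest high equals the lowest low over the window, both the numerator and the denominator of %K are zero.  The base-R sketch below shows that case and the 0.5 fallback.  fast_k is a hypothetical helper for illustration, not TTR's stoch function.

```r
# Hypothetical helper, not TTR's code: fast %K over an n-period window,
# returning 0.5 when the window's high-low range is zero.
fast_k <- function(high, low, close, n) {
  stopifnot(n <= length(close))
  k <- rep(NA_real_, length(close))
  for (i in n:length(close)) {
    idx <- (i - n + 1):i
    hh <- max(high[idx])   # highest high in the window
    ll <- min(low[idx])    # lowest low in the window
    k[i] <- if (hh == ll) 0.5 else (close[i] - ll) / (hh - ll)
  }
  k
}

# A price that never moves: without the fallback every value would be 0/0.
flat <- rep(10, 5)
k_flat <- fast_k(flat, flat, flat, n = 3)

# A price that closes at the top of its 3-period range.
k_top <- fast_k(high = c(10, 11, 12), low = c(9, 10, 11),
                close = c(9.5, 10.5, 12), n = 3)
```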