Friday, July 7, 2017

xts 0.10-0 on CRAN!

A new, and long overdue, release of xts is now on CRAN!  The major change is the completely new plot.xts(), written by Michael Weylandt and Ross Bennett and based on Jeff Ryan's quantmod::chart_Series code.

Do note that the new plot.xts() includes breaking changes to the original (and rather limited) plot.xts().  However, we believe the new functionality more than compensates for the potential one-time inconvenience.  And I will no longer have to tell people that I use plot.zoo() on xts objects!

This release also includes more bug fixes than you can shake a stick at.  We squashed several bugs that could have crashed your R session.  We also fixed some (always pesky and tricky) timezone issues.  We've also done more sanity checking (e.g. for NA in the index), and provide more informative errors when things aren't right.  And last, but not least, unit tests are running again!

I'm sure you were hoping to see some examples of the new plot.xts() functionality.  Rather than clutter up this blog post with code, check out the basic examples, and the panel functionality examples that Ross Bennett created.

I'm looking forward to your questions and feedback!  If you have a question, please ask on Stack Overflow and use the [r] and [xts] tags.  Or you can send an email to the R-SIG-Finance mailing list (you must subscribe to post).  Open an issue on GitHub if you find a bug or want to request a feature, but please read the contributing guide first!

Wednesday, June 21, 2017

Importing and Managing Financial Data

I'm excited to announce my DataCamp course on importing and managing financial data in R! I'm also honored that it is included in DataCamp's Quantitative Analyst with R Career Track!

You can explore the first chapter for free, so be sure to check it out!

Course Description

Financial and economic time series data come in various shapes, sizes, and periodicities. Getting the data into R can be stressful and time-consuming, especially when you need to merge data from several different sources into one data set. This course covers importing data from local files as well as from internet sources.

Course Outline

Chapter 1: Introduction and downloading data
A wealth of financial and economic data are available online. Learn how getSymbols() and Quandl() make it easy to access data from a variety of sources.

Chapter 2: Extracting and transforming data
You've learned how to import data from online sources; now it's time to see how to extract columns from the imported data. After you've learned how to extract columns from a single object, you will explore how to import, transform, and extract data from multiple instruments.

Chapter 3: Managing data from multiple sources
Learn how to simplify and streamline your workflow by taking advantage of the ability to customize default arguments to getSymbols(). You will see how to customize defaults by data source, and then how to customize defaults by symbol. You will also learn how to handle problematic instrument symbols.

Chapter 4: Aligning data with different periodicities
You've learned how to import, extract, and transform data from multiple data sources. You often have to manipulate data from different sources in order to combine them into a single data set. First, you will learn how to convert sparse, irregular data into a regular series. Then you will review how to aggregate dense data to a lower frequency. Finally, you will learn how to handle issues with intra-day data.

Chapter 5: Importing text data, and adjusting for corporate actions
You've learned the core workflow of importing and manipulating financial data. Now you will see how to import data from text files of various formats. Then you will learn how to check data for weirdness and handle missing values. Finally, you will learn how to adjust stock prices for splits and dividends.

Wednesday, June 7, 2017

quantmod 0.4-9 on CRAN

A new release of quantmod is now on CRAN! The only change was to address changes to Yahoo! Finance and their effects on getSymbols.yahoo().  GitHub issue #157 contains some details about the fix implementation.

Unfortunately, the URL wasn't the only thing that changed.  The actual data available for download changed as well.

The most noticeable difference is that the adjusted close column is no longer dividend-adjusted (i.e. it's only split-adjusted).  Also, only the close price is unadjusted; the open, high, and low are split-adjusted.

There also appear to be issues with the adjusted prices in some instruments.  For example, users reported issues with split data for XLF and SPXL in GitHub issue #160.  For XLF, there is a split and a dividend on 2016-09-16, even on the Yahoo! Finance historical price page for XLF. As far as I can tell, there was only a special dividend.  The problem with SPXL is that the adjusted close price isn't adjusted for the 4/1 split on 2017-05-01, which is also reflected on the Yahoo! Finance historical prices page for SPXL.

Another change is that the downloaded data may contain rows where all the values are "null".  These appear on the website as "0".  This is a major issue for some instruments.  Take XLU for example; 188 of the 624 days of data are missing between 2014-12-04 and 2017-05-26 (ouch!).  You can see this is even true on the Yahoo! Finance historical price page for XLU.
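If you download the raw CSV yourself, one way to spot and drop such rows is sketched below in base R.  The CSV text and its values are made up for illustration, and the column layout is only an assumption about what the download looks like.

```r
# Hypothetical CSV text mimicking a download that contains a "null" row
# (all values are made up for illustration)
csv_text <- "Date,Open,High,Low,Close,Adj Close,Volume
2017-05-23,51.0,51.5,50.8,51.2,51.2,1000
2017-05-24,null,null,null,null,null,null
2017-05-25,51.3,51.9,51.1,51.7,51.7,1200"

# Treat the literal string "null" as NA while parsing
prices <- read.csv(text = csv_text, na.strings = "null",
                   check.names = FALSE, stringsAsFactors = FALSE)

# Flag rows where every non-date column is missing
value_cols <- setdiff(names(prices), "Date")
all_missing <- rowSums(!is.na(prices[, value_cols])) == 0

# Keep only rows that contain at least one value
clean <- prices[!all_missing, ]
clean
```

Counting sum(all_missing) before dropping anything is a quick way to see how badly a given instrument is affected.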

If these changes have made you look for a new data provider, see my post: Yahoo! Finance Alternatives.

Yahoo Finance Alternatives

I assume that you're reading this because you are one of many people who were affected by the changes to Yahoo Finance data in May (2017).  Not only did the URL change, but the actual data changed as well!

The most noticeable difference is that the adjusted close column is now only split-adjusted, whereas it used to be split- and dividend-adjusted.  Another oddity is that only the close price is unadjusted (strangely, the open, high, and low are split-adjusted).

All these issues can be dealt with using tools that are currently available.  For example, you can unadjust the open, high, and low prices using the ratio of close to adjusted close prices.  And you can adjust for both splits and dividends using quantmod::adjustOHLC().
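Here is a minimal base-R sketch of the ratio approach.  The prices are made up, and they assume a hypothetical 2-for-1 split taking effect on the last row; the column names are also just for illustration.

```r
# Made-up data around a hypothetical 2-for-1 split effective 2017-01-05.
# Open/High/Low are split-adjusted, Close is unadjusted, and Adjusted
# is the split-adjusted close (mirroring the layout described above).
ohlc <- data.frame(
  Date     = as.Date(c("2017-01-03", "2017-01-04", "2017-01-05")),
  Open     = c(49.0, 49.5, 50.5),
  High     = c(50.5, 50.5, 51.5),
  Low      = c(48.5, 49.0, 50.0),
  Close    = c(100.0, 100.0, 51.0),
  Adjusted = c(50.0, 50.0, 51.0)
)

# Ratio of unadjusted close to split-adjusted close (2 before the split, 1 after)
ratio <- ohlc$Close / ohlc$Adjusted

# Multiply the split-adjusted columns by the ratio to recover raw prices
raw <- ohlc
raw[, c("Open", "High", "Low")] <- ohlc[, c("Open", "High", "Low")] * ratio
raw
```

Once the open, high, low, and close are all on the same unadjusted basis, they are in the form you would want before applying any split and dividend adjustment of your own.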

Unfortunately, there also appear to be issues with data quality.  Some instruments have rows where all the prices and volume are zeros (e.g. XLU).  The adjusted close in some instruments is incorrect because of missing split events, or double-counting splits and special dividends.

So, what are your alternatives?  If you're just tinkering, you can try other free data sources like Google Finance or Quandl.  Note that Google Finance data is already split-adjusted, so you might need to adjust for dividends, or un-adjust for splits, depending on your needs.  Quandl has a wiki of end-of-day stock prices curated by the community.  You only need a free account to access the data.

If you're using the data to make actual investment decisions, you should really be using a professional data provider.  At the very least, you get someone to yell at when the data have errors. :)  First, you should check if your broker provides the historical data you need (e.g. Interactive Brokers provides historical and real-time data to account-holders).

If your broker doesn't provide historical data, here are a few providers you may want to consider:

- Provide limited historical data for free
- For a one-time fee:
  - $20-$50 for 10 years of daily data
  - $40-$100 for 20 years of daily data

- Massive historical equity database
- $600 annually for 30 years of daily data
- Ability to adjust for splits and dividends

- Mainly a real-time data provider, but also has historical data
- Pricing starts at $78/month

Leave a comment if you know of another end-of-day data provider that I didn't list!

*FULL DISCLOSURE: I receive a referral fee for annual subscriptions to CSI products if you use the FOSS coupon code.

Wednesday, April 19, 2017

quantmod 0.4-8 on CRAN

I pushed a bug-fix release of quantmod to CRAN last night. The major changes were to
  • getSymbols.FRED (#141)
  • getSymbols.oanda (#144)
  • getSymbols.yahoo (#149)
All three providers made breaking changes to their URLs/interfaces.

getSymbols.google also got some love. It now honors all arguments set via setSymbolLookup (#138), and it correctly parses the date column in non-English locales (#140).

There's a handy new argument to getDividends: split.adjust. It allows you to request dividends unadjusted for splits (#128). Yahoo provides split-adjusted dividends, so you previously had to manually unadjust them for splits if you wanted the original raw values. To import the raw unadjusted dividends, just call:

rawDiv <- getDividends("IBM", split.adjust = FALSE)

Note that the default is split.adjust = TRUE to maintain backward-compatibility.

Tuesday, February 14, 2017

Stack Financials: Analyze Financial Statement Data

A quantmod user asked an interesting question on StackOverflow: Looping viewFinancials from quantmod. Basically, they wanted to create a data.frame that contained financial statement data for several companies for several years. I answered their question, and thought others might find the function I wrote useful… hence, this post!

I called the function stackFinancials() because it would use getFinancials() and viewFinancials() to pull financial statement data for multiple symbols, and stack them together in long form. I chose a long data format because I don’t know whether the output of viewFinancials() always has the same number of rows and columns for a given type and period. The long format makes it easy to put all the data in one object.

stackFinancials <-
function(symbols, type = c("BS", "IS", "CF"), period = c("A", "Q")) {
  # Ensure the type and period arguments match viewFinancials
  type <- match.arg(toupper(type[1]), c("BS", "IS", "CF"))
  period <- match.arg(toupper(period[1]), c("A", "Q"))

  # Simple function to get financials for one symbol
  getOne <- function(symbol, type, period) {
    gf <- getFinancials(symbol, auto.assign = FALSE)
    vf <- viewFinancials(gf, type = type, period = period)
    # Put viewFinancials output into a data.frame
    df <- data.frame(vf, line.item = rownames(vf), type = type,
                     period = period, symbol = symbol,
                     stringsAsFactors = FALSE, check.names = FALSE)
    # Reshape data.frame into long format
    long <- reshape(df, direction="long", varying=seq(ncol(vf)),
                    v.names="value", idvar="line.item",
                    times=colnames(vf))
    # Reset row.names to "automatic"
    rownames(long) <- NULL
    # Return data
    long
  }
  # Loop over all symbols
  allData <- lapply(symbols, getOne, type = type, period = period)
  # rbind() all into one data.frame
  do.call(rbind, allData)
}

Here’s a simple example of how to use stackFinancials() to pull the quarterly (period = "Q") income statements (type = "IS") for General Electric and Apple:

library(quantmod)
Data <- stackFinancials(c("GE", "AAPL"), type = "IS", period = "Q")
head(Data, 4)
##                line.item type period symbol       time value
## 1                Revenue   IS      Q     GE 2016-12-31 33088
## 2   Other Revenue, Total   IS      Q     GE 2016-12-31    NA
## 3          Total Revenue   IS      Q     GE 2016-12-31 33088
## 4 Cost of Revenue, Total   IS      Q     GE 2016-12-31 24775

Now that we have the output in Data, let’s do something with it. You could simply subset Data to extract the components you want. For example, if you wanted to look at Apple’s quarterly revenue, you could subset Data where symbol == "AAPL" and line.item == "Total Revenue". But if you’re going to be slicing and dicing a lot, it can often help to write a general function to simplify things. So I wrote extractLineItem(). It takes the output of stackFinancials() and a regular expression of the line item you want, and it returns an xts object that contains the given line items for all symbols in the data.
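For reference, the simple subset might look like the sketch below.  To keep it runnable without a download, it uses a small stand-in for Data (named DataSketch here, a hypothetical name) built from a few of the values shown in this post; the time values are placeholders.

```r
# Small stand-in for the output of stackFinancials(); the values come from
# the output shown in this post, the time column holds placeholder dates
DataSketch <- data.frame(
  line.item = c("Total Revenue", "Total Revenue", "Operating Income",
                "Total Revenue", "Total Revenue", "Operating Income"),
  type   = "IS",
  period = "Q",
  symbol = c("AAPL", "AAPL", "AAPL", "GE", "GE", "GE"),
  time   = c("2016-09-30", "2016-12-31", "2016-12-31",
             "2016-09-30", "2016-12-31", "2016-12-31"),
  value  = c(46852, 78351, 23359, 90605, 33088, 2892),
  stringsAsFactors = FALSE
)

# Keep only Apple's total revenue rows
aaplRevenue <- DataSketch[DataSketch$symbol == "AAPL" &
                          DataSketch$line.item == "Total Revenue", ]
aaplRevenue
```

That works fine for a one-off, but you have to repeat it for every symbol and line item.  The general version, extractLineItem(), looks like this: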

extractLineItem <- function(stackedFinancials, line.item) {
  if (missing(stackedFinancials) || missing(line.item)) {
    stop("You must provide output from stackFinancials(), ",
         "and the line.item to extract")
  }
  # Select line items matching user input (use the function argument,
  # not the global Data object)
  match.rows <- grepl(line.item, stackedFinancials$line.item,
                      ignore.case = TRUE)
  sfSubset <- stackedFinancials[match.rows, ]
  getItem <- function(x) {
    # Create xts object
    output <- xts(x$value, as.yearmon(x$time))
    # Ensure column names are syntactically valid
    valid.names <- make.names(paste(x$symbol[1], x$line.item[1]))
    # Remove repeating periods
    colnames(output) <- gsub("\\.+", "\\.", valid.names)
    output
  }
  # Split subset by line.item and symbol
  symbol.item <- split(sfSubset, sfSubset[, c("symbol", "line.item")])
  # Apply getItem() to each chunk, and merge into one object
  do.call(merge, lapply(symbol.item, getItem))
}

Let’s use extractLineItem() to compare total revenue for GE and AAPL.

totalRevenue <- extractLineItem(Data, "total revenue")
totalRevenue
##          AAPL.Total.Revenue GE.Total.Revenue
## Dec 2015              75872            24654
## Mar 2016              50557            27845
## Jun 2016              42358            61339
## Sep 2016              46852            90605
## Dec 2016              78351            33088
plot(totalRevenue, main = "Quarterly Total Revenue, AAPL (black) vs GE (red)")

You could also combine multiple calls to extractLineItem() to calculate ratios not included in the output from viewFinancials(). For example, you could divide operating income by total revenue to calculate operating margin.

operatingIncome <- extractLineItem(Data, "operating income")
operatingIncome 
##          AAPL.Operating.Income GE.Operating.Income
## Dec 2015                 24171                2863
## Mar 2016                 13987                 545
## Jun 2016                 10105                4736
## Sep 2016                 11761                6138
## Dec 2016                 23359                2892
plot(operatingIncome / totalRevenue, main = "Quarterly Operating Margin, AAPL (black) vs GE (red)")