Principal Component Analysis in Google Sheets for Portfolios

John Mihalik
4 min readOct 16, 2020
Photo by Markus Spiske on Unsplash

Principal Component Analysis (PCA) is a process to help understand the most important aspects of a data set. These important aspects are the “principal components” of the dataset, which can help to show you the most important features/attributes, key groupings or clusters of the data, and also help to reduce the dimensionality of large datasets with possibly thousands of features.

In the context of financial analysis, PCA can help show you what stocks or securities are similar in a portfolio, identify a simplified financial model for entry/exit signals, or even understanding fundamental bond yield curve movements.

For this article, I’ll concentrate on portfolio analysis building upon the work in my previous article on Portfolio Standard Deviation and Correlation Matrix Calculator in Google Sheets. We’ll add to the Google Sheet created there to automate the PCA with a simple button click. The link for new version of this Google Sheet is:

https://docs.google.com/spreadsheets/d/1mzCCIFbAf5K4Q2AAIF1A8cQ0cf8IY8XYwlOPyVihmL0/edit?usp=sharing

To open this link you will need a free Google account.

Click the link above and when the sheet opens choose “File -> Make a Copy” from the Sheets main menu. This will copy it to your own Google Sheets account where you can edit your own copy and you are ready to go.

Here’s how it all works!

Eigenvector Decomposition of the Correlation Matrix

In my last article, we calculated the correlation matrix of a portfolio of stocks and securities. We can use the correlation matrix as the basis for a Principal Component Analysis. We can do this by calculating the eigenvectors and eigenvalues of the correlation matrix. The eigenvectors with the highest eigenvalues will turn out to be the principal components.

Unfortunately, Google Sheets does not have a built in function for eigenvector decomposition. So, I created a custom function to do that and bundled it up as a Google AppScript. You can execute the custom function by simply clicking the “Run PCA” button.

Power Iteration and Deflation

The algorithm used to do the eigenvector decomposition is Power Iteration. Power iteration is perhaps one of the most simple algorithms to implement and for smaller matrices its computationally easy. This is important as Google AppScripts have strict performance restrictions and the computational power is limited.

Power iteration is an iterative algorithm that starts with a random guess vector. That vector is then multiplied by the original matrix and then normalized. This is repeated and can be shown to converge to the dominant eigenvector.

Once we have the dominant eigenvector, we can use the process of deflation to find the remaining eigenvectors and eigenvalues. Deflation basically “peels off” the contribution of the last eigenvector from the original matrix.

Where lambda is the last computed eigenvalue and “a” is the corresponding eigenvector. We then can use power iteration again to find the next most dominant eigenvector.

You can check out the complete source code for the Google AppScript on my GitHub page here:

Visualizing Principal Components

Once we have the principal components calculated, we can visualize the results to get a better understanding of the portfolio correlations. This is automatically displayed from the spreadsheet when you enter your stock symbols.

From the chart below, we can see that the first principal component is by far the most dominant.

Graphing the principal component values of each stock for the first and second principal components lets us see some interesting differences in the securities.

Charting PCA1 versus PCA2 is also another interesting way to see groupings of securities.

--

--