Excel 32bit and 64bit usage, what you don’t read on the internet

I had a phone call from an old acquaintance a while back asking for some advice on the use of Microsoft Excel.  In previous lives I have been heavily involved with Excel development such as...
  1. Spreadsheet I built from scratch for an investment bank with 15,000 lines of VBA and interfaces to Reuters real-time data, Bloomberg real-time data, Factset and a custom S-Plus quant model.
  2. Vendor portfolio modelling and rebalancing solution entirely written in Excel VBA deployed on-site at many global asset management firms
  3. Server-side solution that created Excel worksheets based upon database configuration to provide vastly scalable portfolio modelling, compliance and rebalancing to asset managers.
  4. Re-engineered an Investment Bank proprietary trading Excel based system to drastically improve performance, remove redundant code, increase reliability and remove accumulated bugs and “features”.
If you have time prior to reading this, a brief look at these links will help:
http://www.jkp-ads.com/articles/apideclarations.asp
http://msdn.microsoft.com/en-us/library/ee691831(office.14).aspx
http://www.utteraccess.com/wiki/index.php/Conditional_Compliation

Quite a lot of work I have done in this arena has been along the lines of “We need this to work, it usually works but it’s unreliable.  Can you go figure out and fix it? Oh, and we need it to work for quarter end in three weeks and if it fails we miss a regulatory reporting deadline and then we get into trouble with group compliance. Oh, we don’t have a proper desk for you so can you sit next to the cash equity traders?”
So, the question was around the use of a single spreadsheet in Excel 2013.  The spreadsheet works on most machines and not on one.  As always, it’s a pretty critical spreadsheet but the guy who wrote it and his subsequent replacements did not provide any documentation and since none of them work there anymore, the users are hanging in the wind. A familiar story...
A bit of poking around in the code showed that it was a mixture of decent VBA and some dreadful design patterns, naming conventions (a mixture of “proper” Hungarian notation, some MSDN cut and pastes, some obfuscation and some obscure, geeky and or bizarre), redundant code, inaccessible code and all of the mess that build up over time.
Anyway, after that initial examination I saw a number of Win32 API calls.  I asked the acquaintance to run msinfo32 from the command line on the PC where this failed and on a PC where it worked.  This revealed a nice and simple difference – the failing PC is x64 based, the rest are 32 bit computers.
A few modifications to the win32 API declarations in the code and a modified version of the spreadsheet worked on the 64 bit PC.  A few further modifications to the code and through the joys of conditional compilation it will now work on 32bit and 64bit computers.
One further task I need to look at is the impact of this on the size of the workbook that contains the code.
A simple worked example:
Notes:
All steps executed on a single PC running Win8.1 and Excel 2013
All strings quoted should have the quote marks removed if you wish to repeat this process
Step 1
I create a spreadsheet with one Worksheet called ExcelOptimisation.xlsm.  I make the following changes:
In A1 I type the following “dim strABCDEF” (remove the quotes if you want to replicate this)
In A2 I type the following “ as string” (there is a space before “as”)
In B1 I type “1”
In B2 I type “2”
In B3 I type “3”
I now use the autofill feature to drag this number series down to cell B5000
In C1 I type “=CONCATENATE(A$1,B1,A$2)”
I now use the autofill feature to drag this number series down to cell C5000
I now save and close ExcelOptimisation.xlsm.  The resulting file is 102,460 bytes.


Step 2
I create a spreadsheet with one Worksheet called UncompiledExample.xlsm.  I make the following changes:
I add a single regular module
To the added module I add Option Explicit at the top of the window
Below Option Explicit I copy and paste the contents of ExcelOptimisation.xlsm sheet1 cells C1:C5000
So I now have 5000 declarations of the form Dim strABCDEF1 As String… Dim strABCDEF5000 As String
I now add a simple subroutine
Sub HelloWorld()
MsgBox "Hello World"
End Sub
At this point I close and save this file and I see UncompiledExample.xlsm as 95,699 bytes.

Step 3
In Windows Explorer I copy UncompiledExample.xlsm and name the copy CompiledExample.xlsm
I open CompiledExample.xlsm in Excel, compile the VBA and save and close this file.
The resulting file is 123,732 bytes

Step 4
In Windows Explorer I copy CompiledExample.xlsm and name the copy CleanedCompiledExample.xlsm
I open CleanedCompiledExample.xlsm in Excel, delete the 5000 declarations, compile the VBA and save and close this file.
The resulting file is 91,076 bytes.

Step 5
I create a spreadsheet with one Worksheet called OnlyHelloWorld.xlsm.  I make the following changes:
I add a single regular module
To the added module I add Option Explicit at the top of the window
Below Option Explicit I now add a simple subroutine
Sub HelloWorld()
MsgBox "Hello World"
End Sub
At this point I close and save this file and I see OnlyHelloWorld.xlsm as 11,007 bytes.
So – let’s look at this….
OnlyHelloWorld.xlsm and CleanedCompiledExample.xlsm both have only one worksheet which is entirely blank.
OnlyHelloWorld.xlsm and CleanedCompiledExample.xlsm both have only one module which has four lines –

Option Explicit
Sub HelloWorld()
MsgBox "Hello World"
End Sub

OnlyHelloWorld.xlsm is 11KB and CleanedCompiledExample.xlsm is 89KB. What’s going on there?  They are visually and structurally identical!!!
Why? Simply put, the Excel VBA Compiler is a very nifty piece of kit, but it is not optimised for removal of variables.  So when a variable is created a token is used in the compiled code to represent that variable.  But when the variable is deleted from the code, the token remains.  The token is not seen by the user so cannot manually be removed from the development environment.

What’s the impact of this?
Well, for most spreadsheets, most of the time, absolutely nothing.  But when you are writing VBA heavy, advanced spreadsheets used in financial modelling, real time data analysis, portfolio modelling and so on, this causes problems.  Excessive memory usage in spreadsheets slows down processing.  Not usually a massive slowdown but one that can be observed in advanced VBA systems

How do you resolve this problem of spreadsheet bloat?
There are a number of techniques.  One technique I created and recommend is the use of a spreadsheet industrialisation technique.  Simply put, the use of outside code to clean up and remove all of these lost variables and reclaim the memory.  This spreadsheet industrialisation is not needed for simple Excel spreadsheets but does come into play in the VBA heavy environments of most Investment Bank trading rooms.

For a bloated single spreadsheet, the way to resolve this issue manually is to create an identical spreadsheet with all of the same user interface elements.  Then re-create the VBA environment by creating standard and class modules with the same names as the bloated spreadsheets modules.  Now cut and paste the code from the bloated spreadsheet modules over to the new spreadsheet modules.  Compile and save.  You now have a bloated spreadsheet and a visually and structurally identical spreadsheet.  Look at the file sizes and you will see the difference in file sizes.
When I have some more time and access to 32 and 64 bit hardware, I will look at the impact of 32 and 64 bit dual use spreadsheets.  Specifically, does the move between these platforms cause a more advanced compilation to remove the tokens and unbloat the worksheet?

Comments