FIX implementation: Cost cutting proposal part I

The last few years have seen an incredible focus on cost cutting. Over this period I've seen a number of implementations where firms have picked the low-hanging fruit; several, for example, have switched from Solaris/SPARC to Linux/x86 hardware.
This post is going to look at a more advanced architectural option.

Consider a regular bank FIX engine environment (one serving client buy-side flow, not prime brokerage DMA or any other flow that is very latency sensitive):
  • Multiple physical servers sat behind a load balancer pair that exposes one IP address per extranet (so one IP on TNS, another on BT Radianz, one for Fixnetix and so on).
  • Each client FIX session will be bound uniquely to one of the FIX engines sat behind the load balancer. 
  • Each FIX engine will be bound to a paired high availability engine running at the same site as a hot standby.
  • FIX engines communicate via some message oriented middleware to the order management system.
  • The FIX engines sit within a DMZ model - one firewall on the outside, one between the FIX engine and the bank network core.
  • The FIX engines access leased lines or extranets for physical connectivity.
  • Any internet connectivity is segregated from the rest of the bank internet connectivity (dedicated physical infrastructure).
  • The FIX engines typically need to have a dedicated VLAN such that the high throughput of messages is segregated from the rest of the network.
Note that the above list is high level. I could go on at length about the degree of redundancy, n-1 deployment of code and all manner of other more geeky stuff, but I want to keep this high level enough to stop the more business-facing reader falling asleep.

Within the sort of set-up described above, each FIX engine generates a lot of log files to allow staff to investigate issues. This is pretty standard. Any modern FIX engine will keep logging off the critical path for message throughput and processing, but writing log files still requires compute and other system resources.

Remember also that some FIX engine vendors license their software on a per server basis.

So - what can be done?

Implement one of the solutions available to run packet capture on the VLAN containing the FIX engines. Stop running logging on the FIX engines. Use the packet capture files (once the FIX messages have been re-assembled from the raw TCP/IP dump data) to provide the FIX engine logging instead.
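To make the re-assembly step concrete, here is a minimal Python sketch of turning an already re-assembled TCP payload into log lines. It assumes the capture tool has already put the TCP segments for one session back in order and handed over the raw bytes; the function names (split_fix_messages, to_log_line, build_fix_message) are my own illustrations, not part of any vendor product. It relies only on standard FIX tag=value framing: BeginString (tag 8), BodyLength (tag 9) and CheckSum (tag 10), with fields delimited by SOH (0x01).

```python
SOH = b"\x01"  # standard FIX field delimiter


def fix_checksum(data: bytes) -> int:
    """FIX CheckSum: sum of all bytes before the 10= field, modulo 256."""
    return sum(data) % 256


def split_fix_messages(stream: bytes):
    """Yield each complete, checksum-valid FIX message found in a byte stream.

    Junk bytes and a truncated trailing message are skipped, since a capture
    can start or stop mid-message.
    """
    pos = 0
    while True:
        start = stream.find(b"8=FIX", pos)
        if start < 0:
            return
        begin_end = stream.find(SOH, start)          # end of BeginString field
        if begin_end < 0:
            return
        if not stream.startswith(b"9=", begin_end + 1):
            pos = start + 1                          # not a real header, resync
            continue
        len_end = stream.find(SOH, begin_end + 1)    # end of BodyLength field
        if len_end < 0:
            return
        length_field = stream[begin_end + 3:len_end]
        if not length_field.isdigit():
            pos = start + 1
            continue
        body_end = len_end + 1 + int(length_field)   # BodyLength counts from here
        msg_end = body_end + 7                       # trailer is "10=NNN" + SOH
        if msg_end > len(stream):
            return                                   # incomplete tail
        trailer = stream[body_end:msg_end]
        valid = (
            trailer.startswith(b"10=")
            and trailer.endswith(SOH)
            and trailer[3:6].isdigit()
            and int(trailer[3:6]) == fix_checksum(stream[start:body_end])
        )
        if valid:
            yield stream[start:msg_end]
            pos = msg_end
        else:
            pos = start + 1


def to_log_line(message: bytes) -> str:
    """Render a FIX message in the familiar pipe-delimited log style."""
    return message.replace(SOH, b"|").decode("ascii", errors="replace")


def build_fix_message(body: bytes, begin_string: bytes = b"FIX.4.2") -> bytes:
    """Helper for the example only: wrap a body with header and checksum."""
    head = b"8=" + begin_string + SOH + b"9=" + str(len(body)).encode() + SOH
    partial = head + body
    return partial + b"10=" + f"{fix_checksum(partial):03d}".encode() + SOH


if __name__ == "__main__":
    heartbeat = build_fix_message(b"35=0\x0149=CLIENT\x0156=BANK\x0134=1\x01")
    for msg in split_fix_messages(b"junk" + heartbeat + heartbeat[:10]):
        print(to_log_line(msg))
```

A real deployment would also need to tag each message with the capture timestamp, direction and session, and to cope with encrypted (TLS) sessions, none of which is covered here.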

This offloads compute and disk I/O from the FIX engine servers. Results depend on the FIX engine vendor, the hardware in use and many other variables, but a meaningful reduction in FIX engine server processor load should be observed, which in turn allows a reduction in the number of physical servers used (and, where the vendor licenses per server, in the number of licences).
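As a rough illustration of how a load reduction translates into server count, here is a back-of-envelope Python sketch. Every input is a hypothetical placeholder, not a measurement: the share of CPU spent on logging in particular has to come from profiling your own engines. The function name servers_needed and its parameters are my own invention for this example.

```python
import math


def servers_needed(current_servers: int,
                   avg_cpu_util: float,
                   logging_cpu_share: float,
                   target_util: float) -> int:
    """Estimate primary servers required once logging is offloaded.

    current_servers   -- primary FIX engine servers today
    avg_cpu_util      -- their average CPU utilisation (0..1)
    logging_cpu_share -- fraction of that CPU attributable to logging (0..1)
    target_util       -- utilisation ceiling you are willing to run at (0..1)
    """
    # Total work left once the logging share is removed, in "server units".
    remaining_load = current_servers * avg_cpu_util * (1 - logging_cpu_share)
    return math.ceil(remaining_load / target_util)


if __name__ == "__main__":
    # Placeholder inputs only: 8 servers at 50% CPU, a quarter of it logging,
    # and a policy of not exceeding 50% utilisation after consolidation.
    print(servers_needed(8, 0.5, 0.25, 0.5))
```

Remember that each primary engine has a hot standby, so any reduction in primaries is mirrored in the HA estate, and that session count, memory and the SLA may cap consolidation well before CPU does.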
 
This is not a one-size-fits-all model. Implementing it requires analysis of:
  • internal network topology
  • FIX engine hardware
  • HA model
  • DR model
  • VLAN capabilities
  • number of client FIX connections
  • electronic trading SLA to the core business
  • pure cost/benefit analysis
My experience is that many banks have implemented FIX over the last 15 years and have simply updated the hardware in periodic technology refreshes, but they have never re-engineered the environment. That re-engineering is the new way to cut costs...
 
And here's a picture: