- FIX engine application status
- Status of the hardware and OS that run the FIX engine
- Network connectivity between FIX engine HA pairs
- Network connectivity between FIX engine and outside world
- Network connectivity between FIX engine and sell-side internal systems
- Packet capture devices
- Middleware hardware
Note that this full-stack monitoring is in addition to the specialised FIX engine monitoring provided by the systems listed in a previous post, "Production support software". There is also a requirement for "Testing and Certification Software", which we have covered previously.
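To make the idea of full-stack coverage concrete, here is a minimal sketch of how results from each layer in the list above might be rolled up into one overall status. The layer names, the `CheckResult` type and the `roll_up` function are all hypothetical, invented for illustration; they are not taken from any monitoring product. The one design assumption worth calling out is that a layer with no check result at all is treated as a warning rather than silently ignored.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class CheckResult:
    layer: str
    status: Status
    detail: str = ""


# Hypothetical identifiers, one per layer in the list above.
LAYERS = (
    "fix_engine_application",
    "hardware_and_os",
    "ha_pair_connectivity",
    "external_connectivity",
    "internal_connectivity",
    "packet_capture",
    "middleware_hardware",
)


def roll_up(results):
    """Return (overall status, layers that reported nothing)."""
    seen = {r.layer for r in results}
    missing = [layer for layer in LAYERS if layer not in seen]
    # The overall status is the worst status reported by any layer.
    worst = max((r.status for r in results), default=Status.OK)
    # Assumption: an unmonitored layer is at least a warning.
    if missing:
        worst = max(worst, Status.WARNING)
    return worst, missing
```

Calling `roll_up` with a result for only some of the layers returns the names of the uncovered ones, which is a cheap way to spot gaps in the stack before an incident does.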
At the edge of the sell-side network will be a series of routers connected to extranets such as TNS, IPC, BT Radianz, Fixnetix and similar, for buy-side clients that connect directly. The same extranet links will also carry connections to FIX hubs such as Thomson Reuters Autex, Fidessa Express, Ullink MCS, LSE Hub and others.
The sell-side may also have a series of VPN connections over the internet. Ideally the trading infrastructure will not share the internet connection that the sell-side uses for browsing and other interactive services. I have seen cases where internet browsing traffic has created a "denial-of-service" for trading traffic. It's possible to implement traffic shaping to mitigate that problem, but based on experience I would advise a separate infrastructure.
This post does not cover router, switch and firewall topology in depth, so I will gloss over it here and perhaps return to it in a later post.
Next up are FIX engines. A sensible design pattern is for a primary datacentre to run FIX engines in High Availability pairs. Depending on traffic volumes multiple pairs may be required.
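As a rough illustration of how traffic volume translates into a number of HA pairs, the arithmetic can be sketched as below. The function name, the parameters and the idea of a fixed headroom fraction are my own assumptions for the example; real sizing depends on the engine, the hardware and the message mix, and the figures would come from load testing.

```python
import math


def pairs_needed(peak_msgs_per_sec, pair_capacity_msgs_per_sec, headroom=0.5):
    """Estimate how many HA pairs are needed for a given peak load.

    headroom is the fraction of a pair's measured capacity we are
    willing to use at peak (hypothetical default: 50%).
    """
    if pair_capacity_msgs_per_sec <= 0 or not 0 < headroom <= 1:
        raise ValueError("capacity must be positive and headroom in (0, 1]")
    usable = pair_capacity_msgs_per_sec * headroom
    # Always provision at least one pair, even for tiny volumes.
    return max(1, math.ceil(peak_msgs_per_sec / usable))
```

With made-up figures, a peak of 10,000 messages per second against pairs measured at 4,000 messages per second, run at 50% headroom, gives `pairs_needed(10000, 4000)` equal to 5.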
The servers in an HA pair need to be able to communicate with each other very quickly to allow for graceful failover. All of the commercial FIX engines I have seen that implement HA use variations on a theme: heartbeats between the pair, with the survivor in a failover issuing a gratuitous ARP (Address Resolution Protocol) announcement, often abbreviated to GARP, so that neighbouring devices update their ARP caches. Some implementations also probe well-known network addresses to check that it is a server that has failed rather than the network.
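The decision a standby node has to make can be sketched as a small pure function. This is an illustration of the general pattern described above, not the logic of any particular commercial FIX engine; the names, the three-second timeout and the witness count are hypothetical. The actual takeover step (claiming the service address and sending the gratuitous ARP) is deliberately left out.

```python
from enum import Enum


class Action(Enum):
    STAY_STANDBY = "stay_standby"    # peer looks healthy
    TAKE_OVER = "take_over"          # peer silent, network looks fine
    HOLD_ISOLATED = "hold_isolated"  # we may be the one cut off


def decide(seconds_since_peer_heartbeat, witnesses_reachable,
           heartbeat_timeout=3.0, min_witnesses=1):
    """Decide what the standby node should do.

    witnesses_reachable is the number of well-known network addresses
    (for example a default gateway) that answered a probe just now.
    """
    if seconds_since_peer_heartbeat < heartbeat_timeout:
        return Action.STAY_STANDBY
    # Peer is silent. If we cannot see the witnesses either, the fault
    # is more likely our own network, so taking over would be unsafe.
    if witnesses_reachable < min_witnesses:
        return Action.HOLD_ISOLATED
    return Action.TAKE_OVER
```

Keeping the decision separate from the side effects makes it easy to test every combination of heartbeat age and witness reachability without touching a real network.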
- reduced operational total cost of ownership
- reduced headcount needed for operational support
- improvement in uptime
- reduction in operational risk
- reduction in cost of regression testing required when sell-side systems change
- anti-fragile - removing single points of failure and "working by accident" technology