FIX k8s Gateway

This is the full prompt referenced in Domain Experts Dominate AI. It demonstrates what domain expertise looks like when it meets AI tooling: not a one-liner, but a complete specification prompt shaped by years of operating FIX sessions in production and years of Kubernetes experience.

FIX k8s Gateway: Architecture & CRD Design

Objective

Design the FIX k8s Gateway — a large-scale Kubernetes-based FIX gateway. One YAML. One route. One connection to the right engine — at scale, with zero session awareness in the gateway itself.

The gateway inspects inbound FIX Logon messages, evaluates declarative CRD routing rules, and transparently proxies the TCP stream to the matched engine. It never participates in the FIX session. TLS termination at the gateway is optional; traffic to engines inside the cluster is plaintext and relies on CNI- or service-mesh-level network security (Cilium transparent encryption, Calico WireGuard, Istio mTLS) for in-cluster transport protection.

Phase 1 (this project): FIX acceptor gateway — inbound routing. Phase 2 (future): FIX initiator — outbound session management.

Deliverables: architecture docs, CRD schemas, Helm chart. No application source code.

Context

- Greenfield multi-asset FIX infrastructure on Kubernetes 1.29+
- Protocol-aware, session-transparent: best-effort Logon parse → route → TCP pipe. No validation, no sequence tracking, no checksums; engines own the session
- Standards-compliant per the FIX Trading Standards and version-agnostic by design: supports any FIX protocol version
- Support for custom tags and dialect deviations is mandatory. Non-standard tags in Logon messages may originate from counterparty requirements OR from the FIX gateway operator’s own routing/compliance needs. The gateway extracts and routes on both equally
- One FIXSession CRD = one route = one Kubernetes Service (Ingress-style: Service name reference, user controls what’s behind it)
- Operator manages routing only, with no engine lifecycle. Missing Service? Log WARNING, mark CRD degraded, still register the route. Service appears later? Status updates automatically
- Default reject engine (optional): used only when it is enabled, no route matches, and no default route is configured; in that case a built-in lightweight service sends a FIX Logout with a reject reason. When it is disabled, or when a default route IS configured, the reject engine is not used. The reject engine emits OpenTelemetry spans, metrics, and logs like every other gateway component
- Backend engines are out of scope

Deliverable Tree

fix-gateway/
├── README.md                              # Hybrid: compelling product intro, then technical docs
├── architecture/
│   ├── system-design.md
│   └── diagrams/                          # Mermaid .md files
├── crds/
│   ├── fixsession-crd.yaml
│   └── examples/
│       ├── equities-single-field.yaml     # SenderCompID49 → equities-engine-svc
│       ├── fi-multi-field.yaml            # SenderCompID49 + DeliverToCompID128 + SenderSubID50
│       ├── algo-custom-tag.yaml           # SenderCompID49 + AlgoStrategy20001
│       ├── redacted-tags.yaml             # Per-tag redaction config
│       ├── wildcard-default.yaml          # Default fallback with priority
│       └── multi-default-cascade.yaml     # Cascading wildcard priorities
├── decisions/
│   ├── adr-001-tcp-proxy-technology.md
│   ├── adr-002-routing-and-conflicts.md
│   ├── adr-003-tcp-proxy-vs-fix-termination.md
│   ├── adr-004-tls-termination.md
│   └── adr-005-acceleration-roadmap.md
├── helm/
│   └── fix-gateway/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
│           ├── gateway-deployment.yaml
│           ├── gateway-daemonset.yaml
│           ├── default-reject-engine.yaml
│           ├── operator-deployment.yaml
│           ├── service.yaml
│           ├── serviceaccount.yaml
│           ├── clusterrole.yaml
│           ├── clusterrolebinding.yaml
│           ├── otel-collector-config.yaml
│           └── crds/
│               └── fixsession-crd.yaml
└── reference/
    ├── tag-field-reference.md
    └── otel-collector-setup.md

Architecture Sections (`system-design.md`)

Minimum 150 words per section. Concrete specifics, not hand-waving.

1. Technology Selection

TCP proxy research (REQUIRED): Evaluate HAProxy TCP mode (tcp-request content inspection), Envoy (TCP filter chain + Wasm/Lua), Nginx stream module, and purpose-built proxies. Can an existing proxy be extended with a FIX Logon parser vs. building custom? Assess: protocol extensibility, TLS performance, Kubernetes integration maturity.

FIX Logon parsing: Best-effort tag=value\x01 extraction only. No session management. A lightweight custom parser likely beats a full FIX engine — evaluate explicitly.

Operator framework: controller-runtime/kubebuilder vs Operator SDK vs kube-rs (match to chosen language).

Recommend and justify. No comparison tables — make decisions.

2. Component Architecture

- Gateway: optional TLS termination, TCP listen, Logon buffer/parse, routing eval, bidirectional proxy. Plaintext to engines within the cluster. Document the reliance on CNI- or mesh-provided network security (Cilium transparent encryption, Calico WireGuard, Istio mTLS) rather than gateway-managed encryption for east-west traffic
- Default reject engine (optional, Helm-toggled): accepts unmatched connections when no default route exists, sends FIX Logout. Fully instrumented with OpenTelemetry, so it is not a telemetry blind spot
- Operator: CRD watch → conflict validation → Service existence check (Ingress-style WARNING on missing) → routing table reconciliation. CRITICAL log + block on conflicts. Runs as a leader-elected Deployment: leader election via a coordination.k8s.io/v1 Lease ensures at most one operator reconciles at any time, even during rolling updates where old and new pods briefly overlap. This prevents concurrent reconciliation from producing conflicting routing tables. (This differs from replicas: 1 alone, which does NOT prevent dual-active pods during rollouts.)
- Engine services: out of scope, referenced by CRD target
- Mermaid component diagram REQUIRED
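
To make the leader-election requirement concrete: the operator's ServiceAccount needs permission to manage Lease objects in its own namespace. The sketch below is illustrative only. The `fix-gateway` namespace and the `fix-gateway-operator` names are assumptions, and the verb list follows what controller-runtime style Lease-based leader election typically asks for; other frameworks may need fewer verbs.

```yaml
# Hypothetical RBAC for Lease-based leader election (names are assumptions).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: fix-gateway-operator-leader-election
  namespace: fix-gateway
rules:
  # The Lease object that records which operator pod is the current leader.
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # Leader-election libraries commonly emit Events on leadership changes.
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: fix-gateway-operator-leader-election
  namespace: fix-gateway
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: fix-gateway-operator-leader-election
subjects:
  - kind: ServiceAccount
    name: fix-gateway-operator
    namespace: fix-gateway
```

Leases are namespaced, so a Role is enough for this part; watching FIXSession resources and Services across namespaces would still need the ClusterRole listed in the deliverable tree.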

3. Connection Flow

- TCP accept → TLS handshake (if TLS listener) → buffer until Logon boundary → extract routing fields → evaluate rules:
  - Match found, endpoints ready: proxy to Service
  - Match found, no ready endpoints: hold the client TCP session open (configurable timeout). Log at ERROR when the hold begins (no ready endpoints is abnormal). Watch for endpoints to become ready; connect when they do. If the hold timeout expires: log at CRITICAL (connection will be dropped, requires human attention), then close. Hold subject to maxHeldConnections per CRD (default 1) and global per-pod cap (default 500)
  - No match, default route configured: proxy to default route
  - No match, no default route, reject engine enabled: proxy to reject engine
  - No match, no default route, reject engine disabled: close TCP connection, log at WARNING with source IP, port, and all received bytes
  - Parse failure: reject engine (if enabled) or close, log raw bytes at WARNING
- Gateway never sends FIX responses (except via optional reject engine)
- Bidirectional forwarding, disconnect cleanup, idle timeout
- Connection count tracking: the gateway tracks active connection count per CRD route for observability (exposed as OTel gauge metric) but does NOT enforce connection limits; the engine decides whether to accept or reject concurrent sessions
- Mermaid sequence diagram REQUIRED

4. TLS Termination (Optional)

- Per-listener TLS/plaintext toggle in Helm values. TLS is optional: deployments behind a TLS-terminating load balancer or within a trusted network can run plaintext-only
- TLS listeners terminate encryption at the gateway; plaintext to engines
- In-cluster transport security is NOT the gateway’s responsibility. Document that CNI-level encryption (Cilium, Calico WireGuard) or service-mesh mTLS (Istio, Linkerd) provides east-west encryption. The gateway trusts the cluster network
- Certificate management: K8s Secret refs, cert-manager option
- SNI support: document interaction with Logon routing
- Min TLS version + cipher suite config
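
For the cert-manager option, a Certificate resource can populate the Secret that a TLS listener references. This is a minimal sketch, assuming cert-manager is installed and that the Secret name matches the `fix-gateway-tls` value used in the Helm values; the namespace, DNS name, and issuer are placeholders, not part of the specification.

```yaml
# Hypothetical cert-manager Certificate feeding the gateway's TLS listener.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fix-gateway-tls
  namespace: fix-gateway          # assumption: release namespace
spec:
  secretName: fix-gateway-tls     # Secret referenced by listeners[].tls.secretName
  dnsNames:
    - fix.example.com             # placeholder hostname presented to counterparties
  duration: 2160h                 # 90 days
  renewBefore: 360h               # renew 15 days before expiry
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    kind: ClusterIssuer
    name: example-issuer          # placeholder issuer
```

How the gateway picks up a renewed Secret without dropping live sessions is a design question for the architecture document, not something this fragment settles.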

5. CRD Design: FIXSession

Target model (Ingress-style): target references a Kubernetes Service by name, like backend.service.name in Ingress. The operator watches the Service: Service missing → WARNING + TargetResolvable=False; Service appears → auto-update to True. At runtime, a matched route with no ready endpoints holds the client TCP session (hold timeout configurable per CRD or in global Helm config), logs at ERROR that no ready endpoints exist, watches the Service’s Endpoints/EndpointSlice resources, and connects when endpoints become ready. If the hold timeout expires: log at CRITICAL, close the connection.

Tag naming — PascalCase<TagNumber>: SenderCompID49, DeliverToCompID128, AlgoStrategy20001. Tag number = parse key. PascalCase prefix = human readability + log labels. No separate dialect CRD needed.

Custom tags in Logon: any tag is a first-class routing field — standard FIX tags and non-standard tags alike. Non-standard tags may be counterparty-originated (counterparty sends proprietary fields) OR operator-originated (the gateway operator requires specific tags for internal routing, compliance metadata, or algo classification). Both are handled identically.

Match fields:

- Single: SenderCompID49: "BROKER_A" → svc
- Multi AND: SenderCompID49 + DeliverToCompID128 + SenderSubID50 → svc
- Custom: SenderCompID49 + AlgoStrategy20001: "VWAP" → svc
- Wildcard/default routes with priority/weight (wildcards ONLY; exact matches use conflict detection instead)
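
The match styles above could look roughly like this as complete resources. Everything beyond `matchFields`, `value`, and the Ingress-style Service reference is an assumption made for illustration: the API group `fixgateway.example.com/v1alpha1`, the `target.service` layout, the `"*"` wildcard syntax, the `priority` field, and all resource and Service names are hypothetical.

```yaml
# Multi-field AND match: all three tags must match for the route to apply.
apiVersion: fixgateway.example.com/v1alpha1   # hypothetical group/version
kind: FIXSession
metadata:
  name: fi-broker-a
  namespace: fix
spec:
  matchFields:
    SenderCompID49:
      value: "BROKER_A"
    DeliverToCompID128:
      value: "FI_DESK"
    SenderSubID50:
      value: "RATES"
  target:
    service:
      name: fi-engine-svc       # Service in the same namespace (Ingress-style)
      port: 9878
---
# Wildcard fallback: only wildcard routes carry a priority.
apiVersion: fixgateway.example.com/v1alpha1
kind: FIXSession
metadata:
  name: wildcard-default
  namespace: fix
spec:
  matchFields:
    SenderCompID49:
      value: "*"                # hypothetical wildcard syntax
  priority: 100                 # hypothetical: ordering among wildcard routes
  target:
    service:
      name: default-engine-svc
      port: 9878
```

Whether a higher or lower `priority` wins, and whether wildcards may appear alongside exact values in the same resource, are exactly the kinds of decisions the CRD design is expected to make and document.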

Per-tag logging and redaction:

matchFields:
  SenderCompID49:
    value: "BROKER_A"
    log: true           # (default) included in telemetry
  AlgoStrategy20001:
    value: "VWAP"
    redact: true         # appears as [REDACTED] everywhere
  ComplianceID9001:
    value: "RESTRICTED"
    log: false           # invisible in all telemetry

Redaction enforced across logs, OTel spans, and metric labels.

Connection hold limits:

spec:
  maxHeldConnections: 1   # default 1 — most sessions are 1:1 with a pod
  holdTimeout: 30s

Per-CRD maxHeldConnections (default 1) caps how many connections can be held waiting for endpoints on that route. Once the cap is reached, additional connections that arrive while endpoints are unready are immediately closed with a CRITICAL log. A global per-gateway-pod cap (gateway.maxHeldConnectionsPerPod, default 500) provides a safety ceiling across all routes.

CRD deletion with active sessions (critical behavior): when a FIXSession CRD is deleted while live proxied connections exist on that route, the gateway MUST NOT disconnect those sessions. Instead, it logs at ERROR, identifying the deleted CRD and the number of orphaned active connections. The sessions continue proxying until the counterparty or engine disconnects naturally. A human operator (as distinct from the Kubernetes operator component) can observe the ERROR and re-apply the CRD to restore a healthy state; the gateway then clears the error and re-associates the sessions with the restored CRD. New connections matching the deleted CRD’s former fields follow normal evaluation (likely no match → default route or reject engine). Document this “orphan session” lifecycle explicitly.

Conflict validation: operator validates full CRD set is conflict-free. Overlap detected → CRITICAL log naming both CRDs + overlapping fields → block update → hold last-known-good table. Define the detection algorithm explicitly. When multiple CRDs arrive simultaneously (batch kubectl apply), the architecture doc MUST address reconciliation strategy — whether the operator debounces (waits for a settle window before validating the full set atomically) or processes individually (re-validating after each). The agent should evaluate both approaches, decide, and document the rationale.

Status: active connections, held connections (waiting for endpoints), orphaned connections (CRD deleted but sessions live), last connection/routing timestamps, target resolution, endpoint readiness.
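
One possible shape for that status block is shown below, following the common Kubernetes conditions convention. Only `TargetResolvable` comes from the text above; every other field name, the `EndpointsReady` condition type, the reasons, and the timestamps are illustrative assumptions.

```yaml
# Hypothetical FIXSession status (field names are assumptions).
status:
  activeConnections: 1
  heldConnections: 0            # waiting for endpoints
  orphanedConnections: 0        # CRD deleted but sessions still live
  lastConnectionTime: "2025-01-15T09:30:02Z"
  lastRoutingDecisionTime: "2025-01-15T09:30:02Z"
  conditions:
    - type: TargetResolvable    # named in the target model above
      status: "True"
      reason: ServiceFound
      message: Service fi-engine-svc exists
      lastTransitionTime: "2025-01-15T09:00:00Z"
    - type: EndpointsReady      # hypothetical condition type
      status: "False"
      reason: NoReadyEndpoints
      message: Service fi-engine-svc has no ready endpoints
      lastTransitionTime: "2025-01-15T09:29:40Z"
```

Orphaned connections are awkward to report here because the resource that would carry the status no longer exists, which is one reason the orphan lifecycle needs explicit documentation.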

Routing decision logs: every decision emits: CRD name (or NO_MATCH/DEFAULT_ROUTE/REJECT_ENGINE), extracted tags (per redaction rules), target, source endpoint, listener. If connection is held waiting for endpoints, log ERROR at hold start and CRITICAL if hold times out (with hold duration).

Full CRD YAML with OpenAPI v3 validation schema.
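
As a starting point, a skeleton of such a CRD might look like the following. It is a sketch, not the deliverable: the group `fixgateway.example.com`, the `v1alpha1` version, the `target.service` layout, the `priority` field, and the duration pattern are all assumptions. The `matchFields`, `log`, `redact`, `maxHeldConnections`, and `holdTimeout` fields mirror the snippets above.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: fixsessions.fixgateway.example.com   # must be <plural>.<group>
spec:
  group: fixgateway.example.com              # hypothetical group
  scope: Namespaced
  names:
    kind: FIXSession
    listKind: FIXSessionList
    plural: fixsessions
    singular: fixsession
  versions:
    - name: v1alpha1
      served: true
      storage: true
      subresources:
        status: {}                           # status is written by the operator only
      schema:
        openAPIV3Schema:
          type: object
          required: [spec]
          properties:
            spec:
              type: object
              required: [matchFields, target]
              properties:
                matchFields:
                  type: object
                  minProperties: 1
                  # Keys are PascalCase<TagNumber> names such as SenderCompID49.
                  additionalProperties:
                    type: object
                    required: [value]
                    properties:
                      value: {type: string, minLength: 1}
                      log: {type: boolean, default: true}
                      redact: {type: boolean, default: false}
                priority: {type: integer, minimum: 0}
                maxHeldConnections: {type: integer, minimum: 0, default: 1}
                holdTimeout:
                  type: string
                  pattern: '^[0-9]+(ms|s|m)$'
                target:
                  type: object
                  required: [service]
                  properties:
                    service:
                      type: object
                      required: [name, port]
                      properties:
                        name: {type: string, minLength: 1}
                        port: {type: integer, minimum: 1, maximum: 65535}
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
```

This skeleton does not validate the key format of `matchFields` or cross-resource conflicts; in this sketch both are left to the operator.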

6. Routing Evaluation Engine

- CRD rules compiled to in-memory lookup
- Evaluation order: most-specific multi-field match → less specific → wildcard by priority → default route → reject engine (if enabled) → close
- Conflict-free invariant: operator validates at admission, so the runtime can assume rules are unambiguous
- Hot-reload: CRD changes propagate without dropping live connections, held sessions, or orphaned sessions. New rules apply to new connections only
- Algorithm described in implementable detail

7. Deployment Topology

- Deployment mode: HPA-enabled, pod anti-affinity on kubernetes.io/hostname (preferred by default, configurable to required)
- DaemonSet mode: gateway on every node (or labeled subset). Toggle via gateway.mode in values.yaml
- L4 load balancer for TCP (not HTTP). No LB-level session affinity needed
- Operator: leader-elected Deployment using a coordination.k8s.io/v1 Lease. Replicas set to 2+ for availability; only the lease holder reconciles, and a standby takes over after a failure once the lease expires. Document why replicas: 1 alone is insufficient (dual-active during rollout, no automatic failover)
- Default reject engine (if enabled): 1-2 replica Deployment
- Network policy: gateway → engine Services; operator → API server; reject engine ← gateway only
- In-cluster security: document that network policies + CNI encryption replace gateway-to-engine TLS
- Mermaid deployment diagram REQUIRED
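
The "reject engine ← gateway only" rule can be written as a standard NetworkPolicy. The sketch below assumes a `fix-gateway` namespace, `app.kubernetes.io/component` labels on the pods, and a reject-engine port of 9879; all three are placeholders.

```yaml
# Hypothetical policy: only gateway pods may connect to the reject engine.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: reject-engine-from-gateway-only
  namespace: fix-gateway
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: reject-engine
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: gateway
      ports:
        - protocol: TCP
          port: 9879            # placeholder reject-engine listen port
```

A NetworkPolicy only takes effect when the cluster's CNI plugin enforces it, which ties this back to the CNI choices discussed for in-cluster security.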

8. Helm Chart

gateway:
  mode: deployment  # or "daemonset"
  replicas: 3
  antiAffinity:
    enabled: true
    type: preferred  # or "required"
  holdTimeout: 30s
  maxHeldConnectionsPerPod: 500
  listeners:
    - name: fix-tls
      port: 9876
      tls:
        enabled: true
        secretName: fix-gateway-tls
        minVersion: "1.2"
    - name: fix-plain
      port: 9877
      tls:
        enabled: false
defaultRejectEngine:
  enabled: true       # set false to disable
  replicas: 2
  rejectReason: "No matching session configuration"
operator:
  replicas: 2         # leader-elected, only one active
  leaseNamespace: ""  # defaults to release namespace
otel:
  collectorEndpoint: "otel-collector.observability:4317"
  protocol: grpc      # or "http"
  insecure: false
  tlsSecretName: ""   # optional TLS for collector connection

Both Deployment + DaemonSet templates (one active per mode). CRDs via crds/ directory.
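
One way to keep a single template active per mode is to guard each template with the `gateway.mode` value. The fragment below sketches what `gateway-deployment.yaml` might contain, using only the values shown above; the image reference and label scheme are placeholders, and `gateway-daemonset.yaml` would carry the mirror-image guard.

```yaml
{{- if eq .Values.gateway.mode "deployment" }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-gateway
spec:
  replicas: {{ .Values.gateway.replicas }}
  selector:
    matchLabels:
      app.kubernetes.io/component: gateway
      app.kubernetes.io/instance: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app.kubernetes.io/component: gateway
        app.kubernetes.io/instance: {{ .Release.Name }}
    spec:
      {{- if .Values.gateway.antiAffinity.enabled }}
      affinity:
        podAntiAffinity:
          {{- if eq .Values.gateway.antiAffinity.type "required" }}
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/component: gateway
              topologyKey: kubernetes.io/hostname
          {{- else }}
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/component: gateway
                topologyKey: kubernetes.io/hostname
          {{- end }}
      {{- end }}
      containers:
        - name: gateway
          image: registry.example.com/fix-gateway:0.1.0   # placeholder image
          ports:
            {{- range .Values.gateway.listeners }}
            - name: {{ .name }}
              containerPort: {{ .port }}
              protocol: TCP
            {{- end }}
{{- end }}
```

If an HPA manages this Deployment, `spec.replicas` is usually left out so the chart and the autoscaler do not fight over the replica count.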

9. TLS Offload and TCP Forwarding Acceleration

The latency that matters is TLS termination and TCP forwarding — not Logon parsing or routing (which happen once per connection).

- Phase 1 (this project): Software-only on Kubernetes. Tuning: TCP_NODELAY, SO_REUSEPORT, busy-poll, CPU pinning, NUMA-aware scheduling
- Phase 2: Kernel bypass for TCP forwarding (DPDK, Solarflare OpenOnload, AF_XDP eBPF fast path). Logon parse and routing stay in software; the bidirectional TCP proxy moves to kernel bypass
- Phase 3: FPGA/SmartNIC offload for TLS termination and TCP forwarding (AMD Alveo, Altera SmartNICs). TLS decrypt in hardware, TCP splice in FPGA fabric. Routing stays in software (too dynamic for hardware). Document the offload boundary clearly

For each phase: expected latency, infrastructure requirements, deployment topology changes.

10. Observability: OpenTelemetry

All telemetry via OpenTelemetry. No vendor-specific instrumentation. Every component emits OTel — gateway, operator, and reject engine (when enabled).

- Spans: per-connection from TCP accept → routing → proxy (or hold → connect). Children: TLS handshake, Logon parse, rule eval, endpoint hold (if waiting), upstream connect. Attributes include extracted tags (per redaction), routing decision, listener, TLS metadata
- Metrics: active connections per CRD (gauge; tracked for observability, not enforced), held connections waiting for endpoints (gauge), orphaned connections (gauge; CRD deleted but sessions live), routed per CRD (counter), reject engine connections (counter), eval latency (histogram), rejections by reason (counter: no_match/parse_failure/target_unreachable/hold_timeout/hold_cap_reached), TLS failures (counter), bytes proxied (counter)
- Logs: OTel-correlated structured logs for every routing decision (redaction-aware), hold events (ERROR on hold start, CRITICAL on timeout, INFO on successful connect after hold), orphan events (ERROR on CRD delete with active sessions), conflict events, connection lifecycle, TLS failures
- Redaction enforcement: log: false → absent from all telemetry. redact: true → [REDACTED] everywhere
- Alerts: conflict detection, sustained rejections, target unreachable, TLS cert expiry, reject engine rate spikes, high held-connection count, orphaned sessions present (CRD deleted with live connections), hold cap reached
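
If metrics end up in Prometheus, a few of these alerts could be expressed as a PrometheusRule. This sketch assumes the Prometheus Operator is installed. The metric names (`fix_gateway_orphaned_connections`, `fix_gateway_held_connections`, `fix_gateway_rejections_total`), the `crd`, `pod`, and `reason` labels, and all thresholds are hypothetical, since the specification does not fix them.

```yaml
# Hypothetical alert rules; metric names, labels, and thresholds are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: fix-gateway-alerts
  namespace: observability
spec:
  groups:
    - name: fix-gateway
      rules:
        - alert: FixGatewayOrphanedSessions
          expr: sum by (crd) (fix_gateway_orphaned_connections) > 0
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "FIXSession {{ $labels.crd }} was deleted while sessions are still live"
        - alert: FixGatewayHeldConnectionsHigh
          # 400 is 80% of the default per-pod cap of 500.
          expr: sum by (pod) (fix_gateway_held_connections) > 400
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Gateway pod {{ $labels.pod }} is close to its held-connection cap"
        - alert: FixGatewaySustainedRejections
          expr: sum by (reason) (rate(fix_gateway_rejections_total[5m])) > 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Sustained rejections with reason {{ $labels.reason }}"
```

Per-CRD labels on metrics need the same redaction care as logs: a label value derived from a tag marked `log: false` must not appear here either.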

11. OpenTelemetry Collector Configuration

Deployment and configuration guide — not just application-side instrumentation:

- Collector deployment: recommend a mode for this gateway, DaemonSet (node-level) vs. Deployment (centralized). Include a working otel-collector-config.yaml in the Helm chart
- Receiver configuration: OTLP gRPC (4317), OTLP HTTP (4318). Document how gateway, operator, and reject engine connect to the collector via otel.collectorEndpoint in Helm values
- Processor pipeline: batch processor (batch size, timeout), memory limiter, attributes/resource processor for global enrichment (cluster name, namespace, gateway version)
- Exporter configuration: example configs for common backends: Prometheus (metrics), Jaeger/Tempo (traces), Loki/Elasticsearch (logs)
- Sampling strategy: 100% sampling for routing decisions and errors; configurable rate for normal proxy span completion. Document head-based vs. tail-based sampling for high-throughput FIX environments
- Resource detection: k8sattributes processor for pod name, node, namespace
- Include reference/otel-collector-setup.md with step-by-step deployment instructions
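
A minimal collector configuration covering those pieces might look like this. It is a sketch under several assumptions: the collector is the contrib distribution (needed for `k8sattributes` and the `prometheus` exporter), the backend hostnames are placeholders, the cluster name is a placeholder, and logs go to a backend that accepts OTLP over HTTP. Sampling is deliberately left out.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:               # first in every pipeline, to shed load early
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  k8sattributes:                # needs RBAC to read pod metadata
    extract:
      metadata:
        - k8s.pod.name
        - k8s.node.name
        - k8s.namespace.name
  resource:                     # global enrichment
    attributes:
      - key: k8s.cluster.name
        value: example-cluster  # placeholder
        action: upsert
  batch:                        # last, so exporters receive full batches
    send_batch_size: 1024
    timeout: 5s

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889      # scrape target for metrics
  otlp/traces:
    endpoint: tempo.observability:4317            # placeholder backend
    tls:
      insecure: true
  otlphttp/logs:
    endpoint: http://loki.observability:3100/otlp # placeholder backend

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, resource, batch]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, resource, batch]
      exporters: [otlp/traces]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, resource, batch]
      exporters: [otlphttp/logs]
```

The gateway, operator, and reject engine would point `otel.collectorEndpoint` at the collector's Service on port 4317 when `otel.protocol` is `grpc`.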

ADR Format

# ADR-NNN: [Title]
## Status: Proposed
## Context: [Why this decision is needed]
## Options Considered: [Minimum 3 options with pros/cons]
## Decision: [What was chosen]
## Consequences: [What this enables and what it costs]

Allowed Actions

- Create directories and files
- Research TCP proxies (HAProxy, Envoy, Nginx stream), kernel bypass (DPDK, OpenOnload, AF_XDP), FPGA/SmartNIC TLS offload, K8s operator frameworks, Kubernetes Ingress Service-reference patterns, OTel Collector deployment patterns, CNI encryption options, leader election patterns, and prior art
- Mermaid diagrams, YAML CRDs with OpenAPI v3, Helm templates, OTel Collector configs

Forbidden Actions

- No application source code
- No Dockerfiles or CI/CD
- No test files
- No engine pod design (out of scope)
- No FIX session concerns in the gateway (sequence numbers, heartbeats, validation). Exception: optional reject engine sends FIX Logout
- No placeholder text (“TBD”, “TODO”)
- State uncertainty explicitly; never hallucinate library features

Stop Conditions: Pause and Ask When

- Conflict detection algorithm has silent ambiguity edge cases
- CRD field design hits a simplicity vs. expressiveness tradeoff
- Gateway must participate in FIX protocol beyond the optional reject engine
- Best-effort parsing can’t reliably extract custom tags
- Per-tag redaction creates observability vs. compliance tension
- Connection hold behavior creates resource exhaustion risk beyond the per-CRD and per-pod caps
- CRD batch reconciliation strategy has consistency implications the agent can’t resolve
- Orphan session lifecycle creates a state management problem that needs clarification
- An existing TCP proxy (HAProxy, Envoy) handles 80%+ of requirements with a plugin
- Any section needs implementation-level detail to be meaningful

Checkpoints

✅ after each phase:

- Research: TCP proxies, kernel bypass, FPGA TLS offload, Ingress patterns, OTel Collector patterns, CNI encryption, leader election, operator frameworks
- ADRs: all five decision records
- CRD: FIXSession definition + all examples including redaction and hold config
- Architecture: system-design.md, all 11 sections
- Helm: Deployment/DaemonSet modes, anti-affinity, listeners, optional reject engine, OTel Collector config, hold limits
- Reference: tag field reference, OTel Collector setup guide, README (hybrid tone: product intro + technical docs)
- Final review: cross-refs consistent, no FIX session leakage (except optional reject engine), redaction applied across CRD + logs + OTel, Ingress-style Service refs correct, OTel Collector config validates, session hold caps documented, orphan session lifecycle documented, CRD batch reconciliation addressed