FIX k8s Gateway

This is the full prompt referenced in Domain Experts Dominate AI. It demonstrates what domain expertise looks like when it meets AI tooling: not a one-liner, but a complete specification prompt shaped by years of operating FIX sessions in production and years of running Kubernetes.

FIX k8s Gateway — Architecture & CRD Design #

Objective #

Design the FIX k8s Gateway — a large-scale Kubernetes-based FIX gateway. One YAML. One route. One connection to the right engine — at scale, with zero session awareness in the gateway itself.

The gateway inspects inbound FIX Logon messages, evaluates declarative CRD routing rules, and transparently proxies the TCP stream to the matched engine. It never participates in the FIX session. Optional TLS termination at the gateway; plaintext to engines within the cluster, relying on Kubernetes CNI-level network security (Cilium encryption, Calico WireGuard, Istio mTLS) for in-cluster transport protection.

Phase 1 (this project): FIX acceptor gateway — inbound routing. Phase 2 (future): FIX initiator — outbound session management.

Deliverables: architecture docs, CRD schemas, Helm chart. No application source code.

Context #

  • Greenfield multi-asset FIX infrastructure on Kubernetes 1.29+
  • Protocol-aware, session-transparent: best-effort Logon parse → route → TCP pipe. No validation, no sequence tracking, no checksums — engines own the session
  • Standards-compliant per the FIX Trading Standards — version-agnostic by design, supporting any FIX protocol version
  • Custom tags and dialect deviations are mandatory. Non-standard tags in Logon messages may originate from counterparty requirements OR from the FIX gateway operator’s own routing/compliance needs. The gateway extracts and routes on both equally
  • One FIXSession CRD = one route = one Kubernetes Service (Ingress-style: Service name reference, user controls what’s behind it)
  • Operator manages routing only — no engine lifecycle. Missing Service? Log WARNING, mark CRD degraded, still register the route. Service appears later? Status updates automatically
  • Default reject engine (optional): when enabled and no match exists and no default route is configured, a built-in lightweight service sends FIX Logout with reject reason. When disabled or when a default route IS configured, the reject engine is not used. The reject engine emits OpenTelemetry spans, metrics, and logs like every other gateway component
  • Backend engines are out of scope

Deliverable Tree #

fix-gateway/
├── README.md                              # Hybrid: compelling product intro, then technical docs
├── architecture/
│   ├── system-design.md
│   └── diagrams/                          # Mermaid .md files
├── crds/
│   ├── fixsession-crd.yaml
│   └── examples/
│       ├── equities-single-field.yaml     # SenderCompID49 → equities-engine-svc
│       ├── fi-multi-field.yaml            # SenderCompID49 + DeliverToCompID128 + SenderSubID50
│       ├── algo-custom-tag.yaml           # SenderCompID49 + AlgoStrategy20001
│       ├── redacted-tags.yaml             # Per-tag redaction config
│       ├── wildcard-default.yaml          # Default fallback with priority
│       └── multi-default-cascade.yaml     # Cascading wildcard priorities
├── decisions/
│   ├── adr-001-tcp-proxy-technology.md
│   ├── adr-002-routing-and-conflicts.md
│   ├── adr-003-tcp-proxy-vs-fix-termination.md
│   ├── adr-004-tls-termination.md
│   └── adr-005-acceleration-roadmap.md
├── helm/
│   └── fix-gateway/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
│           ├── gateway-deployment.yaml
│           ├── gateway-daemonset.yaml
│           ├── default-reject-engine.yaml
│           ├── operator-deployment.yaml
│           ├── service.yaml
│           ├── serviceaccount.yaml
│           ├── clusterrole.yaml
│           ├── clusterrolebinding.yaml
│           ├── otel-collector-config.yaml
│           └── crds/
│               └── fixsession-crd.yaml
└── reference/
    ├── tag-field-reference.md
    └── otel-collector-setup.md

Architecture Sections (system-design.md) #

Minimum 150 words per section. Concrete specifics, not hand-waving.

1. Technology Selection #

TCP proxy research (REQUIRED): Evaluate HAProxy TCP mode (tcp-request content inspection), Envoy (TCP filter chain + Wasm/Lua), Nginx stream module, and purpose-built proxies. Can an existing proxy be extended with a FIX Logon parser vs. building custom? Assess: protocol extensibility, TLS performance, Kubernetes integration maturity.

FIX Logon parsing: Best-effort tag=value\x01 extraction only. No session management. A lightweight custom parser likely beats a full FIX engine — evaluate explicitly.
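
To show how small a best-effort parser can be, here is a minimal Python sketch. It is illustrative only. The function name `parse_logon_tags`, the use of a complete `10=` field as the Logon boundary, and the first-occurrence-wins handling of repeated tags are assumptions made for this example. The checksum value is read but never verified, since engines own validation.

```python
from typing import Optional

SOH = b"\x01"  # FIX field delimiter


def parse_logon_tags(buf: bytes) -> Optional[dict[int, str]]:
    """Best-effort tag=value extraction from a buffered FIX Logon.

    Returns None when the buffer does not yet hold a complete Logon
    (no terminated 10= field) or when MsgType (35) is not "A".
    """
    # Drop whatever follows the last SOH: it is either empty or a partial field.
    fields = buf.split(SOH)[:-1]
    tags: dict[int, str] = {}
    complete = False
    for field in fields:
        key, sep, value = field.partition(b"=")
        if not sep or not key.isdigit():
            continue  # tolerate malformed fields instead of failing
        tag = int(key)
        tags.setdefault(tag, value.decode("latin-1"))  # first occurrence wins
        if tag == 10:  # trailer reached: treat as message boundary
            complete = True
            break
    if not complete or tags.get(35) != "A":
        return None
    return tags
```

A caller would keep appending received bytes to the buffer and retry until the function returns a dictionary or a size or time limit is hit. Custom tags such as 20001 come out exactly like standard ones.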

Operator framework: controller-runtime/kubebuilder vs Operator SDK vs kube-rs (match to chosen language).

Recommend and justify. No comparison tables — make decisions.

2. Component Architecture #

  • Gateway: optional TLS termination, TCP listen, Logon buffer/parse, routing eval, bidirectional proxy. Plaintext to engines within cluster — document reliance on CNI-provided network security (Cilium transparent encryption, Calico WireGuard, Istio mTLS) rather than gateway-managed encryption for east-west traffic
  • Default reject engine (optional, Helm-toggled): accepts unmatched connections when no default route exists, sends FIX Logout. Fully instrumented with OpenTelemetry — not a telemetry blind spot
  • Operator: CRD watch → conflict validation → Service existence check (Ingress-style WARNING on missing) → routing table reconciliation. CRITICAL log + block on conflicts. Runs as a leader-elected Deployment — leader election via coordination.k8s.io/v1 Lease ensures exactly one operator reconciles at any time, even during rolling updates where old and new pods briefly overlap. This prevents concurrent reconciliation from producing conflicting routing tables. (Different from replicas: 1 alone, which does NOT prevent dual-active during rollouts.)
  • Engine services: out of scope, referenced by CRD target
  • Mermaid component diagram REQUIRED

3. Connection Flow #

  • TLS handshake (if TLS listener) → TCP accept → buffer until Logon boundary → extract routing fields → evaluate rules:
    • Match found, endpoints ready: proxy to Service
    • Match found, no ready endpoints: hold the client TCP session open (configurable timeout). Log at ERROR when hold begins (no ready endpoints is abnormal). Watch for endpoints to become ready; connect when they do. If hold timeout expires: log at CRITICAL (connection will be dropped, requires operator attention), then close. Hold subject to maxHeldConnections per CRD (default 1) and global per-pod cap (default 500)
    • No match, default route configured: proxy to default route
    • No match, no default route, reject engine enabled: proxy to reject engine
    • No match, no default route, reject engine disabled: close TCP connection, log at WARN with source IP, port, and all received bytes
    • Parse failure: reject engine (if enabled) or close, log raw bytes at WARN
  • Gateway never sends FIX responses (except via optional reject engine)
  • Bidirectional forwarding, disconnect cleanup, idle timeout
  • Connection count tracking: the gateway tracks active connection count per CRD route for observability (exposed as OTel gauge metric) but does NOT enforce connection limits — the engine decides whether to accept or reject concurrent sessions
  • Mermaid sequence diagram REQUIRED
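
The branches above can be condensed into a pure decision function. This is a sketch under stated assumptions: the `Outcome` enum and the boolean parameter names are hypothetical, and hold timeouts, hold caps, and logging are left out.

```python
from enum import Enum


class Outcome(Enum):
    PROXY = "proxy"                  # connect to the matched Service
    HOLD = "hold"                    # keep client open, wait for endpoints
    DEFAULT_ROUTE = "default_route"
    REJECT_ENGINE = "reject_engine"
    CLOSE = "close"


def decide(parsed_ok: bool, matched: bool, endpoints_ready: bool,
           default_route: bool, reject_engine: bool) -> Outcome:
    """Map the state of one inbound connection to the action the flow prescribes."""
    if not parsed_ok:
        # Parse failure: reject engine if enabled, otherwise close.
        return Outcome.REJECT_ENGINE if reject_engine else Outcome.CLOSE
    if matched:
        return Outcome.PROXY if endpoints_ready else Outcome.HOLD
    if default_route:
        return Outcome.DEFAULT_ROUTE
    return Outcome.REJECT_ENGINE if reject_engine else Outcome.CLOSE
```

Keeping the decision free of I/O makes each branch of the flow easy to enumerate when writing the sequence diagram.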

4. TLS Termination (Optional) #

  • Per-listener TLS/plaintext toggle in Helm values. TLS is optional — deployments behind a TLS-terminating load balancer or within a trusted network can run plaintext-only
  • TLS listeners terminate encryption at gateway; plaintext to engines
  • In-cluster transport security is NOT the gateway’s responsibility — document that CNI-level encryption (Cilium, Calico WireGuard, Istio/Linkerd mTLS) provides east-west encryption. The gateway trusts the cluster network
  • Certificate management: K8s Secret refs, cert-manager option
  • SNI support: document interaction with Logon routing
  • Min TLS version + cipher suite config

5. CRD Design — FIXSession #

Target model (Ingress-style): target references a Kubernetes Service by name — like backend.service.name in Ingress. Operator watches: Service missing → WARNING + TargetResolvable=False. Service appears → auto-update to True. Runtime: matched route with no ready endpoints → hold client TCP session (configurable hold timeout in CRD or global Helm config), log at ERROR that no ready endpoints exist, watch Endpoints resource, connect when ready. If hold timeout expires: log at CRITICAL, close connection.

Tag naming — PascalCase<TagNumber>: SenderCompID49, DeliverToCompID128, AlgoStrategy20001. Tag number = parse key. PascalCase prefix = human readability + log labels. No separate dialect CRD needed.
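
A field name in this convention can be split mechanically into its log label and its parse key. The sketch below assumes the readable prefix contains letters only and that the tag number is the trailing run of digits. The helper name `split_tag_name` is hypothetical.

```python
import re

# Assumption: PascalCase prefix is letters only; tag number is the trailing digits.
_TAG_NAME = re.compile(r"^([A-Z][A-Za-z]*)(\d+)$")


def split_tag_name(name: str) -> tuple[str, int]:
    """Split 'SenderCompID49' into the label 'SenderCompID' and the parse key 49."""
    m = _TAG_NAME.match(name)
    if m is None:
        raise ValueError(f"not a PascalCase<TagNumber> field name: {name!r}")
    return m.group(1), int(m.group(2))
```

For example, `split_tag_name("AlgoStrategy20001")` yields the label `AlgoStrategy` and the tag number `20001`, which is the only part the parser needs.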

Custom tags in Logon: any tag is a first-class routing field — standard FIX tags and non-standard tags alike. Non-standard tags may be counterparty-originated (counterparty sends proprietary fields) OR operator-originated (the gateway operator requires specific tags for internal routing, compliance metadata, or algo classification). Both are handled identically.

Match fields:

  • Single: SenderCompID49: "BROKER_A" → svc
  • Multi AND: SenderCompID49 + DeliverToCompID128 + SenderSubID50 → svc
  • Custom: SenderCompID49 + AlgoStrategy20001: "VWAP" → svc
  • Wildcard/default routes with priority/weight (wildcards ONLY — exact matches use conflict detection instead)

Per-tag logging and redaction:

matchFields:
  SenderCompID49:
    value: "BROKER_A"
    log: true           # (default) included in telemetry
  AlgoStrategy20001:
    value: "VWAP"
    redact: true         # appears as [REDACTED] everywhere
  ComplianceID9001:
    value: "RESTRICTED"
    log: false           # invisible in all telemetry

Redaction enforced across logs, OTel spans, and metric labels.
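
One way to enforce this consistently is to build a single redaction-aware view of the extracted tags and hand only that view to loggers, span attributes, and metric labels. The sketch below is an assumption-laden example. The function name `telemetry_view` is hypothetical, `log: false` is assumed to take precedence over `redact: true`, and tags without a rule are assumed to be logged.

```python
REDACTED = "[REDACTED]"


def telemetry_view(extracted: dict[str, str],
                   rules: dict[str, dict]) -> dict[str, str]:
    """Return the tag values that are safe to emit in any telemetry signal."""
    view: dict[str, str] = {}
    for name, value in extracted.items():
        rule = rules.get(name, {})
        if rule.get("log", True) is False:
            continue  # log: false means the tag is absent everywhere
        view[name] = REDACTED if rule.get("redact", False) else value
    return view
```

With the three match fields from the YAML above, the view contains `SenderCompID49` in clear text, `AlgoStrategy20001` as `[REDACTED]`, and no entry for `ComplianceID9001`.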

Connection hold limits:

spec:
  maxHeldConnections: 1   # default 1 — most sessions are 1:1 with a pod
  holdTimeout: 30s

Per-CRD maxHeldConnections (default 1) caps how many connections can be held waiting for endpoints on that route. Once the cap is reached, additional connections that arrive while endpoints are unready are immediately closed with a CRITICAL log. A global per-gateway-pod cap (gateway.maxHeldConnectionsPerPod, default 500) provides a safety ceiling across all routes.
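
The two caps interact as a simple admission check. The following sketch assumes an in-memory counter per gateway pod. The class name `HoldTracker` and its methods are hypothetical, and the CRITICAL log on refusal is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class HoldTracker:
    """Tracks held connections for one gateway pod."""
    per_pod_cap: int = 500                    # gateway.maxHeldConnectionsPerPod
    held: dict = field(default_factory=dict)  # route name -> held count

    def try_hold(self, route: str, route_cap: int = 1) -> bool:
        """Admit a hold if both the per-CRD and the per-pod cap allow it."""
        current = self.held.get(route, 0)
        if current >= route_cap:
            return False  # per-CRD maxHeldConnections reached
        if sum(self.held.values()) >= self.per_pod_cap:
            return False  # global safety ceiling reached
        self.held[route] = current + 1
        return True

    def release(self, route: str) -> None:
        """Call when a held connection connects upstream, times out, or closes."""
        current = self.held.get(route, 0)
        if current <= 1:
            self.held.pop(route, None)
        else:
            self.held[route] = current - 1
```

A `False` result corresponds to the immediate close described above.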

CRD deletion with active sessions (CRITICAL behavior): when a FIXSession CRD is deleted while live proxied connections exist on that route, the gateway MUST NOT disconnect those sessions. Instead: log at ERROR identifying the deleted CRD and the number of orphaned active connections. The sessions continue proxying until the counterparty or engine disconnects naturally. The operator can observe the ERROR and re-apply the CRD to restore healthy state (the gateway would then clear the error and re-associate the sessions with the restored CRD). New connections matching the deleted CRD’s former fields will follow normal evaluation (likely no match → default route or reject engine). Document this “orphan session” lifecycle explicitly.
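
The orphan lifecycle amounts to moving connection identifiers between two tables without touching the sockets. A minimal sketch, assuming connections are tracked by an opaque string ID; the class and method names are hypothetical.

```python
class RouteSessions:
    """Bookkeeping for live sessions per route, including orphaned ones."""

    def __init__(self) -> None:
        self.active: dict[str, set[str]] = {}
        self.orphaned: dict[str, set[str]] = {}

    def connect(self, route: str, conn_id: str) -> None:
        self.active.setdefault(route, set()).add(conn_id)

    def crd_deleted(self, route: str) -> int:
        """Mark sessions as orphaned; returns the count to report at ERROR."""
        conns = self.active.pop(route, set())
        if conns:
            self.orphaned.setdefault(route, set()).update(conns)
        return len(conns)

    def crd_restored(self, route: str) -> int:
        """Re-associate orphaned sessions when the CRD is re-applied."""
        conns = self.orphaned.pop(route, set())
        if conns:
            self.active.setdefault(route, set()).update(conns)
        return len(conns)

    def disconnect(self, route: str, conn_id: str) -> None:
        """Natural disconnect by counterparty or engine."""
        for table in (self.active, self.orphaned):
            conns = table.get(route)
            if conns and conn_id in conns:
                conns.discard(conn_id)
                if not conns:
                    del table[route]
```

The sizes of the two tables map directly onto the active and orphaned connection gauges and the corresponding status fields.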

Conflict validation: operator validates full CRD set is conflict-free. Overlap detected → CRITICAL log naming both CRDs + overlapping fields → block update → hold last-known-good table. Define the detection algorithm explicitly. When multiple CRDs arrive simultaneously (batch kubectl apply), the architecture doc MUST address reconciliation strategy — whether the operator debounces (waits for a settle window before validating the full set atomically) or processes individually (re-validating after each). The agent should evaluate both approaches, decide, and document the rationale.
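
The prompt leaves the detection algorithm for the architecture doc to define. As one candidate definition, offered as an assumption rather than the intended answer: two exact-match rules conflict when a single Logon could satisfy both and neither is more specific, with specificity taken to be the number of match fields. The function names below are hypothetical.

```python
from itertools import combinations


def rules_conflict(a: dict[int, str], b: dict[int, str]) -> bool:
    """True if some Logon matches both rules with equal specificity."""
    if len(a) != len(b):
        # Assumption: the rule with more fields wins, so no ambiguity.
        return False
    # Equal specificity: ambiguous unless a shared tag demands different values.
    return all(a[tag] == b[tag] for tag in a.keys() & b.keys())


def find_conflicts(rules: dict[str, dict[int, str]]) -> list[tuple[str, str]]:
    """Return every conflicting pair of route names, in sorted order."""
    ordered = sorted(rules.items(), key=lambda kv: kv[0])
    return [
        (n1, n2)
        for (n1, f1), (n2, f2) in combinations(ordered, 2)
        if rules_conflict(f1, f2)
    ]
```

Under this definition, `{49: "BROKER_A", 20001: "VWAP"}` and `{49: "BROKER_A", 50: "DESK1"}` conflict, because a Logon carrying all three tags matches both. `{49: "BROKER_A"}` and `{49: "BROKER_A", 20001: "VWAP"}` do not. The pairwise scan is quadratic in the number of CRDs, which is one of the tradeoffs worth weighing when choosing between debounced and per-CRD validation.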

Status: active connections, held connections (waiting for endpoints), orphaned connections (CRD deleted but sessions live), last connection/routing timestamps, target resolution, endpoint readiness.

Routing decision logs: every decision emits: CRD name (or NO_MATCH/DEFAULT_ROUTE/REJECT_ENGINE), extracted tags (per redaction rules), target, source endpoint, listener. If connection is held waiting for endpoints, log ERROR at hold start and CRITICAL if hold times out (with hold duration).

Full CRD YAML with OpenAPI v3 validation schema.

6. Routing Evaluation Engine #

  • CRD rules compiled to in-memory lookup
  • Evaluation order: most-specific multi-field match → less specific → wildcard by priority → default route → reject engine (if enabled) → close
  • Conflict-free invariant: operator validates at admission, runtime assumes unambiguous
  • Hot-reload: CRD changes propagate without dropping live connections, held sessions, or orphaned sessions. New rules = new connections only
  • Algorithm described in implementable detail
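
The compiled lookup and evaluation order can be sketched as follows. Several things here are assumptions for illustration: exact rules are indexed by their set of tag numbers, specificity is the number of fields, wildcard values use shell-style patterns, and a higher priority number wins. The class name `RoutingTable` is hypothetical. A `None` result means the caller falls through to the default route, the reject engine, or close.

```python
import fnmatch
from typing import Optional


class RoutingTable:
    def __init__(self,
                 exact: dict[str, dict[int, str]],
                 wildcards: list[tuple[int, str, dict[int, str]]]) -> None:
        # exact: route name -> {tag: value}
        # wildcards: (priority, route name, {tag: pattern})
        self._index: dict[tuple, dict[tuple, str]] = {}
        for name, fields in exact.items():
            tags = tuple(sorted(fields))
            values = tuple(fields[t] for t in tags)
            self._index.setdefault(tags, {})[values] = name
        # Largest tag sets first, so the most specific rule is tried first.
        self._tag_sets = sorted(self._index, key=len, reverse=True)
        self._wildcards = sorted(wildcards, key=lambda w: w[0], reverse=True)

    def evaluate(self, logon: dict[int, str]) -> Optional[str]:
        """Return the matched route name, or None if nothing matches."""
        for tags in self._tag_sets:
            if all(t in logon for t in tags):
                key = tuple(logon[t] for t in tags)
                name = self._index[tags].get(key)
                if name is not None:
                    return name
        for _priority, name, patterns in self._wildcards:
            if all(t in logon and fnmatch.fnmatchcase(logon[t], p)
                   for t, p in patterns.items()):
                return name
        return None
```

Because the table is immutable after construction, hot-reload can build a new instance and swap the reference. Connections that are already proxied never consult it again.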

7. Deployment Topology #

  • Deployment mode: HPA-enabled, pod anti-affinity on kubernetes.io/hostname (preferred by default, configurable to required)
  • DaemonSet mode: gateway on every node (or labeled subset). Toggle via gateway.mode in values.yaml
  • L4 load balancer for TCP (not HTTP). No LB-level session affinity needed
  • Operator: leader-elected Deployment using coordination.k8s.io/v1 Lease. Replicas set to 2+ for availability — only the lease holder reconciles; standby takes over on failure within the lease duration. Document why replicas: 1 alone is insufficient (dual-active during rollout, no automatic failover)
  • Default reject engine (if enabled): 1-2 replica Deployment
  • Network policy: gateway → engine Services; operator → API server; reject engine ← gateway only
  • In-cluster security: document that network policies + CNI encryption replace gateway-to-engine TLS
  • Mermaid deployment diagram REQUIRED

8. Helm Chart #

gateway:
  mode: deployment  # or "daemonset"
  replicas: 3
  antiAffinity:
    enabled: true
    type: preferred  # or "required"
  holdTimeout: 30s
  maxHeldConnectionsPerPod: 500
  listeners:
    - name: fix-tls
      port: 9876
      tls:
        enabled: true
        secretName: fix-gateway-tls
        minVersion: "1.2"
    - name: fix-plain
      port: 9877
      tls:
        enabled: false
defaultRejectEngine:
  enabled: true       # set false to disable
  replicas: 2
  rejectReason: "No matching session configuration"
operator:
  replicas: 2         # leader-elected, only one active
  leaseNamespace: ""  # defaults to release namespace
otel:
  collectorEndpoint: "otel-collector.observability:4317"
  protocol: grpc      # or "http"
  insecure: false
  tlsSecretName: ""   # optional TLS for collector connection

Both Deployment + DaemonSet templates (one active per mode). CRDs via crds/ directory.

9. TLS Offload and TCP Forwarding Acceleration #

The latency that matters is TLS termination and TCP forwarding — not Logon parsing or routing (which happen once per connection).

  • Phase 1 (this project): Software-only on Kubernetes. Tuning: TCP_NODELAY, SO_REUSEPORT, busy-poll, CPU pinning, NUMA-aware scheduling
  • Phase 2: Kernel bypass for TCP forwarding — DPDK, Solarflare OpenOnload, AF_XDP eBPF fast path. Logon parse and routing stay in software; bidirectional TCP proxy moves to kernel-bypass
  • Phase 3: FPGA/SmartNIC offload for TLS termination and TCP forwarding — AMD Alveo, Altera SmartNICs. TLS decrypt in hardware, TCP splice in FPGA fabric. Routing stays in software (too dynamic for hardware). Document the offload boundary clearly

For each phase: expected latency, infrastructure requirements, deployment topology changes.

10. Observability — OpenTelemetry #

All telemetry via OpenTelemetry. No vendor-specific instrumentation. Every component emits OTel — gateway, operator, and reject engine (when enabled).

  • Spans: per-connection from TCP accept → routing → proxy (or hold → connect). Children: TLS handshake, Logon parse, rule eval, endpoint hold (if waiting), upstream connect. Attributes include extracted tags (per redaction), routing decision, listener, TLS metadata
  • Metrics: active connections per CRD (gauge — tracked for observability, not enforced), held connections waiting for endpoints (gauge), orphaned connections (gauge — CRD deleted but sessions live), routed per CRD (counter), reject engine connections (counter), eval latency (histogram), rejections by reason (counter: no_match/parse_failure/target_unreachable/hold_timeout/hold_cap_reached), TLS failures (counter), bytes proxied (counter)
  • Logs: OTel-correlated structured logs for every routing decision (redaction-aware), hold events (ERROR on hold start, CRITICAL on timeout, INFO on successful connect after hold), orphan events (ERROR on CRD delete with active sessions), conflict events, connection lifecycle, TLS failures
  • Redaction enforcement: log: false → absent from all telemetry. redact: true → [REDACTED] everywhere
  • Alerts: conflict detection, sustained rejections, target unreachable, TLS cert expiry, reject engine rate spikes, high held-connection count, orphaned sessions present (CRD deleted with live connections), hold cap reached

11. OpenTelemetry Collector Configuration #

Deployment and configuration guide — not just application-side instrumentation:

  • Collector deployment: recommended mode for this gateway — DaemonSet (node-level) vs. Deployment (centralized). Include working otel-collector-config.yaml in Helm chart
  • Receiver configuration: OTLP gRPC (4317), OTLP HTTP (4318). Document how gateway, operator, and reject engine connect to the collector via otel.collectorEndpoint in Helm values
  • Processor pipeline: batch processor (batch size, timeout), memory limiter, attribute processor for global enrichment (cluster name, namespace, gateway version)
  • Exporter configuration: example configs for common backends — Prometheus (metrics), Jaeger/Tempo (traces), Loki/Elasticsearch (logs)
  • Sampling strategy: 100% sampling for routing decisions and errors; configurable rate for normal proxy span completion. Document head-based vs. tail-based for high-throughput FIX environments
  • Resource detection: k8sattributes processor for pod name, node, namespace
  • Include reference/otel-collector-setup.md with step-by-step deployment instructions
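
The sampling rule above reduces to a small predicate. The sketch below is conceptual only. In a real deployment the decision would be expressed through an OpenTelemetry SDK sampler or Collector sampling configuration, and the function name, the `span_kind` labels, and the injectable `rng` parameter are hypothetical.

```python
import random
from typing import Callable


def should_sample(span_kind: str, is_error: bool, normal_rate: float,
                  rng: Callable[[], float] = random.random) -> bool:
    """Keep every routing decision and error; sample the rest at normal_rate."""
    if is_error or span_kind == "routing_decision":
        return True
    return rng() < normal_rate
```

Knowing whether a span ended in error generally requires the completed span. That makes the error rule a natural fit for tail-based sampling, while the fixed rate for normal proxy spans can be applied head-based.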

ADR Format #

# ADR-NNN: [Title]
## Status: Proposed
## Context: [Why this decision is needed]
## Options Considered: [Minimum 3 options with pros/cons]
## Decision: [What was chosen]
## Consequences: [What this enables and what it costs]

Allowed Actions #

  • Create directories and files
  • Research TCP proxies (HAProxy, Envoy, Nginx stream), kernel bypass (DPDK, OpenOnload, AF_XDP), FPGA/SmartNIC TLS offload, K8s operator frameworks, Kubernetes Ingress Service-reference patterns, OTel Collector deployment patterns, CNI encryption options, leader election patterns, and prior art
  • Mermaid diagrams, YAML CRDs with OpenAPI v3, Helm templates, OTel Collector configs

Forbidden Actions #

  • No application source code
  • No Dockerfiles or CI/CD
  • No test files
  • No engine pod design — out of scope
  • No FIX session concerns in the gateway (sequence numbers, heartbeats, validation). Exception: optional reject engine sends FIX Logout
  • No placeholder text (“TBD”, “TODO”)
  • State uncertainty explicitly — never hallucinate library features

Stop Conditions — Pause and Ask When: #

  • Conflict detection algorithm has silent ambiguity edge cases
  • CRD field design hits a simplicity vs. expressiveness tradeoff
  • Gateway must participate in FIX protocol beyond the optional reject engine
  • Best-effort parsing can’t reliably extract custom tags
  • Per-tag redaction creates observability vs. compliance tension
  • Connection hold behavior creates resource exhaustion risk beyond the per-CRD and per-pod caps
  • CRD batch reconciliation strategy has consistency implications the agent can’t resolve
  • Orphan session lifecycle creates a state management problem that needs clarification
  • An existing TCP proxy (HAProxy, Envoy) handles 80%+ of requirements with a plugin
  • Any section needs implementation-level detail to be meaningful

Checkpoints #

✅ after each phase:

  1. Research — TCP proxies, kernel bypass, FPGA TLS offload, Ingress patterns, OTel Collector patterns, CNI encryption, leader election, operator frameworks
  2. ADRs — all five decision records
  3. CRD — FIXSession definition + all examples including redaction and hold config
  4. Architecture: system-design.md, all 11 sections
  5. Helm — Deployment/DaemonSet modes, anti-affinity, listeners, optional reject engine, OTel Collector config, hold limits
  6. Reference — tag field reference, OTel Collector setup guide, README (hybrid tone: product intro + technical docs)
  7. Final review — cross-refs consistent, no FIX session leakage (except optional reject engine), redaction applied across CRD + logs + OTel, Ingress-style Service refs correct, OTel Collector config validates, session hold caps documented, orphan session lifecycle documented, CRD batch reconciliation addressed

Author: George Tsiokos
