Ryan Mitchell

Software engineer in San Francisco. I build backend systems, data pipelines, and occasionally the dashboards that make sense of them. Currently at Bastion Security, previously Tensor Cloud and Pinpoint Analytics. Michigan grad. Mountain biker when the fog clears.

Things I've Built

Streaming Threat Detection

Bastion Security, 2023

Bastion's threat detection polled cloud APIs on a 45-second batch interval. Two enterprise prospects walked away during POC evaluations because of that detection gap. I proposed and led the migration to a Kafka-based streaming architecture with event subscriptions from CloudTrail and cloud provider webhooks.

The interesting part was the stateful stream processor. It correlates events across multiple cloud sources in real time, with effectively exactly-once processing achieved through idempotent consumers. I ran a 3-week parallel deployment in which both the old and new systems processed live traffic, so we could validate correctness before cutting over.
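A minimal sketch of the idempotent-consumer idea, assuming each normalized event carries a unique ID (the `Event` type and its fields are hypothetical) and that an in-memory set stands in for the durable dedup store a real processor would keep next to its state. The broker delivers at least once; the consumer absorbs redeliveries, which is what makes the end result effectively exactly-once.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical normalized cloud event; ID is assumed unique per source event.
type Event struct {
	ID     string
	Source string // e.g. "cloudtrail" or "webhook"
	Actor  string
}

// IdempotentConsumer applies each event at most once, even if the broker
// redelivers it. The seen set lives in memory here; a production processor
// would persist it together with its state.
type IdempotentConsumer struct {
	mu       sync.Mutex
	seen     map[string]struct{}
	perActor map[string]int // toy correlated state: events per actor across all sources
}

func NewIdempotentConsumer() *IdempotentConsumer {
	return &IdempotentConsumer{seen: map[string]struct{}{}, perActor: map[string]int{}}
}

// Handle returns true if the event was applied, false if it was a duplicate.
func (c *IdempotentConsumer) Handle(e Event) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, dup := c.seen[e.ID]; dup {
		return false
	}
	c.seen[e.ID] = struct{}{}
	c.perActor[e.Actor]++
	return true
}

// Count reports how many distinct events were applied for an actor.
func (c *IdempotentConsumer) Count(actor string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.perActor[actor]
}

func main() {
	c := NewIdempotentConsumer()
	events := []Event{
		{ID: "ct-1", Source: "cloudtrail", Actor: "alice"},
		{ID: "wh-7", Source: "webhook", Actor: "alice"},
		{ID: "ct-1", Source: "cloudtrail", Actor: "alice"}, // redelivered duplicate
	}
	for _, e := range events {
		c.Handle(e)
	}
	fmt.Println(c.Count("alice")) // 2
}
```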

Alert latency dropped from 45 seconds to under 3 seconds. We re-engaged both lost prospects and one converted to a $280K ARR contract. The streaming architecture also unblocked a "live threat map" feature that product had deprioritized as technically infeasible.

Before / After

45s → <3s

Alert latency

Go Kafka CloudTrail Idempotent Consumers

Multi-Tenant API Gateway

Bastion Security, 2023

The existing API layer was a single Flask app with tenant routing in middleware. A noisy-tenant incident caused a 12-minute partial outage across all customers. At 800 req/sec, p99 latency was already 420ms and degrading.

I built a new gateway in Go with tenant-scoped connection pools, adaptive rate limiting, and circuit breakers per downstream service. Rolled it out as a gradual traffic shift using weighted DNS, migrating 5% at a time over two weeks while watching latency percentiles. Credentials rotate automatically every 72 hours with zero downtime.

The gateway now handles 2,400 req/sec at a p99 of 68ms. Noisy-tenant incidents since launch: zero. It carries the traffic for $18M in customer ARR.

Throughput

2,400 req/sec

p99: 68ms (was 420ms)

Go Rate Limiting Circuit Breakers DNS Routing

ECS to Kubernetes Migration

Bastion Security, 2024

14 microservices on ECS with manually maintained task definitions. Deployments took 18 minutes, rollbacks required SSH, and infrastructure costs had grown 40% YoY without matching traffic growth. I wrote Helm charts for all 14 services with standardized health checks, resource limits, and auto-scaling.

Built a GitOps pipeline with ArgoCD for declarative deployments. Migrated in 4 waves over 6 weeks, starting with the lowest-risk internal services. Ran hands-on workshops so every engineer could deploy and debug independently.

Deploy time

4 min

was 18 min

Cost savings

31%

$14K/month

Kubernetes Helm ArgoCD EKS

ML Model Serving Layer

Tensor Cloud, 2021-2022

Tensor Cloud's inference layer was a prototype Flask service. At 400 req/sec, response times exceeded 500ms and GPU utilization was below 35% because requests queued sequentially.

I designed a two-tier architecture: a Go request router that handles connection management and batching, fronting Python inference workers that share GPU memory pools. Adaptive batching groups requests by model version. An LRU response cache with a TTL serves 23% of requests, which buys headroom at no GPU cost. Health-based routing steers traffic away from workers that are mid-model-load.

The serving layer handles 1,800 req/sec at p99 of 92ms. GPU utilization jumped from 35% to 78%. The architecture became the template for Tensor's next two product lines.

GPU utilization

35% → 78%

p99: 92ms at 1,800 req/sec

Go Python GPU Pools Adaptive Batching

Event-Driven Billing Migration

Tensor Cloud, 2022

Billing lived inside a Django monolith. Race conditions between usage tracking and the billing cron caused double-charges -- $34K in refunds over 6 months. Customer trust was eroding.

I extracted billing into a standalone service. Usage events are published to Kafka with idempotent producers, and the consumer processes them with exactly-once semantics using the transactional outbox pattern. I built a real-time usage dashboard so customers can watch charges accumulate live, and wrote property-based tests (Hypothesis) to verify billing invariants under concurrent load. Zero billing discrepancies in the 14 months since launch.

Billing errors

Zero

14 months running

Support tickets

-61%

billing-related

Kafka Outbox Pattern Hypothesis React

GPU Cost Attribution Dashboard

Tensor Cloud, 2022

A $230K/month GPU bill, and nobody could tell which teams or projects consumed what. Finance was manually attributing costs from AWS billing exports in spreadsheets.

Built the dashboard in React with D3.js. Designed a cost attribution model using Kubernetes labels to tag GPU hours by team, project, and model type. Wrote an ETL pipeline enriching AWS Cost Explorer data with pod metadata every 15 minutes. Drill-down views, trend lines, anomaly highlighting for spend spikes. Adopted by all 8 engineering teams in the first week. Identified $51K/month in idle GPU instances.
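At its core, the attribution model is a group-by over pod labels. A minimal sketch, assuming each ETL record already carries the pod's Kubernetes labels and its GPU hours, and that cost is GPU hours times a single hypothetical hourly rate (the real pipeline enriched AWS Cost Explorer data, which this skips). The label keys are illustrative; unlabeled pods land in an explicit `unattributed` bucket instead of silently disappearing.

```go
package main

import (
	"fmt"
	"sort"
)

// PodUsage is one ETL record: GPU hours consumed by a pod plus its labels.
type PodUsage struct {
	Labels   map[string]string
	GPUHours float64
}

// CostByLabel sums cost per value of labelKey (e.g. "team" or "project").
// Pods without the label are grouped under "unattributed" so gaps in
// tagging stay visible on the dashboard.
func CostByLabel(records []PodUsage, labelKey string, ratePerGPUHour float64) map[string]float64 {
	out := map[string]float64{}
	for _, r := range records {
		key, ok := r.Labels[labelKey]
		if !ok || key == "" {
			key = "unattributed"
		}
		out[key] += r.GPUHours * ratePerGPUHour
	}
	return out
}

func main() {
	records := []PodUsage{
		{Labels: map[string]string{"team": "vision", "project": "detector"}, GPUHours: 10},
		{Labels: map[string]string{"team": "nlp", "project": "summarizer"}, GPUHours: 4},
		{Labels: map[string]string{"team": "vision", "project": "segmenter"}, GPUHours: 6},
		{Labels: map[string]string{}, GPUHours: 2},
	}
	costs := CostByLabel(records, "team", 2.5) // hypothetical $/GPU-hour
	teams := make([]string, 0, len(costs))
	for t := range costs {
		teams = append(teams, t)
	}
	sort.Strings(teams) // stable output order
	for _, t := range teams {
		fmt.Printf("%s: $%.2f\n", t, costs[t])
	}
}
```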

Cost reduction

22%

First quarter after launch

React D3.js AWS Cost Explorer Kubernetes Labels

Time-Series Query Engine

Pinpoint Analytics, 2020

Pinpoint's analytics dashboard was slow for high-volume customers: p95 load time was 4.2 seconds, and some customers hit 8+ seconds. The root cause was full table scans on a 900M-row time-series table. The team didn't have bandwidth for a new database.

I stayed within PostgreSQL and implemented a multi-pronged strategy: time-based monthly partitioning, materialized views for the 5 most common aggregations on staggered refresh schedules, partial indexes on customer_id + time range, and raw SQL in place of ORM queries wherever the ORM's queries produced plans with poor partition pruning.

Dashboard load time (p95)

4.2s → 380ms

11x improvement, same database

PostgreSQL Partitioning Materialized Views

Notes

On API pagination

Feb 2026

Cursor-based pagination is almost always what you want. Offset pagination skips or repeats rows under concurrent writes, and it gets linearly slower with page depth because the database still has to walk every skipped row. The only exception is when users genuinely need to jump to page 47 -- and even then, I'd rather build a search feature than maintain offset-based queries at scale.
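A minimal keyset (cursor) pagination sketch over an in-memory slice sorted by ID, assuming unique, ascending IDs; the hypothetical `Item` type and base64 cursor encoding are illustrative. In SQL the same idea is `WHERE id > $cursor ORDER BY id LIMIT n`, which an index can satisfy without reading the skipped rows.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"sort"
	"strconv"
)

type Item struct {
	ID   int64
	Name string
}

// Cursors are opaque to clients: base64 of the last ID they saw.
func encodeCursor(id int64) string {
	return base64.RawURLEncoding.EncodeToString([]byte(strconv.FormatInt(id, 10)))
}

func decodeCursor(c string) (int64, error) {
	b, err := base64.RawURLEncoding.DecodeString(c)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(string(b), 10, 64)
}

// Page returns up to limit items with ID greater than the cursor, plus the
// cursor for the next page ("" when there is nothing left).
// items must be sorted by ID ascending.
func Page(items []Item, cursor string, limit int) ([]Item, string, error) {
	start := 0
	if cursor != "" {
		after, err := decodeCursor(cursor)
		if err != nil {
			return nil, "", fmt.Errorf("bad cursor: %w", err)
		}
		// Binary search for the first ID strictly greater than the cursor.
		start = sort.Search(len(items), func(i int) bool { return items[i].ID > after })
	}
	end := start + limit
	if end > len(items) {
		end = len(items)
	}
	page := items[start:end]
	next := ""
	if end < len(items) && len(page) > 0 {
		next = encodeCursor(page[len(page)-1].ID)
	}
	return page, next, nil
}

func main() {
	items := []Item{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}, {5, "e"}}
	page, next, _ := Page(items, "", 2)
	fmt.Println(page)
	items = items[1:] // a concurrent delete of ID 1 between requests
	page, _, _ = Page(items, next, 2)
	fmt.Println(page) // still IDs 3 and 4; OFFSET 2 would now skip ID 3
}
```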

GraphQL vs. REST

Jan 2026

GraphQL is great when your frontend makes lots of varied queries against a stable schema. It's a bad fit when your API is mostly writes or when you need strong caching at the HTTP layer. At Pinpoint, the switch to GraphQL cut dashboard DB round trips by 73%. I'd still use REST for most microservice-to-microservice calls. The tooling maturity just isn't there for server-to-server GraphQL yet.

Chaos engineering, practically

Dec 2025

We started running Litmus chaos experiments at Bastion and found 8 critical failure modes in the first 2 months. The biggest win wasn't the bugs we found -- it was that the team had rehearsed failures before they happened in production. Monthly game days became a team ritual. The on-call rotation got noticeably calmer after the third one.

Property-based testing for billing

Nov 2025

Hypothesis caught edge cases in our billing system that 200+ unit tests missed. When money is involved, generating random concurrent scenarios is worth the setup cost. The invariant is simple: total charged must equal total consumed. Proving it under concurrency is not. We found a race condition that had been silently double-charging 0.3% of invoices.

The D3.js learning curve

Oct 2025

D3 clicked for me when I stopped thinking of it as a charting library and started thinking of it as a data-to-DOM binding engine. The GPU cost dashboard I built at Tensor would have been painful with any higher-level library because of the drill-down interaction patterns. That said, for simple bar charts I'd still reach for Recharts or Nivo first.

Why I like Go for gateways

Sep 2025

Goroutines per connection, predictable GC pauses at high throughput, easy to reason about memory. The Bastion API gateway handles 2,400 req/sec and the tail latency stays flat. I wouldn't write a data pipeline in Go, but for request routing it's hard to beat. Python's asyncio gets close on developer speed, but its runtime overhead shows up at p99.

Background

Senior Software Engineer

Bastion Security

Jan 2023 -- Present / San Francisco

Threat detection pipelines, multi-tenant API gateway, Kubernetes migration, compliance reporting engine, distributed tracing across 22 services. Also introduced chaos engineering.

Backend Engineer

Tensor Cloud

Jun 2021 -- Dec 2022 / San Francisco

ML model serving infrastructure, event-driven billing system, data pipeline on Airflow and Kafka (3.2M events/day), GPU cost attribution dashboard in React and D3.js.

Software Engineer

Pinpoint Analytics

Jul 2019 -- May 2021 / Ann Arbor, MI

Data ingestion service (850K events/day), time-series query optimization, GraphQL API layer with DataLoader batching, analytics dashboard in React and D3.js. Promoted from junior after 14 months.

B.S. Computer Science

University of Michigan

2015 -- 2019 / Ann Arbor, MI

GPA 3.71. Senior capstone: distributed key-value store with Raft consensus in Go. TA for EECS 485 (Web Systems).

Languages

Python 5 years
Go 4 years
TypeScript 5 years
SQL 5 years

Infrastructure

AWS Kubernetes Docker Terraform Helm ArgoCD

Data

PostgreSQL Kafka Redis Airflow Spark

Observability

Datadog Prometheus Grafana OpenTelemetry

Frontend

React Next.js D3.js GraphQL Tailwind