ben@tobler.nz | +1 206 370-2990 | tobler.nz
I'm a software engineer working in distributed systems, networking & virtualization. My start in cloud was at AWS building EC2. I developed and owned the control plane hypervisor agent & VM resource manager from inception to GA. When EC2 expanded to VPC I owned development & delivery of VPC's control plane and APIs.
After VPC GA, I joined the nascent Amazon Aurora project, where I developed the storage RPC stack and IO and task scheduling, and coordinated DB engine storage client development between the Palo Alto and Seattle teams.
At Oracle Cloud I owned the delivery of Virtual Machines, coordinating development and integration across networking, storage and compute teams. Later I designed & patented novel transactor executed & arbitrated guarded operations for OCC based database systems. The innovation eliminated a common class of commit conflicts causing performance & availability issues across some of the largest OCI control planes.
Recently I've been working in B2B on an EDI platform.
In the past I've worked in EFT implementing transaction processing platforms. I implemented 10K connection support for Java (prior to non-blocking IO availability in the JDK), enabling customers to support thousands of POS terminals and ATMs with a single transaction processor. Additionally, I implemented EMV support and an HSM-vendor-independent cryptographic library.
M.Sc. Computer Science, with distinction, University of Cape Town, 2005
Thesis on formal specification & implementation of network security protocols
Awarded NRF Prestige Scholarship for Masters study
Papers published in IFIP SEC 2004 Toulouse & ISSA 2003 Johannesburg
Owned development of the Virtual Machine service and delivered it, meeting an aggressive TTM goal. Coordinated development & integration across block storage, virtual networking and compute teams. Designed control plane architecture involving command distribution and state collation from VM hosts. Responsible for operational readiness at GA. Fast-followed GA with competitor-beating bulk provisioning for Gartner evaluation.
Responsible for post-GA roadmap, including BYOI support. Owned delivery of incremental post-GA customer features & system functionality, including capacity management automation, improved operational insights & troubleshooting tooling. Defined requirements for Bring Your Own Image to enable lift & shift of users' VMware (etc.) VMs to OCI. Implemented a BYOI PoC integrating key technologies, and planned development tasks and schedule.
Transitioned the internal database system used by OCI control planes from an ORM-style database library layer to a service using a cell architecture for tenant isolation, to limit blast radius and mitigate noisy neighbours.
Designed novel transactor executed & arbitrated guarded operations. Eliminated transaction aborts caused by hot key contention under common workload patterns of the largest control planes in the org. The innovation means that concurrent transactions are no longer necessarily aborted due to conflicts on, for example, incrementing quota counters or allocation of resources from a set. Instead these operations are arbitrated and completed at commit time by the transactor.
PoC'd a new KV store leveraging existing distributed systems components and collaborated to use formal methods (TLA+) to verify strict serializability.
Led work to extend the org's microservice generator by developing a service framework SDK for the output to use.
Significantly reduced: Effort teams spend on undifferentiated functionality, boilerplate & being operationally ready; Opportunity for bugs; Operational load.
Determined actual needs by collaboration and source code repository auditing. Identified common pitfalls, relevant best practices and conventions for APIs, data access, configuration and resilience.
Delivered: Resource GC - e.g. idempotency tokens & operation identifiers and tombstoning; Resiliency mechanisms - e.g. PID based load shedder & circuit breaker; API functionality - e.g. operation idempotency management.
Developed high performance storage RPC stack, IO and task scheduling. Implemented and used zero copy, user mode task scheduling, vectored non-blocking IO with resource partitioning and core assignment to achieve low latency, high concurrency log write request processing. Profiled and optimized hot path to achieve close to line rate RPC capability.
Coordinated DB engine storage client development and integration across Palo Alto and Seattle based teams. Owned security threat model analysis and mitigations.
Owned development and delivery of the control plane frontend, user-facing and internal APIs. The control plane processes API calls, from users and internal compute services, to modify virtual network topology, VM attachments, external routing and security ACLs. It converts the API requests into commands that it reliably delivers to 100,000s of hypervisor hosts and network & edge devices.
Collaborated with AWS Linux kernel team on successful effort to improve packet throughput on hypervisors. Achieved by moving source identification logic from control plane agent in user space, to a kernel module.
Extended and maintained the distributed firewall system that implements EC2 instance security groups across all the hypervisors in a data center.
Optimized in-memory representation of VPC network topology and security policy on hypervisors to support massive VPC & EC2 scaling while making efficient use of limited hypervisor host memory.
Developed & owned the EC2 control plane hypervisor agent & VM resources coordinator from conception through GA. The agent manages the VM lifecycle on each of the many thousands (now millions) of hypervisor hosts. Implemented collaborative peer boot image download when original pre-GA object storage read capacity was highly limited.
Implemented internal and public tooling and microservices, including AMI packager and uploader CLI, dynamic DNS vendor and intra data center network rate limiting to mitigate noisy neighbours.
Helped grow the EC2 team in Seattle by ramping up new team members tasked with developing new EC2 networking and security functionality.