About Our Client
We’re a deep-tech startup building next-generation networking and communication infrastructure for AI/ML systems. Our work focuses on eliminating bottlenecks in distributed compute — improving how GPUs, clusters, and networks communicate at scale.
This isn’t application-layer AI. We operate at the transport, systems, and hardware-adjacent layers, solving real performance problems across data centers, edge environments, and specialized networks.
The Role
We’re hiring a Senior Software Engineer to work on high-performance networking and GPU communication systems.
You’ll be designing and optimizing the core infrastructure that powers large-scale AI workloads — reducing latency, improving throughput, and unlocking better utilization across distributed systems.
What You’ll Do
- Design and implement high-throughput, low-latency transport systems
  - TCP / UDP / QUIC
  - Congestion control, pacing, and flow control
- Optimize distributed system performance across:
  - CPU, memory, and network
  - GPU-to-GPU and node-to-node communication
- Work on real-world bottlenecks in AI/ML infrastructure
- Build and ship production systems in:
  - C / C++
  - Rust (a plus)
- Collaborate with a small, highly technical team to deliver measurable performance gains
What You Bring
Core Requirements
- Strong proficiency in C/C++
- Deep understanding of networking fundamentals
  - Transport protocols
  - Congestion and flow control
- Experience working on systems-level performance problems
- Strong debugging skills and ability to reason from first principles
- Exposure to GPU systems or acceleration
  - CUDA
  - NCCL (NVIDIA Collective Communications Library)
  - GPU data movement or optimization
What We’re Looking For
- Engineers who enjoy working close to the metal
- Strong problem solvers who can break down complex systems
- People who value substance over buzzwords
- Individuals who can contribute quickly and independently
This role is best suited for engineers with a systems and networking background, rather than those focused solely on high-level ML frameworks.
Preferred Experience
- Rust experience (or interest in learning)
- RDMA / InfiniBand / NVLink familiarity
- Experience with erasure coding / FEC
- Mobile or edge networking experience
- High-performance or low-latency systems
- Distributed systems or data plane optimization
- Hardware-adjacent environments (e.g., FPGA, network processors)
How We Work
- Small, senior, highly technical team
- Remote-friendly (U.S. time zones preferred)
- Focus on real output and performance impact, not process overhead
Why Join
- Work on critical infrastructure for AI/ML systems
- Solve problems at the intersection of networking and compute
- Join a team that values technical depth and execution
- Direct impact on performance at scale
Bonus: GPU-Focused Role
We’re also actively interested in engineers with deep GPU communication experience (CUDA, NCCL, NVIDIA stack). If that’s your background, we’d especially like to connect.