Onlano
AI / Machine Learning | Remote

Staff Software Engineer - Inference & Performance

Runware - Company

UK, 🇬🇧 United Kingdom | Full-time - Principal (10+ years) | Closes Jul 8, 2026

Salary

GBP 77,112 / year

Job description

Job details

  • Location: UK
  • Work mode: Remote
  • Employment type: Full-time (Not an internship)
  • Salary: GBP 77,112 per year

Role overview

Runware is seeking a Staff Software Engineer to lead the technical strategy for latency, throughput, and reliability within our AI inference platform. In this senior leadership position, you will optimize the entire pipeline from request ingress to GPU execution, ensuring high-performance delivery of AI models at scale.

Responsibilities

  • Take full technical ownership of latency and throughput across the AI inference platform
  • Architect and implement systems to achieve sub-one-second inference times in production
  • Define technical standards and execution roadmaps for GPU execution and result delivery
  • Optimize request ingress and data flow to maximize platform reliability and scale
  • Lead the design of high-performance infrastructure to support massive AI workloads

Requirements

  • Extensive experience in high-performance software engineering and system architecture
  • Proven track record of optimizing GPU execution and inference latency at scale
  • Deep understanding of distributed systems and low-latency networking
  • Ability to lead complex technical initiatives from conceptual design to production
  • Strong expertise in languages and tools used for high-performance AI infrastructure

Benefits

  • Competitive salary of GBP 77,112 per year
  • Remote-first work environment
  • Opportunity to lead critical AI infrastructure
  • High-impact role in a fast-growing AI company

Keywords

AI Inference, GPU Optimization, Latency Reduction, Distributed Systems, High-Performance Computing, Scalability, MLOps