Remote Infrastructure Engineer jobs – Full‑Time Senior Position in Brentwood, California | AWS, Terraform, Kubernetes, $115k‑$150k – Remote Infrastructure & Cloud Architecture Role

---

Who we are

We’re ByteForge, a mid‑size SaaS provider that grew from a two‑person garage project to a platform serving more than 2 million end‑users across North America. Our core product, an API‑driven data pipeline that powers real‑time analytics for retail chains, runs entirely in the cloud, and the reliability of that pipeline is what keeps our customers awake at night (in a good way).

We’ve been pulling double shifts on call for the last 18 months because a recent acquisition added a new data‑ingestion module that increased our daily traffic by 45 %. The engineering leadership team decided that we need a dedicated Remote Infrastructure Engineer to own the underlying platform, bring systematic automation, and finally give the on‑call crew a predictable shift schedule. That’s why we’re hiring in Brentwood, California: the talent pool there has a reputation for pragmatic cloud expertise, and we want someone who can relate to the same regional tech community while working fully remotely.

Why this role exists now

When we launched the new ingestion service, we saw three concrete pain points:

1. Spikes in latency that breached our 99.9 % SLA on 12 % of daily requests.
2. Infrastructure cost overruns that pushed our AWS bill from $1.1 M to $1.5 M in 6 months, a 38 % increase.
3. Manual provisioning of Kubernetes clusters and VPCs that caused a mean time to recovery (MTTR) of 84 minutes after a failure, far above our target.

What you’ll actually do

- Architect and implement a fully automated, IaC‑driven environment using Terraform and AWS CloudFormation that can spin up identical staging, production, and disaster‑recovery clusters in under 10 minutes.
- Manage our Kubernetes fleet (currently 12 clusters, 340 nodes), using Helm and Kustomize to version‑control all manifests so that every rollout is reversible.
- Build and maintain CI/CD pipelines in Jenkins and GitHub Actions that push infrastructure changes through a gated approval process, integrating HashiCorp Vault for secrets and Trivy for security scans.
- Instrument the stack with Prometheus, Grafana, and New Relic to surface latency, error‑rate, and cost metrics in real time, setting alerts that feed directly into our on‑call rotation.
- Collaborate with the security team to enforce least‑privilege IAM policies, manage secrets, and run quarterly compliance checks (SOC 2, ISO 27001) using AWS Config and AWS Security Hub.
- Mentor a small team of 4 junior engineers, guiding them through best practices for cloud cost optimization, container security, and incident post‑mortems.
- Run capacity planning on quarterly forecasts, using AWS Cost Explorer and CloudHealth to model growth scenarios and make right‑sizing recommendations that keep the expense curve flat.
- Drive the on‑call rotation redesign: moving from a 24/7 “fire‑fighting” model to a predictably scheduled, run‑book‑first approach that reduces fatigue and improves resolution quality.
- Document everything in Confluence, ensuring that any new hire in Brentwood, California can walk through a “day‑in‑the‑life” playbook without having to ask a senior colleague.

Who you’ll work with

- Product Engineering (12 engineers): You’ll be their go‑to for infrastructure feasibility, helping them understand the cost implications of new feature flags.
- Data Science (5 analysts): They need reliable, low‑latency pipelines; you’ll work with them to fine‑tune cluster autoscaling policies.
- Security & Compliance (3 specialists): You’ll partner on audits and embed security controls directly into the IaC pipeline.
- Customer Success (8 reps): Occasionally you’ll join calls with a high‑value client in Brentwood, California who wants to understand how a new region will impact latency.
- Executive leadership: The CTO (based in Austin) meets weekly, and the CFO (who lives in Brentwood, California) tracks infrastructure spend closely. Your reports will influence quarterly budgeting discussions.

Our tech stack (the tools you’ll be getting hands‑on with)

1. AWS (EC2, RDS, S3, EKS, Lambda)
2. Terraform (v1.5+)
3. Kubernetes (v1.27)
4. Helm & Kustomize
5. Docker (v24)
6. Jenkins + GitHub Actions
7. Prometheus & Grafana
8. New Relic APM
9. HashiCorp Vault
10. Ansible (for VM configuration)
11. AWS CloudWatch & CloudTrail
12. Splunk (log aggregation)

You’ll also get to experiment with GitLab CI and Azure if a client asks for a multi‑cloud proof‑of‑concept. We consider the list “the tools we love today,” not a static requirement.

Metrics you’ll be judged on (the numbers that matter)

| Metric | Target (12‑month horizon) |
|--------|---------------------------|
| Cloud cost reduction | ≥ 10 % YoY |
| SLA compliance (99.9 % uptime) | ≥ 99.95 % |
| MTTR for infra incidents | ≤ 20 minutes |
| Automation coverage (IaC vs manual) | 95 %+ |
| Team satisfaction (internal survey) | ≥ 4.5/5 |
| On‑call fatigue index (self‑reported) | ↓ 30 % |

Your first 90 days will be a “learning sprint”: you’ll audit existing pipelines, map out the biggest cost drivers, and submit a roadmap that outlines the automation milestones. Success is measured not just by ticking boxes but by the tangible improvement in the numbers above.

What we offer (the real stuff, not buzzwords)

- Salary: $115k – $150k base, commensurate with experience, plus a quarterly bonus tied to the cost‑reduction targets.
- Equity: 0.15 % option pool that vests over 4 years with a 1‑year cliff.
- Remote‑first policy: While we say “remote,” we provide a $2,500 stipend for a home office upgrade (standing desk, monitors, ergonomic chair).
  You’ll still attend two quarterly “team‑offsites” in Brentwood, California; we’ve found the coffee there keeps ideas flowing.
- Health benefits: Medical, dental, vision, and a $1,200 per‑year wellness allowance (gym, meditation apps, you name it).
- Learning budget: $2,000 annually for certifications (AWS, CKA, etc.) and conference tickets (AWS re:Invent, KubeCon). We’ve covered travel to remote conferences before, even for folks based in Brentwood, California.
- Paid time off: 20 days + federal holidays, plus a “recharge week” you can take any time after your first six months.
- Family‑friendly policies: Parental leave (up to 12 weeks paid), flexible schedule (you set the core hours, we just need you for the on‑call overlap).

A human moment

> “When I first joined ByteForge, I was pulling 2‑hour on‑call stints after hours because we didn’t have proper run‑books. Within three months, the new automation I helped build reduced my average incident time from 84 minutes to under 12 minutes. That change wasn’t just a metric; it gave me evenings back with my kids. Knowing my work directly improves someone’s personal life is why I stay here.” – *Lena, Senior Infrastructure Engineer (based in Brentwood, California)*

Why you should apply (the “why now” in plain language)

Our next major release is scheduled for Q2 2026, and the new ingestion engine will double the data volume we process. The engineering leaders have already earmarked a $500k budget for infrastructure automation, but they need a senior engineer who can turn that budget into concrete pipelines, cost savings, and a calmer on‑call rotation. If you love digging into cloud bills, writing Terraform modules that feel like poetry, and mentoring junior talent, you’ll find this role both challenging and rewarding.

What a typical day looks like (remotely, from anywhere in the US, but we’ll be hiring in Brentwood, California)

- 08:30 – 09:00 – Quick stand‑up on Zoom with the platform team (everyone is in a different time zone, but we overlap for an hour).
- 09:00 – 10:30 – Review recent CloudWatch alerts; triage any spikes and add a new Prometheus rule if needed.
- 10:30 – 11:15 – Pair‑program with a junior engineer on a Terraform module that provisions a new VPC for a client in the Midwest.
- 11:15 – 12:00 – Write a short post‑mortem in Confluence, adding a run‑book snippet for a “node‑drain” incident we observed yesterday.
- 12:00 – 13:00 – Lunch break (we encourage you to step away, then maybe read the latest AWS blog post).
- 13:00 – 14:30 – Deploy a Helm chart to a sandbox cluster, test a new autoscaling policy using KEDA, and monitor the results in Grafana.
- 14:30 – 15:30 – Attend a 30‑minute security sync with the compliance team; discuss IAM role redesign to meet upcoming SOC 2 audit requirements.
- 15:30 – 16:00 – Update the cost‑optimization dashboard in AWS Cost Explorer and flag any resources that have been idle for more than 48 hours.
- 16:00 – 16:30 – End‑of‑day “handoff” notes posted in Slack for the next on‑call engineer (who’s based out of Brentwood, California this week).

How to apply (simple, no‑nonsense process)

1. Submit your resume via our career portal (link below). Include a short paragraph (2‑3 sentences) describing the biggest cloud‑cost reduction you’ve delivered.
2. Technical screen (30 minutes) with our lead architect, focused on your experience with Terraform, Kubernetes, and AWS networking.
3. Take‑home design exercise (no more than 4 hours). You’ll design a modular, reusable Terraform configuration for a multi‑AZ EKS cluster that meets a 99.95 % SLA and adheres to cost‑optimization best practices. We’ll provide the spec; we don’t expect a fully coded solution, just architecture diagrams and pseudo‑code.
4. Final interview with the CTO and a senior engineer (45 minutes).
   Expect a mix of culture fit, leadership style, and a deep dive into the take‑home exercise.
5. Offer – if everything aligns, you’ll receive an offer within 5 business days after the final interview.

A final word from our CTO

> “Infrastructure is the skeleton that holds up the experience we promise our customers. When you join us, you’re not just writing code; you’re shaping how millions of users see our product, and you’ll see that impact in the numbers day after day.” – *Michele Ramirez, CTO, ByteForge (also a resident of Brentwood, California)*

---

If you’re a hands‑on engineer who prefers concrete outcomes over vague buzzwords, enjoys automating the boring stuff so that teams can focus on delivering value, and wants to work in a place where your fellow engineers are as honest about challenges as they are about successes, we’d love to hear from you. Apply and help us build a more resilient, cost‑effective, and human‑centric remote infrastructure, right from Brentwood, California and everywhere else you call home.

Apply to this job