Remote Senior DevOps & Infrastructure Lead Job at RV LIFE

USA

Full Time

$100,000 – $130,000 USD

“When applying, mention the word FarCoder to show you’ve read the job post completely. Employers can look for these words to identify genuine, thoughtful applicants and avoid spam.”

Job Overview

RV LIFE is hiring a remote Senior DevOps & Infrastructure Lead. This is a full-time position based in the USA.

The role centers on Linux servers, DigitalOcean, AWS, Cloudflare, Terraform, and CI/CD.

The focus is infrastructure rather than user-interface development: the role supports application environments built with PHP/Laravel, Node.js, React, MongoDB, and Redis.

Key Responsibilities

Administer and improve existing DigitalOcean infrastructure.
Support and improve Linux-based production server environments.
Migrate self-managed databases onto managed database services, with validated failover, backups, and recovery.
Move applications onto managed runtimes (including Laravel Cloud where it fits), replacing manual deploy processes with automated, repeatable pipelines.
Expand and harden our use of Cloudflare for edge, static hosting, caching, and security.
Build a clear inventory of servers, services, databases, domains, access paths, backups, monitoring, and operational risks.
Create and maintain practical runbooks for common and emergency infrastructure workflows.
Improve incident response, escalation paths, monitoring, logging, and alerting.
Review and improve backup, restore, and disaster-recovery procedures.
Identify recurring manual work and convert it into safer procedures, scripts, automation, or infrastructure-as-code.
Help define infrastructure-as-code standards and move appropriate infrastructure into repeatable, version-controlled workflows.
Work with AWS services where needed (Lambda, VPC, IAM, CloudWatch, S3, SSM/Secrets Manager, queues).
Use AI tools to accelerate discovery, documentation, scripting, troubleshooting, and automation, with strong production-safety judgment.
Partner with engineering leadership to prioritize infrastructure risk and modernization; track work clearly in Jira/GitHub and communicate proactively about risks, tradeoffs, and blockers.

Required Skills

Primary Skills

React
Node.js
MongoDB
Terraform
CI/CD
Redis
Rust

Secondary Skills

PHP
Go
AWS
Linux
REST
DevOps

Core day-to-day skills for this role are Linux server administration, infrastructure-as-code (such as Terraform), CI/CD, and AWS; see What We're Looking For below for the full requirements.

Job Details

Employment Type: Full Time
Location: USA
Salary: $100,000 – $130,000 USD

Tech Stack

React, Node.js, MongoDB, Terraform, CI/CD, Redis, Rust, PHP, Go, AWS, Linux, REST, DevOps

Role details

About the Role

RV LIFE is looking for a Senior DevOps & Infrastructure Lead to help us stabilize, document, and modernize the infrastructure behind our products.

This is a hands-on senior role for someone comfortable inheriting real production systems, reducing operational risk, improving reliability, and moving us toward a documented, secure, automated, infrastructure-as-code operating model.

We run production across DigitalOcean, AWS, Cloudflare, and other hosting providers, and are consolidating onto managed, infrastructure-as-code platforms. We need deep, hands-on expertise across these environments.

RV LIFE is an AI-first engineering organization. We expect this person to use AI to accelerate discovery, documentation, runbooks, log review, scripting, and infrastructure-as-code drafting, while applying strict human judgment around security, secrets, production access, destructive commands, rollback, and correctness.

This role focuses on the infrastructure path to reliability; application-level architecture changes are handled in partnership with our engineering team. It is not just about keeping servers alive. It is about building durable practices that reduce single-person dependency, improve visibility, and make our systems safer to operate.

This is not a standard 9-to-5 role. Production issues do not keep business hours, so it carries real on-call responsibility: you need to be reachable and able to respond when unforeseen incidents arise.

What Success Looks Like

In the first 30-60 days, you'll take ownership of how we see and operate our infrastructure, building on what we already track and closing the gaps.

You'll validate and take ownership of what already exists:

Our infrastructure inventory and server map
Our monitoring and alerting
Our DNS / Cloudflare configuration
Our prioritized infrastructure risk register

You'll create what we're missing:

An access and credential map
Verified backup and restore status for critical systems (tested, not assumed)
Runbooks for the highest-risk operational workflows

In the first 90 days, you'll move us toward a durable, consolidated model. Success means:

The first core database migrated to a managed service, with a tested restore, plus a clear, sequenced plan for the rest.
The first application running on a managed runtime (DigitalOcean App Platform or Laravel Cloud).
The first static frontend served from Cloudflare Pages.
A measurably stronger edge security posture.
Critical systems are no longer understood by only one person, common tasks have documented procedures, manual processes are being converted to automation, and AI is used safely to reduce toil.

What We're Looking For

Senior-level experience operating production infrastructure.
Deep, hands-on Linux server administration (the traditional, 'old-school' kind): operating, securing, and troubleshooting manually managed production servers (LAMP/LEMP, system services, cron, networking, SSH) directly at the command line, not only through a cloud console.
Experience with DigitalOcean, Linode, AWS EC2, bare VPS hosting, or comparable environments.
Senior database operations: migrating self-managed MySQL to a managed service, replication, backup validation, restore testing, and IO isolation.
Strong Cloudflare experience across DNS, WAF, CDN and caching behavior, page rules, Workers, Pages, and Zero Trust/Access, including traffic routing and origin protection.
Experience with PHP/Laravel application environments and with a managed Laravel runtime (Laravel Cloud and/or DigitalOcean App Platform).
Datadog or a comparable observability platform for monitoring, alerting, dashboards, logs, and incident investigation.
Infrastructure-as-code such as Terraform, Pulumi, AWS CDK, Serverless Framework, or CloudFormation.
CI/CD pipelines and deployment automation.
Practical AWS experience (Lambda, IAM, VPC, CloudWatch, S3, SSM/Secrets Manager, queues).
Good judgment around production safety, access control, secrets, backups, and incident response.
Willingness to carry real on-call responsibility and respond to production incidents outside normal business hours; this is not a strict 9-to-5 role.
A habit of documenting what you learn and creating runbooks others can follow.
Practical experience using AI tools (ChatGPT, Claude, Cursor, GitHub Copilot, or similar), with strong judgment about where human verification is required.
Ability to work independently in a small, remote engineering organization where practical ownership matters more than bureaucracy.

Nice to Have

Experience migrating manually managed services onto managed platforms or IaC.
Experience moving static frontends onto Cloudflare Pages.
Managed migrations for MongoDB, OpenSearch, or Valkey/Redis.
Experience supporting Node.js, React, and React Native alongside PHP.
Experience helping organizations reduce infrastructure bus-factor risk.
Experience working with external DevOps/security partners or auditors.

Who You Are

You are someone who:

Takes ownership without waiting to be told every next step.
Is calm and practical during incidents.
Can inherit messy systems without being judgmental or reckless.
Prefers consolidating on platforms we already run over adding new vendors.
Documents as they go.
Uses AI as leverage but does not blindly trust its output; verifies, tests, and applies senior judgment before anything touches production.
Knows when to automate and when to stabilize first.
Communicates clearly with technical and non-technical stakeholders.
Understands that reliability is not just uptime: it is visibility, repeatability, recovery, and shared understanding.
Wants to leave infrastructure better than they found it.