Responsibilities
Be the Digital Guardian: Architect Deriv’s systems to withstand disruptions, building a robust infrastructure that keeps operations running smoothly.
Lead with Strategy: Develop and oversee disaster recovery (DR) plans, mentor your team, and guide us through the evolving IT resilience landscape. Provide leadership in Business Impact Analysis (BIA) and DR planning.
Design Resilient Systems: Implement advanced DR solutions for critical cloud services (AWS, GCP), ensuring scalability, durability, and readiness for any situation.
Anticipate Risks: Conduct in-depth risk assessments, leverage machine learning, and prepare for the unexpected by leading DR exercises. Ensure DR strategies meet or exceed RTO and RPO targets.
Automate Recovery Processes: Develop frameworks and tools (Chef, AWS Step Functions, Ansible, Terraform, AWS CloudFormation) to streamline recovery, minimize downtime, and speed up restoration of service.
Test and Improve: Design comprehensive testing and validation protocols for DR drills, using observability tools (Grafana, AWS CloudWatch). Stress-test systems with chaos engineering techniques to verify that DR plans are effective.
Collaborate Across Teams: Partner with various teams to ensure everyone understands their role in DR. Work closely with architects, system engineers, and security specialists.
Lead in Crisis: Take charge during disruptions, coordinating recovery efforts and minimizing the impact with steady leadership.
Ensure Compliance: Stay on top of regulatory requirements, particularly in the financial sector, and provide detailed performance reports to senior leadership and regulatory bodies.
Commit to Continuous Improvement: Keep up with industry trends, continuously improve DR capabilities, and share knowledge with the team.
Requirements
10+ years of experience in disaster recovery, business continuity, or a related field
3+ years in a leadership role in a technical environment
In-depth knowledge of AWS services for disaster recovery (AWS Backup, Amazon RDS Multi-AZ, AWS Elastic Disaster Recovery, CloudFormation, Global Accelerator, Fault Injection Simulator)
Proficiency in managing DR in cloud environments (AWS, GCP) using tools like Terraform, Chef, Docker, Kubernetes, and Octopus Deploy
Strong understanding of modern architectures (microservices, serverless, containerization)
Proven experience leading complex DR projects
Familiarity with Agile methodologies and Business Continuity principles
Excellent analytical, problem-solving, and communication skills
Bachelor’s degree in Computer Science, Information Technology, or a related field
Master’s degree or certifications (e.g., CDRP, CISSP, ISO 22301 LI) are a plus
What We Offer
Health Insurance
Visa
Paid Annual Leave
Maternity and Paternity Leave