Sr. Platform & Reliability Engineer (Remote)
As a Sr. Platform & Reliability Engineer you will live, eat, and breathe the principles of availability, performance, reliability, and automation. You will be constantly presented with new challenges of sizable scope and variety. You will maintain a close partnership with development teams, helping them architect and implement their applications and environments via new and ground-breaking methods that break the traditional infrastructure model.
This position, under the direction of the Sr. Manager, Platform & Reliability Engineering, is responsible for applying knowledge and experience in the DevOps and SRE domains, including production support, cloud service delivery, and CI/CD.
Successful candidates will be humble, yet passionate and self-motivated. They will be strong leaders who can prioritize well, communicate clearly, and have a consistent track record of identifying opportunities and creating efficiencies. We welcome those who see things differently, aren’t afraid to experiment, practice the fail fast/fail forward philosophy, believe that if you have to do it more than once, you automate it, and are comfortable having healthy discussions and debates with teammates and peers to advance these principles.
Reports To: Sr. Manager, Platform & Reliability Engineering
Essential Duties and Responsibilities:
- Remain curious! Research and present new technology trends, influence peers and leadership toward adoption, and always question industry standards and the status quo.
- Collaborate closely with other Solution Centers to understand workload and technical requirements, and guide them toward the best use of cloud infrastructure services, optimizing for performance, cost, and architectural flexibility
- You are never satisfied with the performance you are seeing and always know you can get a little bit more by pulling the right lever. You consistently improve developer experience, availability, performance, and reliability via automation, observability, and related efficient tooling.
- Design, implement, and roll out solutions that integrate home-grown, open source, and third-party tools to provide a high-performing continuous delivery pipeline that fits the development teams’ needs as well as Designer Brands’ long-term strategy
- Define reusable components, frameworks, common schemas, standards, and tools, influencing their usage across teams
- Assist in building world-class, multi-cloud capable, state-of-the-art products by:
  - Automating build and deployment processes
  - Automating verification, rollback, and scaling bi-directionally
    - Including A/B, Canary, Blue/Green deployment patterns
  - Building highly resilient cloud ecosystems capable of high availability and scale
    - Using Docker containers, Kubernetes as an orchestrator, Small Function Sets, or full VMs with base images
  - Mastering Layer-7 Traffic Management Technologies as code for Efficient Delivery
  - Implementing observability as code (Metrics, Logging, Tracing, Alerting)
- Influence, implement, and continuously refine operational processes, ensuring a balance between speed, agility, and adherence to policy
- Utilize the combination of above-mentioned items to create a Next-Generation Platform for DBI Application Delivery
- Evolve infrastructure, server, deployment strategies and testing to support our goal of 100% uptime and quick turnaround of deployments for the application development organization
- Mentor and provide technical oversight and guidance to team members and cross-functional partners, improving their skills, knowledge of our systems, and their ability to get things done!
- Possess the ability to troubleshoot technology you know, and technology you don’t know. Sometimes you will have to lead issues where you are not versed in all the technology under the covers. You will need to work with your team to bring resources together to fill the gaps.
- Participate in industry groups to gain visibility into trends and influence future direction
Required Skills:
- Subject matter expertise in a wide range of infrastructure-related domains, with a track record of large production-grade service deployment and IT operations in a 24/7 setting
- Ability to take technical and/or business requirements and translate them into detailed infrastructure solution designs
- Expert knowledge of container solutions and their management (Kubernetes, Docker, OpenShift)
- Expert knowledge of Infrastructure as Code frameworks such as Puppet, Chef, Ansible, Terraform, ArgoCD, and Flux
- Knowledge of one or more Layer 7 Traffic Management Applications such as F5, Pulse Secure vATM, AVI, Envoy, or Nginx (Plus)
- Demonstrated programming/scripting skills in, or the ability to read and modify, Bash, Python, Ruby, C, or Golang.
- Excellent communication, presentation and leadership skills
Competencies:
SETTING GOALS – Creates and follows effective plans. Anticipates risks, creates contingency plans. Aligns plans with goals. Allocates adequate resources. Accepts and supports change. Willing to take risks and suggests new ideas, approaches. Takes initiative. Seeks out learning activities.
WORKING WITH OTHERS – Clearly articulates own and others’ goals. Promotes a team atmosphere by demonstrating humility and respect. Builds effective relationships, relates well to others. Delivers and responds to feedback in a constructive manner. Considers multiple perspectives. Handles conflict, pressure, uncertainty and adapts independently. Meets commitments. Dedicated to working with business partners on their expectations.
GETTING RESULTS – Personally accountable for work performance targets and achieving results. Prioritizes well. Anticipates and handles obstacles effectively. Makes good, timely decisions. Can simplify and process complex problems. Understands underlying issues and addresses root causes. Meets deadlines, works until finished.
Qualifications:
Experience:
- 5-7+ years’ experience as part of large-scale engineering teams or commerce environments where downtime is not acceptable
- 3+ years’ experience supporting container runtimes and orchestration such as Docker, Docker Swarm, Kubernetes/K8s, Mesos, and Nomad IN PRODUCTION
- In-depth understanding of cloud native design patterns (Infrastructure as Code, Microservices)
- Experience with Content Delivery Networks and Related Offerings (Akamai, Cloudflare, Fastly)
- Strong aptitude for learning new technologies and understanding how, when, and where to best utilize them
- Experience with offerings for cloud (Azure, AWS, GCP) and on-prem (VMware, OpenShift, etc.) solutions
- Experience utilizing best-of-breed processes to improve day-to-day operations
- Experience with modern development tools such as Git, Jenkins, Azure DevOps, Jira, etc.
- Admin-level experience supporting and developing Linux/Unix-based environments
- Admin-level experience in infrastructure and network (DNS, DHCP, IPAM, NTP, LB, etc.)
Preferred Qualifications:
- Experience in Retail preferred, but not required
Education:
- Bachelor’s degree in relevant field or equivalent work experience.
Nearest Major Market: Columbus