Site Reliability Engineer
At Pearson, we’re committed to a world that’s always learning and to our talented team who makes it all possible. From bringing lectures vividly to life to turning textbooks into laptop lessons, we are always re-examining the way people learn best, whether it’s one child in our own backyard or an education community across the globe. We are bold thinkers and standout innovators who motivate each other to explore new frontiers in an environment that supports and inspires us to always be better. By pushing the boundaries of technology — and each other to surpass these boundaries — we create seeds of learning that become the catalyst for the world’s innovations, personal and global, large and small.
The Site Reliability Engineer will be a key member of a small team focused on ensuring that our critical services are ready and battle-tested. This role requires a generalist who can contribute to development, system operations, resiliency testing, security hardening, and performance engineering. The SRE should be comfortable taking on new engineering challenges, defining potential solutions, and implementing designs in a team environment. This position will play an important role in our organization’s evolution towards contemporary application and infrastructure management practices and will be expected to both guide and support the team’s growth and learning.
- Provide technical leadership to a growing team focused on applying software engineering practices to operations at scale.
- Monitor and report on service level objectives for a given application's services. Work with business and product owners to establish key performance indicators.
- Participate in conducting technical training events, game day scenarios, and focused engineering spikes.
- Design and architect operational solutions for managing applications and infrastructure, with the specific goal of increasing the automation, repeatability, and consistency of operational tasks.
- Create and maintain monitoring technologies and processes that improve visibility into our applications' performance and business metrics and keep operational workload reasonable.
- Sponsor healthy software development practices – including complying with the chosen software development methodology (Agile, or alternatives), building standards for code reviews, work packaging, etc.
- Persistently test application and infrastructure resiliency under a variety of error conditions.
- Partner with security engineers to develop plans and automation that respond aggressively and safely to new risks and vulnerabilities.
- Collaborate with internal teams to ensure that operational development solutions meet business requirements.
- Provide architectural and practical guidance to software development to improve resiliency, efficiency, performance, and costs.
- Develop, communicate, and monitor standard processes to promote the long-term health and sustainability of operational development tasks.
- Minimum of a Bachelor's Degree in Computer Science, Computer Engineering, Software Engineering, MIS, or other related discipline required. Master's Degree preferred.
- Minimum of 4 years of prior relevant software development experience required.
- Prior experience in architecting cloud-based solutions on AWS.
- Prior experience in managing EC2, EBS, and S3 on AWS.
- Proficiency in Ruby or Python desired.
- Familiarity with Chef/Puppet/Ansible or other configuration management tools desired.
- Familiarity with cloud computing concepts and infrastructure-as-a-service offerings (AWS or other public cloud providers) desired.
- Familiarity with container technologies, orchestration, and container deployment using Docker, rkt, ECS, Rancher, or Kubernetes.
- A strong understanding of diverse infrastructure platforms and infrastructure concepts required.
- A strong understanding of the SDLC and the Agile software development methodology required.
- Versatility, as demonstrated by troubleshooting diverse sets of hosting technologies (web server platforms, Java application platforms, operating systems, network components, virtualization technologies, database platforms), strongly desired.
- Understanding of general networking concepts and protocols desired.
- Experience in a production environment supporting mission-critical applications desired.
- Knowledge of standard production practices including change management desired.
Primary Location: US-TX-San Antonio
Other Locations: US-TX-Austin, US-IA-Iowa City, US-CO-Boulder
Work Locations: US-TX-San Antonio-19500 Bulverde, 19500 Bulverde Road, San Antonio 78259
Organization: Assessments School
Employee Status: Regular Employee
Job Type: Standard
Shift: Day Job
Job Posting: Oct 26, 2017
Job Unposting: Ongoing
Schedule: Full-time Regular
Req ID: 1717362