Job Description
Platform Engineer
Remote (UK-based) | Full-time | Salary: £68,671 + benefits
Do your best work, for the right reasons.
Oak is a fully remote, mission-driven organisation offering high levels of flexibility, autonomy, and purpose. We’re a national not-for-profit working in partnership with teachers to create the highest-quality, sequenced curriculum and lesson resources for pupils across all subjects and age groups.
About The Role
In this role, you will work with engineering, product and research colleagues to build confidence through observability principles that aid our understanding of our users and help us continually improve our products. We work together in product squads alongside designers, researchers and education experts, regularly releasing new features and improvements to give teachers and their pupils quick and easy access to the highest quality learning resources.
As a young organisation, we have been able to leverage the latest technologies to rapidly build and deliver our game-changing products. Now that we’ve proven ourselves and are established, we want to mature our processes to ensure we are getting the best out of the technology and remain able to respond quickly to business needs. We see this role as being a key part of that change.
You will be tasked with raising our monitoring and observability to a high standard across all our key applications, while working closely with engineering teams to help them improve the stability of their applications and give engineers a greater sense of ownership.
You will also champion site reliability engineering principles and be a key driver of automation, working alongside other members of the platform team to improve the overall developer experience.
Candidates must have a good understanding of SRE principles and the value they bring to an organisation. While a good grounding in development practices, security fundamentals and infrastructure operation is key, specific technical skills are less important than a passion for automation, an ability to understand complex systems and a keenness to learn.
Responsibilities
- Lead the continuous improvement of the observability, performance, and reliability of our web applications (Next.js, JavaScript, TypeScript, Node) and serverless functions (Google Cloud Functions, Cloudflare), deployed on PaaS infrastructure (Vercel, Cloudflare).
- Promote and nurture a culture of quality across the product and engineering department, enabling teams to use SLOs and SLAs to maintain a high quality of service delivery.
- Take ownership of our observability, monitoring, logging and reporting solutions to ensure they are easy to use and provide development teams with the information they need to understand service quality, resolve problems quickly, and get meaningful insights into application behaviour.
- Identify and implement ways in which automation can be used to speed up development, secure systems or improve the quality of the services we provide.
- As a member of the Oak Team, you will contribute to the wider success and culture of the organisation and support and role model our five values: create the right environment, be a great colleague, own your role but work for the team, make things happen, and keep getting better.
- Work in cross-functional and product-oriented squads with colleagues from across the organisation, as required. Oak has a strong focus on collaboration and mentoring.
- Deputise for other members of the Platform team and take on other general responsibilities as required.
Requirements
- The ideal candidate would have strong professional experience leading the continuous improvement of event-driven architectures using Serverless technologies such as Google Cloud Run, AWS Lambda or Azure Serverless.
- Considerable experience in designing and implementing monitoring, observability and reporting solutions for complex cloud infrastructures within a major cloud provider (GCP, AWS, Azure). In production we’re using Datadog as our main monitoring platform.
- Confident in understanding and maintaining web application code and able to design and build small apps, preferably using JavaScript/TypeScript.
- Experience working with cloud computing platforms and familiarity with Infrastructure as Code tools. We’ve chosen Terraform as our Infrastructure as Code tool.
- Comfortable promoting and leading a spirit of collaboration with a range of technical and non-technical stakeholders.
The successful candidate will have a desire to contribute in all areas to ensure Oak is successful. You will be comfortable working at pace, with a range of digital systems (including proprietary ones as required) and you will continuously look at ways that the team can keep getting better. You will be excellent at working as part of a remote team, building relationships and managing your time effectively.
Benefits
- 25 days annual leave, plus one extra day for each year of service (up to 28)
- Additional Oak closure days over Christmas/New Year
- 11% employer pension contribution (with no minimum employee contribution)
- A 36-hour working week, with half-days on Fridays or every other Friday off
- Fully remote working — we’ll support your home set-up and offer coworking options if preferred
- Twice-yearly in-person offsites to collaborate, connect, and have fun
- A culture that genuinely supports flexibility, autonomy, and trust