Blackpoint Cyber is the leading provider of world-class cybersecurity threat hunting, detection, and remediation technology. Founded by former National Security Agency (NSA) cyber operations experts who applied their learnings to bring national security-grade technology solutions to commercial customers around the world, Blackpoint Cyber is in hyper-growth mode, fueled by a recent $190M Series C round.
Why Blackpoint?
Ready to give some hackers hell? On the Blackpoint Cyber Team, we win the unfair fight while helping others protect what’s most important to them. Simply put, our team takes out the adversaries before they see us coming. Join us today and help put the bad guys in their place for good.
Blackpoint Cyber was built by former US Department of Defense and Intelligence security experts focused on stopping malicious tradecraft and safeguarding MSP operations. Our mission? Provide absolute and unified Managed Detection and Response services to organizations across the world.
Company Culture
On this team, we value high-quality execution, ownership, and strong morals. With us, principles are never compromised, and we are proud to always do right by our customers. If you’re a driven professional with a passion for learning and for doing your best work, Blackpoint welcomes you. Our team is energetic and collaborative, maintaining a high-performance culture and growing by overcoming the challenges of the modern cyberthreat landscape.
What You'll Do:
We are seeking a Senior Site Reliability Engineer to help manage our world-class cybersecurity systems and platforms. You will be directly responsible for ensuring that Blackpoint’s Security Analysts can access the mission-critical systems they rely on to stop cyberattacks in real time. You’ll also help identify emerging system and performance issues and collaborate with platform and software engineers to improve system efficiency, reliability, and scalability.
Responsibilities:
Implement effective monitoring with Datadog, CloudWatch, Grafana, and Prometheus to proactively track system performance and identify potential issues before they become critical.
Reactively troubleshoot system and platform issues, providing real-time updates, data, and guidance to platform and software engineers.
Help resolve immediate system errors to ensure the smooth operation of engineering tools and systems.
Create runbooks and follow incident response processes to ensure consistency and reliability in resolving incidents.
Participate in postmortem analysis and work to implement corrective actions to prevent similar incidents from occurring in the future.
Create and maintain documentation for tools, technologies, and workflows used in engineering operations.
Define and track key reliability metrics for the systems we manage, including Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets, Mean Time to Detect (MTTD), Mean Time to Repair (MTTR), and Change Failure Rate (CFR), and use them to inform capacity planning and postmortem analysis.
Collaborate with development teams to identify areas for improvement and optimize workflows.
Requirements:
3+ years of experience working with monitoring and alerting platforms such as Datadog, AWS CloudWatch, Grafana, New Relic, or Prometheus.
3+ years of experience with incident management and notification platforms such as PagerDuty, OpsGenie, Datadog, or ServiceNow.
Willingness to be on call to respond to incidents and provide support outside of business hours.
Excellent problem-solving and analytical skills.
Strong communication and interpersonal skills, with the ability to effectively collaborate with cross-functional teams.
Nice to Haves:
Experience using Terraform to create and manage alerts, dashboards, and incident systems
Experience using Prometheus Query Language (PromQL) to analyze and display metrics
Experience maintaining and optimizing Elasticsearch and Kibana instances
Direct experience with Prometheus and OpsGenie
Familiarity with Kubernetes clusters and kubectl commands
Familiarity with monitoring and troubleshooting message bus frameworks
Experience monitoring and analyzing database performance metrics
Blackpoint Cyber welcomes and encourages applications from qualified individuals of all races, colors, religions, sex, sexual orientation, gender identity or expression, national origin, age, marital status, or any other legally protected status. We are committed to equality of opportunity in all aspects of employment.
We thank everyone for their interest, but only those candidates selected for an interview will be contacted.