Dell Jobs

Job Information

Dell Site Reliability Engineer, Observability in Bedford, Massachusetts

Job Posting Title: Site Reliability Engineer, Observability

RSA creates a wide range of industry-leading products that allow customers to take control of risk, whether that risk stems from external cyber threats, identity and access management challenges, online fraud, compliance pressure or any number of other business and technology issues.

Our customers expect our services to meet all availability and performance SLAs. We are building out expertise in Site Reliability Engineering and expanding our use of DevOps methodologies. As a new role for the global 24/7 SaaS Operations group, this is an exciting opportunity for a seasoned engineer to have a positive impact across all teams and services.

You will work closely with Engineering Architecture, Development, Infrastructure, DBA, Application Support, Security Operations and our NOC. You will ensure that tools provide the required visibility into environments for efficient, effective support, root cause analysis and predictive analytics.

You will be expected to understand operational issues across the full stack. You will also need to understand how to create common processes and systems that cover heterogeneous environments, both in the cloud and in traditional datacenters.

PRINCIPAL DUTIES AND RESPONSIBILITIES

  • Research, evaluate, develop, maintain and support the observability tool suite across cloud and data center environments

  • Partner with development teams to ensure applications are instrumented to provide visibility of performance metrics

  • Develop automations and integrations for deployment of monitoring tools

  • Develop and maintain external synthetic monitoring and real user monitoring (RUM)

  • Improve root cause identification speed and efficiency

  • Work cross-functionally to define KPIs used to measure operational efficiency, capacity and availability of environments

  • Generate internal and customer-facing dashboards and reports required by engineering and product support teams

  • Support activities that ensure that monitoring infrastructure meets all security and compliance requirements

KNOWLEDGE & SKILLS

  • Experience integrating cloud monitoring for AWS/Azure (Flow Logs, CloudTrail, CloudWatch, GuardDuty, etc.)

  • Experience with Datadog/Dynatrace (or equivalent) for root cause analysis of performance, capacity, reliability, and scalability issues

  • Experience with additional open source monitoring tools preferred (Grafana, Prometheus, ELK, Hobbit, etc.)

  • Experience with web servers and application stacks (Tomcat, JBoss, Nginx, Apache, .NET)

  • Scripting/coding skills (e.g., Ruby, Python, Java)

  • Experience with RUM and external performance monitoring dashboards (Pingdom)

  • Working knowledge of code pipeline tools advantageous

  • Working knowledge of Linux, Windows, virtualization stacks, databases, storage and networking devices

  • Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures

  • Experience with infrastructure monitoring (SolarWinds preferred)

  • Problem-solving skills and the ability to work in a fast-paced, customer-facing, 24/7 production environment

  • Proven project management skills and technical leadership

  • Excellent written and verbal communication and documentation skills

  • Strong work ethic and the ability to work within a global team

EXPERIENCE

  • 3+ years’ experience with monitoring applications and infrastructure stacks

  • Experience with AWS/Azure cloud and traditional datacenters required

  • Hands-on experience troubleshooting and tuning preferred

  • 10+ years and a BS in CS, IT, or a related field, or equivalent work experience
