Title: Observability / Site Reliability Principal Engineer

Job ID:  16462
Location: 

ST Engineering Jurong East Bui, SG

Description: 

We are seeking a skilled Observability Principal Engineer with 2-3 years of experience in observability to join our dynamic team. In this role, you will implement, manage, and optimize observability tools. You will work closely with cross-functional teams to ensure that our systems are monitored effectively and that issues are identified and resolved proactively.

 

Key Responsibilities:

  • Design, implement, and maintain observability frameworks using tools such as Prometheus, Grafana, ELK Stack, Tableau, or similar.
  • Design, implement, and maintain monitoring tools such as BMC, CA, SolarWinds, SCOM, Dynatrace, Datadog, or similar.
  • Create and manage dashboards, visualizations, and reports to communicate system health and performance metrics.
  • Collaborate with the sales team to understand client requirements and demonstrate how our observability solutions can address their specific needs.
  • Prepare and deliver presentations, demos, and workshops to potential clients showcasing the capabilities and benefits of our observability tools.
  • Troubleshoot and resolve tools-related issues in a timely manner.
  • Assist in the training and mentoring of team members on observability and monitoring tools and practices.

 

Job Requirements:

  • 2-3 years of experience in software development, implementation, operations, or a related field, with a focus on observability tools.
  • Proficiency in implementing and managing observability tools.
  • Solid understanding of cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
  • Experience with scripting languages (Python, Bash, etc.) for automation tasks.
  • Knowledge of best practices in monitoring, logging, and incident management.
  • Strong analytical skills with the ability to diagnose issues and propose effective solutions.
  • Excellent communication and collaboration skills, with a proactive approach to problem-solving.
  • Technical experience with enterprise monitoring tools such as Dynatrace, Grafana, or BMC.
  • Knowledge of automation tools, cloud technologies, DevOps concepts, open systems, and networking technologies.
  • Good knowledge of various monitoring tools (e.g., BMC, SolarWinds, CloudWatch, and Azure).
  • Experience with configuration management tools (Ansible, Terraform, etc.).
  • Familiarity with APM (Application Performance Management) tools such as New Relic, Dynatrace, or similar.
  • Understanding of network protocols and architectures.
  • Experience with orchestration tools (e.g., BMC, Kubernetes, Apache Airflow, Jenkins) to create and manage automated workflows for deploying, monitoring, and scaling observability solutions.

Preferred Qualifications:

  • Proficiency in observability tools (e.g., Grafana, ELK Stack, Datadog, Prometheus, etc.).
  • Proficiency in ITOM tools (e.g., BMC, Dynatrace, CA, SCOM, IBM, SolarWinds, etc.).
  • Strong understanding of monitoring and logging frameworks.
  • Experience with distributed systems and microservices architecture.
  • Ability to write scripts for automation and data analysis.
  • Experience with cloud platforms (AWS, Azure, GCP) and their monitoring services.
  • Experience with CI/CD pipelines and infrastructure as code (IaC) tools like Terraform or Ansible.
  • Relevant certifications in cloud computing, DevOps, or observability tools are a plus.

 

Work location: Jurong East