Incident Response Analyst II

Singapore, Singapore
Posted 2 days ago
Full-time, hybrid

Job Description

Knowledge, Skills & Abilities:

Incident & Problem Management

Analysts are responsible for the full lifecycle of incident management, from detection through to resolution and root cause analysis (RCA). This includes acting as incident commanders, maintaining SLAs, documenting actions, and providing insights to support continuous improvement efforts across teams and systems.

●      Investigate, report, and respond to alerts, and participate in incident response (war rooms, remote bridges).

●      Respond to incidents and critical situations in a calm, problem-solving manner, and conduct in-depth investigation of alerts.

●      Be the first line of defense using monitoring and automation tools to conduct investigation, classification, and triage, all within prescribed SLAs.

●      Provide resolver groups with a clear understanding of incident criticality and impact.

●      Maintain detailed records of alarm handling activities in ticketing tools, including actions taken and resolutions; file incident reports.

●      Act as incident commander during major incidents.

●      Understand internal/external communication methods and stakeholder responsibilities.

●      Support program managers and facilitate project deliverables, improving operational and engineering initiatives.

●      Conduct root cause analysis (RCA) to identify recurring problems.

●      Use in-depth questioning and analysis to determine the underlying cause of incidents or problems (Who, What, Where, When, Why).

●      Perform duties in compliance with SOPs, MOPs, Runbooks, and Playbooks.
 

Server, DCIM, Network and Traffic Alarms Operations

This function involves real-time monitoring of infrastructure alarms, determining the severity of alerts, escalating appropriately, and maintaining clear communications with resolver teams. It ensures uptime and system integrity across servers, network infrastructure, and environmental systems.

●      Continuously monitor alarm dashboards and systems.

●      Investigate and respond to alarms related to Network and Server Health.

●      Identify and acknowledge incidents associated with alarms.

●      Assess incidents to determine their criticality and operational impact.

●      Engage resolver groups and escalate to higher tiers or management following established paths.

●      Maintain communication with teams, stakeholders, and incident responders.

●      Follow documented procedures to resolve incidents promptly and effectively.

●      Maintain accurate records of alarm handling and resolution activities in ticketing tools.

●      Comply with SOPs, MOPs, Runbooks, and Playbooks.
 

Threat Intelligence, Critical Event Management

Analysts monitor global threat feeds and operational alerts to protect ByteDance personnel and assets. Responsibilities include triaging alerts related to weather, security, travel, and regional instability, then coordinating appropriate response actions, escalating to law enforcement if necessary, and compiling response reports.

●      Monitor Everbridge Visual Command Center (VCC), International SOS emails, and open-source tools for real-time incidents affecting ByteDance assets and travelers.

●      Monitor tools or queries for specific stakeholder requests.

●      Report on violence, severe weather, or threats to life, property, and assets.

●      Coordinate emergency responses, including with law enforcement if required.

●      Verify incident information accuracy through secondary sources.

●      Generate heatmaps to highlight affected areas during significant events.

●      Collaborate with security and operational teams for a coordinated response.

●      Implement incident containment and mitigation strategies.

●      Document incident details, response actions, and lessons learned.

●      Follow SOPs, MOPs, Runbooks, and Playbooks.
 

Physical Security and Safety

The analyst monitors access control systems, CCTV, and safety-related alarms (e.g., fire, electrical, leaks). Responsibilities include reviewing footage, responding to security anomalies, and reporting incidents to security engineering teams while ensuring compliance with safety procedures.

●      Monitor Closed-Circuit Television (CCTV) and Access Control Systems (ACS).

●      Review camera footage for quality and area coverage.

●      Investigate and report access control incidents.

●      Report findings to the Security and Safety Engineering teams.

●      Follow SOPs, MOPs, Runbooks, and Playbooks.

●      Familiarity with Lenel and Genetec systems.
 

Qualifications

Required Qualifications / Soft Skills

●      2+ years of experience in a NOC, command center, or similar 24/7 operations environment

●      Ability to quickly triage and prioritize multiple incidents based on risk

●      Knowledge of systems including IP Networks, DC Environment, and Server Health

●      Strong written and verbal communication skills

●      Works well under pressure and within deadlines

●      Excellent communication and collaboration abilities

●      Strong analytical and problem-solving skills

●      Ability to work independently and as part of a team

●      Familiarity with data protection laws such as GDPR

●      This is an on-site role at client facilities

●      Must be willing to work variable shifts, including nights, weekends, and holidays
 

Preferred Qualifications

●      Degree in Information Technology

●      Networking knowledge (IP, DNS, load balancing)

●      Experience with Grafana, ticketing systems, and DC infrastructure

●      Certifications such as CompTIA Server+

●      Experience with Lenel, Genetec, or Avigilon systems is a plus

●      Proficiency with programming/scripting tools

Incident Response Analyst II at Astreya | Renata