Incident Response Plan


1.0 Introduction

This document outlines pivot’s plan for reporting and dealing with security incidents.

  • Incident means any unplanned event that disrupts our service or platform and affects customers’ ability to use it.
  • Major incident means an incident that immediately blocks critical functionality, prevents usage of our platform, or puts our customers’ data at risk.
  • All incidents must be managed in an efficient and timely manner to ensure that the impact of an incident is controlled and its consequences are limited.
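
As an illustration only, the distinction between an incident and a major incident could be encoded as a small triage helper. This is a sketch, not part of pivot's tooling: the `IncidentReport` fields and the `classify` function are hypothetical names chosen to mirror the three criteria in the definition above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INCIDENT = "incident"
    MAJOR_INCIDENT = "major incident"


@dataclass
class IncidentReport:
    # Hypothetical flags, one per "major incident" criterion in the definition.
    blocks_critical_functionality: bool = False
    prevents_platform_usage: bool = False
    puts_customer_data_at_risk: bool = False


def classify(report: IncidentReport) -> Severity:
    """Return MAJOR_INCIDENT if any one of the three criteria applies."""
    if (
        report.blocks_critical_functionality
        or report.prevents_platform_usage
        or report.puts_customer_data_at_risk
    ):
        return Severity.MAJOR_INCIDENT
    return Severity.INCIDENT
```

Any single criterion is enough to escalate, which matches the "or" in the definition.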

2.0 Response Team

24/7 Reporting

Team

[Figure: Incident Response Plan - Role and Responsibility]

3.0 Incident Response Plan Steps

3.1 Log the incident

  • Any issue found by a member of the team, or reported externally by a client or partner, should immediately be logged and brought to the attention of the Product Manager.
  • The team will investigate the issue and begin the discovery process.

3.2 Identify the incident in 5 W’s:

  • When did it start? How long has it been from the occurrence of the incident to the team’s realization of the incident? Is the incident ongoing or has it passed?
  • Where is it occurring? If it’s affecting multiple areas, what is in common between these areas?
  • Who is it affecting? Who is responsible for the incident?
  • What is happening? Break it down into as detailed steps as possible.
  • Why is it happening? What are the possible reasons that can cause these steps to occur?

3.3 Notify

  • Immediately notify any impacted end users or stakeholders of the incident and what is affected.
  • Inform them of the current state of the platform, what is and is not usable, and that the resolution process is underway.

3.4 Investigate

  • Once the incident has been discussed and broken down, assign the necessary SMEs (Subject Matter Experts) to begin investigating each part and to list any possibilities that could have caused the incident (take ~ 15-30 minutes to investigate).
  • Get back together and discuss the results of the investigation:
    • Determine the most likely scenarios and prioritize them by likelihood.
    • After that first prioritization, determine the risk of each scenario (physical, client relationship, company reputation, monetary) and adjust the prioritized list according to their respective risks.
    • Begin brainstorming how each scenario can be tackled, starting with the top 3-5 highest priorities (how many to brainstorm depends on the number of possibilities: the more there are, the more should be thought through).
    • Identify the team members responsible for acting on each scenario and assign them tasks. Sometimes, tasks may be specific and only certain team members will have the skillset suited to attacking them; in this case, other team members should make themselves an available resource and assist in any way possible.
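
The two-pass prioritization above (likelihood first, then adjusted for risk) could be sketched as follows. The plan does not say how the adjustment is made, so the scoring rule here (likelihood multiplied by the worst risk rating) is an assumption. The `Scenario` class, the 1-5 rating scale, and the `shortlist` function are also hypothetical.

```python
from dataclasses import dataclass, field

# The four risk categories named in the plan.
RISK_CATEGORIES = ("physical", "client_relationship", "company_reputation", "monetary")


@dataclass
class Scenario:
    name: str
    likelihood: float  # team's estimate, 0.0 to 1.0
    risks: dict = field(default_factory=dict)  # category -> 1 (low) to 5 (high)

    def priority(self) -> float:
        # Assumption: weight likelihood by the worst-rated risk category;
        # categories that were not rated default to 1 (low).
        worst = max(self.risks.get(category, 1) for category in RISK_CATEGORIES)
        return self.likelihood * worst


def shortlist(scenarios, limit=5):
    """Rank scenarios by risk-adjusted priority and keep the top `limit`."""
    ranked = sorted(scenarios, key=lambda s: s.priority(), reverse=True)
    return ranked[:limit]
```

With this rule, a less likely scenario that threatens company reputation can outrank a more likely but low-risk one, which is the effect the adjustment step is meant to have.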

3.5 Resolve

After the roles and tasks have been assigned, separate and begin working on testing each possibility to see if it is the reason that the incident occurred.

  • If something is found to be a potential reason, immediately report it to the coordinator and re-test with different inputs to confirm whether this is the case.
  • If the reason (or a potential one) is found, the team will reconvene, and the necessary team members will assist the primary SME who found the issue in determining the root cause and the resolution steps.
    • Once a solution is found, run multiple tests to ensure that the fix has resolved the issue by making different attempts to “break” the solution (i.e. ensure that the solution has no weaknesses).
    • If the initial reason isn’t the answer, go back to the list of prioritized scenarios and work down the list to identify the reason.
  • Implement the solution immediately (or as soon as possible).

3.6 Inform

  • Inform any impacted end users or stakeholders once operations have become stabilized.
  • Update them on the current state of the platform, the reason behind the incident, and the measures that have been implemented to resolve it and prevent future occurrences.
  • Schedule any update meetings that are requested or required.

3.7 Review

  • After the situation has stabilized and stakeholders are informed, schedule a team meeting within the next 2 days to review the incident and the team’s response. Discuss steps that can be taken to improve the response, why the original incident happened, and any vulnerabilities that exist from the incident or in relation to it.
  • Any necessary updates will be made to:
    • Infrastructure
    • Incident response plan
    • System security protocols

*** If the solution to the incident is temporary, then once the procedure is complete the team will create a plan of action to implement preventative procedures and improvements as soon as possible. The priority is to resolve the issue immediately and restore functionality and security.

4.0 Prevention

Various measures are already in place to prevent the occurrence of an incident, or to immediately identify when one occurs. These include, but are not limited to:

  • Processor log track: A tracking dashboard that detects when a processor has stopped sending log files. If a processor does not send a log file for 10 minutes, an alert is created and emails are sent to the appropriate pivot team members, with reminder pings every hour until the processor sends a file again.
  • Restrict log file sizes to ~ 60kB per file
    • To prevent any potential backlog from processor delays, the size of log files is limited so that each file is sent immediately after it reaches 500 lines (approximately 60kB).
    • This also provides a defense against flooding attacks on our AWS functions, where malicious users may continually send massive files of fake data both to flood the cloud server capacity and to compromise the integrity of the data displayed on the platform.
  • Only retry sending a log file once if the first attempt fails. If the log file fails to send twice, discard it, continue writing the next one, and continue sending as planned.
    • This prevents a backlog of files that can fill the disk space of a processor.
  • AWS: All the services used in AWS are set to the highest levels of encryption and security, and require combinations of keys as well as access level to obtain/change data or functions.
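
The logging measures above can be sketched in a few lines. This is a minimal illustration, not the production implementation: the 10-minute threshold, hourly reminder, 500-line limit, and single retry come from the list above, while `alert_due`, `LogBuffer`, and the `send` callable (assumed to return True on success) are hypothetical names.

```python
from datetime import datetime, timedelta

SILENCE_THRESHOLD = timedelta(minutes=10)  # alert after 10 minutes without a log file
REMINDER_INTERVAL = timedelta(hours=1)     # reminder ping every hour
MAX_LINES_PER_FILE = 500                   # roughly 60kB per file
MAX_ATTEMPTS = 2                           # first attempt plus one retry


def alert_due(last_log_at, last_alert_at, now):
    """Return True when an alert or reminder should go out for a processor."""
    if now - last_log_at < SILENCE_THRESHOLD:
        return False  # the processor is still reporting
    if last_alert_at is None or last_alert_at < last_log_at:
        return True   # first alert for this period of silence
    return now - last_alert_at >= REMINDER_INTERVAL


class LogBuffer:
    """Collects log lines and sends them as one file at the line limit."""

    def __init__(self, send):
        self._send = send  # hypothetical callable: list of lines -> bool
        self._lines = []
        self.dropped_files = 0

    def write(self, line):
        self._lines.append(line)
        if len(self._lines) >= MAX_LINES_PER_FILE:
            self._flush()

    def _flush(self):
        # Start a fresh buffer first so writing continues regardless of outcome.
        batch, self._lines = self._lines, []
        for _ in range(MAX_ATTEMPTS):
            if self._send(batch):
                return
        # Both attempts failed: discard the file so it cannot pile up on disk.
        self.dropped_files += 1
```

Counting discarded files, rather than silently dropping them, is a design choice added here so that data loss stays visible; the plan itself only specifies that the file is discarded.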