High Availability Policy

The RosettaHealth High Availability Policy establishes policies and procedures designed to ensure continuous availability of HealthBus to customers. This policy is maintained by the RosettaHealth Security Officer and CTO.

This RosettaHealth High Availability Policy has been developed as required under the Office of Management and Budget (OMB) Circular A-130, Management of Federal Information Resources, Appendix III, November 2000, and the Health Insurance Portability and Accountability Act (HIPAA) Final Security Rule, §164.308(a)(7), which requires the establishment and implementation of procedures for responding to events that damage systems containing electronic protected health information.

Applicable Standards

Applicable Standards from the HITRUST Common Security Framework

  • 12.c - Developing and Implementing Continuity Plans Including Information Security

Applicable Standards from the HIPAA Security Rule

  • 164.308(a)(7)(i) - Contingency Plan

Architecture-Based Approach

The architecture of HealthBus is based on the concept of high availability (HA). High availability is defined as providing a solution that is resilient to unexpected surges in demand as well as unexpected degradation of capability. Six main mechanisms provide this HA capability:

  1. Two geographically separated data centers

  2. Replication of all platform components (servers and systems) across both data centers

  3. Rapidly scalable capacity

  4. Multi-level load balancers that route traffic not only between the data centers but also between the platform components within each data center

  5. Use of multi-AZ serverless components

  6. 24/7 monitoring of RosettaHealth components

Geographically Separated Data Centers

HealthBus is hosted in Amazon Web Services at 2 distinct locations, or Availability Zones (AZs), in the Northern Virginia Region. "Each Availability Zone is designed as an independent failure zone. This means that Availability Zones are physically separated within a typical metropolitan region and are located in lower risk flood plains (specific flood zone categorization varies by AWS Region). In addition to discrete uninterruptible power supply (UPS) and onsite backup generation facilities, they are each fed via different grids from independent utilities to further reduce single points of failure. Availability Zones are all redundantly connected to multiple tier-1 transit providers." (Source: https://docs.aws.amazon.com/whitepapers/latest/aws-overview/global-infrastructure.html)

Replication of Platform Components

HealthBus is a system of systems comprised of multiple components that fall into one of three categories.

  • HealthBus components developed and/or maintained by RosettaHealth

  • AWS Infrastructure components managed by ClearDATA

  • AWS cloud services managed by ClearDATA

For HealthBus components developed and/or maintained by RosettaHealth, each component is duplicated in each AZ. Those components rely on AWS infrastructure components (e.g., EC2, EBS, …) that are managed by ClearDATA. The RosettaHealth technical team coordinates with ClearDATA to ensure that each of these infrastructure components is available to support HealthBus components. Additionally, AWS cloud services (e.g., Lambda, RDS, S3, …) are utilized. All of these services are redundant across multiple AZs.

Rapidly Scalable Capacity

The use of 2 AZs provides sufficient capacity for normal RosettaHealth operations. In addition, if demand on components suddenly increases, due either to customer usage or to degradation in one AZ, individual platform components can be scaled, either manually or automatically, to handle the increase. Therefore, almost all operations can continue in a single AZ for at least a limited amount of time if needed.
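As an illustration of the single-AZ reasoning above, the following Python sketch estimates how many additional instances of a component the surviving AZ would need in order to absorb the full load. It is a minimal sketch under stated assumptions, not a description of how RosettaHealth or ClearDATA actually size capacity: the function names, the request-rate unit, and the 75% target utilization are all hypothetical.

```python
import math


def instances_needed(demand_rps: float, per_instance_rps: float,
                     target_utilization: float) -> int:
    """Instances required to serve the demand at or below the target utilization."""
    if per_instance_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("per_instance_rps must be > 0 and utilization in (0, 1]")
    return math.ceil(demand_rps / (per_instance_rps * target_utilization))


def single_az_shortfall(demand_rps: float, per_instance_rps: float,
                        running_in_surviving_az: int,
                        target_utilization: float = 0.75) -> int:
    """Extra instances the surviving AZ must start to carry the whole demand.

    All inputs are hypothetical illustration values, not measured figures.
    """
    needed = instances_needed(demand_rps, per_instance_rps, target_utilization)
    return max(0, needed - running_in_surviving_az)


# Hypothetical example: 900 requests/s in total, 200 requests/s per instance,
# and 3 instances already running in the surviving AZ.
extra = single_az_shortfall(900, 200, running_in_surviving_az=3)
print(f"additional instances to start: {extra}")
```

A result of zero would mean the surviving AZ already has enough headroom; a positive result is the amount of manual or automatic scaling required.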

Multi-Level Load Balancers

Supporting the replication of HealthBus components across two AZs are 2 levels of load balancers. Level 1 consists of AWS Network Load Balancers (NLBs), which balance traffic between the two AZs. All inbound traffic is passed to one of the two NLBs. Each NLB then passes the traffic to the Level 2 load balancers in a round-robin pattern. If an NLB detects that either of the Level 2 load balancers is unavailable, it re-routes traffic to the available load balancer. The Level 2 load balancers are based on HAProxy running on 2 EC2 instances, each in a different AZ. Each Level 2 load balancer routes traffic to the appropriate HealthBus component, also using a round-robin strategy to distribute traffic to components across both AZs.
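The round-robin behavior with failover described above can be sketched in Python as follows. This is an illustrative model only; it is not the NLB or HAProxy implementation, and the class name and the backend labels ("haproxy-az-a", "haproxy-az-b") are hypothetical.

```python
class RoundRobinBalancer:
    """Rotate through backends in order, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._index = 0  # position of the next backend to try

    def mark_down(self, backend):
        """Record a failed health check for a backend."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Return a known backend to rotation after it recovers."""
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        """Return the next healthy backend in round-robin order."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._index]
            self._index = (self._index + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")


# Hypothetical Level 2 load balancers, one per AZ.
lb = RoundRobinBalancer(["haproxy-az-a", "haproxy-az-b"])
normal = [lb.next_backend() for _ in range(4)]    # alternates between AZs
lb.mark_down("haproxy-az-b")                      # simulate a failed health check
degraded = [lb.next_backend() for _ in range(3)]  # all traffic goes to AZ a
```

The same pattern applies at both levels: Level 1 rotates across the Level 2 load balancers, and each Level 2 load balancer rotates across the component instances in both AZs.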

24/7 Monitoring

Overseeing this HA architecture is a set of policies and procedures described in the RosettaHealth Auditing Policy concerning Monitoring and Alerting. These support the HA approach by providing continuous monitoring of all of the components. Whenever a component experiences an issue that may impact its operations, an alert is triggered and the appropriate support personnel are notified. Issues can include things such as:

  • Infrastructure capacity issues (e.g., CPU load, storage, memory, ...)

  • Unexpected traffic volumes (either higher or lower than expected)

  • Unexpected error conditions or volumes

Support personnel can triage the issue and then take appropriate action to restore the components or take other remediating actions. This provides HealthBus with a real-time Disaster Recovery capability at both the data center level and the individual component level.
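The kinds of alert conditions listed above (capacity, traffic volumes higher or lower than expected, and error volumes) can be expressed as simple threshold checks. The Python sketch below is one possible way to model them and does not describe the monitoring tooling RosettaHealth actually uses; the metric names, threshold values, and the choice to alert on a missing metric are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    """Acceptable range for one metric; either bound may be omitted."""
    metric: str
    low: Optional[float] = None
    high: Optional[float] = None


def evaluate(sample: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return one alert message per metric that is missing or out of range."""
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            # Assumption: a component that stops reporting is itself alert-worthy.
            alerts.append(f"{t.metric} reported no data")
        elif t.high is not None and value > t.high:
            alerts.append(f"{t.metric} above {t.high}: {value}")
        elif t.low is not None and value < t.low:
            alerts.append(f"{t.metric} below {t.low}: {value}")
    return alerts


# Hypothetical thresholds and a hypothetical sample from one component.
thresholds = [
    Threshold("cpu_percent", high=85.0),
    Threshold("requests_per_min", low=50.0, high=5000.0),
    Threshold("errors_per_min", high=10.0),
]
sample = {"cpu_percent": 92.0, "requests_per_min": 12.0, "errors_per_min": 0.0}
alerts = evaluate(sample, thresholds)
for message in alerts:
    print(message)
```

In this sample, the CPU reading and the unusually low traffic volume would each raise an alert, while the error rate would not.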

Responsibilities

The RosettaHealth Tech Team is responsible for working with ClearDATA to set up the HA capabilities of the RosettaHealth production environment in AWS, including AWS services, network services, and all EC2 servers. The RosettaHealth Tech Team is directly responsible for ensuring that all RosettaHealth Platform components are working.

Testing and Maintenance

The HA capability is routinely tested as part of normal maintenance operations. During these operations, one component in one data center is taken offline and traffic is automatically rerouted to the healthy system in the other data center. Once the maintenance operation on that component is complete, the component is brought back up and added back to the load balancer(s).