Cluster Health Check

Instaclustr's Cluster Health page exposes a number of indicators to help you understand your cluster's long-term performance. Each indicator can be in one of three states:

  • Green represents a healthy state
  • Amber represents a warning state
  • Red represents a failed state

Disk Usage Indicator

The Disk Usage indicator checks the percentage of disk space used on each node. If disk usage has been over 75%-80% in the last hour, the node is filling up and is very likely unable to provide enough working space for normal Cassandra operations. Please refer to Disk Usage for more details.

Suggested fix for non-healthy states:

  • Remove excess data from the cluster
  • Add more nodes to the cluster
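
As a rough illustration of the check described above, the following Python sketch measures how full a filesystem is and maps the result to an indicator state. The exact cut-offs are an assumption: the article only gives a 75%-80% range, so the sketch treats 75% as the warning level and 80% as the failure level. The function names are hypothetical and are not part of any Instaclustr tooling.

```python
import shutil

# Assumed thresholds: the article only states a 75%-80% range, so using
# 75% for amber and 80% for red is a guess, not Instaclustr's actual rule.
AMBER_PERCENT = 75.0
RED_PERCENT = 80.0


def disk_used_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify_disk_usage(percent):
    """Map a disk usage percentage to an indicator state (hypothetical helper)."""
    if percent > RED_PERCENT:
        return "red"
    if percent > AMBER_PERCENT:
        return "amber"
    return "green"
```

To try it on a node, pass the path of the Cassandra data directory (wherever that is on your system) to disk_used_percent and feed the result to classify_disk_usage.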

Partition Size Indicator

The Partition Size indicator checks the size of the largest partition in each table. We recommend limiting the maximum partition size to 10MB for optimal performance, with 100MB as an upper limit for ongoing stability. Large partitions may significantly impact the performance of Cassandra operations. Please refer to Partition Size for more details.

Suggested fix for non-healthy states:

  • Remove the problem partition
  • Re-assess the data model as data may not be evenly distributed or is bunched into too few partitions
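
The two limits above can be expressed as a small Python sketch. It assumes that sizes over 10MB map to a warning and sizes over 100MB map to a failure, which is an interpretation of the text rather than a documented rule, and it reads "MB" as 1024 * 1024 bytes. The input would be the largest partition size for a table in bytes, for example the "Compacted partition maximum bytes" figure reported by nodetool tablestats.

```python
# Limits taken from the article; reading "MB" as 1024 * 1024 bytes is an assumption.
MB = 1024 * 1024
RECOMMENDED_MAX_BYTES = 10 * MB   # recommended maximum for optimal performance
UPPER_LIMIT_BYTES = 100 * MB      # upper limit for ongoing stability


def classify_partition_size(max_partition_bytes):
    """Map a table's largest partition size to an indicator state.

    The green/amber/red mapping is a hypothetical reading of the article.
    """
    if max_partition_bytes > UPPER_LIMIT_BYTES:
        return "red"
    if max_partition_bytes > RECOMMENDED_MAX_BYTES:
        return "amber"
    return "green"
```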

Tombstones to Live Cells Indicator

The Tombstones to Live Cells indicator checks the average ratio of tombstones to live cells per read in each table. High ratios of tombstones to live cells (greater than 5x as a starting guide) can substantially reduce read performance on a table. Please refer to Tombstones and Live Cells for more details.

Suggested fix for non-healthy states:

  • Tune the compaction strategy to more aggressively remove tombstones
  • Re-assess the data model
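
The ratio described above is simple arithmetic, sketched here in Python. The 5x figure comes from the article's "starting guide"; the function names and the handling of reads that return no live cells are assumptions made for illustration.

```python
import math

# "Greater than 5x as a starting guide", per the article.
WARNING_RATIO = 5.0


def tombstone_ratio(avg_tombstones_per_read, avg_live_cells_per_read):
    """Return tombstones per live cell for a table's average read."""
    if avg_live_cells_per_read == 0:
        # Assumption: reads that scan only tombstones count as an unbounded ratio.
        return math.inf if avg_tombstones_per_read > 0 else 0.0
    return avg_tombstones_per_read / avg_live_cells_per_read


def is_tombstone_heavy(avg_tombstones_per_read, avg_live_cells_per_read):
    """True when the ratio exceeds the 5x starting guide."""
    return tombstone_ratio(avg_tombstones_per_read, avg_live_cells_per_read) > WARNING_RATIO
```

For example, a table averaging 12 tombstones and 2 live cells per read has a ratio of 6 and would be flagged, while one averaging 10 tombstones and 2 live cells sits exactly at 5 and would not.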

Replication Strategy Indicator

The Replication Strategy indicator checks the replication class used for each keyspace. NetworkTopologyStrategy is highly recommended: it ensures data is replicated in a way that minimises the impact of likely failures in your infrastructure (e.g. by replicating across AWS availability zones), and it allows additional datacenters to be added to the cluster without table rebuilds.

Suggested fix for non-healthy states:

  • Change the replication class to NetworkTopologyStrategy for the problem keyspaces
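
Changing the replication class is done with a CQL ALTER KEYSPACE statement. The Python sketch below only builds the statement text so it can be reviewed before being run in cqlsh; the helper name, keyspace name and datacenter name are hypothetical placeholders, and the sketch does no identifier quoting or escaping.

```python
def alter_keyspace_cql(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement that switches to NetworkTopologyStrategy.

    `replicas_per_dc` maps each datacenter name to its replication factor.
    Names are inserted as-is, so this assumes simple, trusted identifiers.
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for datacenter, factor in replicas_per_dc.items():
        options.append(f"'{datacenter}': {int(factor)}")
    replication_map = "{" + ", ".join(options) + "}"
    return f"ALTER KEYSPACE {keyspace} WITH replication = {replication_map};"


# Hypothetical keyspace and datacenter names, for illustration only.
print(alter_keyspace_cql("my_keyspace", {"my_datacenter": 3}))
```

The datacenter names in the statement need to match the names the cluster itself reports, otherwise replicas will not be placed where intended.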

Replication Factor Indicator

The Replication Factor indicator checks the number of replicas set for each datacenter. A replication factor of at least 3 is required for Instaclustr SLAs to apply, and is highly recommended for data protection and high availability.

Suggested fix for non-healthy states:

  • Set the replication factor to three or more for the problem datacenters. Note: increasing the replication factor requires repairs to be run after the change to ensure data is correctly distributed. Contact support@instaclustr.com for assistance with this operation.
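
To illustrate the check itself, here is a minimal Python sketch that lists the datacenters whose replication factor is below three. It assumes the keyspace already uses NetworkTopologyStrategy and that its replication settings are available as a mapping of datacenter name to factor plus a 'class' entry, similar to the replication map Cassandra stores for each keyspace. The function name and the sample values are hypothetical.

```python
MIN_REPLICATION_FACTOR = 3  # minimum stated in the article


def under_replicated_datacenters(replication):
    """Return the datacenters whose replication factor is below the minimum.

    `replication` is assumed to look like
    {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '2'};
    factors may be strings or integers.
    """
    return sorted(
        datacenter
        for datacenter, factor in replication.items()
        if datacenter != "class" and int(factor) < MIN_REPLICATION_FACTOR
    )


# Hypothetical settings, for illustration only.
sample = {"class": "NetworkTopologyStrategy", "dc1": "3", "dc2": "2"}
print(under_replicated_datacenters(sample))
```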