ic-tools for Cassandra sstables

Overview 

Instaclustr have developed a number of useful tools to assist with diagnosing issues in a cluster. For users of Instaclustr's Managed Service, our Technical Operations team will run these as needed when working with you to help diagnose issues. The tools are available on a supported basis for our enterprise support customers and on an unsupported basis for the general community (although we'll probably answer questions on the C* user email list).

These tools supplement the information available from the nodetool utility that is part of core Apache Cassandra. Whereas nodetool tends to report based on summary statistics maintained as Cassandra services operate, ic-tools directly read Cassandra's data files when executed to report more detailed and accurate statistics.

As such, running the tools can read a large amount of data, which can affect the performance of the node where they run. The two most data-heavy tools (ic-cfstats and ic-purge) provide rate limiting to reduce the impact. However, users are advised to exercise care when using these tools in a live cluster.

These tools are version-specific and you must use the corresponding ic-tools version for your Cassandra version. We have provided pre-built jars for all versions of Cassandra at the bottom of this page.

The source code is published on GitHub.

Command      Description
ic-summary   Summary information about all column families, including how much of the data is repaired
ic-sstables  Print out metadata for sstables that belong to a column family
ic-pstats    Partition size statistics for a column family
ic-cfstats   Detailed statistics about cells in a column family
ic-purge     Statistics about reclaimable data for a column family

(We've generally used the old-school C* term 'column family'. It is synonymous with 'table' in modern C* versions.)

ic-summary

Provides summary information about all column families. Useful for finding the largest column families and how much data has been repaired by incremental repairs.

Usage

ic-summary

Output

Column         Description
Keyspace       Keyspace the column family belongs to
Column Family  Name of column family
SSTables       Number of sstables on this node for the column family
Disk Size      Compressed size on disk for this node
Data Size      Uncompressed size of the data for this node
Last Repaired  Time of the last incremental repair
Repair %       Percentage of data marked as repaired by incremental repair
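
As a rough illustration of how these columns can be combined, the sketch below ranks column families by disk size and derives a compression ratio and an estimate of unrepaired data. The SummaryRow record and its field names are hypothetical; the sketch assumes the values have already been copied from the ic-summary output and converted to bytes, and it makes no assumption about how ic-summary formats its output.

```python
from dataclasses import dataclass


@dataclass
class SummaryRow:
    # Hypothetical record holding values copied from one ic-summary row.
    keyspace: str
    column_family: str
    disk_size: int     # compressed size on disk, in bytes
    data_size: int     # uncompressed size, in bytes
    repair_pct: float  # the "Repair %" column


def compression_ratio(row):
    """Disk size divided by data size (lower means better compression)."""
    return row.disk_size / row.data_size if row.data_size else 0.0


def unrepaired_bytes(row):
    """Estimate of uncompressed data not yet marked as repaired."""
    return row.data_size * (100.0 - row.repair_pct) / 100.0


def largest(rows, n=3):
    """The n column families using the most disk space on this node."""
    return sorted(rows, key=lambda r: r.disk_size, reverse=True)[:n]
```

Sorting on disk_size rather than data_size is a choice made for this sketch; either column can be used depending on whether disk usage or logical data volume matters more.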

 

ic-sstables

Print out sstable metadata for a column family. Useful in helping to tune compaction settings.

Usage

ic-sstables <keyspace> <column-family>

Output

Column              Description
SSTable             Data.db filename of sstable
Disk Size           Size of sstable on disk
Total Size          Uncompressed size of data contained in the sstable
Min Timestamp       Minimum cell timestamp contained in the sstable
Max Timestamp       Maximum cell timestamp contained in the sstable
Duration            The time span between minimum and maximum cell timestamps
Level               Leveled compaction sstable level
Keys                Number of partition keys
Avg Partition Size  Average partition size
Max Partition Size  Maximum partition size
Avg Column Count    Average number of columns in a partition
Max Column Count    Maximum number of columns in a partition
Droppable           Estimated droppable tombstones
Repaired At         Time when marked as repaired by incremental repair
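
To make the Duration column concrete, here is a minimal sketch of the calculation it describes. It assumes the cell timestamps are raw values in microseconds since the Unix epoch, which is Cassandra's default write timestamp; clients can supply their own timestamps, and the way ic-sstables prints these columns is not assumed here. The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(ts_micros):
    """Convert a microsecond cell timestamp to a UTC datetime (assumed unit)."""
    return EPOCH + timedelta(microseconds=ts_micros)


def sstable_duration(min_ts_micros, max_ts_micros):
    """Time span between the minimum and maximum cell timestamps."""
    return timedelta(microseconds=max_ts_micros - min_ts_micros)
```

A long duration means a single sstable mixes old and new writes, which is one of the things worth checking when tuning compaction for time-series data.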

 

ic-pstats

Tool for finding the largest partitions. It reads the Index.db files, so it is relatively quick.

Usage

ic-pstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

 

-h          Display help
-b          Batch mode. Uses a progress indicator that is friendly for running in batch jobs.
-n <num>    Number of partitions to display in leaders lists
-t <name>   Snapshot to analyse (snapshot name from nodetool listsnapshots). A snapshot is created if none is specified.
-f <files>  Comma-separated list of Data.db sstables to filter on
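
For scripted use, the options above can be assembled into an argument list. The following is a sketch only: the helper name build_ic_pstats_args is hypothetical, and it assumes ic-pstats is on the PATH of the node where the script runs. It uses only the flags documented above.

```python
import shlex


def build_ic_pstats_args(keyspace, column_family, num=None, snapshot=None,
                         files=None, batch=False):
    """Build an ic-pstats argument list from the documented options."""
    args = ["ic-pstats"]
    if batch:
        args.append("-b")                  # batch-friendly progress indicator
    if num is not None:
        args += ["-n", str(num)]           # partitions shown in leaders lists
    if snapshot:
        args += ["-t", snapshot]           # existing snapshot to analyse
    if files:
        args += ["-f", ",".join(files)]    # comma-separated Data.db files
    args += [keyspace, column_family]
    return args


# Hypothetical keyspace and column family names, for illustration only.
example = build_ic_pstats_args("my_keyspace", "my_table", num=20, batch=True)
print(shlex.join(example))
# On a Cassandra node this list could be passed to subprocess.run(example, check=True).
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if a snapshot or file name contains unusual characters.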

 

Output

Summary: Summary statistics about partitions

Column             Description
Count (Size)       Number of partition keys on this node
Total (Size)       Total uncompressed size of all partitions on this node
Total (SSTable)    Number of sstables on this node
Minimum (Size)     Minimum uncompressed partition size
Minimum (SSTable)  Minimum number of sstables a partition belongs to
Maximum (Size)     Maximum uncompressed partition size
Maximum (SSTable)  Maximum number of sstables a partition belongs to
Average (Size)     Average (mean) uncompressed partition size
Average (SSTable)  Average (mean) number of sstables a partition belongs to
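
The relationship between these summary figures can be sketched as follows. This is not the tool's implementation: it assumes a partition's total size is the sum of its uncompressed sizes across the sstables that contain it, and the input mapping (partition key to a list of per-sstable sizes) is a hypothetical structure chosen for the example. The Total (SSTable) figure is omitted because it needs sstable identities, which this structure does not carry.

```python
from statistics import mean


def summarise_partitions(partitions):
    """partitions maps a partition key to its per-sstable sizes in bytes."""
    if not partitions:
        raise ValueError("no partitions to summarise")
    sizes = [sum(parts) for parts in partitions.values()]
    counts = [len(parts) for parts in partitions.values()]
    return {
        "count": len(sizes),            # Count (Size)
        "total_size": sum(sizes),       # Total (Size)
        "min_size": min(sizes),         # Minimum (Size)
        "max_size": max(sizes),         # Maximum (Size)
        "avg_size": mean(sizes),        # Average (Size)
        "min_sstables": min(counts),    # Minimum (SSTable)
        "max_sstables": max(counts),    # Maximum (SSTable)
        "avg_sstables": mean(counts),   # Average (SSTable)
    }


def largest_partitions(partitions, n):
    """Top n (key, total size) pairs, largest first."""
    totals = [(key, sum(parts)) for key, parts in partitions.items()]
    return sorted(totals, key=lambda kv: kv[1], reverse=True)[:n]
```

A partition that appears in many sstables is more expensive to read, which is why the sstable counts are reported alongside the sizes.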

 

Largest partitions: The top N largest partitions

Column         Description
Key            The partition key
Size           Total uncompressed size of the partition
SSTable Count  Number of sstables that contain the partition

 

SSTable Leaders: The top N partitions that belong to the most sstables

Column         Description
Key            The partition key
SSTable Count  Number of sstables that contain the partition
Size           Total uncompressed size of the partition

 

SSTables: Metadata about sstables as it relates to partitions.

Column              Description
SSTable             Data.db filename of SSTable
Size                Uncompressed size
Min Timestamp       Minimum cell timestamp in the sstable
Max Timestamp       Maximum cell timestamp in the sstable
Level               Leveled compaction level of sstable
Partitions          Number of partition keys in the sstable
Avg Partition Size  Average uncompressed partition size in sstable
Max Partition Size  Maximum uncompressed partition size in sstable

 

ic-cfstats

Tool for getting detailed cell statistics that can help identify issues with the data model.

Usage

ic-cfstats [-r <limit>] [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

 

-h          Display help
-b          Batch mode. Uses a progress indicator that is friendly for running in batch jobs.
-r <limit>  Limit read throughput to <limit> MB/s (unlimited by default; 16 is probably a good starting point if you want to limit)
-n <num>    Number of partitions to display in leaders lists
-t <name>   Snapshot to analyse (snapshot name from nodetool listsnapshots). A snapshot is created if none is specified.
-f <files>  Comma-separated list of Data.db sstables to filter on
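
When choosing a value for -r, it can help to estimate how long a rate-limited scan will take. The sketch below is a back-of-the-envelope calculation, not something the tool reports: it assumes the whole column family is read once and that 1 MB means 1024 * 1024 bytes, which may not match how the tool measures throughput. The function name is illustrative.

```python
def min_scan_seconds(data_bytes, rate_mb_per_s):
    """Lower bound on scan time when reads are capped at rate_mb_per_s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate limit must be positive")
    bytes_per_second = rate_mb_per_s * 1024 * 1024  # assumed MB definition
    return data_bytes / bytes_per_second


# Example: about 100 GiB of sstable data at the suggested 16 MB/s.
seconds = min_scan_seconds(100 * 1024**3, 16)
print(f"at least {seconds / 3600:.1f} hours")
```

Under these assumptions the example scan takes a little under two hours, so a lower limit trades node impact for a longer run.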

 

Output

Summary: Summary statistics about partitions

Column             Description
Count (Size)       Number of partition keys on this node
Total (Size)       Total uncompressed size of all partitions on this node
Total (SSTable)    Number of sstables on this node
Minimum (Size)     Minimum uncompressed partition size
Minimum (SSTable)  Minimum number of sstables a partition belongs to
Maximum (Size)     Maximum uncompressed partition size
Maximum (SSTable)  Maximum number of sstables a partition belongs to
Average (Size)     Average (mean) uncompressed partition size
Average (SSTable)  Average (mean) number of sstables a partition belongs to

 

Largest partitions: Partitions with largest uncompressed size

Column         Description
Key            The partition key
Size           Total uncompressed size of the partition
Tombstones     Number of cell or range tombstones
(droppable)    Number of tombstones that can be dropped as per gc_grace_seconds
Cells          Number of cells in the partition
SSTable Count  Number of sstables that contain the partition

 

Widest partitions: Partitions with the most cells

Column         Description
Key            The partition key
Cells          Number of cells in the partition
Tombstones     Number of cell or range tombstones
(droppable)    Number of tombstones that can be dropped as per gc_grace_seconds
Size           Total uncompressed size of the partition
SSTable Count  Number of sstables that contain the partition

 

Tombstone Leaders: Partitions with the most tombstones

Column         Description
Key            The partition key
Tombstones     Number of cell or range tombstones
(droppable)    Number of tombstones that can be dropped as per gc_grace_seconds
Cells          Number of cells in the partition
Size           Total uncompressed size of the partition
SSTable Count  Number of sstables that contain the partition

 

SSTable Leaders: Partitions that are in the most sstables

Column         Description
Key            The partition key
SSTable Count  Number of sstables that contain the partition
Size           Total uncompressed size of the partition
Cells          Number of cells in the partition
Tombstones     Number of cell or range tombstones
(droppable)    Number of tombstones that can be dropped as per gc_grace_seconds

 

SSTables: Metadata about sstables as it relates to partitions.

Column         Description
SSTable        Data.db filename of SSTable
Size           Uncompressed size
Min Timestamp  Minimum cell timestamp in the sstable
Max Timestamp  Maximum cell timestamp in the sstable
Partitions     Number of partitions
(deleted)      Number of row-level partition deletions
(avg size)     Average uncompressed partition size in sstable
(max size)     Maximum uncompressed partition size in sstable
Cells          Number of cells in the SSTable
Tombstones     Number of cell or range tombstones in the SSTable
(droppable)    Number of tombstones that are droppable according to gc_grace_seconds
(range)        Number of range tombstones
Cell Liveness  Percentage of live cells. It does not account for tombstones or cell updates that shadow other cells; it is simply the percentage of non-tombstoned cells out of the total number of cells.
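
The Cell Liveness figure described above can be sketched as a simple ratio. This assumes the cell count includes tombstoned cells and that only cell tombstones (not range tombstones) are subtracted; the source does not spell out either detail, so treat both as assumptions. The function name is illustrative.

```python
def cell_liveness(cells, cell_tombstones):
    """Percentage of non-tombstoned cells out of all cells in an sstable.

    Returns None when there are no cells, since the ratio is undefined.
    """
    if cells == 0:
        return None
    return 100.0 * (cells - cell_tombstones) / cells
```

For example, an sstable with 1000 cells of which 250 are tombstones has a liveness of 75%. Because shadowed cells are not considered, the true proportion of readable data may be lower than this figure.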

 

ic-purge

Usage

ic-purge [-r <limit>] [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>

 

-h          Display help
-b          Batch mode. Uses a progress indicator that is friendly for running in batch jobs.
-r <limit>  Limit read throughput to <limit> MB/s (unlimited by default; 16 is probably a good starting point if you want to limit)
-n <num>    Number of partitions to display in leaders lists
-t <name>   Snapshot to analyse. A snapshot is created if none is specified.

 

Output

Largest reclaimable partitions: Partitions with the largest amount of reclaimable data

Column       Description
Key          The partition key
Size         Total uncompressed size of the partition
Reclaim      Reclaimable uncompressed size
Generations  SSTable generations the partition belongs to

 
