The side effect of having Cassandra tables with partition sizes of more than 100MB

cassandra

I am running Apache Cassandra 3.11.1 and have 6 tables whose partition sizes are in the failing state.

Max partition is larger than 100MB.

For these 6 tables the partition sizes range from roughly 200MB to upwards of 5GB.
These 6 tables are split across 3 keyspaces and are specific to Akka Persistence eventsByTag (e.g. eventsByTag1, eventsByTag2).
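
These sizes can be confirmed per table with nodetool (the keyspace name below is just a placeholder):

nodetool tablestats my_keyspace.eventsByTag1        # look for "Compacted partition maximum bytes"
nodetool tablehistograms my_keyspace eventsByTag1   # partition size and cell count percentiles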

Much of the data in these tables is not used, but it still needs to be available.

I'm looking at changing the data model; however, at the same time I'm trying to better understand the impact of having large partition sizes.

Other than running out of memory or hitting Cassandra's limits, what are some of the other negative impacts of having large partition sizes if most of the data is not accessed?

A specific case that might be related (not confirmed) is that I'm currently running Cassandra with materialized views and Elasticsearch. At times the projections used to update Elasticsearch with data from Cassandra fail, and I'm not yet sure whether this is related.

The error message I receive in this case is:

Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: 
Cassandra timeout during read query at consistency LOCAL_QUORUM (2 
responses were required but only 1 replica responded)

Best Answer

With this version of Cassandra the situation should be better than before, although there can still be performance problems attributable to reading from too many SSTables, making selections only on the partition key, etc.

This presentation gives a good overview of the work done to support "wide partitions", although re-modelling the data is still the recommended approach.
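
As a rough illustration of that kind of re-modelling (a hypothetical sketch only, not the actual akka-persistence-cassandra schema), the usual remedy is to add a bucket, typically time-based, to the partition key so that no single tag's partition grows without bound:

-- Hypothetical bucketed layout; table and column names are illustrative.
-- Partitioning by (tag, time_bucket) instead of just (tag) bounds each
-- partition to one day of events rather than the tag's entire history.
CREATE TABLE events_by_tag_bucketed (
    tag            text,
    time_bucket    date,      -- e.g. the day the event was written
    event_time     timeuuid,
    persistence_id text,
    event          blob,
    PRIMARY KEY ((tag, time_bucket), event_time)
) WITH CLUSTERING ORDER BY (event_time ASC);

-- Reads then walk the buckets instead of scanning one huge partition:
SELECT * FROM events_by_tag_bucketed
WHERE tag = 'account' AND time_bucket = '2018-03-01';

The bucket granularity (day, hour, etc.) would be chosen so that one bucket's worth of events stays comfortably below the 100MB warning threshold.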
