Cassandra – What is meant by – “cannot rename non primary key part”

cassandracql

I have created a table users as follows:

create table users (user_id text primary key, email text, first_name text, last_name text, session_token int);

I am referring to the CQL help documentation on the DataStax website.

I now want to rename the email column to "emails". But I when I execute the command –

alter table users rename email to emails;

I am getting the error –
Bad Request: cannot rename non primary key part email

I am using CQL 3 . My CQLSH is 3.1.6 and C* is 1.2.8.

Why cannot I rename the above column? If I run help alter table, it shows the option to rename the column. How do I rename the column?

Best Answer

In CQL, you can rename the column used as the primary key, but not any others. This seems opposite from what it should be, one would think that the primary key would need to stay the same and the others would be easy to change! The reason comes from implementation details.

The name of the primary key is not written into each row, rather it is stored in a different place that's easily changeable. But for non-primary key fields, the names of the fields are written into each row. In order to rename the column, the system would have to rewrite every single row.

This article has some fantastic examples and a much longer discussion of Cassandra's internals.

To borrow an example directly from the article, consider this example column family:

cqlsh:test> CREATE TABLE example (
            ... field1 int PRIMARY KEY,
            ... field2 int,
            ... field3 int);

Insert a little data:

cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES ( 1,2,3);

And then the Cassandra-CLI output (not CQLSH) from querying this column family:

[default@test] list  example;
-------------------
RowKey: 1
  => (column=, value=, timestamp=1374546754299000)
  => (column=field2, value=00000002, timestamp=1374546754299000)
  => (column=field3, value=00000003, timestamp=1374546754299000)

The name of the primary key, "field1" is not stored in any of the rows, but "field2" and "field3" are written out, so changing those names would require rewriting every row.

So if you really still want to rename a non-primary column, there are basically two different strategies and neither of them are very desirable.

  1. Drop the column and add it back, as another poster mentioned. This has the big downside of dropping all the data in that column.

or

  1. Create a new column family that is basically a copy of the old but with the column in question renamed and rewrite your data there. This is, of course, very computationally expensive.
Related Topic