Suppose your clustering keys are
k1 t1, k2 t2, ..., kn tn
where ki is the ith key name and ti is the ith key type. Then the order data is stored in is lexicographic ordering where each dimension is compared using the comparator for that type.
So (a1, a2, ..., an) < (b1, b2, ..., bn) if a1 < b1 using t1 comparator, or a1=b1 and a2 < b2 using t2 comparator, or (a1=b1 and a2=b2) and a3 < b3 using t3 comparator, etc..
This means that it is efficient to find all rows with a certain k1=a, since the data is stored together. But it is inefficient to find all rows with ki=x for i > 1. In fact, such a query isn't allowed - the only clustering key constraints that are allowed specify zero or more clustering keys, starting from the first with none missing.
For example, consider the schema
create table clustering (
x text,
k1 text,
k2 int,
k3 timestamp,
y text,
primary key (x, k1, k2, k3)
);
If you did the following inserts:
insert into clustering (x, k1, k2, k3, y) values ('x', 'a', 1, '2013-09-10 14:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'b', 1, '2013-09-10 13:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'a', 2, '2013-09-10 13:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'b', 1, '2013-09-10 14:00+0000', '1');
then they are stored in this order on disk (the order select * from clustering where x = 'x'
returns):
x | k1 | k2 | k3 | y
---+----+----+--------------------------+---
x | a | 1 | 2013-09-10 14:00:00+0000 | 1
x | a | 2 | 2013-09-10 13:00:00+0000 | 1
x | b | 1 | 2013-09-10 13:00:00+0000 | 1
x | b | 1 | 2013-09-10 14:00:00+0000 | 1
k1
ordering dominates, then k2
, then k3
.
CREATE TABLE example ( a int, b int, c int, d int, PRIMARY KEY (a,b,c)) WITH CLUSTERING ORDER BY (b DESC , c ASC ) ;
Is the correct syntax for ordering with multiple columns.
For your particular application you actually are trying to get results from distinctly different types of queries. In Cassandra it is best to shape each table to be the response to a particular query.
For example (not knowing that much about your application)
select * from demo where oid = 100 and cid = 100 and ts > minTimeuuid('something');
select * from demo where oid = 100 and cid = 10;
Might be better served by a table structure like
create table demo_oct(oid int,cid int,ts timeuuid, body, other ...., PRIMARY KEY ((oid,cid),ts)) WITH CLUSTERING ORDER BY (ts DESC);
This way each set of time-series for a pair of oid and cid data will reside in it's own partiton and be easy to retrieve. This is because i'm using a Parition key made up of both oid and cid. This is why there is an extra set of parenthesizes in the key. The clustering key ts ensures that the data is in the order that you want.
But as you noticed, you cannot perform select * from table oid == 10 on this table because that would require scanning over the entire database (because of the partition structure)
For the queries like
select * from demo where oid = 100;
you need a second table(again not knowing your particular application)
create table demo_ot(oid int,cid int,ts timeuuid, body, other ...., PRIMARY KEY (oid,ts)) WITH CLUSTERING ORDER BY (ts DESC);
This table would keep time series for each OID in a single partiton allowing for extremely fast slices. Here the partition key is just OID while ts is still the clustering key.
On the application side you would be inserting to both of these tables simultaneously.
More info on Datamodeling
Best Answer
Your clustering columns define the order (in your case
age
thenscore
)http://cassandra.apache.org/doc/cql3/CQL.html#createTableStmt
http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt