Cassandra: Query with a WHERE clause containing greater-than or less-than (< and >)

cassandra, cql, where-clause

I'm using Cassandra 1.1.2 and trying to convert an RDBMS application to Cassandra. In my RDBMS application I have the following table, called table1:

| Col1 | Col2 | Col3 | Col4 |

Col1: String (primary key)
Col2: String (primary key)
Col3: Bigint (index)
Col4: Bigint

This table holds over 200 million records. The most frequently used query is something like:

Select * from table1 where col3 < 100 and col3 > 50;

In Cassandra I used the following statements to create the table:

create table table1 (primary_key varchar, col1 varchar, 
col2 varchar, col3 bigint, col4 bigint, primary key (primary_key));

create index on table1(col3);

I changed the primary key to an extra column (I calculate the key inside my application).
After importing a few records I tried to execute the following CQL:

select * from table1 where col3 < 100 and col3 > 50;

The result is:

Bad Request: No indexed columns present in by-columns clause with Equal operator

The query select col1,col2,col3,col4 from table1 where col3 = 67 works.

Google says there is no way to execute this kind of query. Is that right? Any advice on how to write such a query?

Best Answer

Cassandra indexes don't actually support sequential access; see http://www.datastax.com/docs/1.1/ddl/indexes for a good quick explanation of where they are useful. But don't despair; the more classical way of using Cassandra (and many other NoSQL systems) is to denormalize, denormalize, denormalize.

It may be a good idea in your case to use the classic bucket-range pattern, which lets you use the recommended RandomPartitioner and keep your rows well distributed around your cluster, while still allowing sequential access to your values. The idea in this case is that you would make a second dynamic columnfamily mapping (bucketed and ordered) col3 values back to the related primary_key values. As an example, if your col3 values range from 0 to 10^9 and are fairly evenly distributed, you might want to put them in 1000 buckets of range 10^6 each (the best level of granularity will depend on the sort of queries you need, the sort of data you have, query round-trip time, etc). Example schema for cql3:

CREATE TABLE indexotron (
    rangestart int,
    col3val int,
    table1key varchar,
    PRIMARY KEY (rangestart, col3val, table1key)
);
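As a rough illustration of the bucketing arithmetic described above, here is a minimal Python sketch. The names BUCKET_WIDTH, bucket_for and index_row are hypothetical, and the 10^6 width is just the example figure from the text (values in 0..10^9 split into 1000 buckets). It only computes the values you would write into indexotron; it does not talk to Cassandra.

```python
# Hypothetical helper names; the bucket width is the example figure from the
# answer (values in 0..10^9 split into 1000 buckets of 10^6 each).
BUCKET_WIDTH = 10**6


def bucket_for(col3val: int) -> int:
    """Return the bucket number (rangestart) for a non-negative col3 value."""
    if col3val < 0:
        raise ValueError("this sketch assumes non-negative col3 values")
    return col3val // BUCKET_WIDTH


def index_row(col3val: int, table1key: str) -> tuple:
    """Build the (rangestart, col3val, table1key) row to store in indexotron."""
    return (bucket_for(col3val), col3val, table1key)
```

Every write to table1 would then be paired with a write of index_row(col3, primary_key) to indexotron. A finer or coarser BUCKET_WIDTH trades the number of buckets to query against the size of each row.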

When inserting into table1, you should also insert a corresponding row into indexotron, with rangestart = int(col3val / 1000000). Then, when you need to enumerate all rows in table1 with col3 > X, you need to query up to 1000 buckets of indexotron, but all the col3vals within each bucket will be sorted. Example queries to find all table1.primary_key values for which table1.col3 < 4021, assuming for illustration a smaller bucket width of 1000 and rangestart set to the lower bound of each bucket (rangestart = col3val - col3val % 1000):

SELECT * FROM indexotron WHERE rangestart = 0 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 1000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 2000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 3000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 4000 AND col3val < 4021 ORDER BY col3val;
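The same pattern applied to the question's original range (col3 > 50 and col3 < 100) could be generated along these lines. This is only a sketch: the function name plan_queries and the bucket_width parameter are assumptions for illustration, rangestart is taken to be the lower bound of each bucket (as in the queries above), and the function builds CQL strings rather than executing them.

```python
def plan_queries(low: int, high: int, bucket_width: int = 1000) -> list:
    """Build one CQL string per bucket for the open range low < col3val < high.

    Assumes rangestart is the lower bound of each bucket, i.e.
    rangestart = col3val - col3val % bucket_width (an assumption of this sketch).
    """
    first, last = low + 1, high - 1  # smallest and largest matching integers
    if first > last:
        return []  # empty range, nothing to query

    queries = []
    start = first - first % bucket_width  # lower bound of the first bucket
    while start <= last:
        clauses = [f"rangestart = {start}"]
        # Only the edge buckets need an extra bound on col3val.
        if start <= low:
            clauses.append(f"col3val > {low}")
        if start + bucket_width > high:
            clauses.append(f"col3val < {high}")
        queries.append(
            "SELECT * FROM indexotron WHERE "
            + " AND ".join(clauses)
            + " ORDER BY col3val;"
        )
        start += bucket_width
    return queries


if __name__ == "__main__":
    for query in plan_queries(50, 100):
        print(query)
```

The table1key values returned by each query can then be used to fetch the full rows from table1 by primary key.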

Related Solutions

PHP OOP – Missing argument 1

The problem is here:

// Create object without constructor by calling a method
$stefan = new person();                        // <-----
$stefan->set_name("Stefan Mischook");

You're not passing a required parameter to the constructor.

  function __construct($persons_name)
  {
      $this->name = $persons_name;
  }

This (constructor) requires a $persons_name argument to construct a new instance of the person class.

Also (related), your comment "// Create object without constructor by calling a method" is not at all what the code is doing. You are calling the constructor, and that is the problem. Perhaps this was partially copied from some example, and you missed something?

Check if datasource is up in WebLogic

In order to see a Server/State and Test Data Source action listed under Services -> Data Sources -> <your datasource> -> Monitoring (Tab) -> Testing (Tab), all of the following need to be true:

At least one server targeted by the Data Source needs to be running. If the AdminServer is not targeted, this might not be true - visit Environment -> Servers and check that the targeted server(s) are RUNNING.
The Data Source's Configuration -> Connection Pool -> (Advanced) "Test Connections On Reserve" option must be checked (true).
Test Table Name must contain either a table name or an SQL statement, e.g. SQL SELECT 1 FROM DUAL.

You should then see the targeted servers listed in the Monitoring/Testing tab.
