Cassandra `COPY FROM`unable to coerce GMT date string to a formatted date (long)

cassandracqlsh

I have been trying to use COPY FROM to insert into a Cassandra table that has a timestamp type column. However, I encountered the following error:

code=2200 [Invalid query] message="unable to coerce '2015-03-06 18:11:33GMT' to a  formatted date (long)"
Aborting import at record #3. Previously-inserted values still present.
0 rows imported in 0.211 seconds.

The content of the CSV file was actually created with a COPY TO command. My TZ environment variable has been set to GMT.

I did some searching and found a post here that mentioned using Z instead of GMT as the timezone in the data string, i.e. '2015-03-06 18:11:33Z'. If I replace all the GMT in my CSV with Z, COPY FROM worked. Link for the post here:
unable to coerce '2012/11/11' to a formatted date (long)

When I run a SELECT on this table, the datetime column shows up in the format of: 2015-03-06 17:53:23GMT.

Further info, there was a bug about 'Z' timezone but it was fixed. Link: https://issues.apache.org/jira/browse/CASSANDRA-6973

So my question is, is there a way that I can run COPY TO so that it writes Z instead of GMT for time zone?

Alternatively, is there a way I can make COPY FROM work with GMT?

Thanks.

Note: The solution is in the comment from @Aaron for this post. Yes, it's a hack but it works.

Best Answer

I think what is happening here, is that you are getting bit by your time_format property in your ~/.cassandra/cqlshrc file. COPY uses this setting when exporting your timestamp data during a COPY TO. CQLSH uses the Python strftime formats. It is interesting to note that the lowercase %z and uppercase %Z seem to represent your problem.

When I SELECT timestamp data with %Z (upper), it looks like this:

aploetz@cqlsh:stackoverflow> SELECT * FROm posts1;

 userid | posttime               | postcontent  | postid
--------+------------------------+--------------+--------------------------------------
      1 | 2015-01-25 13:25:00CST |    blahblah5 | 13218139-991c-4ddc-a11a-86992f6fed66
      1 | 2015-01-25 13:22:00CST |    blahblah2 | eacdebcc-35c5-45f7-9374-d5fd987e699f
      0 | 2015-03-12 14:10:00CDT |  sdgfjdsgojr | 82766df6-4cca-4ad1-ae59-ba4488103da4
      0 | 2015-03-12 13:56:00CDT | kdsjfsdjflds | bd5c2be8-be66-41da-b9ff-98e9a4836000
      0 | 2015-03-12 09:10:00CDT |  sdgfjdsgojr | 6865216f-fc4d-431c-8067-c27cf20b6be7

When I try to INSERT a record using that date format, it fails:

aploetz@cqlsh:stackoverflow> INSERT INTO posts1 (userid,posttime,postcontent,postid) VALUES (0,'2015-03-12 14:27CST','sdgfjdsgojr',uuid());
code=2200 [Invalid query] message="unable to coerce '2015-03-12 14:27CST' to a  formatted date (long)"

But when I alter time_format to use the (lowercase) %z the same query produces this:

aploetz@cqlsh:stackoverflow> SELECT * FROm posts1;

 userid | posttime                 | postcontent  | postid
--------+--------------------------+--------------+--------------------------------------
      1 | 2015-01-25 13:25:00-0600 |    blahblah5 | 13218139-991c-4ddc-a11a-86992f6fed66
      1 | 2015-01-25 13:22:00-0600 |    blahblah2 | eacdebcc-35c5-45f7-9374-d5fd987e699f
      0 | 2015-03-12 14:10:00-0500 |  sdgfjdsgojr | 82766df6-4cca-4ad1-ae59-ba4488103da4
      0 | 2015-03-12 13:56:00-0500 | kdsjfsdjflds | bd5c2be8-be66-41da-b9ff-98e9a4836000
      0 | 2015-03-12 09:10:00-0500 |  sdgfjdsgojr | 6865216f-fc4d-431c-8067-c27cf20b6be7

I can also INSERT data in this format:

INSERT INTO posts1 (userid,posttime,postcontent,postid) 
VALUES (0,'2015-03-12 14:27-0500','sdgfjdsgojr',uuid());

It also appears in this way when I run a COPY TO, and a COPY FROM of the same data/file also works.

In summary, check your ~/.cassandra/cqlshrc and make sure that you are either using the default setting, or this setting in the [ui] section:

[ui]
time_format = %Y-%m-%d %H:%M:%S%z

It won't get you the 'Z' like you asked for, but it will allow you to COPY TO/FROM your data without having to muck with the CSV file.

Edit

For those of you poor souls out there using CQLSH (or Cassandra, God help you) on Windows, the default location of the cqlshrc file is c:\Users\%USERNAME%\.cassandra\cqlshrc.

Edit - 20150903

Inspired by this question, I submitted a patch (CASSANDRA-8970) to allow users to specify a custom time format with COPY, and it was marked as "Ready To Commit" yesterday. Basically, this patch will allow this problem to be solved by doing the following:

COPY posts1 TO '/home/aploetz/posts1.csv' WITH DELIMITER='|' AND HEADER=true 
    AND TIME_FORMAT='%Y-%m-%d %H:%M:%SZ;

Edit - 20161010

The COPY command was improved in Cassandra 2.2.5, and the TIMEFORMAT option has been renamed to DATETIMEFORMAT.

From New options and better performance in cqlsh copy:

DATETIMEFORMAT, which used to be called TIMEFORMAT, a string containing the Python strftime format for date and time values, such as ā€˜%Y-%m-%d %H:%M:%S%zā€™. It defaults to the time_format value in cqlshrc.