Elasticsearch getting too many results, need help filtering query

elasticsearch

I'm having much problem understanding the underlying of ES querying system.

I've got the following query for example:

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "referer": "www.xx.yy.com"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now",
              "lt": "now-1h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "0.5h"
      },
      "aggs": {
        "what": {
          "cardinality": {
            "field": "host"
          }
        }
      }
    }
  }
}

That request get too many results:

"status" : 500, "reason" :
"ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException:
Data too large, data for field [@timestamp] would be larger than limit
of [3200306380/2.9gb]]; nested:
UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException:
Data too large, data for field [@timestamp] would be larger than limit
of [3200306380/2.9gb]]; nested: CircuitBreakingException[Data too
large, data for field [@timestamp] would be larger than limit of
[3200306380/2.9gb]]; "

I've tryied that request:

{
  "size": 0,
  "filter": {
    "and": [
      {
        "term": {
          "referer": "www.geoportail.gouv.fr"
        }
      },
      {
        "range": {
          "@timestamp": {
            "from": "2014-10-04",
            "to": "2014-10-05"
          }
        }
      }
    ]
  },
  "aggs": {
    "interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "0.5h"
      },
      "aggs": {
        "what": {
          "cardinality": {
            "field": "host"
          }
        }
      }
    }
  }
}

I would like to filter the data in order to be able to get a correct result, any help would be much appreciated!

Best Answer

I found a solution, it's kind of weird. I've followed dimzak adviced and clear the cache:

curl --noproxy localhost -XPOST "http://localhost:9200/_cache/clear"

Then I used filtering instead of querying as Olly suggested:

{
  "size": 0,
  "query": {
    "filtered": {
      "query":  {
        "term": {
          "referer": "www.xx.yy.fr"
        }
      },
      "filter" : { 
        "range": {
          "@timestamp": { 
            "from": "2014-10-04T00:00", 
            "to": "2014-10-05T00:00"
          }  
        }
      }
    }
  },
  "aggs": {
  "interval": {
    "date_histogram": {
    "field": "@timestamp",
    "interval": "0.5h"
    },
    "aggs": {
    "what": {
      "cardinality": {
      "field": "host"
      }
    }
    }
  }
  }
}

I cannot give you both the ansxwer, I think dimzak deserves it best, but thumbs up to you two guys :)

Related Topic