Scroll API for a more efficient way to request large data sets

elasticsearch elasticsearch-plugin

I am using the Elasticsearch Data Format plugin, and I need to request a large data set of nearly 1 million records. But whenever I request more than 10,000 records, I get the following error:

Result window is too large, from + size must be less than or equal to:
[10000] but was [100000]. See the scroll api for a more efficient way
to request large data sets. This limit can be set by changing the
[index.max_result_window] index level setting.

I tried to raise the result window limit with:

http://1.2.3.4:9200/index/_settings -d '{ "index" : { "max_result_window" : 1000000}}'

But it still doesn't work for me. Is there any alternative?

I am using Elasticsearch 5.4
Data Format plugin: master

Best Answer

As the error message suggests, the scroll API is the efficient way to retrieve large data sets. For example:

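POST <host_name>:<port_num>/<index_name>/_search?scroll=1m&size=10000

Here size is 10000 and scroll is 1m: each request returns a batch of up to 10,000 records, and the search context is kept alive for 1 minute, a window that is renewed by every scroll call. Note that the batch size of a scroll is still limited by index.max_result_window (10,000 by default), so a size of 100000 would be rejected unless that setting is raised. The response also contains a scroll id (the _scroll_id field), which you pass to the next request to retrieve the following batch. Please find the sample below: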
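A minimal Python sketch of that first request, using only the standard library. The base URL http://localhost:9200, the index name myindex and the helper names initial_scroll_request and read_first_page are hypothetical; the batch size defaults to 10,000 on the assumption that index.max_result_window is left at its default. It only builds the request and shows where the scroll id sits in the decoded response.

```python
import json
import urllib.request


def initial_scroll_request(base_url, index, size=10000, keep_alive="1m"):
    """Build POST <index>/_search?scroll=...&size=..., which opens a scroll context."""
    url = f"{base_url}/{index}/_search?scroll={keep_alive}&size={size}"
    # match_all is an assumption; put your own query here.
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def read_first_page(response):
    """Return (scroll_id, hits) from a decoded search response."""
    return response["_scroll_id"], response["hits"]["hits"]
```

To send it, pass the request to urllib.request.urlopen, decode the body with json.load, and hand the result to read_first_page; the returned scroll id is what goes into the next scroll call.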

POST <host_name>:<port_num>/_search/scroll?scroll=1m&scroll_id=<scroll_id>

Note: for subsequent scroll API calls, the index name need not be mentioned; the scroll_id and the scroll time are sufficient. Repeat the call with the latest scroll_id until a batch comes back with no hits.
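Putting the two requests together, here is a sketch of the complete loop in Python (standard library only). The names http_transport and scroll_all and the index name are hypothetical, and the HTTP call is passed in as a function so the loop can be tried without a cluster. It uses the request-body form of POST /_search/scroll, stops at the first empty batch, and then releases the context with DELETE /_search/scroll rather than waiting for the keep-alive to expire.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

# A transport sends (method, path, JSON body or None) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def http_transport(base_url: str) -> Transport:
    """Hypothetical helper: talk to an Elasticsearch node over HTTP with urllib."""
    def send(method: str, path: str, body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = urllib.request.Request(base_url + path, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return send


def scroll_all(send: Transport, index: str, size: int = 10000,
               keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit of a match_all search by following the scroll cursor."""
    # First request: opens the scroll context on the index and returns batch 1.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": size, "query": {"match_all": {}}})
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:  # an empty batch means the result set is exhausted
                break
            yield from hits
            # Follow-up requests: no index name, only the keep-alive and scroll id.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the server-side context instead of waiting for keep_alive to expire.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

For example, iterating over scroll_all(http_transport("http://localhost:9200"), "myindex") would stream about 1 million documents in roughly 100 round trips of 10,000 records each, without ever raising index.max_result_window.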

For more information, please refer to the Elasticsearch documentation on the scroll API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
