Database – How does Google store search trends in the backend?

big data, database

Google Trends shows how many times a query has been searched, along with some other properties of that query. But how is this data stored in a database?

Storing a new row for every search does not seem right. They also plot each query on a time graph, so they must have some way to look up individual searches made by users. But given the number of queries they get every day, it does not feel right that they would store every search as a database row along with a timestamp.

This applies not just to Google Trends or Google in general, but to any other big site that gets an enormous number of queries and then offers tools to examine them in depth. I am not an expert on this, but I am interested in the high-level structure of how things work behind the scenes.

Best Answer

To be able to draw the time graph, they would need to store each search, or at least the timestamps linked to each search entry. This is likely stored in a distributed, sharded database. There are different approaches to how the data could be sharded, but the details are likely a trade secret, along with much of their search engine design.
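As a rough illustration of why per-search timestamps are enough for a time graph, here is a minimal sketch that groups raw search events into hourly buckets. The event list, the `hourly_counts` helper, and the hourly granularity are all assumptions made for the example, not anything Google has documented.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical raw search log: one (query text, UTC timestamp) pair per search.
events = [
    ("world cup", datetime(2024, 6, 1, 9, 15, tzinfo=timezone.utc)),
    ("world cup", datetime(2024, 6, 1, 9, 48, tzinfo=timezone.utc)),
    ("world cup", datetime(2024, 6, 1, 11, 2, tzinfo=timezone.utc)),
    ("weather", datetime(2024, 6, 1, 9, 30, tzinfo=timezone.utc)),
]


def hourly_counts(events, query):
    """Return sorted (hour_start, count) pairs for a single query."""
    counts = Counter()
    for text, ts in events:
        if text == query:
            # Truncate the timestamp to the start of its hour.
            bucket = ts.replace(minute=0, second=0, microsecond=0)
            counts[bucket] += 1
    return sorted(counts.items())


series = hourly_counts(events, "world cup")
for bucket, n in series:
    print(bucket.isoformat(), n)
```

Each point on the graph is just a count per time bucket, so a system at scale could also keep pre-aggregated counters per bucket instead of rescanning raw events on every request.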

Judging from their terms of service and the delay in finalizing ad revenue, they appear to record an event in much the same way every time they display an ad.

EDIT: By sharding the data by query (storing all data for a particular set of queries on one datastore, and the data for other queries on different datastores), it is quite easy to scale. Each datastore can be kept to a reasonable size, and can be queried and updated quickly. Part of the trick is deciding which datastore gets which queries. This can be adjusted over time. In this case, if you lose a datastore, you only lose a vertical slice of the data (a set of queries), not the whole database.
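To make the sharding idea concrete, here is a minimal sketch that routes each query to one of a fixed number of in-memory "datastores" by hashing the normalized query text. The shard count, the normalization rule, and the names `shard_for`, `record_search`, and `timestamps_for` are hypothetical. Hash-modulo routing is only one possible scheme; it does not by itself support the "adjusted over time" part, which would need something like a lookup table or consistent hashing.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of datastores

# Each shard maps a normalized query to the list of its search timestamps.
shards = [dict() for _ in range(NUM_SHARDS)]


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so variants land on the same key."""
    return " ".join(query.lower().split())


def shard_for(query: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard with a stable hash (unlike built-in hash(), SHA-256
    gives the same result in every process)."""
    digest = hashlib.sha256(normalize(query).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def record_search(query: str, timestamp: int) -> None:
    """Append one search event to the shard that owns this query."""
    shard = shards[shard_for(query)]
    shard.setdefault(normalize(query), []).append(timestamp)


def timestamps_for(query: str) -> list:
    """Read all timestamps for a query; only one shard is touched."""
    return shards[shard_for(query)].get(normalize(query), [])


record_search("World Cup", 1717232400)
record_search("world  cup", 1717236000)
record_search("weather", 1717232400)
print(timestamps_for("world cup"))
```

Because every event for a given query lives on a single shard, both writes and time-graph reads for that query touch one datastore, and losing a shard affects only the queries routed to it.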

The data may be stored in a traditional relational model, in which case the primary key index of the query event table could supply the timestamps for each query. Depending on their needs, it might also be possible to use one of the newer NoSQL databases, or another non-relational store.
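For the relational variant, a minimal sketch using SQLite is shown here. The `query_event` table, its column names, and the composite primary key are assumptions chosen for illustration; a real system would pick its own schema and engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the composite primary key starts with the query and
# then the timestamp, so the key's index can serve time-range lookups.
conn.execute("""
    CREATE TABLE query_event (
        query_text  TEXT    NOT NULL,
        searched_at INTEGER NOT NULL,  -- Unix time in seconds
        event_seq   INTEGER NOT NULL,  -- tie-breaker for same-second searches
        PRIMARY KEY (query_text, searched_at, event_seq)
    ) WITHOUT ROWID
""")

rows = [
    ("world cup", 1717232400, 1),
    ("world cup", 1717232400, 2),
    ("world cup", 1717239600, 1),
    ("weather", 1717232400, 1),
]
conn.executemany("INSERT INTO query_event VALUES (?, ?, ?)", rows)

# Count searches per hour for one query within a time range.
result = conn.execute(
    """
    SELECT searched_at / 3600 * 3600 AS hour_start, COUNT(*)
    FROM query_event
    WHERE query_text = ? AND searched_at BETWEEN ? AND ?
    GROUP BY hour_start
    ORDER BY hour_start
    """,
    ("world cup", 1717200000, 1717300000),
).fetchall()

for hour_start, n in result:
    print(hour_start, n)
```

With the key ordered this way, all rows for one query sit next to each other in timestamp order, which is the access pattern a time graph needs.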
