Google-apps-script – How to know when the contents of a PDF in Google Drive are indexed for searching

google-apps-script google-drive google-drive-search

I wrote a routine that gets the file IDs of 200 PDF files (uploaded to my Google Drive a few minutes earlier) and writes those IDs into a Google Sheet with setValue.

To get the correct file IDs, it searches the contents of the PDFs.

The issue is that some days the content is searchable 5 minutes after uploading the 200 PDFs, and other days it takes up to 7 hours.

Is there a way to track the indexing status?

Is there a way to force a specific folder on Drive to be indexed with priority?

Best Answer

Is there a way to track the indexing status?

To find out whether a file is indexed, look at the contentHints.indexableText property using the get method of the Google Drive REST API.
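A caveat: as far as I can tell from the Drive API v3 reference, the contentHints fields are write-only and are not populated in responses, so reading them back may not reveal much. A more empirical check, which is not part of the approach above, is to run the same kind of content search the routine already depends on and see which files come back. The sketch below assumes each PDF contains a known, unique phrase (for example an invoice number); the function names, the phrase list, and the folder ID are hypothetical. The search function is passed in so the logic can be exercised outside Apps Script; inside Apps Script it would wrap DriveApp.searchFiles.

```javascript
// Build a Drive search query that looks for `phrase` in file contents,
// restricted to one folder. Backslashes and single quotes are escaped,
// as the Drive query syntax requires.
function buildFullTextQuery(phrase, folderId) {
  var escaped = phrase.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
  return "fullText contains '" + escaped + "' and '" + folderId +
         "' in parents and trashed = false";
}

// For each phrase, ask the search function for matching file IDs.
// Returns the phrases that are already searchable (mapped to the first
// matching file ID) and the phrases that are still pending.
function checkIndexed(phrases, folderId, searchFn) {
  var found = {};
  var pending = [];
  phrases.forEach(function (phrase) {
    var ids = searchFn(buildFullTextQuery(phrase, folderId));
    if (ids.length > 0) {
      found[phrase] = ids[0];
    } else {
      pending.push(phrase);
    }
  });
  return { found: found, pending: pending };
}

// Apps Script adapter: collects the IDs of all files matching a query.
// Only usable inside Apps Script, where DriveApp is defined.
function driveSearch(query) {
  var ids = [];
  var files = DriveApp.searchFiles(query);
  while (files.hasNext()) {
    ids.push(files.next().getId());
  }
  return ids;
}
```

In Apps Script this could be called as checkIndexed(phrases, folderId, driveSearch) from a time-driven trigger, writing the found IDs to the sheet and retrying later for whatever is still in pending. It only tells you that a file has become searchable; it does not speed indexing up.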

Is there a way to force a specific folder on Drive to be indexed with priority?

AFAIK Google Drive doesn't include any way for end users to force a specific folder to be indexed with priority.

It's worth noting that, unlike in other file systems, folders in Google Drive are not tied to the physical location of files on Google's servers; they are just a way to help users organize files in a manner most people find comprehensible.

References

Quotes

From https://developers.google.com/drive/v3/web/file#saving_indexable_text

Saving indexable text

Drive automatically indexes documents for search when it recognizes the file type. This includes text documents, PDFs, images with text, and other common types. If your app saves other types of files (drawings, video, shortcuts), you can improve the discoverability by supplying indexable text.

Indexable text is indexed as HTML. If you save the indexable text string <section attribute="value1">Here's some text</section>, then "Here's some text" is indexed, but "value1" is not. Because of this, saving XML as indexable text isn't as useful as saving HTML. Also keep in mind:

- The size limit for contentHints.indexableText is 128K.
- Don't try to sort text in order of importance; the indexer does that very efficiently for you.
- Indexable text should be updated by your application with each save.
- Make sure the text is related to the content of the file. This last point may seem obvious, but it is quite important. It's not a good idea to add commonly searched terms to try and force a file to show up in search results. This can frustrate users, and might even motivate them to delete the file.
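For file types Drive does not index on its own, the quoted guidance could be applied roughly as follows. This is a sketch under stated assumptions: the 128K limit is interpreted as 128 * 1024 UTF-8 bytes, the helper names are hypothetical, and the final call assumes the Advanced Drive Service (v3) is enabled in the Apps Script project. PDFs like the ones in the question are indexed automatically, so this is not needed for them.

```javascript
// Assumption: the documented "128K" limit means 128 * 1024 UTF-8 bytes.
var MAX_INDEXABLE_BYTES = 128 * 1024;

// Count UTF-8 bytes without Node-only or Apps-Script-only helpers.
// Each %XX escape produced by encodeURIComponent stands for one byte;
// every unescaped character is a single-byte ASCII character.
function utf8ByteLength(text) {
  return encodeURIComponent(text).replace(/%[0-9A-F]{2}/g, 'x').length;
}

// Build the metadata body that carries the indexable text,
// rejecting text that exceeds the assumed size limit.
function buildIndexableTextResource(text) {
  if (utf8ByteLength(text) > MAX_INDEXABLE_BYTES) {
    throw new Error('Indexable text exceeds the 128K limit');
  }
  return { contentHints: { indexableText: text } };
}

// Apps Script only: send the metadata update for one file.
// Assumes the Advanced Drive Service (v3) is enabled as `Drive`.
function saveIndexableText(fileId, text) {
  return Drive.Files.update(buildIndexableTextResource(text), fileId);
}
```

Since the text is indexed as HTML, passing something like '<p>Quarterly floor plan, building B</p>' would make the words inside the element searchable while leaving any attribute values out of the index.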