.docx documents do not appear to be indexed correctly.
I used a unique string in a .docx, but the .docx is not returned when I search on "one".
For example, take the following text, which has a line break after "one":
"Here is the text for line one and here is the text for line two."
The IFilter extracts this as:
"Here is the text for line oneand here is the text for line two."
So when the IFilter parses the .docx, it deletes the line break separator and tries to parse "oneand here…".
So it seems that the Word IFilter for .docx concatenates the last word of a line with the first word of the next line.
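To show why the search on "one" comes back empty, here is a small Python sketch. The `tokens` function is a hypothetical stand-in for the indexer's word breaker (the real one is certainly more elaborate); it just splits on non-word characters.

```python
import re

def tokens(text):
    # Naive stand-in for a word breaker (an assumption, not the real indexer):
    # lowercase the text and collect runs of word characters.
    return set(re.findall(r"\w+", text.lower()))

# What the document actually contains (line break after "one").
original = "Here is the text for line one\nand here is the text for line two."
# What the IFilter hands to the indexer (line break dropped).
extracted = "Here is the text for line oneand here is the text for line two."

print("one" in tokens(original))      # True
print("one" in tokens(extracted))     # False
print("oneand" in tokens(extracted))  # True
```

With the separator gone, the only indexed term is "oneand", so a query for "one" has nothing to match.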
Can anyone give some ideas of how to get around this issue?
Thanks in advance.
Best Answer
OK, I figured this one out now. Basically, the 64-bit IFilter is not working correctly: it merges words that are separated by line breaks instead of carrying the breaks through. I used Ionic.Zip to open the .docx ZIP archive and parsed the important XML files using a slightly modified version of DocxToText. This works perfectly now.
Here is the modified code, originally created by Jevgenij Pankov:
Here is the usage of this code...
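
The same approach (open the .docx as a ZIP archive and read the XML directly) can also be sketched without the C# libraries. What follows is a rough Python illustration using only the standard library. It is not the DocxToText code, and the names `docx_to_text` and `paragraph_text` are made up for this example. It assumes the body text lives in `word/document.xml` and ignores headers, footers, footnotes, and nested text boxes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def paragraph_text(p):
    """Join the text runs of one <w:p>, keeping tabs and in-paragraph breaks."""
    parts = []
    for node in p.iter():
        if node.tag == f"{{{W}}}t":
            parts.append(node.text or "")
        elif node.tag == f"{{{W}}}tab":
            parts.append("\t")
        elif node.tag in (f"{{{W}}}br", f"{{{W}}}cr"):
            parts.append("\n")
    return "".join(parts)

def docx_to_text(path):
    """Return the body text of a .docx, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # Emit an explicit newline between paragraphs so the last word of one
    # line is never glued to the first word of the next.
    return "\n".join(paragraph_text(p) for p in root.iter(f"{{{W}}}p"))
```

Calling `docx_to_text("sample.docx")` (a placeholder file name) returns plain text in which "one" and "and" stay separate words, which can then be handed to the indexer instead of the IFilter output.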