Lucene 4.4: How to get term frequency over the whole index

I'm trying to compute the tf-idf value of each term in a document. To do so, I iterate through the terms in the document and want to find, for each term, its frequency in the whole corpus and the number of documents in which it appears. The following is my code:

// @param index path to the index directory
// @param docNbr the document number in the index
public void readingIndex(String index, int docNbr) throws IOException {
    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(index)));

    Document doc = reader.document(docNbr);         
    System.out.println("Processing file: "+doc.get("id"));

    Terms termVector = reader.getTermVector(docNbr, "contents");
    TermsEnum itr = termVector.iterator(null);
    BytesRef term = null;

    while ((term = itr.next()) != null) {               
        String termText = term.utf8ToString();                              
        long termFreq = itr.totalTermFreq();   // FIXME: this only returns the frequency in this doc
        long docCount = itr.docFreq();         // FIXME: docCount = 1 in all cases

        System.out.println("term: "+termText+", termFreq = "+termFreq+", docCount = "+docCount);   
    }            

    reader.close();     
}

Although the documentation says that totalTermFreq() returns the total number of occurrences of this term across all documents, in my tests it only returns the frequency of the term in the document given by docNbr, and docFreq() always returns 1.

How can I get frequency of a term across the whole index?

Update
Of course, I could build a map from each term to its frequency and then iterate through every document to count the total number of times each term occurs. However, I thought Lucene would have a built-in method for this purpose.
Thank you.
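
For reference, the manual counting approach mentioned in the update might look roughly like the sketch below. It is plain Java with no Lucene calls; the class name CorpusTermStats and its methods are hypothetical, and it assumes each document is already available as a list of tokenized terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: accumulates corpus-wide counts by hand.
final class CorpusTermStats {
    // term -> total number of occurrences across all documents added so far
    final Map<String, Long> totalTermFreq = new HashMap<>();
    // term -> number of documents that contain the term at least once
    final Map<String, Integer> docFreq = new HashMap<>();

    // Adds one document, given as its already-tokenized terms (an assumption of this sketch).
    void addDocument(List<String> terms) {
        Set<String> seenInThisDoc = new HashSet<>();
        for (String term : terms) {
            totalTermFreq.merge(term, 1L, Long::sum);
            // Count the document only the first time the term shows up in it.
            if (seenInThisDoc.add(term)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
    }
}
```

This needs a full pass over every document, which is why a built-in lookup on the index is preferable when one is available.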

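The TermsEnum obtained from a term vector only covers that single document, so its totalTermFreq() and docFreq() are relative to that one document. For index-wide statistics, ask the IndexReader instead. Inside the loop:

String termText = term.utf8ToString();
Term termInstance = new Term("contents", term);
long termFreq = reader.totalTermFreq(termInstance);   // occurrences across the whole index
long docCount = reader.docFreq(termInstance);         // number of documents containing the term

System.out.println("term: "+termText+", termFreq = "+termFreq+", docCount = "+docCount);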
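
Once the per-document frequency and the index-wide document count are both known, the tf-idf weight the question is after could be computed along these lines. This is only a sketch assuming the textbook formulation tf * ln(N / df), which is not necessarily the variant Lucene's own scoring uses; the TfIdf class and its weight method are hypothetical names, and numDocs is assumed to come from something like reader.numDocs().

```java
// Hypothetical helper, not part of Lucene: textbook tf-idf from raw counts.
final class TfIdf {
    private TfIdf() {}

    // tfInDoc: occurrences of the term in the current document
    // docFreq: number of documents in the index containing the term
    // numDocs: total number of documents in the index
    static double weight(long tfInDoc, long docFreq, long numDocs) {
        // Guard against missing terms and empty indexes to avoid division by zero.
        if (tfInDoc <= 0 || docFreq <= 0 || numDocs <= 0) {
            return 0.0;
        }
        return tfInDoc * Math.log((double) numDocs / docFreq);
    }
}
```

Under this formulation a term that appears in every document gets a weight of zero, since ln(1) = 0.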
