Python – Get random key:value pairs from a dictionary


I'm trying to pull out a random set of key-value pairs from a dictionary I made from a csv file. The dictionary contains information for genes, with the gene name being the dictionary key, and a list of numbers (related to gene expression etc.) being the value.

# python 2.7.5
import csv
import random

genes_csv = csv.reader(open('genes.csv', 'rb'))

genes_dict = {}
for row in genes_csv:
    genes_dict[row[0]] = row[1:]

length = raw_input('How many genes do you want? ')

for key in genes_dict:
    random_list = random.sample(genes_dict.items(), int(length))
    print random_list

The problem is, if I try to get a list of 100 genes (for example), it seems to iterate over the whole dictionary and return every possible combination of 100 genes.

Best Answer

To get K random elements from a dictionary D, simply use

import random
random.sample( D.items(), K )

and that's all you need.

From the Python documentation:

random.sample(population, k)

Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement.
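
As a minimal illustration of sampling without replacement, here is a sketch that assumes Python 3, where `dict.items()` returns a view that has to be converted to a list first. The toy dictionary `D` is made up for the example:

```python
import random

# Hypothetical toy dictionary standing in for D.
D = {'geneA': [1, 2], 'geneB': [3, 4], 'geneC': [5, 6], 'geneD': [7, 8]}
K = 2

# In Python 3, dict.items() is a view, not a sequence, so wrap it in list().
pairs = random.sample(list(D.items()), K)

# Each element is a (key, value) tuple, and no key appears twice.
keys = [key for key, _ in pairs]
print(pairs)
```

Because the sampling is without replacement, asking for more pairs than the dictionary holds raises a `ValueError`.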

In your case

import csv
import random

genes_csv = csv.reader(open('genes.csv', 'rb'))

genes_dict = {}
for row in genes_csv:
    genes_dict[row[0]] = row[1:]

length = raw_input('How many genes do you want? ')
random_list = random.sample( genes_dict.items(), int(length) )
print random_list
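
For readers on Python 3, the same approach might look like the sketch below. The function name `sample_genes` and the in-memory CSV data are assumptions for illustration only; the original script reads `genes.csv` and prompts with `raw_input`.

```python
import csv
import io
import random

def sample_genes(csv_file, k):
    """Return k random (gene, values) pairs from an open CSV file object."""
    genes_dict = {}
    for row in csv.reader(csv_file):
        if row:  # skip blank lines
            genes_dict[row[0]] = row[1:]
    # list() is needed in Python 3 because items() returns a view.
    return random.sample(list(genes_dict.items()), k)

# Made-up data standing in for genes.csv.
fake_csv = io.StringIO("geneA,1.0,2.0\ngeneB,3.5,4.1\ngeneC,0.2,9.9\n")
random_list = sample_genes(fake_csv, 2)
print(random_list)
```

With a real file, the function could be called as `sample_genes(f, int(input(...)))` inside a `with open('genes.csv', newline='') as f:` block, which also closes the file when done.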

There is no need to iterate through all the keys of the dictionary, as your original loop does:

for key in genes_dict:
    random_list = random.sample(genes_dict.items(), int(length))
    print random_list

Notice that you are not actually using the key variable inside your loop, which should warn you that something may be wrong. However, it is not true that the loop "returns every possible combination of 100 genes". It simply prints N random lists of k genes each (k = 100 in your case), where N is the size of the dictionary. That is far from "all combinations", of which there are N!/((N-k)! k!).
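
To make the difference concrete, here is a small sketch comparing the number of lists the loop produces with the number of possible combinations. It assumes Python 3.8+ for `math.comb` and uses a made-up dictionary of 5 entries:

```python
import math
import random

# Hypothetical dictionary with N = 5 entries; k = 2 genes per sample.
genes_dict = {'g%d' % i: [i] for i in range(5)}
k = 2

# The original loop draws one fresh random sample per key: N samples in total.
samples = [random.sample(list(genes_dict.items()), k) for _ in genes_dict]
n_samples = len(samples)

# Number of distinct k-element combinations: N! / ((N - k)! * k!).
n_combinations = math.comb(len(genes_dict), k)

print(n_samples, n_combinations)  # 5 10
```

Even in this tiny case the loop produces 5 samples while 10 combinations exist, and the gap grows very quickly with N and k.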