Python csv copy column

csvpython

I have a file containing following

first_name,last_name,uid,email,dep_code,dep_name
john,smith,jsmith,jsmith@gmail.com,finance,21230
john,king,jking,jjing@gmail.com,human resource,31230

I want to copy column "email" and create a new column "email2" and then replace gmail.com from column email2 to hotmail.com

I'm new to python so need help from experts, I tried few script, but if there is a better way to do it then please let me know. The original file contains 60000 rows.

with open('c:\\Python27\\scripts\\colnewfile.csv', 'rb') as fp_in1, open('c:\\Python27\\scripts\\final.csv', 'wb') as fp_out1:
    writer1 = csv.writer(fp_out1, delimiter=",")
    reader1 = csv.reader(fp_in1, delimiter=",")
    domain = "@hotmail.com"
    for row in reader1:
        if row[2:3] == "uid":
            writer1.append("Email2")
        else:
            writer1.writerow(row+[row[2:3]])

Here is the final script, only problem is that it does not complete the entire outfile, it only show 61409 rows, whereas in the input file there are 61438 rows.

inFile = 'c:\Python27\scripts\in-093013.csv'
outFile = 'c:\Python27\scripts\final.csv'

with open(inFile, 'rb') as fp_in1, open(outFile, 'wb') as fp_out1:
writer = csv.writer(fp_out1, delimiter=",")
reader = csv.reader(fp_in1, delimiter=",")
for col in reader:
del col[6:]
writer.writerow(col)
headers = next(reader)
writer.writerow(headers + ['email2'])
for row in reader:
if len(row) > 3:
email = email.split('@', 1)[0] + '@hotmail.com'
writer.writerow(row + [email])

Best Answer

If you call next() on the reader you get one row at at a time; use that to copy over the headers. Copying the email column is easy enough:

import csv

infilename = r'c:\Python27\scripts\colnewfile.csv'
outfilename = r'c:\Python27\scripts\final.csv'

with open(infilename, 'rb') as fp_in, open(outfilename, 'wb') as fp_out:
    reader = csv.reader(fp_in, delimiter=",")
    headers = next(reader)  # read first row

    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow(headers + ['email2'])

    for row in reader:
        if len(row) > 3:
            # make sure there are at least 4 columns
            email = row[3].split('@', 1)[0] + '@hotmail.com'
        writer.writerow(row + [email])

This code splits the email address on the first @ sign, takes the first part of the split and adds @hotmail.com after it:

>>> 'example@gmail.com'.split('@', 1)[0]
'example'
>>> 'example@gmail.com'.split('@', 1)[0] + '@hotmail.com'
'example@hotmail.com'

The above produces:

first_name,last_name,uid,email,dep_code,dep_name,email2
john,smith,jsmith,jsmith@gmail.com,finance,21230,jsmith@hotmail.com
john,king,jking,jjing@gmail.com,human resource,31230,jjing@hotmail.com

for your sample input.