I'm trying to compare two CSV files, mark the differences between them, and write the result to an output file. However, my code seems to pick up only the last row of sample1.csv and sample2.csv, as you can see below:
Sample1.csv
Planet,Account,Name,Station,City
Earth,1234,Pete,Nebula,Phoenix
Earth,1234,Pete,Nebula,Phoenix
Earth,1234,Pete,Nebula,Phoenix
Sample2.csv
Planet,Account,Name,Station,City
Earth,1234,Pete,Nebula,Wakanda
Earth,1234,Pete,Nebula,Montgomery
Earth,1234,Pete,Nebula,Carlo
Current Output
History,Planet,Account,Name,Station,City
Changed,Earth,1234,Pete,Nebula,Carlo
Expected Output
History,Planet,Account,Name,Station,City
Changed,Earth,1234,Pete,Nebula,Wakanda
Changed,Earth,1234,Pete,Nebula,Montgomery
Changed,Earth,1234,Pete,Nebula,Carlo
Here is the code I have:
import csv

with open('old.csv', newline='') as f_old:
    csv_old = csv.reader(f_old, delimiter=',')
    header = next(csv_old)
    old_data = {row[0]: row for row in csv_old}

with open('new.csv', newline='') as f_new:
    csv_new = csv.reader(f_new, delimiter=',')
    header = next(csv_new)
    new_data = {row[0]: row for row in csv_new}

set_new_data = set(new_data)
set_old_data = set(old_data)

added = [['Added'] + new_data[v] for v in set_new_data - set_old_data]
deleted = [['Deleted'] + old_data[v] for v in set_old_data - set_new_data]

in_both = set_old_data & set_new_data
changed = [['Changed'] + new_data[v] for v in in_both if old_data[v] != new_data[v]]
print(changed)

with open('difference.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output, delimiter=',')
    csv_output.writerow(['History'] + header)
    csv_output.writerows(sorted(added + deleted + changed, key=lambda x: x[1:]))
Does anyone know how to get the expected output? Any help is appreciated. Thanks!
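
A likely cause, judging from the code above: both dictionaries are keyed on row[0], which is "Earth" for every row in the samples, so each row overwrites the previous one and only the last row survives. Below is a minimal sketch of one way to get the expected output, assuming the rows of the two files correspond by position (first row with first row, and so on). The helper names read_rows and diff_rows are made up for illustration, and the sample data is held in strings rather than read from files so the snippet runs on its own.

```python
import csv
import io
import sys
from itertools import zip_longest

# Sample data from the question, kept in memory for a self-contained run.
OLD_TEXT = """Planet,Account,Name,Station,City
Earth,1234,Pete,Nebula,Phoenix
Earth,1234,Pete,Nebula,Phoenix
Earth,1234,Pete,Nebula,Phoenix
"""

NEW_TEXT = """Planet,Account,Name,Station,City
Earth,1234,Pete,Nebula,Wakanda
Earth,1234,Pete,Nebula,Montgomery
Earth,1234,Pete,Nebula,Carlo
"""


def read_rows(text):
    """Hypothetical helper: return (header, data rows) parsed from CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)


def diff_rows(old_rows, new_rows):
    """Hypothetical helper: compare rows by position, not by a key column."""
    result = []
    for old, new in zip_longest(old_rows, new_rows):
        if old is None:
            # Row exists only in the new file.
            result.append(['Added'] + new)
        elif new is None:
            # Row exists only in the old file.
            result.append(['Deleted'] + old)
        elif old != new:
            result.append(['Changed'] + new)
    return result


header, old_rows = read_rows(OLD_TEXT)
_, new_rows = read_rows(NEW_TEXT)
changes = diff_rows(old_rows, new_rows)

# Write to stdout here; a file opened with newline='' would work the same way.
writer = csv.writer(sys.stdout)
writer.writerow(['History'] + header)
writer.writerows(changes)
```

If the rows should instead be matched on a unique identifier, the dictionary approach from the question still works, provided the key column (or combination of columns) is different for every row.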