We get files from a Unix system whose records are delimited with a line feed only; that part is not a problem. The problem is that some of the fields themselves contain carriage return/line feed pairs ("\r\n"). Anything that reads the file using the OS's line handling sees a row like this and stops reading at the CRLF.
I am trying to figure out how to A) read to the end of the record regardless of CRLFs, B) remove the CRLF from any field that contains one, and C) write the row back out in a proper format that Windows (and SQL) can use, replacing the record-ending newlines with CRLFs.
This is the code I'm using:
import csv
class MyDialect(csv.excel):
    lineterminator = "\n"
csv.register_dialect("myDialect", MyDialect)
cr = csv.reader(open("data.csv","rb"), dialect = "myDialect")
cw = csv.writer(open("clean_data.csv", "wb"))
crlf = '\r\n'
for row in cr:
    for col in row:
        if crlf in col:
            #col.replace("\r\n", "") <-- didn't work
            col = col.rstrip()
    cw.writerow(row)
print "Finished"
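A note on the commented-out line, as a minimal sketch rather than a full fix: Python strings are immutable, so `replace()` returns a new string, and rebinding `col` inside the loop never updates `row`. The sample `row` below is made up for illustration.

```python
# Hypothetical sample row with a CRLF embedded in the second field.
row = ["a", "line one\r\nline two", "b"]

# replace() returns a new string; assigning it to col only rebinds the
# loop variable, so the list itself is left untouched.
for col in row:
    col = col.replace("\r\n", "")
unchanged = list(row)

# Building a new list keeps the cleaned values.
cleaned_row = [col.replace("\r\n", "") for col in row]
```

Writing `cleaned_row` instead of `row` is what would carry the change through to the output.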
I also tried passing delimiter = '\n', without any luck. Is there any way to get Python to ignore CRLFs altogether?
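One possible way to cover steps A to C, sketched in Python 3 rather than the Python 2 used above. It relies on the assumption stated in the question that real records end with a bare "\n", so every "\r\n" pair must sit inside a field and can be removed before parsing. The `csv` reader always treats "\r" or "\n" as end of line and ignores `lineterminator`, which is why the custom dialect has no effect on reading. The function names `clean_embedded_crlf` and `clean_file` and the sample data are made up for illustration, and the whole file is read into memory, which may not suit very large inputs.

```python
import csv
import io


def clean_embedded_crlf(raw_text, replacement=""):
    """Remove CRLF pairs inside fields and return CSV text with CRLF row endings."""
    # Assumption: records end with a bare "\n", so any "\r\n" is field content.
    flattened = raw_text.replace("\r\n", replacement)
    # newline="" keeps the io layer from translating line endings itself.
    reader = csv.reader(io.StringIO(flattened, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


def clean_file(src_path, dst_path):
    """Apply clean_embedded_crlf to a file on disk (reads it all at once)."""
    with open(src_path, newline="") as src:
        cleaned = clean_embedded_crlf(src.read())
    with open(dst_path, "w", newline="") as dst:
        dst.write(cleaned)


# Hypothetical sample: LF-terminated records, one field containing a CRLF.
sample = 'id,note\n1,"first line\r\nsecond line"\n2,plain\n'
cleaned = clean_embedded_crlf(sample)
```

With the file names from the question, the call would be `clean_file("data.csv", "clean_data.csv")`. Passing `replacement=" "` keeps a space where the line break used to be, if running the two halves together is not wanted.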