Let’s imagine that you want to read CSV file, change it and write it back. Here is simple code:
1#!/usr/bin/env python
2# -*- coding: utf-8 -*-
3
4import csv
5
6
7csv_options = {
8 'delimiter': ',',
9 'doublequote': False,
10 'escapechar': '\\',
11 'lineterminator': '\n',
12 'quotechar': '"',
13 'quoting': csv.QUOTE_ALL,
14}
15
16with open('infile.csv', 'r') as infile:
17 with open('outfile.csv', 'w') as outfile:
18 incsv = csv.reader(infile, **csv_options)
19 outcsv = csv.writer(outfile, **csv_options)
20
21 for row in incsv:
22 print("Row is: {}".format(row))
23 for value in row:
24 print("Value is: {}".format(value))
25 outcsv.writerow(row)
What can go wrong with it? Let’s run this code:
1soar@localhost$ cat infile.csv && ./test.py && cat outfile.csv
2value1,"value 2","value with \" (escaped quote)","value with escaped escape char 3\\"
3
4Row is: ['value1', 'value 2', 'value with " (escaped quote)', 'value with escaped escape char 3\\']
5Value is: value1
6Value is: value 2
7Value is: value with " (escaped quote)
8Value is: value with escaped escape char 3\
9
10"value1","value 2","value with \" (escaped quote)","value with escaped escape char 3\"
As you can see - escaped quote character (\"
) is escaped properly in output. But escaped escape character (\\
) become escaped quote character (\"
) and it makes CSV file invalid, so you even can’t process it once more with same script.
This is bug in Python CSV module and it present in fresh Python 2.7 and up to 3.6. And this bug is 6-years old, you can find bug report here.
What you can do? Simpliest workaround is to replace any occurencies of \\
before writing back:
1#!/usr/bin/env python
2# -*- coding: utf-8 -*-
3
4import csv
5
6
7csv_options = {
8 'delimiter': ',',
9 'doublequote': False,
10 'escapechar': '\\',
11 'lineterminator': '\n',
12 'quotechar': '"',
13 'quoting': csv.QUOTE_ALL,
14}
15
16with open('infile.csv', 'r') as infile:
17 with open('outfile.csv', 'w') as outfile:
18 incsv = csv.reader(infile, **csv_options)
19 outcsv = csv.writer(outfile, **csv_options)
20
21 for row in incsv:
22 print("Row is: {}".format(row))
23 for idx, value in enumerate(row):
24 print("Value is: {}".format(value))
25 row[idx] = value.replace('\\', '\\\\')
26 outcsv.writerow(row)
And try it:
1soar@localhost$ cat infile.csv && ./test.py && cat outfile.csv
2value1,"value 2","value with \" (escaped quote)","value with escaped escape char 3\\"
3
4Row is: ['value1', 'value 2', 'value with " (escaped quote)', 'value with escaped escape char 3\\']
5Value is: value1
6Value is: value 2
7Value is: value with " (escaped quote)
8Value is: value with escaped escape char 3\
9
10"value1","value 2","value with \" (escaped quote)","value with escaped escape char 3\\"