31 октября 2017 г. 14:19
Написал soar

6-years old issue in Python CSV module

Let's imagine that you want to read CSV file, change it and write it back. Here is simple code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv


csv_options = {
    'delimiter': ',',
    'doublequote': False,
    'escapechar': '\\',
    'lineterminator': '\n',
    'quotechar': '"',
    'quoting': csv.QUOTE_ALL,
}

with open('infile.csv', 'r') as infile:
    with open('outfile.csv', 'w') as outfile:
        incsv = csv.reader(infile, **csv_options)
        outcsv = csv.writer(outfile, **csv_options)

        for row in incsv:
            print("Row is: {}".format(row))
            for value in row:
                print("Value is: {}".format(value))
            outcsv.writerow(row)

What can go wrong with it? Let's run this code:

[email protected]$ cat infile.csv && ./test.py && cat outfile.csv
value1,"value 2","value with \" (escaped quote)","value with escaped escape char 3\\"

Row is: ['value1', 'value 2', 'value with " (escaped quote)', 'value with escaped escape char 3\\']
Value is: value1
Value is: value 2
Value is: value with " (escaped quote)
Value is: value with escaped escape char 3\

"value1","value 2","value with \" (escaped quote)","value with escaped escape char 3\"

As you can see - escaped quote character (\") is escaped properly in output. But escaped escape character (\\) become escaped quote character (\") and it makes CSV file invalid, so you even can't process it once more with same script.

This is bug in Python CSV module and it present in fresh Python 2.7 and up to 3.6. And this bug is 6-years old, you can find bug report here.

What you can do? Simpliest workaround is to replace any occurencies of \\ before writing back:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv


csv_options = {
    'delimiter': ',',
    'doublequote': False,
    'escapechar': '\\',
    'lineterminator': '\n',
    'quotechar': '"',
    'quoting': csv.QUOTE_ALL,
}

with open('infile.csv', 'r') as infile:
    with open('outfile.csv', 'w') as outfile:
        incsv = csv.reader(infile, **csv_options)
        outcsv = csv.writer(outfile, **csv_options)

        for row in incsv:
            print("Row is: {}".format(row))
            for idx, value in enumerate(row):
                print("Value is: {}".format(value))
                row[idx] = value.replace('\\', '\\\\')
            outcsv.writerow(row)

And try it:

[email protected]$ cat infile.csv && ./test.py && cat outfile.csv
value1,"value 2","value with \" (escaped quote)","value with escaped escape char 3\\"

Row is: ['value1', 'value 2', 'value with " (escaped quote)', 'value with escaped escape char 3\\']
Value is: value1
Value is: value 2
Value is: value with " (escaped quote)
Value is: value with escaped escape char 3\

"value1","value 2","value with \" (escaped quote)","value with escaped escape char 3\\"

Комментарии