It'll work until someone upstream upgrades their CSV writing library and your pr...

nightski · on Aug 18, 2021

So you can constrain what type of CSV you will allow, and if this happens, it will bail. It's that simple. There is nothing wrong with having additional constraints on top of just saying it must be "CSV", especially in these scenarios.

I'm in a similar situation: we've been using CSV for over a decade to move billions of dollars' worth of product each year. It just works.
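The "constrain and bail" approach above can be sketched with Python's standard `csv` module. This is a minimal illustration, not the commenter's code: the column names, `EXPECTED_HEADER`, `CsvContractError`, and `read_strict` are all hypothetical. The idea is to pin the dialect (comma delimiter, double-quote quoting, `strict=True` so malformed quoting raises) and the exact header and field count, and to refuse the whole file on any deviation.

```python
import csv
import io

# Hypothetical contract agreed with the sender (illustrative column names).
EXPECTED_HEADER = ["sku", "quantity", "price"]


class CsvContractError(ValueError):
    """Hypothetical error raised when a file breaks the agreed CSV contract."""


def read_strict(text: str) -> list[dict[str, str]]:
    # strict=True makes the csv module raise csv.Error on malformed quoting
    # instead of silently guessing what the writer meant.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=",", quotechar='"', strict=True)
    try:
        rows = list(reader)
    except csv.Error as exc:
        raise CsvContractError(f"malformed CSV: {exc}") from exc

    # A changed header (renamed column, different delimiter) fails fast.
    if not rows or rows[0] != EXPECTED_HEADER:
        raise CsvContractError(f"unexpected header: {rows[0] if rows else None}")

    records = []
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            raise CsvContractError(
                f"record {record_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}")
        records.append(dict(zip(EXPECTED_HEADER, row)))
    return records
```

For example, a writer that switches to semicolons produces a one-column header, so `read_strict` raises instead of loading garbage.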

turtlebits · on Aug 18, 2021

I'm pretty sure most devs are going to use whatever CSV library that comes with their language. When that breaks, it's generally not a simple fix.

kcartlidge · on Aug 18, 2021

> I'm pretty sure most devs are going to use whatever CSV library that comes with their language. When that breaks, it's generally not a simple fix.

Call me a yak-shaver, but in every language I've worked with I've written my own csv parsing library when I needed one.

It's such a trivial thing for the majority of cases (varying delimiters, line endings, ASCII/UTF-8, quoting, escaping, and embedded delimiters/line endings) that it takes almost no time at all after you've done it once in another language. Of course there are edge cases and special cases depending on specific workloads, but if your team has its own parser (which is a small amount of obvious code), then it does indeed usually become a simple fix.

Using someone else's library sounds good, but below a certain complexity it's rarely worth it in the medium to long term, except for proof-of-concept or demo code, or when the problem domain is complex.
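To give a sense of how small the "majority cases" parser described above can be, here is a minimal Python sketch of a hand-rolled, RFC 4180-style state machine. It is an illustration, not kcartlidge's code; the function name `parse_csv` is hypothetical. It handles a configurable delimiter and quote character, doubled-quote escaping, delimiters and newlines embedded in quoted fields, and LF or CRLF record endings. Text encoding (ASCII/UTF-8) is assumed to be handled by decoding the bytes before calling it.

```python
def parse_csv(text: str, delimiter: str = ",", quote: str = '"') -> list[list[str]]:
    """Hypothetical minimal parser: quoted fields, "" escapes,
    embedded delimiters/newlines, and LF or CRLF record endings."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # "" inside quotes is a literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(c)  # delimiters and newlines are literal here
        elif c == quote:
            in_quotes = True
        elif c == delimiter:
            row.append("".join(field))
            field = []
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single record break
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(c)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    if field or row:  # last record without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows
```

Because the whole parser fits on one screen, adapting it to a customer's quirk (say, a backslash escape) is a local change rather than a fork of a third-party library, which is the trade-off the comment is arguing for.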

magicalhippo · on Aug 18, 2021

We've got at least a few dozen customer integrations that parse CSV-ish files, and they all have a custom parser. Many of these have been chugging for over a decade, sending "mission critical" data back and forth.

It's dead simple to whip up, and we can easily tweak it to whatever the customer's software spits out, like one field suddenly being UTF-8 encoded in an otherwise Windows-1252 file.
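The mixed-encoding tweak mentioned above can be handled per field. A minimal Python sketch, assuming a simple unquoted, comma-delimited format (`decode_field` and `split_line` are hypothetical names): split the raw bytes first, then decode each field as UTF-8 and fall back to Windows-1252 (`cp1252`). Splitting before decoding is safe here because both encodings leave ASCII bytes unchanged, and UTF-8 multi-byte sequences never contain ASCII bytes such as `,`.

```python
def decode_field(raw: bytes) -> str:
    """Decode one field that is usually Windows-1252 but may be UTF-8."""
    try:
        # Try UTF-8 first: accented Windows-1252 text is rarely valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined; replace them with
        # U+FFFD rather than failing the whole file.
        return raw.decode("cp1252", errors="replace")


def split_line(line: bytes, delimiter: bytes = b",") -> list[str]:
    # Assumes no quoted fields: the delimiter never appears inside a value.
    return [decode_field(f) for f in line.rstrip(b"\r\n").split(delimiter)]
```

This is a heuristic: a short Windows-1252 field whose bytes happen to form valid UTF-8 (for example `Ã©`) will be misread as UTF-8, which is why this kind of fix is usually scoped to the one customer and field known to misbehave.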

nojito · on Aug 18, 2021

That’s a good thing.

We validate on ingestion and if there are changes upstream we can immediately triage without polluting our data warehouse.
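One way to make "validate on ingestion, then triage" concrete is to diff each incoming file's header against the expected schema and quarantine the file instead of loading it when anything differs. A minimal Python sketch; `SchemaReport` and `check_header` are hypothetical names, not nojito's pipeline.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SchemaReport:
    """Hypothetical triage report comparing a file header to the expected one."""
    missing: list[str] = field(default_factory=list)
    unexpected: list[str] = field(default_factory=list)
    reordered: bool = False

    @property
    def ok(self) -> bool:
        return not (self.missing or self.unexpected or self.reordered)


def check_header(text: str, expected: list[str]) -> SchemaReport:
    # Read only the first record; an empty file yields an empty header.
    header = next(csv.reader(io.StringIO(text, newline="")), [])
    report = SchemaReport(
        missing=[c for c in expected if c not in header],
        unexpected=[c for c in header if c not in expected],
    )
    # Same columns in a different order still breaks positional loaders.
    report.reordered = not report.missing and not report.unexpected and header != expected
    return report
```

A loader would call `check_header` first and, when `report.ok` is false, move the file aside and alert someone with the `missing`/`unexpected` lists, so the upstream change is visible before anything reaches the warehouse.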

proverbialbunny · on Aug 18, 2021

The same argument could be made for all other data formats.

CSV is like the C of data formats. It's incredibly stable yet simple enough you can make your own variant if you need to.