
Idk, I'm pretty good at exporting to CSV.


Dunno why you're getting hammered; this is part of a good Excel workflow. Do what you can using the nice GUI, and if you have bulk changes to make that are easier to script than to do with find-and-replace, then dump it to CSV, do the grunt work with some command-line tools or a script, and re-import it.
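For instance, a minimal sketch of that scripted middle step using Python's csv module (the column name and the change are made up for illustration):

```python
import csv, io

# Hypothetical bulk change: uppercase a "status" column in an
# Excel CSV export -- easier to script than to find-and-replace.
exported = "id,status\n1,open\n2,closed\n"   # stand-in for export.csv

reader = csv.DictReader(io.StringIO(exported))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
    row["status"] = row["status"].upper()
    writer.writerow(row)

fixed = out.getvalue()   # this is what you'd re-import into Excel
```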


While this is a fine workflow if you're more familiar with command line tools than Excel, bulk changes are usually easily doable in Excel with judicious use of cell formulas and copy-pasting of values.

On a lark, I did a couple dozen Project Euler problems in Excel a couple of years ago and found it was actually quicker and easier than doing them in a 'real' programming language; at least part of that is that loops and recursions are easy to visualize and debug in Excel, because you can just break intermediate calculations into cells.


I have never seen anyone with an Excel workflow that involves exporting to CSV that does not immediately mojibake every non-ASCII character.

Yeah, I know there's a well-hidden option to export in tab-separated UTF-16. Nobody uses it.

So I guess you're fine if your data is just numbers, and if it's not labeled with words, names, or places.
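For what it's worth, that tab-separated UTF-16 export is easy to consume once you know what it is; a sketch in Python, with a byte string standing in for the file:

```python
import csv, io

# Excel's "Unicode Text" export is UTF-16 with tab delimiters; the
# utf-16 codec consumes the BOM, so the non-ASCII text survives.
raw = "name\tcity\nJosé\tZürich\n".encode("utf-16")  # stand-in file

text = raw.decode("utf-16")
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
```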


Excel also messes with CSV values that look like integers but aren't (think serial numbers or IDs), leading to many frustrating days of support.


Not to mention the famous mangling of gene names that look like dates. Yes, I know scientists shouldn't be using Excel -- I agree. But I'm a computational biologist. Experimentalists know how to use exactly three programs: 1) Microsoft Word, 2) Microsoft Excel, 3) Microsoft PowerPoint.
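One commonly suggested defence when generating CSV for Excel (a sketch, not a cure): wrap risky fields in a ="..." formula so Excel keeps them as text instead of guessing dates.

```python
import csv, io

# MARCH1, SEPT2, DEC1 are gene symbols Excel happily turns into
# dates; writing them as ="..." formulas forces text on import.
genes = ["MARCH1", "SEPT2", "DEC1"]

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["gene"])
for g in genes:
    writer.writerow([f'="{g}"'])

payload = out.getvalue()
```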


I find myself working with four-letter codes to identify wards and facilities at my work, and inevitably I have to compare my SQL output to someone's spreadsheet. No, Excel, I didn't mean March 02, just give me MAR2 like I asked. Drug codes that start with a '.' are another headache; they get auto-formatted as decimal numbers. Worse is when they want to put info from a spreadsheet into the database - no one understands the need for consistent field formatting :'(


I feel your pain. We have a lot of ID values with leading zeroes, which Excel loves to strip, and data with numbers in parentheses, which Excel's CSV loader thinks should be negative numbers (this is apparently common in book-keeping).

I keep getting sent spreadsheets which were made from CSV data by people who didn't know how to set the columns to text - to be fair, if the file extension is .csv then Excel won't even let them set the column formats - and having this latter category turn up as -ve numbers not only looks weird but ruins the sorting. Even 1, 11, 2 is better than sorting "backwards".
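If the damage is already done and the IDs have a known fixed width, the zeroes can at least be restored afterwards; a trivial sketch (the width of 8 is made up):

```python
# Excel stripped the leading zeroes; re-pad to the known width.
mangled = ["1234", "98765", "7"]        # were 00001234, 00098765, ...
restored = [s.zfill(8) for s in mangled]
```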


Have you tried bringing it in with Power Query instead? Much more flexible.


I haven't tried pulling Excel -> SQL with PowerQuery yet, only really dabbled in presenting data the other way. Do you know if this works easily with a document that utilises a lot of macros and VBA? Or are we still talking flat CSV.


What are you saying: is Excel with VBA importing/consuming the data, or is it the data source for something else?

I assume Excel with VBA is importing/consuming the data, in which case Power Query / Power Pivot would likely solve most of your problems. As for refactoring a large, complicated existing spreadsheet to use PQ rather than your existing process - hard to say how tough that would be.


I was reading a paper about how Excel gets some standard statistical tests wrong too - the US Stats Association, I think, did a compare/contrast with a range of packages. Some of these bugs have been there since 2002 or so. I lived in Excel for many years but wasn't a power user - I got to like it, but sadly it got used for everything, even when it really, really shouldn't have.


I read an article on that, but it clearly mentioned that this was an issue with older versions of Excel (Excel 2003 and 2007, IIRC).


This link seems like a quality source, and you're right - it seems the Excel team took it seriously in... 2010. That's still pretty bad!

http://www.practicalstats.com/xlsstats/excelstats.html


There is also a pretty gnarly bug in Excel when it tries to import a CSV with the ASCII string "ID" as the first two characters.

http://superuser.com/questions/210027/why-does-excel-think-c...

Very frustrating.
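The misdetection only fires on a literal, unquoted ID at the very start of the file (Excel mistakes it for a SYLK file); quoting the header is one commonly suggested workaround - a sketch:

```python
import csv, io

# Quote every field so the file starts with '"ID"' rather than 'ID',
# which avoids Excel's SYLK-format misdetection on import.
out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_ALL)
writer.writerow(["ID", "Name"])
writer.writerow(["1", "Ada"])

payload = out.getvalue()
```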


Hacker news discussion on that bug:

https://news.ycombinator.com/item?id=12041210


Oh this was fun.

Excel has issues with numbers longer than 15 significant digits, and can't distinguish a serial number in a CSV from an integer.

Importing a small (30k rows or so) set of tax information thus became hell.

Worst part is that I didn't find it until halfway through the migration. I love Excel on average, but that was just a painful fight.
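The underlying cause: Excel stores every number as an IEEE-754 double, which carries only about 15 significant decimal digits. The same loss is easy to demonstrate with a Python float:

```python
# A 16-digit serial number does not survive a round-trip through a
# double -- which is what happens when Excel imports it as a number
# instead of text.
serial = "9007199254740993"             # 16 digits: 2**53 + 1

lost = int(float(serial))               # rounds to the nearest double
survived = serial                       # keeping it a string is the fix
```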


Often, you can strip the BOM (which isn't supposed to be there to begin with), convert from Excel's legacy character set (usually Windows-1252) to UTF-8, and translate the line endings, and things may work - if it hasn't decided to randomly change things that weren't dates into dates, strip leading 0's from your zip codes, reformat things that you didn't want reformatted, and otherwise mangle your data. (Never mind that Excel can't handle dates properly anyway.)
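A sketch of that cleanup in Python (the byte string is a hypothetical legacy export; real files vary):

```python
# Strip a BOM if present, decode the legacy Windows-1252 ("ANSI")
# encoding otherwise, normalise CRLF line endings, re-encode UTF-8.
raw = b"name,caf\xe9\r\nnote,\x93quoted\x94\r\n"   # cp1252 bytes

if raw.startswith(b"\xef\xbb\xbf"):
    text = raw.decode("utf-8-sig")      # BOM'd UTF-8 export
else:
    text = raw.decode("cp1252")         # legacy "ANSI" export
clean = text.replace("\r\n", "\n").encode("utf-8")
```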

What's worse is that many people play with Excel and use things like coloring, fonts, and visual layout to distinguish things, and/or combine multiple unrelated chunks into one sheet so that they won't have to switch tabs (which of course makes for a scrambled mess of data). I try to suppress the memories of it taking 36 tries to get a usable CSV from business people (after having given clear instructions in advance, and doing postprocessing afterward to clean it up).

Excel should never be used for working with data. But it is, so that's job security if you can clean up the messes that it makes.


> So I guess you're fine if your data is just numbers, and if it's not labeled with words, names, or places.

Whoops, busted. I very rarely deal with localised data so I can get away with ASCII. Excellent point, though, my case doesn't generalise as well as I thought it did.


I use LibreOffice just for this purpose, i.e. creating CSV files from XLS files: open the Excel file in LibreOffice, then Save as text >> UTF-8 >> quote all text cells. Works perfectly every time. Excel is brilliant for most things but is useless with CSV; all kinds of strange bugs will arise.


Exporting from and importing to Excel via CSV can be a life-saver sometimes. Once upon a time, I had to work on a 200k-row by 50-column Excel 2007 spreadsheet on an IBM Core Duo laptop with only 1GB of RAM and an XGA screen. It would take ten minutes to apply a single filter. Writing VBA to export and then letting Cygwin do my work - curling, regexing, and sorting - was the only option.


> 200k by 50 line Excel 2007 spreadsheet on an IBM Core Duo laptop with only 1GB of RAM and XGA screen. It would take ten minutes to apply a single filter.

This is implausible. I've worked with far bigger datasets, and far more complex calculations, and it didn't take minutes - several seconds, maybe.


I'm talking about a filter, not calculations. And I'm so glad you mystically decided that your calculations were much more complex than whatever I was doing.


"1 GB RAM"


That's the key point here. We recently upgraded all computers to at least 16GB of RAM solely because of the huge Excel spreadsheets that we use.


Actually, one of the safest options I have found to import/export from Excel without the encoding going wild is to just copy/paste TSV data and treat it as UTF-8 in your programming language of choice.
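The clipboard data Excel produces is plain tab-separated text, so the parsing side is trivial; a sketch with a string standing in for the pasted contents:

```python
import csv, io

# Excel puts TSV on the clipboard; parsing it directly sidesteps
# the CSV encoding dance entirely (leading zeroes survive too).
pasted = "id\tname\n001\tJosé\n"        # stand-in for the clipboard

rows = list(csv.reader(io.StringIO(pasted), delimiter="\t"))
```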


Spreadsheet programs are great for data entry if we take care beforehand to design the data structures well, so that they export easily and can be readily analyzed by a computer. Easy stuff like small org budgets can be done with spreadsheets, but mass data analysis simply requires industrial-grade tools.



