Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Partly because a bad workman chooses poor tools to begin with. Tools do matter. It's easier to find a mistake in say, an SQL query or some R code than it is to find a mistake in an Excel spreadsheet, where you are trying to catch the difference between SUM(A3:A12) and SUM(A3:A10) in a thousand different cells.


> where you are trying to catch the difference between SUM(A3:A12) and SUM(A3:A10) in a thousand different cells.

Excel does a pretty good job in highlighting which cells are selected when you edit a formula, and it does a pretty good job of maintaining the meaning of the formula under sheet transformations. For example, if you inserted a row between rows 8 and 9 then the two formulae would be =SUM(A3:A13) and =SUM(A3:A11) respectively.

This may have been a genuine error, but the two researchers definitely started the process with a goal in mind, and when the results agreed with their goals they didn't bother to check.


This may have been a genuine error, but the two researchers definitely started the process with a goal in mind...

What would that goal be? According to Megan Mcardle, Rogoff was a mild proponent of stimulus. For example, he said this in 2012:

"Back in 2008-9, there was a reasonable chance, maybe 20% that we’d end up in another Great Depression. Spending a trillion dollars is nothing to knock that off the table."

http://www.thedailybeast.com/articles/2013/04/17/did-reinhar...


Yes, it does highlight the cells. But if a mistake does somehow make it in, it is still exceptionally tedious to find the error. It is easy enough to expand a formula for additional cells but accidentally miss a cell, leaving it with the old formula. If you were using SQL, R, or Python, this class of error would never happen.


I guess it is just that to do a proper audit you need to have your logic on one page. Otherwise it is extreemly hard to follow the code. In excel every formula is sitting essentially on it's own page. It is easy to code this way but hard to audit.


Recent versions of excel do warn you if formulas omit adjacent cells. I agree with your sentiment to some extent but with excel 2011/2013 you have to actively suppress these warnings.


I'm not sure I understand why it is easier with SQL than Excel. Both seem to offer similar roadblocks and both seem to offer similar solutions.

I think we're thinking there is a technological solution to a human problem, and personally I don't think there is or could ever be.

This is a process problem. And by process I mean the process of building complex datasets, validating them for "correctness," and judging the quality/correctness of different data sets.

Unit tests are a massive asset to programmers. I wonder if unit tests (i.e. "sanity checks") would also help in this situation? I mean you would have to force people to write them and monitor them, but once they've been created they pay for themselves by picking up unexpected errors.


One of the big problems with science is the proliferation of Excel "databases". It is very easy to look at a lot of numbers and make quick calculations with them. However, when you start getting into extremely large datasets, your propensity to make mistakes increases. A2:A10 here, B3:B11 there, etc... This is one reason why recent versions of Excel warn you when your formulas aren't in sync.

However, all of this is fixed when you are using SQL to properly query a database. Why? Because you are forced to write a SQL statement that details exactly what you want done. With Excel it can all be hidden away behind the cells. With SQL, it's out in front, so it's easier to check.

People like to use Excel because they can get an answer quickly without all that "programming". The problems start to arise when you need better tools, but only know Excel. So, in this case, it's not a matter of a craftsman blaming their tools, it's closer to an amateur trying to pretend to be a professional.

Excel is a wonderful spreadsheet. It is a horrible database.


Unless you write SQL queries that use stored procedures or pre-calculated results tables, and then you just wind up in the same situation as Excel.

I mean Excel is at its heart a query language. So your logic equally applies to Excel, why not write a massive query in Excel that does all the calculations in one go so you can see the inner workings?

All you're really doing is playing musical chairs with the data. SQL query language for Excel query language, and data moved from tables to worksheets.

As I said earlier, unless there are procedural changes upstream nothing will change. A tool is just a tool. You can use it in a way to minimise mistakes, or not. Humans are the weak point.

Now there are tools that automatically identify common mistakes but neither Excel or SQL/relational database engines are in that class.


> I mean Excel is at its heart a query language. So your logic equally applies to Excel

The least readable query language ever. Instead of names, everything must be addressed as column,row. Imagine a C program that instead of declaring variables with sane names as needed, simply created a large array of each type and used constant numeric indecies. That's what Excel requires.


Excel cells have been nameable since forever... It also has supported types since forever. I'm literally talking Excel 2000 or older functionality[1][2].

With respect do you even know how to use Excel? Because everything you just said is incorrect.

[1] http://www.computerhope.com/issues/ch000704.htm

[2] http://support.microsoft.com/kb/274504


Anecdotally I have to strongly disagree here.

Only one of the tens of heavy users of Excel I have worked with know of this feature on others like it. I am not a heavy use of Excel but my theory is that this is due to a failure of explanation by the software. Excel does not by its design encourage good coding, and most of the users of Excel do not have any programming background so they do not realize that these features should even be there so they wont know to look it up in the manual.


And it's easy to write horrifically bad, but working, code in C or C++. How exactly do you propose Excel butt into your workflow to expose advanced features?

If you plan to use Excel to aid you in making serious money (as countless business do), can't you fork out a thousand extra for an employee who actually knows how to use Excel properly?


Naming a single cell doesn't solve the problem of accidentally using a different range of values. In a normal programming language you would define a variable such as GDP containing a list of values for the countries of interest, in Excel every formula has to restate the range, which is what caused the mistake.


That's not the case.

You can have named ranges in Excel - in fact, you can have named ranges which are dynamically calculated based on a formula.


Tell me which is more likely to be a sum of profits for years since 2013. Excel:

    SUM(A100:A112)
SQL:

    SELECT SUM(yp.profit) 
          FROM yearly_profitability AS yp
          WHERE yp.year > '2012-12-31'::Datetime
(I'm forgetting how to do SQL datetimes, but whatever.)

SQL and other programming languages use words to describe the variables. Treating SQL like Excel would be DailyWTF material:

    CREATE TABLE sheet1(
      a  INTEGER,
      b  TIMESTAMP WITH TIME ZONE
    )


I hate it when people generate these contrived examples where they intentionally stack the scenario to favour whichever way proves their point the most.

Which is easier to understand:

=SUM(profits01_2012:profits12_2012)

Or:

SELECT * FROM dassaddasads WHERE date > 1325394000 AND date < 1356930000

Which is easier to understand?!


I have seen literally hundreds of Excel spreadsheets from banks, consultancies, and VC firms and I have never, ever seen anyone name individual cells the way you are proposing here. If I had to name individual cells in a monthly model spanning the next 10 years over 250 line items, I'd quit that job.

I'm not saying that Excel can't do it, but a) it sounds like it would take a huge amount of time, so b) it is not practically done in "the real world."


If I had to name individual cells in a monthly model spanning the next 10 years over 250 line items, I'd quit that job.

Why would you quit the job over a few minutes of scripting with "WorkSheet.Names.Add"?

http://msdn.microsoft.com/en-us/library/office/ff835300.aspx


This ignores the common use case. yummyfajitas' examples were exceedingly stereotypical of both excel and mysql usage.


They aren't stereotypical of the kind of Excel that gets written by people who use it as a major part of their job. They're stereotypical of people writing poor Excel queries against unnamed cells via location reference.


Is that first one valid Excel? I've never seen it before.


Yes, it is valid Excel. Naming cells is something they teach in beginners courses.

http://www.computerhope.com/issues/ch000704.htm

http://support.microsoft.com/kb/274504




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: