
I think this "machine learning for hackers" approach is just not enough. Oftentimes, you do need a solid theoretical/mathematical background. Most people seem to approach ML the way they approach programming tools or libraries - learn just enough to get the job done and move on.

I was studying machine learning from Andrew Ng's CS229 (the class videos are online; I think they date from 2008 or thereabouts). There is no way you can progress beyond lecture 2 (out of 20) without a solid probability background. A solid background in probability/statistics probably means a good first course in probability, or maybe the first five chapters of "Statistical Inference" by Casella and Berger. Similarly, for SVMs, you need a solid background in linear algebra, and so on. You probably also need a background in linear optimization. Here are the recommendations by Prof. Michael Jordan: https://news.ycombinator.com/item?id=1055389

Not a lot of people want to dive in this much. They have things to do, and who cares about proofs anyway? The thinking goes, "Most of the mathematics is abstracted away by libraries like scikit-learn. Let's get shit done." Well, I think a lot of the competitive advantage of Google/Facebook in ML comes from having staffed their engineering teams with people who have studied these things for years (via PhDs). Compare that to Flipkart's recommendations.

However, I don't think this problem is unique to ML/Data Science. It is equally bad in "Distributed systems". Let's use Docker, that's the future!



I understand where you're coming from and also agree in principle, but I'd change the claim that "this approach is just not enough" to "this approach is just not enough for achieving many things in machine learning, including breaking new ground". I think there's always a way to be creative within the constraints and concepts/axioms you take as given. For example, even though I have absolutely no control over (or knowledge of, for that matter) how to design or improve the microprocessor in my computer, I don't feel this limits my creativity in software development at all. Once changes and improvements occur at the hardware level, I'm sure they will find their way to the software development layer, and then I'll be handed even more degrees of freedom to be creative (though I'm not complaining about the freedom I currently have). Don't you think the same might apply to machine learning - i.e., those with a solid theoretical/mathematical background are analogous to the chip designers, and the "machine learning hackers" are the software developers?


I think in both cases -- ML and software in general -- the basics open 80-90% of the field up, and that's enough for most people.

But I'd never do safety critical embedded devices without understanding the physical processor.

And there are similar limits for ML.


I don't think Andrew Ng would agree with your assertion. His Coursera ML class assumes little more than a basic high school math education, and at the start of the course, he teaches the very small subset of linear algebra required to understand his course materials.

I think what Andrew Ng would say is that without a rigorous statistical background, you will be limited in your ability to use ML, and you will certainly be more liable to blow your foot off by using it improperly. That being said, in a subset of cases, you may be able to achieve non-trivial insights through the techniques he teaches in the course.

So how I would rephrase your assertion is that a hacker can probably get a lot more out of ML techniques if they are willing to learn the math underlying them.


> And at the start of the course, he teaches the very small subset of linear algebra required to understand his course materials.

I tried doing the ML course without any prior knowledge of linear algebra and dropped out after the first three weeks. In hindsight, I realized it wouldn't have been possible to appreciate how PCA works without understanding eigenvectors, how collaborative filtering is an elegant application of matrix factorization, and so on.

But after I completed Strang's Linear Algebra course, the entire ML class was a breeze.
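To make the PCA point concrete, here's a minimal NumPy sketch (my own toy example, not from either course) showing that the principal components are just the eigenvectors of the data's covariance matrix - an opaque incantation unless you know what eigenvectors and eigenvalues are:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 2 * X[:, 0]                  # correlate two columns so one direction dominates

Xc = X - X.mean(axis=0)                 # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)         # sample covariance matrix (3x3)
eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix -> real eigenpairs, ascending

order = np.argsort(eigvals)[::-1]       # principal components = eigenvectors,
components = eigvecs[:, order]          # sorted by descending eigenvalue
projected = Xc @ components[:, :2]      # project the 3-D data onto the top 2 components
```

The first projected coordinate carries the most variance by construction - which is exactly the fact you can't appreciate without the eigenvalue story.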


Would you recommend that Linear Algebra course?


Sure! The course has a slow start. The first few lectures focus on the mechanics of matrix operations instead of starting with linear transformations and compositions of linear transformations - the core ideas behind linear algebra. But after lecture 10, the course picks up pace.


I've been interviewing for ML positions, and what struck me was the general disdain for details. I had one manager claim that they're set to beat their competitors now because they're moving to GOOG's new TensorFlow. Others knew little more than TensorFlow and backprop.

Frankly I regret spending time understanding all the math, instead of working for some company, munging through their data and applying some black box stuff. The hype is quite bad IMO.


Totally agree. I've got a solid applied calculus background from my electrical engineering undergrad degree and some DSP from my first job, but I avoided learning statistics and probability because I thought they would be intuitive. After doing a MOOC on machine learning, I realise that statistics and probability are more complex than I anticipated.

If you don't work on your fundamentals, you end up simply memorising the algorithms and basic applications. Much like the neurons discussed in deep learning, we need to create rich relationships between our own concepts to ensure we retain the knowledge and can apply it in the long term - and that starts with laying a foundation of fundamentals.

For brushing up on those fundamentals, I recommend Bertsekas & Tsitsiklis, "Introduction to Probability". Theory supported with lots of examples, as well as comparisons of the theory to "intuition" and why applying the theory is much more effective.


> I think a lot of competitive advantage of Google/Facebook in ML is because they have staffed their engineering with people who have studied these things for years (by PhD).

This is not entirely true. Most of their advantage comes from their corpus of data. Of course, I'm not discounting the fact that they're pioneers in the field, but at this stage data is their competitive advantage (hence they open-sourced TensorFlow).

I feel that the present state of ML libraries (and even distributed-systems libraries) not being black-box, "just works" solutions is a growing pain, and they will evolve into something more accessible/robust in the future. The whole point of a "layer of abstraction" is that you don't need to know the details.


An alternate theory is that you can use ML for useful tasks with just high school math (basic algebra) and the basics of Python. I'm not sure that's actually true, but I'm inclined to be in this camp, as abstractions are used in virtually every other task. The amount of extra value to be had from applying even basic ML techniques is so great that there is probably a lot of upside to using ML even for companies that can only hire practitioners.
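As a sketch of what "basic algebra plus basic Python" looks like in practice - a generic scikit-learn example of mine, not tied to any particular course or company:

```python
# Fit an off-the-shelf classifier on a toy dataset; no linear algebra or
# statistics is required to get a number out, which is exactly the point
# (and, per the parent comments, also the danger).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # accuracy on held-out data
```

Whether the resulting number means what the practitioner thinks it means is, of course, where the statistics comes back in.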

A great resource specifically tailored to those who don't have an especially strong grasp of probability and statistics is Grokking Deep Learning:

https://iamtrask.github.io/2016/08/17/grokking-deep-learning...


This is backward thinking. It borders on elitist, although I know it's not meant that way.

Developers everywhere use Paxos without even knowing it, much less having read Lamport's papers, because they're building on top of solid tools that use Paxos (or Raft or what have you). This is more true at Google and Facebook than anywhere.

Same goes for ML. You can study the theory, and you can learn to apply it. In the field's nascency you basically need to understand the theory in order to apply anything, but eventually robust tools are built upon which developers can build systems without having "studied these things for years (by PhD)".


> eventually robust tools are built upon which developers can build systems without having "studied these things for years (by PhD)".

For ML, I don't think we are at the eventually point just yet.


Let's put it this way. A startup that insists every one of its developers touching ML has a sound basis in fundamental theory is going to get left in the dust.

Anyone who wants to be the guy/gal who understands the fundamentals will be valuable. But we don't need everyone trying to be that person. And most wouldn't be successful, though they'd be successful as the guy/gal who does other stuff.


Well, sure. But that isn't "eventually".


Every time there is a paradigm shift, there is always that voice: if you don't understand the paint at the chemical-compound level, you can't make a beautiful painting. Wait, what?


Eh, let's revise that analogy. It's more like: without understanding the bricks, you can't make a good building. You can get by on good intuition, but it won't be spot-on (as it would be if you calculated all the physics), and you'll need more luck the higher you build.


I do take your point. I guess I'm just trying to say I've had good success diving in head first and working backwards :)


Well, most states won't let me practice as an engineer, even if I get really good at SolidWorks from watching YouTube videos, because I have no background in the field.


^ this, especially the

> Oftentimes, you do need a solid theoretical/mathematical background. Most people seem to approach ML the way they approach programming tools or libraries - learn just enough to get the job done and move on.

I've been coming across this on the HN front page, and it's worrisome to an extent.


If you are a hacker, you tend to be driven to learn techniques (ML, DL, etc.) to solve a problem at hand, rather than learning a technique and then hunting for a problem it can solve. For example, my motivation for learning ML and the associated statistical methods went through the roof when I was confronted with the problem of figuring out a better way to identify and predict which devices (from a huge set) would go bust, based largely on available indicators like power drain. I wouldn't have made the effort to read a bunch of papers and watch relevant videos if I didn't have a problem to solve. Maybe that happens if you've been a code monkey for 20+ years.
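An invented toy version of that device-failure problem (all feature names, numbers, and data here are made up for illustration) shows the kind of statistical method you end up actually reading about once a real problem demands it - here, logistic regression fit by plain gradient ascent on the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
power_drain = rng.normal(5.0, 1.5, n)             # hypothetical sensor reading
restart_count = rng.poisson(2.0, n).astype(float) # hypothetical restart counter
X = np.column_stack([np.ones(n), power_drain, restart_count])

# Synthetic ground truth: a hidden "true" failure model generates the labels.
true_w = np.array([-6.0, 0.9, 0.4])
p_fail = 1 / (1 + np.exp(-X @ true_w))
y = (rng.uniform(size=n) < p_fail).astype(float)

# Gradient ascent on the logistic log-likelihood.
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.05 * X.T @ (y - p) / n

# Classify by the sign of the learned logit and score against the labels.
accuracy = (((X @ w) > 0) == (y > 0.5)).mean()
```

The fitted weight on power drain comes out positive, matching the intuition that high drain predicts failure - the kind of sanity check the papers teach you to make.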


To do serious production-level ML, I agree that you need to understand the math. But as a starting point, machine learning for hackers is a great place to start.

I think writing some algorithms and using them to solve problems provides great motivation for the math. In particular, the math will explain why certain approaches did and did not work. Without the hacking that material can get a bit dry.


> Well, I think a lot of the competitive advantage of Google/Facebook in ML comes from having staffed their engineering teams with people who have studied these things for years (via PhDs). Compare that to Flipkart's recommendations.

Not entirely true. Google and FB have orders of magnitude more data than Flipkart. You can have the smartest ML people on the planet churning out the most clever, advanced ML algorithms and models, but without enough data, they're not going to be useful or effective.

I recently attended Slash N [0], Flipkart's annual technical conference, and spoke to a bunch of their ML folks. They have master's and PhD degrees in ML from the IITs and IISc, and are as smart as they come.

Sure, Flipkart doesn't have marquee names the likes of Yann LeCun or Andrew Ng, but I wouldn't doubt the ML talent Flipkart has.

[0] https://slashn.flipkart.net/ (PS: I don't work for Flipkart, but have friends who work on their ML teams.)


> Well, I think a lot of the competitive advantage of Google/Facebook

But 99.9% of people are not trying to compete with Google or Facebook. They are looking for basic insights.

I do agree that ML isn't and shouldn't be "easy". But we inevitably and thankfully get abstracted away from the inner workings of things over time.


Love the statistics recommendations by Prof. Jordan! Thanks a bunch!


What is it not enough for? Yes, having a PhD may help if you are developing the libraries, but developers are consumers.


ML is easy to get set up with, but often difficult to debug if you don't really understand the details. Within the last week, I pair-reviewed a recommender system written in MLlib (à la this post: http://spark.apache.org/docs/latest/mllib-collaborative-filt...) that was doing strange things despite performing well on a test set. It turned out the metric used on that page was not a good one for our purposes, and the algorithm had zoomed in on a degenerate solution that nailed the test score. This was clear to me after about two minutes of looking at the auxiliary matrices it generated. The less experienced person I was helping did not know how to proceed.
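A hypothetical reconstruction of that failure mode (not the actual MLlib code) is a factorization whose item-factor matrix has collapsed to rank 1 even though it was allocated several latent dimensions. Glancing at the auxiliary matrices - e.g., checking their effective rank - exposes this in minutes:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, k = 40, 5

# What a healthy item-factor matrix looks like vs. a collapsed one:
# the collapsed matrix is a rank-1 outer product despite having k columns.
healthy_factors = rng.normal(size=(n_items, k))
collapsed_factors = np.outer(rng.normal(size=n_items), np.ones(k))

def effective_rank(M, tol=1e-8):
    """Count singular values that carry real signal (relative to the largest)."""
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > tol * s[0]).sum())

print(effective_rank(healthy_factors))    # full rank: all k dimensions used
print(effective_rank(collapsed_factors))  # rank 1: the model learned almost nothing item-specific
```

A model like the collapsed one can still score well on a poorly chosen test metric while recommending essentially the same thing to everyone - which is why the two-minute look at the matrices beats staring at the score.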



