But those are needed for L5 anyway, Lidar or no Lidar. The big issue is the world model: what's the prior when uncertainty rises, how to spot really out-of-the-ordinary stuff, and how to spot really ordinary-looking but dangerous stuff.
The visual cortex part is what current deep learning stuff handles pretty well. What's not so obvious is how to generate a model that can manage the "usual problems" (unclear road signs, debris on road, pedestrians/cyclists, harm minimization in accidents) without "freaking out" (phantom braking).
For example, if the car in front did not even slow down while going through some obstacle (the proverbial plastic bag), then it's very likely that we don't have to either. But if something looks like a drunk guy trying to cross the highway, then it's caution time.
That said, without the actual data/studies/access on what Tesla does and how, it's hard to say how good they are at this. (Sure, in theory everything needed to drive a car well can be inferred from what regular human vision sees. But that means their model has to be able to do counterfactual reasoning - I see this thing as a wall, but other cars go through it, so it's not a wall, so maybe it's a marking. Otherwise it'll be very hard to figure things out frame-by-frame and simply by raw object detection.)
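To make the "other cars go through it, so it's not a wall" step concrete, here is a toy sketch of that update as plain Bayes' rule. It says nothing about how Tesla or anyone else actually does it. The function name `posterior_solid` and every probability below are made up purely for illustration.

```python
def posterior_solid(prior_solid, p_pass_given_solid, p_pass_given_harmless):
    """Bayes' rule: P(object is solid | lead car drove through without slowing).

    prior_solid           -- belief from the frame-by-frame detector alone
    p_pass_given_solid    -- chance a car drives through unslowed if it IS solid
    p_pass_given_harmless -- chance a car drives through unslowed if it is harmless
    """
    evidence = (p_pass_given_solid * prior_solid
                + p_pass_given_harmless * (1.0 - prior_solid))
    return p_pass_given_solid * prior_solid / evidence


# Hypothetical numbers: the detector alone says "probably a wall".
prior = 0.6
post = posterior_solid(prior, p_pass_given_solid=0.01, p_pass_given_harmless=0.9)
print(f"P(solid) before: {prior:.2f}, after watching the lead car: {post:.3f}")
```

The point is only that a single strong observation (the car ahead sailing through) can swamp a fairly confident per-frame detection. The hard part, which this skips entirely, is getting trustworthy likelihoods in the first place.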
The visual cortex does a lot more. It can predict future states, it does things like attention, it can classify states that are abnormal and direct attention to them, and it can make analogies between different objects and use those to decide what is normal and what isn't. It's a lot more than just object recognition - it can also generate object classes on-the-fly and use them in the future.
We are very, very far from that. And having such a developed visual processing system is a prerequisite for feeding an executive processing system that can do counterfactual reasoning.
It's not just about the processing. To match human vision, cameras would need to be able to swivel and to adjust their focus extremely quickly. Animal eyes are a lot more than static cameras.