This is completely wrong. If anything, an increase in the use of LLMs to generate small pipelines will lead to increased demand for professional pipelines, because if any small thing breaks, the dashboards and features built on it break, which is immediately noticeable. I think you'll see a big increase in the number of models a data scientist can create, but making those Python notebooks production-ready can't be done by an LLM. That is to say, as analysts create more potential use cases, there will be more demand to get those implemented.
There's so much that goes into ensuring the reliability, scalability, and monitoring of production-ready data pipelines, not to mention the integration work for each use case. An LLM will give you short-term wins at the cost of long-term reliability, which is exactly why we already have DE teams to support DA and DS roles.
>This is completely wrong. If anything, an increase in the use of LLMs to generate small pipelines will lead to increased demand for professional pipelines, because if any small thing breaks, the dashboards and features built on it break, which is immediately noticeable. I think you'll see a big increase in the number of models a data scientist can create, but making those Python notebooks production-ready can't be done by an LLM. That is to say, as analysts create more potential use cases, there will be more demand to get those implemented.
I agree. There is a lot of data work people want that isn't done because of labor costs, not just in quantity but in difficulty. If you can only afford to hire one analyst, and that analyst's time is entirely spent cleaning data and generating basic sums, then that's all you'll get. But if the analyst can save a lot of time with LLMs, they'll have time to handle more complicated statistics built on those counts, like forecasts or other models.
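A minimal sketch of the step-up this describes, with illustrative numbers and a deliberately naive moving-average forecast (all names and data here are hypothetical, not from the thread):

```python
# Monthly revenue with gaps, as a raw export might look (illustrative data).
revenue = [100, None, 120, 130, None, 150]

# The "cleaning" stage: fill gaps by averaging the neighbouring months.
cleaned = revenue[:]
for i, v in enumerate(cleaned):
    if v is None:
        cleaned[i] = (cleaned[i - 1] + cleaned[i + 1]) / 2

# The "basic sums" stage that eats the analyst's time.
total = sum(cleaned)  # 750.0

# The follow-on work freed-up time allows: a simple 3-month
# moving-average forecast for the next month.
forecast = sum(cleaned[-3:]) / 3  # 140.0
```

The point isn't the sophistication of the model; it's that the modelling step only happens at all once the cleaning step stops consuming the whole budget.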
> If you can only afford to hire one analyst, and that analyst's time is entirely spent cleaning data and generating basic sums, then that's all you'll get. But if the analyst can save a lot of time with LLMs, they'll have time to handle more complicated statistics built on those counts, like forecasts or other models.
That applies to so many other jobs.
My productivity as a solo IT developer building a rather large and complex system skyrocketed when LLMs became actually useful (around the GPT-4 era).
Work where I might have spent hours dealing with a bug now takes maybe 10 minutes, because my brain was glossing over some obvious issue that an LLM instantly spotted (or gave suggestions that focused me on the issue).
Implementing features that might have taken days now takes a few hours.
The time it takes to learn things shrinks massively, because you can ask for specific examples. A lot of open source projects are poorly documented, missing examples, or just badly structured; ask the LLM and it points you in the right direction.
Now, this is all from the perspective of a dev with 25+ years of experience. What I fear for more is people who are starting out, writing code without understanding why or how things work. I remember people, before LLMs, coming in for senior jobs without even a basic understanding of SQL, because they had used ORMs non-stop. They forgot that much of that knowledge was not transferable to companies that used raw SQL or other ORMs that worked differently.
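To make the ORM point concrete: an ORM call such as `session.query(User).filter_by(active=True).all()` (SQLAlchemy-style, with a hypothetical `User` model) ultimately emits plain SQL, and a dev who has only ever seen the ORM side may not recognise the equivalent raw query. A small sketch of that raw side using Python's built-in sqlite3:

```python
import sqlite3

# Toy schema and data; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ana", 1), (2, "bob", 0), (3, "eve", 1)],
)

# Roughly what an ORM's filter_by(active=True) compiles down to:
rows = conn.execute(
    "SELECT id, name FROM users WHERE active = 1"
).fetchall()
print(rows)  # [(1, 'ana'), (3, 'eve')]
```

Knowing this layer is what transfers between shops; the ORM syntax on top often doesn't.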
I suspect we are going to see a generation of employees so used to LLMs doing the work that they don't understand how or why specific functions or data structures are needed, and who then get stuck in hours of looping LLM questioning because they cannot point the LLM at the actual issue!
At times I think: I wish this had been available 20 years ago. But then I question that statement very quickly. Would I be the same dev today if I had relied non-stop on LLMs instead of gritting my teeth on issues to develop this specific skillset?
I see more productivity from senior devs, more code turnout from juniors (or code monkeys), but a gap where the skills are an issue. And let's not forget the potential issue of LLM poisoning, with years of generated data feeding back on itself.
I see it as a gray area: long term there will be a need for both, and you'll pick between them based on your time, budget, and quality constraints.
Yeah, I can also see it depending heavily on the demands. I'm definitely not saying every pipeline has to be the most reliable, scalable piece of software ever written.
If a small script works for you and your use case and constraints, there's nothing I can say against it. But when you do grow past a certain point, you'll need pipelines built properly. That's where I see the increased demand, since the scrappy pipelines will already have proven their value.