Doing sequential reading into a queue for workers to read is a lot more complicated than having a file format that supports parallel reading.
And the fix to allow parallel reading is pretty trivial: escape new lines so that you can just keep reading until the first unescaped new line and start at that record.
It is particularly helpful if you are distributing work across machines, but even in the single machine case, it's simpler to tell a bunch of workers their offset/limit in a file.
And the fix to allow parallel reading is pretty trivial: escape new lines so that you can just keep reading until the first unescaped new line and start at that record.
It is particularly helpful if you are distributing work across machines, but even in the single machine case, it's simpler to tell a bunch of workers their offset/limit in a file.