I think you're ignoring a lot of ways in which this system will not easily extend to more complex tasks.
-While the retrieval heuristic is sensible for the domain, it's not applicable to all domains. In what situations should you favor more recent memories over more relevant ones?
-The prompt for evaluating importance is domain-specific: it asks the model to rate on a scale of 1 to 10 how important a life event is, giving examples like "brushing teeth" (a specific action in the domain) as a 1 and college acceptance as a 10. How do you extend that to a real-world agent?
-The process of running importance evaluation over all memories is only tractable because the agents receive a very small number of short memories over the course of a day. This can't scale to a continuous stream of observations.
-Reflections help add new inferences to the agent's memory, but they can only be generated in limited quantities, guided by a heuristic. In more complex domains where many steps of reasoning may be required to solve a problem, how can an agent which relies on this sort of ad hoc reflection make progress?
-The planning step requires that the agent's actions be decomposable from high-level to fine-grained. In more challenging domains, the agent will need to reason about the fine-grained details of potential plan items to determine their feasibility.
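To make the recency-versus-relevance question concrete, here is a minimal sketch of a retrieval score built as a weighted sum of recency, importance, and relevance. It is not the paper's implementation. The `Memory` fields, the weight parameters, and the `decay` value are illustrative assumptions, and the embeddings are plain lists standing in for whatever embedding model an agent would use.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float          # assumed 1..10 rating, e.g. from an LLM prompt
    hours_since_access: float  # assumed bookkeeping field
    embedding: list            # stand-in for a real embedding vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(memory, query_embedding, w_recency=1.0, w_importance=1.0,
          w_relevance=1.0, decay=0.995):
    """Weighted sum of three terms; the weights are the domain-specific knobs."""
    recency = decay ** memory.hours_since_access   # exponential decay over time
    importance = memory.importance / 10.0          # scale the rating to at most 1
    relevance = cosine(memory.embedding, query_embedding)
    return w_recency * recency + w_importance * importance + w_relevance * relevance


def retrieve(memories, query_embedding, k=3, **weights):
    """Return the k highest-scoring memories for the query."""
    ranked = sorted(memories, key=lambda m: score(m, query_embedding, **weights),
                    reverse=True)
    return ranked[:k]


old_relevant = Memory("old but on-topic", 5, 1000, [1.0, 0.0])
new_irrelevant = Memory("fresh but off-topic", 5, 0, [0.0, 1.0])
query = [1.0, 0.0]

favor_recent = retrieve([old_relevant, new_irrelevant], query, k=1, w_recency=3.0)
favor_relevant = retrieve([old_relevant, new_irrelevant], query, k=1, w_relevance=3.0)
```

With these toy values, raising `w_recency` surfaces the fresh memory and raising `w_relevance` surfaces the on-topic one, so the choice of weights is exactly where the domain assumptions live.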
> This can't scale to a continuous stream of observations.
My mind doesn’t scale to a continuous stream either.
While I’m typing this on my phone, 99.99% of all my observations are immediately discarded, and since this memory ranks as zero, I very much doubt I’ll remember writing this tomorrow.
I did not read the original post, but your reflections greatly enrich what I think the post is about, so congratulations on this good addition.
But, it is. The application domain here is fairly trivial, but the logic is both simple and highly general.
> but what would be required to achieve increasingly complex behaviors?
Basically, three things on top of this:
(1) more input adaptors to map external data into language,
(2) a bigger context space to process more current & retrieved data simultaneously, and
(3) more output adaptors to map intentions expressed in language to substantive action.
But the basic memory/recall system seems fairly robust and general, as does the basic interaction system.
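As a rough illustration of how those three pieces could sit around the memory system, here is a minimal sketch of one agent step. Everything in it is hypothetical: `step`, `build_context`, the `"kind: payload"` intention format, and the character budget standing in for context size are assumptions made for the example, and `decide` is a placeholder for the language model call.

```python
def build_context(observations, retrieved, max_chars):
    """Pack current observations first, then retrieved memories, into a fixed budget."""
    parts, used = [], 0
    for line in list(observations) + list(retrieved):
        if used + len(line) + 1 > max_chars:
            break  # a bigger context space simply moves this cutoff
        parts.append(line)
        used += len(line) + 1
    return "\n".join(parts)


def step(raw_inputs, input_adaptors, retrieve, decide, output_adaptors,
         max_chars=2000):
    """One pass: external data -> language -> decision -> external action."""
    # (1) input adaptors map raw external data into text observations
    observations = [adapt(raw) for adapt, raw in zip(input_adaptors, raw_inputs)]
    # (2) the context holds current plus retrieved data, within the budget
    context = build_context(observations, retrieve(observations), max_chars)
    intention = decide(context)  # stand-in for the language model
    # (3) output adaptors map a text intention to a concrete action
    kind, _, payload = intention.partition(":")
    handler = output_adaptors.get(kind.strip())
    return handler(payload.strip()) if handler else None


result = step(
    raw_inputs=[30],
    input_adaptors=[lambda t: f"temperature is {t}C"],
    retrieve=lambda obs: ["user prefers 22C"],
    decide=lambda ctx: "set_thermostat: 22" if "30C" in ctx else "noop:",
    output_adaptors={"set_thermostat": lambda p: ("thermostat", int(p))},
)
```

The memory/recall logic only appears here as the `retrieve` callable, which is the point being made: the adaptors and the context budget change per domain while that part stays the same.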