I’m really disappointed that I wasn’t in the right headspace to name this post “the trick is to surrender to the flow”, the lyric from Phish’s The Lizards.
My brother still follows them across the country. Reading this and the comments gave me real insight into what longtime developers are feeling
I’m brand new to coding and have only done some AI-assisted work while learning. It blows my mind that people can do this at a high level without assistance - major respect
yeah, it's curious. I sometimes ask it why it ignored what is explicitly in its memory and all it can do is apologize. I ask -- I'm using Claude with a 1M context, you have an explicit memory -- why do you ignore it and... the answer I get is "I don't know, I just didn't follow the instructions."
For it to follow the instructions I had for it. Call me naive and stupid for thinking the 1M context window on the brand new model would actually, y'know, work.
Just dealt with this last night with Claude repeatedly risking a full system crash by failing to ensure that the previous training run of a model ended before starting the next one.
It's a pretty strange issue, makes me feel like the 1M context model was actually a downgrade, but it's probably something weird about the state of its memory document. I wasn't even very deep into the context.
Microservices require distributed debugging; distributed debugging requires distributed tracing. Just imagine, as I've been trying to push forwards in my own Ph.D. work, that you could debug a process across microservices. This is why we want this; possibly, done a bit more resiliently and thoroughly than the original OP.
…or, I guess Barbara Liskov in 1987, with Argus. And yet, we still seem to debug programs interactively in isolation. Perhaps it’s because all of those systems assumed a system developed in isolation: one whose parts didn’t evolve independently and weren’t implemented in different programming languages, communicating through different network protocols instead of function/method invocations.
Academics have tried to make this a reality for years. I suggest revisiting Waldo's "A Note on Distributed Computing" and working forwards from there. If you want to go back further, look at Argus, Emerald, and the original Hermes (from DEC.)
As someone who programmed Erlang both professionally and published academically at Erlang venues for a long time, no.
These optimizations "for runtime" are not well supported by Erlang: cluster performance changes dramatically when the behavioral characteristics of message passing switch from local to remote to remote cluster very quickly, which was discussed at length in Waldo's paper back in the 90s. Dynamic relocation is not supported well either (unless you use global, which falls apart quickly under network anomalies, something I and several others have written papers about), and the runtime provides hardly any information for introspecting cluster performance.
Sadly, distributed Erlang had the edge on programming distributed systems almost 20 years before they became pervasive, but has since been left to atrophy and hasn't seen any real innovation in quite a long time.
OpenTelemetry's Java implementation does this, but it actually does it in a way that non-GRPC things can access this context as well by ensuring that it propagates throughout both the CoroutineContext and the thread-local state that's used by OpenTelemetry itself to propagate tracing information into Java code that is used by a Kotlin coroutine that happens to execute code that was written in Java.
e.g., I handle a request, get the incoming context, and have to stash it because I might execute a coroutine that is suspended/resumed across different threads, and then execute another GRPC call in a Java library that happens to start, get rescheduled, and resume on receiving the response on a different thread, possibly in a different thread pool.
The OpenTelemetry handling for this is quite complex: it must be used as a javaagent so it can actually instrument underlying libraries with the necessary code for handling thread scheduling/context switches in thread pools (e.g., ForkJoinPool), in threads themselves, with cooperative scheduling in application code (e.g., Thread), and in Kotlin's coroutine handling, which is mostly codegen (e.g., async, suspend fun).
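To make the thread-hopping problem concrete, here is a minimal plain-Java sketch of the capture-and-restore idea that this kind of instrumentation has to apply around every task handoff. It is not OpenTelemetry's actual implementation; TRACE_ID, wrap, and runOnPool are hypothetical names invented for illustration, and a single string stands in for a real tracing context.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ContextWrapDemo {
    // Hypothetical stand-in for a tracing context kept in thread-local state.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Captures the submitting thread's trace id immediately and reinstates it
    // around the task when the task later runs on some other thread.
    static <T> Callable<T> wrap(Callable<T> task) {
        String captured = TRACE_ID.get(); // read on the submitting thread
        return () -> {
            String previous = TRACE_ID.get(); // whatever the worker thread held
            TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Put the worker thread back the way we found it.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    // Sets the trace id on the calling thread, then reads it from a pool thread,
    // either through wrap() or directly.
    static String runOnPool(String traceId, boolean wrapped) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        TRACE_ID.set(traceId);
        try {
            Callable<String> task = TRACE_ID::get;
            Callable<String> toRun = wrapped ? wrap(task) : task;
            return pool.submit(toRun).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnPool("trace-123", false)); // prints null
        System.out.println(runOnPool("trace-123", true));  // prints trace-123
    }
}
```

The manual version only works if every handoff is wrapped, which is exactly what is hard to guarantee across third-party libraries, and why an agent that instruments them is attractive.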
Finally, in my own Ph.D. work, we did a similar thing to propagate trace identifiers for a dynamic analysis for fault injection, and we quickly ran into a problem that --- not only is the propagation difficult in itself --- but, you also run the risk of running out of header space if you store any (longish?) information when GRPC is run over HTTP2 because of the maximum allowed header size.
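As a rough illustration of the header-space concern, the sketch below estimates how much of a header budget a set of propagated values would consume. It uses the accounting HTTP/2 defines for SETTINGS_MAX_HEADER_LIST_SIZE (name octets plus value octets plus 32 per entry). The header names, the payload, and the 8192-byte limit are assumptions for illustration only; check the limit your own gRPC stack is actually configured with.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderBudget {
    // HTTP/2 counts each header field as name + value octets plus 32 overhead.
    static final int PER_ENTRY_OVERHEAD = 32;

    // Uncompressed header list size for the given name/value pairs.
    static int headerListSize(Map<String, String> headers) {
        int total = 0;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            total += e.getKey().getBytes(StandardCharsets.UTF_8).length
                    + e.getValue().getBytes(StandardCharsets.UTF_8).length
                    + PER_ENTRY_OVERHEAD;
        }
        return total;
    }

    // True if the headers fit within the given limit (in bytes).
    static boolean fits(Map<String, String> headers, int limitBytes) {
        return headerListSize(headers) <= limitBytes;
    }

    public static void main(String[] args) {
        int assumedLimit = 8192; // assumed limit, not read from any real config

        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("x-trace-id", "abc123");            // hypothetical header
        headers.put("x-fault-plan", "x".repeat(9000));  // hypothetical long payload

        System.out.println(headerListSize(headers));     // prints 9092
        System.out.println(fits(headers, assumedLimit)); // prints false
    }
}
```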
I'm also not sure what you mean by the context doesn't propagate between containers and/or pods -- GRPC isn't aware of these Docker/Kubernetes aspects at all.
Do you actually mean that unless explicitly propagated to a subsequent downstream RPC the data is dropped? If so, that's by design.
However, most large-scale organizations that are doing distributed tracing (e.g., Twitter, Uber) have either invented, reproduced, or leveraged OpenTelemetry's design for this precise thing.
Naive context propagation isn't (really) the difficult part with most of these designs -- it's what you've done, using an interceptor, reading the data and assigning it automatically on subsequent requests -- the challenge is dealing with this under many different, real world conditions: a.) concurrency and thread scheduling; b.) not all services use the same version of downstream RPC libraries; c.) not all calls are GRPC, and some use HTTP (and different HTTP libraries, at that); and d.) you cross message passing boundaries: i.e., I receive a request, write to a Kafka queue/reliable workflow backend (e.g., Cadence, Temporal), re-read the request, and then execute a subsequent RPC as a result of that message.
If you're using Kotlin, I suspect you will run into these challenges. Tune your thread pools up/down, restrict your JVM's resources, and you'll suddenly see that if the thing that handles the request uses different threads/coroutines/etc. than the code block that issues the downstream RPC, you'll start dropping the context without explicit handling of that case.
In fact, a very simple test case in Java, where you use several concurrently executed CompletableFutures that each issue RPCs in a very small thread pool, should be enough to see the issue.
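A minimal sketch of that kind of test case, using only the JDK, might look like the following. There is no real GRPC here: fakeRpc is a hypothetical stand-in that just reports which trace id is visible on the thread executing it, and TRACE_ID is an assumed thread-local standing in for the propagated context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ContextLossDemo {
    // Hypothetical stand-in for a tracing context held in thread-local state.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Hypothetical stand-in for a downstream RPC: returns whatever trace id is
    // visible on the thread that actually executes the call.
    static String fakeRpc() {
        return TRACE_ID.get();
    }

    // Issues `calls` concurrent fake RPCs on a pool of `poolSize` threads and
    // returns the trace id each one observed (null means the context was lost).
    static List<String> observedTraceIds(String traceId, int calls, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        TRACE_ID.set(traceId); // set on the request-handling thread only
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                futures.add(CompletableFuture.supplyAsync(ContextLossDemo::fakeRpc, pool));
            }
            List<String> seen = new ArrayList<>();
            for (CompletableFuture<String> f : futures) {
                seen.add(f.join());
            }
            return seen;
        } finally {
            TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [null, null, null, null]
        System.out.println(observedTraceIds("trace-123", 4, 2));
    }
}
```

Every call comes back with null because the pool's worker threads never had the thread-local set; only the request-handling thread did. A real service would show the same symptom as missing or orphaned spans on the downstream side.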
Thanks for the comment, it seems like you know a lot more about this than I do.
This was a solution that has worked well for my company that averages <1 req/s, so yes I have not tested it under more extreme conditions. This is version 1.0.0 so it is quite new and naive by design. I was posting here to get some feedback on the initial version and see how I can improve it, which you have given me!
Feel free to contribute to the project! It seems like your expertise applies nicely!
It's too bad that there is still no JEP for an official context in Java.
Disclaimer: I wrote the OTel one, and am sad to see yet more context implementations being made, including the OTel one. These really need to all be on the way out.