Why would one need to check Datadog every morning? Wouldn't alerts fire if there...

seneca · 2026-03-16T01:49:00 1773625740

I'm not sure if this is what the writer was getting at, but I check telemetry for my production applications regularly, not because I'm looking for things that would fire alerts, but to keep a sense of what production looks like: things like request rate, average latency, top request paths, etc. It's not about knowing something is broken; it's about knowing what healthy looks like.

Understanding what your code looks like in production gives you a much better sense of how to update it, and how to fix it when it inevitably breaks. I think having an AI check for you would make this basically impossible, and that probably makes it a pretty bad idea.

danpalmer · 2026-03-16T06:24:10 1773642250

This is a good answer, and I agree that having good production intuition like this is important. You're probably also right that having an AI do it doesn't deliver that value.

I'm not sure I'd do this once a day. I tend to take note of things that build that intuition when I have other reasons to look at dashboards, and we have a weekly SLO review as a team, but perhaps there's a place for this in some form.

seneca · 2026-03-16T19:40:16 1773690016

Yeah, agreed. Daily isn't really necessary outside of initial launch and maybe a busy season. It's really just often enough to build a good sense of production use, and keep it up to date.

vrosas · 2026-03-16T02:14:35 1773627275

Almost no one actually knows how to set up their monitoring. Like, they know the words but not the full picture or how the pieces should actually fit together. Then they do shit like this to try and make up for that fact.

bdangubic · 2026-03-16T02:19:15 1773627555

The ones who know don't check anything every morning.

import · 2026-03-16T02:17:54 1773627474

Well, the industry standard solution is correct monitoring and alerting. This doesn’t sound like “the right way”.

bak3y · 2026-03-16T01:24:11 1773624251

Exactly what I came to say, alerts need tuning if you're having to check your monitoring tools by hand.

dathinab · 2026-03-16T01:31:49 1773624709

I read the article as a way for an AI to check, classify, and potentially partially fix the alerts you see when logging in in the morning.

And for many alerts you need to look at other events around them to properly classify and partially resolve them. Because of that, you need to give the AI more than just the alerts.

Though I do see a risk similar to that of badly tuned alerts:

Not everything that resolves by itself and can be ignored _in this moment_ is a non-issue. For example, it's pretty common for a system with some rare, ignorable warnings/errors to fall completely flat when onboarding a lot of users, introducing a new high-load feature, etc., due to exactly the things you could safely ignore beforehand.