Out of curiosity, (a) did you receive this error at the start of a session or in the middle of it, and (b) did you manage to find/confirm valid findings within the scope/codebase 4.7 was auditing with Sonnet/yourself later on?
I just gave 4.7 a run over a codebase I have been heavily auditing with 4.6 the past few days. Things began smoothly, so I left it for 10-15 minutes. When I checked back in, I saw it had died in the middle of investigating one of the paths I recommended exploring.
I was curious as to why the block occurred when my instructions and explicitly stated intent had not changed at all - I provided no further input after the first prompt. This would mean that its own reasoning output or tool call results triggered the filter. This is interesting, especially if you think of typical vuln research workflows and stages; it’s mostly code review and tracing, which likely looks much like normal engineering work. Things begin to get more explicitly “offensive” once you pick up on a viable angle or chain, and that increases as you further validate and work the chain out, reaching maximum “offensiveness” as you write the final PoC, etc.
So, one would then have to wonder if the activity preceding the mid-session flagging only resulted in the flag because it finally found something seemingly viable and started shifting reasoning from generic-ish bug hunting to overt exploitation.
So, I checked the preceding tool calls, and sure enough…
What a strange world we’re living in. Somebody should try making a joke AUP violation-based fuzzer; policy violations are the new segfaults…
I view this paradox as just an effect of poor framing. We should not look at it as “I am against intolerance/hatred/XYZ”, but as “I want to minimize intolerance/hatred/XYZ.” The first focuses on local, case-by-case contexts, the second on the aggregate. Some XYZs, in some contexts, have properties that make them effective local tools to mitigate themselves in an aggregate context, which is probably a better candidate paradox here.
the claim is that it moved sales forward in time, but it'll have a corresponding dip in sales later, whereas a good sales campaign increases total volume (virtually no dip, brings in new customers, etc)