I tried it 10 times and while the wording is different, the answer remained corr...

kenjackson · on March 4, 2024

I got an answer with GPT-4 that is mostly wrong:

"Sally has 2 sisters. Since each of her brothers has 2 sisters, that includes Sally and one additional sister."

I then said, "wait, how many sisters does Sally have?" And then it answered fully correctly.

sigmoid10 · on March 4, 2024

The only way I can get it to consistently generate wrong answers (i.e., two sisters) is by switching to GPT-3.5. That one just doesn't seem capable of answering correctly on the first try (and sometimes not even with careful nudging).

m_fayer · on March 4, 2024

A/B testing?