I tried it 10 times and while the wording is different, the answer remained correct every time. I used the exact question from the comment above, nothing else. While determinism is a possible source of error, I find that in these cases people usually just use the wrong model on ChatGPT for whatever reason. And unless you set the temperature way too high, it is pretty unlikely that you will end up outside of correct responses as far as the internal world model is concerned. It just mixes up wording by using the next most likely tokens. So if the correct answer is "one", you might find "single" or "1" as similarly likely tokens, but not "two." For that to happen something must be seriously wrong either in the model or in the temperature setting.
The only way I can get it to consistently generate wrong answers (i.e. two sisters) is by switching to GPT3.5. That one just doesn't seem capable of answering correctly on the first try (and sometimes not even with careful nudging).