01 — Does it make the output more reliable?

This is the first question for any new feature or architectural change. Not "is it useful", not "do users want it", not "can we ship it by the end of the quarter". Does it actually make the outputs of our system more reliable? If the answer is no, the feature does not ship — regardless of how much we want it or how much pressure there is to build it.
02 — Does it reduce transparency?

The second question is whether the change hides something that should be visible. If a feature requires us to obscure the debate, compress the disagreement, or present a contested conclusion as settled, it does not ship. This applies even when the hidden information might confuse or disappoint users. Confusion from honest uncertainty is better than confidence from manufactured clarity.
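One way to make this principle concrete, offered as a sketch rather than a description of our actual pipeline: the output type itself can preserve each model's position and expose whether the question is still contested, so nothing has to be compressed into a single verdict. The `ModelPosition` and `DebateResult` names below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelPosition:
    """One model's stance, kept verbatim rather than summarized away."""
    model_id: str
    conclusion: str
    confidence: float  # 0.0-1.0
    reasoning: str


@dataclass
class DebateResult:
    """The shipped output: every position plus an honest contested flag."""
    question: str
    positions: list[ModelPosition] = field(default_factory=list)

    @property
    def contested(self) -> bool:
        # A conclusion counts as settled only if every model actually agrees.
        return len({p.conclusion for p in self.positions}) > 1

    def render(self) -> str:
        # Surface the disagreement instead of averaging it away.
        header = "CONTESTED" if self.contested else "AGREED"
        lines = [f"[{header}] {self.question}"]
        for p in self.positions:
            lines.append(f"- {p.model_id} ({p.confidence:.0%}): {p.conclusion}")
        return "\n".join(lines)
```

The design choice this illustrates is that honesty lives in the data structure, not in a presentation layer that could later be tuned to look more confident than the models actually are.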
03 — Would we be comfortable publishing the methodology?

Every scoring algorithm, every prompt, every orchestration decision should be something we are willing to describe publicly and defend. If we find ourselves designing something we would not want to explain, that is a signal that we are building for appearance rather than for substance. We stop and redesign.
04 — Does it advantage any single model unfairly?

We are not in the business of ranking AI models. Our system should be neutral between providers and give each model the best possible conditions to contribute its genuine perspective. Any change that systematically advantages one provider — even if that provider has a commercial relationship with us — requires explicit justification that it serves users' interests, not ours.
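A minimal sketch of what enforcing that neutrality could look like in code; `score_response` and `audit_neutrality` are hypothetical names, not our actual implementation. The point is twofold: the scorer's signature simply has no provider parameter, so provider identity cannot leak into a score even by accident, and a separate audit flags any provider whose scores drift from the rest, which is the trigger for the explicit justification the principle demands.

```python
from statistics import mean


def score_response(text: str) -> float:
    """Hypothetical scorer: it sees only the response text.

    Provider and model identity are deliberately absent from the
    signature, so they cannot influence the score.
    """
    if not text.strip():
        return 0.0
    score = 0.25                      # baseline for a non-empty answer
    if "http" in text:                # cites a source
        score += 0.5
    if "uncertain" in text.lower():   # acknowledges uncertainty
        score += 0.25
    return min(score, 1.0)


def audit_neutrality(scores_by_provider: dict[str, list[float]],
                     tolerance: float = 0.05) -> list[str]:
    """Flag providers whose mean score drifts from the overall mean.

    A persistent gap is not proof of unfairness, but it is the signal
    that requires an explicit, user-interest justification.
    """
    overall = mean(s for scores in scores_by_provider.values() for s in scores)
    return [
        provider
        for provider, scores in scores_by_provider.items()
        if abs(mean(scores) - overall) > tolerance
    ]
```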