I recently wrote about a problem I had as a result of imagining that Ansible runs were re-entrant. (Spoiler: they are generally not.) After kicking this around a little I realized that you should not care whether Ansible runs are re-entrant. I like cherry pie so I will explain myself with a pie analogy.
If you are baking a pie for dinner tonight and something goes wrong, you would probably try to salvage it. If you realize you forgot an essential ingredient you might try to pull the top crust and add it in. If you screwed up the oven temp, you can adjust the baking time, temp or both.
But if you screw up while baking pies for the County Fair, it’s very different. You’re going for the blue ribbon, so no compromises. To bake the best pie you can, you would Just Start Over (and keep doing so until you Do Not Screw Up.)
If you’re bothering to automate server setup to begin with, you have already done all the work to make the best pie server you can, so there’s no reason to settle for anything less. When anything at all goes wrong during a build, why not just start over and get a clean build? It seems so obvious now.
A note about Chef:
Chef’s a different animal. The m.o. for a full-on Chef implementation is to continuously run the recipes against your machines from a Chef server, as much as for configuration/re-configuration as for initial deployment. We haven’t used Chef for a couple years, but I recall it seemed more tolerant to interrupted runs. Still, you can’t do better than a clean run, especially on an initial build.