The article misses a key bit of information TF needs when making a plan:<p><pre><code> 4. The previous Terraform configuration
</code></pre>
This is effectively stored by state.<p>We need this because if a resource is removed from the new config then Terraform needs to be able to delete the existing resource from the world. If we don’t have the state then Terraform must either:<p><pre><code> 1. Not delete it from the world
2. Or risk deleting something not managed by Terraform
</code></pre>
If everything were managed by Terraform then perhaps we would not need state, but this is not realistic in my view.
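The point about needing the previous configuration to compute deletions can be shown with a toy sketch. This is not real Terraform internals, just an illustration of the diff; all resource names and attributes are invented:

```python
# Toy illustration: a plan is a diff between the previous config
# (which is effectively what state records) and the new config.

def plan(previous, current):
    """Diff two {name: attrs} config maps into create/update/delete actions."""
    actions = []
    for name in current:
        if name not in previous:
            actions.append(("create", name))
        elif current[name] != previous[name]:
            actions.append(("update", name))
    for name in previous:
        if name not in current:
            # Only possible because we kept the previous config: without it,
            # a resource removed from config would be silently orphaned.
            actions.append(("delete", name))
    return sorted(actions)

previous = {"web": {"size": "t3.micro"}, "db": {"size": "t3.large"}}
current = {"web": {"size": "t3.small"}}  # "db" was removed from config

print(plan(previous, current))  # [('delete', 'db'), ('update', 'web')]
```

Drop the `previous` map and the `delete` branch has nothing to work with — exactly the dilemma above.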
I respectfully disagree. If TF were stateless, you'd have to manage situations by hand that involve changing the idempotency key, such as the name of a VM. You'd also have to manage situations by hand where a resource is removed from the config.<p>The whole point of TF is that it has state and doesn't require workarounds for these scenarios. Yes, you have to maintain state, but the state problems usually come from buggy providers, not Terraform itself. For example, try to use the GitHub provider to create a repo with GitHub Pages enabled from the gh-pages branch. It won't work because the authors didn't respect a fundamental rule of writing a provider: one state-changing API call = one resource. If you don't respect that, you have to do state handling yourself and you're almost certain to have state-related bugs.
Having worked with both Terraform <i>and</i> Ansible code that created AWS resources - which operates very similarly to the model described here [0, see `filters` arg] - I generally disagree.<p>For example, if you changed an ID in your stateless Terraform, you'd have to insert some kind of code to destroy (or rename, if possible) the old resource. Or modify the Terraform DSL to include that kind of information, I suppose, and keep a historical record in the code perhaps. Then there's the question of what happens if someone modifies the physical resource out from under you -- you could end up creating brand new resources rather than tracking the existing ones.<p>Also, it's nice to know that <i>your</i> Terraform instance is what created a thing -- if you ran a stateless `terraform destroy`, it's possible you could be deleting resources that someone else created that happened to match what your Terraform code defined. More of an edge case, I admit, but at scale these things have a way of happening...<p>That said, resources that don't have "physical" IDs work similarly to the stateless model by necessity. For example, VPC route table rules: [1, see the "Import" section].<p>Refactoring Terraform code <i>is</i> super annoying because of state, though, I'll give you that.<p>[0]: <a href="https://docs.ansible.com/ansible/latest/collections/amazon/aws/ec2_instance_module.html" rel="nofollow">https://docs.ansible.com/ansible/latest/collections/amazon/a...</a>
[1]: <a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/route" rel="nofollow">https://registry.terraform.io/providers/hashicorp/aws/latest...</a>
> Ansible, Puppet, etc. don’t have intermediate stores of the hosts’ configuration, but then again they are used for different things.<p>Ansible, Chef, Puppet and co existed long before Terraform. If the stateless way worked better, these tools would have taken over the cloud infrastructure space. But they didn't. To me it seems like a good sign that their approaches didn't fit infra.<p>Additionally, one of the biggest Terraform selling points in the early days was the change plan feature. The ability to see the entire change that is about to happen as a result of a config change. I don't think it's easy to implement such a thing in a stateless system.<p>Is it possible to create a stateless config for a subset of the resources/providers with better functionality? Absolutely! octoDNS seems like a good example. Another great example would be any of the serverless frameworks out there that do a much better job than Terraform at managing the lifecycle of functions. But can this approach be applied to every provider and resource?
I maintain a Terraform provider for Kubernetes. And one of the main reasons for that is that the Terraform state ensures purging of deleted resources.<p>Something that kubectl is not capable of. The lastAppliedConfig annotation does not help for purging, because once the manifest has been deleted on disk, there is no way of knowing what to delete from the server. The unusable apply --purge flag is the best example of this issue.<p>I think the state mainly exists to know what has been created in the past but has since been deleted from manifests and therefore needs to be purged. The caching/performance argument is rather weak, because Terraform refreshes by default anyway before any operation.
"Stateless is better", until you remove a tf file or resource from your code and Terraform has no way to tell if that resource ever existed in the provider during the apply that should delete it, because there's nothing to compare against, and it is impractical to query all existing resources and all tags/ids of those resources in the cloud provider on every infrastructure change (imagine multiple CIs doing that all day long; there's not enough API quota and your pipeline will take forever just for Terraform to know what to change in a medium/large organization).<p>Those deletion and modification problems happen in Ansible and other provisioning tools that rely on idempotency, and that is one of the things that makes Terraform different from them. A stateless Terraform is useless; it's better to use other provisioning tools.
Basically agree with the article. I've used direct CloudFormation, AWS SAM, Ansible, Terraform, and AWS CDK to spin up infrastructure...<p>My hard-line opinion is that if something NEEDS state management to exist and update, it's a pet; treat it like a pet. Don't mix pets with the rest of your automated machinery except to the minimum extent required, when absolutely necessary.<p>We had to rewrite an Ansible role because a patch-level version upgrade of the recommended community package started destroying security groups... Another time we had to roll back our deployment Ansible image and update 30 repos because a minor-level version bump in another Ansible-recommended package suddenly required log groups to have an expiration value set in AWS, or the module fell over with a null reference on the AWS lookup/comparison... So we couldn't use the latest version to fix it.<p>I've almost never had this happen with any AWS tool... Sure, there's drift possibilities, but those are controllable mainly by not letting humans do things, and not having multiple cooks in the kitchen changing things in automation, which are good ideas for Terraform and Ansible too...<p>I also encourage modular designs, any of which (except cdn/db and dns related stuff, in my use cases) can be torn down and re-built with only the brief outage nonexistence causes... We've only done that once in 3 years, and we believe the issue was actually on AWS's internal side anyway.<p>I've spun up over 260,000 VMs over 4-5 years with one CloudFormation template and the basic SDK call, and we've never bothered to convert it to another tool because it's never broken... We occasionally tweak it, use gp3 instead of gp2, etc, but it's never forced us to unexpectedly sidetrack a sprint for 1-3 days
Ahahaha there are commenters that don't realise that CloudFormation IS the state for the infrastructure you've provisioned.<p>Ansible is stateless because every operation is supposed to be idempotent. Unless your Ansible is doing an HTTP PUT request to an API, I suspect you're misusing the tool for something it's not meant to do.<p>State is a good thing with infrastructure and Terraform got it right.
Well, this would ruin the point of terraform and turn it into Ansible... You really need all three to get any use out of terraform:<p><pre><code> - The desired state in your .tf files
- The actual state in your provider (what you describe)
- The expected state in the state file
</code></pre>
This essentially gives you drift detection and delta updates. A simple "terraform refresh" and "terraform plan" read your actual state and do a diff between desired state and actual state. If all you had was the real-world state, then the absence of resources gives you zero information. The other way around: planning a change without comparing it against a stored state gives you no way to actually purge the real-world state.<p>You could technically argue that everyone should keep their .tf files in VCS/SCM and then have terraform first check the real world against the previous commit before creating a delta based on the current changes, but then you're just moving state to Git, which is already a state backend...<p>The triangle this creates is why terraform generally is better than most other IaC systems, which either don't have all three legs (and thus collapse into a 2-dimensional also-ran tool) or do but only for one special system (i.e. only AWS/GCP/Azure and no integration with anything else).<p>Next thing you know someone is coming to advocate against locking and hash comparison...<p>Edit: the best 'simple' explanation I could come up with is: you can't remove or update what you don't know shouldn't exist anymore. And you can't realistically 'download' the configuration of an entire cloud to 'check' all the tags for state information.
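The three-legged comparison described above can be sketched in a few lines. This is a simplified model, not Terraform's actual algorithm; the resource names are invented:

```python
# "desired" = your .tf files, "recorded" = the state file,
# "actual" = what the cloud API reports right now.

def reconcile(desired, recorded, actual):
    # Drift: recorded state disagrees with the real world.
    drift = {n: (recorded[n], actual.get(n))
             for n in recorded if actual.get(n) != recorded[n]}
    # Purge: recorded but no longer desired -- invisible without state.
    to_delete = [n for n in recorded if n not in desired]
    # Create: desired but never recorded.
    to_create = [n for n in desired if n not in recorded]
    return drift, sorted(to_create), sorted(to_delete)

desired = {"vpc": "10.0.0.0/16"}
recorded = {"vpc": "10.0.0.0/16", "old_subnet": "10.0.1.0/24"}
actual = {"vpc": "10.0.0.0/16"}  # someone deleted old_subnet by hand

drift, create, delete = reconcile(desired, recorded, actual)
print(drift)   # {'old_subnet': ('10.0.1.0/24', None)}
print(delete)  # ['old_subnet']
```

With only two of the three legs, either the drift dict or the delete list becomes uncomputable, which is the "2-dimensional also-ran" point above.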
Is "remained stateless" accurate? The document only indicates that they used to try storing state in tags, not that it was stateless. "Stateless", to me, would mean that it tries to compare current resources with your current configuration without any metadata.<p>If you mean that they should have tried to keep using tags, the fact that not every resource and cloud provider supports tags pretty much ends that possibility.<p>If you mean they should have gone completely stateless, I would say that state is great for detecting configuration drift. If Terraform tries to create a new resource, is it being created for the first time, or was it manually removed? You can only tell with state. Sure, you can get this information in a number of other ways, but one of the major strengths of Terraform is as an easy drift detection tool.<p>I get the desire for state not to exist, but in reality it is essential for what Terraform does.
AWS should really find a way to replace CF with something sane like TF so we can have both cloud resources and cloud workloads described with native, managed tech. TF is a pain, but a necessary one.
Thank you OP for answering a question I’ve been long curious about but never bothered to look into, and sharing here.<p>I love/hate Terraform. It’s better than any other tool I’ve used for what it does, but the abundance of subtly leaky abstractions is tedious. And then when you mess up your state occasionally, yea that’s super annoying too.
Not having to deal with external state was one of the foundational design goals of octoDNS. The other being specifying a record once and pushing it to multiple providers. Those two in combination were the main reasons we didn't end up using Terraform to manage DNS and started octoDNS.<p>That did require making the initial decision that octoDNS would own the zone completely so that it could know it's safe to remove records that aren't configured, but later work with filters and processors does allow softening that requirement.<p>I've always had similar feelings about Terraform's state, in fact I started in on a prototype of an IaC system specifically to see if it was workable to avoid external state. The answer as far as that POC made it was yes. I was able to create VPCs, subnets, instances, to the point I stopped working on it. It was generally straightforward, but there was a hiccup for things that don't have a place to store metadata of some sort and the biggest issue was knowing when they were "owned" by the IaC.<p>I think some of the other issues that things would eventually run into would be around finding out what exists (again to be able to delete things.) The system would essentially have to list/iterate every object of every possible type in order to decide whether or not to delete them. Similar to octoDNS this could be simplified by making the assumption that anything that exists is managed, but that's not workable unless you're starting greenfield and it would still require calling every possible API to list things.<p>Anyway, I see why Terraform went the way it did, but I still wish it wasn't so. Thinking about it now makes me want to pick the POC back up...<p>(maintainer of octoDNS)
A huge issue with the statefile IMO is storing secrets and API keys.<p>We try to make sure EVERYTHING is done through service identities (Azure). No secrets in deploy scripts or resources.<p>Terraform will happily generate and store passwords and API keys for resources even if we don't want to use them, saving the keys to everything in a file on a "random" disk, and suddenly we need to lock this file down.<p>With a statefile we have to worry that secrets get accidentally stored -- I am sure it is possible to configure it away case by case, but that is a more fragile approach.<p>With tools without a statefile the whole problem just goes away.<p>For us this was enough reason to disqualify Terraform.
Uh, how do you delete resources with this model?<p>If you don't have any state, and you have an empty module, did you just create it, or did you just remove all the resources from it? The former requires no action, the latter requires API calls to delete something that I no longer have a record of.<p>More generally, do I have to completely enumerate the entire state of every service available to my AWS account to determine whether there's something that shouldn't be there vs. the contents of my Terraform modules?
If Terraform is stateless, how does it know what it needs to / can delete?<p>You'll either have to:<p>- Move the state management elsewhere, and invoke different commands depending on what and how resources are changed. This will make automation difficult, and doesn't solve the problem.<p>- Make Terraform assume that everything it sees is under its management, deleting everything not defined in the current configuration. This will make Terraform hard to adopt in an environment with existing infrastructure.
Can the git repo that the terraform config lives inside of serve as state? For example, one commit ago this part of config was there and now it’s gone, showing temporal intent that it should be deleted in real life as well?
Given that there are well over 100 comments in this thread it’s possible that someone already mentioned this, but for people who are short-staffed but highly adventurous NixOS has an interesting take on all this. There’s even a Terraform binding called Terranix that isn’t too bad.<p>It’s a real journey and not for people under short-term time pressure, but of all the pain-in-the-ass things I’ve learned in computing over the years it’s paid me back well-above average.
Simple counterexample from this week. I added a Heroku app webhook to Terraform. They have some flags, and an ID. Without storing the state, there would be no way, using only the Heroku API, to know which webhook is supposed to be managed by Terraform.<p>I don't understand where the 99% comes from. Maybe I'm just using the wrong services, but it seems more like 50% to me, anecdotally. Maybe 90% if you include hacks like storing "tags" in other fields like the description or resource name.<p>If it's truly 99% for you, it seems like it wouldn't be too hard to make a tool that generates an ephemeral TF state file from the services on the fly. Win-win? Or you'd run into the next problem, collecting all the state all the time would probably be really slow in big projects.
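The "ephemeral state file generated on the fly" idea sketched above would look something like this. Everything here is hypothetical: `list_resources`-style input data, the `managed-by`/`tf-name` tag scheme, all of it stands in for a real cloud SDK call:

```python
# Hedged sketch: synthesize a state-like mapping on every run by
# scanning provider APIs for resources tagged as ours.

def build_ephemeral_state(resources, managed_tag="managed-by"):
    """Keep only resources tagged as Terraform-managed, keyed by a logical name tag."""
    state = {}
    for res in resources:
        tags = res.get("tags", {})
        if tags.get(managed_tag) == "terraform":
            state[tags["tf-name"]] = res["id"]
    return state

resources = [
    {"id": "i-111", "tags": {"managed-by": "terraform", "tf-name": "web"}},
    {"id": "wh-9", "tags": {}},  # no tag support (the Heroku-webhook case): invisible
]
print(build_ephemeral_state(resources))  # {'web': 'i-111'}
```

The second entry shows exactly where the 99% breaks down: resources with no tag field simply can't be claimed this way, and the full scan on every run is the slowness concern raised above.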
Terraform having state is a trade-off:<p>Pros:<p>- it can manage resources where not all information can be queried<p>- reading state is almost always significantly faster<p>- it can identify deleted resources<p>Cons:<p>- state can go out of sync<p>- every resource type has to implement state well while tracking changing features, so it’s fragile<p>- fixing broken state can be a PITA<p>Without making these trade-offs, terraform would only be able to support a smaller subset of the providers it currently does and perform poorly.
Terraform could have been stateless if the target provider was stateless as well.<p>Having said that I don't mind state at all! There is always a storage layer and a schema somewhere storing some kind of information.
Having a state file isn't a bad idea. State files are a logical map from your code to the AWS resources. You can actually import a resource that Terraform never created into a state file so that Terraform can manage it.<p>But of course, you could just write the code to explicitly include that pre-existing resource, rather than have to run a command to tell a state file to explicitly include it.<p>The problem with Terraform isn't that it keeps its own state file. The problem is that Terraform is just <i>really dumb</i>, regardless of the state file. It doesn't know how to auto-import existing resources. It doesn't know how to overwrite existing resources. It doesn't know how to detect existing resources and incorporate them into its <i>plan</i>. It doesn't know how to delete resources that were unexpected. It's so stupid that it basically just gives up any time any complication happens.<p>Puppet and Ansible aren't that stupid. They both will make a best effort to deal with existing resources and work around problems. It's not the presence or absence of an external state database that make those tools "more advanced", it's simple logic.
If you do any non-trivial devops works on cloud providers it's immediately obvious why this is nonsensical.<p>Let's take the most basic example: auto-generated ids. Many resources in AWS, GCS, etc have auto generated ids (just use tags you say, but many don't have tags or tags are used as part of some other system). Now, when terraform creates that resource you have to modify the config to contain the id. But if you have any sense terraform runs as part of a CI system that lets others review your code before merging, deploy to staging, etc.<p>So now does the terraform process need to make an automatic git push? What if there's a conflict? Does it make a PR that has to be manually merged? All of this is much more complicated than just having one JSON file in S3.<p>I have actually managed resources with Ansible where you have this problem and it's worse. And this is just _one_ thing.<p>Is Terraform's state story perfect? No. There are definitely annoyances, and one thing I'd love to see is a way to declaratively handle imports, renames, etc. when you need to, but it's better than the alternative.
Interesting read. Seems to be focusing on the neck of the woods the author is exposed to.<p>Terraform isn’t about cloud APIs only. Terraform allows me to manage Keycloak realms, ssh keys, cloud resources, Postgres databases, git repos, imap accounts, and so on.<p>The state is there to be treated as the source of truth. It gives an answer to „do I have what I want to have”. With that state, it is possible to cross-reference various resource types without having to load the current state on every run. I’m surprised that the author did not see that as a performance issue.<p>Imagine that you have to load all Route53 state, all buckets, EC2 instances, IAM roles, … on every execution, and imagine you have 200+ machines… That’s what we used to do with Puppet and Chef, no?<p>It turned out that always fetching the view of the world from an API is pretty expensive and quickly exhausts API rate limits.<p>It’s a pretty weak article without any effort to suggest how it could work without state.
Trends come and go. I too use terraform now because that is what my customers use.<p>Previously we have used Ansible for the same thing, in a largeish environment, in production. It had the obvious benefit of declarative and stateless.<p>The other comments here seem to focus on how that works badly together with manual state changes made externally from the system. The answer is that it requires another way of working where the state <i>is</i> the git repo. The question is not what to do when someone spawns extra test nodes, but why they did not do it in a version controlled manner.<p>Perhaps as a reaction to the discipline required, something about Kubernetes attracted a lot of people used to manipulating state by manual interaction. Every installation I have seen has a web interface in use, whereas not as many have in the Ansible world. I fully expect this pendulum to swing back and become more declarative and version controlled again.
Personally I think the custom DSLs are a bigger issue. I spend a lot of time wrangling TF to have reusable, configurable modules.<p>The more I use TF, the more I think it would be better to remove _all_ dynamic features and use a real language to generate TF configs as flat, static files.
If you use Terraform, you should only update your infrastructure through Terraform and persist the state in a shared place (e.g. S3 with versioning). I see people having a hard time when they mix the AWS CLI, the AWS console and Terraform.
amen! as another commenter here rightly put it:<p>state or idempotency, pick one.<p>hello? how is this even a question? i have a hard time believing that idempotent infrastructure mutations are not the right move almost always.<p>shameless plug[1]: i’ve been exploring an aws-specific approach to infrastructure that is stateless and idempotent for exactly these reasons.<p>slow, finicky, stateful deploys are about as awful as it gets. add a pinch of lowest common denominator among all providers, and that’s a tough pill to swallow.<p>there’s got to be another way, aws should be fun!<p>1. <a href="https://github.com/nathants/libaws" rel="nofollow">https://github.com/nathants/libaws</a>
“level triggered” vs “edge triggered” is the kubernetes term for stateless vs stateful. It’s one of the reasons why any cluster configuration drift becomes eventually consistent with desired state.
State or Idempotency. Pick one.<p>If you want stateless, then you can use Ansible and its providers. Enjoy spawning new instances every time you change your infrastructure, rather than having existing ones change.
This looked suspicious when I started reading. But when I reached comparison to Puppet and Ansible my feeling about author's lack of understanding of TF got reinforced.
You couldn't do everything that Terraform does today without local state, but perhaps that would be a good thing? Call it "Terraform strict mode".<p>As every application developer knows, duplication of state is a primary source of bugs. To combat this, React/Flux-type architectures became extremely popular, where state flows in one direction only. They dictate that, no, you can't just cheat a little and use jQuery to modify some element; it _will_ get bulldozed on the next render. And a lot of Terraform headaches do come from this analogous reconciliation of what really is (our cloud env), what we want (our TF code) and the intermediate state of what TF thinks the cloud state is.<p>So, by saying that you cannot have resources outside those defined by TF, there is actually a massive simplification with far-reaching consequences possible.<p>How I imagine the experience would be:<p>- You could say that your Dev env is a shitshow and always will be, made of manually created resources and only partially TFed. But your Production and Staging envs are opted in to "strict mode". This means that if there is a conflict you only have two options: import the offender or destroy it, and the critical mindset change is that this is a good thing and will save us a lot of tears later on.<p>- Caching is an orthogonal concern. Terraform mixes these two together to its detriment, but the nature of a cache is such that you can safely blow it away and perhaps the next reconciliation will be slow, but it will be accurate. I also don't believe it would actually be that slow; tools like Cloudcraft map the entire metadata of your account in seconds.<p>- I find the excuse that some resources don't support tags intellectually lazy. Off the top of my head, thinking about it for a minute: could you tag a parent resource with the child metadata you need? E.g. individual DNS records don't have tags; OK, tag the Zone with childA=value.
Same thing with tag length limits, you can work around it, concatenate values or whatever. However, in a truly strict mode you wouldn't even need metadata in tags because the TF code describes the entire target environment.<p>I hope Terraform would entertain such a strict stateless mode. Unfortunately it will probably take another tool, because the problem is not so much technical as it is an entire mindset change.
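The "tag the parent" workaround could be sketched like this. The `tf-records` tag key and the comma-concatenated format are invented for illustration; a real provider would also impose tag count and length limits, which is why the sketch has to fail loudly when the value overflows:

```python
# Record ownership of tagless children (e.g. DNS records) in the
# parent zone's tags, since the records themselves can't carry tags.

def claim_record(zone_tags, record, limit=256):
    """Append a record identifier to the zone's ownership tag, in place."""
    owned = zone_tags.get("tf-records", "")
    updated = ",".join(filter(None, [owned, record]))
    if len(updated) > limit:
        # The length-limit problem mentioned above: shard, hash, or give up.
        raise ValueError("tag value limit exceeded; need to shard or hash")
    zone_tags["tf-records"] = updated
    return zone_tags

tags = {}
claim_record(tags, "www.A")
claim_record(tags, "api.CNAME")
print(tags)  # {'tf-records': 'www.A,api.CNAME'}
```

Workable, but it shows why this ends up being state by another name: the zone tag is now a tiny, size-limited state file.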
I like the idea of stateless (or at least not having to manage the state, as in the case of CF) but I’d hate to see the performance of the API calls if one needed to search tags to perform any CRUD operation. Particularly if this were a multi-account setup.
Avoiding state is theoretically a nice goal.<p>But a generic “none” backend as described in the OP would simply be impossible. The support for diffing desired vs actual state must be implemented in the resource provider, which in turn needs to be supported by the cloud API. Ubiquitous labels, namespaces and global query performance seem to be the primary blockers today, judging by most other comments here.<p>It would be interesting if someone attempted to make such a provider. Also interesting to look at whether existing providers avoid state internally when possible or just use it because it is available.<p>Looking at the challenges with kubectl apply --purge, which already has all the enablers laid out, it would require a big effort.
You can use terraform in a "stateless" manner!<p>Define everything in like cdk (... I use ruby to generate the tf.json), generate the code, import everything you can without error, and apply the rest.<p>Performance will be _bad_ but that will completely eliminate state problems.
Terraform has to have state, for the reasons outlined by other commenters.<p>But, it is undoubtedly a pain in the arse in practice - invariably someone else is doing a change on a branch but has already applied it to the development infra using the shared state bucket, which then pollutes it for everyone else as you don't have those changes so your applys will now want to undo them.<p>As a first timer with this stuff, I think a key lesson learned is to balkanise your terraform quite heavily - we have a tfstate per environment but that is nowhere near granular enough, slicing it into smaller pieces would obviate many of those 'pollution' problems.
I haven't found a good way to handle resources that were created outside TF. Like an EC2 instance running in staging to debug an issue, but then forgotten about. They don't exist in TF state or config, so TF simply ignores them.<p>I've tried terraformer[1], but I can't tell how well maintained it is (it failed to get my creds, I had to modify the code to fix that, and then it crashed with an obscure error).<p>Anyone have a good approach?<p>1. <a href="https://github.com/GoogleCloudPlatform/terraformer" rel="nofollow">https://github.com/GoogleCloudPlatform/terraformer</a>
I think the author of this article misunderstands what Terraform is for and what it does.<p>The author compares Terraform to Ansible and Puppet, but these are not analogous tools. If you compare to Pulumi, you'll notice it also has state.<p>Keeping track of what is live and what is <i>expected</i> to be live is much of the utility of Terraform.<p>If the author is getting some Terraform state pain (e.g., from drift caused by a team making changes to infrastructure and forgetting to represent those changes in code), they should try using Terraform Cloud or building a CI pipeline for their infrastructure.
A "none" state store would definitely be good for one of our cases - automatic preview deploys for pull requests.
The stateful approach works well 99.9% of the time, but rare errors - like being unable to destroy a resource because of cloud provider issues - mean you need manual intervention, e.g. a resource can't be created because it's not managed by state, etc.
Even rare errors are not rare when you have 20+ devs :-)<p>They need manual fixes, e.g. importing state, killing resources, etc., but not all devs have the access rights or knowledge to do this.
>Also, if Terraform configuration is refactored, for example, to wrap a bunch of frequently copy-pasted resources into a module, state must be manually reconciled before proceeding.<p>Not true since Terraform 1.1 introduced <i>moved</i> blocks : <a href="https://www.terraform.io/language/modules/develop/refactoring" rel="nofollow">https://www.terraform.io/language/modules/develop/refactorin...</a>
Given the ability to import resources this actually wouldn't be that big of a lift for some providers like AWS.<p>Wouldn't take much to hack something together to test this out, either... parse the TF for resources, lookup what is used for their IDs, run TF import with discovered IDs from the service provider and then your local state is up to date, run your plan / apply and blow away the state when you finish. But this is super gross IMO :)
I would say that half of the point of Terraform is that you have a canonical expected view of the infrastructure, with the state as a log to assert that in each environment. It forces you to do all changes in a code-first manner. Terraform only solves the provisioning part and is easier to work with than Ansible. One still needs something like Ansible for configuration, though.
something that can help reduce the need for state is aggressively partitioning systems into separate aws accounts.<p>then you can KNOW that no other random infrastructure should exist in an account.<p>terraform definitely does help coordinate the bunk beds of room mates. wouldn’t want them to accidentally discard each other’s pillows as they move in and out of the shared space.<p>separate billing is a nice bonus.
As TFA mentions, we already have Ansible and friends. You can use Ansible with AWS <a href="https://www.ansible.com/integrations/cloud/amazon-web-services" rel="nofollow">https://www.ansible.com/integrations/cloud/amazon-web-servic...</a> if you want to
I think the best thesis for this is decentralisation. The stateless approach only works if this one tf file (or set of files) manage(s) the whole world.<p>As soon as you get to multiple teams managing (say) AWS infra, you can't infer that a resource present in infra but not in tf file means a resource should be deleted.
I am a big fan of Azure ARM/Bicep. I can’t imagine needing to deal with stateful IaC. Perhaps Azure doesn’t need to do that because it’s able to provide guarantees about their own platform or something.<p>The actual infrastructure and its configuration <i>is the state</i>. It does a diff to see what needs changing.
I am not sure how feasible the proposal is.
Certainly cloudformation is too archaic without Jinja & co.
Terraform doesn't need extras, but provider dependencies can be painful.
Assuming every resource has a name field, auto prefix it with "tf-"<p>If the resource is prefixed with "tf-" but missing in the terraform config, delete it.
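The naming-convention scheme above could be sketched as follows. The `tf-` prefix and the name lists are of course illustrative, and the comment flags the obvious hazard:

```python
# Anything whose name starts with the prefix is assumed managed, so a
# prefixed resource absent from config becomes a purge candidate.
# Risk: someone else's resource that happens to start with "tf-"
# would be flagged for deletion too.

PREFIX = "tf-"

def purge_candidates(live_names, config_names):
    """Live prefixed resources with no matching entry in the config."""
    managed = {n for n in live_names if n.startswith(PREFIX)}
    declared = {PREFIX + n for n in config_names}
    return sorted(managed - declared)

live = ["tf-web", "tf-old-worker", "legacy-db"]
config = ["web"]  # "old-worker" was removed from the .tf files

print(purge_candidates(live, config))  # ['tf-old-worker']
```

It also still requires enumerating every live resource of every type on each run, which is the API-quota objection raised elsewhere in the thread.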
Re: handling unmanaged resources, you have 2 options:<p>1. Everything not declared is unmanaged. Deleting is done by a "deleted" attribute of some sort. The resource is declared as deleted.<p>2. Everything unmanaged is declared as unmanaged. The resource is declared as unmanaged and TF would ignore it.<p>Either of those would work without extra state.
You could work without state in some way, but the state is actually a feature which helps you.<p>Imagine you go down the Ansible route and you write idempotent Ansible code; then you could argue: “See. I don’t need state. My code is idempotent. I just use this playbook to apply”<p>Now think of deleting resources.<p>You could have a delete playbook, maybe. Then how would you choose whether to create or delete stuff?
Maybe a colleague gives you a ticket: “please delete”<p>Now there are various scenarios:<p>You do your thing (on your laptop) and tell everybody else in your team not to run the CreatePlaybook as you have to run the DeletePlaybook first for this one machine. Afterwards you delete the actual machine/resources from your Ansible repository and tell the team: “please use the newest main branch”.<p>So: here is your equivalent to Terraform state, the coordination effort on your side, since you are the only person who currently “knows” what’s going on with deletion/applying.<p>So your next idea is: “no problem, I'll make a pipeline which runs the playbook on your behalf”. And the pipeline will maybe update the Git repo accordingly in some way - after the run (since Ansible needs to know, in the run, what to delete).<p>Everybody can see the pipeline. The pipeline will ensure you sequentially apply the playbooks to coordinate with your colleagues.<p>Next problem: how does the pipeline know which playbook and resources to run on?<p>You create a selection box for your hosts and for the playbook yaml to trigger the pipeline.<p>=> there is your state.
Your ticket information is transferred via your manual labor of filling in the correct items in your selection box. To reconstruct what went on you now have to look at the Ansible code, the pipeline parameters and the pipeline logs.<p>More examples: you only want to update some stuff with your Ansible playbook. Therefore you might introduce tags on the resources, and the playbook knows how to handle those tags. The extreme case might be:<p>tag:state:present, tag:state:absent.<p>Then you can run a single playbook which calls the deletion and installation playbooks for you, and everybody is happy that you have everything visible in Git.<p>Problem here: your two-step process to decommission things from Git. First a commit which sets state:absent. Then run the pipeline, and then another commit to delete the code.<p>So what I am saying is:
You can do all of this with Ansible for sure. But you will have state somewhere:<p>in a ticket, a pipeline log, in Git, in a wiki.<p>I am not saying Ansible should not be used. It makes sense for configuring things. (I personally would wrap a Terraform shell around my Ansible code and call it, just to have Terraform handle the locking for my playbooks.)<p>But just watch out for the hidden state in your workflows, and better make it explicit.
This is why people love GitOps for traceability.<p>You can do all this by hand and document your process in a wiki for your colleagues so that they know what “tag:state:absent” means for them.
Or you can rely on somebody who has done this for you already and maintains documentation and what not.
Terraform is slowly losing relevance. You’re making a good point, but they haven’t been open to change and are content to become another Sun Microsystems. Better idea: advocate the ‘state only if needed’ approach to Pulumi.
Completely agree. Thanks for writing this up. Specifically because I've always thought this, but also don't value my opinion on the topic. I was first exposed to devops a few years ago, and as a result was learning docker, k8s, terraform, salt, &c. at the same time. (I hated it, and now I'm happy writing C++ again)<p>What I could never wrap my head around was why the heck the tools had to expose so much complexity. I want the description of my infrastructure to be a single, or set of text files, written preferably in JSON or YAML like everything else, and I want to run a command that behaves in the same way as GNU make.