Let me paint you a picture.
It’s 2 AM. Your phone buzzes. Production is down. You log into the server, run a few queries, and slowly realize what happened: someone logged into the wrong box last Tuesday and “just quickly” tweaked a config. Not the test server. The production one. And nobody noticed until right now.
Sound familiar?
That kind of thing is exactly what pushed me toward the cloud. And now, apparently, people want to go back.
I Get It. I Really Do.
There’s a real conversation happening about repatriation. Cloud costs are going up, data sovereignty rules are tightening (especially here in Europe), and CFOs are staring at their Azure invoices like they received a bill in a foreign language. Owning your own hardware again suddenly sounds appealing. Predictable, even.
And to be fair, the arguments aren’t completely wrong. There are situations where on-prem makes sense. But I’ve noticed that the conversation often skips past the hard parts: the stuff nobody mentions when they’re building the PowerPoint slide with the cost comparison. The slide that makes it look obvious.
Before you rack those servers and dust off your IPMI (Intelligent Platform Management Interface) knowledge, let me share a perspective from someone who’s spent time in both worlds and has the incident tickets to prove it.
What Cloud Actually Gave Me
Honestly, the thing I love most about the cloud has nothing to do with elasticity or global reach. It’s Infrastructure as Code.
When I provision a VM today, I write Terraform. Every property, every disk size, every network security group rule is declared, version-controlled, and reviewed before anything touches a datacenter. When that VM goes to production, I know exactly what’s on it, because it’s sitting right there in Git. Same for the next one. And the one after that.
I have seen too many situations where servers were installed just a little bit differently. A slightly different collation here. A different service account there. A Windows update that everyone forgot to apply. And then months later you get a production incident that takes days to diagnose because you’re comparing systems that look identical but aren’t. Configuration drift is a slow and silent killer, and it thrives in environments where people are clicking through wizards instead of committing code.
Tools like PowerShell DSC v3 exist to solve this, and they’re great. DSC can run anywhere, cloud or on-prem, and push a system toward a desired state. But there’s a catch: you need the machine first, and someone had to build it. DSC keeps you honest after the fact. IaC prevents the mess before it starts.
When your entire environment lives in Terraform and gets deployed through a pipeline with approvals, gates, and audit logs, the 2 AM phone call becomes a lot less common. And when it does happen, you know exactly where to look. You open the pipeline history, you check the last approved run, you compare. That’s a very different troubleshooting experience than “let’s check what’s different between these two servers” when nobody documented what they changed.
The pipeline also gives you something that’s easy to underestimate: a paper trail. Every change to infrastructure goes through a pull request. It gets reviewed. It gets approved. It gets logged. When something breaks, you’re not guessing who changed what and when. You know.
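To make the drift problem concrete, here’s a minimal Python sketch of the comparison a pipeline effectively does for you: diff what is declared in Git against what a server actually reports. The setting names and values are invented for illustration, and how you collect the actual state (DSC, an agent, a script) is left out entirely.

```python
from typing import Any


def find_drift(
    declared: dict[str, Any], actual: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (declared_value, actual_value)} for every mismatch.

    A setting that is missing on one side shows up as None on that side.
    """
    drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want, have = declared.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example data, not taken from a real system.
declared_state = {
    "collation": "Latin1_General_CI_AS",
    "service_account": "svc_sql_prod",
    "patch_level": "2025-01",
}
actual_state = {
    "collation": "SQL_Latin1_General_CP1_CI_AS",
    "service_account": "svc_sql_prod",
    "patch_level": "2024-11",
}

for setting, (want, have) in find_drift(declared_state, actual_state).items():
    print(f"{setting}: declared {want!r}, found {have!r}")
```

The useful part is not the diff itself but having a declared state to diff against. Without the Git side, you are back to comparing two servers and guessing which one is right.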
The Hidden Cost Calculation Nobody Does Correctly
The main argument for going back on-prem is usually cost. “We ran the numbers.”
Ask to see those numbers. Every line of them.
On-prem cost is deceptively foggy. There’s the hardware, sure, but there’s a lot more stacked on top of that than most people put in the spreadsheet:
Hardware lifecycle: Servers don’t last forever. Plan for a refresh every 3 to 5 years. That’s not just the purchase price, it’s the migration effort, the downtime, the testing, and the disposal of the old kit. Factor in spares too, because when a disk controller fails on a Friday evening, you don’t want to wait for a delivery.
Warranty and support contracts: Basic warranty gets you pretty far in year one. By year three you’re looking at extended support contracts, and next-business-day on-site hardware replacement adds up fast when you have a rack full of servers.
Power and cooling: Usually handled by facilities, which means it often doesn’t show up in the IT cost comparison at all. A rough rule of thumb is that power and cooling adds 30 to 40 percent on top of the hardware running cost. If your datacenter is shared or leased, this is usually buried in a monthly charge that nobody fully unpacks.
Networking infrastructure: Switches, routers, firewalls, cables, patch panels, transceivers. All of it has a purchase cost, a maintenance cost, and eventually a replacement cost. And then there’s the configuration that lives in someone’s head.
The people: This is the one that kills most on-prem cost comparisons when you actually count it properly. It’s not just one engineer. It’s the engineer, the backup engineer for when the first one is on holiday, the team lead, the manager, and the part of the security person’s time that goes toward firewall rules and patching compliance. Add up the fully-loaded cost of all that time spent on keeping infrastructure running rather than building things, and the number gets uncomfortable quickly.
Opportunity cost: What are those people not building while they’re managing hardware? This one is genuinely hard to quantify, but it’s real.
Now compare that to something like Azure SQL. No OS to patch. No storage to provision. No failover cluster to maintain. High availability is built in. Backups happen automatically. Geo-replication is a few clicks. It’s just a database that works, and the people who would have maintained the old stack are free to do something else.
The cloud bill is visible and easy to attack in a budget meeting. On-prem cost is spread across departments, partially invisible, and very easy to undercount when someone is putting together the business case for repatriation.
So When Does Going Back Actually Make Sense?
I’m not saying never. There are real, valid reasons to move workloads back on-prem, and I’d rather be honest about them than pretend the cloud is always the right answer.
Data sovereignty and compliance: If your regulators require data to stay within a specific physical boundary, and your cloud provider can’t guarantee that to the satisfaction of your legal team, you don’t have a lot of wiggle room. This is a genuine driver in parts of Europe right now, especially in sectors like finance, healthcare, and government. GDPR interpretation varies by country, and some regulators are taking a harder line than others on what “processing in the EU” actually means when the parent company of the cloud provider is American.
Predictable, constant workloads: If you have a system that runs at the same load around the clock, 365 days a year, with no burst and no flexibility needed, you are paying a premium for elasticity you never use. A reserved instance or savings plan helps, but it’s still a different pricing model than owning hardware outright. For genuinely stable, long-lived workloads, the math can flip.
Latency requirements: Some workloads need to be physically close to other systems or to end users in a way that cloud regions can’t fully address. Real-time processing, certain industrial applications, and trading systems sometimes have hard latency requirements that make co-location or on-prem the only practical answer.
Strategic risk and vendor lock-in: Putting everything in Azure or AWS or GCP creates a concentration risk. Pricing changes, service deprecations, and outages that take out entire regions are real events that have happened. Some organizations want a degree of independence that comes with owning their own infrastructure. That’s a reasonable position.
Regulatory or political pressure: We’re seeing this more and more, especially in Europe. Government agencies pushing organizations toward national cloud providers or on-prem solutions for reasons that go beyond pure technology decisions.
These are real reasons. “It feels cheaper” or “the cloud is expensive” on its own is not a real reason until you’ve done the full math, including all the costs I mentioned above.
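One way to test whether the math really flips for a steady workload is a simple break-even calculation. This is only a sketch under simplifying assumptions: flat yearly costs, no discounting, no price changes. The function name and every number below are invented for illustration.

```python
from typing import Optional


def break_even_year(
    upfront: float,
    on_prem_per_year: float,
    cloud_per_year: float,
    horizon_years: int = 10,
) -> Optional[int]:
    """First full year in which cumulative on-prem spend is at or below
    cumulative cloud spend, or None if that never happens in the horizon."""
    for year in range(1, horizon_years + 1):
        on_prem_total = upfront + on_prem_per_year * year
        cloud_total = cloud_per_year * year
        if on_prem_total <= cloud_total:
            return year
    return None


# Illustrative numbers only.
print(break_even_year(upfront=400_000, on_prem_per_year=300_000, cloud_per_year=400_000))
print(break_even_year(upfront=400_000, on_prem_per_year=300_000, cloud_per_year=320_000))
```

If the break-even point lands beyond your hardware refresh cycle, the upfront investment never pays for itself before you have to spend it again.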
If You’re Seriously Considering It, Here’s How to Approach It
Go into this conversation with data, not gut feelings. Here’s a framework I’d use to build the case properly, in either direction.
Step 1: Build an actual TCO model
Don’t use a back-of-the-envelope number. Build a proper spreadsheet that covers at minimum: hardware purchase and refresh cycle, all support and warranty contracts, power and cooling estimate (check with facilities), networking infrastructure, and fully-loaded staff cost for everyone who touches the infrastructure. Include the one-time migration cost too, because moving workloads back on-prem isn’t free. There’s hardware to procure, systems to set up, data to migrate, and testing to do before you cut over.
The Azure TCO (Total Cost of Ownership) Calculator at azure.microsoft.com/pricing/tco is a reasonable starting framework. Yes, Microsoft built it to show you why the cloud is cheaper, but the cost categories it uses are solid, and you can use the same structure to do a fair comparison. Fill it in honestly on both sides.
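If a spreadsheet feels too loose, the same model fits in a few lines of Python. The sketch below assumes flat yearly costs and spreads the hardware purchase evenly over the refresh cycle. The class, the field names, and all the figures are placeholders, not real quotes or benchmarks.

```python
from dataclasses import dataclass


@dataclass
class OnPremTco:
    hardware_purchase: float       # purchase price for one refresh cycle
    refresh_years: int             # typically 3 to 5
    support_per_year: float        # warranty and support contracts
    power_cooling_per_year: float  # get the real figure from facilities
    network_per_year: float        # switches, firewalls, maintenance
    staff_per_year: float          # fully loaded, everyone who touches it
    migration_one_time: float      # procurement, setup, data move, testing

    def annual_cost(self) -> float:
        """Yearly running cost with hardware spread over the refresh cycle."""
        hardware_per_year = self.hardware_purchase / self.refresh_years
        return (
            hardware_per_year
            + self.support_per_year
            + self.power_cooling_per_year
            + self.network_per_year
            + self.staff_per_year
        )

    def total_cost(self, years: int) -> float:
        """One-time migration cost plus running cost over the period."""
        return self.migration_one_time + self.annual_cost() * years


# Placeholder figures for illustration only.
example = OnPremTco(
    hardware_purchase=300_000,
    refresh_years=5,
    support_per_year=20_000,
    power_cooling_per_year=18_000,
    network_per_year=12_000,
    staff_per_year=250_000,
    migration_one_time=80_000,
)
cloud_per_year = 400_000
years = 5

print(f"On-prem over {years} years: {example.total_cost(years):,.0f}")
print(f"Cloud over {years} years:   {cloud_per_year * years:,.0f}")
```

Notice how much of the on-prem total in this made-up example is staff. Leave that field at zero and the comparison looks very different, which is exactly how the misleading PowerPoint slide gets built.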
Step 2: Classify your workloads before deciding anything
Not everything should get the same answer. Take your workload inventory and sort it:
| Workload Type | Cloud Fit | On-Prem Fit | Notes |
|---|---|---|---|
| Variable or bursty compute | Strong | Weak | Cloud’s elasticity is worth paying for here |
| PaaS databases like Azure SQL | Strong | Weaker | You lose a lot of managed service value on-prem |
| Steady-state, compliance-sensitive | Neutral | Strong | Best candidate for repatriation |
| Dev and test environments | Strong | Weak | Pay only when you use it, shut it down when you don’t |
| Legacy apps that can’t be containerized | Weak | Neutral | Might actually be easier on-prem depending on dependencies |
| High-throughput, low-latency data processing | Neutral | Strong | Depends heavily on workload characteristics |
The answer is almost never “move everything back”. It’s usually “move these specific workloads back, and leave the rest where it is”. Hybrid is the boring answer that is often the correct one.
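For a larger inventory, the table can be turned into a first-pass sorter. The sketch below encodes the ratings from the table and suggests a placement per workload. The numeric scores, the function name, and the inventory entries are assumptions made for illustration, and the result is a starting point for discussion, not a decision.

```python
# Ratings from the table, mapped to a rough numeric score (assumed scale).
# The table uses "Weaker" once; it is treated like "Weak" here.
FIT_SCORE = {"Strong": 2, "Neutral": 1, "Weak": 0, "Weaker": 0}

# workload type -> (cloud fit, on-prem fit), copied from the table
WORKLOAD_FIT = {
    "Variable or bursty compute": ("Strong", "Weak"),
    "PaaS databases like Azure SQL": ("Strong", "Weaker"),
    "Steady-state, compliance-sensitive": ("Neutral", "Strong"),
    "Dev and test environments": ("Strong", "Weak"),
    "Legacy apps that can't be containerized": ("Weak", "Neutral"),
    "High-throughput, low-latency data processing": ("Neutral", "Strong"),
}


def suggest_placement(workload_type: str) -> str:
    """Return 'cloud', 'on-prem', or 'review' when the ratings are tied."""
    cloud_fit, on_prem_fit = WORKLOAD_FIT[workload_type]
    cloud, on_prem = FIT_SCORE[cloud_fit], FIT_SCORE[on_prem_fit]
    if cloud > on_prem:
        return "cloud"
    if on_prem > cloud:
        return "on-prem"
    return "review"


# Hypothetical inventory: system name -> workload type.
inventory = {
    "nightly-reporting": "Variable or bursty compute",
    "patient-records-db": "Steady-state, compliance-sensitive",
    "ci-build-agents": "Dev and test environments",
}

plan = {name: suggest_placement(kind) for name, kind in inventory.items()}
for name, placement in plan.items():
    print(f"{name}: {placement}")
```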
Step 3: Do an honest skills audit
This is what Alexander Arvidsson’s original post was really about, and it’s the part that tends to disappear in boardroom discussions because nobody wants to admit they don’t know how to do something anymore.
When was the last time your team configured a Windows Server Failover Cluster from scratch? Dealt with a SAN that was behaving strangely? Set up Kerberos constrained delegation? Figured out why a NIC was dropping packets? Managed Active Directory at scale, as the actual backbone of your authentication infrastructure rather than a sideshow to Entra ID?
These skills atrophy. Fast. And the people who had them ten years ago either moved to cloud roles or left the industry entirely. Hiring someone who still has deep on-prem infrastructure experience is genuinely hard right now, and they know it, which means they’re expensive.
If you’re planning a serious repatriation effort, you need to be honest about what your team can actually do today, not what they could do five years ago. Build a skills matrix. Map it against what a new on-prem environment will actually require. Then figure out how you’re going to close the gaps, through training, through hiring, or through bringing in consultants for the initial build. And budget for all of it, because the consulting bills for “we didn’t know we needed to know that” are brutal.
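A skills matrix doesn’t need special tooling. Here is a minimal sketch that flags every required skill held by fewer than two people, on the assumption that you want at least one backup for when the first engineer is on holiday. The names, the skill list, and the threshold are all hypothetical.

```python
def skills_gaps(
    required: set[str],
    team: dict[str, set[str]],
    min_people: int = 2,
) -> dict[str, int]:
    """Map each under-covered skill to the number of people who have it."""
    gaps = {}
    for skill in sorted(required):
        holders = sum(1 for skills in team.values() if skill in skills)
        if holders < min_people:
            gaps[skill] = holders
    return gaps


# Hypothetical requirements and team, for illustration only.
required_skills = {
    "failover clustering",
    "SAN troubleshooting",
    "Kerberos constrained delegation",
    "Active Directory",
}
team_skills = {
    "alice": {"Active Directory", "failover clustering"},
    "bob": {"Active Directory"},
    "carol": {"Kerberos constrained delegation"},
}

for skill, holders in skills_gaps(required_skills, team_skills).items():
    print(f"{skill}: {holders} of {len(team_skills)} people")
```

Every skill this prints is something you need to train for, hire for, or buy in, and each of those belongs in the budget.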
Step 4: Plan for operational maturity from day one
If you do move workloads back on-prem, the single biggest mistake you can make is slipping back into the old way of doing things: no documentation, changes made directly on servers, configurations that live in someone’s memory.
Use PowerShell DSC v3 for configuration enforcement. If a server drifts from its desired state, you want to know about it, and ideally have it corrected automatically. Set up monitoring from the start, not as an afterthought. Tools like Grafana with a Prometheus or InfluxDB backend work well on-prem and give you the kind of visibility you’d expect from cloud-native monitoring.
Use Terraform with a provider for your on-prem platform, or Bicep on Azure Stack HCI (now branded Azure Local) if you want to keep your existing IaC muscle memory. If you’re on VMware or Hyper-V, there are Terraform providers for both. Write your infrastructure the same way you would for the cloud: declared, versioned, reviewed, deployed through a pipeline.
Build approval workflows for infrastructure changes. Use Azure DevOps or GitHub Actions with a self-hosted runner if you want to keep the same toolchain. The point is that no change to infrastructure should happen without a record of who approved it and why.
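The rule itself is simple enough to write down as code. The sketch below shows the kind of check a pipeline gate enforces: a change may deploy only if someone other than the author approved it and a reason is recorded. The class, the field names, and the four-eyes requirement are my assumptions for illustration, not the API of Azure DevOps or GitHub Actions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRequest:
    summary: str
    author: str
    approver: Optional[str]  # None until someone approves
    reason: str


def may_deploy(change: ChangeRequest) -> bool:
    """Allow deployment only with an independent approver and a stated reason."""
    has_approver = bool(change.approver)
    independent = change.approver != change.author
    has_reason = bool(change.reason.strip())
    return has_approver and independent and has_reason


# Hypothetical change requests.
reviewed = ChangeRequest(
    summary="Open port 1433 between app and db subnets",
    author="alice",
    approver="bob",
    reason="New reporting service needs database access",
)
self_approved = ChangeRequest(
    summary="Quick config tweak",
    author="alice",
    approver="alice",
    reason="Just quickly",
)

print(may_deploy(reviewed))
print(may_deploy(self_approved))
```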
Set up your patching strategy before you need it. On-prem means you own the patching now. That means Windows Server Update Services, or a third-party patch management tool if you have a mixed environment. Define your maintenance windows, your rollback procedures, and your testing process before you’re sitting in front of a server that just had a bad update applied at 11 PM.
Step 5: Think about what happens when things go wrong
Cloud providers invest billions in redundancy that most organizations couldn’t replicate on-prem even if they wanted to. If you’re going back to on-prem, you need a real disaster recovery plan.
Think through: what happens if a server dies? What’s the RTO and RPO you’re committing to? Do you have spare hardware, or are you waiting for a delivery? Where are your backups, and have you actually tested restoring from them recently? If your datacenter floods or burns, what’s the plan?
These aren’t reasons to avoid on-prem; they’re questions you have to answer before you commit. The cloud made a lot of this invisible. On-prem makes it your problem again.
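Two of those questions are easy to check automatically: is the latest backup within the RPO, and has a restore been tested recently? The sketch below assumes you can get both timestamps from your backup tooling. The function name, the 90-day restore-test interval, and the example dates are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def dr_findings(
    now: datetime,
    last_backup: datetime,
    last_restore_test: Optional[datetime],
    rpo: timedelta,
    max_restore_test_age: timedelta = timedelta(days=90),  # assumed policy
) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    findings = []
    if now - last_backup > rpo:
        findings.append("latest backup is older than the RPO")
    if last_restore_test is None or now - last_restore_test > max_restore_test_age:
        findings.append("restore has not been tested recently")
    return findings


# Hypothetical timestamps for illustration.
now = datetime(2025, 1, 15, 2, 0)
findings = dr_findings(
    now=now,
    last_backup=datetime(2025, 1, 14, 20, 0),   # 6 hours ago
    last_restore_test=datetime(2024, 6, 1),     # more than 90 days ago
    rpo=timedelta(hours=4),
)
for finding in findings:
    print(finding)
```

A check like this only tells you the backup exists and is fresh. Whether you can actually restore from it within your RTO is something you only learn by doing the restore.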
The Real Question
The conversation about going back to on-prem is really a conversation about control, cost, and risk. All of those are legitimate things to optimize for, and the cloud isn’t automatically the right answer for every organization in every situation.
But on-prem is a location, not a solution. The problems that made it painful before (configuration drift, undocumented systems, hidden maintenance costs, knowledge that only lives in one person’s head) don’t disappear just because you own the hardware. You have to solve for them deliberately, and that takes real effort and real discipline.
The cloud forced a lot of organizations to think more carefully about how they run infrastructure: to write it down, version it, automate it, and treat it like something that needs to be managed properly rather than a collection of servers that someone set up once and has been maintaining ever since. That mindset is the actual lesson worth keeping, wherever your workloads end up running.
If the numbers make sense, if the compliance requirements demand it, if the skills are there or you’re willing to invest in them: go back. Just don’t go back to the way things were. Go back smarter.

