AWS vs Azure vs Google Cloud For SaaS Startups — Part 1

By Charles Chen. Jun 27, 2022

I’ve spent the last year working with a range of startups: a life sciences company that closed a $100m Series C, a Y Combinator startup that increased its ARR by $1m in one month, and a startup that was funded on a pitch deck.

Along the way, I’ve had the opportunity to explore and build on all three major cloud providers. I’ve learned a few lessons and formed my own preferences and recommendations for early stage startups seeking the right platform to build on.

Every company and every domain is different, but there are objective differences in the offerings from the three major cloud providers and picking the right one can significantly improve your startup’s ability to iterate in the search for product market fit.

My Rankings for SaaS Startups

When looking at these platforms from a 10,000 foot view, there is not enough granularity to be able to spot the differences and it’s easy to believe that there is parity between the platforms. There are countless diagrams and posts that discuss how the services and offerings map between these providers, but this is not one of those posts.

Having worked hands-on with these platforms, I’ve found that there are a number of ground-level differences between them that can have a long term effect on the speed and trajectory of a startup. I’ve also had a chance to work with ex-Amazon engineers and gain insight into Amazon as a technology organization and how that affects AWS as a product.

With that in mind, here is my ranking for the three major cloud platforms for SaaS startups:

  1. Google Cloud (GCP)
  2. Microsoft Azure
  3. Amazon Web Services (AWS)

We’ll examine the platforms on three key facets to help decide which cloud is right for a given team.

  1. Financial Benefits
  2. Ease of Use
  3. Core Strength of Each Platform (Part II)

Let’s explore why and when startups should consider each of these platforms.

Financial Benefits

Both Google Cloud and Microsoft Azure provide hefty and generous benefits for startups.

Google provides up to $100k in credits for two years ($200k total) as a part of their Google for Startups Cloud Program.

Being in third place from a market share perspective, Google provides the most generous financial benefits to attract SaaS startups.

Microsoft provides up to $150k in credits as part of their Microsoft for Startups program.

Microsoft provides credits throughout the lifecycle of a startup that scale with its maturity and growth.

AWS also offers a startup program, but it comes in third with $100k (thanks to u/seijulala):

AWS Activate provides startups less than 10 years old with $100k of credits.

The clear winner here is Google, with a very generous $200k in credits, and Microsoft is not far behind. For early stage startups, not having to worry about thousands of dollars in spend can be a big lift that frees the team to experiment.
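
To put those numbers in perspective, here is a back-of-the-envelope sketch of how many months of cloud spend each program's maximum credits would cover. The credit totals are the ones quoted above; the $10,000 monthly bill is a made-up figure purely for illustration, and the sketch ignores expiry dates and eligibility rules.

```python
# Maximum credits quoted in this article, in USD.
CREDITS = {
    "Google Cloud": 200_000,
    "Microsoft Azure": 150_000,
    "AWS": 100_000,
}

# Hypothetical monthly cloud spend for an early stage startup (assumption).
MONTHLY_SPEND = 10_000


def months_covered(credits: int, monthly_spend: int) -> int:
    """Whole months of spend that the credits would fully cover."""
    return credits // monthly_spend


for provider, credits in CREDITS.items():
    print(f"{provider}: {months_covered(credits, MONTHLY_SPEND)} months")
```

At that assumed spend, the gap between the programs works out to several extra months of experimentation before the first real invoice arrives.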

I have also found that both Microsoft and Google have very good, truly free tiers for many services, whereas with AWS one has to be far more vigilant about usage of “free” tier services.

A case in point is AWS Kendra, whose free tier includes one index but still allows you to create more than one. Imagine my surprise when I woke up to a $750 AWS bill one day:

I find myself being a bit more vigilant with AWS’s “free” tier offerings than I am with Google or Azure. This is a surprise bill I received while experimenting with Kendra.

In other words, many of the offerings in the AWS “free” tier behave more like a free trial, whereas the free tier for Azure Cognitive Services, for example, is completely free with simple constraints on scale.

To be fair, the AWS support team reverted the charges after I got in touch with them, but it seems that it would be better if the free tier offerings were more explicit about constraining usage.

Ease of Use

For startups seeking product market fit, keeping it simple, stupid (KISS) can be critical for several reasons:

  1. There’s less risk and friction with experimentation and trying different approaches.
  2. It allows the team to adapt technology and architecture quickly with each iteration.
  3. A small number of choices makes it easier to evaluate and pick the right technology solution.
  4. It makes it easier to onboard new engineers as you grow your team.

For this reason, especially for young teams that do not have the breadth and depth of engineering experience, I think Google again comes out on top with Microsoft close behind and AWS in a distant third.

In fact, I actually believe that startups that do not already have engineers with deep AWS experience are putting themselves at a disadvantage by picking AWS because of the platform complexity and lack of “coherence” and obvious defaults.

Resource Containers

One of the clearest differences between Google and Azure versus AWS is that both Google and Azure offer the concept of a resource container below the account.

In Google, these are “projects”. In Azure, these are “resource groups”. In AWS, there is only the account. This lack of a resource container below the account level actually creates friction for managing permissions and creating sandboxes for experimentation.

While it may seem like a trivial difference, it is much easier to manage access and control around these containers than it is to do so with accounts in AWS. When you’re trying something out, like a Terraform or Pulumi IaC deployment, you can delete the entire resource container, and everything in it, with very little cost in time and effort.

In contrast, the lack of a resource container below the account in AWS means that either one deletes the account or one needs to be very systematic about cleaning up when experimenting.
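
The difference can be modeled in a few lines. The sketch below is a toy in-memory model, not any provider's SDK; the class names and resource names are invented for illustration. With a container, teardown is a single call; without one, cleanup is only as good as your own bookkeeping.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Toy stand-in for a GCP project or an Azure resource group."""
    name: str
    resources: set[str] = field(default_factory=set)


@dataclass
class Account:
    """Toy stand-in for a cloud account."""
    containers: dict[str, Container] = field(default_factory=dict)
    loose_resources: set[str] = field(default_factory=set)  # no container level

    def delete_container(self, name: str) -> set[str]:
        """Deleting the container removes everything inside it in one step."""
        return self.containers.pop(name).resources

    def delete_tracked(self, tracked: set[str]) -> set[str]:
        """Without containers, only resources you remembered to track go away."""
        self.loose_resources -= tracked
        return self.loose_resources  # whatever is left keeps running


# Experiment inside a container: one call cleans up all three resources.
with_containers = Account()
with_containers.containers["sandbox"] = Container("sandbox", {"api", "queue", "logs"})
removed = with_containers.delete_container("sandbox")

# Same experiment directly in an account: the log group was never tracked.
flat = Account(loose_resources={"api", "queue", "logs"})
leftover = flat.delete_tracked({"api", "queue"})
print(sorted(removed), sorted(leftover))
```

The leftover set in the second case is the kind of forgotten resource that can keep accruing charges after an experiment is over.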

It’s true that CloudFormation stacks should — in theory — clean themselves up (this is not always the case in practice), but for startups in a highly experimental phase, it is often the case that the easiest way to experiment with some new component is to manually set it up in the web console rather than defaulting to CloudFormation or Terraform.

In addition, I have found that AWS is particularly bad about cleaning up CloudWatch logs when deployed using any AWS automation (CloudFormation or CDK) which creates a bit of anxiety that something more expensive may have been missed!

Adaptability and Nimbleness

I would argue that, of the three platforms, AWS provides the lowest level of abstraction for any comparable platform behavior. For the experienced AWS engineer, this provides more control and finer tuning of system interactions. But for startups unfamiliar with AWS, it creates complexity. In many cases, what could be done “out-of-the-box” on either Google or Azure requires writing and deploying a Lambda in addition to other layers (IAM, routing, networking) to get the desired behavior.

In fact, there is so much “connective complexity” with AWS that there are several layers of tools just to make AWS usable by wrapping common abstractions and patterns. Off the top of my head, and not including cross-platform solutions like Terraform and Pulumi, there’s at least:

  1. CloudFormation
  2. Cloud Development Kit (CDK)
  3. Serverless Framework
  4. Serverless Application Model
  5. Copilot (not to be confused with GitHub Copilot)
  6. Amplify

While one may not think of the latter two as deployment tools, I think of them as ways to re-package the AWS infrastructure complexity to make it more approachable because otherwise, building solutions for the abstractions provided by Copilot and Amplify is an incredibly complex endeavor on AWS. Copilot and Amplify are effectively generating CloudFormation stacks that abstract the connective complexity of working with lower level paradigms in AWS.

Another example is working with managed Kubernetes on the three platforms. In my series experimenting with Dapr, I had the opportunity to work with Kubernetes on all three platforms, and it was clear that AWS felt the most disjointed and required the third party tool eksctl (on top of the AWS CLI and kubectl) to make it usable. In contrast, both the Google CLI and Azure CLI felt more complete and cohesive when it came to interacting with the platforms’ managed Kubernetes functionality.

The low level of abstraction in AWS is like working in C++ compared to TypeScript for Google and C# for Azure. C++ provides much more power, control, and range compared to TypeScript or C#, but the flip side is that it also requires more depth of knowledge of the low level aspects of programming like memory management.

This complexity makes a team less adaptable and less nimble because even minor shifts in architecture require significant investment in changing the often convoluted deployment models made all the more difficult by the lack of resource containers for experimenting.

Lower Cognitive Load

Of the three cloud providers, Google has the fewest options when it comes to building any particular functionality, while AWS seems to have the most.

This is part of the AWS strategy to have a solution for every niche. But this also creates a lot of cognitive load when trying to figure out which platform solution to use for a particular task.

It also means that each of the solutions on AWS will have limits and gaps that are meant to be closed by some other solution. That creates forward risk, because it is often hard to determine which of the limitations are significant until much further down the line. In contrast, the Google Cloud offerings feel more complete and well thought out (perhaps a latecomer’s advantage).

For example, webhooks are now a ubiquitous way of building asynchronous system integrations. One very common use case when dealing with webhooks is the need to queue the incoming webhook message and promptly return an HTTP 200 to the source while consuming the payload at a later time, out-of-band.

On GCP, this is very elegantly done with Google Cloud Pub/Sub. While it’s true that AWS SQS and SNS can do the same, what is different is that Pub/Sub has a built-in HTTP push subscription model which just seems so logical. Rather than having to write a background worker to pull the messages or introduce another piece of infrastructure as is the case with SQS, consumption of the queue can be done as just another HTTP API endpoint with Pub/Sub taking care of the HTTP push.
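
As a rough sketch of what that consuming endpoint might look like, the handler below unpacks a Pub/Sub push delivery using only the standard library. Pub/Sub push requests wrap the published bytes in a JSON envelope with a base64-encoded `message.data` field; the function names, the sample event, and the choice of status codes are assumptions made for illustration, and the HTTP server wiring around the handler is omitted.

```python
import base64
import json


def process_webhook(payload: object) -> None:
    """Hypothetical business logic for one queued webhook payload."""
    print("processing:", payload)


def handle_push(body: bytes) -> int:
    """Handle one Pub/Sub push delivery and return an HTTP status code.

    A success status acknowledges the message; an error status asks
    Pub/Sub to deliver it again later.
    """
    try:
        envelope = json.loads(body)
        # Pub/Sub wraps the published bytes in message.data, base64-encoded.
        raw = base64.b64decode(envelope["message"]["data"])
        payload = json.loads(raw)
    except (KeyError, TypeError, ValueError):
        return 400  # malformed envelope or payload
    process_webhook(payload)
    return 204


# Simulate the request body Pub/Sub would POST for one queued webhook.
event = json.dumps({"event": "invoice.paid"}).encode()
push_body = json.dumps({
    "message": {"data": base64.b64encode(event).decode(), "messageId": "1"},
    "subscription": "projects/demo/subscriptions/webhooks",
}).encode()
print(handle_push(push_body))
```

The point of the sketch is that the consumer is just another request handler: there is no polling loop or worker process to deploy alongside it.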

Azure solves this same problem masterfully with the extensive bindings provided for Azure Functions. These bindings are the true stars of Azure and make connecting pieces of a cloud infrastructure together as simple as snapping Lego blocks together.

An example showing how we can connect three discrete I/O endpoints just by using attributes.

In the example above, I’ve connected:

  1. An HTTP trigger input
  2. A long running workflow orchestration
  3. A real-time web-socket signaling channel to the caller

I’ve done so without having to worry about IaC or learn a special deployment abstraction to connect these three channels. This is by far one of the best reasons for teams to consider Azure because it makes moving and processing data so easy.

Here is another example from my CovidCureID project that demonstrates this in action:

Another example of how Azure Functions can “stack” I/O and move data around fluidly.

In this case, I’ve snapped together:

  1. An inbound trigger when a storage blob is updated.
  2. Two outbound queues that perform two different transformations on the incoming JSON.
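
The idea behind declarative bindings can be sketched without any cloud dependency. The code below is a simplified stand-in written for this article, not the Azure Functions API: the `blob_trigger` decorator, the queue names, and the in-memory "runtime" are all invented for illustration. What it shows is the shape of the pattern, where the function declares its input and outputs and the runtime performs the I/O.

```python
import json
from collections import defaultdict
from typing import Callable

# In-memory stand-ins for storage queues, keyed by queue name.
QUEUES: dict[str, list[str]] = defaultdict(list)
# Registered handlers, keyed by the blob container that triggers them.
HANDLERS: dict[str, Callable[[str], None]] = {}


def blob_trigger(container: str, outputs: list[str]):
    """Declare what starts a function and where its results are written."""
    def register(func: Callable[[dict], dict[str, dict]]):
        def run(blob_text: str) -> None:
            results = func(json.loads(blob_text))
            for queue_name in outputs:
                # The "runtime", not the function body, does the I/O.
                QUEUES[queue_name].append(json.dumps(results[queue_name]))
        HANDLERS[container] = run
        return func
    return register


@blob_trigger(container="uploads", outputs=["summaries", "audit"])
def on_upload(doc: dict) -> dict[str, dict]:
    """Two transformations of the same incoming JSON document."""
    return {
        "summaries": {"id": doc["id"], "fields": sorted(doc)},
        "audit": {"id": doc["id"], "action": "updated"},
    }


# Simulate a blob update arriving in the "uploads" container.
HANDLERS["uploads"]('{"id": 7, "title": "report"}')
print(dict(QUEUES))
```

The function body contains only the transformation logic; everything about where data comes from and where it goes lives in the declaration on top of it.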

In contrast, doing the same in AWS would require using either EventBridge (in addition to SQS and SNS) or writing and deploying Lambdas to forward to HTTP endpoints. Certainly, it is also possible to consume the messages in Lambdas as well, but then the next decision is which of the Lambda Trilogy patterns you choose to implement. This is not to mention that SNS and SQS are two distinct services in AWS whereas Google Pub/Sub and Azure Service Bus logically package these capabilities as a single service. The lesson is that the layers of complexity in AWS are not always apparent when looking at a path at the onset and this has a tendency towards what I can only describe as “sprawl” in AWS.

Deploying what seems like a simple solution to a simple problem, queuing webhook messages and processing them with an HTTP API endpoint, would require at least four services in AWS (Lambda, SQS, SNS, and API Gateway, at least before Lambda function URLs were released in April) and dozens of lines of CDK or Serverless Framework YAML (and likely hundreds of lines of raw CloudFormation 🤣)!

Ease of Onboarding

Being able to quickly onboard new engineers to a platform is critical for a startup as it reduces the drag created when getting a new resource ramped up as the team grows. Given an engineer with no experience with a given platform, my own observation is that it is far easier to onboard new engineers to Google and Azure than it is to AWS.

There are three reasons for this:

  1. GCP and Azure have far better, more cohesive documentation than AWS.
  2. GCP and Azure have more cohesive browser console interfaces than AWS (though the Amazon team has been working to close this gap).
  3. GCP and Azure have better free tier options for experimenting and letting new engineers play around in a sandboxed environment (discussed above).

On the first point, one thing I’ve noticed is that AWS and Microsoft tend to partition their documentation by language:

You can see in this example with DynamoDB that the documentation is segregated by language. It’s not that all of the documentation is organized like this, but the majority of it is.

In contrast, Google’s documentation is generally organized like this:

Google Documentation for Cloud Firestore with all languages side-by-side

One of the key benefits of this side-by-side approach is that developers onboarding to a new language can link concepts to existing language knowledge by seeing examples side-by-side.

Another advantage is that for polyglot teams, you can see the SDKs side-by-side to determine which feels more ergonomic for the task.

One other really nice feature of the Google documentation is that it does the same for deployment approaches so it is also possible to compare deployment methodology (taken from the docs for Google Cloud Run):

Note how the Google documentation allows templating of the deployment to your parameters.

That Google provides sample Terraform templates is a huge win for startups that are feeling their way towards product market fit.

On the second point, cohesive console interfaces, both Google and Azure share a common design language across the web consoles for each service. Once you have learned how to navigate the console’s menus, it is sane to figure out how to interact with any given service (whether you’re a fan of Azure’s console or not, at least it is consistent).

AWS, on the other hand, reflects the nature of its more disjointed teams. Until recently, Cognito was perhaps one of the most egregious offenders, with a user interface seemingly designed in the mid-2000s by some interns:

The soon-to-be-retired Cognito interface. No comment…

All of these details are not necessarily obvious without being hands on with the platform and services, but having experienced all three first-hand building real-world solutions, it is clear that Google and Azure are likely going to be easier to use for most SaaS startups building APIs and web services.

In Part I, I’ve shared some of the objective ways in which I think Google and Azure are better options for early stage startups still seeking product market fit by providing more free credits and an easier to use platform.

These benefits can significantly alter the trajectory of a small startup by providing a team with more funding, of course, but also more agility and velocity compared to starting on AWS. It’s not that startups cannot be successful on AWS (history has obviously shown that they can), but with GCP and Azure now providing competitive platforms with better ergonomics, choosing either could reduce the complexity and sprawl that accumulate on AWS and hamper startup velocity.

In Part II, I’ll examine what I think the core strengths are for each platform and when an early stage startup should pick one platform over another.

Subscribe if you’d like to be notified when it’s published!

The original article was published on Medium.
