Continuous Deployment at Upflow

Jean-Christophe Delmas

Nov 3, 2025

Introduction

At Upflow, we're a small team of ten engineers, so efficiency is crucial. One key aspect of our productivity is continuous deployment: as soon as an engineer merges a pull request, the code goes straight to production with no manual steps required. While this approach may surprise or concern new team members, it works remarkably well in practice.

On average, we deploy between 5 and 10 times per day. This brings many benefits:

  • Urgent improvements and fixes reach users quickly.

  • Smaller updates make it easier to troubleshoot issues and to roll back if needed. When we correlate an issue with a specific deployment, a small diff makes it much easier to pinpoint the problematic code.

  • Continuous deployment forces us to automate as much as possible, saving us time.

In this article, we'll share how we successfully implemented continuous deployment at Upflow and how we tackled some of the biggest challenges, including:

  • Preventing regressions in production

  • Allowing product managers and designers to test new features effectively

  • Encouraging engineers to make necessary but risky changes


Build and deployment automation

We apply the principles of trunk-based development to our branching strategy. We have two types of branches:

  • The main branch: all code merged into this branch is deployed to production and other environments, such as staging or demo.

  • Short-lived feature branches: used to review and validate code before merging to the main branch.

Once a feature branch is merged to the main branch, it triggers an automated pipeline that builds, tests, and deploys the code.

Here's a simplified representation of our pipeline:

  1. Prepare: Install dependencies and run code generation

  2. Build: Transpile TypeScript code into JavaScript

  3. Check: Type check code and run quality checks such as ESLint and Prettier

  4. Unit / integration tests: Run a suite of tests to verify the code works as expected

  5. Build & upload Docker image: Build a Docker image and upload it to our artifact registry. This image is then used to deploy the application to Cloud Run (the platform we use to run our application)

  6. Deploy to staging: Deploy the new image to our staging environment

  7. End-to-end (E2E) tests: Run a suite of end-to-end tests with Cypress against the staging environment to ensure the application runs as expected in a production-like environment (see the sketch after this list)

  8. Deploy to production: Deploy to production
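
To make step 7 more concrete, here is a minimal Cypress spec written in TypeScript. It is an illustrative sketch only: the route, data-cy selectors, and credentials are hypothetical placeholders rather than our actual test code.

```typescript
/// <reference types="cypress" />
// cypress/e2e/sign-in.cy.ts -- illustrative sketch, not our real suite

describe('Sign-in flow', () => {
  it('shows the dashboard after signing in', () => {
    // Hypothetical route and selectors
    cy.visit('/login');
    cy.get('[data-cy=email]').type('qa@example.com');
    cy.get('[data-cy=password]').type(Cypress.env('QA_PASSWORD'), { log: false });
    cy.get('[data-cy=submit]').click();

    // Assert on user-visible behavior, not implementation details
    cy.contains('Dashboard').should('be.visible');
  });
});
```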

If the main branch pipeline fails, production deployments are blocked until we fix it. To prevent this, we run a similar pipeline on each feature branch before merging to the main branch.

The key differences are:

  • End-to-end tests run in a local Docker environment, not in Cloud Run

  • The Docker image isn't uploaded to the registry

  • No deployments occur


Automated testing

Manually testing software is time-consuming, especially for complex products like ours. Given how often we deploy, manually checking each version for regressions isn't feasible. We also don’t run manual tests in staging before deploying to production because we want our pipeline to be fully automated. A manual step would create a bottleneck, reduce our deployment frequency, and eliminate the benefits that come with it.

That's why automated testing is crucial for us. Our test suite allows us to modify and deploy code to production smoothly with minimal local manual testing. Every time an engineer implements a new feature or bug fix, they must include a comprehensive test suite to prevent future regressions.

The following principles guide our testing strategy to prevent regressions.

Write integration and end-to-end tests, in addition to unit tests

Unit tests validate individual code blocks in isolation without considering how components interact. While useful, they alone cannot prevent all regressions since many issues only emerge from the interaction between components. Furthermore, they don't test interactions with the database, yet many bugs stem from unexpected SQL query behavior.

This is why we heavily invest in integration tests and end-to-end tests alongside our unit tests.

Favor black-box testing

To prevent regressions in production, we primarily use black-box testing. This approach decouples tests from implementation details, making them easier to read and maintain while protecting us from significant changes to the underlying code. In contrast, tests that are tightly coupled to implementation details typically need updates during refactoring, reducing their effectiveness in preventing regressions.
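
As an illustration, a black-box test exercises the public API and asserts only on observable behavior. The sketch below assumes a Jest-style runner and the supertest library; the /api/invoices endpoint, the payload, the expected fields, and the app import are hypothetical.

```typescript
import request from 'supertest';
// Hypothetical entry point that exports our HTTP application
import { app } from '../src/app';

describe('POST /api/invoices (black-box)', () => {
  it('creates an invoice and returns it with a generated id', async () => {
    const response = await request(app)
      .post('/api/invoices')
      .send({ customerId: 'cus_123', amount: 4200, currency: 'EUR' })
      .expect(201);

    // Assert only on the API contract, never on internal functions or SQL
    expect(response.body).toMatchObject({
      customerId: 'cus_123',
      amount: 4200,
      currency: 'EUR',
    });
    expect(response.body.id).toEqual(expect.any(String));
  });
});
```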

Mimicking the production environment

We strive to run our tests in an environment that closely resembles production. Integration tests use actual Postgres and Redis instances to ensure realistic behavior. For external services like Google Storage, we implement realistic mocks that accurately simulate the behavior of the real service.
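
As a sketch of what this looks like, the test below talks to a real Postgres and a real Redis instance started for the test run (for example via docker-compose) and exposed through environment variables. The table, keys, and variable names are hypothetical.

```typescript
import { Pool } from 'pg';
import Redis from 'ioredis';

// Real Postgres and Redis instances, e.g. started with docker-compose,
// exposed to the test run through environment variables (hypothetical names).
const db = new Pool({ connectionString: process.env.DATABASE_URL });
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

afterAll(async () => {
  await db.end();
  redis.disconnect();
});

test('reminders are persisted and cached', async () => {
  // Hypothetical table and cache key, for illustration only
  await db.query(
    'INSERT INTO reminders (invoice_id, send_at) VALUES ($1, now())',
    ['inv_123'],
  );
  const { rows } = await db.query(
    'SELECT invoice_id FROM reminders WHERE invoice_id = $1',
    ['inv_123'],
  );
  expect(rows).toHaveLength(1);

  await redis.set('reminder:inv_123', 'scheduled');
  expect(await redis.get('reminder:inv_123')).toBe('scheduled');
});
```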


Infrastructure as Code

To ensure that successful end-to-end tests in staging will behave the same in production, we strive to make our staging environment as similar as possible to production.

To this end, we use Terraform, an Infrastructure as Code tool, to manage our infrastructure. It allows us to ensure that each environment is configured as expected and to minimize the differences between them.


Blue/Green deployment

Despite our efforts to make staging and production as similar as possible, they can't be 100% identical. For example, we can't use the same secrets in staging, and some features, such as payments, must run in test mode.

As a result, a Docker image that runs successfully in staging may fail to start in production. If we routed all traffic to crashed instances, it would cause downtime.

To prevent this, we use Blue/Green deployment. Cloud Run lets us deploy containers with a new code revision without routing traffic to them immediately. A startup probe checks if new instances are running properly. If they fail, no traffic reaches them, and users aren't affected. If they succeed, Cloud Run routes traffic to the new instances, and users receive the update.
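
On the application side, all the startup probe needs is an endpoint that answers successfully only once the service is genuinely ready. Here is a minimal sketch assuming an Express-based service; the /healthz path and the readiness check are illustrative assumptions, not our exact setup.

```typescript
import express from 'express';
import { Pool } from 'pg';

const app = express();
const db = new Pool({ connectionString: process.env.DATABASE_URL });

// The Cloud Run startup probe points at this endpoint: a new revision only
// receives traffic once it returns 200, so a container that cannot reach its
// dependencies never gets user traffic.
app.get('/healthz', async (_req, res) => {
  try {
    await db.query('SELECT 1'); // verify the database is reachable
    res.status(200).send('ok');
  } catch {
    res.status(503).send('not ready');
  }
});

app.listen(Number(process.env.PORT ?? 8080));
```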


Simple architecture

At Upflow, we value simplicity. We always favor the simplest solution unless we're certain a more complex one is worthwhile.

For example, we chose a monolithic architecture over microservices based on this principle, a decision that's particularly valuable for our deployment pipeline. With microservices, running automated tests across the entire infrastructure would be more difficult, and we'd need extra work to manage breaking changes between service APIs.


Quality Assurance (QA)

At Upflow, software engineers are responsible for preventing regressions in production. They do this primarily through automated testing and occasionally through manual testing in their local environment.

However, for new features, we need validation from the product manager and designer to ensure the code behaves as expected. Given our branching strategy and automated pipeline, they can't test new features in staging before we deploy to production.

We could deploy code from feature branches to make it available to product managers and designers. However, this would require engineers to keep all code in a long-lived branch until the feature is ready for users. This approach would result in large code changes being deployed to production all at once, which would undermine the benefits of small, frequent deployments discussed in the introduction. When working on a large feature, we prefer to deploy code to production progressively.

To do this, we use feature toggles. This allows us to deploy code for unfinished features to production without affecting users. Once the feature is ready for QA, we enable the feature toggle in a test account so the product manager and designer can test it before releasing it to users.

We use the same feature toggle to release the feature in beta to a subset of users. This lets us validate it more thoroughly and detect potential scalability or reliability issues before the full release.
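
In code, a feature toggle is essentially a conditional around the new code path, evaluated per account. The sketch below is a hypothetical, in-memory version for illustration; in reality the enabled accounts would live in a database or a feature-flag service.

```typescript
// Hypothetical in-memory toggle store (illustration only).
const enabledAccounts: Record<string, Set<string>> = {
  newRemindersUi: new Set(['qa-account-id']), // enabled for the QA account only
};

function isToggleEnabled(toggle: string, accountId: string): boolean {
  return enabledAccounts[toggle]?.has(accountId) ?? false;
}

// Ship the new code path dark, enable it for a QA account first,
// then for beta customers, then for everyone.
function getRemindersView(accountId: string): 'new' | 'legacy' {
  return isToggleEnabled('newRemindersUi', accountId) ? 'new' : 'legacy';
}

console.log(getRemindersView('qa-account-id')); // "new"
console.log(getRemindersView('another-account')); // "legacy"
```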

Pre-merge design QA

Feature toggles are valuable, but they can become tedious if overused, so we avoid them for minor changes. The challenge is that some small UI changes still need our designer's approval before going live.

To address this without feature toggles, we use Storybook and Chromatic to provide UI previews from feature branches. This lets our designer verify changes before they're merged into the main branch.
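
For instance, a small UI change can be reviewed from the Storybook build that Chromatic publishes for the feature branch. Below is a minimal Component Story Format sketch assuming a React component; InvoiceStatusBadge and its props are hypothetical.

```typescript
// InvoiceStatusBadge.stories.tsx -- illustrative; the component is hypothetical
import type { Meta, StoryObj } from '@storybook/react';
import { InvoiceStatusBadge } from './InvoiceStatusBadge';

const meta: Meta<typeof InvoiceStatusBadge> = {
  title: 'Invoices/InvoiceStatusBadge',
  component: InvoiceStatusBadge,
};
export default meta;

type Story = StoryObj<typeof InvoiceStatusBadge>;

// Each story becomes a snapshot in Chromatic, so the designer can review
// every rendered state directly from the feature branch before merge.
export const Paid: Story = { args: { status: 'paid' } };
export const Overdue: Story = { args: { status: 'overdue', daysLate: 12 } };
```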


Monitoring & Alerting

As discussed, we have a large automated test suite to prevent bugs from reaching production. However, automated testing alone isn't enough:

  • Test coverage can't be perfect.

  • Tests aren't suitable for detecting certain problems, such as performance, scalability, or security issues.

That's why we also invest heavily in monitoring and alerting to detect problems as soon as they occur in production, whether after a deployment or for other reasons. This allows us to be proactive and fix problems before our users complain about them.

We use Datadog to monitor metrics such as CPU usage, memory usage, and response times, as well as Sentry to track errors. When we detect significant problems in production, such as high CPU usage, new errors, or frequent errors, we receive immediate Slack alerts so we can intervene quickly.
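
Error tracking, for example, needs only a small amount of setup in the application code. The sketch below assumes the @sentry/node SDK; the environment variables, tags, and business logic are placeholders.

```typescript
import * as Sentry from '@sentry/node';

// Initialize Sentry once at process startup (hypothetical variable names).
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.APP_ENV ?? 'production', // e.g. "staging" vs "production"
  release: process.env.COMMIT_SHA, // correlate errors with the deployment that shipped them
});

// Hypothetical business logic, used only to illustrate error reporting.
async function processPayment(paymentId: string): Promise<void> {
  throw new Error(`payment provider unavailable for ${paymentId}`);
}

export async function processPaymentSafely(paymentId: string): Promise<void> {
  try {
    await processPayment(paymentId);
  } catch (error) {
    // Captured errors appear in Sentry and can trigger Slack alerts.
    Sentry.captureException(error, { tags: { paymentId } });
    throw error;
  }
}
```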


Rollbacks

When a problem in production stems from a recent deployment, we can roll back the code to the previous version in just a few clicks. However, we rarely do so; it happens only a few times per year despite our high deployment frequency.


Root Cause Analysis (RCA)

None of the processes described here is set in stone. They have evolved and will continue to do so as the product grows more complex and new reliability requirements emerge.

When a major issue occurs in production, we conduct a systematic Root Cause Analysis (RCA). Through this analysis, we identify the causes that led to the problem and determine how to prevent similar issues in the future.

For example, early on at Upflow, we didn't invest in end-to-end tests because we wanted to iterate quickly and had only a few users. However, after experiencing several critical bugs, we decided to write end-to-end tests to cover the most critical parts of the product.


Calculated risk-taking

Continuous deployment isn't just about processes and automation; it's also about mindset and culture. Despite the safety nets discussed in this article, deploying risky changes to production can still be intimidating for engineers. For instance, an engineer might avoid implementing valuable refactoring out of fear of breaking something that hasn't been properly tested.

This mindset is particularly harmful in the long term. When projects intended to improve engineer productivity are abandoned due to fear of risk, the consequences are permanent. In contrast, the consequences of failed deployments tend to be short-lived, provided they're neither frequent nor impactful.

That's why cultivating a culture of calculated risk-taking is important to us. Engineers should feel encouraged to take risks, knowing they won't be blamed if a problem occurs in production.


Conclusion

This article covered the solutions we implemented at Upflow to enable continuous deployment: trunk-based development, automated testing, feature toggles, infrastructure as code, monitoring, and root cause analysis.

However, nothing is set in stone—new needs will emerge as the product grows more complex. Here are some solutions we'll likely work on to strengthen our deployment process:

  • Canary releases: We use feature toggles to release new features to a subset of users. However, new code revisions always deploy to all users at once. To minimize the impact of potential regressions and detect issues before they affect everyone, we could deploy new code revisions to a subset of users first. If everything runs smoothly, we'd then route all traffic to the new revision.

  • Visual testing: We've implemented front-end unit tests and end-to-end testing to catch regressions in our front-end code. However, these tests don't effectively detect visual regressions from refactoring or dependency upgrades. Visual testing would reduce manual testing needs and prevent visual regressions.