Lessons learned about Software Quality (or delivery?)

Below are a few statements/assumptions that would help me explain the idea and the goal of this post:

  • We build software to solve problems for our customers or for people in general
  • It is important that the software is built and delivered in a reasonably short time
  • It is important that the software is built and delivered with a reasonable quality
  • There are lots of methods and ways of working to develop software, and there is no single way that is correct for all organisations
  • It is difficult to deliver software with both speed and quality

The goal of this post is to share an approach, or a combination of approaches, that I learned during my career to deliver software products faster and with higher quality. I did not invent anything here. It is merely a summary of methods suggested and tested by companies and individuals through books and frameworks, and applied in a few companies I worked with.

I do not suggest considering this as a delivery/quality strategy. A delivery/quality strategy needs to be tailored to each organisation based on observations and analysis of existing methods, processes, and behaviours. However, I do like to think of it as a mental model that I refer to when building a software delivery and quality strategy.

One additional thing I try to highlight in this post is a common misconception about “Testing” vs “Quality”, and the tendency to resort to End to End (E2E) testing (which is not an incorrect thing) when things go wrong.

Please reach out to me to further discuss and challenge this model. I would love to hear your feedback (A few ways to reach out can be found on my personal website).

Introduction

It is common to encounter lots of struggles with software delivery, and there are multiple reasons for that struggle, which would be different in each organisation. However, there is usually a common theme around quality problems.

Quality is difficult to define. Defining it requires lots of communication with the team and the stakeholders to understand what is acceptable and what is not. Before starting with any change, it would be ideal to talk to people and get their opinion on the current product quality and system behaviour. Different individuals will have different opinions based on their roles and their definition of quality.

The following sections cover a couple of items that would be helpful to identify and solve this struggle.

Background

Usually, quality struggles/problems are solved by addressing tests first. This is not incorrect. However, it is necessary to keep in mind that testing is a single component that helps improve quality. So the team needs to consider other components as well.

We want to deliver fast with quality, and we want to start somewhere. There isn’t enough time for discussions, retrospectives, and planning. Everybody wants to get moving fast, and nothing is wrong with that. So the team usually proceeds with increasing the test coverage, starting with unit tests, and ending with E2E tests. That is a great thing to do, and I highly encourage it. But before getting into the details of this approach, I would like to talk a bit about End to End tests.

90% of the people I know in the software development/testing domain say that E2E tests are not good. They’re lengthy, brittle, slow, flaky, and they cost a lot of money. My personal experience aligns with this description, and I think these tests are usually abandoned within a year or two, depending on how quickly the organisation reassesses its quality processes (or its overall development and delivery processes) and adopts new ways of working that could potentially make that pain go away, maybe with some trade-offs.

It could be understood from the previous paragraph that I am suggesting avoiding E2E tests altogether, since they’re abandoned anyway after a certain period of time. In fact, I am not. I am proposing to go back to the basics, take a pragmatic approach, minimize the number of E2E tests to lower the cost, and address the quality problem differently.

Let us get into the details.

Why do software companies/individuals still create or push towards creating these E2E tests?

Below are a few valid reasons that would push people in that direction:

  • E2E tests typically simulate real user scenarios. As focusing on the user is what every organisation aims for, this makes E2E tests a super good idea.
  • Regressions are encountered every time a feature is released to production
  • Manual testing before every release is time-consuming, so we want to automate it
  • E2E tests help exercise the application with all of its parts connected together, which would identify bugs that the component tests did not

In addition to that, we like to live a stress-free life (or lower the stress levels in our lives). Adding E2E tests would provide confidence before releasing to production, which would give stakeholders some peace of mind.

Usually, there is a big fear of taking out E2E tests.

Why should software companies/individuals limit the amount of E2E tests created?

I believe that putting the end-user first, automating the regression suite, preventing bugs from slipping to production (as much as possible), and aiming to live a stress-free life are all rightful and great choices.

Deciding to solve these issues through adding E2E tests is easy because the team could identify the user scenarios, then automate them through browser simulation. Usually, the team is under lots of pressure to fix/improve the system quality, and picking up E2E would look like the fastest way to go. But as with any decision, other factors need to be taken into consideration.

Simply put, E2E tests are brittle, expensive to write, time-consuming to run, and difficult to maintain. Small changes to the application can break them.

Using this approach to address quality problems would quickly become a problem itself, forming the ice-cream cone anti-pattern, where slow E2E and manual tests outnumber the fast unit tests.

We should aim for the famous test pyramid instead, with many unit tests at the base, fewer integration tests in the middle, and only a small number of E2E tests at the top.

In this approach, the E2E tests are the second line of defense, and they cover dependencies that might have been overlooked. So they’re good to have, but they do not solve the primary problem. Having E2E in place does not mean the quality problem is solved.
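
To make the difference between the two shapes concrete, here is a minimal sketch that classifies a suite by counting its tests per level. The level names, the `suite_shape` function and the numbers are all made up for illustration, and the comparison is only a rough heuristic, not a rule.

```python
from typing import Dict


def suite_shape(counts: Dict[str, int]) -> str:
    """Classify a test suite by how its tests are spread across levels.

    `counts` maps the hypothetical level names "unit", "integration" and
    "e2e" to the number of automated tests at that level.
    """
    unit = counts.get("unit", 0)
    integration = counts.get("integration", 0)
    e2e = counts.get("e2e", 0)

    if unit >= integration >= e2e:
        return "pyramid"  # wide base of fast tests, few slow ones on top
    if e2e >= integration >= unit:
        return "ice-cream cone"  # slow tests dominate the suite
    return "mixed"


# Illustrative numbers only
print(suite_shape({"unit": 400, "integration": 80, "e2e": 12}))
print(suite_shape({"unit": 20, "integration": 45, "e2e": 150}))
```

A check like this could help start the conversation in a retrospective, but the counts alone say nothing about whether the tests at each level are the right ones.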

Even with a reasonable amount of unit tests (and possibly some integration tests), a long-running suite of E2E tests, and a very big sheet of manual tests, regression bugs might still be encountered in production (more on this in the following section, under Code → Automation → E2E). What can we do?

These questions could be a hint that it is time to look at the overall approach to delivering high-quality software.

A different approach to quality

Notice here that I mentioned quality and not testing, as these are 2 different things. Testing is just one of the activities that lead to better quality (it could be argued whether it is the main activity or not, but we will not get into that debate).

In this section, we will indirectly address the goals mentioned earlier (putting the end-user first, stopping bugs from going to production, etc.).

Testing should not happen only before a release. We should be constantly testing before, during, and after the release. I like Dan Ashby’s visualisation of continuous testing (see the references), in which testing appears at every stage of the DevOps loop.

But how do we do that in practice? There is a lot of work that needs to happen, and all teams are limited on time and resources. This is not easy and requires discipline and adaptability:

Planning / Grooming

  • Adjust all user stories to be granular and deployable
  • Question / Challenge all feature requests (Why do we need it? Is that the correct thing to implement?)
  • Discuss acceptance criteria on each user story
  • Discuss the testing of the user story (Does it require testing on the UI level, would integration testing or unit testing be sufficient?)
  • Discuss the impact on other areas in the software and deployment risks
  • Discuss the possibility of using feature flags

Most of the time, the product already exists, so we cannot just abandon the existing manual tests (especially if there isn’t enough coverage at another level). As the team discusses the impact on other areas early on, this would give a hint whether any further manual regression testing is needed. This is mainly a risk assessment.

Branch

  • Reassess the existing branching strategy and question it
  • Discuss if the branching method is helping with the development of the granular user stories and releasing them
  • Discuss if the branching method is slowing down the developers or the production release

It might seem that branching is something we set up once and forget about, but in fact, it is good to reassess it every now and again. As the development process evolves, the branching strategy might need to be adjusted accordingly. For example, it does not make much sense to have a release branch, or 2 main branches, if we are releasing continuously.

Code

  • Pairing between the tester and the developer before starting with the development could be very useful (Test First approach — should not take more than 15 minutes per user story)

Go over acceptance criteria

Discuss cases and variations on the acceptance criteria

Write down the variations in a one liner on the Jira ticket (or as a test if possible)

Discuss the technical implementation and the areas that might be affected

Discuss how to deploy the user story (can the backend go first? can it be tested separately? can it be released to production without affecting anything else in the system?)

Re-visit what needs to be automated and at which level — good to have a high level agreement on who is implementing what

Decide whether the item being developed needs to be tested locally by someone else before it is merged to the main branch

  • Code review is very important. Giving enough time for the review helps identify issues at a very early stage of the process. If the review is not done as a pair, it could be beneficial to run the latest code locally and give it some exploratory testing as well
  • Needless to say, create unit tests, and make sure they can fail by temporarily changing the input and/or the actual code
  • Test locally, whether by the tester or another engineer. This is much better than testing in the test environment, because it shows failures early in the process, while the written solution is still fresh. Another engineer or tester might not be needed, depending on the pairing done early on
  • Manually test what is implemented locally (exploratory visual testing) — yes, open that app and make sure it’s behaving as expected
  • Automation

E2E — Browser tests

Cover the main business cases (e.g. if the user story suggests implementing a field that allows setting up a value, then the E2E for it would be to set up a correct value, save then verify the value got saved — we do not test a wrong value at this level, or if a button got enabled or disabled)

What about other cases that are not covered (e.g. a button should be disabled)? These variations should be covered at a different level if possible. If that is not possible, then it should be sufficient to cover them through manual exploratory testing during development. What if one of them gets broken in production later on? If it’s a primary scenario, it would be covered by tests and the breakage wouldn’t reach production. If it’s a secondary scenario, then it’s alright for it to break in production, as long as we have the means to fix it quickly (the release method is important here). Maybe that’s not an option for the business. In that case, it is good to look at visual testing methods such as snapshot testing, executed maybe on a daily basis, where the current UI is compared with the previous one. Note that bug-free software is not possible even with lots of tests in place.

The verification should be done in the fastest way possible. For example, we should not open another page to verify that the setting got saved correctly if we can verify faster that the network request got submitted with the correct value (the other page should be exercised in its own test, where we verify that the saved values are retrieved correctly)

Setting up the data and accessing the feature being tested should happen in the fastest possible way. For example, if we need to click multiple buttons to get to the field we’re trying to update, it would be better to navigate directly to that page and set up the field if that’s faster (clicking the buttons to get to the field should be exercised in a different scenario). If we need to create a user to make a field visible for the test, it is better to create the user in the fastest possible way, whether through the API, an insertion in the DB, or even by mocking that record if that’s possible
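
The E2E guidelines above can be sketched without a real browser. In the example below, `FakeSettingsApp` is a hypothetical in-memory stand-in for the application (it is not a real testing library), and the field name, path and values are made up. It only illustrates the shape of such a test: seed the data directly, perform the single happy-path action, then verify the captured request instead of opening another page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeSettingsApp:
    """Hypothetical in-memory stand-in for the application under test."""

    users: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    sent_requests: List[Dict[str, Any]] = field(default_factory=list)

    def seed_user(self, user_id: str) -> None:
        # Fast setup: create the record directly instead of clicking through
        # a sign-up flow (in a real suite: an API call or a DB insert).
        self.users[user_id] = {"display_name": ""}

    def save_display_name(self, user_id: str, value: str) -> None:
        # What pressing "Save" would trigger: one outgoing request.
        request = {
            "method": "PUT",
            "path": f"/users/{user_id}/settings",
            "body": {"display_name": value},
        }
        self.sent_requests.append(request)
        self.users[user_id]["display_name"] = value


def test_saving_a_valid_display_name() -> None:
    app = FakeSettingsApp()
    app.seed_user("u1")  # setup without going through the UI

    app.save_display_name("u1", "Ada")  # the one happy-path action

    # Verify on the captured request instead of opening another page.
    assert len(app.sent_requests) == 1
    assert app.sent_requests[0]["body"] == {"display_name": "Ada"}


test_saving_a_valid_display_name()
```

With a real browser automation tool, the same idea would rely on whatever request interception or waiting mechanism that tool provides. The wrong-value and disabled-button variations are deliberately left out of this level.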

Integration tests

API tests should be at this level

This could also be between classes or modules in the service we’re testing

Could include contract tests, which would help make other integration and E2E tests easier and more performant
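
As a rough illustration of the contract test idea, the sketch below checks that a provider response still contains the fields a consumer relies on. The contract, the field names and the `contract_violations` helper are hypothetical, and dedicated contract testing tools do considerably more than this.

```python
from typing import Any, Dict, List

# Hypothetical contract: the fields a consumer relies on, and their types.
USER_CONTRACT: Dict[str, type] = {"id": int, "email": str, "active": bool}


def contract_violations(payload: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return human-readable problems; an empty list means the payload conforms."""
    problems = []
    for name, expected in contract.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems


# Extra fields are allowed, so the provider can add data without breaking consumers.
good = {"id": 7, "email": "a@example.com", "active": True, "plan": "free"}
bad = {"id": "7", "active": True}

print(contract_violations(good, USER_CONTRACT))
print(contract_violations(bad, USER_CONTRACT))
```

Because a check like this runs against a single service response, it can fail fast on a breaking change without starting the whole system, which is what takes pressure off the E2E suite.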

Unit tests

Unit testing helps find problems early in the development cycle, which could be bugs or flaws in the specifications

Unit tests provide confidence when refactoring a piece of code, upgrading a library or extending a feature (e.g. in regression testing), and allow the developer to make sure the module is still working correctly

Unit tests make the integration tests much easier, as we’re testing parts of the program and then testing the sum of these parts

Unit tests allow coverage of inputs, outputs and error conditions, which helps in covering more edge cases
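
Here is a small sketch of what covering inputs, outputs and error conditions could look like with Python’s built-in `unittest`. The `parse_percentage` function is a made-up unit under test, chosen only because it has clear valid and invalid inputs.

```python
import unittest


def parse_percentage(text: str) -> int:
    """Hypothetical unit under test: parse "0".."100" into an int."""
    value = int(text.strip())  # raises ValueError for non-numeric text
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


class ParsePercentageTest(unittest.TestCase):
    def test_valid_inputs(self) -> None:
        # Boundaries plus one value with surrounding whitespace
        for text, expected in [("0", 0), (" 42 ", 42), ("100", 100)]:
            with self.subTest(text=text):
                self.assertEqual(parse_percentage(text), expected)

    def test_error_conditions(self) -> None:
        # Out of range, non-numeric and empty input
        for text in ["-1", "101", "abc", ""]:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_percentage(text)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePercentageTest)
result = unittest.TextTestRunner().run(suite)
```

To exercise the failure, as suggested earlier in this section, temporarily change the range check to `0 <= value <= 99` and confirm that `test_valid_inputs` goes red on the `"100"` case before reverting.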

Merge & Build

  • Good to keep an eye on the merges in case multiple people were working around the same area, and to test the areas of conflict
  • The build, like the branching strategy, is something good to keep assessing and enhancing. Is the build slow? Are there steps in the build that we can get rid of?

Release & Deployment

  • It is very important to separate these 2 components (a deployment puts the code in production, while a release makes it visible to users). This is something to be discussed on a user story basis, where the team defines whether a feature flag is needed or not
  • The release & deployment is the cornerstone of this proposed model. It is proposed to use a continuous deployment strategy for the following reasons:

Minimize the risk. As we add more code (features/fixes) to a single release, the risk will increase.

Faster time to production, which means a faster feedback loop, whether from the customer if it was released, or from “Testing in production” if it was a hidden feature

Team experiences less of the burnout associated with the deployment or low-value activities
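
The separation between deployment and release from the first bullet above can be sketched with a feature flag. Everything here is hypothetical (the flag name, the in-memory `FLAGS` store, the page labels). A real setup would more likely read flags from configuration or a flag service.

```python
from typing import Dict

# Hypothetical flag store; in practice this might live in config or a flag service.
FLAGS: Dict[str, bool] = {"new_checkout": False}


def is_enabled(flag: str) -> bool:
    # Unknown flags default to off, so unfinished code stays hidden.
    return FLAGS.get(flag, False)


def checkout_page() -> str:
    if is_enabled("new_checkout"):
        # Deployed but not yet released: only reachable once the flag is on.
        return "new checkout"
    return "classic checkout"


print(checkout_page())        # flag off: users still see the existing behaviour
FLAGS["new_checkout"] = True  # the "release", with no new deployment involved
print(checkout_page())        # flag on: users see the new behaviour
```

The new code path can sit in production behind the flag while it is tested there, and turning the flag off again is a quick way to hide it if something goes wrong.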

Deploying continuously would give a faster delivery cycle with better quality in general. One concern is that bugs are still being introduced to production even with extensive manual regression testing executed before a release, so how would this be avoided with multiple deployments?

Mainly, this should all be covered from planning through development, where the team assesses the area of impact. If a large piece of code is going to be deployed/released, this is a hint that it needs to be further groomed and split into multiple deployable stories (it is always worth investing this time early in the process)

Even if there is an impact, it’s easier to regress an area impacted by a single change than by multiple changes

Alongside this comes the culture of reaction to failure. Problems will arise, and the ability to solve these items quickly is key to success. So there should be a good monitoring and “testing in production” practice to respond fast

I know it seems a bit counter-intuitive that releasing more frequently reduces risk. However, what happens in reality is that frequent releases expose weak points in the process and the system, and help strengthen them.
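
A back-of-the-envelope calculation can make this less counter-intuitive. The sketch below assumes that every change has the same, independent chance of introducing a problem. Both the independence and the 5% figure are assumptions for illustration, not measurements.

```python
def chance_of_at_least_one_failure(p_per_change: float, changes: int) -> float:
    """Probability that a deployment containing `changes` independent changes
    includes at least one faulty change, if each change is faulty with
    probability `p_per_change` (an illustrative assumption)."""
    return 1 - (1 - p_per_change) ** changes


P = 0.05  # assumed chance that a single change is faulty, not a measured value

big_release = chance_of_at_least_one_failure(P, 10)
small_deployment = chance_of_at_least_one_failure(P, 1)

print(f"one release with 10 changes: {big_release:.2f}")
print(f"one deployment with 1 change: {small_deployment:.2f}")
```

Under these assumptions the big release has roughly a 0.40 chance of containing a problem, against 0.05 for a single-change deployment. Ten small deployments still ship the same ten changes, so the bugs do not disappear. The gain is that when something breaks, there is one change to inspect and roll back instead of ten.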

To summarize, adopting this process that’s chosen by some of the most successful companies in the world would help with:

  • Reducing the deployment risk — doing smaller changes means there’s less probability something will go wrong, and it’s easier to fix should a problem appear
  • Faster feedback loop — the end-user will be able to reflect faster on the feature if it was visible. Alternatively, testing in production with some juicy data would be of great benefit
  • Less burnout from release and test preparation for user stories whose details you might have forgotten

[Graph: how smaller, more frequent deployments split the risk, increase our speed, and make our response to problems faster]

Testing in production

Feedback in production is one of the most powerful tools that would allow us to respond quickly to incidents

  • Implement Monitoring and alerting

Effective monitoring can help discover issues quickly (for example if the performance degraded after a release, we can respond quickly)

Monitoring info can help prioritise exploratory tests for future releases where the focus can be directed to highly used components

  • Logging

The log files provide the low-level details needed to diagnose the root cause of an issue in production

  • Exploratory testing in production is nice and helps the developer experience how the user is seeing the feature
  • Automated E2E smoke testing in production is also a great idea, as a third line of defense
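
The performance degradation example from the monitoring item above could look roughly like this. It is a minimal sketch: the latency samples are invented, the 20% tolerance is an arbitrary threshold, and in practice a monitoring tool would supply both the data and the alerting.

```python
from typing import Sequence


def percentile(samples: Sequence[float], percent: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(percent/100 * n)
    in the sorted samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, (percent * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def latency_degraded(before: Sequence[float], after: Sequence[float],
                     tolerance: float = 1.2) -> bool:
    """True if the p95 latency after the release exceeds the p95 before it by
    more than `tolerance` (1.2 means 20% slower, an arbitrary example)."""
    return percentile(after, 95) > percentile(before, 95) * tolerance


# Invented response times in milliseconds
before_release_ms = [100] * 19 + [120]
after_release_ms = [100] * 15 + [400] * 5

if latency_degraded(before_release_ms, after_release_ms):
    print("ALERT: p95 latency degraded after release")
```

Comparing a high percentile rather than the average is a design choice: a release that makes a minority of requests much slower can leave the average looking acceptable.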

Final thoughts

This process needs to be flexible and adjustable to meet both business and team needs. Quality is a very wide and subjective term. It is important to understand what it means to our product stakeholders. However, delivering quality software products usually can be done through a combination of culture, architecture, tests, and processes.

  • Culture shift

Engineers should be ready to take responsibility for code, quality, deployment, automation, and infrastructure

  • App architecture

Going with a distributed architecture, and using microservices will enable frequent updates on each service independently (arguably)

  • Automation Testing

Automated tests with high quality are necessary in this approach in order to be confident when releasing.

  • Monitoring

Solid monitoring tooling should be in place

Have we catered for the items we discussed earlier — putting the end-user first, automating the regression suite, preventing bugs from slipping to production (as much as possible), and aiming to live a stress-free life?

  • We put the end-user first: we are giving the end-user very fast access to newly implemented items, with a low risk of getting things broken (we are also taking the user’s opinion into consideration through the fast feedback)
  • Automating the manual regression suite: automating all of it might take a very long time, which we might not have, and that would cause other problems. We came up with an approach to automate the testing of what is impacted and what is important. We also suggested a fast feedback, fast fix loop. Isn’t that better?
  • Preventing bugs from slipping to production (as much as possible) — we are suggesting a proactive approach to aggressively refine, plan, create tests and small chunks of code, and pair early on in the process to prevent issues from slipping to production
  • Aiming to live a stress-free life — I don’t think this exists, but we managed our stress over time through deploying low-risk items that are well tested individually

Does that mean we will never have a bug in production?

References

https://martinfowler.com/bliki/TestPyramid.html

https://testing.googleblog.com/2015/04/just-say-no-to-more-end-to-end-tests.html

https://www.martinfowler.com/bliki/ContinuousDelivery.html

https://www.martinfowler.com/delivery.html

https://dzone.com/articles/continuous-delivery-riskier

https://danashby.co.uk/2016/10/19/continuous-testing-in-devops/

https://leanpub.com/testingindevops

Software engineer (JS | REACT | Node | AWS | Test Automation)
