Maya Reddy | Software Engineer, Ad Formats
Every day at Pinterest, we help our advertising partners reach Pinners through different ad formats, and we provide them with metrics and analytics on how their ads are performing. Maintaining our partners’ trust and delivering value is critical, so the metrics we expose, which are ultimately based on logs containing client events, must be correct.
In early 2017, we had mobile releases every two weeks, and we would do manual QA to verify logging behavior before each release. Soon, client teams began investing in integration test frameworks. Integration tests in this context are automated tests that go through the UI flow, starting with logging in to the app and then performing actions such as tapping on Pins and saving Pins. At the end of the automated actions, the tests verify that the logged events match the UI actions.
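Here is a minimal Python sketch of how that final verification step could work, assuming the logged client events can be read back as a list. The `ClientEvent` type and the event names are hypothetical and are not Pinterest's actual logging schema. The check treats the expected events as an ordered subsequence, so unrelated events logged in between do not cause a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientEvent:
    """Hypothetical logged client event: an event type plus the Pin it refers to."""
    event_type: str
    pin_id: str


def logs_match_actions(expected: list[ClientEvent], logged: list[ClientEvent]) -> bool:
    """Return True if every expected event appears in `logged`, in order.

    Unrelated events (for example, background logging) may appear between
    the expected ones, so this is an ordered-subsequence check, not equality.
    """
    remaining = iter(logged)
    # `event in remaining` consumes the iterator up to and including the match,
    # so each expected event must appear after the previous one.
    return all(event in remaining for event in expected)


logged = [
    ClientEvent("IMPRESSION_START", "pin-1"),
    ClientEvent("APP_BACKGROUND", ""),  # unrelated event between UI actions
    ClientEvent("TAP", "pin-1"),
    ClientEvent("IMPRESSION_END", "pin-1"),
]
expected = [
    ClientEvent("IMPRESSION_START", "pin-1"),
    ClientEvent("TAP", "pin-1"),
    ClientEvent("IMPRESSION_END", "pin-1"),
]
print(logs_match_actions(expected, logged))
```

Checking order as well as presence catches bugs where, for example, an impression ends before the tap that should have happened during it.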
Why we created the ads logging integration test suite
- Catch and fix bugs earlier in the release cycle — Previously, our reliance on manual QA meant the team might not catch bugs until late in the release cycle, and we were often fixing P1 bugs shortly before ship deadlines.
- Ship code with confidence — Automated tests allow us to run more test cases and be even more confident that new code has not introduced bugs or issues.
- Re-allocate manual testing — Shifting repeated QA tasks to integration tests means manual QA can do more exploratory testing and testing of new features.
How to make sure we test the correct format
We needed a way to guarantee that a certain ad would show up in a user’s feed so that we could run tests on the desired ad format. In the example below, we need an app-install ad to show up.
Example test case:
We use a handy tool that inserts promoted Pins at the API level based on JSON data. We have many different test accounts, each with a specific Pin inserted into that account. Our automated tests can then log in to whichever account is needed for that format-specific test!
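A minimal sketch of how a test might pick its account, assuming the seeding data is a JSON list that maps each test account to the ad format of the promoted Pin inserted into it. The JSON fields, account names, Pin IDs, and the `account_for_format` helper are hypothetical illustrations, not the actual tool.

```python
import json

# Hypothetical JSON fixture: each test account has one promoted Pin of a
# specific ad format inserted into its feed. Field names are illustrative only.
FIXTURES_JSON = """
[
  {"account": "ads-test-app-install", "ad_format": "app_install", "pin_id": "1001"},
  {"account": "ads-test-video", "ad_format": "video", "pin_id": "1002"},
  {"account": "ads-test-carousel", "ad_format": "carousel", "pin_id": "1003"}
]
"""


def account_for_format(fixtures_json: str, ad_format: str) -> tuple[str, str]:
    """Return (account, pin_id) for the test account seeded with `ad_format`."""
    for entry in json.loads(fixtures_json):
        if entry["ad_format"] == ad_format:
            return entry["account"], entry["pin_id"]
    raise KeyError(f"no test account seeded with ad format {ad_format!r}")


account, pin_id = account_for_format(FIXTURES_JSON, "app_install")
print(account, pin_id)
```

For the app-install example, a test would look up the app-install account this way, log in as that account, and then run its checks against the known Pin.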
How we structure the tests
Our ads appear in many places in the Pinterest app: in the home feed, in search results, and on other surfaces. Sometimes a bug affects only one surface, so we need tests on every surface. This increases the number of tests, but it also makes them easy to split up logically. We have one base method that performs all the UI actions and logging checks, and we have a test method for each surface that navigates to the surface before calling into the base method.
Below is a test being run on the home feed, search, and related Pins feed. The test checks that impressions end properly when the contextual menu appears.
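The structure of one base method plus a thin test method per surface can be sketched with Python's `unittest`. This is an illustration under stated assumptions rather than the real suite: `FakeApp` is a hypothetical stand-in for a UI driver that records logged events, and the surface and event names are made up. Each test method navigates to its surface and then calls the shared base method, which opens the contextual menu and checks that the impression ended.

```python
import unittest


class FakeApp:
    """Hypothetical stand-in for a UI driver; it records logged client events."""

    def __init__(self) -> None:
        self.surface = ""
        self.events = []  # list of (event_type, surface) tuples

    def navigate_to(self, surface: str) -> None:
        self.surface = surface
        # Landing on a surface that shows the seeded ad starts an impression.
        self.events.append(("IMPRESSION_START", surface))

    def open_contextual_menu_on_ad(self) -> None:
        # The contextual menu covers the ad, so its impression should end.
        self.events.append(("IMPRESSION_END", self.surface))


class ContextualMenuImpressionTests(unittest.TestCase):
    def setUp(self) -> None:
        self.app = FakeApp()

    def _check_impression_ends_on_contextual_menu(self) -> None:
        """Base method: UI actions and logging checks shared by every surface."""
        surface = self.app.surface
        self.app.open_contextual_menu_on_ad()
        self.assertEqual(
            self.app.events,
            [("IMPRESSION_START", surface), ("IMPRESSION_END", surface)],
        )

    # One thin test method per surface: navigate, then call the base method.
    def test_home_feed(self) -> None:
        self.app.navigate_to("home_feed")
        self._check_impression_ends_on_contextual_menu()

    def test_search(self) -> None:
        self.app.navigate_to("search")
        self._check_impression_ends_on_contextual_menu()

    def test_related_pins(self) -> None:
        self.app.navigate_to("related_pins")
        self._check_impression_ends_on_contextual_menu()
```

Because the base method's name does not start with `test`, `unittest` does not collect it on its own; only the per-surface methods run, and each adds just one navigation step.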
As ads are launched on new surfaces, we need to certify that the ad behavior and logging work as expected. For example, in a user's board, there's a new section called “More ideas” where ads can now appear, and we want to be able to easily replicate existing tests to run on this new surface. Ideally, teams that own surfaces should be able to own their subset of tests as well, from test creation through maintenance. Since most of the tests are written in a consistent structure, we created a Python script that generates tests for new surfaces by copying and modifying existing tests.
As we added more formats and their associated test coverage, the test suite became larger and larger. Our limited on-premise capacity restricted how often we could run the full test suite. After some iteration, we settled on running the full suite nightly, when more capacity is available. We also picked a small subset of tests to run on every commit to get an early signal on bugs. We always make sure that the full test suite passes on the release candidate before submitting the build to the App Store or Play Store. We use the following setup for our integration environment:
- Buildkite — We set up Buildkite pipelines for the integration tests to run on remote machines.
  - Tests finish faster since they run in parallel.
  - Developers can work on other features while tests run.
  - We can schedule builds so that tests run nightly or set up integrations so that tests run per commit.
- Metro — An in-house tool used to analyze test results and trends. We’re able to see the success rate over time for a specific pipeline as well as for individual tests.
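The scheduling policy of nightly full runs, a per-commit subset, and a full pass on the release candidate can be summarized as a small Python sketch. It is not Buildkite configuration (Buildkite pipelines are defined separately); the `Trigger` values, test names, and the idea of a fixed smoke subset are assumptions used only to illustrate which tests run when and how the release gate works.

```python
from enum import Enum


class Trigger(Enum):
    """Hypothetical reasons a pipeline run was started."""
    PER_COMMIT = "per_commit"
    NIGHTLY = "nightly"
    RELEASE_CANDIDATE = "release_candidate"


def select_tests(all_tests: list[str], smoke_subset: set[str], trigger: Trigger) -> list[str]:
    """Choose which integration tests a pipeline run should execute."""
    if trigger is Trigger.PER_COMMIT:
        # Small, fast subset for an early signal on every commit.
        return [t for t in all_tests if t in smoke_subset]
    # Nightly runs and release candidates run the full suite.
    return list(all_tests)


def can_submit_build(results: dict[str, bool], all_tests: list[str]) -> bool:
    """A release candidate is submitted only if every test in the full suite ran and passed."""
    return all(results.get(t, False) for t in all_tests)


tests = ["test_home_feed", "test_search", "test_related_pins"]
print(select_tests(tests, {"test_home_feed"}, Trigger.PER_COMMIT))
```

Treating a missing result as a failure in `can_submit_build` means a test that silently did not run still blocks the release.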
How we maintain the tests
Once a test is written, it's added to the staging Buildkite pipeline. The new test then runs for a few days, and we fix any issues that arise. After it's stable, we add it to the production Buildkite pipeline. On top of the upfront cost of writing tests, there's some effort required to maintain them.
- Sometimes a feature change breaks integration tests. For example, a back button might be changed into a close button. Since our tests find UI elements by their accessibility labels, a change like this breaks them. In this case, we file a bug with the appropriate team so that they can fix the issue.
- Other times, there are issues with how we’ve written tests that become apparent over time. In those cases, we update the test framework/setup itself.
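The staging-to-production step can be expressed as a simple promotion rule over a test's recent staging results, the kind of success-rate history a tool like Metro tracks. The thresholds below (at least 10 runs with a 95% pass rate) and the `StagingHistory` type are hypothetical; the exact definition of "stable" is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class StagingHistory:
    """Hypothetical record of a test's recent staging runs (True = passed)."""
    name: str
    outcomes: list[bool]

    @property
    def success_rate(self) -> float:
        # Fraction of runs that passed; a test with no runs counts as 0%.
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0


def ready_for_production(
    history: StagingHistory, min_runs: int = 10, min_rate: float = 0.95
) -> bool:
    """Hypothetical promotion rule: enough staging runs and a high enough pass rate."""
    return len(history.outcomes) >= min_runs and history.success_rate >= min_rate


print(ready_for_production(StagingHistory("test_search", [True] * 10)))
```

Requiring a minimum number of runs keeps a test that passed only once or twice from being promoted before any flakiness has had a chance to show up.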
Automated integration tests allow us to have confidence in the metrics we report to our advertisers. We are continuously evolving the process to make our tests easier to run and maintain.
Acknowledgments: Thanks to Wendy Lu, Matt Mo, Joseph Smalls-Mantey, Jordan Maler, Tony Lu, Jerry Marino, Freddy Montano, the Ad Formats iOS team, Metrics Quality & Test Tools team, the iOS Core Platform team, and everyone else who helped out with this effort!