It is a controlled experiment that compares two variations of the same web page, called A and B, which are nearly identical except for one key difference, such as a headline, a call-to-action, the color of a button, or the structure of a menu. By directly analyzing how users react to the version they see, we can understand what works best against the metrics and goals we set, and make informed, data-driven decisions about which changes improve conversions or engagement. That is why A/B testing, also known as split testing, is one of the most widely used methods for testing the effectiveness of changes made to a site, and it is especially useful for checking which of two options users actually prefer in terms of style, graphics and content. Let’s find out more about A/B testing and learn how to set it up in the best way for our site, while also learning how to minimize the impact of testing activity in Google Search by following the official suggestions of the search engine.
What is A/B testing
Also called bucket test or split test, the A/B test is a controlled experiment with two variants, named precisely A and B, which involves developing two different versions of the same page that differ in a single variable, to be shown simultaneously to a sample of users, so as to obtain concrete data on which one outperforms the other in terms of visitor preference and interaction.
A/B testing is thus a comparative testing methodology used to optimize web pages and digital marketing strategies: it provides informed data to understand user preferences and to improve the effectiveness of a piece of content or campaign by analyzing the response of the subjects (a sample of the typical audience) to either variant A or variant B, thereby determining which is the most effective based on specific metrics.
This process is critical because it can directly influence conversion rates, user engagement, and ultimately the success of a website or digital product.
What is the purpose of A/B testing and what goals does it serve?
Widely used in web analytics and a strategic resource for creating the perfect landing page, this tool falls under hypothesis testing or “2-sample hypothesis testing” in the field of statistics.
It is good practice to make use of A/B testing when we aim to maximize performance and when we want to make informed decisions about critical elements of our digital campaigns. Common situations may be when introducing new features, optimizing landing pages or improving direct email marketing campaigns. In addition, A/B testing becomes essential when we have two different theories on a marketing approach and wish to validate the best option before a full rollout.
The goal of the activity is as simple as it is strategic: to identify changes, even seemingly small ones, within a web page that increase or maximize the outcome of interest, such as the click-through rate for a banner ad, and that is why it is decisive in web design and in the study of user experience.
Concretely, A/B testing proves useful in a variety of circumstances: for example, it can be employed when we want to increase the conversion rate of a landing page or when we want to test the effectiveness of a call-to-action, but it is also effective for evaluating the impact of small changes to the design or content of a page, such as changes in titles, images or product descriptions. This is also true in SEO, where variations help us understand whether targeted interventions in titles, meta descriptions or content organization can influence user behavior and, consequently, search engine rankings.
The benefits of A/B testing
It is already clear from what has been written that A/B testing is a powerful and flexible optimization tool that, used correctly, can provide valuable insights and guide evidence-based decisions, improving the user experience and overall performance of a website even in the SEO context.
Let’s think about a landing page and something as seemingly insignificant as the color of the call-to-action button: is it better red or green? With an A/B test we can create two versions, so that one group of users will see red and the other green, and then answer questions such as “Which one favors more clicks? Which one brings in more sales?”
The key aspect is “informed decisions”: in today’s hyper-competitive environment, ignoring collected data would be like navigating without a compass or relying on chance and assumptions. Intuition can be useful, but data does not lie. A/B testing allows us to sharpen our digital strategy, to sculpt the user experience to its purest state, one in which every element is optimized for our goals.
Even a small change can mean a significant increase in conversion rate, user engagement, and thus financial return.
Let’s also not forget that user behaviors evolve and the marketplace changes with dizzying speed: A/B testing is not a single experiment but an ongoing approach, a constant dialogue with your target audience. It is the very essence of adaptability: the message that works today may not be effective tomorrow, and what resonates with one segment of users may be ineffective with another.
Exploiting tests for CRO
Going into more detail, if implemented correctly, the test allows us to experiment with individual elements of the site that we intend to vary and determine whether they are effective for our audience: the most practical and widespread use is the analysis of calls to action, elements that need more than good design to produce results.
The lack of effectiveness of a call to action can stem from various causes, including changes in user behavior, shifts in the audience or, more generally, the way users interact with our website.
Making small changes and testing them is a critical process, which is part of CRO operations (conversion rate optimization) and involves the use of scientifically valid tests on the elements of user interaction on our site, with the aim of achieving a significant increase in performance at the end of all tests.
How to do an effective A/B test
In order to conduct a successful A/B test, it is essential to follow a few key steps. Initially, we define the goal of the test: for example, increasing the click-through rate on a button, increasing the time spent on the page, or improving another KPI (key performance indicator). Next, we identify the element to be tested and create two versions: version A, which usually corresponds to the current design (the control), and version B with the variation we want to test. Then, we randomly divide our audience into two groups, so that each group sees only one of the two versions. This process must be supported by analytical tools that collect data on user behavior for each version of the page. Once we have completed the test and collected sufficient data, we analyze the results to determine which of the two versions generated the best performance according to the established metrics.
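To make the “random split” step concrete, here is a minimal sketch of deterministic bucket assignment, assuming visitors can be identified by a stable ID such as a cookie value; the function name, the experiment label and the 50/50 split are illustrative assumptions, not features of any particular tool.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "cta-color") -> str:
    """Deterministically assign a user to variant A or B.

    Hashing the user ID together with the experiment name keeps the
    assignment stable across visits and independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # value between 0 and 99
    return "A" if bucket < 50 else "B"  # 50/50 split

# Example: the same visitor always lands in the same group
print(assign_variant("visitor-123"))
print(assign_variant("visitor-123"))  # same result on every call
```

Deterministic assignment avoids showing a visitor different versions on different visits, which would contaminate the measurement.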
It is critical that the A/B test is conducted in a controlled manner and that variables are kept to a minimum to ensure that the results are attributable to the changes made. In addition, it is important that the test is conducted over an appropriate period of time to collect a representative sample of data and reduce the impact of outliers or external events.
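On the question of how long to run the test, a rough sample-size estimate can be made before launching. The sketch below applies the standard two-proportion formula; the 3% baseline conversion rate and the hoped-for 3.6% are purely illustrative numbers, not recommendations.

```python
import math
from scipy.stats import norm

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Visitors needed in EACH group to detect a change from p1 to p2."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = ((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2
    return math.ceil(n)

# Example: baseline 3% conversion rate, hoping to reach 3.6%
print(sample_size_per_variant(0.03, 0.036))  # roughly 14,000 visitors per variant
```

Dividing the result by the page’s daily traffic gives a first, rough idea of how long the test needs to run before the data can be trusted.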
Through the use of A/B testing, we can then implement changes based on hard facts, not assumptions. This working methodology allows us to refine our digital strategies and tailor them to the needs and preferences of our target audience, an essential step in building satisfying user experiences and digital marketing campaigns that bring more results.
Steps to create an A/B test
Creating a successful A/B test involves research on several fronts, starting with the size and current status of the website.
In many cases, for small sites or sites with a small audience, this type of experiment is not necessary; still, even if we already know the best practices most common in our industry, a test can indicate which changes are actually worth making. Rather than launching into this operation, though, a small site would get better results (and with less effort) by interviewing its customers, discovering weaknesses and optimizing the site based on the feedback received.
The case is different for large sites (over 10,000 visitors per month and with hundreds of pages, according to the source article), because running A/B tests can help us find out which of the versions shown to the sample delivers the best conversions. This activity becomes essential for even larger sites, where even a single element can lead to a great return in conversions.
What we need to do is establish a general baseline of website traffic, which then allows us to evaluate the elements of the website and decide what we need to change to run an effective test.
Deciding which elements to test
We have reached a decisive stage, namely deciding which elements to vary and submit to the test, a step that in turn requires trial and (often) error in order to get the right result.
For instance, if we believe that in our industry a red CTA performs better than the blue one we are currently using, we will change the color of the buttons during the A/B test and find out from users’ reactions which really converts best. The usefulness of the A/B test is that it allows us to find out what our users really respond to, rather than changing the site based on simple hypotheses.
This is an important distinction to make, because the work of optimizing a site is often based on assumptions (however realistic or well studied), and unless we have spoken directly with customers we do not know what they are responding to. The A/B test, instead, turns theory into a high-precision tool: after a successful test, we will know what works best for our users.
Usability tests
User testing is the most essential phase of usability testing, which in turn is a crucial tool for evaluating how usable the site is. In fact, the study of users and audience should always be part of an ongoing process, rather than being conducted haphazardly.
The purpose of this test is to understand in a more systematic way how people interact with our site, and the usability test allows us to recognize any difficulties users encounter in the conversion funnel. The second step is to set up a screening procedure for user analysis, which can pursue many goals or only one.
For example, the user screening can be designed to:
- Find out how users actually scan the page.
- Assess what is really attracting their attention.
- Address any deficiencies in existing content.
- Identify the weaknesses of the buttons and how to correct them.
Examples of A/B tests on content
As we said earlier, content is one of the possible subjects of testing, especially if we are not sure what type and form may have the greatest impact on our audience.
For instance, if readers are accustomed to longer texts and are more prone to interact when they find one – as in the case of many scholars and people in academic fields – our goal is to produce a text that reads like an essay. However, if the audience is composed of more informal readers, it is better to use shorter lines of text and paragraphs.
And so, we can do a test by making two pages – one with long, formal content, the other with short, more appealing text – to find out which offers the best results.
Or, we can study the effect of changes in font size, color or spacing, to find out whether these factors affect readability and, possibly, comprehension and appreciation of the content.
Examples of split tests on buttons
Tests may also concern seemingly minor details, such as the size of the CTA button: if fewer users than desired click on the button, or if people fill out the form with incorrect information, we can intervene to understand where the problem lies and test different elements of the form, including the button.
For example, we can improve the form’s copy to get better and more accurate conversions; many experts recommend giving users a timeframe for a request, with a message telling them “We will reply to your message within 24 – 48 hours”, an intervention known as “setting user expectations”.
A small solution of this kind also builds trust, as it tells people that they will receive a guaranteed response (obviously to be honored) without fear that their request will fall into oblivion.
Varying brand communication
The way we communicate is as important as the content the audience receives, recalls Brian Harnish from Search Engine Journal: you “can’t know exactly what your audience will respond to without testing live variables”, and it is not possible to know in advance “what will elicit the reaction you want.”
That is why it is important to “test live variables and variations of those variables.”
One of the most important tests involves changing the message, particularly headlines, phrases in content, taglines, and call-to-action wording, because the communication may not be as effective as we want (and think).
Split tests can reveal things about our users that we may never have suspected and are, as mentioned, a very useful tool for figuring out if, how, and when to make changes to the site that can yield real improvements (including in terms of profitability).
The key to the success of such tests is to create a solid methodology: after planning and fine-tuning the final elements, we need to execute them and, most importantly, verify user response within the allotted time. Only then can we be (more) confident that we are making interventions that will work and yield the desired results.
Do not rely on chance for site changes: set goals first
It is quite obvious to say that changes to the site should not be made randomly, intervening non-strategically on individual elements and hoping for the best. By doing so, we reduce the chances of having concrete and measurable results, because we do not know what to attribute any observed effects to.
Especially for large sites in very complex industries, it is not possible to draw appropriate conclusions about what to change and what not to change within the pages if we do not know how the target users interact with the Web site, and that is where A/B testing comes in.
Conversions don’t just happen, and getting real results-for example, real people buying our service or product-requires a significant investment of time, studying how users are performing on the site and making appropriate changes to stimulate their actions.
The brutal reality of digital marketing is this, says Harnish again: we have to “know what the data says about the changes before we can make the changes.” The whole raison d’être of such tests is to allow us to have concrete results on changes before we make them, to consolidate assumptions about optimizations into a real data set.
While a super-expert marketer will know what changes to make based on his or her experience and thus be able to reduce the time needed for testing and analysis that improves a Web site’s performance, a less experienced marketer will do random testing that does not always achieve the desired result.
A concrete example: testing to improve affiliate campaigns
The article also provides an example to show how split testing works, analyzing one of the ways “to make money on the Internet today,” namely affiliate marketing, which can be done by offering “customers a free product or service provided as an affiliate” that earns a commission on each sale.
If we gather enough information about our customers, we can build a relationship with them, because we establish a relationship of trust; this study process also allows us to learn a lot about the way they use the website and to implement changes that help users achieve what they are trying to achieve, which is to buy our service or product.
What elements to check in the A/B test
Harnish also points out which elements offer the greatest potential ROI during testing; among others, these include:
- All CTAs on the site.
- The overall background color of the site.
- The colors of the elements of the entire page.
- The photography of the pages.
- Content and its structure.
- Any element on the page that requires user interaction.
Adjustments to these site elements can result in significant increases in performance: for example, if we think a conversion button is not working properly, we can submit a variation (a different color, or changed text) to a sample of the actual users of the site, who can provide us with factual evidence on the effectiveness of the new solution.
With A/B testing, we can test the performance of one element at a time, varying and comparing it to an alternative version that contains changes designed around the hypothesized behavior of the users.
In reality, the elements we can test can vary widely depending on the specific objectives and context in which we operate, although we usually prefer to work on a set of aspects, those that actually have the potential to significantly influence user behavior and site performance, such as:
- Page Titles. Titles are critical for capturing user attention and ranking in search engines. Testing variations of titles can reveal which formula attracts the most and engages the target audience.
- Meta Descriptions. Although meta descriptions do not directly affect SEO ranking, they can influence CTR from search results: testing different descriptions can help you understand which ones are most effective in persuading users to click.
- Call-to-Action (CTA). CTAs guide users to specific actions, such as signing up for a newsletter or purchasing a product: variations in the text, color or position of CTAs can have a significant impact on conversions.
- Images and Videos. Visual elements can influence engagement and conversion: testing different images or videos can determine which are most effective in communicating the desired message.
- Page Layout and Structure. The layout of elements on a page can affect usability and user experience; changing the structure can lead to improvements in navigation and dwell time on the page.
- Text Content. The clarity, persuasiveness, and length of text can influence how users interact with a page; testing different writing approaches can help you find the tone and style best suited to your audience.
- Contact or Signup Form. The simplicity or complexity of a form can affect the number of conversions; testing variations in the number of fields or design can optimize the data collection process.
- Navigational Elements. The ease with which users can navigate and find information on a site is crucial. Testing different menus or navigation patterns can improve user experience and reduce abandonment rates.
- Loading Speed. Although not directly testable with a traditional A/B test, loading speed is a critical factor for user experience and SEO; variations in image compression or code optimization can be tested to improve loading times.
- Social validation elements. Reviews, testimonials, and case studies can increase trust and influence the purchase decision; testing their position or the way they are presented can impact conversions.
It is important to note that as we work to test the effectiveness of these elements through A/B testing, we must maintain a holistic view of the user experience and conversion path, ensuring that the changes we make are consistent with the overall brand and message of the site. In addition, any testing must be conducted systematically, ensuring that we collect enough data to do statistically meaningful analysis before making permanent changes.
What tools to use to monitor results
The final and most important stage of the testing activity is the verification of results, which obviously needs tools and methods that can accurately track the performance of different variants and provide reliable data for analysis.
Typically, we can rely on:
- Web analytics tools, such as Google Analytics, to monitor web traffic and analyze user behavior, segmenting data by testing variant and tracking key metrics such as conversion rate, time on page, and bounce rate.
- A/B testing tools, such as Optimizely, designed specifically to offer A/B testing, multivariate testing and personalization capabilities with integrated real-time analytics; they run the experiments and help define the metrics we want to measure.
- Heatmap and session recording tools, such as Hotjar, with which to study how users interact with different variations of a page.
- Statistical methods, such as statistical significance tests (Student’s t-test or the chi-square test), which help determine whether observed differences between variants are due to chance or reflect a true difference in performance; see the sketch after this list. In addition, calculating the confidence interval for key metrics can provide an estimate of the range within which the true values of the metrics can be expected to lie, given a certain confidence level (e.g., 95 percent).
- Organic Traffic Analysis tools, such as SEOZoom or other SEO tools.
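As a concrete illustration of the statistical check mentioned in the list, the sketch below runs a chi-square test on hypothetical conversion counts; the figures are invented for the example and carry no particular meaning.

```python
from scipy.stats import chi2_contingency

# Hypothetical results: [conversions, non-conversions] for each variant
variant_a = [120, 4880]   # 120 conversions out of 5,000 visitors
variant_b = [155, 4845]   # 155 conversions out of 5,000 visitors

chi2, p_value, dof, expected = chi2_contingency([variant_a, variant_b])

print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("The difference is unlikely to be due to chance (95% confidence).")
else:
    print("Not enough evidence yet: keep collecting data.")
```

Dedicated testing tools perform this kind of calculation automatically, but running it by hand makes clear why a test should not be stopped the moment one variant pulls ahead.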
A different approach to site interventions
The final piece of advice that comes from Harnish is to approach A/B testing with an open mind, without prejudging the elements to be tested and the eventual results, because that way we can really find out what we don’t know.
For example, we can run a test of how the site displays contact details (such as the phone number), showing distinct variants to the same segment of the audience sample to determine which gets the best results. Or we can run a content test, to quickly make sure we are not wasting time and effort on content that simply will not work.
This approach allows us to have a real, empirical demonstration of how well the site is working in each and every element, and in some cases we can get really great results on final conversions simply by implementing a variation that improves something that wasn’t working before.
The relationship between A/B testing and SEO: benefits and risks
There is a delicate relationship between A/B testing and SEO, because testing activity can expose us to risks and damage to organic visibility, mainly stemming from potential conflicts with search engine guidelines and from the impact that the changes can have on the user experience and on how search engines perceive the site. In particular, there are known cases in which testing variations of page content or page URLs has negatively affected performance in Google Search.
However, if we work effectively and correctly, A/B testing becomes a valuable tool because it allows us to apply informed changes to our pages, overcoming one of the great limitations of SEO: since it is not an exact science, what works for one site may not work for another. By experimenting “in the field,” however, we might verify that a simple change to a title can increase CTR, indicating to search engines that that page is particularly relevant to certain queries and generating a virtuous circle of increased visibility and traffic.
Another aspect that should not be overlooked is the time factor: SEO needs time to “bring results,” and testing can also take longer periods than other digital channels, as search engines can be “slow” in indexing and evaluating changes. This means we need to be patient and wait for the data to accumulate before drawing firm conclusions.
SEO and A/B testing: guidance from Google
In short, A/B testing in SEO must be handled with care, and it is no coincidence that Google itself has provided specific guidance on how to conduct experiments of content variants or page URLs without incurring penalties or ranking damage.
Appearing online in September 2022 and last updated in late November 2023, this guide warns us of the main problems and suggests solutions to minimize the risks of A/B testing on SEO.
The document first clarifies what is meant by site testing, which is “when you try different versions of your website (or part of your website) and collect data on how users react to each version.” Therefore, two types of testing fall under this definition:
- A/B testing, which, as mentioned, involves testing two (or more) variations of a change. For example, we can test different fonts on a button to see if this increases button clicks.
- Multivariate testing, on the other hand, involves testing more than one type of change at a time, looking for the impact of each change and potential synergies between changes. For example, we might try different fonts for a button, but also try changing (and not changing) the font of the rest of the page at the same time: is the new font easier to read and therefore should be used everywhere, or does using a different font on the button than on the rest of the page help attract attention?
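To picture the difference, a multivariate test turns every combination of the changes into its own variant to be measured. Here is a minimal sketch, with the factors and values invented purely for illustration (they mirror the font example above):

```python
from itertools import product

# Hypothetical factors for a multivariate test
button_fonts = ["current-font", "new-font"]
page_fonts = ["current-font", "new-font"]

# Every combination becomes its own variant to measure separately
variants = list(product(button_fonts, page_fonts))
for i, (button_font, page_font) in enumerate(variants):
    print(f"Variant {i}: button={button_font}, page={page_font}")

# 2 x 2 = 4 variants, versus 2 variants in a simple A/B test of the button alone
```

The combinatorial growth is also why multivariate tests need more traffic than plain A/B tests to reach reliable conclusions.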
Google reminds us that we can use software to compare behavior across different page variants (parts of a page, whole pages, or entire multipage flows) and monitor which version is most effective with our users. In addition, we can test by creating multiple versions of a page, each with its own URL: when users try to access the original URL, we redirect some of them to each of the URL variations and then compare the users’ behavior to see which page is most effective.
Again, we can run tests without changing the URL by inserting variations dynamically into the page, even using JavaScript to decide which variation to display.
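The same-URL approach can also be sketched server-side. The snippet below is a hypothetical Flask route (the URL, cookie name and templates are assumptions for illustration) that serves one of two variations of the same page and persists the choice in a cookie, the server-side analogue of the JavaScript approach Google mentions; as discussed further below, a cookie-controlled test means Googlebot will see the version served to cookie-less visitors.

```python
import random

from flask import Flask, make_response, render_template_string, request

app = Flask(__name__)

# Hypothetical variations of the same block, served on the SAME URL
TEMPLATES = {
    "A": "<h1>Try it for free</h1>",
    "B": "<h1>Start your free trial today</h1>",
}

@app.route("/landing")
def landing():
    # Reuse the variant stored in the visitor's cookie, or pick one at random
    variant = request.cookies.get("ab_variant")
    if variant not in TEMPLATES:
        variant = random.choice(list(TEMPLATES))
    response = make_response(render_template_string(TEMPLATES[variant]))
    # Persist the assignment so the visitor keeps seeing the same variation
    response.set_cookie("ab_variant", variant, max_age=30 * 24 * 3600)
    return response
```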
Testing sites and Google, the aspects to consider
Depending on the types of content we are testing, the guide says, it may not even be “very important whether Google crawls or indexes some of the content variations in the course of the testing activity”. Small changes, such as the size, color or position of a button or image, or the text of the call-to-action may have a surprising impact on users’ interactions with the page, but “often have little or no impact on the snippet or search result ranking of that page.”
Also, if Googlebot crawls the site “often enough to detect and index the experiment,” it will likely index any updates we make rather quickly at the conclusion of the test.
Google’s best practices for testing sites
The paper also goes into more technical and practical details, providing a set of best practices to follow to avoid negative effects on site behavior in Google Search when testing variations on the site.
- Do not cloak test pages
Do not show one set of URLs to Googlebot and a different set to humans; in other words, do not cloak, a tactic that violates Google’s guidelines regardless of whether we are running a test or not. The risk of these violations is “causing the site to be demoted or removed from Google’s search results, likely not the desired result of the test.”
Cloaking counts as a violation whether it is implemented through server-side logic, through robots.txt, or by any other method; as an alternative, Google suggests using links or redirects as described below.
If we use cookies to control the test, we should keep in mind that Googlebot generally does not support cookies: therefore, it will only see the version of the content accessible to users with browsers that do not accept cookies.
- Using rel=”canonical” links
If we are testing with multiple URLs, we can use the rel=”canonical” attribute on all alternate URLs to indicate that the original URL is the preferred version. Google recommends using rel=”canonical” rather than a noindex meta tag because it “more closely matches the intention in this situation.” For example, if we are testing variations of our home page, we don’t want search engines to refrain from indexing the home page; we only want them to understand that all test URLs are duplicates or variations similar to the original URL and should be grouped together, with the original URL as canonical. Using noindex instead of canonical in such a situation could sometimes have unexpected negative effects (a combined code sketch illustrating this practice and the next one follows the next point).
- Using 302 redirects and not 301 redirects
If we are running a test that redirects users from the original URL to a variant URL, Google invites us to use a 302 redirect (temporary) and not a 301 redirect (permanent). This tells search engines that the redirect is temporary (it will only be active as long as the experiment is running) and that they should keep the original URL in their index, rather than replacing it with the target URL of the redirect (the test page). JavaScript-based redirects are also fine.
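Here is a minimal sketch of these two practices combined, assuming a hypothetical original page at /landing and a test variant at /landing-b (the routes, the domain and the share of redirected traffic are illustrative assumptions, not prescriptions):

```python
import random

from flask import Flask, redirect

app = Flask(__name__)

@app.route("/landing")
def landing():
    # Send roughly half of the visitors to the variant with a TEMPORARY redirect,
    # so search engines keep /landing (the original) in their index.
    if random.random() < 0.5:
        return redirect("/landing-b", code=302)
    return "<h1>Original landing page</h1>"

@app.route("/landing-b")
def landing_b():
    # The variant declares the original URL as canonical, so the test URL is
    # treated as a duplicate/variation of /landing rather than a separate page.
    return (
        '<link rel="canonical" href="https://www.example.com/landing">'
        "<h1>Variant landing page</h1>"
    )
```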
- Run the experiment only as long as necessary
The amount of time needed for a reliable test varies depending on factors such as conversion rates and the amount of traffic the website receives, and a good testing tool tells us when we have collected enough data to draw a reliable conclusion. Once the test is finished, we need to update the site with the desired content variants and remove all test elements, such as alternate URLs or test scripts and markup, as soon as possible. If Google discovers “a site running an experiment for an unnecessarily long time”, it may interpret this as an attempt to deceive search engines and act accordingly, especially if the site offers a content variant to a large percentage of its users.