Recently, I was faced with a rather daunting SEO challenge: the migration of a very outdated website built with a little-known CMS called Zope. The previous developer, unfortunately, was not available to provide us with the database for data extraction, which made the process even more complex, since the site contained tens of thousands of pages.
The scrape and import strategy
Given the lack of cooperation in accessing the data, after consulting with the client, I had to resort to good old-fashioned scraping methodology to retrieve essential information from the site.
For this work, I chose the Screaming Frog SEO Spider. Although it is usually used to audit a site for errors, it also has very powerful features for extracting data from a website.
The feature in question is called Custom Extraction.
Screaming Frog’s “Custom Extraction” is an advanced feature of the SEO Spider that allows you to extract specific data from web pages while the site is being crawled. This can include any HTML element, attribute or even inline script, targeted with XPath expressions, CSS Path or Regex (regular expressions).
The data I needed were:
- Previous URL
- Title
- H1
- Post content
All crucial elements to preserve SEO value during migration.
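For reference, the Custom Extraction configuration for a job like this comes down to a handful of expressions, roughly like the ones below; the content selector is hypothetical and depends on the old theme’s markup, while the previous URL is simply the crawled address and needs no extractor.
- Title: //title
- H1: (//h1)[1]
- Post content: //div[@id='content']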
This process took several days and surfaced hundreds of thousands of URLs. Too many! I knew the genuinely useful content would be far smaller than that.
I did an initial skim to discard the URLs I was definitely not interested in: there were thousands upon thousands of internal search pages that were linked but not indexed.
I then took the URL patterns I was unsure about, the ones that might or might not be bringing traffic, and entered them into SEOZoom’s “URL Analysis” tool.
Although only 1000 URLs can be analyzed at a time, I just needed confirmation: I wanted to understand whether skipping a certain type of URL during the import would do any harm.
Fortunately, my hunch was correct: there was indeed a lot of garbage that was simply wasting Crawl Budget.
Good news: out of the 500 URLs analyzed, only 3 ranked for any keywords. This simplified my work, because I could focus solely on the /archive/ path, limiting the useful URLs to only 32 thousand.
Technical implementation: importing the data
After scraping, the data were exported in CSV format and imported into the MySQL database of the new site, developed in WordPress. I added two additional fields to the database:
- urlnuova (the new URL)
- idnuova (the new ID)
Initially these two fields are empty, but I will need them in the next phase to manage the migration bridge.
I then enlisted the help of ChatGPT to develop a custom PHP script that, operating within the WordPress environment, would read URLs from the Screaming Frog export to generate new posts. One particularly challenging aspect was handling the publication dates, which were often in inconsistent or incorrect formats.
Below is a simplified example of the script’s logic.
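In this sketch the CSV column names, the name of the imported mapping table (migrazione) and the date normalisation are assumptions; they would need to be adapted to the actual export.

<?php
// import.php - run from the WordPress root so wp-load.php can bootstrap WordPress.
require_once __DIR__ . '/wp-load.php';

$handle = fopen(__DIR__ . '/export.csv', 'r'); // CSV exported from Screaming Frog (assumed file name)
$header = fgetcsv($handle);                    // first row: column names

while (($row = fgetcsv($handle)) !== false) {
    $data    = array_combine($header, $row);
    $oldurl  = $data['Address'];   // previous URL crawled by Screaming Frog
    $title   = $data['Title 1'];   // extracted <title>
    $h1      = $data['H1-1'];      // extracted H1
    $content = $data['Content 1']; // custom extraction field (assumed extractor name)
    $rawdate = $data['Date 1'];    // publication date, often in inconsistent formats

    // Normalise the date; fall back to the current time when parsing fails.
    $timestamp = strtotime($rawdate);
    $postdate  = $timestamp ? date('Y-m-d H:i:s', $timestamp) : current_time('mysql');

    $post_id = wp_insert_post(array(
        'post_title'   => $h1 ?: $title,
        'post_content' => $content,
        'post_status'  => 'draft',
        'post_date'    => $postdate,
    ));

    if ($post_id && !is_wp_error($post_id)) {
        // Fill the two extra fields so the bridge script can resolve old URL -> new URL.
        global $wpdb;
        $wpdb->update(
            'migrazione',                   // imported CSV table (assumed name)
            array('urlnuova' => get_permalink($post_id), 'idnuova' => $post_id),
            array('urlvecchia' => $oldurl)  // column holding the previous URL (assumed name)
        );
    }
}
fclose($handle);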
Implementation of the bridge script
At this point, all the data from the old site has been imported into the WordPress database of the new one. The 301 redirects now need to be handled.
The bridge script, described below, works together with the following .htaccess rule:
#rules for migration bridge script
RewriteRule ^archive/(?:year-)?\d{4}(?:/[a-zA-Z]+)?/(.*)$ /bridge.php [L,QSA]
This few-line PHP script is critical for the redirect phase: when Google or a visitor lands on an old URL, the .htaccess rule hands the request to the bridge script, which queries the database.
If an old URL -> new URL match is found in the database, a 301 redirect to the new URL is issued.
If no match is found, a 404 (or, if you prefer, a 410) response is served instead.
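For reference, here is a minimal sketch of what such a bridge script can look like; the database credentials, the table name migrazione and the column names urlvecchia/urlnuova are placeholders, and it assumes the old URLs were stored as paths rather than full URLs.

<?php
// bridge.php - receives requests for old /archive/ URLs via the .htaccess rule above.

$requested = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH); // path of the old URL being requested

$db = new mysqli('localhost', 'db_user', 'db_password', 'db_name'); // placeholder credentials

$stmt = $db->prepare('SELECT urlnuova FROM migrazione WHERE urlvecchia = ? LIMIT 1');
$stmt->bind_param('s', $requested);
$stmt->execute();
$stmt->bind_result($newurl);

if ($stmt->fetch() && $newurl) {
    // Match found: permanent redirect to the new URL.
    header('Location: ' . $newurl, true, 301);
} else {
    // No match: the content is gone. 410 tells search engines it will not return;
    // use http_response_code(404) instead if you prefer.
    http_response_code(410);
}
exit;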
Extra scripts
In addition to the core scripts for migrating text content, I developed parallel scripts for handling multimedia resources, such as images and PDFs. These scripts were responsible for importing such files into the WordPress media library, ensuring that all media were properly associated with their respective posts. However, to keep the focus of this case study on the essentials, I have chosen to concentrate primarily on the basic methodology used to manage the migration.
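As an idea of what those scripts did, attaching an old image to an imported post can be done with WordPress’s own sideloading helper; $old_image_url and $post_id here are hypothetical variables coming from the same import loop.

<?php
// Needed when media_sideload_image() is used outside the admin context.
require_once ABSPATH . 'wp-admin/includes/media.php';
require_once ABSPATH . 'wp-admin/includes/file.php';
require_once ABSPATH . 'wp-admin/includes/image.php';

// Downloads the file, adds it to the media library and attaches it to the post;
// passing 'id' as the fourth argument returns the attachment ID.
$attachment_id = media_sideload_image($old_image_url, $post_id, null, 'id');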
Testing phase
Before the final launch, I implemented a crucial testing phase to simulate the interaction Google would have with the new site. To do this, I once again used Screaming Frog, selecting the “List” option from the “Mode” menu. This mode allowed me to load all the old URLs, directing the spider exclusively to the specified paths, without letting it navigate freely.
Because of the difference between the hostname of the production site and the test site, I had to make a substitution in the CSV file, changing www.domainname.com to dev.domainname.com. During testing, I ran into several 404 errors, indicating a mismatch in the redirect paths. In particular, some date paths were incorrect, for example, “January-200024” where an extra zero made the path inaccessible.
To solve these problems, I introduced additional rules to check and correct the incorrect dates. After several hours of scanning and fixes, I was able to stabilize the redirects, with all URLs correctly pointing to the new paths with a 301 redirect. Finally, I rescanned all 301 destination URLs to confirm that they were indeed navigable and free of further errors.
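As an illustration, one of those clean-up rules in the bridge script can be as simple as a regular expression that strips the extra zeros from the year before the database lookup (a sketch of the idea, not the exact rule used):

// Collapse malformed years such as "200024" or "20024" back to "2024"
// before looking the requested path up in the mapping table.
$requested = preg_replace('/-(20)0+(\d{2})\b/', '-$1$2', $requested);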
Putting into production
After successful completion of the testing phase, the site went online. The transition from test to production status was followed through Google Search Console, a tool I used to ensure that no significant problems emerged once the site became publicly accessible and indexable by Google.
In addition, I used SEOZoom to monitor the URLs that represented the pillars of the site’s traffic. I did not review the entire site at this early stage, but preferred to focus on the pages that were critical for traffic and visibility. This targeted approach acted as a “canary in the mine”: an early indicator of whether the site was maintaining its most important keywords and avoiding SEO performance drops due to the migration.
Was it a perfect migration?
Short answer: NO!
Before we started, I maintained open and transparent communication with the client about the limitations and challenges of the project, openly discussing the possible outcomes of the migration. This honesty helped set realistic expectations and prepare the client for the upcoming changes.
Overall, the migration was not perfect, as limited access to the original database and the fact that I was not involved in the creation and optimization of the new site prevented the full potential from being realized. However, considering the starting point, which was the absence of accessible data, the work proved to be extremely stimulating and full of challenges.
Keep your site under control
Website migrations rarely succeed in retaining 100% of previous traffic. This depends on many factors, such as differences in site architecture, changes in content, and inevitable fluctuations in search engine rankings; in our case, moreover, the migration took place in the middle of a Google algorithm update, which left us without objective comparison data.
It is crucial to anticipate potential post-migration problems, to avoid drastic changes that may confuse users or search engines, and to set realistic goals.
Despite the difficulties, the client was able to move away from an outdated and out-of-control system, gaining a new platform on which it can now develop and improve content more effectively and independently.