Migration

How to Move a 7,500-Order Shopify Store to WooCommerce With Every Penny Reconciled

When a client says "we want off Shopify", the conversation usually splits into two paths. The trivial path: a few dozen SKUs, off-the-shelf migration tool, done in a couple of days. The non-trivial path is the one this post is about - and why I built a custom pipeline for it.

When a client comes to me and says "we want off Shopify", the conversation usually splits into two paths. The trivial path: their store is a few dozen SKUs, a few hundred orders, content marketing on top. Any of the off-the-shelf migration tools (Cart2Cart, LitExtension, the official S2W importer) will do an acceptable job, and the move ships in a couple of days.

The non-trivial path is the one this post is about. The client had:

  • 75 active products with 144 variations and 391 product images
  • 6,441 customers, going back several years
  • 7,476 historical orders representing £406,695.36 in trading
  • 10,117 line items across those orders
  • 1,096 discount codes, some of them one-time-use codes already issued to past customers
  • 45 blog posts under /blogs/news (Shopify does not export blog content)

When the historical financials matter to a client (accounting reconciliation, customer support continuity, ad-platform attribution windows), "approximately correct" is not the standard. Every line had to land. So I wrote a custom pipeline. This is what I built and what I learned.

Why custom and not a plugin

The off-the-shelf importers are perfectly fine when none of the following are true:

  1. You need every order to reconcile to the penny against your Shopify finance reports.
  2. You have one-time-use discount codes already sent to customers that must still work post-cutover.
  3. Your blog content has internal links, embedded images, and SEO juice you cannot afford to lose.
  4. Your product option names exceed 28 characters (a real WooCommerce slug limit that bites people).
  5. You want a sandboxed dry run before any write hits the production database.

The third-party tools either silently lose information against these constraints or refuse to map data they cannot interpret. When the gap matters, you build a pipeline that matches your data shape exactly.

The architecture, in one diagram

   Shopify CSVs + crawled HTML
              |
              v
   [1] Extract        Python normalisers per entity write JSON
              |
              v
   [2] Transform      Fuzzy match line items to product / variation IDs
              |
              v
   [3] Load           PHP loaders invoked through wp eval-file,
                      running inside the WP / WC context
              |
              v
   [4] Validate       Five validators compare source aggregates to DB state
              |
              v
   [5] Fix loop       Re-run failed records until green or three
                      stale iterations, then write manual_review.json

Five stages, executed in order, each writing its output to disk so any step can be re-run independently.

Two passes, one codebase

The first thing I did before writing any loader code was set up a local sim environment. Docker Compose, mariadb:11.4, wordpress:latest, a wp-cli sidecar. I pulled a mysqldump of the live Kinsta database and imported it into the local mariadb. I installed the same plugin set as live.

Then I wired the pipeline so that MIGRATION_TARGET=sim and MIGRATION_TARGET=live swap only the DB host and the wp-cli invocation. Nothing else.

Every bug I found surfaced in sim. The live pass ran the identical code with one env var flipped, and produced identical results. This is not novel, but it is non-negotiable when the live target has real revenue history attached.

The idempotent contract

Every loader respects the same rule. Each imported record gets a meta key that links it back to its Shopify source ID:

  • Products get _shopify_product_id
  • Variations get _shopify_variation_id
  • Customers get _shopify_customer_id (on the user record)
  • Orders get _shopify_order_id (on the WC order)
  • Coupons get _shopify_discount_id
  • Blog posts get _shopify_blog_slug
  • Attachments get _shopify_image_url

Before inserting, every loader checks for an existing record with the matching meta value. If found, it updates in place. If not, it inserts.

The practical effect: if the pipeline crashes halfway, you re-run it. If a single order failed validation, the fix loop re-runs only that order. Nothing duplicates, nothing strands. Re-running a clean pipeline on a clean DB produces zero changes, and that "zero changes" check is itself a green-light signal before flipping to live.

The technical calls that earned their keep

Bulk PHP loaders via wp eval-file

My first attempt used the WooCommerce REST API from Python. I measured it. Spawning 7,476 REST calls would have taken hours of HTTP overhead, plus I had to handle auth, retries, and rate limiting myself. I threw that branch away and rewrote the loaders as PHP scripts invoked through wp eval-file. Each loader runs once, stays in the WP / WC PHP memory the whole time, and gets WC's internal hooks and HPOS sync for free. End-to-end speedup: roughly 50x.

The loader runner in Python is a thin shim that does ssh ... wp eval-file ... and parses the output. That is the whole interface.

HPOS-aware order writes

WooCommerce's High-Performance Order Storage moves orders out of wp_posts into wp_wc_orders and a small set of related tables. The data is still there if you write raw SQL, but the public WC_Order API handles the dual-write to lookup tables and meta correctly. I went through WC_Order and $order->save(), never raw inserts to wp_wc_orders, even when the table was right there.

If HPOS is enabled and you write raw SQL to wp_wc_orders, you will silently lose data in the lookup tables and the order search index will lie to you for weeks. Use the API.

Local product attributes, not global taxonomies

Shopify lets product option names be anything. Free-form strings. We had option names that exceeded WooCommerce's 28-character slug limit, which breaks global product attribute taxonomies. WC's solution is "use a shorter name", which is not actionable on imported data.

The workaround: use local (per-product) attributes instead of global taxonomies. The downside is they are not aggregated across products, so a "filter by 500mg" facet across the catalog has to be rebuilt with a different mechanism. The upside is the import actually works. For this client, the trade was worth it.

Email suppression during order import

7,476 orders means 7,476 potential customer notifications, plus admin notifications, plus payment-method-specific emails. Sending those during a migration is a great way to get customers asking "why am I being told my order from 2024 is now Processing".

The cleanest approach I found is a one-line filter:

add_filter( 'pre_wp_mail', '__return_false' );

Drop this in the order loader, do the work, remove the filter at the end. WC's notification dispatch will fire normally but every individual mail attempt returns false at the lowest level.

Content-hash image dedup

Shopify's product images are referenced by URL. The same physical image might appear on multiple products with different URLs (variants, alt sizes). I did not want to upload the same JPEG 17 times.

The dedup key was _shopify_image_url on the WP attachment. Before downloading and uploading, check the meta. If the URL has been imported, attach the existing attachment ID to the new product. The 391 product images on disk represent 391 unique URLs, not 391 unique uploads.

Blog crawled, not imported

Shopify exports do not include blog content. There is no blogs.csv. I wrote a small crawler that walks /blogs/news, parses the article HTML, pulls out title / date / author / body / hero image, and writes it to JSON. The loader is the same shape as the others. 45 posts captured, including their featured images.

Fuzzy line-item matching

The thorniest data problem in any commerce migration is historical line items. Shopify stores line items as name + price + quantity + product_id where the product_id is a Shopify ID that does not exist in WC. Worse, the name field on a Shopify line item is whatever was on the product at order time, which may have been edited, renamed, or deleted since.

I used RapidFuzz token_set_ratio with a 90 threshold to match line item names to product / variation IDs. 64.9% auto-matched. The remaining 35.1% saved as name + price + qty line items with no product linkage, which WooCommerce supports natively. The customer-facing order pages and admin views all render correctly. Revenue reconciliation works because line item totals come from the saved total field, not from the product price lookup.

For accounting purposes, an unlinked line item is fine. For analytics purposes ("what are my best-selling SKUs over the last three years"), it is a slight gap. I documented the gap.

The validation strategy

Five validators ran after every load. Each compared source CSV aggregates to DB state and emitted a diff_*.json report.

  • diff_products.json: source product count, names, prices vs WC state
  • diff_customers.json: email address sets, billing country distribution
  • diff_orders.json: order count, status distribution, revenue totals
  • diff_coupons.json: coupon code set, expiry dates
  • diff_blog.json: post count, slug set, featured-image presence

Any validator emitting a non-empty diff blocked the pipeline. The fix loop then re-ran the relevant loader for the missing or mismatched records, up to three iterations of zero progress before flagging for manual review.

The acceptance bar was zero diffs across all five validators, plus one final re-run of the pipeline producing zero changes (the idempotency check). Only then did I flip MIGRATION_TARGET=live.

The final numbers

The acceptance criteria were met across every entity:

  • Products: 75 / 75 (100%)
  • Variations: 144 / 144 (100%)
  • Product images: 391 / 391 (100%)
  • Customers: 6,441 / 6,441 (100%)
  • Coupons: 1,096 / 1,097 (99.9%, 1 case collision, unavoidable)
  • Orders: 7,476 / 7,476 (100%)
  • Order line items: 10,117 / 10,117 (100%)
  • Revenue: £406,695.36 / £406,695.36 (exact)
  • Blog posts: 45 / 45 (100%)

The single coupon lost was a case-collision: Shopify is case-insensitive on coupon codes but allowed two coupons with codes differing only in case to exist simultaneously. WooCommerce normalises to lowercase. One had to win. I documented it; the client agreed.

What I would do again

Plan mode for the architecture spike. Before any code, get the full plan in front of the human. Mistakes at the architecture level cost days, not hours, to unwind.

Local Docker sim that clones live exactly. Every bug I shipped to sim, I caught in sim. None of them made it to live. The cost of the sim setup is roughly half a day; the cost of finding bugs on live is whatever the client's recovery patience allows, which is usually less than half a day.

Idempotent loaders. The number of times I re-ran a partially failed loader during development is in the dozens. Each re-run cost nothing because the loaders converge. The number of times I re-ran on live: twice. Both safe.

One PHP file per entity, invoked through wp eval-file. This is the architectural choice that made the timing work. Anyone proposing per-record REST calls for a migration of this size should be politely asked to measure first.

HPOS-aware code from day one. If your target site has HPOS enabled and you write to the order tables directly, you will create a bug that takes a week to find. Always go through WC_Order and $order->save().

What is next for this site

The migration was the technical lift. The remaining work is the operational lift: tax setup with the client's accountant, DNS cutover from the current Shopify pointer to the Kinsta IPs, transactional email through a proper SMTP provider, consent management for analytics, ownership handover of every account into the client's name. That stuff is not interesting engineering, but it is what makes the difference between "site moved" and "site live".

If you are about to migrate a real eCommerce store off (or onto) Shopify, the takeaway is small: pick the right tool for the data shape, build the safety net before you write the loaders, and reconcile every number before you flip the DNS. The rest is execution.


Tools used: Python 3.11, RapidFuzz, WordPress 6.9, WooCommerce 10.7 (HPOS enabled), MariaDB 11.4, Docker Compose, wp-cli, Kinsta hosting. The pipeline itself is roughly 2,000 lines of Python and 1,500 lines of PHP. The architectural decisions matter more than the line count.

Filip Rastovic
Filip Rastovic
Shopify Developer & CRO Specialist · Stargazer Studio

Need a complex eCommerce migration done right?

For migrations where every line has to land - financials, customer history, SEO equity, discount codes - send me a brief and I'll quote a fixed price within 24 hours.

Book a free call More articles
Filip Rastovic
Book a Call Get started today