Migrating a WordPress website requires planning. If you are changing the theme and/or plugins, even more planning is required. In my journey of WordPress development, I’ve come across the need for a few utilities during the migration planning phase. Since page builders lean so heavily on shortcodes, a list of every shortcode used on the site is essential for migration planning.
The purpose of this script was to provide a report to clients and to inform any scoping documentation that needed to be written. For instance, changing from Visual Composer to raw HTML is no easy task. Most modules have hard-coded HTML, and during the migration you may need to parse that output and either drop it directly into the content or use the settings from specific modules for your new architecture.
So, to list all shortcodes for a site, you can’t really rely on the shortcodes being registered, especially if you’re scraping an older site that has been through a few iterations. The goal of this script was flexibility: to give the user ( me ) a way to produce a comprehensive report of the following:
- Shortcode itself ( the name )
- Post ID it’s tied to
- Parameters in use ( in JSON format )
- CSV export, to provide to the team and/or client(s)
Direct database query versus filtered content
Right now, the shortcode scraper grabs data directly from the database, instead of using get_the_content() like you would assume. This is because once that content runs through the the_content filters, registered shortcodes are parsed and you get the rendered output instead of the shortcode itself. Sure, I could disable shortcode parsing and all that other stuff, but why? Why use WP_Query to walk over the data when the database is right there? I chose the direct database route for simplicity: it ensures the data coming from the post_content field is, as you would expect, raw and unfiltered.
How accurate is it?
Glad you asked! The code uses a regex pattern to match anything in between brackets, specifically the shortcode name and the parameters within those brackets. Of course, this does lead to some false positives and requires the developer to do some cleanup on the CSV before reporting it to the client and/or team. This is quite rare and depends heavily on how the editors of the current website use brackets. For instance, I’ve come across strings like “[ If this were IE this wouldn’t happen ]”, which shows up as a shortcode named If with a ton of parameters.
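To make the matching idea concrete, here is a minimal sketch in Python. It is not the scraper’s source (that is a WP-CLI command); the two patterns, the function names parse_attributes and scan_post, and the sample content are all assumptions made up for illustration. It shows a name plus parameters being pulled out of brackets, the parameters being stored as JSON, and the “If” false positive from above sneaking through.

```python
import json
import re

# Hypothetical pattern: "[", optional whitespace, a name, then everything
# up to the closing "]". Closing tags such as [/vc_row] are skipped because
# the name has to start with a letter.
SHORTCODE_RE = re.compile(r"\[\s*([A-Za-z][\w-]*)([^\]]*)\]")

# Hypothetical attribute pattern: key="value", key='value' or key=value.
ATTR_RE = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"']+))""")


def parse_attributes(raw):
    """Turn the text after the shortcode name into a dict of parameters."""
    raw = raw.strip().rstrip("/")  # tolerate self-closing tags like [foo /]
    attrs = {}
    for match in ATTR_RE.finditer(raw):
        # Exactly one of the three value groups is set for each match.
        value = next(g for g in match.groups()[1:] if g is not None)
        attrs[match.group(1)] = value
    # Whatever is left over is kept as positional parameters ("0", "1", ...).
    leftovers = ATTR_RE.sub(" ", raw).split()
    for index, token in enumerate(leftovers):
        attrs[str(index)] = token
    return attrs


def scan_post(post_id, content):
    """Return one report row per bracketed match found in raw post content."""
    rows = []
    for match in SHORTCODE_RE.finditer(content):
        params = parse_attributes(match.group(2))
        rows.append({
            "shortcode": match.group(1),
            "post_id": post_id,
            "params": json.dumps(params, ensure_ascii=False),
        })
    return rows


sample = (
    '[vc_row full_width="stretch_row"][vc_column width="1/2"]Hi'
    "[/vc_column][/vc_row] [ If this were IE this wouldn’t happen ]"
)
for row in scan_post(42, sample):
    print(row["shortcode"], row["post_id"], row["params"])
```

The last row printed is the bogus If shortcode, with each stray word stored as a positional parameter. That is exactly the kind of row you end up deleting by hand during cleanup.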
How to provide a report!
While the ability to list all shortcodes from all posts in WP-CLI format is great, most of the time you’ll need a spreadsheet so you can do additional mapping and brainstorming. Enter the --export flag for this script. The --export flag writes a list of all shortcodes in CSV format to the same location as the WP-CLI script. Additionally, it will date the CSV for you ( in case you need to re-run the report at a later date ).
The source code for this has mutated over time. In fact, I patched a couple of things just a few minutes ago. Originally this lived in a secret gist of mine and was updated whenever I needed it. But now I’m giving it to you. Maybe it’ll be helpful, maybe not? Right now it is not multisite compatible, but it definitely could be, should you see the need to list all shortcodes from all posts… on ALL blogs.
The command syntax is pretty simple:
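As a rough picture of what a dated CSV export involves, here is a small Python sketch. Again, this is not the scraper’s code: the export_rows function, the column names and the shortcodes-YYYY-MM-DD.csv file name are assumptions picked for the example, and the real script may name its file differently.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path


def export_rows(rows, directory, today=None):
    """Write report rows to a CSV whose name carries the date of the run."""
    today = today or date.today()
    # Hypothetical naming scheme, e.g. shortcodes-2020-01-02.csv
    path = Path(directory) / f"shortcodes-{today.isoformat()}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["shortcode", "post_id", "params"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return path


rows = [
    {"shortcode": "vc_row", "post_id": 42,
     "params": '{"full_width": "stretch_row"}'},
]
with tempfile.TemporaryDirectory() as tmp:
    written = export_rows(rows, tmp)
    print(written.name)
    print(written.read_text(encoding="utf-8"))
```

Putting the date in the file name means a re-run next month lands next to the old report instead of overwriting it, which makes it easy to compare the two.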
wp shortcode-scraper scrape [--export]