opensource.google.com

Menu

Security Crawl Maze: An Open Source Tool to Test Web Security Crawlers

Friday, June 21, 2019

Scanning modern web applications for security vulnerabilities can be a difficult task, especially if they are built with Javascript frameworks, which is why crawlers have to use a multi-stage crawling approach to discover all the resources on modern websites.

Living in the times of dynamically changing specifications and the constant appearance of new frameworks, we often have to adjust our crawlers so that they are able to discover new ways in which developers can link resources from their applications. The issue we face in such situations is measuring if changes to crawling logic improve the effectiveness. While working on replacing a crawler for a web security scanner that has been in use for a number of years, we found we needed a universal test bed, both to test our current capabilities and to discover cases that are currently missed. Inspired by Firing Range, today we’re announcing the open-source release of Security Crawl Maze – a universal test bed for web security crawlers.

Security Crawl Maze is a simple Python application built with the Flask framework that contains a wide variety of cases for ways in which a web based application can link other resources on the Web. We also provide a Dockerfile which allows you to build a docker image and deploy it to an environment of your choice. While the initial release is covering the most important cases for HTTP crawling, it’s a subset of what we want to achieve in the near future. You’ll soon be able to test whether your crawler is able to discover known files (robots.txt, sitemap.xml, etc…) or crawl modern single page applications written with the most popular JS frameworks (Angular, Polymer, etc.).

Security crawlers are mostly interested in code coverage, not in content coverage, which means the deduplication logic has to be different. This is why we plan to add cases which allow for testing if your crawler deduplicates URLs correctly (e.g. blog posts, e-commerce). If you believe there is something else, feel free to add a test case for it – it’s super simple! Code is available on GitHub and through a public deployed version.

We hope that others will find it helpful in evaluating the capabilities of their crawlers, and we certainly welcome any contributions and feedback from the broader security research community.

By Maciej Trzos, Information Security Engineer
.