Hello everyone,
Michael here, and in today’s post, we’ll take a look at how well Tesseract can read a sample of my handwriting.
So far, we’ve tested Tesseract against standard computer-font text, a photo of a banner with text, and a common US tax document. Aside from the standard computer-font text, Tesseract didn’t work well with either the banner or the tax document.
But can Tesseract handle my handwriting? Let’s find out!
But first, a little pre-processing…
Before we test Tesseract on my handwriting, let’s follow the pre-processing steps we’ve followed for the other three Tesseract scenarios: pip install the necessary packages and import them into the notebook.
First, the pip installing:
!pip install pytesseract
!pip install opencv-python
Next, let’s import the necessary packages:
import pytesseract
import numpy as np
from PIL import Image
And now, the initial handwriting Tesseract test
Now, upon initial testing, how well can Tesseract read this sample of my handwriting?

Let’s find out, shall we?
testImage = 'handwriting.png'
# Open the image with PIL and convert it to a NumPy array for Tesseract
testImageNP = np.array(Image.open(testImage))
# Run OCR on the array and print whatever text Tesseract finds
testImageTEXT = pytesseract.image_to_string(testImageNP)
print(testImageTEXT)
Output:
Interestingly, Tesseract didn’t seem to pick up any text. I thought it might’ve picked up something, as the image simply contains black text on a white background. After all, there are no other objects in the image, nor is the information arranged like a document.
Could a little bit of image preprocessing be of any use with this image? Let’s find out!
Preprocessing time!
For this example, let’s try the same technique we used in the other two lessons: thresholding!
First off, let’s grayscale this image:
import cv2
from google.colab.patches import cv2_imshow
# Read the image with OpenCV (BGR by default) and convert it to grayscale
handwriting = cv2.imread('handwriting.png')
handwriting = cv2.cvtColor(handwriting, cv2.COLOR_BGR2GRAY)
cv2_imshow(handwriting)
Next, let’s do a little thresholding on the image. Since the image is black text on a white background, let’s see how a different thresholding technique (THRESH_BINARY_INV) might be able to assist us here:
# Invert-threshold at 127: dark ink becomes white, light paper becomes black
ret, thresh = cv2.threshold(handwriting, 127, 255, cv2.THRESH_BINARY_INV)
cv2_imshow(thresh)
The technique we used here, THRESH_BINARY_INV, is the opposite of what we used in the previous two lessons. In inverse binary thresholding, pixels above a certain threshold (127 in this case) turn black, while pixels at or below it turn white. I think this type of thresholding could be quite useful for handling black text on a white background, as was the case here.
Any luck reading?
Once we’ve done the thresholding, let’s see if that made a difference in the image’s Tesseract readability:
handwritingTEXT = pytesseract.image_to_string(thresh)
print(handwritingTEXT)
Output:
Interestingly, unlike the previous two Tesseract scenarios we tested (the photo of the banner and the W-2 document), no text was read at all after thresholding.
Honestly, I thought the handwriting scenario would fare far better than the banner photo or W-2, given that this image is simply black text on a white background. I mean, Tesseract was able to perfectly read the image in The Seven-Year Coding Wonder, and that was red text on a lime-green background. I guess this goes to show that while Tesseract has its potential, it also has several limitations, as we’ve discovered.
Here’s the GitHub link to the Google Colab notebook for this post: https://github.com/mfletcher2021/blogcode/blob/main/OCR_handwriting_readings.ipynb.
Thanks for reading,
Michael