Let’s Make Our Own Python Game Part Two: Creating The Bingo Card (Python Lesson 50)


Hello everyone,

Michael here, and in today’s post, I’ll pick up where we left off last time. This time, we’ll create the BINGO board!

In case you forgot where we left off last time, here’s the code from the previous post:

Now let’s continue making our BINGO game. In today’s post, we’ll focus on generating the BINGO card.

Outlining the BINGO board

As you may recall from the previous post, all we really did was create one giant lime green square:

It’s a good start, but quite frankly, it won’t work for our BINGO board. How can we improve the look of our BINGO card? Add some borders!

Here’s the code to add the borders:

for xcoord in x:
    for ycoord in y:
        screen.blit(square1.surf, (xcoord, ycoord))
        pygame.draw.rect(screen, pygame.Color("Red"), (xcoord, ycoord, 75, 75), width=3)

Pay attention to the last line in these nested for loops-the one that contains the pygame.draw.rect() method. What this method does is draw a square border around each square in the BINGO card. This method takes four parameters-the game screen (screen in this case), the color you want for your border (this could be a color name or hex code), a four-integer tuple that contains the x and y-coordinates for the square along with the square’s size, and the thickness of the border in pixels. Let’s see what we get!

The BINGO card already looks much better-now we need to fill it up!

  • Just a tip-for the border generation process to work, make the dimensions of the border the same as the dimensions of the square. In this case, we used 75×75 borders since the squares are 75×75 pixels.

B-I-N-G-O

Now, what does every good BINGO card need? The B-I-N-G-O on the top row, of course!

Here’s how we can implement that:

font = pygame.font.Font('ComicSansMS3.ttf', 32)

if ycoord == 75:
    if xcoord == 40:
        text = font.render('B', True, (0,0,0), (50,205,50))
        textRect = text.get_rect()
        textX = (75 - textRect.width) // 2
        textY = (75 - textRect.height) // 2
        screen.blit(text, (xcoord + textX, ycoord + textY))
    elif xcoord == 115:
        text = font.render('I', True, (0,0,0), (50,205,50))
        textRect = text.get_rect()
        textX = (75 - textRect.width) // 2
        textY = (75 - textRect.height) // 2
        screen.blit(text, (xcoord + textX, ycoord + textY))
    elif xcoord == 180:
        text = font.render('N', True, (0,0,0), (50,205,50))
        textRect = text.get_rect()
        textX = (75 - textRect.width) // 2
        textY = (75 - textRect.height) // 2
        screen.blit(text, (xcoord + textX, ycoord + textY))
    elif xcoord == 255:
        text = font.render('G', True, (0,0,0), (50,205,50))
        textRect = text.get_rect()
        textX = (75 - textRect.width) // 2
        textY = (75 - textRect.height) // 2
        screen.blit(text, (xcoord + textX, ycoord + textY))
    elif xcoord == 330:
        text = font.render('O', True, (0,0,0), (50,205,50))
        textRect = text.get_rect()
        textX = (75 - textRect.width) // 2
        textY = (75 - textRect.height) // 2
        screen.blit(text, (xcoord + textX, ycoord + textY))

And how does all of this code work? Let me explain.

The B-I-N-G-O letters will usually go on the top row of the card. The line if ycoord == 75 ensures that the letters are only drawn on the top row, since y-coordinate 75 on our game screen corresponds to the top row of the card.

Since there are five letters in B-I-N-G-O, there are also five conditional statements that account for the five letters we’ll be drawing onto the top five squares. There are also five x-coordinates to account for (40, 115, 180, 255, 330).

But before we actually start drawing the text, I want you to take note of this line-font = pygame.font.Font('ComicSansMS3.ttf', 32). This line allows you to set the font you wish to use for text along with its pixel size-in this case, I wanted to use Comic Sans at a 32 pixel size (hey, this isn’t a business project, so I can have some fun with fonts). Unfortunately, if you want custom fonts for your game, you’ll need to download a TTF (TrueType Font) file and save it to your local drive.

  • Another tip-if the TTF file is saved in the same directory as your game file, you just need the TTF file name as the first parameter of the pygame.font.Font() method. Otherwise, you’ll need the whole filepath as the first parameter.

As for the five conditional statements, you’ll notice that they each have the same five lines. Let’s explain them one-by-one.

First off we have our text variable, which contains the text we want to write to the square. The value of this variable is the result of the font.render() method, which takes four parameters-the text you wish to display, whether you want to antialias the text (True will antialias the text, which simply results in a smoother text appearance), a 3-integer tuple representing the color of the text in RGB form, and another 3-integer tuple representing the background color to draw behind the text-also in RGB form.

  • For a good look, be sure to make the background color the same as the square’s color.

Next we have our textRect variable, which represents the invisible rectangle (or square) that contains the text we will render.

Upon initial testing of the text rendering, I noticed that my text wasn’t being centered in the appropriate square. The textX and textY variables fix that using the simple formula (square size - rectangle width or height) // 2 (use the width for the x-offset and the height for the y-offset). This gives the offsets needed to center the text rectangle within the square. However, these two variables alone won’t center the text correctly, and I’ll explain why shortly.

  • In Python, the // operator performs floor division-the result of the division is rounded down to the nearest whole number-which helps when dealing with pixel coordinates and text centering.
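Here’s the centering arithmetic worked out as a quick sketch (the 40×24 text dimensions are made-up example values, not taken from an actual render):

```python
# Centering a text surface inside a 75x75 square.
# The 40x24 text dimensions below are hypothetical example values.
square_size = 75
text_width, text_height = 40, 24

textX = (square_size - text_width) // 2   # (75 - 40) // 2 = 17
textY = (square_size - text_height) // 2  # (75 - 24) // 2 = 25

print(textX, textY)  # 17 25
```

Blitting the text at (xcoord + textX, ycoord + textY) then places it dead center in the square.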

Last but not least, we have our wonderful screen.blit() method. In this context, the screen.blit() method takes two parameters-the text you want to display (text) and a 2-integer tuple denoting the coordinates where you wish to place the text.

Simple enough, right? However, take note of the coordinates I’m using here-(xcoord + textX, ycoord + textY). Adding the textX and textY offsets to the square’s coordinates is what actually centers the text within the square.

After all our text rendering, how does the game look now?

Wow, our BINGO game is starting to come together. And now, let’s generate some BINGO numbers!

B….1 to 15, I….16 to 30 and so on

Now that the B-I-N-G-O letters are visible on the top of our card, the next thing we should do is fill our card up with the appropriate numbers.

For anyone who’s played BINGO, you’ll likely be familiar with which numbers end up on which spots on the card. In any case, here’s a refresher on that:

  • B (1 to 15)
  • I (16 to 30)
  • N (31 to 45)
  • G (46 to 60)
  • O (61 to 75)
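One way to keep these numbering rules in one place is a dictionary mapping each letter to its range (a sketch; the LETTER_RANGES name is my own, not something from the game code):

```python
# Hypothetical helper: map each BINGO letter to its allowed numbers.
# Note that range's end is exclusive, so range(1, 16) covers 1 through 15.
LETTER_RANGES = {
    'B': range(1, 16),
    'I': range(16, 31),
    'N': range(31, 46),
    'G': range(46, 61),
    'O': range(61, 76),
}

print(15 in LETTER_RANGES['B'])  # True
print(16 in LETTER_RANGES['B'])  # False
```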

With these numbering rules in mind, let’s see how we can implement them into our code and show them on the card! First, inside the game while loop, let’s create five different arrays to hold our BINGO numbers, appropriately titled B, I, N, G, and O.

from random import seed, randint

# [meanwhile, inside the while game loop...]

B = []
seed(10)
for n in range(5):
    value = randint(1, 15)
    B.append(value)

I = []
seed(10)
for n in range(5):
    value = randint(16, 30)
    I.append(value)

N = []
seed(10)
for n in range(5):
    value = randint(31, 45)
    N.append(value)

G = []
seed(10)
for n in range(5):
    value = randint(46, 60)
    G.append(value)

O = []
seed(10)
for n in range(5):
    value = randint(61, 75)
    O.append(value)

Also, don’t forget to include the line from random import seed, randint (I’ll explain why this is important) at the top of the script, outside the while game loop.

As for the array creation, please keep that inside the while game loop! How does the BINGO array creation work? First, I created five empty arrays-B, I, N, G, and O-which will soon be filled with five random numbers according to the BINGO numbering system (B can be 1 to 15, I can be 16 to 30, and so on).

Now, you’ll notice that there are five calls to the [random].seed() method, and all of the calls take 10 as the seed. What does the seed do? Well, in Python (and many other programming languages), random number generation isn’t truly “random”. The seed can be any integer you like, but your choice of seed determines the sequence of “random” numbers that will be generated-this is why random number generation in programming is described as deterministic (or pseudorandom): the same seed always produces the same sequence.

  • If you don’t set a seed at all, Python picks one for you, so you’ll get whatever sequence of random numbers Python’s random number generator chooses to generate.
  • Since we imported seed directly (from random import seed), we can write seed(...) on its own-the random. prefix isn’t needed.
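Here’s a quick demonstration of that determinism-re-seeding with the same value reproduces the exact same sequence:

```python
from random import seed, randint

seed(10)
first = [randint(1, 15) for _ in range(5)]

seed(10)  # reset to the same seed...
second = [randint(1, 15) for _ in range(5)]

# ...and the generator replays the identical sequence
print(first == second)  # True
```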

After the seeds are set up, there are five loops that append five random numbers to each array-the line for n in range(5) ensures that each array has a length of 5. Inside each loop I used the [random].randint() function and passed in two integers as parameters to ensure that I only receive random numbers in a specific (inclusive) range, such as 1 to 15 for array B.

Now, let’s display our numbers on the BINGO card! Here’s the code to run (and yes, keep it in the while game loop):

if ycoord != 75:

    if xcoord == 40:
        for num in B:
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

    if xcoord == 115:
        for num in I:
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

    if xcoord == 180:
        for num in N:
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

    if xcoord == 255:
        for num in G:
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

    if xcoord == 330:
        for num in O:
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

Confused about what this code means? The last five lines in each if statement do the same thing we did when rendering the B-I-N-G-O letters on the top row of the card (rendering and centering the text on each square). However, since we don’t want the numbers on the top row, we wrap everything in the main if statement if ycoord != 75, which covers all squares that aren’t on the top row of the card.

Oh, one thing to note about rendering the numbers on the card-be sure to cast the number variable (num in this case) to a string with str(), because font.render() only accepts text of type str.

With all that said, let’s see what our BINGO card looks like:

Well, we did get correct number ranges, but this isn’t the output we want. Time for some debugging!

D-E-B-U-G

Now, how do we fix this board to get distinct numbers on the card? Here’s the code for that!

First, let’s fix the BINGO number array creation process:

B = []

value = sample(range(1, 16), 5)   # range's end is exclusive, so this covers 1-15
for v in value:
    B.append(v)

I = []

value = sample(range(16, 31), 5)  # 16-30
for v in value:
    I.append(v)

N = []

value = sample(range(31, 46), 5)  # 31-45
for v in value:
    N.append(v)

G = []

value = sample(range(46, 61), 5)  # 46-60
for v in value:
    G.append(v)

O = []

value = sample(range(61, 76), 5)  # 61-75
for v in value:
    O.append(v)

In this code, I first added the sample method to the from random import ... line as we’ll need this method to ensure we get an array of distinct random numbers.
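To see why sample() solves the duplicate problem, note that it draws without replacement, so the values it returns are always distinct:

```python
from random import sample

# Five picks from 1-15 (range's end is exclusive), no repeats possible
nums = sample(range(1, 16), 5)

print(len(set(nums)) == 5)              # True: sample() never returns duplicates
print(all(1 <= n <= 15 for n in nums))  # True: all values stay in range
```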

else:

    y2 = [150, 225, 300, 375, 450]
    if xcoord == 40:
        for num, ycoord in zip(B, y2):
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

    if xcoord == 115:
        for num, ycoord in zip(I, y2):
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

    if xcoord == 180:
        for num, ycoord in zip(N, y2):
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

    if xcoord == 255:
        for num, ycoord in zip(G, y2):
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

    if xcoord == 330:
        for num, ycoord in zip(O, y2):
            text = font.render(str(num), True, (0,0,0), (50,205,50))
            textRect = text.get_rect()
            textX = (75 - textRect.width) // 2
            textY = (75 - textRect.height) // 2
            screen.blit(text, (xcoord + textX, ycoord + textY))

Remember the block where we were rendering the numbers? Well, I made a few changes to the code. The if ycoord != 75 check is now an else branch attached to the if ycoord == 75 letter-drawing code, and I added another array of y-coordinates (y2) that is essentially the same as the y array without the number 75. Why did I remove the 75? I simply wanted to ensure that no numbers are drawn on the top row of the card, and 75 is the y-coordinate of the top row.

While iterating through our five BINGO number arrays aptly titled B, I, N, G and O, we’re also iterating through the y2 array to ensure that the correct numbers are rendered in the correct squares.

  • In case you’re wondering about the zip() line in our for loop, the zip() function allows us to iterate through multiple arrays at once in a for loop. However, the zip() function only works if the arrays you’re looping through are the same length.
  • If you want to iterate through multiple arrays of unequal lengths, include this import at the beginning of your script-from itertools import zip_longest. The zip_longest() function will allow you to iterate through multiple arrays of unequal length. No pip install is needed-itertools is part of Python’s standard library.
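Here’s a small demonstration of the difference between the two (with toy arrays, not our actual game data):

```python
from itertools import zip_longest  # ships with Python's standard library

letters = ['B', 'I', 'N']
ys = [150, 225]

# zip() stops at the end of the shorter array
print(list(zip(letters, ys)))          # [('B', 150), ('I', 225)]

# zip_longest() keeps going, padding the shorter array with None
print(list(zip_longest(letters, ys)))  # [('B', 150), ('I', 225), ('N', None)]
```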

Using our revised code, let’s see what our BINGO card looks like now!

Wow, the BINGO card already looks much better! However, if you’re familiar with BINGO, you know the center square in the N column is considered a “free space”. Let’s reflect this with the addition of one simple line of code:

N[2] = 'Free'

This line will replace the middle element in the N array with the word Free, which in turn will display the word Free on the center of the BINGO card:

Nice work!

Testing the card

Now that we’ve generated quite the good-looking BINGO card, the last thing we’ll need to do is test it to ensure we get a new card each time we open the game!

Here’s what our card currently looks like:

And let’s see what happens when we close and restart the game:

It looks like we got the same card. Now, playing with the same card every time would be quite boring, right? How do we fix this bug? Here’s some code to do so (and keep in mind this is just one way to solve the problem):

possibleSeeds = []

value = sample(range(1, 10001), 5)  # range's end is exclusive, so this covers 1-10000
for v in value:
    possibleSeeds.append(v)

# [meanwhile, inside the game loop...]

seed(possibleSeeds[0])

To solve the BINGO card generation bug, I used the same trick I used for generating the BINGO arrays: gather a specific number of integers from a specific range with sample() and append them to an array. This time, I sampled five integers ranging from 1 to 10000.

Inside the game loop, I set the random seed to the first element of the possibleSeeds array. Why did I do this? When I set seed() to a fixed value of 10, I saw the same BINGO card every time I started the game, because the same seed always produces the same sequence of random numbers. The possibleSeeds array, however, is built by sample() before any seed is set, so it contains a different set of integers each time the game starts. Seeding with its first element therefore gives a different random number sequence-and a different BINGO card-on each run.

  • Keep the seed() method inside the game loop, but keep the possibleSeeds array outside of the game loop because inserting the array into the game loop will generate random 5-integer sequences non-stop, which isn’t a desirable outcome.
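As an aside, an even simpler alternative (not what this post uses, just a sketch) is calling seed() with no argument, which seeds the generator from system entropy so every run starts differently:

```python
from random import seed, randint

seed()  # no argument: seeded from system entropy, different on each run
value = randint(1, 15)

print(1 <= value <= 15)  # True
```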

Now, let’s see if our little trick worked. Let’s try running the game:

Now let’s close this window and try running the game again!

Awesome-we got a different BINGO card! How about another test run-third time’s the charm after all!

Nice work. Stay tuned for the next part of this game development series where we will create the mechanism to call out different BINGO “balls”.

Also, here’s a Word Doc file with our code so far (WordPress won’t let me upload PY files)-

Thanks for reading,

Michael

Let’s Make Our Own Python Game Part One: Getting Started (Python Lesson 49)


Hello loyal readers,

I hope you all had a wonderful and relaxing holiday with those you love-I know I sure did. I did promise you all new and exciting programming content in 2024, so let’s get started!

My first post of 2024 (and first several posts of 2024 for that matter) will do something I haven’t done on this blog in its nearly 6-year existence-game development! Yup, that’s right, we’ll learn how to make a simple game using Python’s pygame package. And yes, this game will include graphics (so we’re making something way cooler than a simple text-based blackjack game or something like that).

Let’s begin!

Setting ourselves up

Before we even start to design our game, let’s install the pygame package using the following line either on our IDE or command prompt-pip install pygame.

Next, let’s open our IDE. You could technically use Jupyter notebook to start creating the game, but for something like game creation that utilizes graphics (and likely lots of code) I’d suggest an IDE like Spyder.

Now, where do we begin?

To start, here are the first three lines of code we should include in our script:

import pygame
from pygame.locals import *
import sys

What game will I teach you how to program? Well, in this series of posts, we’ll learn to make our own BINGO clone.

Why BINGO? Well, compared to many other games I could possibly teach you to program, BINGO seemed like a relatively easy first game to learn to develop as it doesn’t involve multiple levels, much scoring, health tracking, or final bosses (though we could certainly explore games that involve these concepts later on).

Let’s start coding!

First off, since we are programming a BINGO game, we’ll need to draw squares. 30 of them, to be precise, as simple BINGO games utilize a 5×5 card along with five squares at the top that contain the letters B, I, N, G, and O.

Seems simple enough to understand right? Let’s see how we code it!

class Square(pygame.sprite.Sprite):

    def __init__(self):
        super(Square, self).__init__()
        self.surf = pygame.Surface((75, 75))
        self.surf.fill((50, 205, 50))
        self.rect = self.surf.get_rect()

pygame.init()

screen = pygame.display.set_mode((800, 600))

square1 = Square()

First of all, to draw the BINGO squares, we’ll first need to create a Square class that inherits from pygame.sprite.Sprite, like so-class Square(pygame.sprite.Sprite).

What is the Sprite class in pygame? For those who are familiar with fantasy works (e.g. Shrek, Lord of the Rings), a sprite is a legendary mythical creature such as a pixie, fairy, or elf (among others). In pygame, a sprite simply represents a 2D image or animation that is displayed on the game screen-like the squares we’ll need to draw for our BINGO board.

The next line-the one that begins with super-allows the Square class to inherit all of the methods and capabilities of the Sprite class, which is necessary if you want the squares drawn on the game screen.

The following three lines set the drawing surface (and in turn, the size) of the square, set the color of each square on the gameboard using RGB color coding (yes, you can make the squares different colors, but I’m keeping it simple and coloring all the squares lime green), and get the rectangular area of each square, respectively.

The next two lines initialize pygame-using the line pygame.init()-and set the size of the screen (in pixels). In this case, we’ll use an 800×600 pixel screen.

The last line creates a square object for us to draw. The interesting thing to note here is that even though we’ll ultimately need to draw 30 squares for our BINGO board, we only need one square object, since we can draw that same object 30 different times in 30 different places.

Even with all this code, we’ll still need to actually draw the squares onto our game screen-this code just ensures that we have the ability to do just that (it doesn’t actually take care of the graphics drawing).

Let’s run the game!

Now that we have created the squares for our BINGO game and imported the necessary packages, let’s figure out how to get our game running! Check out this chunk of code that helps us do just that!

gameOn = True

while gameOn:
    for event in pygame.event.get():
        if event.type == KEYDOWN:
            if event.key == K_BACKSPACE:
                gameOn = False
        elif event.type == QUIT:
            gameOn = False

First, we have our boolean variable gameOn, which indicates whether or not our game is currently running (True if it is, False if it isn’t).

The while loop that follows is a great example of event handling (I think this is the first time I’ve mentioned it on this blog), which is the process of defining what your program should do in various scenarios, or events. This while loop will keep running as long as gameOn is True (in other words, as long as the game is running).

You’ll notice two event types that will shut the game down, KEYDOWN and QUIT. In the case of KEYDOWN, the game will shut down only if the backspace key is pressed. In the case of the QUIT event, the game will quit if the user presses the X close button on the window. However, something to note about the QUIT event is that pressing X alone doesn’t quit the game-I know because I tried using the X button to quit the game and ran into an unresponsive window that I ended up force-quitting. Don’t worry, I’ll explain how to quit the game properly later in this post.

Drawing the squares

Now that we have a means to keep our game running (or close it if we so choose), let’s now draw the squares onto the gameboard. Here’s the code to do so:

screen.blit(square1.surf, (40, 75))

screen.blit(square1.surf, (115, 75))
screen.blit(square1.surf, (180, 75))
screen.blit(square1.surf, (255, 75))
screen.blit(square1.surf, (330, 75))
screen.blit(square1.surf, (40, 150))
screen.blit(square1.surf, (115, 150))
screen.blit(square1.surf, (180, 150))
screen.blit(square1.surf, (255, 150))
screen.blit(square1.surf, (330, 150))
screen.blit(square1.surf, (40, 225))
screen.blit(square1.surf, (115, 225))
screen.blit(square1.surf, (180, 225))
screen.blit(square1.surf, (255, 225))
screen.blit(square1.surf, (330, 225))
screen.blit(square1.surf, (40, 300))
screen.blit(square1.surf, (115, 300))
screen.blit(square1.surf, (180, 300))
screen.blit(square1.surf, (255, 300))
screen.blit(square1.surf, (330, 300))
screen.blit(square1.surf, (40, 375))
screen.blit(square1.surf, (115, 375))
screen.blit(square1.surf, (180, 375))
screen.blit(square1.surf, (255, 375))
screen.blit(square1.surf, (330, 375))
screen.blit(square1.surf, (40, 450))
screen.blit(square1.surf, (115, 450))
screen.blit(square1.surf, (180, 450))
screen.blit(square1.surf, (255, 450))
screen.blit(square1.surf, (330, 450))

pygame.display.flip()

Even though you only need to create one square object, you’ll need to draw that object 30 different times, since the BINGO board consists of 30 squares drawn in a 5-column by 6-row grid. To draw the squares, you’ll use the following method-screen.blit(square1.surf, (x-coordinate, y-coordinate)). The screen.blit(...) method draws the square onto the screen, and it takes two parameters-square1.surf, which is the surface of the square, and a two-integer tuple stating the coordinates where you want the square placed (x-coordinate first, then y-coordinate).

After the 30 instances of the screen.blit() method, the pygame.display.flip() method is called, which simply updates the game screen to display the 30 squares. You might’ve thought the screen.blit() method already accomplishes this, but this method simply draws the squares while the pygame.display.flip() method updates the game screen to ensure the squares are present.

Quitting the game

As I mentioned earlier in this post, I’ll show you how to properly quit the game. Here are the two lines of code needed to do so:

pygame.quit()

sys.exit()

To properly end the pygame session, you’ll need to include these two commands in your code. Why do you need them both? Wouldn’t one command or the other work?

You need both commands because the pygame.quit() method simply shuts down the active pygame modules, while the sys.exit() method properly terminates the Python script (and with it, the game window).

And now, let’s see our work!

Now that we’ve got the basic game outline set up, let’s see our work by running our script!

As you see here, we simply have one giant lime-green square. However, that lime green square consists of the 30 squares we drew earlier-the squares are simply drawn on top of each other, hence why the output looks like one big square. Don’t worry, in the next post we’ll make this square look more like a BINGO board!

A small coding improvement

As you noticed earlier in this post, I was calling the screen.blit() method 30 times while drawing the squares. However, there is a much better way to accomplish this:

x = [40, 115, 180, 255, 330]
y = [75, 150, 225, 300, 375, 450]

for xcoord in x:
    for ycoord in y:
        screen.blit(square1.surf, (xcoord, ycoord))
In this example, I placed all possible x and y coordinates into arrays and drew each square by looping through the values in both arrays. Here’s the output of this improved approach:

As you see, not only did we improve the process for drawing the squares onto the game screen, but we also got the same result we did when we were calling the screen.blit() method 30 times.
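For what it’s worth, the standard library can also generate those 30 coordinate pairs for you-itertools.product yields every (x, y) combination, which is exactly what the nested loops do (a sketch, shown here without the pygame drawing call):

```python
from itertools import product

x = [40, 115, 180, 255, 330]
y = [75, 150, 225, 300, 375, 450]

# Every (xcoord, ycoord) pairing, equivalent to the nested for loops
coords = list(product(x, y))

print(len(coords))  # 30 squares
print(coords[0])    # (40, 75)
```

In the game you could then write `for xcoord, ycoord in product(x, y):` and keep the loop body to a single level of indentation.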

For your reference, the code

Just in case you want it, here’s the code we used for our game development in this post (and we will certainly change it throughout this series of posts). I’m copying the code here since WordPress won’t let me upload .PY files:

import pygame
from pygame.locals import *
import sys

class Square(pygame.sprite.Sprite):
    def __init__(self):
        super(Square, self).__init__()
        self.surf = pygame.Surface((75, 75))
        self.surf.fill((50, 205, 50))
        self.rect = self.surf.get_rect()

pygame.init()

screen = pygame.display.set_mode((800, 600))

square1 = Square()

gameOn = True

while gameOn:
    for event in pygame.event.get():
        if event.type == KEYDOWN:
            if event.key == K_BACKSPACE:
                gameOn = False
        # Check for QUIT event
        elif event.type == QUIT:
            gameOn = False

    x = [40, 115, 180, 255, 330]
    y = [75, 150, 225, 300, 375, 450]

    for xcoord in x:
        for ycoord in y:
            screen.blit(square1.surf, (xcoord, ycoord))

    # Update the display using flip
    pygame.display.flip()

pygame.quit()
sys.exit()

Thanks for reading, and I look forward to having you code along with me in 2024!

Michael

And Now Let’s Create Some AI Art (Midjourney Version)(AI pt. 15)


Hello everybody,

Michael here, and for my final post of 2023, I wanted to try something a little different! Usually on this blog, I like to only use tools that are open-source (aka free)-this way, all of you can follow along with my tutorials.

However, for this post I wanted to try something different-Midjourney! Just like DALLE-2, Midjourney is an AI text-to-art generator (you may recall that I explored DALLE-2 in the post And Now Let’s Create Some AI Art! (AI pt.6)). However, unlike DALLE-2, Midjourney cannot be used for free. But since I wanted to fool around with Midjourney, I thought I could do a post on it for all of my loyal readers!

Let’s begin!

Five fast facts about Midjourney

In my intro paragraph, I did mention that Midjourney was an AI text-to-art generator. Here are five more fast facts about Midjourney:

  • It works in a very similar manner to DALLE-2 in the sense that both tools are text-to-art generators.
  • However, unlike DALLE-2, Midjourney wasn’t developed by OpenAI (it was created by Midjourney Labs).
  • As of this writing, Midjourney is in open beta mode, as has been the case since its launch in July 2022.
  • Midjourney utilizes Discord as an interface to generate its AI art.
  • As long as you’re on a paid subscription, you can generate as many images as you want (unlike DALLE-2, where the free trial limits you to a certain number of image generations a month).

Setting up Midjourney

Before we start playing around with the magic of Midjourney, you’ll need two things to set it up:

  • A Midjourney subscription
  • A Discord account (I’ll explain this later)

If you need assistance setting up Midjourney, please follow the steps in this link-https://docs.midjourney.com/docs/quick-start.

Now, why would you need a Discord account? See, even though Midjourney is separate from Discord, I mentioned earlier that Midjourney currently uses Discord as its interface to generate AI art via a Midjourney Discord bot (which makes Midjourney a bit more convoluted to set up than DALLE-2).

And Now Let’s Make Some Midjourney AI Art

Once you’ve gotten Midjourney set up, let’s get started creating our very own AI art!

First, let’s open up our Discord Midjourney bot:

When you open up the bot, you’ll see the bot’s homepage. To start creating Midjourney art, go to any of the channels that start with newbies.

As you can see here, I am currently in the newbies-122 channel, which is where I can start generating AI art.

To begin with the AI art generation, I will first run the /imagine command and then type the prompt A Christmas card featuring Santa Claus and his reindeer saying "Happy Holidays To You" that is drawn in pencil sketch with lots of color. Let’s see what Midjourney spits out!

As you can see, after a few minutes, Midjourney will (just like DALLE-2) spit out four different images based on the prompt you submitted. Also, if you look at each image closely, you’ll see that Midjourney, like DALLE-2, hasn’t gotten the hang of generating coherent text (but it sure is good at generating gibberish).

However, you may be wondering what all of these buttons below the generated images do. Allow me to explain:

  • The U1-U4 buttons allow you to output only one of the four generated images. U1 represents the image in the upper left hand corner, U2 represents the image in the upper right hand corner, U3 represents the image in the lower left hand corner and U4 represents the image in the lower right hand corner.
  • The V1-V4 buttons also represent the four images (V1=upper left hand image, V2=upper right hand image, V3=lower left hand image, V4=lower right hand image) but unlike the U1-U4 buttons, these buttons allow you to modify the prompt on an individual image-or as Midjourney calls it, “remixing” each image.
  • The refresh button allows you to generate four different images with the same prompt.
  • A note about these buttons: you can also use them for other users’ images, not just your own (I mean, it is fun to see other users’ prompts).

A little note about the Midjourney interface

If you’re playing around with Midjourney, you’ll notice that you’re far from the only one generating AI art. In fact, there are certainly going to be thousands of users at any given time trying to generate their own AI art. While I think it’s neat that anyone can log onto Midjourney at any time, it also makes the user interface less user-friendly: if you want to find your generated art, you’ll need to do quite a bit of scrolling!

This image was taken at 11:32AM, so this should give you an idea as to how many people are generating Midjourney art at once!

Luckily, if you want to easily find your Midjourney art, head on over to https://www.midjourney.com/explore, log in to your Midjourney account, and navigate to My Images:

In this interface, I have all of the images I generated through Midjourney throughout the duration of my subscription. This way, in case I want to find any image I generated on Midjourney, all I need to do is go here.

  • As you can see from the screenshot above, the functionality to /imagine new prompts from this interface has yet to be implemented as of this writing (December 2023).

You can even click on an image to see its prompt in case you want to use and/or modify that prompt for a future image generation:

As you see here, this image was generated as part of the Christmas card prompt I wrote earlier. Even though I asked for a Happy Holidays To You card featuring Santa and his reindeer, I instead got whatever this is. Santa has one reindeer in this image (and it’s wearing what I think is a necklace of leaves for some reason). The reindeer has six legs. There is no text in this image. The other creatures in this image look like two gerbils and two gremlin-things (it would be a stretch to call them elves). At least there’s something resembling a Christmas village in the background.

Let’s try some other scenarios

Next up, let’s try some other Midjourney scenarios! First up, let’s see Midjourney’s capabilities for generating realistic looking photographs.

Since it’s the holidays, let’s try this prompt next-/imagine A Nikon photo of the Avengers at a Christmas party at Avengers tower. Iron Man, Black Widow, Hulk, Captain America, Thor, and Hawkeye are there. 16:9 aspect ratio

Yes, Midjourney can even set aspect ratios and camera model-styles for the AI images it generates. Let’s take a look at the images we got from this prompt:

Throughout these four images, the only character that Midjourney seems to get right each time is Iron Man. Marvel fans like myself will likely recognize several mistakes throughout these four images:

All of this just goes to show you that Midjourney, like DALLE-2, doesn’t have the best attention to detail when generating AI art.

The images I got when I used this prompt-/imagine A Nikon photo of Kang The Conqueror and Thanos at a Christmas Party, 16:9 aspect ratio-weren’t much better. Here’s one such image:

I mean, at least Midjourney made Thanos purple, but Kang the Conqueror is another story entirely (unless he happens to be one of Kang’s many variants).

Let’s try generating AI people!

So now that we’ve explored some AI art-generation scenarios, let’s try something different! As you may recall from the post And Now Let’s Create Some AI Art! (AI pt.6), DALLE-2 wasn’t the best when it came to generating images of real people. However, let’s see if Midjourney is up for the task!

Here’s the prompt I’ll use-/imagine Stephen Curry drawn in Simpsons style

And here are the generated images:

Not gonna lie, I’m surprised that not only did Midjourney generate a pretty accurate-looking Stephen Curry, but also that it generated the correct Golden State Warriors logo and rendered the text Golden State Warriors correctly. However, Midjourney didn’t really replicate the Simpsons art style, and in the third image, it got Steph’s jersey number wrong (he wears a #30 jersey, not #35-which is the number Kevin Durant wore during his stint with the Warriors).

Now, just as I did with my DALLE-2 experiment, I’ll try to generate an image of a female public figure and see where that goes. Here’s the prompt I used-/imagine A drawing of Margot Robbie in colored pencil sketch, and here’s the output:

Not gonna lie, I’m amazed at how much Midjourney’s generated images actually resemble the real Margot Robbie. Unlike DALLE-2, which didn’t allow me to generate images of female public figures for some reason, Midjourney does allow for these types of image generations and does a scarily accurate job of it too.

And now, let’s go to an AI-generated place!

So, we’ve tested how well Midjourney can replicate pop culture and people, but let’s see how well it knows places. Here’s the prompt I’ll use-/imagine A neon rendering of Bicentennial Capitol Mall State Park in Nashville, TN-and here’s the output:

From these images, I see that Midjourney at least got the Capitol part right (Bicentennial Capitol Mall State Park does have a view of the Tennessee state capitol building from the park), but the four images generated look like they could be a part of Downtown DC, not Downtown Nashville. At least Midjourney included a park in each of these four images.

If you’re wondering what Bicentennial Capitol Mall State Park looks like, here’s a picture of it (more specifically, the amphitheater in all its glory):

This is just a small sampling of the things Midjourney is capable of, and although it doesn’t always have the best attention to detail (or greatest text-generation abilities), it is still an amazing AI tool-though it can never, ever, ever replace human creativity or talent (or your friendly neighborhood coding tutorial writer).

With all that said, thank you all for another wonderful year of programming and development (and I hope you learned something along the way). Have a happy and festive holiday season to you all, and I’ll see you in 2024 for another amazing year of development and learning! Keep calm and code on!

AI-generated Santa wishes you a happy holiday season!

Michael!

Python Lesson 48: Image Borders (AI pt. 14)

Advertisements

Hello everybody,

Michael here, and in today’s post, we’ll explore image border creation using OpenCV!

Let’s create some image borders!

Before we start creating image borders, here’s the image we’ll be using for this lesson:

This is an image of Fry and Bender pumpkins (two main characters from the animated sitcom Futurama, if you weren’t familiar) that I created with a few Sharpie markers this past Halloween (creative, I know).

Now, let’s read in our image to the IDE in RGB form:

import cv2

import matplotlib.pyplot as plt

pumpkins = cv2.imread(r'C:\Users\mof39\Downloads\20231031_231020.jpg', cv2.IMREAD_COLOR)
pumpkins = cv2.cvtColor(pumpkins, cv2.COLOR_BGR2RGB)

Now that we’ve done that, let’s add a simple yellow border around our image:

pumpkinsNormalized = pumpkins / 255.0


pumpkinsWithYellowBorder = cv2.copyMakeBorder(pumpkinsNormalized, 30, 30, 30, 30, cv2.BORDER_CONSTANT, value=[1, 1, 0])

plt.figure(figsize=(10, 10))
plt.imshow(pumpkinsWithYellowBorder)
plt.show()

To add a simple yellow rectangular border around our image, we’ll need to use the cv2.copyMakeBorder() method and include the following parameters:

  • The image where you will add a border
  • Four integers indicating the border’s thickness (in pixels) on the top, bottom, left, and right sides of the border, respectively
  • One of five possible OpenCV image border modes-in this case, I used the border mode cv2.BORDER_CONSTANT, which adds a simple colored rectangular border to the image.
  • A three-integer array indicating the border color in RGB notation (more on that shortly)
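To build intuition for what cv2.BORDER_CONSTANT actually does, here’s a rough numpy equivalent (a sketch for illustration, not OpenCV’s real implementation) that pads a tiny “image” with a constant border using np.pad:

```python
import numpy as np

# Tiny 2x2 "image" with 3 color channels, values in the 0-1 range
# like the normalized image above.
img = np.ones((2, 2, 3))

# Pad 1 pixel on the top/bottom and left/right, but not along the
# channel axis. constant_values=0 gives a black border; filling each
# channel with a different constant is what cv2.copyMakeBorder's
# value parameter does.
bordered = np.pad(img, ((1, 1), (1, 1), (0, 0)),
                  mode="constant", constant_values=0)

print(bordered.shape)  # (4, 4, 3): each side grew by 1 pixel
```

The four pad widths here play the same role as the top/bottom/left/right thickness parameters of cv2.copyMakeBorder().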

Now, you may see something unfamiliar above-pumpkinsNormalized. This indicates that I have normalized the image. What does that mean?

In this case, normalizing an image means scaling all the pixel values-and in turn, colors-from the 0-255 range down to the 0-1 range. This matters because cv2.copyMakeBorder() keeps the border color on the same scale as the image’s pixel values: on the original 0-255 (uint8) image, a color like [1, 1, 0] would render as nearly black, while on the normalized 0-1 image, [1, 1, 0] means full red, full green, and no blue-in other words, yellow.

  • It likely goes without saying, but if you want the border on the RGB image, please remember to add the border to the normalized image, not the initial image you read into the IDE (even if you did convert it to RGB colorscale).

Now, as for the RGB color array, let’s dive into that. The array works the same way as other forms of RGB notation (e.g. RGB(210, 12, 12) represents the intensities of the red, green and blue colors) but with one key difference-the values range from 0 to 1 (decimals included). The values still represent the intensities of red, green and blue, respectively, but they read more like percentages of each color than like integers. In this example, since I wanted a simple yellow border on the image, I used the array [1, 1, 0], which is the same as RGB(255, 255, 0), because in both notations creating yellow requires full (or 100%) red and green but no blue.
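The mapping between the two notations is just division by 255; a tiny helper like this (hypothetical, not from the post) makes the equivalence explicit:

```python
def rgb255_to_unit(rgb):
    """Convert an RGB(0-255) triple to the 0-1 floats used for a
    normalized (float) image's border color."""
    return tuple(round(c / 255.0, 4) for c in rgb)

print(rgb255_to_unit((255, 255, 0)))  # yellow -> (1.0, 1.0, 0.0)
```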

Other border modes

Now, one thing to keep in mind with OpenCV’s image border modes is that, as of December 2023, there is no built-in way to make fun dotted/dashed/dotted-and-dashed borders (though of course, that could change).

However, aside from the simple cv2.BORDER_CONSTANT mode that creates a simple rectangular border around the image, there are four other OpenCV image border modes. Let’s explore one of them-cv2.BORDER_REFLECT, which adds a reflective border to the image. To change the border from a simple colored border to a reflective one, let’s change this one line of code:

pumpkinsWithReflectiveBorder = cv2.copyMakeBorder(pumpkinsNormalized, 40, 40, 40, 40, cv2.BORDER_REFLECT)

To get a reflective border, all I had to do was change this one line: I changed the border thickness (30 to 40 pixels), removed the color (this border mode doesn’t take one), and switched the border mode to cv2.BORDER_REFLECT. And, well, check out the image with a reflective border:

In this image, there are a few spots where the reflective border is hard to find, but it’s there (and quite prominent on the bottom side of the image, where if you look close enough, you can see the reflections of the pumpkins).
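For intuition, np.pad can mimic this mode too: its "symmetric" mode mirrors pixels including the edge pixel, which is the same mirroring scheme cv2.BORDER_REFLECT uses (a one-dimensional sketch, not OpenCV itself):

```python
import numpy as np

row = np.array([1, 2, 3, 4])
# cv2.BORDER_REFLECT mirrors including the edge pixel:
#   2 1 | 1 2 3 4 | 4 3
reflected = np.pad(row, 2, mode="symmetric")
print(reflected)  # [2 1 1 2 3 4 4 3]
```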

Thank you,

Michael

Python Lesson 47: Image Rotation (AI pt. 13)

Advertisements

Hello loyal readers,

Michael here, and in this post, we’ll cover another fun OpenCV topic-image rotation!

Let’s rotate an image!

First off, let’s figure out how to rotate images with OpenCV. Here’s the image we’ll be working with in this example:

This is an image of the Jumbotron at First Horizon Park in Nashville, TN, home ballpark of the Nashville Sounds (Minor League Baseball affiliate of the Milwaukee Brewers).

Now, how do we rotate this image? First, let’s read in our image in RGB colorscale:

import cv2
import matplotlib.pyplot as plt

ballpark=cv2.imread(r'C:\Users\mof39\Downloads\20230924_140902.jpg', cv2.IMREAD_COLOR)
ballpark=cv2.cvtColor(ballpark, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10, 10))
plt.imshow(ballpark)

Now, how do we rotate this image? Let’s start by analyzing a 90-degree clockwise rotation:

clockwiseBallpark = cv2.rotate(ballpark, cv2.ROTATE_90_CLOCKWISE)
plt.figure(figsize=(10, 10))
plt.imshow(clockwiseBallpark)

All it takes to rotate an image in OpenCV is the cv2.rotate() method and two parameters-the image you wish to rotate and one of the following OpenCV rotation codes (more on these soon):

  • cv2.ROTATE_90_CLOCKWISE (rotates image 90 degrees clockwise)
  • cv2.ROTATE_180 (rotates image 180 degrees clockwise)
  • cv2.ROTATE_90_COUNTERCLOCKWISE (rotates image 270 degrees clockwise-or 90 degrees counterclockwise)
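If you want to check what these codes do without loading an image, numpy’s rot90 behaves the same way on any array (note that rot90 counts positive rotations as counterclockwise, so a clockwise turn is k=-1). A quick sketch:

```python
import numpy as np

img = np.array([[1, 2],
                [3, 4]])

clockwise = np.rot90(img, k=-1)   # like cv2.ROTATE_90_CLOCKWISE
upside_down = np.rot90(img, k=2)  # like cv2.ROTATE_180

print(clockwise)    # [[3 1], [4 2]]
print(upside_down)  # [[4 3], [2 1]]
```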

Let’s analyze the image rotation with the other two OpenCV rotation codes-first off, the ballpark image rotated 180 degrees clockwise:

clockwiseBallpark = cv2.rotate(ballpark, cv2.ROTATE_180)
plt.figure(figsize=(10, 10))
plt.imshow(clockwiseBallpark)

Alright, pretty impressive. It’s an upside down Jumbotron!

Now to rotate the image 270 degrees clockwise:

clockwiseBallpark = cv2.rotate(ballpark, cv2.ROTATE_90_COUNTERCLOCKWISE)
plt.figure(figsize=(10, 10))
plt.imshow(clockwiseBallpark)

Well well, it’s the amazing rotating Jumbotron!

And yes, in case you’re wondering, the rotation code cv2.ROTATE_90_COUNTERCLOCKWISE is the correct rotation code for a 270 degree clockwise rotation because a 90 degree counterclockwise rotation is the same thing as a 270 degree clockwise rotation.

Now, I know I just discussed three possible ways to rotate an image. However, what if you wanted to rotate an image by an angle that’s not 90, 180, or 270 degrees? Well, if you try to do so with the cv2.rotate() method, you’ll get an error:

clockwiseBallpark = cv2.rotate(ballpark, 111)
plt.figure(figsize=(10, 10))
plt.imshow(clockwiseBallpark)

TypeError: Image data of dtype object cannot be converted to float

When I tried to rotate this image 111 degrees clockwise, I got an error because the cv2.rotate() method will only accept one of the three rotation codes I mentioned above.

Let’s rotate an image (in any angle)!

However, if you want more freedom over how you rotate your images in OpenCV, use the cv2.getRotationMatrix2D() method. Here’s an example as to how to use it:

height, width = ballpark.shape[:2]
center = (width/2, height/2)
rotationMatrix = cv2.getRotationMatrix2D(center,55,1)
rotatedBallpark = cv2.warpAffine(ballpark, rotationMatrix, (width, height)) 
plt.figure(figsize=(10, 10))
plt.imshow(rotatedBallpark)

To rotate an image in OpenCV by an angle that’s not a multiple of 90 degrees (90, 180, 270), you’ll need to use both the cv2.getRotationMatrix2D() and the cv2.warpAffine() methods. The former computes the rotation matrix-a 2×3 matrix encoding the angle, center, and scale of the rotation you wish to perform. The latter method actually rotates the image.

Since both of these are new methods for us, let’s dive into them a little further! First off, let’s explore the parameters of the cv2.getRotationMatrix2D() method:

  • center-this parameter indicates the center of the image, which is necessary for rotations that aren’t multiples of 90 degrees. To get the center, first retrieve the image’s shape and, from there, the height and width. Once you have them, create a 2-element center tuple where you divide the image’s width and height by 2. Note that the width must come before the height here: OpenCV points are (x, y) pairs, so the center is (width/2, height/2).
  • angle-the angle you wish to use for the image rotation. In this example, I used 55. Note that cv2.getRotationMatrix2D() treats positive angles as counterclockwise rotations, so 55 rotates the image 55 degrees counterclockwise; to rotate it 55 degrees clockwise instead, you’d use -55 as the value for this parameter.
  • scale-This is a number that represents the factor you wish to use to zoom the rotated image. In this example, I used 1 as the value for this parameter, indicating that I don’t want to zoom the rotated image at all. A value greater than 1 zooms in, and a value less than 1 zooms out.
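Under the hood, cv2.getRotationMatrix2D() builds its 2×3 matrix from exactly these three parameters; here’s the documented formula reproduced in numpy (a sketch for intuition, not a replacement for the OpenCV call):

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale):
    """Reproduce cv2.getRotationMatrix2D's output: a 2x3 affine matrix
    rotating by angle_deg (positive = counterclockwise) about center."""
    cx, cy = center
    a = scale * np.cos(np.radians(angle_deg))
    b = scale * np.sin(np.radians(angle_deg))
    # The third column translates the image so the rotation pivots
    # around (cx, cy) instead of the top-left corner.
    return np.array([[a,  b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

# With angle 0 and scale 1, this is the identity transform.
print(rotation_matrix_2d((100, 50), 0, 1))
```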

Next, let’s explore the parameters of the cv2.warpAffine() method!

  • src-The image you wish to rotate (in this example, I used the base ballpark image)
  • M-The rotation matrix you just created for the image using the cv2.getRotationMatrix2D() method (ideally you would’ve stored the rotation matrix in a variable).
  • dsize-A 2-element tuple indicating the size of the output image as (width, height); in this example, I passed the base image’s width and height (in that order) to keep the rotated image the same size as the original.

Now for some extra notes:

  • Why is the rotation method called warpAffine()? This is because the rotation we’re performing on the image is also known as an affine transformation-a transformation (a rotation, in this case) that keeps straight lines straight and parallel lines parallel.
  • You’ll notice that after rotating the image using the cv2.warpAffine() method, the entire image isn’t visible on the plot. That’s because anything that rotates outside the output canvas (whose size is set by dsize) gets clipped; passing a larger dsize (and shifting the matrix’s translation accordingly) keeps the whole rotated image visible.
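On that clipping note: one common workaround (standard affine geometry, not from the post) is to compute the bounding box of the rotated image and use that as dsize. The box size comes from the absolute values of the angle’s cosine and sine:

```python
import math

def rotated_canvas_size(width, height, angle_deg):
    """Size of the smallest canvas that fits a (width x height) image
    rotated by angle_deg, so cv2.warpAffine wouldn't clip the corners."""
    c = abs(math.cos(math.radians(angle_deg)))
    s = abs(math.sin(math.radians(angle_deg)))
    new_w = int(round(width * c + height * s))
    new_h = int(round(width * s + height * c))
    return new_w, new_h

print(rotated_canvas_size(4000, 3000, 90))  # (3000, 4000)
```

To use it, you’d pass this tuple as dsize and add (new_w - width) / 2 and (new_h - height) / 2 to the matrix’s two translation entries so the image stays centered on the bigger canvas.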

Thanks for reading, and for my readers in the US, have a wonderful Thanksgiving! For my readers elsewhere on the globe, have a wonderful holiday season (and no, this won’t be my last post for 2023)!

Python Lesson 46: Image Blurring (AI pt. 12)

Advertisements

Hello everybody,

Michael here, and in this post, we’ll explore image blurring! Image blurring is a pretty self-explanatory process since the whole point of image blurring is to make the image, well, blurry. This process has many uses, such as blurring the background on your work video calls (and yes, I do that all the time during work video calls).

Intro to image blurring

Now that we know a little bit about image blurring, let’s explore it with code. Here’s the image that we’ll be using:

The photo above is of Stafford Park, a lovely municipal park in Miami Springs, FL.

Unlike image eroding, image blurring is aptly named, since the aim of this process is to, well, blur images. How can we accomplish this through OpenCV?

Before we get into the fun image-blurring code, let’s discuss the three main types of image blurring that are possible with OpenCV:

  • Gaussian blur-this process softens out any sharp edges in the image
  • Median blur-this process helps remove image noise* by changing pixel colors wherever necessary
  • Bilateral blur-this process makes the central part of the image clearer while making any non-central part of the image fuzzier

*For those unfamiliar with image processing, image noise is (usually unwanted) random brightness or color deviations that appear in an image. Median blurring assists you with removing image noise.

Now that we know the three different types of image blurring, let’s see them in action in code.

Gaussian blur

Before we start to blur the image, let’s read in the image in RGB colorscale:

import cv2
import matplotlib.pyplot as plt

park=cv2.imread(r'C:\Users\mof39\OneDrive\Documents\20230629_142648.jpg', cv2.IMREAD_COLOR)
park=cv2.cvtColor(park, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10, 10))
plt.imshow(park)

Next, let’s perform a Gaussian blur of the image:

gaussianPark = cv2.GaussianBlur(park, (7, 7), sigmaX=10, sigmaY=10)
plt.figure(figsize=(10,10))
plt.imshow(gaussianPark)

Notice anything different about this image? The sharp corners in this photo (such as the sign lettering) have been smoothed out, which is the point of Gaussian blurring (to smooth out rough edges in an image).

Now, what parameters does the cv2.GaussianBlur() method take?

  • The image you wish to blur (park in this case)
  • A 2-integer tuple indicating the size of the kernel you wish to use for the blurring process-yes, this is similar to the kernels we used for image erosion in the previous post Python Lesson 45: Image Resizing and Eroding (AI pt. 11) (we’re using a 7-by-7 kernel here).
  • Two numbers that represent the sigmaX and sigmaY of the Gaussian blur you wish to perform. What are sigmaX and sigmaY? They’re the standard deviations of the Gaussian kernel-sigmaX being the factor for horizontal blurring and sigmaY being the factor for vertical blurring. Note that in the Python bindings, sigmaY should be passed as a keyword argument (the fourth positional parameter of cv2.GaussianBlur() is dst, an optional output image).

A few things to keep in mind regarding the Gaussian blurring process:

  • Just as you did with image erosion, ensure that both dimensions of the blurring kernel are positive and odd-numbered integers (like the 7-by-7 kernel we used above).
  • sigmaY is optional; if you leave it out (or set it to 0), OpenCV reuses the sigmaX value. If sigmaX itself is 0, OpenCV computes both sigmas from the kernel size instead, which might not blur your picture the way you intended. Likewise, if you use a very high value for both sigmas, you’ll end up with a very, very blurry picture.
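To see what the kernel size and sigma actually trade off, here’s the standard 1-D Gaussian kernel formula in numpy (a sketch of what cv2.getGaussianKernel computes; a 2-D Gaussian blur applies this once horizontally and once vertically):

```python
import numpy as np

def gaussian_kernel_1d(ksize, sigma):
    """A normalized 1-D Gaussian kernel of odd length ksize."""
    xs = np.arange(ksize) - (ksize - 1) / 2.0   # offsets from center
    k = np.exp(-(xs ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()                           # weights sum to 1

k = gaussian_kernel_1d(7, 10)
print(k)  # with sigma much larger than ksize, weights are nearly uniform
```

This also shows why a huge sigma makes the picture very blurry: all seven neighbors get almost equal weight, so the blur approaches a plain average.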

Median blur

Since median blurring helps remove image noise, we’re going to be using this altered park image with a bunch of noise for our demo:
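The post shows the noisy image but not the code that produced the noisyPark variable used below; one common way to add salt-and-pepper noise (an assumption on my part, not the author’s code) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_salt_and_pepper(img, amount=0.05):
    """Set a random fraction of pixels to pure black or pure white."""
    noisy = img.copy()
    mask = rng.random(img.shape[:2])      # one random draw per pixel
    noisy[mask < amount / 2] = 0          # "pepper" pixels
    noisy[mask > 1 - amount / 2] = 255    # "salt" pixels
    return noisy

# Tiny gray stand-in image; with the real photo you'd pass the park array.
demo = np.full((50, 50, 3), 128, dtype=np.uint8)
noisy_demo = add_salt_and_pepper(demo)
```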

Next up, let’s explore the median blur with our noisyPark image:

medianPark = cv2.medianBlur(noisyPark, 5)
plt.figure(figsize=(10,10))
plt.imshow(medianPark)

As you can see, median blurring the noisyPark image cleared out a significant chunk of the image noise! But how does this function work? Let’s explore some of its parameters:

  • The image you wish to blur (noisyPark in this case)
  • A single integer indicating the size of the kernel you wish to use for the blurring process-yes, this is similar to the kernels we used for Gaussian blurring, but you only need a single integer instead of a 2-integer tuple since the kernel is square (we’re using a 5-by-5 kernel here). The integer must be odd and greater than 1 (the same odd-kernel rule from the Gaussian blur applies here).
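Why does taking a median wipe out salt-and-pepper noise? A lone outlier pixel can never be the median of its neighborhood. A pure-Python sketch on a tiny grayscale patch:

```python
import statistics

patch = [
    [10, 10, 10],
    [10, 255, 10],   # one "salt" pixel in the middle
    [10, 10, 10],
]

# The median filter replaces the center pixel with the median of
# its full 3x3 neighborhood (including itself).
neighborhood = [v for row in patch for v in row]
center_after = statistics.median(neighborhood)
print(center_after)  # 10 -- the outlier is gone
```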

Bilateral blur

Last but not least, let’s explore bilateral blurring! This time, let’s use the non-noise altered park image.

bilateralPark = cv2.bilateralFilter(park, 12, 120, 120) 
plt.figure(figsize=(10,10))
plt.imshow(bilateralPark)

Wow! As I mentioned earlier, the purpose of bilateral blurring is to make a central part of the image clearer while making other, non-central elements of the image blurrier. And boy, does that seem to be the case here, since the central element of the image (the park sign and all its lettering) really pops out while everything in the background looks a bit blurrier.

How does the cv2.bilateralFilter() function work its magic? Here’s how:

  • The image you wish to blur (park in this case)
  • The diameter (in pixels) of the region you wish to iterate through to blur-in this case, I chose a 12-pixel diameter as my “blurring region”. It works in a similar fashion to the kernels we used for our “erosion region” in the previous lesson.
  • The next two integers-both 120-are the sigmaColor and sigmaSpace parameters, respectively. sigmaColor controls how different two pixels’ colors can be and still get blended together, while sigmaSpace controls how far apart two pixels can be and still influence each other (think of the runners in the background). The higher both of these values are, the blurrier the background will be.
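The two sigmas weight every neighboring pixel by both spatial distance and color difference, which is why sharp edges survive the blur. A conceptual sketch of the per-neighbor weight (an illustration of the idea, not cv2’s exact internals):

```python
import math

def bilateral_weight(dist, color_diff, sigma_space, sigma_color):
    """Weight of one neighbor: near AND similar-colored pixels count a
    lot; far or differently-colored pixels count very little."""
    return (math.exp(-(dist ** 2) / (2 * sigma_space ** 2))
            * math.exp(-(color_diff ** 2) / (2 * sigma_color ** 2)))

close_similar = bilateral_weight(1, 5, 120, 120)      # e.g. sign-on-sign
close_different = bilateral_weight(1, 200, 120, 120)  # e.g. sign-on-sky
print(close_similar > close_different)  # True: edges are preserved
```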

Thanks for reading,

Michael

Python Lesson 45: Image Resizing and Eroding (AI pt. 11)

Advertisements

Hello everybody,

Michael here, and today’s lesson will be our first foray into image manipulation with OpenCV. We’ll learn two new techniques for image manipulation-resizing and eroding.

Let’s begin!

Resizing images

First off, let’s start this post by exploring how to resize images in OpenCV. Here is the image we’ll be working with throughout this post:

This is an image of a hawk on a soccer goal at Sevier Park (Nashville, TN), taken in August 2021.

Now, how could we possibly resize this image? Take a look at the code below (and yes, we’ll work with the RGB colorscale version of the image) to first upload and display the image:

import cv2
import matplotlib.pyplot as plt

hawk=cv2.imread(r'C:\Users\mof39\Downloads\20210807_172420.jpg', cv2.IMREAD_COLOR)
hawk=cv2.cvtColor(hawk, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(9, 9))
plt.imshow(hawk)

Before we start with resizing the image, let’s first get the image’s size (I’ll explain why this information will be helpful later):

print(hawk.shape)

(3000, 4000, 3)

To get the image’s size, print the [image variable].shape attribute. This returns a 3-integer tuple that indicates height, width and number of color channels; in the case of the hawk image, the image is 3000 px tall by 4000 px wide with 3 color channels (px stands for pixels-recall that computer image dimensions are measured in pixels-and the 3 channels are red, green and blue).

Now, how can we resize this image? Take a look at the code below:

smallerHawk = cv2.resize(hawk, (2000, 1500))
plt.imshow(smallerHawk)

As you can see here, we reduced the size of the hawk image by half without cropping out any of the image’s elements. How did we do that? We used the cv2.resize() method and passed in not only the hawk image but also a 2-integer tuple-(2000, 1500)-to indicate that I wanted to reduce the size of the hawk image by half.

Now, there’s something interesting about the (2000, 1500) tuple I want to point out. See, when we listed the shape of the image, the 3-integer tuple that was returned-(3000, 4000, 3)-listed the image’s height before its width. However, in the tuple we passed to the cv2.resize() method, the image’s width (well, half of the image’s width) comes before the image’s height (rather, half the height): cv2.resize() expects its size parameter as (width, height). Listing the width before the height allows you to properly resize the image the way you intended.
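A toy nearest-neighbor resize in numpy makes the two orderings concrete: the size argument is (width, height), while the array’s shape stays (height, width, channels). This is a sketch for intuition, not cv2’s actual interpolation:

```python
import numpy as np

def resize_nearest(img, dsize):
    """dsize is (width, height), like cv2.resize."""
    new_w, new_h = dsize
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h   # which source row each output row samples
    cols = np.arange(new_w) * w // new_w   # which source column each output column samples
    return img[rows][:, cols]

# Scaled-down stand-in for the 3000x4000 hawk photo.
img = np.zeros((300, 400, 3), dtype=np.uint8)
half = resize_nearest(img, (200, 150))
print(half.shape)  # (150, 200, 3): height 150, width 200
```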

Now, what happens when we make this image bigger? Take a look at the following code:

largerHawk = cv2.resize(hawk, (8000, 6000))
plt.figure(figsize=(9, 9))
plt.imshow(largerHawk)

Granted, the image may not appear larger at first, but that’s mostly due to how we’re plotting it in Matplotlib. If you look closely at the tick marks on each axis of the plot, you will see that the image size has indeed doubled to 8000 px wide by 6000 px tall.

Image erosion

The next image manipulation technique I want to discuss is image erosion. What does image erosion do?

The simple answer is that image erosion, well, erodes away the boundaries of an image’s foreground object, whatever that may be (if it helps, think of the OpenCV image erosion process like geological erosion, only for images). How the image erosion is accomplished is more complicated than a simple method like cv2.resize(); however, let’s explore the image erosion process in the code below:

import numpy as np
kernel = np.ones((5,5), np.uint8)
erodedHawk = cv2.erode(hawk, kernel)
plt.figure(figsize=(10,10))
plt.imshow(erodedHawk)

OK, so aside from the cv2.erode() method, we’re also creating a numpy array. Why is that?

Well, the numpy array kernel (aptly called kernel) is essentially a matrix of 1s like so:

[1 1 1 1 1
 1 1 1 1 1
 1 1 1 1 1
 1 1 1 1 1
 1 1 1 1 1]

Since we specified that our matrix is of size (5, 5), we get a 5-by-5 matrix of ones. Pretty simple right? Here are some other things to keep in mind when creating the kernel:

  • Make sure the kernel’s dimensions are both odd numbers to ensure the presence of a central point in the kernel.
  • Theoretically, you could create a kernel of 0s, but the kernel’s nonzero entries are what mark the neighborhood that erosion looks at, so a kernel of 1s is what you want for image erosion.
  • Ideally, you should also include np.uint8 as the second parameter in the kernel creation. For those who don’t know, np.uint8 stands for numpy unsigned 8-bit integer. The reason I suggest using this parameter is that doing so will store the matrix as 8-bit integers, which is beneficial for memory optimization in computer programs.

Now, how does this kernel help with image erosion? The 5-by-5 kernel slides over the image we wish to erode (hawk in this case). For a binary (black-and-white) image, the pixel under the kernel’s center is kept at 1 only if every pixel under the kernel is 1; otherwise, it is set to 0. For a grayscale or color image like ours, the same idea applies as a minimum filter: each pixel is replaced by the smallest value in the kernel’s neighborhood.
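For a binary image, that rule is easy to sketch in numpy-the output pixel is the minimum over the kernel window (border pixels are skipped here for simplicity, so the output shrinks by the kernel’s margin):

```python
import numpy as np

def erode_binary(img, k=3):
    """Minimum filter with a k-by-k all-ones kernel (no border handling)."""
    h, w = img.shape
    m = k // 2
    out = np.zeros((h - 2 * m, w - 2 * m), dtype=img.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # 1 only if every pixel under the kernel window is 1
            out[i, j] = img[i:i + k, j:j + k].min()
    return out

square = np.zeros((7, 7), dtype=np.uint8)
square[1:6, 1:6] = 1            # a 5x5 block of foreground ones
eroded = erode_binary(square)   # the block shrinks to 3x3
print(int(eroded.sum()))  # 9
```

The foreground block losing its outer ring of pixels is exactly the “eroding away the boundaries” described above.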

What does this all mean in practice? Notice how the leaves on the tree in this eroded image look slightly darker than the tree leaves in the original image. That’s because image erosion manipulates an image’s foreground (in this case, OpenCV perceives the tree as the foreground) by removing pixels from the foreground’s boundaries, thus making certain parts of the image appear slightly darker after erosion. The slightly darker tree leaves make the hawk stand out more than it did in the original image.

Thanks for reading,

Michael

Python Lesson 44: Image Color Spaces (AI pt. 10)

Advertisements

Hello everybody,

Michael here, and today’s post will cover how to understand color spaces in images.

Granted, I’ve previously discussed various colorscales you can find in computer programming in this post-Colors in Programming-but in this post, we’ll take a deeper dive into the use of colors in images.

But first, what is a color space?

Well, as the header above asks, what is a color space? In the context of images, a color space is a way to represent a certain color channel in an image.

Still confused? Let’s take the image we used in our first computer vision lesson (it can be found here: Python Lesson 42: Intro To Computer Vision Part One-Reading Images (AI pt. 8)). Assuming we’re analyzing the RGB image of Orange Boy, the color spaces simply represent the intensities (or spaces) of red, green and blue light in the image.

And now let’s analyze colorspaces in OpenCV

As the header says, let’s examine color spaces in OpenCV! Here’s the image we’ll be using for this tutorial:

This is a photo of autumn at Bicentennial Capitol Mall State Park in Nashville, TN, taken in October 2022.

Before we start exploring colorspaces, let’s read in this image to our IDE using the RGB colorscale (which means you should remember to convert the image’s default colorscale):

import cv2
import matplotlib.pyplot as plt
park=cv2.imread(r'C:\Users\mof39\Downloads\20221022_101648.jpg', cv2.IMREAD_COLOR)
park=cv2.cvtColor(park, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(18, 18))
plt.imshow(park)

Great! Now that we have our RGB image, let’s explore the different color channels!

First off, let’s examine this image’s red colorspace! How can we do that? Take a look at the code below:

R, G, B = cv2.split(park)
plt.figure(figsize=(18, 18))
plt.imshow(R, cmap='Reds')

plt.show()

In this example, I used the first line of code (the one with R, G, B) to split the image into its three color channels. Note the order: since we converted the image to RGB earlier, cv2.split() returns the channels as red, green and blue (on an unconverted BGR image, it would return blue, green and red).

Aside from the standard plt.figure() functions, I did make a slight modification to the plt.imshow() function. Instead of simply passing in the park image, I passed in the R variable so that we see the image’s red colorspace AND passed in the cmap parameter with a value of Reds to display the red colorspace in, well, red.

Now, how can we show the green and blue colorspaces? We’d use the same logic as we did for the red colorspace, except swap the R in the plt.imshow() function for G and B for the green and blue colorspaces and change the cmap values to Greens and Blues, respectively.
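For the curious, cv2.split is equivalent to plain numpy indexing along the channel axis, which makes the channel order explicit. A tiny sketch on a made-up 1×2 RGB array:

```python
import numpy as np

# A 1x2 RGB "image": one pure-red pixel, one pure-blue pixel.
img = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Channel 0 is red, 1 is green, 2 is blue (because the array is RGB).
R, G, B = img[:, :, 0], img[:, :, 1], img[:, :, 2]
print(R.tolist())  # [[255, 0]] -- only the red pixel lights up
```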

Here’s the image’s blue colorspace:

plt.figure(figsize=(18, 18))
plt.imshow(B, cmap='Blues')
plt.show()

And here’s the image’s green colorspace:

plt.figure(figsize=(18, 18))
plt.imshow(G, cmap='Greens')
plt.show()

As you can see from all three of these color-altered images, the sky, park lawn, and buildings in the background are certainly more colored than the trees, which look bright-white in all three color-altered images.

A little more on colorspace

Now that we’ve examined image colorspaces a bit, let’s see how we can find the most dominant color in an image! Take a look at the code below (which uses the park image):

from colorthief import ColorThief
colorthief = ColorThief(r'C:\Users\mof39\Downloads\20221022_101648.jpg')
dominantColor = colorthief.get_color(quality=1)
print(dominantColor)

(120, 94, 72)

Granted, you could realistically use a package like numpy to find the most dominant color in an image, but the colorthief module is a much more convenient (and more fun) approach.

  • In case you didn’t know, you’ll need to pip install the colorthief module.

After creating a ColorThief object (and passing in the image’s filepath on your computer), you’ll then need to use the get_color() method and pass in quality=1 as this method’s parameter. The quality parameter trades speed for accuracy-1 is the highest quality, so it gives the most accurate dominant color.

  • You can certainly use a variable to store the most dominant color like I did here (I used the dominantColor variable) but that’s completely optional.

Once you print the dominant color, you’ll notice you don’t get a color name, but rather a 3-integer tuple that represents the intensity of red, green and blue in the image (the tuple is based on the RGB colorscale). In this case, our most dominant color is RGB(120, 94, 72). What does that translate to?

In plain English, the most dominant color in this image is a very desaturated dark orange. If you take a look at the original RGB image, it makes sense not only because of the color of the park lawn but also due to all the trees and buildings in the image.
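To see why that tuple reads as a warm, orange-leaning color, here’s a quick (purely illustrative) sanity check comparing the three channels:

```python
# The dominant color returned earlier by colorthief.get_color(quality=1).
dominantColor = (120, 94, 72)

# Pair each channel with its name and find the strongest one.
channels = dict(zip(('red', 'green', 'blue'), dominantColor))
strongest = max(channels, key=channels.get)
print(strongest)  # red leads green and blue, hence the dark-orange reading
```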

What if you want to know not only the most dominant color in an image, but also its color palette? The colorthief module can help you there too! Here’s how:

palette = colorthief.get_palette(color_count=5)
print(palette)

[(120, 94, 72), (179, 192, 208), (130, 160, 197), (28, 31, 32), (182, 141, 108)]

Just as colorthief did with the most dominant color in an image, all colors are represented as 3-integer RGB tuples. The get_palette() function returns the top X colors used in the image-the X is represented by the value of the color_count parameter. In plain English, five colors used in this image include:

  • very desaturated dark orange (the most dominant color)
  • grayish blue
  • slightly desaturated blue
  • very dark almost black blue
  • slightly desaturated orange.

This feature is like imagining a painter’s palette in Python form-pretty neat, right? As you can see, our painter’s palette for the park image has a lot of blues and oranges.
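If you’d like to actually see that painter’s palette, here’s one optional way to sketch it with MATPLOTLIB (this isn’t part of the colorthief API-just a display trick using the palette we already got back):

```python
import matplotlib.pyplot as plt

# The palette returned earlier by colorthief.get_palette(color_count=5).
palette = [(120, 94, 72), (179, 192, 208), (130, 160, 197),
           (28, 31, 32), (182, 141, 108)]

# plt.imshow() expects floats between 0 and 1, so rescale each RGB tuple,
# then wrap the list in another list to form a 1-row image of swatches.
swatches = [[tuple(channel / 255 for channel in color) for color in palette]]

plt.figure(figsize=(6, 2))
plt.imshow(swatches)
plt.axis('off')
plt.show()
```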

Thanks for reading!

Michael

Python Lesson 43: Intro to Computer Vision Part Two-Writing & Saving Images (AI pt. 9)

Advertisements

Hello everybody,

Michael here, and in today’s post, we’ll continue our introduction to computer vision, but this time we’ll explore how to write images to a certain place on your computer using OpenCV.

Let’s begin!

Let’s write an image!

Before we begin, here’s the image we will be working with:

This is an image of Simba/Orange Boy and his sister Marbles (on Christmas Day 2017 excited to get their presents), both of whom got an acknowledgement in The Glorious Five-Year Plan Part Two.

Now, here’s the code to read in the image to the IDE:

cats=cv2.imread(r'C:\Users\mof39\Downloads\IMG_4778 (1).jpg', cv2.IMREAD_COLOR)
cats=cv2.cvtColor(cats, cv2.COLOR_BGR2RGB)

Once this image is read onto the IDE, here’s the code we’d use to not only write this image but also save it to a certain directory on your computer:

import os

imagePath = r'C:\Users\mof39\Downloads\IMG_4778 (1).jpg'
imageDestination = r'C:\Users\mof39\OneDrive\Documents'

cats = cv2.imread(imagePath)
os.chdir(imageDestination)

savedImage = 'simbaandmarbles.jpg'
cv2.imwrite(savedImage, cats)

What does all of this code mean? Let me explain.

You’ll first need to import the os module (it ships with Python’s standard library, so there’s nothing to pip install)-this will help you write and save the image to a specific directory.

The two variables that follow-imagePath and imageDestination-represent the current location of the image on my computer and the location on my computer where I wish to write and save the image, respectively. In this case, my image is currently located in my Downloads folder and I wish to send it to my Documents folder.

The cats variable is the result of reading in the image of the cats to the IDE. The os.chdir() function takes in one parameter-the string containing the image destination path. This function will allow you to set the destination of the image to ensure that your image is written and saved to the location you set in the imageDestination variable.

The savedImage variable allows you to set both the image name and the image extension to the image you wish to save and write-in this case, my image will be named simbaandmarbles and it will have a jpg extension.

Last but not least, use the cv2.imwrite() function to write and save the image to your desired directory (represented by the imageDestination variable). You’ll notice that this function takes two parameters-savedImage and cats in this example-but why might that be? Take a look at the code above and you’ll see why!

See, savedImage is the name we’d like to use for the saved image-this is a necessary parameter because we want OpenCV to save the image using the name/extension we specified. cats is the image data itself, which is what gets written to the desired location (or imageDestination).

  • You should certainly change the values of imagePath, imageDestination and savedImage to reflect accurate image locations/destinations/names/extensions on your computer!
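As an aside, you can skip the os.chdir() step entirely by building the full output path with os.path.join() and handing that straight to cv2.imwrite(). Here’s a sketch of the path-building half (the cv2.imwrite() call shown in the comment is the same one used above):

```python
import os

# Same locations as above -- change these to match your own computer.
imageDestination = r'C:\Users\mof39\OneDrive\Documents'
savedImage = 'simbaandmarbles.jpg'

# Join the folder and filename into one full path up front.
outputPath = os.path.join(imageDestination, savedImage)
print(outputPath)

# Then a single call writes the image without changing directories:
# cv2.imwrite(outputPath, cats)
```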

But wait! How do we know if our code worked? Take a look at the output below:

True

Since the output of this code returned True, the image was successfully written and saved to the desired destination on our computer! Want another way to verify that our code worked? Take a look at my Documents folder (which was my imageDestination):

As you can see, my image was successfully written to my Documents folder with the name/extension I specified (simbaandmarbles/JPG).

Now we know the image was successfully written and saved to the Documents folder, but how do we know if the rendering worked? In other words, did OpenCV zoom in or crop too much of the image (or change the colorscale during the writing/saving process)? Click on the image to find out:

As you can see, not only did OpenCV correctly write and save the image to the correct location, but it also wrote and saved the image without changing the zoom-in/zoom-out view or the image’s colorscale!

And that, dear readers, is how you can write and save an image anywhere on your computer using eight simple lines of code!

Thanks for reading.

Michael

Python Lesson 42: Intro To Computer Vision Part One-Reading Images (AI pt. 8)

Advertisements

Hello everybody,

Michael here, and today’s Python lesson looks to be quite a bit of fun! Wonder why?

We’re introducing a new topic today-computer vision!

Now, one of the Python concepts I’ve covered over the course of this blog’s run is NLP-or natural language processing, which is a form of AI. Computer vision (or CV for short) is also a form of AI, but instead of using text language, computer vision deals with images.

An intro to OpenCV

To further explore computer vision in Python, we’ll introduce a package called OpenCV. If you don’t already have this package installed, here’s the line of code to run (either on your IDE or command prompt) to install it:

pip install opencv-python
  • Yes, you’ll need to install the opencv-python package. Using the line pip install opencv won’t work.
  • If you’re pip installing this or any other package from within your IDE (such as a Jupyter notebook), include a ! before the pip install line.

Once we get our package installed, let’s start exploring the fun stuff it can do!

And now, time to explore OpenCV

The OpenCV concepts we’ll explore in this post are two of its simpler functions-reading an image onto the IDE and displaying that image onto the IDE!

Before we begin, here’s the image I’ll be using for this section in case you want to follow along with this tutorial.

  • Regular readers of Michael’s Programming Bytes will likely recognize this cat as Simba/The Orange Boy who (along with his sister Marbles) got a well-deserved recognition on the second part of my fifth anniversary post (The Glorious Five-Year Plan Part Two).

Now to read the image into Python, here’s the code we’ll use for this tutorial:

import cv2

cat = cv2.imread(r'C:\Users\mof39\Downloads\IMG_5427.jpg', cv2.IMREAD_COLOR)
cv2.imshow("image", cat)
cv2.waitKey(60000)
cv2.destroyAllWindows()

Not sure what all this code means? Let’s break it down:

  • To use Python’s OpenCV package, you’d need to run the line import cv2, not import opencv.
  • The cv2.imread() function takes two parameters-the path to your image and the mode you want to use to read the image into the IDE. As of this writing, OpenCV has 13 different modes to read an image into the IDE! The IMREAD_COLOR mode reads the image in as a standard color image.
  • The cv2.imshow() function also takes two parameters-a window title (the string "image" here) and the image variable (cat in this case). Unlike the cv2.imread() function, this function displays the image on your IDE (in this case, a window will pop up).
  • The cv2.waitKey() method takes an integer as a parameter. The point of this function is to close the window with the image after a specified number of milliseconds-I used 60000 milliseconds in this case (equal to 1 minute).
  • The cv2.destroyAllWindows() function takes no parameters since all it does is, well, destroy all open windows in the IDE after the specified number of milliseconds (specified by the cv2.waitKey() function).

The cv2.waitKey() and cv2.destroyAllWindows() functions are optional to include, but if you don’t include them, the window with the image will simply stay open until you close it yourself.

Troubleshooting and the magic of OpenCV colorscales

Now here’s what the image looks like after it’s read into the IDE with OpenCV:

I’ll be honest, even though OpenCV did successfully read and display the image, it didn’t do a good job of processing the image. How can we fix this? Take a look at the code below:


import matplotlib.pyplot as plt
cat=cv2.imread(r'C:\Users\mof39\Downloads\IMG_5427.jpg', cv2.IMREAD_COLOR)
plt.figure(figsize=(8,8))
plt.imshow(cat)

Now, let’s see what kind of output we get:

OK, so what did we do differently? Well, we used a combination of the MATPLOTLIB* and OpenCV packages to read our image onto the IDE and display it as well. While that combination of packages did the trick when it came to displaying the entire image on the IDE, you’ll notice that the cat (along with most other things in the image) looks rather blue.

*The plt.imshow() function is a MATPLOTLIB function.

Why might that be? After all, we did use the IMREAD_COLOR mode to read the image into the IDE, so how did we get this result? Take a look at the revised code below for a solution to this issue:

cat=cv2.imread(r'C:\Users\mof39\Downloads\IMG_5427.jpg', cv2.IMREAD_COLOR)
cat=cv2.cvtColor(cat, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(8,8))
plt.imshow(cat)

Here’s the output we get:

I used most of the same code as I did for the previous example, with one additional line-the line that uses the cv2.cvtColor() function.

Why did I need to use this function? Well, if you’re wondering why Orange Boy looked a little blue in the first image, that’s because OpenCV read the first image in the BGR colorscale, which is the OpenCV default method of reading images.

The BGR colorscale stands for the blue, green, red colorscale-it stores an image’s color channels in that order. MATPLOTLIB, however, expects images in RGB order, so when it displayed our BGR image, the red and blue channels were swapped-and that swap is what made the image look blue.

Now, here’s where the cv2.cvtColor() function comes in! This function takes two parameters-the image whose colorscale you want to convert and the conversion mode you want to use for the image. There are over a dozen conversion modes you can use for the image, but in this example, we’ll use the COLOR_BGR2RGB conversion mode, which changes the image’s colorscale from BGR to RGB (or red, green, blue). After running this function, run the plt.imshow() function on the color-converted Orange Boy image so that you can see the normal-looking image, not the Blue Boy image.
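Incidentally, because a color image is just a 3-channel array, reversing the last axis with plain numpy slicing performs the same blue/red swap for a 3-channel image. Here’s a small sketch with made-up pixel values:

```python
import numpy as np

# A 1x2 BGR image: first pixel pure blue, second pixel pure red (in BGR order).
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis swaps blue and red, equivalent in effect to
# cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB) for a plain 3-channel image.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0])  # the pure-blue BGR pixel, now in RGB order
print(rgb[0, 1])  # the pure-red BGR pixel, now in RGB order
```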

Now, why does OpenCV read in images with a BGR colorscale by default? Back when OpenCV was first developed (in the summer of 2000), BGR was the more popular colorscale for computer graphics. However, RGB has since become the more widely adopted colorscale for computer graphics (including Python packages like MATPLOTLIB)-and honestly, I think it makes images display a lot better.

Another colorscale conversion, why not?

Now that I’ve taught you the basics of reading images into the Python IDE with the OpenCV package, let’s have a little fun with the colorscale conversions, shall we? Take a look at the code below:

cat=cv2.imread(r'C:\Users\mof39\Downloads\IMG_5427.jpg', cv2.IMREAD_COLOR)
cat=cv2.cvtColor(cat, cv2.COLOR_BGR2RGB)
cat=cv2.cvtColor(cat, cv2.COLOR_RGB2HSV)
plt.figure(figsize=(8,8))
plt.imshow(cat)

In this example, I converted the image of Orange Boy first to the RGB colorscale and then to the HSV (hue, saturation, value) colorscale. As you can see, the cat looks like something out of a thermal camera.

  • If you want an image like the one I got here, then you’ll need to first convert the image to RGB colorscale BEFORE converting to another colorscale!
  • The cv2.cvtColor() function contains about 150 different color conversion modes!
  • Just in case you’re wondering what the HSV colorscale is, here’s a simple three-bullet-point explanation:
  • H stands for hue, which represents the type of color the image contains. The value for H is represented on the color wheel using a value from 0-360 degrees (as in angle degrees).
  • S stands for saturation, which represents the intensity (or well, saturation) of the color. The value for S is represented on a scale from 0% (fully desaturated, pure gray) to 100% (fully saturated, pure color).
  • V stands for value (or brightness), which represents how bright or dark the color will appear in the image. The value for V, just as with S, is represented on a scale from 0% (fully black) to 100% (fully illuminated color).
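To get a feel for those three numbers without OpenCV, Python’s built-in colorsys module can convert a single RGB color to HSV. One caveat to keep in mind: colorsys works on 0-1 floats, while OpenCV’s 8-bit HSV images store H on a 0-179 scale and S/V on 0-255.

```python
import colorsys

# Pure red as 0-1 floats: expect hue 0 degrees, full saturation, full value.
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(h * 360, s * 100, v * 100)  # prints: 0.0 100.0 100.0
```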

Thanks for reading and be sure to stay tuned for Part Two of Intro to Computer Vision!

Michael