Connect 4 Algorithm¶
In order for the robot to play competitively against a human, a minimax game algorithm is used to choose the best move in response to the human player. The algorithm’s ‘game loop’ is implemented inside the main file, but for general tidiness all of the algorithm functions are stored in a separate file.
Setup Functions¶
Create a numpy array of zeroes to represent the Connect 4 board. This will be populated with numbered pieces throughout the game.
def create_board():
board = np.zeros((ROW_COUNT, COLUMN_COUNT))
return board
Set up the board to print out in the terminal in a way that makes it visually easy to play with the computer.
def pretty_print_board(board):
flipped_board = np.flipud(board)
print("\033[0;37;41m 0 \033[0;37;41m 1 \033[0;37;41m 2 \033[0;37;41m 3 \033[0;37;41m 4 \033[0;37;41m 5 \033[0;37;41m 6 \033[0m")
for i in flipped_board:
row_str = ""
for j in i:
if j == 1:
#print(yellow)
row_str +="\033[0;37;43m 1 "
elif j ==2:
row_str +="\033[0;37;44m 2 "
else:
#print black
row_str +="\033[0;37;45m "
print(row_str+"\033[0m")
Note
Due to restrictions on the version of numpy, np.flipud(board)
was used instead of the most up to date version: np.flip(board)
.
If you are using the most up to date version of numpy, you can update this function (although it will not break if you do not - numpy has reasonably good backwards-compatibility).
Warning
It’s important to understand that the algorithm fills the board from the top down. In real life, the board fills up from the bottom. np.flipud(board)
flips the board around a horizontal axis, making it the correct visual orientation for Connect 4.
In future, for clarity and ease of understanding, the placement of the pieces within the board will be referred to in the same way that it would happen in real life (bottom up).
There are 4 functions that are used when placing a piece on the board.
- To get all locations in the board that could contain a piece (i.e. have not yet been filled):
def get_valid_locations(board):
valid_locations = []
for col in range(COLUMN_COUNT):
if is_valid_location(board, col):
valid_locations.append(col)
return valid_locations
- To check if there is a valid location in the chosen column:
def is_valid_location(board, col):
return board[ROW_COUNT - 1][col] == 0
- To check which row the piece can be placed into (i.e. the next available open row):
def get_next_open_row(board, col):
for r in range(ROW_COUNT):
if board[r][col] == 0:
return r
Note
This function serves as virtual ‘gravity’. Instead of placing a piece anywhere in the column, by getting only the next open row, piece placement is restricted to the next available slot from the bottom of the board, as would happen in real life. This also means that the only input that is required is now the column (the row is automatically found and assigned).
- Finally, to place a piece in the next available row, in the chosen column:
def drop_piece(board, row, col, piece):
board[row][col] = piece
Analysis Functions¶
When the human player (Player 1) has made a move, the drop_piece
function will update the numpy array board
with a number 1 in the specified position. In order for the game algorithm (Player 2) to choose the best move to play in response, it has to understand and analyse the current board state. This is done using a ‘windowing’ technique.
In the following function, horizontal, vertical, positive (upward sloping) and negative (downward sloping) diagonal windows are created. These windows are then used to scan all possible 4-piece sections of the board, and evaluate (score) each window based on its contents.
This evaluation is performed separately by the evaluate_window
function, which is called within the score_position
function, and explained in further detail below.
def score_position(board, piece):
score = 0
# Score centre column
centre_array = [int(i) for i in list(board[:, COLUMN_COUNT // 2])]
centre_count = centre_array.count(piece)
score += centre_count * 3
# Score horizontal positions
for r in range(ROW_COUNT):
row_array = [int(i) for i in list(board[r, :])]
for c in range(COLUMN_COUNT - 3):
# Create a horizontal window of 4
window = row_array[c:c + WINDOW_LENGTH]
score += evaluate_window(window, piece)
# Score vertical positions
for c in range(COLUMN_COUNT):
col_array = [int(i) for i in list(board[:, c])]
for r in range(ROW_COUNT - 3):
# Create a vertical window of 4
window = col_array[r:r + WINDOW_LENGTH]
score += evaluate_window(window, piece)
# Score positive diagonals
for r in range(ROW_COUNT - 3):
for c in range(COLUMN_COUNT - 3):
# Create a positive diagonal window of 4
window = [board[r + i][c + i] for i in range(WINDOW_LENGTH)]
score += evaluate_window(window, piece)
# Score negative diagonals
for r in range(ROW_COUNT - 3):
for c in range(COLUMN_COUNT - 3):
# Create a negative diagonal window of 4
window = [board[r + 3 - i][c + i] for i in range(WINDOW_LENGTH)]
score += evaluate_window(window, piece)
return score
The figure below shows the scanning range for this score_position
function. It is unnecessary to use every index of the board as a starting position for a scanning window, because in many positions some windows would then extend over the sides of the board.
As a result, there are only 69 positions in which the scanning window needs to be deployed. The yellow highlight shows the applicable scanning range, and the red squares are an example of a scanning window in the maximum required position.
The evaluate_window
function is called in the last line of each scoring block. The output of this evaluation function (a numerical score value) is stored in the score
variable, which is updated every time a higher score is found.
When the scanning is complete, the window with the best score is passed to the game algorithm to play a move. Note that this scoring mechanism is required, but the minimax function, which will be explained in further detail, makes some elements of this function much less important.
In any given scanning position, the contents of that window are evaluated for ‘strength’, e.g. a window that contains 3 consecutive pieces from the same player is a ‘strong’ state, and has a higher score. This means that the algorithm is more likely to try and create board states that are ‘strong’ - i.e. prioritise connecting 3 pieces together, rather than connecting 2.
def evaluate_window(window, piece):
score = 0
# Switch scoring based on turn
opp_piece = PLAYER_PIECE
if piece == PLAYER_PIECE:
opp_piece = BOT_PIECE
# Prioritise a winning move
# Minimax makes this less important
if window.count(piece) == 4:
score += 100
# Make connecting 3 second priority
elif window.count(piece) == 3 and window.count(EMPTY) == 1:
score += 5
# Make connecting 2 third priority
elif window.count(piece) == 2 and window.count(EMPTY) == 2:
score += 2
# Prioritise blocking an opponent's winning move (but not over bot winning)
# Minimax makes this less important
if window.count(opp_piece) == 3 and window.count(EMPTY) == 1:
score -= 4
return score
The final element of the analysis is a ‘special case’ variation of the score_position
function. When 4 pieces are joined together, this signifies the game has been won.
After every move, the board needs to be scanned by both the score_position
function, and also the winning_move
function, which will exit out of the game loop if it sees a winning move.
def winning_move(board, piece):
# Check valid horizontal locations for win
for c in range(COLUMN_COUNT - 3):
for r in range(ROW_COUNT):
if board[r][c] == piece and board[r][c + 1] == piece and board[r][c + 2] == piece and board[r][c + 3] == piece:
return True
# Check valid vertical locations for win
for c in range(COLUMN_COUNT):
for r in range(ROW_COUNT - 3):
if board[r][c] == piece and board[r + 1][c] == piece and board[r + 2][c] == piece and board[r + 3][c] == piece:
return True
# Check valid positive diagonal locations for win
for c in range(COLUMN_COUNT - 3):
for r in range(ROW_COUNT - 3):
if board[r][c] == piece and board[r + 1][c + 1] == piece and board[r + 2][c + 2] == piece and board[r + 3][c + 3] == piece:
return True
# check valid negative diagonal locations for win
for c in range(COLUMN_COUNT - 3):
for r in range(3, ROW_COUNT):
if board[r][c] == piece and board[r - 1][c + 1] == piece and board[r - 2][c + 2] == piece and board[r - 3][c + 3] == piece:
return True
Algorithm¶
The algorithm chosen to play Connect 4 is the minimax algorithm. Minimax is a backtracking algorithm which is commonly used in decision-making and game theory to find the optimal move for a player. This makes it a perfect choice for two-player, turn-based games.
In the minimax algorithm, the two players are the maximiser and minimiser. The maximiser is trying to get the highest score possible, and the minimiser is trying to get the lowest score possible.
The best / worst scores are calculated by the evaluate_window
function, and stored in the score
variable, described in the previous section.
At the start of every turn, minimax will scan the board’s remaining valid locations and calculate all possible moves, before backtracking and choosing the optimal move for that turn. This will be either the best or worst move, depending on whether it is the maximiser or minimiser’s turn. The assumption is that minimax (maximiser) can play optimally, as long as the human player (minimiser) also plays optimally. This will not always be the case, but does not lead to significant gameplay problems.
Before implementing the minimax algorithm, the two game-terminating states need to be defined as terminal nodes. If there is a winning move from either player, or if the board fills up without a win (leading to a draw), the game will end.
def is_terminal_node(board):
return winning_move(board, PLAYER_PIECE) or winning_move(board, BOT_PIECE) or len(get_valid_locations(board)) == 0
The minimax algorithm for the Connect 4 game is implemented below.
def minimax(board, depth, alpha, beta, maximisingPlayer):
valid_locations = get_valid_locations(board)
is_terminal = is_terminal_node(board)
if depth == 0 or is_terminal:
if is_terminal:
# Weight the bot winning really high
if winning_move(board, BOT_PIECE):
return (None, 9999999)
# Weight the human winning really low
elif winning_move(board, PLAYER_PIECE):
return (None, -9999999)
else: # No more valid moves
return (None, 0)
# Return the bot's score
else:
return (None, score_position(board, BOT_PIECE))
if maximisingPlayer:
value = -9999999
# Randomise column to start
column = random.choice(valid_locations)
for col in valid_locations:
row = get_next_open_row(board, col)
# Create a copy of the board
b_copy = board.copy()
# Drop a piece in the temporary board and record score
drop_piece(b_copy, row, col, BOT_PIECE)
new_score = minimax(b_copy, depth - 1, alpha, beta, False)[1]
if new_score > value:
value = new_score
# Make 'column' the best scoring column we can get
column = col
alpha = max(alpha, value)
if alpha >= beta:
break
return column, value
else: # Minimising player
value = 9999999
# Randomise column to start
column = random.choice(valid_locations)
for col in valid_locations:
row = get_next_open_row(board, col)
# Create a copy of the board
b_copy = board.copy()
# Drop a piece in the temporary board and record score
drop_piece(b_copy, row, col, PLAYER_PIECE)
new_score = minimax(b_copy, depth - 1, alpha, beta, True)[1]
if new_score < value:
value = new_score
# Make 'column' the best scoring column we can get
column = col
beta = min(beta, value)
if alpha >= beta:
break
return column, value
Note
The implementation of this minimax algorithm also contains Alpha-Beta pruning. There is no point following a decision-tree branch any further if the initial move scores less optimally than an alternative move that has already been discovered. Alpha-Beta pruning works to ‘prune’ away these branches, leaving a much smaller, more optimised decision tree.
This technique is used to reduce the time complexity of the algorithm, which in this context is important, as there are many other parts of the game loop that are time consuming (e.g. Motion Planning). The game algorithm can now run reliably in under 500ms, even when looking 4 moves into the future.
Limitations / Improvements¶
There are some key limitations to the algorithm, but they did not need to be directly addressed as they were outside the scope for this project.
- Lack of scalability
Due to the hard-coded nature of the scanning procedure, the board size, the number of connected pieces required to win, and the scanning window size cannot be changed without causing major errors. This would not be particuarly difficult to fix, but would require a different, more adaptive scanning structure and further definition of static variables.
- Incomplete win structure
During stress testing, it became clear that the algorithm would not make a winning move if there were two or more possible winning moves available. This is presumably because it could not decide between equally weighted branches, and therefore made the ‘next best’ move. This problem did not impact the algorithm’s success rate, however, because as soon as the human player filled one of the possible winning spaces, the algorithm would win the game using the other.