Extending the Telegram Bot for Remote Control Image Classification – Jamie Maguire


This is the fourth instalment of a miniseries in which we build an end-to-end, cross-platform AI home security system.

To recap, the main requirements for this system are:

  • Detect motion and capture photo
  • Create message with photo attached
  • Send message to defined Telegram bot
  • Detect who is in the photo; if the person is me, do not send the message and image to the Telegram bot

 

I recommend reading the following before continuing:

 

In part 4, we extend the existing Telegram bot.

It will be extended so that the AI image functionality, backed by the API from part 3, can be managed from a cell phone.

This means you don’t need to be at a computer to train the AI with new data or to test the image recognition capability.

~

The following table details each component of the end-to-end system, its purpose, and where it runs:

Component | Purpose | Runs on
pir_camera.py (Python) | Manages motion detection and image capture | Raspberry Pi 4
botmessage.py (Python) | Handles interaction with the Telegram bot | Raspberry Pi 4
CLIP server (Python) | Handles image vectorisation; exposes the /embed endpoint | Raspberry Pi 4
.NET API (C#) | Exposes the /train and /match endpoints; uses the CLIP server | Windows laptop
Telegram bot | Manages the AI from a mobile phone; sends/receives notifications

 

With these components explained, we can see how to enhance the Telegram bot with commands that let us interact with the .NET API.

~

Enhancing the Telegram Bot with Commands

To make our bot truly interactive, we add command-based capabilities to control the image recognition pipeline remotely.

/train

When the user sends /train Jamie, the bot:

  1. Waits for a photo.
  2. Sends that image to the .NET API along with the label.
  3. The API then stores the CLIP embedding of the image under the label Jamie.

 

This creates a known identity for future comparisons.
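The storage step can be sketched as a simple in-memory map from label to embedding vectors. This is an illustrative sketch only; the actual storage lives inside the .NET API, and the function and variable names here are mine:

```python
# Hypothetical sketch of the /train storage step: each label maps to the
# list of CLIP embeddings captured for it. The real .NET API does this
# server-side; plain Python lists stand in for embedding vectors here.
known_faces: dict[str, list[list[float]]] = {}

def store_embedding(label: str, embedding: list[float]) -> None:
    """Store one embedding under a label, creating the label on first use."""
    known_faces.setdefault(label, []).append(embedding)

# Training "Jamie" twice leaves two embeddings stored under that one label.
store_embedding("Jamie", [0.1, 0.9, 0.3])
store_embedding("Jamie", [0.2, 0.8, 0.4])
```

Each additional training photo for the same label adds another embedding, which gives the matcher more reference points for that identity.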

/match

When the user sends /match:

  1. The bot captures a new photo and posts it to the API’s /match endpoint.
  2. The API generates the CLIP embedding and compares it with stored embeddings using cosine similarity.
  3. If a match is found above a defined threshold, the response might be MATCH: Jamie (Similarity: 0.92); otherwise, the bot replies with NO MATCH FOUND.

 

This allows the system to identify whether a person in the image is known or unknown, all controlled from Telegram.

~

Adding Command Handlers to the Telegram Bot (botmessage.py)

The script botmessage.py handles all interaction with the Telegram bot.  We need to extend this to handle the following commands:

  1. Detect /train command.
  2. Detect /match command.
  3. Wait for the next photo upload from the user.
  4. Once photo is received:
    • Download the photo to the file system
    • Send it to the .NET API (/train or /match endpoints)

 

Implementing the above lets us:

  • Train the AI with a labelled photo sent from a mobile phone
  • Match a photo from a mobile phone against the stored embeddings

 

Logging is also added to help us debug.  Let’s dig into the handlers.

~

Start

This handles when the bot is invoked.  Main features:

  • Triggered by the /start command.
  • Replies with a welcome message and basic usage instructions.
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await update.message.reply_text(
        "Welcome to the Home Security Bot!\n"
        "Use /train <name> then send a photo to train, "
        "or /match then send a photo to identify it."
    )

Train_Command

This command is triggered by the /train command.  e.g., /train Jamie.

  • Stores the label in the pending_train dictionary, keyed by the chat id so each chat has its own pending state.
  • Tells the user to send a photo for training.
  • If no label is provided, it prompts the user to add one.
async def train_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
    if context.args:
        label = context.args[0]
        pending_train[update.effective_chat.id] = label
        logger.info(f"Train command received with label: {label}")
        await update.message.reply_text(f"Send a photo to train label: {label}")
    else:
        await update.message.reply_text("Please provide a label, e.g., /train Jamie")

Match_Command

This handler is triggered by the /match command.

  • Sets the label for the user’s session to None, indicating this is for matching rather than training.
  • Asks the user to send a photo to perform face matching.
async def match_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
    pending_train[update.effective_chat.id] = None
    logger.info("Match command received")
    await update.message.reply_text("Send a photo to match.")

Handle_Photo

This handler is responsible for a few things:

  • Fetches the current mode for this chat ID from the pending_train dictionary.  This tells the bot whether the user is in training mode (label is a string like “Alice”) or matching mode (label is None).
  • Checks if the message contains a photo:
    • If no photo is found, it replies with an error message and exits.
  • If a message does contain a photo:
    • Retrieves the file_id of the highest resolution version of the photo.
    • Telegram sends photos in multiple sizes; photo[-1] is the best quality.
  • Uses the bot to get a reference to the file on Telegram’s servers.
  • Constructs a filename to save the image locally, using the chat_id and file_id to make it unique.
  • IMAGE_DIR is a predefined folder path where images are stored.
  • Downloads the file from Telegram servers to the specified filename path on the local drive.
  • Logs the download path for debugging or tracking purposes.

 

Here is the code:

async def handle_photo(update: Update, context: ContextTypes.DEFAULT_TYPE):
    chat_id = update.effective_chat.id
    label = pending_train.get(chat_id)

    if not update.message.photo:
        await update.message.reply_text("No photo detected.")
        return

    file_id = update.message.photo[-1].file_id
    new_file = await context.bot.get_file(file_id)

    filename = os.path.join(IMAGE_DIR, f"{chat_id}_{file_id}.jpg")
    await new_file.download_to_drive(filename)
    logger.info(f"Downloaded photo to: {filename}")

 

That explains the commands, now let’s look at routing requests to the .NET API.

~

Routing to the .NET API

With the commands implemented, we need to issue a POST request to either the /train or /match endpoint on the .NET API, depending on the pending mode:

    # Continues handle_photo: after downloading the photo, route it to the API.
    if label:
        logger.info(f"Sending to /train with label: {label}")
        url = f"{API_BASE}/train"
        data = {'label': label}
    else:
        logger.info("Sending to /match")
        url = f"{API_BASE}/match"
        data = {}

    # Open the file in a context manager so the handle is always closed.
    with open(filename, 'rb') as f:
        response = requests.post(url, files={'file': f}, data=data)
    response.raise_for_status()
    reply = response.text
    logger.info(f"API Response: {reply}")
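The endpoint choice above reduces to a small pure function, which makes the routing decision easy to unit-test in isolation. The API_BASE value and form-field names mirror the snippet; the function name is my own:

```python
# Same base URL the bot script uses for the .NET API.
API_BASE = 'http://192.168.1.150:5001/api/image'

def build_request(label):
    """Pick the endpoint URL and form fields for a pending label.

    A label string means training mode; None means matching mode,
    mirroring how pending_train values are interpreted in handle_photo.
    """
    if label:
        return f"{API_BASE}/train", {'label': label}
    return f"{API_BASE}/match", {}
```

The photo file itself is attached the same way in both cases, so only the URL and the label field differ between the two modes.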

 

Let’s see how these scripts fit in with the overall solution so far.

~

The Updated Script

Here we have the updated script.  Some additional logging has been added to help with debugging:

import os
import logging
import requests
from telegram import Update, Bot
from telegram.ext import (
    ApplicationBuilder,
    CommandHandler,
    MessageHandler,
    ContextTypes,
    filters,
)

# === CONFIG ===
TOKEN = '[YOUR-TOKEN]'
API_BASE = 'http://192.168.1.150:5001/api/image'
SESSION_STATE = {}  # Tracks user state: waiting for /train or /match image
IMAGE_DIR = '/home/admin/repos/HomeSecuritySystem/images'
os.makedirs(IMAGE_DIR, exist_ok=True)

# === LOGGING SETUP ===
logging.basicConfig(
    format="[%(levelname)s] %(asctime)s - %(message)s",
    level=logging.INFO
)

logger = logging.getLogger(__name__)
pending_train = {}


# === BOT COMMAND HANDLERS ===

async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await update.message.reply_text(
        "Welcome to the Home Security Bot!\n"
        "Use /train <name> then send a photo to train, "
        "or /match then send a photo to identify it."
    )
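The pending_train dictionary is effectively the bot's whole session state machine: a string value means "waiting for a training photo", None means "waiting for a match photo", and a missing key means no command is pending. That logic can be exercised in isolation without the Telegram library; the helper and mode names below are mine, used here purely for illustration:

```python
pending_train = {}  # chat_id -> label (train mode) or None (match mode)

def on_train(chat_id, label):
    """Record that this chat's next photo should train the given label."""
    pending_train[chat_id] = label

def on_match(chat_id):
    """Record that this chat's next photo should be matched, not trained."""
    pending_train[chat_id] = None

def mode_for(chat_id):
    """Return 'train', 'match', or 'idle' for a given chat."""
    if chat_id not in pending_train:
        return 'idle'
    return 'match' if pending_train[chat_id] is None else 'train'

# Two chats in different modes, tracked independently by chat id.
on_train(101, "Jamie")
on_match(202)
```

Keying by chat id is what lets several Telegram users interact with the bot at the same time without their pending commands interfering.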

 

With the script written, it can now be tested.

~

Testing Using Mobile Phone

To test these components, we work through the following steps:

  1. Start the CLIP Server on the Raspberry Pi 4.
  2. Start the .NET API on the Windows laptop.
  3. Send the /train command to the Telegram bot from the cell phone.
  4. Send the /match command to the Telegram bot from the cell phone.
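Before involving the phone at all, the /train request can be sanity-checked from any machine on the network with a plain requests call. The sketch below builds the same multipart POST the bot would issue but stops short of sending it, since that requires the API to be running; the in-memory bytes are a stand-in for a real JPEG:

```python
import io
import requests

# Same base URL the bot script uses for the .NET API.
API_BASE = 'http://192.168.1.150:5001/api/image'

# A tiny in-memory stand-in for a real image; replace with open('photo.jpg', 'rb').
fake_photo = io.BytesIO(b'\xff\xd8\xff\xe0 not a real image')

# Build (but do not send) the multipart POST the bot issues for /train.
req = requests.Request(
    'POST',
    f"{API_BASE}/train",
    files={'file': ('photo.jpg', fake_photo, 'image/jpeg')},
    data={'label': 'Jamie'},
).prepare()

# req.url, req.headers and req.body now hold the exact request;
# requests.Session().send(req) would submit it to the running API.
```

Swapping the URL to f"{API_BASE}/match" and dropping the label field gives the equivalent matching request.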

 

Let’s look at this in action.

Labelling an Image

With the CLIP server and .NET API running, we send the /train command.  The bot asks us to upload an image:

After a few moments, the bot replies and informs us the operation was successful.

Invoking the Matching endpoint

Now that we’ve trained the bot with an image using the /train endpoint, we can attempt an image match by supplying images from our cell phone.

First, we’ll try an image that bears no resemblance to training data:

Next, we’ll try an image that is similar:

Great.  At this point, we’ve tested both endpoints directly from the cell phone.

~

End to End Communication

We validated successful communication between all system components that form the AI home security system:

  • Real-time notifications over Telegram.
  • Remote training and matching of image embeddings.
  • Full remote control and feedback via Telegram.
  • Local network communication between Pi and .NET API.

 

At this stage, the system is ready for future enhancements such as conversational AI, or even agentic reasoning.

~

Summary

In Part 5, we’ll bring everything together from the previous parts of this series.

We’ll also tidy up the code.

Stay tuned.

~



