The current bottleneck in AI media generation isn’t the technology; it’s the friction. If you are browsing on your phone and suddenly have an idea for an AI-generated image or video, the process is agonizingly slow. You have to open a browser, navigate to a platform like Midjourney or Sora, log in, construct your prompt, wait for the generation, download the file, and then switch back to your messaging app to share it.
We can eliminate this friction entirely.
A recent developer demonstration showed how to bypass this clunky process by building a custom Telegram bot that serves as a direct pipeline to AI media generation APIs. With the Claude Code CLI acting as an autonomous assistant that writes the bot’s code, you end up with a Telegram bot that generates high-quality images and video on request, directly within a chat interface.
Here is the technical blueprint for building this system.
The Architecture of the Telegram Bot
The goal is to create a seamless, chat-based interface. The user sends a command (/image or /video) with a prompt, selects an aspect ratio (and, for video, a duration), and the bot returns the generated media directly in the chat thread.
To achieve this, the architecture relies on three core components:
- The Telegram Bot API: To handle the user interface, capture the prompts, and deliver the final media.
- Klifgen API: A centralized API service that provides access to various AI media generation models, including Nano Banana (for images) and Grok Imagine (for video).
- Claude Code: An autonomous CLI assistant used to write the Python scripts and shell files that will orchestrate the bot’s logic.
Step 1: Establishing the Telegram Bot
Before writing any code, you need a functional bot instance on Telegram.
- Open Telegram and search for the official @BotFather.
- Send the /newbot command.
- Provide a name (e.g., “Jarvis Media Provider”) and a unique username ending in _bot.
- The BotFather will generate an HTTP API Token. Copy this string; it is the master key to your bot.
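Before going further, it can be worth confirming that the token works. The sketch below calls the Bot API's getMe method, which returns basic information about the bot. The function names are illustrative, and only the standard library is used.

```python
import json
import urllib.request


def get_me_url(token: str) -> str:
    """Build the Bot API URL for the getMe method."""
    return f"https://api.telegram.org/bot{token}/getMe"


def check_token(token: str, timeout: float = 10.0) -> str:
    """Return the bot's username if Telegram accepts the token.

    An invalid token makes Telegram answer with an HTTP error status,
    which urlopen raises as urllib.error.HTTPError.
    """
    with urllib.request.urlopen(get_me_url(token), timeout=timeout) as resp:
        payload = json.load(resp)
    if not payload.get("ok"):
        raise RuntimeError(f"Telegram rejected the request: {payload}")
    return payload["result"]["username"]
```

Calling check_token with the string copied from BotFather should return the username chosen in the previous step.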
Step 2: Leveraging Claude Code for Autonomous Development
Instead of manually writing the Python logic to handle incoming Telegram updates and outgoing API requests, the developer used Claude Code to generate the entire script autonomously.
By navigating to a local directory in the terminal and initializing Claude Code, you can provide a high-level prompt outlining the desired architecture.
The Prompt Strategy:
The instruction to Claude must be highly specific. The developer prompted Claude to create a Python script that uses the python-telegram-bot library. The script listens for the /image and /video commands, captures the user’s text prompt, and presents inline buttons for selecting an aspect ratio (e.g., 16:9, 9:16, 1:1) and, for video, a duration.
Once the user has made these selections, the script executes a local shell script and passes the choices as arguments: image.sh receives the prompt and aspect ratio, and video.sh receives the prompt, duration, and aspect ratio.
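To make the requested behavior concrete, here is a minimal sketch of what such a bot.py might look like. It is not the script Claude produced in the demonstration. It assumes python-telegram-bot version 20 or later, Python 3.9 or later, and shell scripts that print the resulting media URL on stdout. The helper names are hypothetical, and the duration buttons are left out for brevity, so video falls back to an assumed default of 6 seconds.

```python
import asyncio
import subprocess
from typing import List, Optional

ASPECT_RATIOS = ["16:9", "9:16", "1:1"]  # the ratios named in the prompt


def build_command(kind: str, prompt: str, aspect_ratio: str,
                  duration: Optional[str] = None) -> List[str]:
    """Build the argv for image.sh or video.sh (argument order is assumed)."""
    if kind == "image":
        return ["./image.sh", prompt, aspect_ratio]
    if kind == "video":
        # Assumption: default to 6 seconds when no duration was chosen.
        return ["./video.sh", prompt, duration or "6", aspect_ratio]
    raise ValueError(f"unknown kind: {kind!r}")


def run_script(argv: List[str], timeout: float = 900.0) -> str:
    """Run the script; assume it prints the media URL on stdout."""
    done = subprocess.run(argv, capture_output=True, text=True,
                          check=True, timeout=timeout)
    return done.stdout.strip()


def build_application(token: str):
    # Imported here so the helpers above can be used without the library.
    from telegram import InlineKeyboardButton, InlineKeyboardMarkup, Update
    from telegram.ext import (Application, CallbackQueryHandler,
                              CommandHandler, ContextTypes)

    def start(kind: str):
        async def handler(update: Update,
                          context: ContextTypes.DEFAULT_TYPE) -> None:
            prompt = " ".join(context.args)
            if not prompt:
                await update.message.reply_text(f"Usage: /{kind} <prompt>")
                return
            # Remember the pending job until a ratio button is tapped.
            context.user_data["job"] = {"kind": kind, "prompt": prompt}
            row = [InlineKeyboardButton(r, callback_data=r)
                   for r in ASPECT_RATIOS]
            await update.message.reply_text(
                "Pick an aspect ratio:",
                reply_markup=InlineKeyboardMarkup([row]))
        return handler

    async def on_ratio(update: Update,
                       context: ContextTypes.DEFAULT_TYPE) -> None:
        query = update.callback_query
        await query.answer()
        job = context.user_data.pop("job", None)
        if job is None:
            await query.edit_message_text("Send /image or /video first.")
            return
        await query.edit_message_text("Generating, please wait.")
        argv = build_command(job["kind"], job["prompt"], query.data)
        chat_id = update.effective_chat.id
        try:
            # Run the blocking script in a worker thread.
            url = await asyncio.to_thread(run_script, argv)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            await context.bot.send_message(chat_id=chat_id,
                                           text="Generation failed.")
            return
        if job["kind"] == "image":
            await context.bot.send_photo(chat_id=chat_id, photo=url)
        else:
            await context.bot.send_video(chat_id=chat_id, video=url)

    app = Application.builder().token(token).build()
    app.add_handler(CommandHandler("image", start("image")))
    app.add_handler(CommandHandler("video", start("video")))
    app.add_handler(CallbackQueryHandler(on_ratio))
    return app
```

Starting the bot would then be a matter of calling build_application(token).run_polling(), which fetches updates by long polling and needs no public webhook URL.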
Claude Code will autonomously:
- Write the requirements.txt file.
- Write the main bot.py script.
- Create a .env file that keeps the Telegram API token out of the source code (the file itself should be excluded from version control).
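As an illustration of how the bot might read that token at startup, here is a small standard-library sketch. The variable name TELEGRAM_BOT_TOKEN and both function names are assumptions, not details from the demonstration. Many projects use the python-dotenv package for this instead.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_token(path: str = ".env", name: str = "TELEGRAM_BOT_TOKEN") -> str:
    """Return the token from the environment, falling back to the .env file."""
    token = os.environ.get(name)
    env_file = Path(path)
    if not token and env_file.is_file():
        token = parse_env(env_file.read_text(encoding="utf-8")).get(name)
    if not token:
        raise RuntimeError(f"{name} is not set in the environment or {path}")
    return token
```

Letting a real environment variable take precedence over the file makes it easy to run the same script on a server without copying the .env file there.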
Step 3: Constructing the API Request Scripts
The Python bot handles the user interaction, but the actual media generation is handled by bash scripts that make cURL requests to the Klifgen API.
Again, Claude Code is used to write these scripts. The developer provided Claude with the exact cURL examples from the Klifgen API documentation.
For the image.sh script, the prompt instructed Claude to create a script that takes the prompt and aspect ratio arguments from the Python bot and sends a POST request to the request-nano-banana endpoint.
The API will return a JSON success response containing an image_url. The shell script must parse this URL and pass it back to the Python bot, which then sends the image to the user in Telegram.
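The demonstration does this with cURL in a shell script. For readers who prefer to stay in one language, the same request and parsing step can be sketched in Python. The base URL, the bearer-token header, the request field names, and the top-level position of image_url in the response are all assumptions here. Only the endpoint name and the image_url key come from the description above.

```python
import json
import urllib.request

# Hypothetical base URL: substitute the one from the API documentation.
API_BASE = "https://api.example.com"


def build_image_request(api_key: str, prompt: str,
                        aspect_ratio: str) -> urllib.request.Request:
    """Build the POST request (field and header names are assumed)."""
    body = json.dumps({"prompt": prompt,
                       "aspect_ratio": aspect_ratio}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/request-nano-banana",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )


def extract_image_url(response_text: str) -> str:
    """Pull image_url out of the JSON success response."""
    payload = json.loads(response_text)
    url = payload.get("image_url")
    if not url:
        raise ValueError(f"no image_url in response: {payload}")
    return url


def generate_image(api_key: str, prompt: str, aspect_ratio: str,
                   timeout: float = 120.0) -> str:
    """Send the request and return the URL of the generated image."""
    request = build_image_request(api_key, prompt, aspect_ratio)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_image_url(resp.read().decode("utf-8"))
```

Keeping request construction and response parsing in separate functions means both can be checked without touching the network.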
Step 4: Handling Asynchronous Video Generation
Video generation is significantly more complex than image generation. Because rendering a video takes time, the API cannot immediately return a video file. Instead, it returns a Task ID.
The developer instructed Claude to write a video.sh script that handles this asynchronous process:
- The script sends a POST request to the request-grok-imagine endpoint with the prompt, duration (e.g., 6 or 10 seconds), and aspect ratio.
- The API returns a task_id.
- The script then enters a polling loop. Every 15 seconds, it sends a GET request to the query-status endpoint using the task_id.
- If the status is processing, the loop continues.
- If the status is completed, the script extracts the result_url (the final video file) and passes it back to the Python bot.
- The Python bot sends the video directly into the Telegram chat.
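The polling loop is the part most worth getting right, so here is a hedged Python sketch of it. The status values, task_id, and result_url follow the steps above. The overall time limit and the handling of any other status (such as a failure) are additions the steps do not mention, and the base URL, header, and query parameter in make_fetcher are assumptions.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict

API_BASE = "https://api.example.com"  # hypothetical base URL


def wait_for_video(task_id: str,
                   fetch_status: Callable[[str], Dict],
                   interval: float = 15.0,
                   max_wait: float = 900.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the task completes, then return its result_url."""
    waited = 0.0
    while True:
        payload = fetch_status(task_id)
        status = payload.get("status")
        if status == "completed":
            return payload["result_url"]
        if status != "processing":
            # Assumption: any other status means the task will not finish.
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        if waited >= max_wait:
            raise TimeoutError(f"task {task_id} still processing after {waited}s")
        sleep(interval)
        waited += interval


def make_fetcher(api_key: str) -> Callable[[str], Dict]:
    """Return a fetch_status callable that queries the query-status endpoint."""
    def fetch(task_id: str) -> Dict:
        query = urllib.parse.urlencode({"task_id": task_id})
        request = urllib.request.Request(
            f"{API_BASE}/query-status?{query}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(request, timeout=30) as resp:
            return json.load(resp)
    return fetch
```

Passing the status fetcher and the sleep function in as parameters keeps the loop itself free of network and timing details, so it can be exercised with canned responses before it is pointed at the real API.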
The Result: Frictionless Generation
By combining a messaging interface with an API aggregator and an autonomous coding assistant, the developer created a highly efficient workflow.
The user can simply open Telegram, type /video a tiger walking in the jungle, tap the 9:16 aspect ratio button, and tap the 6 seconds button. The bot handles the API routing, the polling, and the delivery, dropping a high-quality, AI-generated video directly into the chat thread a few minutes later.
This architecture demonstrates how developers can use AI not just to generate content, but to build the very infrastructure that makes accessing that content seamless.

