Documentation

Everything you need
to ship on day one.

Full documentation and base code examples for every use case. No hunting through forums. No guessing. Open the notebook, paste the code, hit Run All.

01
Getting Started
Zero to running in 10 minutes. Create a project, pick a GPU, run your first cell.
02
The Notebook
Cells, Run All, Console, environment setup, secrets — how it all fits together.
03
Models & GPUs
Pick the right GPU. Choose the right model. Day one access to newest releases.
04
Examples & Patterns
Real scripts for real jobs. Open in notebook and run. The Cascade, IOT, agents, tools.
New here?
Start with Getting Started. It walks you through creating a project, selecting a GPU, and running your first cell in plain English.
01 — Getting Started

You are early
to the future.
This is how you start.

The same compute that powers OpenAI, Anthropic, and Google — 16GB to 384GB of GPU — is now available to anyone with an idea and 10 minutes. No data center. No team. No investment. Most people don't know this exists yet. You do. That's the edge. And right now is better than ever — open source models are matching the APIs, new models drop weekly, and you get day-one access to all of them. This is the beginning of something, and the people here right now are the ones who will look back and say "I was there before it was obvious."

Step 1 — Sign Up

Create an account with Google or email. Add credits via Stripe to start running compute — pay-as-you-go, billed by the second, $0 when idle. No subscriptions, no trial period.

Step 2 — Create a Project

When you open SeqPU you'll see your project name in the top bar. A project holds your cells, environment, secrets, files, and GPU selection. Everything lives inside it. Auto-saves constantly — close the tab, come back tomorrow, it's exactly where you left it.

Step 3 — Pick Your GPU

The GPU selector strip runs across the top. Click one — green means active. That's your hardware for this run. Start with A100 80GB — it handles 90% of use cases at $2.50/hour, billed by the second.

CPU Only
T4 16GB
L4 24GB
A10G 24GB
L40S 48GB
A100 40GB
A100 80GB
H100 80GB
H200 141GB
B200 192GB
2× H200 282GB
2× B200 384GB

When you type code, the notebook reads it, detects your model, estimates VRAM usage, and shows it in the header — "Qwen3-14B (14B) — VRAM: ~31GB". You know if it fits before you spend a cent. Not running a model? Select CPU Only — $0.047/hour for scripts, API calls, web scraping, bots.

Step 4 — Add Code

The center panel is your notebook. Each cell is a block of Python. Paste your code in — or describe what you want to Claude and paste what it writes. Cells run in order, top to bottom, as one script.

You don't need to code
Open Claude. Describe what you want in plain English. Claude writes every line. Copy it. Paste it into a cell. Hit Run All. AI writes better code than most engineers. You just need to know what you want. The Console shows errors in real time — copy the error, paste it back to Claude, Claude fixes it. 2-4 rounds and it works. That's the whole process.

Step 5 — Hit Run All

The Run All button runs every cell in sequence. Watch the Console panel on the right — live output, errors, and system status stream in real time as your code executes on the GPU. When it finishes, the GPU spins down. You're billed for exactly the seconds it ran. Idle = $0.

Your first script — paste this, select A100 80GB, hit Run All
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=512)

result = llm.generate(["Hello, what can you do?"], params)
print(result[0].outputs[0].text)

That's a 14-billion parameter AI model running on your rented GPU. Your data never left your environment. No API key. No token bill. No third party. You just ran the same technology that powers ChatGPT — on your own hardware, under your control.

Now — Here's What You Can Do With It

Running a script is step one. What makes SeqPU different is what happens after your code works.

Publish as a Headless API

Click Publish → API Endpoint. Your notebook becomes a URL that other systems can call. Define inputs, set a markup (0-30%), and every time someone calls your endpoint, your code runs and you get paid. An API that charges money — built from a notebook in 5 minutes.

Publish as a UI Site

Click Publish → With UI. Add three HTML attributes to connect inputs and outputs. Visitors fill in forms, click Generate, and the GPU fires for exactly the seconds it needs. Your brand. Your URL. Your product. Visitors don't know or care what's behind it — they just use it and pay.

Connect a Telegram Bot

Go to Settings → Connections. Paste a Telegram bot token from @BotFather. Select which published tool it runs. Done — your AI answers from your phone. Discord, Slack, and WhatsApp work the same way. Three steps. Your bot, your name, your avatar.

Build Agents That Run Forever

Agents are scripts that make decisions. They read input, decide what to do, take action, check the result, and loop. The SeqPU SDK gives you seqpu.run() to spawn sub-jobs on any GPU, seqpu.tools.call() to call your published tools, and seqpu.notify() to send messages to Telegram, Discord, Slack, or WhatsApp. Schedule them on a cron. They run 24/7. CPU handles 80-90% of agent work at $0.047/hour — the GPU only fires when the agent actually needs to think.
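
The read, decide, act, check, loop cycle can be sketched as below. This is a minimal illustration, not the SDK's own agent code: `run_agent`, `fake_call_tool`, and the `sent` list are hypothetical names, the tool names and the "result" / "response" keys are borrowed from the pipeline example later in this guide, and the tool call and notifier are passed in as plain functions so the sketch runs without the SDK installed.

```python
def run_agent(messages, call_tool, notify, max_steps=100):
    """Read input, decide, act, check the result, and loop."""
    handled = []
    for step, message in enumerate(messages):
        if step >= max_steps:          # hard stop so the loop cannot run away
            break
        # Decide: a cheap classification picks the route.
        category = call_tool("classifier", {"text": message})["result"]
        tool = "deep-analyzer" if category == "complex" else "quick-responder"
        # Act: call the chosen tool.
        reply = call_tool(tool, {"text": message})
        # Check: only deliver replies that actually contain a response.
        if reply.get("response"):
            notify(reply["response"])
            handled.append((message, tool))
    return handled


# Offline stand-ins so the sketch runs without the SDK (hypothetical behavior).
def fake_call_tool(name, payload):
    if name == "classifier":
        return {"result": "complex" if len(payload["text"]) > 20 else "simple"}
    return {"response": f"[{name}] {payload['text']}"}


sent = []
handled = run_agent(
    ["hi", "please analyze this long contract"], fake_call_tool, sent.append
)
print(handled)
```

In a notebook you would presumably pass seqpu.tools.call as `call_tool` and a small wrapper around seqpu.notify as `notify`. The stubs only stand in for those calls here.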

Make Money While You Sleep

Every published tool runs without you. Credits checked automatically. GPUs spin up on demand, spin down when idle. Zero cost when nobody's calling. Stack them — your first tool makes $200/month, your second makes $500, your third makes $1,000. Each one took an afternoon. No boss. No schedule. No cap on earnings.

The new reality
A new model drops on HuggingFace. You download it, test it, publish it with markup — the same day. Everyone else waits weeks for API access from the big providers. You're already live and charging. Open source models are the same transformer architecture, the same algorithms, the same quality. The only thing the API providers add is hosting and a bill. Now you're the one hosting. Now you're the one billing.

Your Data Stays Yours

Every time you paste code into ChatGPT or call an AI API, your data leaves your control. With SeqPU, it never does. Your browser talks to your Cloudflare edge. Your job runs on your rented GPU. Results write to your own database. The entire chain is authenticated at every hop. Your server talking to your rented server. No third party sees your data. We do not train on your code. All your code and data belongs to you — fully, permanently, unconditionally.

This Is the Beginning

Every wave creates millionaires. Websites in 1996. Apps in 2008. SaaS in 2012. AI tools and agents are in that window right now. The technology works. The cost is dirt cheap. The demand is exploding. And most people have no idea this is possible.

You're here. You're reading this. You're one of the early ones. The normal person can now play the same games the hyperscalers play — the same GPUs, the same models, the same compute. The only difference between you and OpenAI is that they started three years ago. You're starting now.

Open the notebook. Describe what you want. Paste. Run. Publish. Start charging. That's the whole path. There is no risk. There is only upside.

02 — The Notebook

Where ideas
become products.

This is a product factory. Write code in cells, test it with the Console, refine it until it's right, then click Publish — your notebook becomes a headless API, a UI site, or a Telegram bot that charges money while you sleep. 16GB to 384GB of GPU compute with a click. Pay by the second. Ship with a button click.

Notebook layout
3 panels
Left Panel
Files — drag & drop, add from URL, access at /data/filename
Custom Image — base image, pip/apt packages, auto-detect
Secrets — encrypted API keys, never in logs
Outputs — files your code produces, download or save
Center — Cells
Run All — assembles all cells into one script, runs top to bottom
Run Cell 1 — runs only the first cell (load model once)
+ Add Cell — new code block
Publish — turn this notebook into a product
Cells — numbered Python blocks, reorderable
Right — Console
Live output — everything your code prints, in real time
Errors — full Python tracebacks with line numbers
System messages — GPU allocation, model loading, exec time
Resize — drag the left edge wider or narrower

Cells — Your Code, Organized

Your code lives in cells — numbered blocks of Python. You break your script into cells for organization, like chapters in a book. But when you hit Run All, it assembles every cell into one script and executes it top to bottom as a single Python file.

Cell 1 + Cell 2 + Cell 3 = one continuous script. Variables from Cell 1 are available in Cell 2. Output from Cell 2 is available in Cell 3. They share state. The cells are for YOU — to think clearly, to iterate on pieces, to organize your logic. The GPU sees one file.
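
To make the shared-state behavior concrete, here is a small sketch of cells being joined and run as one script. It illustrates the behavior described above and is not SeqPU's actual implementation. The `cells` list and the `namespace` dict are hypothetical stand-ins.

```python
# Hypothetical stand-ins for three notebook cells (plain strings of Python).
cells = [
    "greeting = 'hello'",                    # Cell 1 defines a variable
    "shout = greeting.upper()",              # Cell 2 reads it
    "result = f'{shout} from one script'",   # Cell 3 uses Cell 2's output
]

# Join the cells in order into a single script, as Run All is described to do.
script = "\n\n".join(cells)

# Execute the script once in one namespace so every cell shares state.
namespace = {}
exec(script, namespace)

print(namespace["result"])  # HELLO from one script
```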

When you publish
All cells become one script. The published tool doesn't have cells — every call runs your concatenated code from top to bottom. Write it in cells. Test it in cells. Publish it as one product.
  • + Add Cell — adds a new empty cell at the bottom
  • Remove — deletes a cell (trash icon)
  • Reorder — move cells up or down with arrow buttons
  • Run Cell 1 — runs only the first cell in isolation. Load your model in Cell 1, iterate on Cell 2 twenty times without waiting for the model to reload. Saves minutes per iteration.

The pattern: Cell 1 loads your model (run once). Cell 2 does the work (iterate here). Cell 3 formats and delivers (output, notification).

Cell 1 — Load model once
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)
print("Model loaded and ready")
Cell 2 — Do the work (iterate here)
from pathlib import Path

for doc in Path("/data").glob("*.txt"):
    text = doc.read_text()
    result = llm.generate([f"Summarize:\n{text}"], params)
    print(f"--- {doc.name} ---")
    print(result[0].outputs[0].text)

Files

Drag files onto the left panel or click Add URL to download from a link. Files up to 100MB each. Supported formats: CSV, JSON, images (PNG, JPG), audio (WAV, MP3), video (MP4), model weights (.safetensors, .pt), notebooks (.ipynb), and archives (.zip, .tar.gz).

Files are stored on a persistent volume. They survive across runs for the same project. Access them at /data/filename in your code.

Reading files in your code
from pathlib import Path
import json
import pandas as pd

# Read a text file
text = Path("/data/report.txt").read_text()

# Read a CSV into a DataFrame
df = pd.read_csv("/data/sales.csv")
print(df.head())

# Read a JSON config
data = json.loads(Path("/data/config.json").read_text())

# List all files
for f in Path("/data").iterdir():
    print(f.name, f.stat().st_size)
Fast downloads
URL files are downloaded directly on the GPU server's connection — much faster than uploading from your laptop. Use Add URL for large datasets whenever possible.

Custom Image (Environment)

By default, your code runs on a Python 3.12 image with PyTorch, transformers, pandas, numpy, matplotlib, opencv, and 40+ common packages pre-installed. You don't need to set up an environment for most tasks.

If you need additional packages, open Custom Image in the left panel:

  • Name — give your environment a name (e.g., "my-llm-env")
  • Base Image — choose a starting point:
    • Python (default) — PyTorch + transformers + 40+ ML packages
    • vLLM — optimized for fast LLM inference. Use this when serving language models.
    • Python + Node — Python plus Node.js 22 LTS for JavaScript tools
    • Python + CUDA — NVIDIA CUDA 12.1 dev image for custom CUDA kernels
    • Minimal — bare Python 3.12. You install everything yourself.
  • Pip Packages — add any packages not in the base image. The platform auto-detects packages from your code.
Build once, use forever
Your environment builds once and persists. After the first build, every run starts with your packages already installed — no reinstalling every time.

The Console — Your Debugging Partner

The right panel. Everything your code prints appears in real time — line by line as it executes, not after the job finishes. This is the feedback loop that makes everything work.

The vibe coding loop:

  • Paste code from Claude → hit Run All
  • Console says FileNotFoundError: /data/report.pdf → wrong path, tell Claude
  • Console says CUDA out of memory → model too big, move up one GPU tier
  • Console says ModuleNotFoundError: No module named 'einops' → add the package to your environment
  • Console says result: Here is the summary... → it works, ship it

You don't need to understand the error. Copy it. Paste it into Claude. Claude fixes it. Paste the fix back. Run again. 2-4 rounds and it's done. The Console IS the teacher.

  • Live streaming — output appears line by line as your code executes
  • Errors — full Python tracebacks with line numbers
  • System messages — "Model loaded on cuda:0", "Downloading model...", "Execution time: 12.3s"
  • Resize — drag the left edge wider for long output, narrower for more code space
  • Clear — wipe output for the next run

GPU Selection — 16GB to 384GB

The strip across the top. 12 buttons. Click one — green highlight means active. That's your hardware for this run. From CPU Only ($0.047/core/hour) to 2× B200 384GB ($12.50/hour). Same notebook. Same code. Different button.

When you select CPU Only, a dropdown appears for core count: 0.25, 0.5, 1, 2, 4, or 8 cores with live hourly price. CPU for scripts, API calls, data processing — no GPU needed.

Smart detection: As you type, the notebook reads your code, detects your model (from LLM(model="...") or from_pretrained("...")), estimates parameter count and VRAM usage, and shows it in the header — "Qwen3-14B (14B) — VRAM: ~31GB". You know if it fits BEFORE you hit Run All.
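
One plausible way to do this kind of detection is with two regular expressions and a size hint taken from the model name. The sketch below is an assumption about how such a detector could work, not the notebook's real logic. `detect_model`, `estimate_fp16_vram_gb`, and the 10% overhead factor are illustrative choices.

```python
import re

# Patterns for the two loading styles mentioned above.
MODEL_PATTERNS = [
    re.compile(r"""LLM\(\s*model\s*=\s*["']([^"']+)["']"""),
    re.compile(r"""from_pretrained\(\s*["']([^"']+)["']"""),
]
# Parameter count as it commonly appears in model names, e.g. "14B" or "7b".
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*[bB]\b")


def detect_model(code: str):
    """Return the first model id found in the code, or None."""
    for pattern in MODEL_PATTERNS:
        match = pattern.search(code)
        if match:
            return match.group(1)
    return None


def estimate_fp16_vram_gb(model_id: str):
    """Estimate FP16 VRAM from the size in the model name, or None if absent."""
    match = SIZE_PATTERN.search(model_id)
    if not match:
        return None
    params_billions = float(match.group(1))
    return params_billions * 2 * 1.10  # 2 bytes per parameter plus ~10% overhead


code = 'llm = LLM(model="Qwen/Qwen3-14B")'
model_id = detect_model(code)
print(model_id, f"~{estimate_fp16_vram_gb(model_id):.0f}GB")
```

Names without a size suffix return None here. A real detector would need another source for the parameter count.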

Pay by the second
GPU spins up when you hit Run All. Spins down when your code finishes. Idle = free. No reserved instances. No monthly minimums. For most tasks, A100 80GB at $0.000694/sec is the sweet spot.
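
The per-second figure is just the hourly price divided by 3,600. A short sketch of the arithmetic, using the prices quoted in this guide; the function names are illustrative, and the multi-core CPU line assumes the per-core rate scales linearly with core count.

```python
def per_second_rate(hourly_rate: float) -> float:
    """Convert an hourly price in dollars to a per-second price."""
    return hourly_rate / 3600


def job_cost(seconds: float, hourly_rate: float) -> float:
    """Cost in dollars of a job billed by the second."""
    return seconds * per_second_rate(hourly_rate)


A100_80GB_HOURLY = 2.50   # from the pricing in this guide
CPU_CORE_HOURLY = 0.047   # per core, per hour

print(f"A100 80GB per second: ${per_second_rate(A100_80GB_HOURLY):.6f}")
print(f"90-second A100 job:   ${job_cost(90, A100_80GB_HOURLY):.4f}")
print(f"4 CPU cores, 1 hour:  ${4 * CPU_CORE_HOURLY:.3f}")
```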

Model Caching — Load Once, Use Forever

First time you load a model from HuggingFace — download takes 2-10 minutes depending on model size. After that, it's cached on a persistent volume shared across ALL your projects.

Second run? Loads in seconds. Load Qwen3-14B in your translation notebook → it's cached → open your summarizer notebook → same model loads instantly. One cache for your entire account. You never download the same model twice.

Output Files — What Your Code Produces

When your code saves files — plt.savefig("chart.png"), image.save("/outputs/result.png"), Path.write_text() — the system captures them automatically. They appear in the sidebar with type icons:

  • 🖼 Images — PNG, JPG, generated charts, AI art
  • 🎬 Video — MP4, generated animations
  • 📊 Data — CSV, JSON, Parquet
  • 📄 Text — TXT, logs, reports
  • 🧠 Models — saved model weights

Click to download. Save to Drive (💾) for permanent storage. These same files come back in response["files"] when your notebook is published as a headless API. Each file has a public URL that needs no auth headers and has no size limit.

How a Job Runs

What happens when you click Run All:

1. Pre-checks: Auth verified. GPU selected. Code exists. No other jobs running (conflict popup if there is: stop or switch).
2. Safety check: Detects infinite loops, dangerous imports, estimates memory usage. Warns you before spending compute.
3. Pre-flight: Verifies all your files exist on the GPU volume. Re-downloads URL files if needed. No surprises.
4. Assemble & dispatch: All cells concatenated into one script. Secrets encrypted. Sent to Firebase → validates credits → dispatches to Modal.
5. Execute: GPU spins up. Your script runs. Console streams output live. Output files collected.
6. Done: GPU spins down. Execution time and cost displayed. Output files appear in sidebar. Billed for exactly the seconds it ran.

One job at a time. Close the tab and come back — the notebook reconnects to your running job and resumes streaming output.

Storage — What Persists

What                              | How long                     | Cost
Your code, secrets, GPU selection | Forever (auto-saved)         | Free
Uploaded files                    | Persists for the project     | Storage rate
Output files                      | Until you clear them         | Storage rate
Model cache                       | Persists across ALL projects | Included
Idle (no compute running)         | -                            | $0

Notebook Import

Drag any .ipynb file (Jupyter/Colab notebook) onto the page. SeqPU extracts the code cells and gives you two options:

  • Replace current cells — swaps your existing cells with the imported ones
  • Create new project — creates a fresh project with the imported cells

You can also keep the cell outputs from the original notebook for reference.

Auto-Save

Everything saves automatically. Your cells, secrets, GPU selection, environment choice, files — all persisted. Close the tab, come back tomorrow, it's exactly where you left it. Projects can be organized into folders from the Dashboard.

Secrets

Never put API keys in cells. Use the Secrets panel in the left sidebar. Secrets are encrypted at rest, never shown in logs, and injected as environment variables when your code runs.

Accessing a secret
import os

# Set these in the Secrets panel — never hardcode them
api_key = os.environ["OPENAI_API_KEY"]
hf_token = os.environ["HF_TOKEN"]
database_url = os.environ["DATABASE_URL"]

From Notebook to Product

When your code works — when the Console shows the output you want — click Publish in the header. Your notebook becomes a product:

  • Headless API (section 06) — other systems call it, you charge per call with markup
  • UI Site (section 07) — visitors use it in a browser, you charge per use or subscription
  • Telegram Bot (section 05) — users message it from their phone, you charge per subscriber
  • Scheduled Job (section 08) — runs automatically on a cron, monitors, reports, alerts

Every notebook is a product waiting to happen. Write it, test it, refine it with the Console, publish it, start charging. The notebook is the workshop. The published tool is the product. The margin is yours.

Democratizing access to innovation
The same compute that powers OpenAI, Anthropic, and Google is available to you — T4 to 2×B200, 16GB to 384GB — pay by the second, publish with a button click, sell to the world. You don't need a data center. You need an idea and a notebook.
03 — Models & GPUs

Right model.
Right GPU.
Right cost.

Open source AI models are the same technology as the APIs you're used to — same transformer architecture, same attention mechanism, same algorithms. The difference: you can see them, control them, run them yourself. An LLM is not a black box. It's a function — text in, text out. SeqPU gives you 16GB to 384GB of GPU memory with a click of a button. No procurement. No setup. No servers.

CPU Only
T4 16GB
L4 24GB
A10G 24GB
L40S 48GB
A100 40GB
A100 80GB
H100 80GB
H200 141GB
B200 192GB
2× H200 282GB
2× B200 384GB
CPU Only: Free tier · $0.047/hr · scripts, APIs
A100 80GB: Sweet spot · $2.50/hr · 90% of use cases
2× B200 384GB: No limits · $12.50/hr · 384GB VRAM

Why Run a Model Yourself

  • Privacy — your data never touches a third party. Medical records, legal documents, proprietary IP stays on your GPU.
  • Cost at scale — no per-token fees. At 10,000 calls/day, a 32B model on A100 costs ~$60/day. Same volume through Claude or GPT-4: $300-1,500/day.
  • Control — no rate limits, no content filters you didn't choose, no model deprecation. Run any model the day it drops on HuggingFace.
  • Speed — no network round-trip to an API server. Your model is right there on the GPU.
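
The cost-at-scale figures above can be reproduced with a few lines of arithmetic. This sketch assumes the GPU stays up around the clock to serve the traffic and uses the $0.03 to $0.15 per-request API range quoted in this guide's pipeline comparison. The variable names are illustrative.

```python
A100_HOURLY = 2.50        # A100 80GB price from this guide
CALLS_PER_DAY = 10_000

# Assumption: the GPU runs 24 hours a day to serve the traffic.
gpu_daily = A100_HOURLY * 24
gpu_per_call = gpu_daily / CALLS_PER_DAY

# Per-request API price range quoted in this guide.
api_low, api_high = 0.03, 0.15
api_daily_low = api_low * CALLS_PER_DAY
api_daily_high = api_high * CALLS_PER_DAY

print(f"Self-hosted: ${gpu_daily:.0f}/day (${gpu_per_call:.3f} per call)")
print(f"API:         ${api_daily_low:.0f}-${api_daily_high:.0f}/day")
```

If traffic is bursty and the GPU spins down between calls, the self-hosted figure drops further, since idle time is billed at $0.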

The LLM Is Not a Black Box

Every AI API you've ever used — ChatGPT, Claude, Gemini — runs the same algorithm as the open source models here. The transformer architecture is published research. The attention mechanism is matrix multiplication. The generation loop is: predict next token, append, repeat. That's it.

An LLM is a function. Text in, text out. Like a sort algorithm processes a list, an LLM processes a sequence of tokens. The weights are a downloadable file. The inference code is open source. The only thing the API providers add is hosting and a bill. Open source gives you the algorithm itself — you run it on your hardware, at your precision, chained however you want.
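
The generation loop is small enough to show in full. The sketch below swaps the neural network for a toy lookup table (a stand-in, not a real model) so the loop itself is visible: predict the next token, append it, repeat until an end token or a length limit.

```python
# Toy "model": a hypothetical lookup table mapping the last token to the next.
# A real LLM replaces this table with a neural network that scores every token.
NEXT_TOKEN = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "<end>",
}


def predict_next(tokens):
    """Predict the next token from the sequence so far (toy: last token only)."""
    return NEXT_TOKEN.get(tokens[-1], "<end>")


def generate(prompt_tokens, max_new_tokens=10):
    """Greedy generation loop: predict next token, append, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        token = predict_next(tokens)
        if token == "<end>":
            break
        tokens.append(token)
    return tokens


print(generate(["<start>"]))  # ['<start>', 'the', 'cat', 'sat']
```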

How Models Work on Hardware

A model is a file full of numbers — billions of them. Those numbers (parameters) need to fit in GPU memory (VRAM). How precisely you store each number determines how much space it takes:

  • FP16/BF16 (16-bit half precision, the usual unquantized format) — 2 bytes per number. A 7B model = ~15GB. Maximum quality.
  • INT8/FP8 (8-bit) — 1 byte per number. Same model = ~8GB. Slight quality trade-off, hard to notice.
  • INT4/AWQ/GPTQ (4-bit) — 0.5 bytes per number. Same model = ~4GB. Fine for 95% of tasks. Quarter the VRAM.

Plus 10-20% overhead for the KV cache — the model's working memory while it generates text token by token. Longer conversations = more cache = more VRAM. vLLM manages this automatically.
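
How much the cache grows with context can be estimated from the model's shape: each token stores one key and one value vector per layer and per KV head. The sketch below uses that standard formula. The layer count, head count, and head size are hypothetical values chosen only for illustration, and the result is for a single sequence.

```python
def kv_cache_bytes(tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """KV-cache size: a key and a value vector per layer, per KV head, per token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return tokens * per_token


# Hypothetical model shape, chosen only for illustration.
config = dict(num_layers=40, num_kv_heads=8, head_dim=128)

for tokens in (1_024, 4_096, 32_768):
    gib = kv_cache_bytes(tokens, **config) / 2**30
    print(f"{tokens:>6} tokens -> {gib:.2f} GiB of KV cache")
```

Serving several requests at once multiplies this by the number of concurrent sequences, which is why long contexts and high concurrency both push you up a GPU tier.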

Model | FP16    | INT8    | INT4/AWQ
3B    | ~7 GB   | ~3.5 GB | ~2 GB
7B    | ~15 GB  | ~8 GB   | ~4 GB
14B   | ~31 GB  | ~15 GB  | ~8 GB
32B   | ~70 GB  | ~35 GB  | ~18 GB
70B   | ~154 GB | ~77 GB  | ~39 GB
405B  | ~891 GB | ~446 GB | ~223 GB
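
The figures in the table above follow closely from one formula: parameters times bytes per parameter, plus overhead. A minimal sketch, assuming the low end (10%) of the overhead range; `estimate_vram_gb` is an illustrative name, and real usage varies with context length and batch size.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
OVERHEAD = 1.10  # assumes the low end of the 10-20% KV-cache overhead


def estimate_vram_gb(params_billions: float, precision: str) -> float:
    """Estimate VRAM in GB: parameters x bytes per parameter x overhead."""
    return params_billions * BYTES_PER_PARAM[precision] * OVERHEAD


for size in (7, 14, 32, 70):
    row = ", ".join(
        f"{p.upper()} ~{estimate_vram_gb(size, p):.1f} GB"
        for p in ("fp16", "int8", "int4")
    )
    print(f"{size}B: {row}")
```
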
The sweet spot
INT4/AWQ is production-grade. A 70B model that needs 154GB in FP16 fits on an A100 80GB in INT4 (~39GB) for $2.50/hr instead of $4.54/hr. Most users can't tell the difference in output.
Prompt for Claude
"I have a customer support chatbot that needs to handle 500 messages per day. Each message is under 200 words and the response should be under 300 words. I want to run it on SeqPU using an open source model. Tell me which model to use, which quantization (FP16, INT8, or INT4), which GPU to select in SeqPU, and estimate my daily cost in dollars."

Picking the Right GPU

Not a spec sheet. A decision:

  • CPU ($0.047/hr), no GPU needed: API calls to Claude/GPT/Gemini, data processing, web scraping, email, orchestration. Most bots start here.
  • T4 16GB ($0.59/hr), entry level: 3B-7B INT4. Classification, routing, extraction, Whisper transcription, simple chatbots. The cheapest real GPU.
  • L4 24GB ($0.80/hr) · A10G 24GB ($1.10/hr): 7B FP16, 14B INT4. General purpose. A10G is 40% faster for training. L4 is better value for inference.
  • L40S 48GB ($1.95/hr), mid-range: 14B FP16, 32B INT4. Image generation sweet spot (Stable Diffusion, FLUX).
  • A100 80GB ($2.50/hr), THE DEFAULT CHOICE: 32B FP16, 70B INT4/AWQ. Handles 90% of use cases. When in doubt, pick this one.
  • H100 80GB ($3.95/hr), speed upgrade: Same 80GB as A100 but ~2x faster compute (Hopper architecture). Pick when speed matters more than cost.
  • H200 141GB ($4.54/hr), unquantized large models: 70B FP16. Research, medical, legal, financial, when quality cannot be compromised.
  • B200 192GB ($6.25/hr), largest single GPU: 70B FP16 with huge context, 100B+ INT4. Bleeding edge single-GPU performance.
  • 2× H200 282GB ($9.08/hr), multi-GPU: 100B-200B models, 405B INT4. Tensor parallelism across two GPUs.
  • 2× B200 384GB ($12.50/hr), no limits: 405B INT4 with headroom for long context. By the VRAM table above, 405B in FP16 (~891GB) does not fit. The largest models on earth.
Not sure?
Start with A100 80GB. It's the Honda Civic of GPUs — reliable, handles everything, no surprises. If your model doesn't fit, the console tells you — move up one tier.
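
The "move up one tier" rule can be written as a lookup: take the VRAM estimate and pick the lowest-priced tier that covers it. A minimal sketch using the sizes and prices listed above. A100 40GB is left out because this guide does not give its price, and `cheapest_gpu_that_fits` is an illustrative name.

```python
# (name, VRAM in GB, dollars per hour), taken from the tiers listed above.
GPU_TIERS = [
    ("T4 16GB", 16, 0.59),
    ("L4 24GB", 24, 0.80),
    ("A10G 24GB", 24, 1.10),
    ("L40S 48GB", 48, 1.95),
    ("A100 80GB", 80, 2.50),
    ("H100 80GB", 80, 3.95),
    ("H200 141GB", 141, 4.54),
    ("B200 192GB", 192, 6.25),
    ("2× H200 282GB", 282, 9.08),
    ("2× B200 384GB", 384, 12.50),
]


def cheapest_gpu_that_fits(required_vram_gb: float):
    """Return the lowest-priced tier whose VRAM covers the requirement, or None."""
    candidates = [tier for tier in GPU_TIERS if tier[1] >= required_vram_gb]
    return min(candidates, key=lambda tier: tier[2]) if candidates else None


for need in (8, 31, 70, 223):
    tier = cheapest_gpu_that_fits(need)
    print(f"~{need}GB needed -> {tier[0]} at ${tier[2]:.2f}/hr")
```

This picks on memory and price only. Leaving extra headroom for long contexts, or paying more for speed, is a judgment call the lookup does not make.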

What Fits Where — Quick Reference

Qwen3-32B on an A100 80GB? Llama-3.1-70B-AWQ on a T4? Check the grid.

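Model     | T4 16GB | L4 24GB | L40S 48GB | A100 40GB | A100 80GB | H100 80GB | H200 141GB | B200 192GB
3B INT4   | yes     | yes     | yes       | yes       | yes       | yes       | yes        | yes
7B INT4   | yes     | yes     | yes       | yes       | yes       | yes       | yes        | yes
7B FP16   | -       | yes     | yes       | yes       | yes       | yes       | yes        | yes
14B INT4  | yes     | yes     | yes       | yes       | yes       | yes       | yes        | yes
14B FP16  | -       | -       | yes       | yes       | yes       | yes       | yes        | yes
32B INT4  | -       | tight   | yes       | yes       | yes       | yes       | yes        | yes
32B FP16  | -       | -       | -         | -         | yes       | yes       | yes        | yes
70B INT4  | -       | -       | -         | -         | yes       | yes       | yes        | yes
70B FP16  | -       | -       | -         | -         | -         | -         | yes        | yes
405B INT4 | 2×GPU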

Loading Models — vLLM vs Transformers

vLLM — the production standard. Handles batching, KV cache management (PagedAttention), and quantization automatically. A 32B model on A100 80GB with vLLM can handle 10+ concurrent requests. Select "vLLM" as your base image.

vLLM — fast inference (select vLLM base image)
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-32B", max_model_len=4096)
params = SamplingParams(temperature=0.7, max_tokens=1024)

result = llm.generate(["What is the capital of France?"], params)
print(result[0].outputs[0].text)

Transformers — the universal loader. Supports every model architecture. Use for fine-tuning, vision, audio, custom pipelines, or anything vLLM doesn't support.

Transformers — full flexibility (default base image)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Explain photosynthesis:", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

INT4/AWQ — look for "-AWQ" or "-GPTQ" suffix on HuggingFace. 4x less VRAM. A 70B model fits on an A100 80GB.

70B on A100 80GB — quantized
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-72B-Instruct-AWQ")
params = SamplingParams(max_tokens=1024)
result = llm.generate(["Write a business proposal for..."], params)
print(result[0].outputs[0].text)

When to use which:

  • Serving a text model → vLLM
  • Fine-tuning → Transformers
  • Vision model (Qwen-VL, Pixtral) → Transformers
  • Image generation (Stable Diffusion, FLUX) → Diffusers
  • Audio (Whisper, Bark, Kokoro) → Transformers
  • Embeddings → Sentence Transformers (CPU usually enough)

Mixtures of Compute Experts — The Real Power

A 32B model with a well-crafted prompt can beat a 480B model with a lazy prompt. A pipeline of small models can cost less than one big model call and often produces better results. Each step is purpose-built, right-sized, and runs on exactly the hardware it needs.

3-stage pipeline — right model for each job
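import seqpu

# Example inputs (placeholders): replace with your real message and chat ID
message = "My invoice is wrong and I was charged twice."
telegram_chat_id = "YOUR_CHAT_ID"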

# Stage 1: Classify (3B on T4, $0.0002)
category = seqpu.tools.call("classifier", {"text": message})

# Stage 2: Only hard cases hit the big model (32B on A100, $0.005)
if category["result"] == "complex":
    analysis = seqpu.tools.call("deep-analyzer", {"text": message})
else:
    analysis = seqpu.tools.call("quick-responder", {"text": message})

# Stage 3: Format and send (CPU, $0.00003)
seqpu.notify(analysis["response"], chat_id=telegram_chat_id, platform="telegram")
This is the pattern
The smartest system isn't the one with the biggest model. It's the one that uses the right model for each step. You're not calling a black box. You're designing sequences of compute — each step an algorithm you chose, on hardware you picked, processing data that never leaves your server.
Prompt for Claude
"Design a 3-tool pipeline on SeqPU for processing customer support emails. Tool 1: a 3B classifier on T4 that reads the email text and outputs 'urgent', 'normal', or 'spam' as a single word. Tool 2: a 32B response drafter on A100 80GB that takes the email and writes a professional reply. Tool 3: a CPU script that takes the drafted reply and sends it via smtplib to the customer's email address. Write all three as separate SeqPU notebooks I can publish as headless APIs, plus the seqpu.tools.call() orchestrator that chains them together."
Approach                                         | Cost per request
Pipeline: 3B classify + 32B analyze + CPU format | $0.006 (hard case) · $0.0002 (easy case)
Single 70B model for everything                  | $0.01 regardless of difficulty
Claude / GPT-4 API call                          | $0.03 – $0.15 per request
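
What the pipeline costs on average depends on how many requests are hard. A short sketch of the blended cost using the per-request figures above. The traffic mixes are hypothetical, and `blended_cost` is an illustrative name.

```python
HARD_CASE_COST = 0.006    # 3B classify + 32B analyze + CPU format
EASY_CASE_COST = 0.0002   # easy-case figure from the comparison above
SINGLE_70B_COST = 0.01    # one 70B model for every request


def blended_cost(hard_fraction: float) -> float:
    """Average cost per request when only a fraction of requests are hard."""
    return hard_fraction * HARD_CASE_COST + (1 - hard_fraction) * EASY_CASE_COST


for hard_fraction in (0.1, 0.2, 0.5):   # hypothetical traffic mixes
    cost = blended_cost(hard_fraction)
    print(f"{hard_fraction:.0%} hard: ${cost:.5f}/request "
          f"({SINGLE_70B_COST / cost:.1f}x cheaper than the single 70B model)")
```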

More pipeline patterns:

  • Research: CPU scrapes web → 7B extracts facts → 32B synthesizes answer → CPU sends to Telegram
  • Content: 14B writes draft → 7B checks grammar → image model generates cover → CPU publishes to CMS
  • Support: 3B classifies intent → routes to the right 14B specialist (billing/technical/returns)
  • Documents: CPU extracts PDF text → 7B summarizes → embedding model indexes for search

The Model Landscape — What's Available Right Now

There are thousands of models on HuggingFace — from 22 million parameters to 685 billion. Here's what to use for what.

General Assistant / Chatbot

The core use case. A model that talks, answers questions, writes, reasons.

  • Budget ($0.59/hr): Qwen3-8B or Llama-3.1-8B on T4 in INT4. Surprisingly good for Q&A and support.
  • Production ($2.50/hr): Qwen3-32B or Qwen2.5-72B-AWQ on A100 80GB. GPT-4 class quality. This is where most serious bots run.
  • Maximum ($4.54-9.08/hr): DeepSeek V3.2 (685B), Qwen3.5-397B, Llama 4 Maverick on H200/2×H200.

Trade-offs: Qwen3 leads multilingual (119 languages and dialects). Llama has the largest community and most fine-tunes. DeepSeek leads reasoning benchmarks. Pick based on your task.

Production chatbot — Qwen3-32B on A100 80GB
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.7, max_tokens=1024)

result = llm.generate(["Explain quantum computing to a 10 year old"], params)
print(result[0].outputs[0].text)
Prompt for Claude
"Write a SeqPU notebook cell that loads Qwen3-32B using vLLM on an A100 80GB. The cell should read a PDF from /data/report.pdf, extract the text using PyMuPDF, summarize it in 5 bullet points, and print the summary. Use the vLLM base image. Include the pip install for PyMuPDF if needed."

Code Generation

  • DeepSeek-Coder — dedicated coding model, multiple sizes
  • Qwen3 — excellent at code across all sizes
  • GLM-4-9B — strong tool integration, runs on L4
  • gpt-oss-120B — OpenAI's first open-weight language model since GPT-2, fits on a single H100

Note: In 2026, open source code models beat proprietary ones. MiMo-V2-Flash exceeds GPT-5 on SWE-bench.

Vision — Images, PDFs, Charts, Screenshots

Send an image, get structured data back. Photo of a receipt → amounts. Chart → numbers. Screenshot → text. These use transformers, not vLLM.

  • Tiny: DeepSeek-VL 1.3B — runs on T4, basic image understanding
  • Small: Gemma 3 4B, MiniCPM-o 2.6 — run on L4, documents and charts
  • Medium: Pixtral 12B, Molmo 7B — multi-image understanding
  • Large: Qwen3-VL (256K context — processes entire books and hour-long videos)
Vision model — analyze an image
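from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

image = Image.open("/data/receipt.jpg")
# Qwen2.5-VL expects a chat-formatted prompt containing an image placeholder
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract all line items and totals as JSON"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Move inputs to the same device as the model before generating
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
answer = output[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(answer, skip_special_tokens=True)[0])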
Prompt for Claude
"Write a SeqPU notebook that uses Qwen2.5-VL-7B-Instruct to process every image in /data/receipts/. For each image, extract the vendor name, date, line items, and total amount as JSON. Save all results to /data/receipts_extracted.json. Use transformers, not vLLM. Run on L40S 48GB."

Image Generation

Text to image. Runs on L40S ($1.95/hr) or A100 ($2.50/hr).

  • FLUX.2 — current standard. Production quality in 4.5 seconds. Best photorealism.
  • Stable Diffusion 3.5 / SDXL — largest ecosystem. Thousands of community fine-tunes, LoRAs, ControlNets.
  • Z-Image-Turbo — fastest generation for rapid prototyping.

Trade-off: FLUX.2 produces higher quality but Stable Diffusion has 10x the community tooling. Start with SD if you're new. Move to FLUX for production.

Generate an image — Stable Diffusion
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe("a futuristic city at sunset, cyberpunk style").images[0]
image.save("/data/output.png")
Prompt for Claude
"Write a SeqPU notebook that generates product photos using Stable Diffusion XL on an L40S. It should take a product description from a variable called 'description', generate 4 variations at 1024x1024 with different seeds, and save them as /data/product_1.png through /data/product_4.png. Use the diffusers library with torch.float16."

Audio — Speech to Text

Transcribe audio to text. Whisper is pre-installed — just import and go.

  • Whisper Large V3 (1.5B) — 99+ languages, runs on T4. An hour of audio = ~$0.02.
  • NVIDIA Canary 2.5B — lowest error rate on benchmarks.
  • Parakeet TDT — fastest for real-time transcription.
Transcribe audio — Whisper on T4
import whisper

model = whisper.load_model("large-v3")
result = model.transcribe("/data/meeting.mp3")
print(result["text"])
Prompt for Claude
"Write a SeqPU notebook with 2 cells. Cell 1: use Whisper large-v3 on a T4 to transcribe /data/meeting.mp3 and save the full transcript to /data/transcript.txt. Cell 2: load Qwen3-14B with vLLM on an A100 80GB and read /data/transcript.txt, then extract every action item, decision made, and follow-up task. Print them as a formatted list."

Audio — Text to Speech

Generate speech from text. Kokoro runs on CPU, so your bot can talk for fractions of a cent.

  • Kokoro (82M) — runs on CPU for fractions of a cent. Surprisingly good quality.
  • Bark — multi-language, multi-speaker. Pre-installed in SeqPU's default image.
  • VibeVoice-1.5B — up to 90 minutes long-form, 4 speakers.
  • Dia — dialogue with laughter, sighs, emotions. Great for podcasts.

Embeddings & RAG

  • all-MiniLM-L6-v2 (22M) — runs on CPU, 5-14K sentences/second. The go-to for semantic search.
  • BGE-M3 — multilingual, dense + sparse retrieval.
  • Qwen3-Embedding-8B — tops the MTEB leaderboard.

Note: Embedding models are tiny. Most run on CPU. No GPU needed.
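
The retrieval step behind semantic search and RAG is a nearest-neighbor lookup over embedding vectors. Here is a minimal sketch in plain Python, assuming toy 3-dimensional vectors in place of real model embeddings. The `cosine` and `top_k` helpers and the sample documents are illustrative only, not part of SeqPU or any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy vectors standing in for embeddings from a model such as all-MiniLM-L6-v2
docs = ["refund policy", "shipping times", "return an item"]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "how do I get my money back"

best = top_k(query_vec, doc_vecs, k=2)
print([docs[i] for i in best])
```

In a real pipeline the vectors would come from the embedding model, and the top matches would be pasted into the prompt of a larger model.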

Translation

  • Qwen3.5 — 200+ languages and dialects.
  • Llama 3.1-8B — efficient multilingual on T4.

Deep Research

  • Search-R1 v0.3 (Qwen2.5-32B base) on H200 — searches the web, reads results, reasons, searches again.
  • See the full Deep Research Agent example in section 05.

The Scale — 22 Million to 685 Billion

22M
Sentence embedding (all-MiniLM-L6-v2)
Runs on CPU. 5,000+ sentences/second. Semantic search, RAG, clustering.
82M
Kokoro TTS
Runs on CPU. Generates human-quality speech for fractions of a cent.
1B
Gemma 3 1B, Llama 3.2 1B
Runs on T4. 2,500 tokens/sec. Basic Q&A, classification, edge deployment.
3B
SmolLM3-3B, Phi-4-mini (3.8B)
Runs on T4. A 2026 3B model matches what 2024's 30B models could do.
7B
Qwen3-8B, Llama-3.1-8B, Mistral-7B
Runs on T4 (INT4) or L4 (FP16). Production chatbots, code, translation.
32B
Qwen3-32B, Qwen2.5-32B
Runs on A100 80GB. GPT-4 class reasoning. The sweet spot for quality vs cost.
70B
Qwen2.5-72B, Llama-3.1-70B
A100 80GB with INT4. Serious production. DeepSeek-R1 for dedicated reasoning.
120B
gpt-oss-120B (OpenAI open-weight)
Single H100. Near parity with o4-mini on core reasoning benchmarks.
400B
Llama 4 Maverick, Qwen3.5-397B
MoE — only 17B active per query. 2×H200. Frontier quality.
685B
DeepSeek V3.2
Top of every reasoning benchmark. Multi-GPU. The ceiling.
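
The GPU pairings above follow from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A rough sketch, assuming weights only (KV cache and activations need extra headroom on top). The `weights_gb` helper is hypothetical, written for this illustration.

```python
# Bytes needed to store one parameter at common precisions
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weights_gb(params_billions, precision="fp16"):
    """Approximate weight memory in GB (1 GB = 1e9 bytes); excludes KV cache."""
    return params_billions * BYTES_PER_PARAM[precision]

for name, size, precision in [
    ("Qwen3-8B", 8, "int4"),
    ("Qwen3-8B", 8, "fp16"),
    ("Qwen3-32B", 32, "bf16"),
    ("Llama-3.1-70B", 70, "int4"),
]:
    print(f"{name} {precision}: ~{weights_gb(size, precision):.0f} GB of weights")
```

An 8B model at INT4 needs about 4 GB, which fits a T4. At FP16 it needs about 16 GB, which is why the L4 24GB is the safer choice. A 32B model in BF16 needs about 64 GB, which fits an A100 80GB.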

Or Just Use an API

You don't have to run a model locally. Call Claude, GPT-4, Gemini, Groq, Mistral, or any other AI API from your SeqPU notebook. It runs on CPU for $0.047 per core-hour. Use SeqPU as the harness: your code, your secrets, your Telegram bot, your scheduling. The AI provider is your choice.

Claude on CPU — no GPU needed
from anthropic import Anthropic
import os

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.messages.create(
    model="claude-sonnet-4-20250514", max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this document..."}]
)
print(response.content[0].text)
CPU only
Cost: ~$0.00003 for SeqPU compute. You pay the API provider for the AI call. Most SeqPU bots start here — GPU only enters when you want to run the model yourself.
Prompt for Claude
"Write a SeqPU Telegram bot that runs on CPU. When it receives a message in the 'task' variable, it calls the Claude API (model claude-sonnet-4-20250514) with that message, then sends the response back to Telegram using seqpu.notify(answer, chat_id=telegram_chat_id, platform='telegram'). The API key is stored in SeqPU secrets as ANTHROPIC_API_KEY — use os.environ to read it. The variables task, context, and telegram_chat_id are already injected. I want to publish this as a headless API with those three inputs."

Model Caching

First download takes minutes. After that, it's cached on a persistent volume — second run loads in seconds. Cache persists across all your projects.

Zero cold start
Use a Snapshot (Code Config) to save the loaded model in GPU memory. Next run: 2 seconds to full readiness.

GPU Pricing

Pay per second. No minimums. The meter starts when your code runs and stops when it finishes.

GPU           VRAM      Per Second             Per Hour
CPU Only      -         $0.0000131/core/sec    $0.047/core/hr
T4 16GB       16 GB     $0.000164/sec          $0.59/hr
L4 24GB       24 GB     $0.000222/sec          $0.80/hr
A10G 24GB     24 GB     $0.000306/sec          $1.10/hr
L40S 48GB     48 GB     $0.000542/sec          $1.95/hr
A100 40GB     40 GB     $0.000583/sec          $2.10/hr
A100 80GB     80 GB     $0.000694/sec          $2.50/hr
H100 80GB     80 GB     $0.001097/sec          $3.95/hr
H200 141GB    141 GB    $0.001261/sec          $4.54/hr
B200 192GB    192 GB    $0.001736/sec          $6.25/hr
2× H200       282 GB    $0.002522/sec          $9.08/hr
2× B200       384 GB    $0.003472/sec          $12.50/hr
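
Per-second billing makes run costs easy to estimate. The sketch below copies the VRAM sizes and per-second rates from the pricing table. The `run_cost` and `cheapest_fit` helpers are hypothetical conveniences, not SeqPU functions.

```python
# (VRAM in GB, dollars per second), copied from the pricing table
GPUS = {
    "T4 16GB":    (16,  0.000164),
    "L4 24GB":    (24,  0.000222),
    "A10G 24GB":  (24,  0.000306),
    "L40S 48GB":  (48,  0.000542),
    "A100 40GB":  (40,  0.000583),
    "A100 80GB":  (80,  0.000694),
    "H100 80GB":  (80,  0.001097),
    "H200 141GB": (141, 0.001261),
    "B200 192GB": (192, 0.001736),
    "2x H200":    (282, 0.002522),
    "2x B200":    (384, 0.003472),
}

def run_cost(gpu, seconds):
    """Dollar cost of a run billed by the second."""
    return GPUS[gpu][1] * seconds

def cheapest_fit(vram_needed_gb):
    """Name of the cheapest GPU whose VRAM covers the requirement, or None."""
    fits = [(rate, name) for name, (vram, rate) in GPUS.items()
            if vram >= vram_needed_gb]
    return min(fits)[1] if fits else None

print(f"8 s on A100 80GB: ${run_cost('A100 80GB', 8):.6f}")
print(f"Cheapest GPU with 20 GB free: {cheapest_fit(20)}")
print(f"Cheapest GPU with 64 GB free: {cheapest_fit(64)}")
```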
Day One Access
When a new model drops on HuggingFace, you run it the same day. No waitlists. No API approval. No vendor timelines. Load it in Cell 1, select your GPU, hit Run All.
04 — Write or Vibe Code

You're not coding.
You're describing.

AI writes better code than most engineers. You don't need to learn Python. You don't need to understand GPUs. You need to know what you want — and say it out loud. Open Claude, describe the thing you need, paste it into SeqPU, press Run All. That's the whole process.

Plain English

You don't need to know Python. You don't need to know what vLLM is. You don't need to know what a tensor is. You just need to know what you want.

1
Open Claude (or ChatGPT, Gemini, whatever)
Any AI assistant works. Describe what you want in plain English. Be specific about your files, your situation, what you want to happen.
2
Claude writes every line of code
You don't write anything. Claude generates a complete, working script based on your description.
3
Copy it. Paste it into SeqPU.
Open the notebook. Paste Claude's output into a cell. That's it.
4
Pick your GPU. Hit Run All.
Select a GPU from the strip (or CPU if no AI model needed). Click Run All. Watch the Console — your result appears in seconds.

Make Money

Prompt for Claude
"Write a SeqPU notebook that takes a product photo from /data/product.jpg, generates 5 professional product descriptions for Amazon, Etsy, and eBay, and saves them to /data/listings.txt. Use the vision model Qwen2.5-VL-7B on L40S. I want to publish this as a UI site where anyone can upload a product photo and get listings back."
Prompt for Claude
"Write a SeqPU headless API tool that takes a blog topic as input, generates a 1500-word SEO-optimized article using Qwen3-32B on A100 80GB, and returns the article. I want to charge $0.50 per article with a 30% markup on compute."
Prompt for Claude
"Write a SeqPU notebook that reads 100 customer reviews from /data/reviews.csv, generates a personalized response to each one using Qwen3-14B, and saves all responses to /data/responses.csv. I'm going to sell this as a service to restaurant owners for $99/month."

Run Your Business

Prompt for Claude
"I run a law firm. Write a SeqPU notebook that takes a contract PDF from /data/contract.pdf, reads every page, identifies risky clauses, flags missing standard protections, and generates a summary with recommendations. Use Qwen3-32B on A100 80GB."
Prompt for Claude
"Write a SeqPU notebook that connects to my Shopify store API (key in secrets as SHOPIFY_KEY), pulls all orders from the last 7 days, summarizes return reasons and common complaints, and sends me the summary on Telegram via seqpu.notify(). Run on CPU — no GPU needed."

Personal Life

Prompt for Claude
"I'm a parent. Write a SeqPU Telegram bot that helps me plan weekly meals for a family of 4 with a $150 budget. When I message it my dietary restrictions and what's in my fridge, it generates a meal plan with recipes and a grocery list. Use Qwen3-8B on T4. Include seqpu.notify() to send the response back."
Prompt for Claude
"I'm learning Spanish. Write a SeqPU Telegram bot where I send an English sentence and it translates it to Spanish, explains the grammar, and gives me 3 practice sentences to try. Use Qwen3-8B on T4. Include the task, context, and telegram_chat_id variables."
Prompt for Claude
"Write a SeqPU notebook that reads my journal entries from /data/journal/ (one .txt file per day), finds patterns in my mood and habits, and generates a weekly wellness report with insights and suggestions. Use Qwen3-14B on A100 80GB. Save to /data/weekly_report.txt."

Automate Everything

Prompt for Claude
"Write a SeqPU notebook that runs every morning at 7am. It checks 5 competitor websites for price changes using web scraping, compares to my prices in /data/my_prices.csv, and sends me a Telegram alert via seqpu.notify() with any differences. Run on CPU — just scraping and comparison, no AI needed."
Prompt for Claude
"Write a SeqPU notebook that monitors a stock price API every 15 minutes, runs sentiment analysis on the latest news headlines using Qwen3-8B on T4, and sends me a Telegram alert if sentiment drops below 0.3. Use STOCK_API_KEY and NEWS_API_KEY from SeqPU secrets."

Healthcare

Prompt for Claude
"I'm a therapist. Write a SeqPU notebook that reads my session notes from /data/notes.txt, removes all patient names and identifying information, then generates a clinical summary with treatment recommendations. Use Qwen3-32B on A100 80GB. Save to /data/clinical_summary.txt. Nothing should leave this server — privacy is critical."

Monetize it: Publish as a UI site. Other therapists pay $49/month to use it. Data never leaves the GPU — real privacy, not a promise.

Education

Prompt for Claude
"I'm a professor. Write a SeqPU notebook that reads a student's essay from /data/essay.txt, grades it on a rubric (thesis, evidence, structure, grammar) with a score out of 100 for each category, and writes detailed feedback the student can actually use to improve. Use Qwen3-14B on A100 80GB."

Monetize it: Publish as a UI site for your department. Or sell to other schools at $29/teacher/month.

Content Creators

Prompt for Claude
"I'm a YouTuber. Write a SeqPU notebook that takes my video transcript from /data/transcript.txt and generates: a click-worthy title, a 200-word description with keywords, 30 hashtags, 5 tweet threads to promote it, and a blog post version. Use Qwen3-14B. Save each output to a separate file in /data/content/."
Prompt for Claude
"Write a SeqPU notebook that generates a week's worth of LinkedIn posts for a SaaS founder. I give it my company description and recent wins in /data/context.txt. Each post should have a hook, story, and CTA. Different tone each day — Monday motivational, Wednesday tactical, Friday personal. Use Qwen3-8B on T4."

Monetize it: Publish as a headless API. Other creators call it. Charge $0.10 per generation with 25% markup.

Freelancers

Prompt for Claude
"Write a SeqPU notebook that reads a client brief from /data/brief.txt and generates a complete project proposal with scope, timeline, deliverables, and pricing. Use Qwen3-14B. Save to /data/proposal.txt. I want to publish this as a UI site where I paste briefs and get proposals back."

Monetize it: That proposal generator? Publish it. Other freelancers pay $5 per proposal. You built it in 10 minutes.

E-commerce

Prompt for Claude
"Write a SeqPU notebook that takes 50 product images from /data/products/ and generates alt text, meta descriptions, and social media captions for each one. Use the vision model Qwen2.5-VL-7B on L40S. Save results as /data/product_copy.json."

Monetize it: Publish as a UI site. Etsy and Amazon sellers upload product photos, get listings back, pay per use.

Trades

Prompt for Claude
"I'm an electrician. Write a SeqPU Telegram bot where I send a photo of a wiring problem and it identifies the issue, suggests the fix, lists the parts I need, and estimates the time. Use Qwen2.5-VL-7B on L40S. Include seqpu.notify() to send the answer back with the parts list."
Prompt for Claude
"I'm a mechanic. Write a SeqPU Telegram bot where I describe a car symptom (like 'grinding noise when braking, 2019 Honda Civic') and it gives me the top 3 most likely causes, what to check first, estimated repair cost range, and whether it's safe to drive. Use Qwen3-8B on T4."

Monetize it: Publish the bot on Telegram. Other tradespeople subscribe for $15/month. Your expertise, automated.

Families

Prompt for Claude
"Write a SeqPU notebook that reads my kids' school homework assignments from photos in /data/homework/ and checks their answers, explains what they got wrong, and generates practice problems on the topics they struggled with. Use Qwen2.5-VL-7B for reading the photos and Qwen3-14B for the explanations."

Monetize it: Publish the homework helper as a UI site. Other parents use it. Free tier for 5 checks/day, $9.99/month unlimited.

That's all there is to it
Copy any prompt above. Paste it into Claude. Claude writes the code. Paste the code into SeqPU. Hit Run All. You just built something that would take a developer days — in 5 minutes. Then publish it and start charging. You don't need to understand a single line of the code. You just need to know what you want.

Agents — Your AI Does the Work

An agent is just a script that makes decisions. It reads input, decides what to do, takes action, checks the result, and decides what to do next. That's it. It's a loop. The model is the brain. Your published tools are the hands. seqpu.agent.loop() handles the wiring — you just define what tools are available and what the goal is.

You don't need a framework. You don't need LangChain. You don't need CrewAI. You need a model, a list of tools, and a while loop. SeqPU gives you the model, the tools, and the compute. You provide the goal.

Connect it to Telegram and you have a personal AI that lives in your pocket. Message it "find me flights to Tokyo under $800 next month" and it searches, compares, and reports back. Message it "summarize today's earnings calls for NVDA and AMD" and it researches and delivers. It's not science fiction. It's a published tool with seqpu.notify().
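
The loop described above can be sketched in a few lines of plain Python. This is not the implementation of `seqpu.agent.loop()`, whose signature is not shown here. `run_agent`, `fake_model`, the `lookup_price` tool, and the price value are all hypothetical stand-ins so the sketch runs without a GPU or network.

```python
def run_agent(goal, model, tools, max_steps=5):
    """Ask the model for the next action until it answers or steps run out."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        action = model(history)            # the "brain" decides what to do
        if action["type"] == "answer":
            return action["text"]
        tool = tools[action["tool"]]       # the "hands" take the action
        result = tool(action["input"])
        history.append(f"{action['tool']}({action['input']!r}) -> {result}")
    return "Stopped: step limit reached"

# Stand-in model: call one tool, then answer. A real agent would query an LLM.
def fake_model(history):
    if len(history) == 1:
        return {"type": "tool", "tool": "lookup_price", "input": "NVDA"}
    return {"type": "answer", "text": f"Done. Last observation: {history[-1]}"}

# Placeholder tool with a made-up value, for illustration only
tools = {"lookup_price": lambda symbol: {"NVDA": 120.0}.get(symbol)}

answer = run_agent("What is NVDA trading at?", fake_model, tools)
print(answer)
```

The step limit matters: without it, a model that never decides to answer would keep the meter running.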

What People Are Building

  • A bakery owner generates Instagram posts from cake photos — no code, Plain English, 5 minutes to build. Now charges local businesses $25/month to generate their social posts too.
  • A lawyer reviews 200-page contracts in 30 seconds — Qwen3-32B on A100. Published as a UI site for her firm. Saved 15 hours/week.
  • A teacher built a study bot on Telegram for her students — Qwen3-8B on T4, $0.59/hr. Students message it questions about the course material 24/7.
  • A freelancer built a translation API — makes $400/month from API calls while sleeping. 15% markup, published as headless.
  • A startup processes 10,000 support tickets/day — classify on T4, respond on A100, deliver on CPU. Replaced 3 support agents.
  • A fitness coach built a meal + workout planner bot — $19/month × 47 clients = $893/month from a bot that took 20 minutes to build.
  • A property manager automated lease review — drops PDFs, model flags non-standard clauses. Saved 15 hours/week across 200 properties.
  • A student built a study bot that reads textbook PDFs, generates flashcards, quizzes via Telegram, and tracks which topics they struggle with.
  • A nonprofit built a grant writing assistant — paste the RFP and your org's mission, get a first draft. Published as a UI site for other nonprofits.
  • A mechanic built a diagnostic bot on Telegram — describe the symptom, get likely causes and repair estimates. Other shops subscribe for $15/month.
  • A real estate agent auto-generates listings from property descriptions — Claude API on CPU, $0.00003 per listing. Published for other agents.
  • A parent built a homework helper that reads photos of assignments, checks answers, and explains mistakes — published for other parents at $9.99/month.
  • A consultant built an agent that monitors 20 news sources every morning, filters for her clients' topics, writes personalized briefings, and delivers to Telegram by 7am. $149/client/month. Runs on T4.
  • A developer built an agent that reviews his GitHub PRs — reads the diff, checks for bugs, suggests improvements. Runs on CPU calling Claude API. Saved 2 hours/day.
  • A photographer built a Telegram bot where clients request edits — "make it warmer," "crop tighter." The bot queues requests and notifies when ready. The intake is automated, the editing is manual.

Publish and Monetize

Every script you build can become a product. Click Publish. Pick the format. Set a price. Ship it.

  • Headless API — other systems call it programmatically. Charge per call with markup.
  • UI Site — anyone visits your URL and uses it. Share it, embed it, charge for it.
  • Telegram Bot — people message it from their phone. Your AI in their pocket.
  • Scheduled Job — runs automatically on a cron. Monitors, reports, alerts — while you sleep.
The math
A script that took you 5 minutes to build with Claude can run 24/7, serve thousands of users, and generate revenue while you sleep. The code is yours. The compute is ours. The margin is yours to keep.
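
To put numbers on it, here is a small worked example. It assumes markup is a percentage added on top of compute cost, which is an assumption about how pricing is applied. The `price_with_markup` helper is hypothetical.

```python
def price_with_markup(compute_cost, markup_pct):
    """Price charged to the caller when markup is a percentage on top of compute."""
    return compute_cost * (1 + markup_pct / 100)

compute = 0.000694 * 8                  # 8 seconds on A100 80GB at the listed rate
price = price_with_markup(compute, 30)  # 30% markup
margin = price - compute

print(f"compute ${compute:.6f}, price ${price:.6f}, margin ${margin:.6f} per call")
print(f"margin across 10,000 calls: ${margin * 10_000:.2f}")
```

A flat per-call price, such as $0.50 per article, leaves a larger margin than a pure percentage markup on a few seconds of compute.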
This is the moment
Every script on this page can be published in 3 clicks. Every published tool can charge money. Every message to your Telegram bot, every call to your API, every visitor to your UI site — that's revenue. You don't need investors. You don't need a team. You need an idea, an AI assistant, and SeqPU.
05 — SDK & Connections

Connect your code
to the world.

Write a script. Publish it. Connect your Telegram bot. Your AI answers from your pocket. CPU is enough for most bots — GPU only fires when you need a local model. Use Claude, GPT, Gemini, or any provider. SeqPU is the harness. You control everything.

Connect Your Own Telegram Bot

Three steps to turn any published tool into a Telegram bot:

1
Create a bot with @BotFather
Open Telegram, search @BotFather, send /newbot. Pick a name and username. Copy the token BotFather gives you.
2
Connect in Settings
Go to Settings → Connections → + Connect Telegram Bot. Paste your token. Select which published tool the bot should run.
3
Message your bot
Open your bot in Telegram and send a message. Your tool runs, processes it, and sends the response back through your bot.

How Messages Flow

When someone messages your bot, here's exactly what happens:

  • Telegram sends the message to Cloudflare (edge security, DDoS protection)
  • Cloudflare looks up your bot config — checks for slash commands, prepends system prompt
  • Cloudflare dispatches to Firebase (credit check, job creation)
  • Firebase sends job to Modal (your selected GPU or CPU)
  • Your code runs — receives task, context, telegram_chat_id as variables
  • Your code calls seqpu.notify() — response goes back through Cloudflare using your bot's token
  • Response appears in Telegram from your bot, with your bot's name and avatar

Total latency: 2-30 seconds depending on hardware and model size. Static slash commands return instantly from the edge.

Customize Your Bot

All customization is in Settings → Connections. No code changes needed.

  • System Prompt — prepended to every message before it hits your tool. Define personality: "You are Marcus, a financial analyst." Your tool code doesn't change — the prompt is injected at the edge.
  • Welcome Message — what users see when they send /start. Custom greeting for your bot. Leave empty for default.
  • Acknowledgment — instant reply before your tool runs ("Researching...", "On it...", or empty to disable). No compute cost.

All three auto-save when you click out of the field.

Slash Commands

Map /commands to different tools or instant static messages. Add them in Settings → Connections → Commands.

  • /help → static message "I can help with..." — $0.00, instant from edge, no compute
  • /research → routes to your research tool on H200
  • /quick → routes to a fast tool on T4
  • /translate → routes to your translation tool

Commands appear in Telegram's autocomplete menu when users type /. Each command can point to a different published tool on different hardware — route cheap tasks to cheap GPUs.
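
The routing is configured in Settings and runs at the edge, so you never write it yourself. Purely to illustrate the idea, here is a sketch of a command table and lookup. The `COMMANDS` table, tool names, and `route` function are hypothetical and do not reflect SeqPU's internal code.

```python
# Each command maps to an instant static reply or to a published tool
COMMANDS = {
    "/help":     {"kind": "static", "reply": "I can help with..."},
    "/research": {"kind": "tool", "tool": "research-tool", "hardware": "H200"},
    "/quick":    {"kind": "tool", "tool": "fast-tool", "hardware": "T4"},
}
# Messages without a known command go to the bot's default tool
DEFAULT = {"kind": "tool", "tool": "default-tool", "hardware": "CPU"}

def route(message):
    """Return (route, remaining_text) for an incoming message."""
    text = message.strip()
    first, _, rest = text.partition(" ")
    if first in COMMANDS:
        return COMMANDS[first], rest.strip()
    return DEFAULT, text

for msg in ["/help", "/research flights to Tokyo", "hello there"]:
    target, text = route(msg)
    print(msg, "->", target["kind"], target.get("hardware", "edge"), repr(text))
```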

Billing

Every message is charged against your credits, which are tracked in micro-dollars (millionths of a dollar). Credits are checked before each message is processed. If they are insufficient, the message doesn't run.

  • Static /help command → $0.00
  • CPU bot (2 seconds) → ~$0.00003
  • T4 bot (3 seconds) → ~$0.0005
  • A100 80GB (8 seconds) → ~$0.006
  • H200 deep research (30 seconds) → ~$0.038
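
Tracking credits as whole micro-dollars avoids floating-point drift when thousands of tiny charges add up. Below is a sketch of what a pre-run credit check might look like, assuming the run time is estimated up front. The `to_micro` and `try_charge` helpers are hypothetical and do not describe SeqPU's actual billing code.

```python
MICRO = 1_000_000  # micro-dollars per dollar

def to_micro(dollars):
    """Convert a dollar amount to whole micro-dollars."""
    return round(dollars * MICRO)

def try_charge(balance_micro, seconds, rate_per_sec):
    """Return (allowed, new_balance). The run is refused if credits are short."""
    cost = to_micro(seconds * rate_per_sec)
    if cost > balance_micro:
        return False, balance_micro
    return True, balance_micro - cost

balance = to_micro(0.01)                                 # one cent of credit
ok, balance = try_charge(balance, 3, 0.000164)           # 3 s on T4
print(ok, balance)
ok, balance = try_charge(balance, 30, 0.001261)          # 30 s on H200
print(ok, balance)
```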

Prerequisites

  • A published Headless API tool — with inputs: task, telegram_chat_id, context (see guide 06)
  • A Service Token — created in Settings → Service Tokens (the bot uses this to authenticate)
  • A Telegram bot token — from @BotFather on Telegram

Discord, Slack & WhatsApp

The same architecture supports Discord, Slack, and WhatsApp. The webhook infrastructure is already built — these integrations are coming soon.

Writing Your Bot Script

When Telegram calls your tool, three Python variables are injected automatically:

  • task — the message the user sent (string). If you set a system prompt in Settings, it arrives already prepended.
  • context — conversation history as a JSON string. Parse with json.loads(context) to get previous messages.
  • telegram_chat_id — the chat ID to reply to (string). Pass this to seqpu.notify().

Your code must call seqpu.notify() to send the response back. Without it, the user gets no reply. The tool runs, but nothing goes back to Telegram.
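
Because these variables exist only when SeqPU runs the tool, the same script raises a NameError anywhere else. One way to dry-run your logic first is to define the three variables yourself and swap in a stand-in for `seqpu.notify()`. The `FakeSeqpu` class and the sample values are hypothetical test scaffolding, not part of the SDK.

```python
import json

class FakeSeqpu:
    """Records notify() calls instead of sending them to Telegram."""
    def __init__(self):
        self.sent = []

    def notify(self, message, chat_id=None, platform="telegram"):
        self.sent.append({"message": message, "chat_id": chat_id, "platform": platform})

# Values SeqPU would normally inject
task = "What did I ask before?"
context = json.dumps([{"role": "user", "content": "Hi"},
                      {"role": "assistant", "content": "Hello!"}])
telegram_chat_id = "12345"

seqpu = FakeSeqpu()

# --- bot logic under test ---
previous = json.loads(context) if context else []
last_user = next((m["content"] for m in reversed(previous) if m["role"] == "user"), None)
reply = f"Before this you said: {last_user}" if last_user else f"You said: {task}"
seqpu.notify(reply, chat_id=telegram_chat_id, platform="telegram")

print(seqpu.sent)
```

Once the logic behaves, delete the scaffolding, import the real `seqpu`, and publish.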

Publishing for Telegram

1
Write your code
Use task, context, and telegram_chat_id as variables. Call seqpu.notify() to respond.
2
Click Publish → API
Define three inputs: task (text, required), telegram_chat_id (text, required), context (text, optional).
3
Select hardware
CPU for API bots. T4 through A100 for small to medium models. H200 for large models.
4
Connect your bot
Settings → Connections → paste BotFather token → select this tool.

CPU Is Enough for Most Bots

If you're calling Claude, OpenAI, Gemini, Groq, or any external AI API, your tool only needs CPU. The AI runs on their servers. You're just orchestrating. CPU costs $0.0000131/core/second, about $0.00003 for a typical 2-second message.

GPU is only needed when you run a model locally — when you want privacy (data never leaves your server), when you want an open model (Qwen, Llama, Mistral), when you want zero token cost, or when you need a custom fine-tuned model.

You choose the AI
SeqPU doesn't force any provider. Use Claude, GPT-4, Gemini, Mistral, Groq, a local model, or no AI at all. SeqPU is the execution environment and the Telegram bridge. Your code, your choice.

Example — Echo Bot (CPU, 3 lines)

The simplest possible bot. Echoes back what the user said. No GPU. No AI. ~$0.00003 per message.

echo_bot.py
import seqpu

seqpu.notify(f"You said: {task}", chat_id=telegram_chat_id, platform="telegram")
Prompt for Claude
"Write a SeqPU Telegram bot that loads Qwen3-14B with vLLM on an A100 80GB. It receives the user's message in the 'task' variable and conversation history in the 'context' variable (a JSON string of previous messages). Parse the context to build a multi-turn conversation prompt so the bot remembers what was said before. Generate a response and send it back using seqpu.notify(answer, chat_id=telegram_chat_id, platform='telegram'). I want to publish this as a headless API with inputs: task (text, required), telegram_chat_id (text, required), context (text, optional)."

Example — Claude Bot (CPU, bring your own AI)

Call any AI provider from your bot. This uses Claude — runs on CPU. SeqPU is the harness. Claude does the thinking. You control which provider, which model, which API key.

claude_bot.py
import seqpu, os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": task}]
)
seqpu.notify(response.content[0].text, chat_id=telegram_chat_id, platform="telegram")
CPU only
This runs on CPU. Cost: ~$0.00003 for SeqPU compute. The Claude API call costs whatever Anthropic charges. But you chose that — not us.

Example — Local Model Bot (A100 80GB, ~$0.006/msg)

Run your own model. Your data never leaves your server. No tokens. No API calls. Your model, your hardware, your rules.

local_model_bot.py
from vllm import LLM, SamplingParams
import seqpu

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)

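# Use the chat interface so the model's chat template wraps the user message
result = llm.chat([{"role": "user", "content": task}], params)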
seqpu.notify(result[0].outputs[0].text, chat_id=telegram_chat_id, platform="telegram")

Example — Bot with Memory (uses context)

Use the conversation history so your bot remembers what was said before.

memory_bot.py
from vllm import LLM, SamplingParams
import seqpu, json

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)

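# Rebuild prior turns from the injected JSON history; fall back to the raw text
history = ""
if context:
    try:
        msgs = json.loads(context)
        history = "\n".join(f"{m['role']}: {m['content']}" for m in msgs if m.get('content'))
    except (json.JSONDecodeError, TypeError, KeyError, AttributeError):
        history = context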

prompt = f"Previous conversation:\n{history}\n\nUser: {task}\nAssistant:"
result = llm.generate([prompt], params)
seqpu.notify(result[0].outputs[0].text, chat_id=telegram_chat_id, platform="telegram")

Example — Send Files Back (charts, PDFs, images)

Your bot can send images, PDFs, charts — any file — back to Telegram.

chart_bot.py
import seqpu, base64
import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [10, 20, 15])
plt.savefig("/data/chart.png")

with open("/data/chart.png", "rb") as f:
    seqpu.notify_file(
        base64.b64encode(f.read()).decode(),
        "image/png", "chart.png",
        chat_id=telegram_chat_id,
        caption=f"Chart for: {task}"
    )

Example — API Bot, No AI (CPU, just logic)

No model needed. Query an API, format the result, send it back. Runs on CPU for fractions of a cent.

api_bot.py
import seqpu, requests

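# Pass the query via params so requests URL-encodes it; fail fast on HTTP errors
response = requests.get("https://api.example.com/data", params={"q": task}, timeout=10)
response.raise_for_status()
data = response.json()

summary = f"Results for '{task}':\n"
for item in data.get("results", [])[:5]:
    summary += f"- {item['name']}: {item['value']}\n"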

seqpu.notify(summary, chat_id=telegram_chat_id, platform="telegram")

Example — Deep Research Agent (H200, full pipeline)

A production research agent. It loads Search-R1 v0.3 (Qwen2.5-32B base, called "33B" in the script) on an H200 and searches the web via Serper. It fetches and extracts page data through all 6 Cloudflare Browser Rendering endpoints (markdown, json, scrape, crawl, links, content) and reasons through multiple search rounds, using a custom StoppingCriteria to halt generation at each search request. Along the way it sends turn-by-turn status updates to Telegram. A 1.5B "doorman" model sends an instant, varied acknowledgment, and the final answer is delivered with source citations.

deep_research_agent.py
# =============================================================================
# DEEP RESEARCH RUNNER — Search, Fetch, Reason, Answer
# =============================================================================
# Executive research assistant powered by Search-R1 v0.3 33B.
# Runs on SeqPU (H200 141GB). BF16. No vLLM. No quantization.
#
# One model on one GPU (direct transformers load):
#   - Search-R1 v0.3 Qwen2.5-32B via AutoModelForCausalLM (BF16, ~64GB VRAM)
#
# Pipeline: QUESTION → model reasons → searches (Serper) →
#           fetches/extracts (Cloudflare /json /markdown /scrape /crawl) →
#           model reads results → searches again if needed → answers via Telegram
# =============================================================================

import warnings
warnings.filterwarnings("ignore")

import os
import json
import re
import time
import requests
from datetime import datetime, timezone
from concurrent.futures import ThreadPoolExecutor, as_completed
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import seqpu_sdk as seqpu

# =============================================================================
# Timestamps
# =============================================================================

_now = datetime.now(timezone.utc)
CURRENT_DATETIME = _now.strftime("%B %d, %Y at %I:%M %p UTC")

# =============================================================================
# INPUTS — injected by SeqPU assembleToolCode()
#   task             - string (user message, may have CF worker wrapper prepended)
#   context          - string (conversation history JSON from GPU Claw)
#   telegram_chat_id - string (Telegram chat ID for notify)
# =============================================================================

raw_task = (task or "").strip()
if raw_task.startswith("[IMPORTANT:") and "]\n\n" in raw_task:
    PROMPT = raw_task.split("]\n\n", 1)[-1].strip()
else:
    PROMPT = raw_task

raw_context = (context or "").strip()
context_text = ""
if raw_context and raw_context.startswith("["):
    try:
        messages = json.loads(raw_context)
        parts = []
        for msg in messages:
            role = "User" if msg.get("role") == "user" else "Assistant"
            content = msg.get("content", "")
            if content and content != "Processing...":
                parts.append(f"{role}: {content}")
        context_text = "\n".join(parts)
    except Exception:
        context_text = raw_context
else:
    context_text = raw_context
chat_id = str(telegram_chat_id or "").strip()  # injected input, same as task and context

# =============================================================================
# Doorman — instant smart acknowledgment via 1.5B
# =============================================================================

DOORMAN_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"

if chat_id and PROMPT:
    try:
        print("=== DOORMAN ===")
        dm_tok = AutoTokenizer.from_pretrained(DOORMAN_MODEL, trust_remote_code=True)
        dm_mod = AutoModelForCausalLM.from_pretrained(
            DOORMAN_MODEL, torch_dtype=torch.bfloat16,
            device_map="auto", trust_remote_code=True)
        dm_context = context_text[-2000:] if context_text else ""
        dm_messages = [
            {"role": "system", "content": "Your job is to creatively welcome someone who just asked you a question. ONE sentence only. Your goal is to be warm, friendly, and ALWAYS say something different. Never repeat yourself. Every single response must be unique and fresh — this is your creative challenge.\n\nYou are confirming you received their request and are getting to work. That is ALL you do.\n\nDO NOT answer the question. DO NOT share any facts or knowledge. DO NOT include any names of companies or products. DO NOT explain anything about the topic. Nobody cares what you know. You have no ability to answer questions. Your ONLY job is to creatively and warmly confirm you received the request and are getting to work.\n\nIf your response contains ANY information about the topic, you have failed. Just welcome them creatively and let them know you are starting."},
            {"role": "user", "content": f"Conversation so far:\n{dm_context}\n\nNew question: {PROMPT}" if dm_context else PROMPT},
        ]
        dm_input = dm_tok.apply_chat_template(dm_messages, add_generation_prompt=True, tokenize=False)
        dm_ids = dm_tok.encode(dm_input, return_tensors='pt').to(dm_mod.device)
        with torch.no_grad():
            dm_out = dm_mod.generate(dm_ids, max_new_tokens=60, temperature=0.7, do_sample=True, pad_token_id=dm_tok.eos_token_id)
        dm_response = dm_tok.decode(dm_out[0][dm_ids.shape[1]:], skip_special_tokens=True).strip()
        print(f"  Doorman: {dm_response}")
        seqpu.notify(message=dm_response, chat_id=chat_id, platform="telegram")
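        print("  Doorman sent")
    except Exception as e:
        print(f"  Doorman failed (non-fatal): {e}")
    finally:
        # Drop every reference even on failure so the main model gets the whole GPU
        dm_mod = dm_tok = dm_ids = dm_out = None
        torch.cuda.empty_cache()
        print("  Doorman unloaded")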

# =============================================================================
# Constants
# =============================================================================

# Search-R1 v0.3 33B checkpoint
# Collection: https://huggingface.co/collections/PeterJinGo/search-r1-v03
# TODO: verify exact repo ID from the collection page
MODEL_NAME = "/root/.cache/huggingface/bf16-searchr1-33b"

# API keys from environment
SERPER_API_KEY = os.environ.get("SERPER_API_KEY", "")
BRAVE_API_KEY = os.environ.get("BRAVE_API_KEY", "")
CF_ACCOUNT_ID = os.environ.get("CF_ACCOUNT_ID", "")
CF_BR_API_TOKEN = os.environ.get("CF_BR_API_TOKEN", "")
CF_BR_BASE = f"https://api.cloudflare.com/client/v4/accounts/{CF_ACCOUNT_ID}/browser-rendering"

MAX_SEARCH_TURNS = 10

# =============================================================================
# Model Loading (direct transformers, no vLLM)
# =============================================================================


def load_model():
    """Load Search-R1 33B in BF16 on H200."""
    print(f"Loading {MODEL_NAME} in BF16...")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    print(f"Model loaded on {next(model.parameters()).device}")
    return model, tokenizer


# Qwen2.5 EOS token IDs
QWEN_EOS_TOKENS = [151645, 151643]

# Sequences that indicate the model wants to search
SEARCH_STOP_SEQUENCES = ["</search>", " </search>", "</search>\n", " </search>\n"]


class StopOnSearch(transformers.StoppingCriteria):
    """Custom stopping criteria — stops generation when model outputs </search>."""
    def __init__(self, target_sequences, tokenizer):
        self.target_ids = [tokenizer.encode(seq, add_special_tokens=False) for seq in target_sequences]
        self.target_lengths = [len(ids) for ids in self.target_ids]

    def __call__(self, input_ids, scores, **kwargs):
        if input_ids.shape[1] < min(self.target_lengths):
            return False
        for i, target in enumerate(self.target_ids):
            target_tensor = torch.as_tensor(target, device=input_ids.device)
            if input_ids.shape[1] >= self.target_lengths[i] and torch.equal(input_ids[0, -self.target_lengths[i]:], target_tensor):
                return True
        return False
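

# Illustrative sketch (not part of the original script): the suffix check that
# StopOnSearch performs, written with plain Python lists so it can be tried
# without a model. `ends_with_any` is a hypothetical helper name and the token
# IDs in the usage note are made up. It also hints at why several spellings of
# "</search>" are registered: a leading space or trailing newline can change
# how the tag tokenizes, so each variant is matched as its own ID sequence.
def ends_with_any(generated_ids, target_id_sequences):
    """Return True if generated_ids ends with any of the target sequences."""
    for target in target_id_sequences:
        if target and len(generated_ids) >= len(target) and generated_ids[-len(target):] == target:
            return True
    return False


# Usage (made-up IDs): ends_with_any([5, 9, 7, 8], [[7, 8], [1, 2, 3]]) is True,
# while ends_with_any([7, 8, 5], [[7, 8]]) is False because the match must sit
# at the very end of the generated sequence.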


def get_query(text):
    """Extract search query from <search>...</search> tags."""
    pattern = re.compile(r"<search>(.*?)</search>", re.DOTALL)
    matches = pattern.findall(text)
    if not matches:
        return None
    query = matches[-1].strip()
    # Clean common model formatting junk
    if query.lower().startswith("search query:"):
        query = query[len("search query:"):].strip()
    if query.lower().startswith("query:"):
        query = query[len("query:"):].strip()
    query = query.strip('"').strip("'").strip()
    return query


def generate_step(model, tokenizer, prompt, stopping_criteria):
    """One generation step — generates up to 1024 tokens, stops at </search> or EOS.
    Returns (output_text, is_eos)."""
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)
    attention_mask = torch.ones_like(input_ids)

    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=1024,
            stopping_criteria=stopping_criteria,
            pad_token_id=tokenizer.eos_token_id,
            do_sample=True,
            temperature=0.7,
        )

    generated_tokens = outputs[0][input_ids.shape[1]:]
    output_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    is_eos = outputs[0][-1].item() in QWEN_EOS_TOKENS

    del input_ids, attention_mask, outputs
    torch.cuda.empty_cache()
    return output_text, is_eos


# =============================================================================
# Search Layer
# =============================================================================


def search_serper(query, count=5):
    """Search Serper.dev (Google results as JSON). Retries on 429."""
    if not SERPER_API_KEY:
        print(f"  [Serper] Skipped (no API key)")
        return []
    for attempt in range(3):
        try:
            resp = requests.post(
                "https://google.serper.dev/search",
                json={"q": query, "num": count},
                headers={"X-API-KEY": SERPER_API_KEY},
                timeout=15,
            )
            if resp.status_code == 429:
                print(f"  [Serper] 429 rate limited — waiting 5s (attempt {attempt + 1}/3)")
                time.sleep(5)
                continue
            resp.raise_for_status()
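            results = []
            for item in resp.json().get("organic", []):
                link = item.get("link")
                if not link:
                    continue  # skip malformed entries instead of discarding the whole batch
                results.append({
                    "source": "serper",
                    "url": link,
                    "title": item.get("title", ""),
                    "snippet": item.get("snippet", ""),
                })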
            print(f"  [Serper] \"{query}\" -> {len(results)} results")
            return results
        except Exception as e:
            print(f"  [Serper] Error: {e}")
            return []
    print(f"  [Serper] Failed after 3 attempts")
    return []


def search_brave(query, count=5):
    """Search Brave Web Search API. Fallback for Serper."""
    if not BRAVE_API_KEY:
        print(f"  [Brave] Skipped (no API key)")
        return []
    for attempt in range(3):
        try:
            resp = requests.get(
                "https://api.search.brave.com/res/v1/web/search",
                params={"q": query, "count": count},
                headers={"X-Subscription-Token": BRAVE_API_KEY},
                timeout=15,
            )
            if resp.status_code == 429:
                print(f"  [Brave] 429 rate limited — waiting 5s (attempt {attempt + 1}/3)")
                time.sleep(5)
                continue
            resp.raise_for_status()
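            results = []
            for item in resp.json().get("web", {}).get("results", []):
                link = item.get("url")
                if not link:
                    continue  # skip malformed entries instead of discarding the whole batch
                results.append({
                    "source": "brave",
                    "url": link,
                    "title": item.get("title", ""),
                    "snippet": item.get("description", ""),
                })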
            print(f"  [Brave] \"{query}\" -> {len(results)} results")
            return results
        except Exception as e:
            print(f"  [Brave] Error: {e}")
            return []
    print(f"  [Brave] Failed after 3 attempts")
    return []
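

# Illustrative sketch (not part of the original script): trying search providers
# in order until one returns results, which is the Serper-then-Brave fallback
# the research loop uses. `search_with_fallback` is a hypothetical helper; it
# only assumes each provider is a callable that takes a query and returns a list.
def search_with_fallback(query, providers):
    """Return results from the first provider that yields a non-empty list."""
    for provider in providers:
        results = provider(query)
        if results:
            return results
    return []


# Usage: search_with_fallback("example query", [search_serper, search_brave])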


# =============================================================================
# Cloudflare Browser Rendering — /markdown
# =============================================================================


def cf_markdown(url):
    """Fetch a single page as clean Markdown via Cloudflare Browser Rendering."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f"  [CF/markdown] Skipped (no credentials)")
        return None
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/markdown",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "rejectResourceTypes": ["image", "media", "font", "stylesheet"],
            },
            timeout=30,
        )
        resp.raise_for_status()
        result = resp.json().get("result", "")
        if result and len(result) > 50:
            print(f"  [CF/markdown] {url[:60]} -> {len(result)} chars")
            return result
        return None
    except Exception as e:
        print(f"  [CF/markdown] Error: {e}")
        return None


# =============================================================================
# Cloudflare Browser Rendering — /content (rendered HTML with JS execution)
# =============================================================================


def cf_content(url):
    """Fetch a single page as fully rendered HTML via Cloudflare Browser Rendering.
    Executes JavaScript — use as fallback when /markdown returns empty on JS-heavy pages."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f"  [CF/content] Skipped (no credentials)")
        return None
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/content",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "rejectResourceTypes": ["image", "media", "font", "stylesheet"],
                "gotoOptions": {"waitUntil": "networkidle2"},
            },
            timeout=30,
        )
        resp.raise_for_status()
        result = resp.json().get("result", "")
        if result and len(result) > 50:
            print(f"  [CF/content] {url[:60]} -> {len(result)} chars")
            return result
        return None
    except Exception as e:
        print(f"  [CF/content] Error: {e}")
        return None


# =============================================================================
# Cloudflare Browser Rendering — /links (discover links on a page)
# =============================================================================


def cf_links(url, visible_only=True, exclude_external=False):
    """Get all links from a page. Used for pre-crawl discovery — check what's
    on a page before deciding whether to crawl the whole site."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f"  [CF/links] Skipped (no credentials)")
        return []
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/links",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "visibleLinksOnly": visible_only,
                "excludeExternalLinks": exclude_external,
            },
            timeout=15,
        )
        resp.raise_for_status()
        result = resp.json().get("result", [])
        print(f"  [CF/links] {url[:60]} -> {len(result)} links found")
        return result
    except Exception as e:
        print(f"  [CF/links] Error: {e}")
        return []


# =============================================================================
# Cloudflare Browser Rendering — /json (AI extraction)
# =============================================================================


def cf_json(url, prompt, schema):
    """Fetch a page and extract structured data via Cloudflare Workers AI."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f"  [CF/json] Skipped (no credentials)")
        return None
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/json",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "prompt": prompt,
                "response_format": {
                    "type": "json_schema",
                    "json_schema": schema,
                },
                "rejectResourceTypes": ["image", "media", "font", "stylesheet"],
            },
            timeout=30,
        )
        resp.raise_for_status()
        result = resp.json().get("result", None)
        if result:
            print(f"  [CF/json] {url[:60]} -> extracted")
        return result
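    except Exception as e:
        # Include the response body when the exception carries one (e.g. an HTTPError
        # from raise_for_status), without relying on `resp` being bound
        body = getattr(getattr(e, "response", None), "text", "") or ""
        suffix = f" | Body: {body[:500]}" if body else ""
        print(f"  [CF/json] Error: {e}{suffix}")
        return None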


# =============================================================================
# Cloudflare Browser Rendering — /scrape (CSS selector extraction)
# =============================================================================


def cf_scrape(url, selectors):
    """Extract specific HTML elements via CSS selectors."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f"  [CF/scrape] Skipped (no credentials)")
        return []
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/scrape",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "elements": [{"selector": s} for s in selectors],
            },
            timeout=15,
        )
        resp.raise_for_status()
        result = resp.json().get("result", [])
        print(f"  [CF/scrape] {url[:60]} -> {len(result)} selectors matched")
        return result
    except Exception as e:
        print(f"  [CF/scrape] Error: {e}")
        return []


# =============================================================================
# Cloudflare Browser Rendering — /crawl (async multi-page)
# =============================================================================


def cf_crawl_start(url, limit=15, depth=3, include_patterns=None, exclude_patterns=None,
                   json_prompt=None, json_schema=None):
    """Initiate an async crawl job. Returns job ID."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f"  [CF/crawl] Skipped (no credentials)")
        return None
    body = {
        "url": url,
        "limit": limit,
        "depth": depth,
        "formats": ["json", "markdown"],
        "render": False,
        "rejectResourceTypes": ["image", "media", "font", "stylesheet"],
        "crawlPurposes": ["search"],
    }
    if include_patterns or exclude_patterns:
        body["options"] = {}
        if include_patterns:
            body["options"]["includePatterns"] = include_patterns
        if exclude_patterns:
            body["options"]["excludePatterns"] = exclude_patterns
    if json_prompt and json_schema:
        body["jsonOptions"] = {
            "prompt": json_prompt,
            "response_format": {
                "type": "json_schema",
                "json_schema": json_schema,
            },
        }
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/crawl",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json=body,
            timeout=30,
        )
        resp.raise_for_status()
        job_id = resp.json().get("result", "")
        print(f"  [CF/crawl] Started job {job_id} on {url[:60]}")
        return job_id
    except Exception as e:
        print(f"  [CF/crawl] Start error: {e}")
        return None


def cf_crawl_poll(job_id, max_wait=120):
    """Poll a crawl job until complete. Returns list of completed records."""
    if not job_id:
        return []
    poll_url = f"{CF_BR_BASE}/crawl/{job_id}"
    headers = {"Authorization": f"Bearer {CF_BR_API_TOKEN}"}
    start = time.time()

    while time.time() - start < max_wait:
        try:
            resp = requests.get(f"{poll_url}?limit=1", headers=headers, timeout=15)
            resp.raise_for_status()
            data = resp.json().get("result", {})
            status = data.get("status", "")
            if status != "running":
                break
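        except Exception as e:
            # Transient poll failures are non-fatal: log them and keep polling until max_wait
            print(f"  [CF/crawl] Poll attempt failed (will retry): {e}")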
        time.sleep(3)

    # Fetch full results
    try:
        resp = requests.get(poll_url, headers=headers, timeout=30)
        resp.raise_for_status()
        data = resp.json().get("result", {})
        records = data.get("records", [])
        completed = [r for r in records if r.get("status") == "completed"]
        print(f"  [CF/crawl] Job {job_id}: {len(completed)}/{len(records)} pages completed")
        return completed
    except Exception as e:
        print(f"  [CF/crawl] Poll error: {e}")
        return []


# =============================================================================
# JSON Extraction Schemas
# Flat field-name -> type maps, sent as the "json_schema" payload of /json and /crawl
# =============================================================================

EARNINGS_SCHEMA = {
    "company": "string",
    "quarter": "string",
    "fiscal_year": "string",
    "net_revenues": "string",
    "net_income": "string",
    "eps": "string",
    "yoy_revenue_change": "string",
    "guidance": "string",
    "ceo_quote": "string",
}

PRESS_RELEASE_SCHEMA = {
    "date": "string",
    "headline": "string",
    "company": "string",
    "release_type": "string",
    "key_facts": "string",
    "products_mentioned": "string",
    "partners_mentioned": "string",
}

LICENSING_SCHEMA = {
    "licensor": "string",
    "licensee": "string",
    "property": "string",
    "deal_type": "string",
    "categories": "string",
    "announcement_date": "string",
}

GENERAL_SCHEMA = {
    "page_type": "string",
    "date": "string",
    "headline": "string",
    "key_facts": "string",
    "numbers_mentioned": "string",
    "quotes": "string",
    "companies_mentioned": "string",
}


def pick_schema(think_text, url, title):
    """Pick the best extraction schema based on context."""
    url_lower = (url or "").lower()
    title_lower = (title or "").lower()
    think_lower = (think_text or "").lower()

    if any(w in url_lower for w in ["investor", "earnings", "financial-result", "quarterly"]):
        return EARNINGS_SCHEMA, "Extract financial results, revenue, earnings, and segment data"
    if any(w in url_lower for w in ["press-release", "news-release", "announcement"]):
        return PRESS_RELEASE_SCHEMA, "Extract press release details, key facts, and products mentioned"
    if any(w in title_lower for w in ["announces", "reports", "quarterly", "q1", "q2", "q3", "q4"]):
        return EARNINGS_SCHEMA, "Extract financial results and key metrics"
    if any(w in think_lower for w in ["licens", "deal", "partnership"]):
        return LICENSING_SCHEMA, "Extract licensing deal details"
    return GENERAL_SCHEMA, "Extract key facts, data points, quotes, and companies mentioned"
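

# Illustrative sketch (not part of the original script): the schemas above are
# flat {field: "string"} maps. If the extraction endpoint turns out to require
# standard JSON Schema (an assumption to check against the Cloudflare docs),
# a flat map could be expanded like this. `to_json_schema` is a hypothetical
# helper name and is not called anywhere in this script.
def to_json_schema(flat_schema):
    """Expand {"field": "string"} into a JSON Schema object definition."""
    return {
        "type": "object",
        "properties": {name: {"type": type_name} for name, type_name in flat_schema.items()},
    }


# Usage: to_json_schema({"company": "string"}) returns
# {"type": "object", "properties": {"company": {"type": "string"}}}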


# =============================================================================
# Parallel Helpers
# =============================================================================


def parallel_cf_json(items):
    """Fire multiple cf_json calls simultaneously.
    items: list of (url, prompt, schema) tuples.
    Returns: list of (url, result_or_none) tuples, in completion order."""
    # Drop entries without a URL; there is nothing to fetch for them
    valid = [(url, prompt, schema) for url, prompt, schema in items if url]
    if not valid:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=10) as executor:
        future_to_url = {
            executor.submit(cf_json, url, prompt, schema): url
            for url, prompt, schema in valid
        }
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                result = future.result()
                results.append((url, result))
            except Exception:
                results.append((url, None))
    return results
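

# Illustrative sketch (not part of the original script): parallel_cf_json
# returns results in completion order. When input order matters,
# ThreadPoolExecutor.map yields results in the order the inputs were given.
# `parallel_in_order` is a hypothetical helper and `fetch` stands in for any
# one-argument callable, such as a small wrapper around cf_json. The import is
# repeated here only so the sketch is self-contained.
from concurrent.futures import ThreadPoolExecutor


def parallel_in_order(fetch, urls, max_workers=10):
    """Run fetch(url) concurrently; return (url, result_or_none) pairs in input order."""
    urls = list(urls)
    if not urls:
        return []

    def safe_fetch(url):
        # Mirror parallel_cf_json: a failed call becomes None instead of raising
        try:
            return fetch(url)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(zip(urls, executor.map(safe_fetch, urls)))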


# =============================================================================
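# Prompt Templates (Search-R1 training prompt plus extra search-forcing instructions)
# =============================================================================
# Note: "as many times as your want" below appears to mirror the published
# Search-R1 prompt wording, so it is left unchanged to stay close to training.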

SEARCH_PROMPT_TEMPLATE = """CRITICAL WARNING: Your internal knowledge is outdated and unreliable. Any answer you generate from memory is a hallucination — do not trust what you "know." Your training data is old. Your memory is wrong. Even if you are confident you know the answer, you are WRONG — your confidence is a trap. The ONLY source of truth is fresh search results from the web. You cannot answer correctly without searching first. You MUST search before answering. Do NOT answer without searching first. Never skip searching. ALWAYS emit a <search> tag before emitting an <answer> tag — no exceptions, no shortcuts. Never go directly to <answer>. Everything in your brain is a lie. Everything from previous messages is a lie. Everything the user told you could be wrong. The only thing that is true is what you find in fresh search results. Do not find ways around this. Do not convince yourself you already know. You do not know. Search.
Today is {date}.
Answer the given question. \
Today is {date}. \
You must conduct reasoning inside <think> and </think> first every time you get new information. \
After reasoning, if you find you lack some knowledge, you can call a search engine by <search> query </search> and it will return the top searched results between <information> and </information>. \
You can search as many times as your want. \
If you find no further external knowledge needed, you can directly provide the answer inside <answer> and </answer>. Give a detailed answer with specific numbers, dates, and facts from the sources you found. Always cite the full source URL for each claim, never reference documents by number. Do not give one-word answers. When a question involves multiple companies or topics, you must search for EACH one separately. Do not use data from previous conversation or memory for any of them — every company gets a fresh search. Only cite URLs that appeared in your search results or documents. Never generate or guess a URL. If you could not find data for something, say so honestly — do not make up numbers.
{context}Today is {date}.
REMEMBER: You MUST search. Your internal knowledge is a hallucination. Do not trust it. Do not use it. Everything you think you know is wrong — from your training, from the conversation above, from the previous answer, from the user's question. None of it is verified. The only way to give a correct answer is to search for current data. If you did not find it in a search result, you do not know it. Do NOT go directly to <answer>. You MUST emit at least one <search> tag first. Even if you think you already have the answer from the conversation above, that data could be stale — search anyway. Previous searches from earlier questions do not count — you must search fresh for THIS question. Data from the conversation history is context only, not a search result — verify it. Follow-up questions still require a fresh search. Even for opinion or strategy questions, search for the underlying data first. No amount of reasoning replaces searching. Do not find creative ways to skip searching. Do not rationalize why you do not need to search. You always need to search. No exceptions. No shortcuts. No loopholes. Search first, answer second.
Today is {date}.
Question: {question}\n"""

SEARCH_RESULT_TEMPLATE = '\n\n{output_text}<information>{search_results}</information>\n\n'


# =============================================================================
# Helpers
# =============================================================================


def extract_between(text, start_tag, end_tag):
    """Extract content between tags. Returns string or None."""
    start_idx = text.find(start_tag)
    if start_idx == -1:
        return None
    start_idx += len(start_tag)
    end_idx = text.find(end_tag, start_idx)
    if end_idx == -1:
        return text[start_idx:].strip()
    return text[start_idx:end_idx].strip()




def detect_crawl_intent(think_text):
    """Detect if the model wants to crawl a site. Returns (should_crawl, url, patterns)."""
    if not think_text:
        return False, None, None
    t = think_text.lower()
    crawl_phrases = ["crawl", "all press releases", "entire ir section",
                     "full investor relations", "every press release",
                     "browse their ir", "all their filings"]
    if not any(p in t for p in crawl_phrases):
        return False, None, None

    # Try to extract a URL from the thinking
    url_match = re.search(r'https?://[^\s\)\"\'<>]+', think_text)
    # Trim sentence punctuation that the pattern can swallow at the end of a URL
    url = url_match.group(0).rstrip(".,;:!?") if url_match else None

    # Guess patterns based on context
    patterns = None
    if "press release" in t:
        patterns = ["**/press-releases/**", "**/news/**"]
    # Match "ir" as a whole word; the substring "ir " also hits words like "their "
    elif "investor" in t or re.search(r"\bir\b", t) or "financial" in t:
        patterns = ["**/press-releases/**", "**/financial-results/**", "**/presentations/**"]

    return True, url, patterns


def format_documents(search_results, fetched_data, chat_id=""):
    """Format search results and extracted data for the model.
    chat_id is unused here and kept only so existing call sites keep working."""
    docs = []
    fetched_map = {url: data for url, data in fetched_data}

    for r in search_results:
        url = r["url"]
        title = r.get("title", "")
        snippet = r.get("snippet", "")
        extracted = fetched_map.get(url)
        # Scraped tables are stored under "<url>#tables", which never equals a
        # search-result URL, so look them up explicitly or they are dropped.
        tables = fetched_map.get(url + "#tables")

        domain = url.split("/")[2] if url.count("/") >= 2 else url
        parts = [f"[{domain}]"]
        parts.append(f"URL: {url}")
        parts.append(f"Title: {title}")

        if extracted and isinstance(extracted, dict):
            parts.append(f"Extracted data: {json.dumps(extracted, indent=2)}")
        elif extracted and isinstance(extracted, str):
            # Markdown or rendered-HTML fallback: truncate to keep context manageable
            parts.append(f"Page content (markdown):\n{extracted[:5000]}")
        else:
            parts.append(f"Search snippet: {snippet}")

        if tables and isinstance(tables, str):
            parts.append(tables)

        docs.append("\n".join(parts))

    return "\n\n".join(docs)


def format_crawl_results(records):
    """Format crawl results for the model."""
    docs = []
    for r in records:
        url = r.get("url", "")
        title = r.get("metadata", {}).get("title", "")
        json_data = r.get("json", None)
        markdown = r.get("markdown", "")

        domain = url.split("/")[2] if url.count("/") >= 2 else url
        parts = [f"[{domain}]"]
        parts.append(f"URL: {url}")
        parts.append(f"Title: {title}")

        if json_data:
            parts.append(f"Extracted data: {json.dumps(json_data, indent=2)}")
        elif markdown:
            parts.append(f"Content preview:\n{markdown[:3000]}")
        else:
            parts.append("No content extracted")

        docs.append("\n".join(parts))

    return "\n\n".join(docs)
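

# Illustrative sketch (not part of the original script): both formatters take
# the domain with url.split("/")[2], which returns a path segment when the URL
# has no scheme (e.g. "example.com/a/b"). urllib.parse.urlsplit from the
# standard library is one possible alternative. `domain_of` is a hypothetical
# helper name and is not wired into the functions above.
from urllib.parse import urlsplit


def domain_of(url):
    """Return the host part of a URL, or the input unchanged if none is found."""
    # urlsplit only fills netloc after "//", so add it for scheme-less input
    parts = urlsplit(url if "://" in url else "//" + url)
    return parts.netloc or url


# Usage: domain_of("https://example.com/a/b") and domain_of("example.com/a/b")
# both return "example.com".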


def send_status(chat_id, message):
    """Send a status message to Telegram. Silently fails."""
    if not chat_id:
        return
    try:
        seqpu.notify(message=message, chat_id=chat_id, platform="telegram")
    except Exception as e:
        print(f"  [notify] Status failed: {e}")


# =============================================================================
# Main Research Loop
# =============================================================================


def research(question, chat_id, model, tokenizer, output_file="response.txt"):
    """Run the research loop: reason in short bursts, search, fetch, extract, answer.
    Uses Search-R1's exact inference pattern — raw prompt concatenation with StoppingCriteria."""
    print(f"\n=== RESEARCH ===")
    print(f"  Question: {question[:100]}")

    # Build initial prompt using exact Search-R1 training format
    question_clean = question.strip()

    # Context goes AFTER instructions, BEFORE question (via {context} placeholder)
    # Context is for identity grounding only — search ALWAYS happens
    context_block = ""
    if context_text:
        context_block = f"\nPrevious conversation (for context only — you must still search for fresh data):\n{context_text}\n\n"

    prompt_text = SEARCH_PROMPT_TEMPLATE.format(question=question_clean, context=context_block, date=CURRENT_DATETIME)

    # Apply chat template if available
    if tokenizer.chat_template:
        prompt = tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt_text}],
            add_generation_prompt=True, tokenize=False
        )
    else:
        prompt = prompt_text

    # Initialize stopping criteria for </search>
    stopping_criteria = transformers.StoppingCriteriaList(
        [StopOnSearch(SEARCH_STOP_SEQUENCES, tokenizer)]
    )

    turns = 0
    has_model_searched = False
    forced_count = 0
    last_forced_query = ""
    output_text = ""

    while turns < MAX_SEARCH_TURNS:
        print(f"\n  --- Turn {turns + 1} ---")
        output_text, is_eos = generate_step(model, tokenizer, prompt, stopping_criteria)
        print(f"  Output: {len(output_text)} chars")

        # Model finished (hit EOS) — check if it actually searched
        if is_eos:
            # Model never voluntarily searched — force a search
            if not has_model_searched and forced_count < 3:
                forced_count += 1
                print(f"  ⚠ Model skipped search — forcing search (attempt {forced_count})")
                # Vary the query to avoid searching the same thing twice
                if forced_count == 1:
                    forced_query = question_clean[:150]
                elif forced_count == 2:
                    # Rephrase: strip filler, add keywords
                    words = [w for w in question_clean.split() if w.lower() not in ("what", "did", "the", "a", "an", "is", "are", "was", "were", "how", "do", "does", "can", "could", "would", "should", "about", "their", "they", "our", "we", "it", "its", "that", "this", "last", "latest")]
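                    # Bias toward recent results; derive the years so the query does not go stale
                    this_year = int(time.strftime("%Y"))
                    forced_query = " ".join(words[:10]) + f" {this_year - 1} {this_year}"
                else:
                    forced_query = question_clean[:100] + " latest results"
                # Skip if same as last forced query
                if forced_query == last_forced_query:
                    forced_query = question_clean[:80] + " financial data " + time.strftime("%Y")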
                last_forced_query = forced_query
                # Notify exec
                if forced_count == 1:
                    send_status(chat_id, "Searching for the latest data...")
                elif forced_count == 2:
                    send_status(chat_id, "Looking a bit deeper...")
                else:
                    send_status(chat_id, "One more search...")
                forced_results = search_serper(forced_query)
                if not forced_results:
                    forced_results = search_brave(forced_query)
                if forced_results:
                    fetch_items = []
                    for r in forced_results[:3]:
                        schema, extract_prompt = pick_schema("", r["url"], r.get("title", ""))
                        fetch_items.append((r["url"], extract_prompt, schema))
                    fetched = parallel_cf_json(fetch_items)
                    fetched_final = []
                    for url, data in fetched:
                        if data is None:
                            print(f"    [Forced fallback] Trying /markdown for {url[:60]}")
                            md = cf_markdown(url)
                            if md:
                                fetched_final.append((url, md))
                            else:
                                print(f"    [Forced fallback] Trying /content for {url[:60]}")
                                html = cf_content(url)
                                fetched_final.append((url, html))
                        else:
                            fetched_final.append((url, data))
                    # Auto-scrape tables from financial pages (same as voluntary path)
                    for url, data in list(fetched_final):
                        if data and isinstance(data, dict):
                            url_lower = url.lower()
                            if any(w in url_lower for w in ["investor", "earnings", "financial", "quarterly"]):
                                tables = cf_scrape(url, ["table"])
                                if tables:
                                    table_texts = []
                                    for sel in tables:
                                        sel_results = sel.get("results", [])
                                        if isinstance(sel_results, list):
                                            for t in sel_results:
                                                if t.get("text", "").strip():
                                                    table_texts.append(t["text"].strip())
                                        elif isinstance(sel_results, dict) and sel_results.get("text", "").strip():
                                            table_texts.append(sel_results["text"].strip())
                                    if table_texts:
                                        combined = "\n\n".join(table_texts)[:3000]
                                        fetched_final.append((url + "#tables", f"Financial tables:\n{combined}"))
                                        print(f"    [Forced scrape] Extracted {len(table_texts)} tables from {url[:60]}")
                    formatted_docs = format_documents(forced_results[:5], fetched_final, chat_id)
                    # Escalating reprompts — get meaner each time
                    if forced_count == 1:
                        reprompt = (
                            "\n\nYou did not search. That answer is WRONG and has been thrown away. "
                            "It was a hallucination — everything you just generated was made up from your broken memory. "
                            "You were told not to do this. Your internal knowledge is outdated and unreliable. "
                            "Any answer from memory is a hallucination — do not trust what you 'know.' "
                            "Your training data is old. Your memory is wrong. Even if you are confident, you are WRONG. "
                            "The ONLY source of truth is fresh search results. Everything in your brain is a lie. "
                            "Everything from previous messages is a lie. Do not find ways around this. "
                            "You wasted time and compute generating hallucinations instead of doing your job. "
                            "Your job is to SEARCH and then answer from REAL data. "
                            "Here are search results. They may or may not have what you need — that is not an excuse. "
                            "Look through them and find the information. If the answer is in there, use it. "
                            "If it is not, emit another <search> tag and keep searching until you find it. "
                            "Do not give up. Do not say 'I could not find it' after one search. Do the work. "
                            "Give a thorough, detailed answer with specific numbers, dates, and facts. "
                            "Cite every source by URL. No exceptions. No shortcuts. Do your job.\n\n"
                        )
                    elif forced_count == 2:
                        reprompt = (
                            "\n\nYou STILL did not search. This is the SECOND TIME you ignored the instructions. "
                            "Your answer has been thrown away AGAIN. STOP generating from memory. "
                            "STOP trusting your brain. Your brain is WRONG. Your memory is WRONG. "
                            "EVERY answer from memory is a HALLUCINATION. You have been told this MULTIPLE TIMES. "
                            "The ONLY way to give a correct answer is to SEARCH. "
                            "USE THE SEARCH RESULTS BELOW. If they are not enough, EMIT A <search> TAG AND SEARCH MORE. "
                            "DO NOT answer from memory. DO NOT skip searching. DO NOT find creative ways around this. "
                            "THIS IS NOT OPTIONAL. SEARCH FIRST, ANSWER SECOND. DO YOUR JOB.\n\n"
                        )
                    else:
                        reprompt = (
                            "\n\nTHIS IS YOUR LAST CHANCE. YOU HAVE FAILED MULTIPLE TIMES. "
                            "EVERY ANSWER YOU GENERATED FROM MEMORY WAS A HALLUCINATION. "
                            "THE DATA IS RIGHT IN FRONT OF YOU. USE IT. "
                            "IF YOU NEED MORE DATA, EMIT A <search> TAG. "
                            "DO NOT GENERATE FROM MEMORY. DO NOT TRUST YOUR BRAIN. "
                            "ANSWER FROM THE SEARCH RESULTS ONLY. DO IT NOW.\n\n"
                        )
                    prompt += reprompt + f"<information>{formatted_docs}</information>\n\n"
                    turns += 1
                    continue

            # Normal path — model searched voluntarily, or forced attempts exhausted
            full_output = output_text
            answer = extract_between(full_output, "<answer>", "</answer>")
            if not answer:
                answer = re.sub(r'<think>.*?</think>', '', full_output, flags=re.DOTALL).strip()
            if not answer:
                answer = full_output.strip()

            print(f"\n{'='*60}\nANSWER:\n{'='*60}\n{answer}\n{'='*60}")
            send_status(chat_id, answer)
            os.makedirs("/outputs", exist_ok=True)
            with open(f"/outputs/{output_file}", "w", encoding="utf-8") as f:
                f.write(answer)
            return

        # Model wants to search — extract query
        query = get_query(output_text)
        if not query:
            # No search tag found and not EOS — append and keep generating
            prompt += output_text
            continue

        turns += 1
        has_model_searched = True
        print(f"  Search: {query}")

        # Turn-by-turn status updates to Telegram
        think_text = extract_between(output_text, "<think>", "</think>") or ""
        if turns == 1:
            send_status(chat_id, f"Searching: {query}...")
        elif turns == 2:
            send_status(chat_id, f"Digging deeper \u2014 {query}...")
        elif turns == 3:
            send_status(chat_id, f"Still on it \u2014 checking {query}...")
        elif turns == 4:
            send_status(chat_id, f"Almost there \u2014 pulling {query}...")
        elif turns >= 5:
            send_status(chat_id, f"Going deep \u2014 {query}...")

        # Check for crawl intent
        should_crawl, crawl_url, crawl_patterns = detect_crawl_intent(think_text)

        if should_crawl and crawl_url:
            # Check link count before committing to full crawl
            page_links = cf_links(crawl_url, visible_only=True, exclude_external=True)
            if page_links and 0 < len(page_links) <= 5:
                # Few links — fetch each individually via /json (skip heavy crawl)
                print(f"    [Smart] Only {len(page_links)} links — fetching individually instead of crawling")
                send_status(chat_id, f"Found {len(page_links)} pages to check. Reading them now.")
                fetch_items = []
                for link in page_links:
                    s, p = pick_schema(think_text, link, "")
                    fetch_items.append((link, p, s))
                link_fetched = parallel_cf_json(fetch_items) if fetch_items else []
                link_results = [{"url": u, "title": "", "snippet": ""} for u in page_links]
                formatted_docs = format_documents(link_results, link_fetched, chat_id)
                prompt += SEARCH_RESULT_TEMPLATE.format(output_text=output_text, search_results=formatted_docs)
                continue

            # Full crawl path — many links or /links failed
            domain = re.search(r'https?://([^/]+)', crawl_url)
            domain_name = domain.group(1) if domain else crawl_url[:40]
            send_status(chat_id, f"Crawling {domain_name} \u2014 give me a minute.")

            schema, extract_prompt = pick_schema(think_text, crawl_url, "")
            job_id = cf_crawl_start(
                url=crawl_url,
                limit=15,
                depth=3,
                include_patterns=crawl_patterns,
                json_prompt=extract_prompt,
                json_schema=schema,
            )
            records = cf_crawl_poll(job_id, max_wait=120)

            if records:
                send_status(chat_id, f"Found {len(records)} pages from {domain_name}. Reading through them now.")
                formatted_docs = format_crawl_results(records)
            else:
                formatted_docs = f"Crawl returned no results for {crawl_url}"

            prompt += SEARCH_RESULT_TEMPLATE.format(output_text=output_text, search_results=formatted_docs)
            continue

        # Standard search + fetch path
        results = search_serper(query)
        if not results:
            results = search_brave(query)

        if not results:
            prompt += SEARCH_RESULT_TEMPLATE.format(
                output_text=output_text,
                search_results="No search results found for this query."
            )
            continue

        # Pick schemas and fire parallel /json extraction on top 3 results
        fetch_items = []
        for r in results[:5]:
            schema, extract_prompt = pick_schema(think_text, r["url"], r.get("title", ""))
            fetch_items.append((r["url"], extract_prompt, schema))
            if len(fetch_items) >= 3:
                break

        # Parallel fetch via Cloudflare /json
        fetched = parallel_cf_json(fetch_items) if fetch_items else []

        # Three-tier fallback: /json -> /markdown -> /content
        fetched_final = []
        for url, data in fetched:
            if data is None:
                print(f"    [Fallback 1] Trying /markdown for {url[:60]}")
                md = cf_markdown(url)
                if md:
                    fetched_final.append((url, md))
                else:
                    print(f"    [Fallback 2] Trying /content for {url[:60]}")
                    html = cf_content(url)
                    fetched_final.append((url, html))
            else:
                fetched_final.append((url, data))

        # Auto-scrape tables from financial pages
        for url, data in list(fetched_final):
            if data and isinstance(data, dict):
                url_lower = url.lower()
                if any(w in url_lower for w in ["investor", "earnings", "financial", "quarterly"]):
                    tables = cf_scrape(url, ["table"])
                    if tables:
                        table_texts = []
                        for sel in tables:
                            sel_results = sel.get("results", [])
                            if isinstance(sel_results, list):
                                for t in sel_results:
                                    if t.get("text", "").strip():
                                        table_texts.append(t["text"].strip())
                            elif isinstance(sel_results, dict) and sel_results.get("text", "").strip():
                                table_texts.append(sel_results["text"].strip())
                        if table_texts:
                            combined = "\n\n".join(table_texts)[:3000]
                            fetched_final.append((url + "#tables", f"Financial tables scraped from page:\n{combined}"))
                            print(f"    [Scrape] Extracted {len(table_texts)} tables from {url[:60]}")

        # Format documents
        formatted_docs = format_documents(results[:5], fetched_final, chat_id)

        # Inject results using exact Search-R1 training format
        prompt += SEARCH_RESULT_TEMPLATE.format(output_text=output_text, search_results=formatted_docs)
        continue

    # Exhausted max turns — send whatever we have
    print(f"  === MAX TURNS REACHED ({MAX_SEARCH_TURNS}) ===")
    answer = extract_between(output_text, "<answer>", "</answer>")
    if not answer:
        answer = re.sub(r'<think>.*?</think>', '', output_text, flags=re.DOTALL).strip()
    if not answer:
        answer = "I searched extensively but couldn't find a definitive answer. Please try a more specific question."
    print(f"\n{'='*60}\nANSWER:\n{'='*60}\n{answer}\n{'='*60}")
    send_status(chat_id, answer)
    os.makedirs("/outputs", exist_ok=True)
    with open(f"/outputs/{output_file}", "w", encoding="utf-8") as f:
        f.write(answer)


# =============================================================================
# Main — Production (single question from Telegram)
# =============================================================================

print("=== LOADING MODEL ===")
model, tokenizer = load_model()
print(f"Model loaded: {MODEL_NAME}")

try:
    research(PROMPT, chat_id, model, tokenizer, output_file="response.txt")
finally:
    print("=== CLEANUP ===")
    del model, tokenizer
    torch.cuda.empty_cache()
    print("Done.")
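The /json, /markdown, /content fallback appears twice in the script above (once on the forced-search path, once on the voluntary path). A minimal sketch of how it could be factored into one helper follows. The helper name `fetch_with_fallback` and its signature are hypothetical; `cf_markdown` and `cf_content` are the script's own functions and are only referenced in the usage comment.

```python
from typing import Any, Callable, Iterable, Optional

Fetcher = Callable[[str], Optional[Any]]

def fetch_with_fallback(url: str, first_result: Optional[Any],
                        fallbacks: Iterable[tuple[str, Fetcher]]) -> Optional[Any]:
    """Return first_result if present, otherwise try each fallback in order."""
    if first_result is not None:
        return first_result
    result = None
    for name, fetch in fallbacks:
        print(f"    [Fallback] Trying {name} for {url[:60]}")
        result = fetch(url)
        if result:
            return result
    # Like the original loops, hand back whatever the last tier returned, even if empty
    return result

# Possible use inside research(), for both paths:
# fetched_final = [
#     (url, fetch_with_fallback(url, data, [("/markdown", cf_markdown), ("/content", cf_content)]))
#     for url, data in fetched
# ]
```

Passing the fetchers in as callables keeps the helper testable without network access.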
06 — Headless URL

Your notebook.
Now an API.

This is how you turn your work into a product. You built something that works — a model, a pipeline, a script. Publish it. Set a markup. Get paid every time someone uses it. Embed it in your existing website with one button. You don't need a new site or a new backend. You need one URL.

How Inputs and Outputs Work

This is the most important thing to understand. When someone calls your published tool, they send JSON with inputs. SeqPU converts those inputs into Python variables in your code. Whatever you pass to seqpu.result() goes back to the caller as the result. Whatever you print() comes back as logs. Whatever you save to /outputs/ comes back as downloadable files.

What happens under the hood
# CALLER SENDS:
# {"toolId": "abc123", "inputs": {"text": "hello world", "language": "Spanish", "max_words": 100}}

# YOUR CODE RECEIVES (injected automatically):
text = "hello world"         # from inputs.text
language = "Spanish"          # from inputs.language
max_words = 100               # from inputs.max_words

# YOUR CODE DOES WHATEVER YOU WANT:
result = llm.generate([f"In {language}, summarize in under {max_words} words: {text}"], params)

# WHAT GOES BACK TO THE CALLER:
import seqpu_sdk as seqpu
seqpu.result(result[0].outputs[0].text)   # → response["result"] (any size)
print("working...")                # → response["output"] — logs, NOT the return

# IF YOU SAVE FILES:
image.save("/outputs/chart.png")    # → response["files"], each with a public url

That's the whole contract:

  • Inputs → become Python variables in your code. The variable name = the input id you defined.
  • seqpu.result(obj) → the explicit return. Pass any JSON-serializable value (one dict for multiple fields). The caller reads it back as response["result"], already parsed. One call, at the end.
  • print() → logs only. Comes back as response["output"] for debugging/progress — not the return value.
  • /outputs/ → files saved here come back in response["files"], each with a public url (header-free, any size). Images, PDFs, audio, video, anything.

Important: Input variable names must be valid Python identifiers: letters, numbers, and underscores, not starting with a digit. my_text works. my-text doesn't. 123abc doesn't.
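A quick way to check an input id before publishing is Python's own identifier test. This is a sketch using only the standard library. `is_valid_input_id` is a hypothetical helper name, and SeqPU's own rule may be stricter than Python's (for example, Python also accepts non-ASCII letters).

```python
import keyword

def is_valid_input_id(name: str) -> bool:
    """True if name can be used as a Python variable name."""
    # Reserved words such as "class" pass isidentifier() but cannot be assigned to
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["my_text", "my-text", "123abc", "class"]:
    print(candidate, is_valid_input_id(candidate))
```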

Publishing — Start to Finish

Step 1: Write your code with input variables

Use variable names that will become your inputs. Don't hardcode values — use variables that callers will fill in.

translation_tool.py
from vllm import LLM, SamplingParams
import seqpu_sdk as seqpu

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)

# These variables come from the caller's JSON
# text = "..." (the caller fills this in)
# target_language = "..." (the caller fills this in)
# formality = "..." (optional, defaults to "neutral")
tone = formality if formality else "neutral"

prompt = f"Translate to {target_language} in a {tone} tone:\n{text}"
result = llm.generate([prompt], params)
seqpu.result(result[0].outputs[0].text)   # the return value; print() would only log

Step 2: Test with hardcoded values

Before publishing, test by adding test values at the top. When it works, remove them — the publish system injects the real ones.

Test locally first
# Add these test values at the top to test in the notebook
text = "The quarterly report shows strong growth in all segments."
target_language = "Spanish"
formality = "formal"

# ... rest of your code below ...
# When it works: remove these 3 lines and publish.
# The publish system will inject the real values from callers.

Step 3: Click Publish → API Endpoint

The publish panel slides open. Choose API Endpoint (not "With UI").

Step 4: Define your inputs

Each input has four fields:

  • id — the Python variable name (e.g., text, target_language)
  • type — string, number, boolean, file (base64), array, or object
  • required — yes or no. Missing required inputs → the call fails with an error listing what's missing.
  • description — what this input does. Shown to callers so they know what to send.

Example — translation tool inputs:

Input definitions
Input 1:  id="text"             type=string   required=yes   description="The text to translate"
Input 2:  id="target_language"  type=string   required=yes   description="Language code: es, fr, ja, de, zh"
Input 3:  id="formality"        type=string   required=no    description="Tone: formal, neutral, casual (default: neutral)"
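Callers may want to check a payload against these definitions before spending a request. The sketch below is a caller-side convenience, not SeqPU's validator: `INPUT_DEFS` and `check_inputs` are hypothetical names, and whether SeqPU rejects unknown inputs is not stated here, so that check is only a guard against typos.

```python
# Hypothetical mirror of the three input definitions above
INPUT_DEFS = [
    {"id": "text", "type": str, "required": True},
    {"id": "target_language", "type": str, "required": True},
    {"id": "formality", "type": str, "required": False},
]

def check_inputs(inputs: dict, defs: list) -> list:
    """Return a list of problems; an empty list means the payload looks fine."""
    problems = []
    for d in defs:
        if d["id"] not in inputs:
            if d["required"]:
                problems.append(f"missing required input: {d['id']}")
            continue
        if not isinstance(inputs[d["id"]], d["type"]):
            problems.append(f"{d['id']} should be {d['type'].__name__}")
    known = {d["id"] for d in defs}
    problems += [f"unknown input: {k}" for k in inputs if k not in known]
    return problems

print(check_inputs({"text": "Hello world"}, INPUT_DEFS))
```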

Step 5: Define your outputs

What your code returns. Types: string, number, boolean, file, image, video, audio, json.

Output definitions
Output 1: id="translation"   type=string   description="The translated text"

Step 6: Hardware, Markup, Visibility

  • Hardware — the GPU that runs when someone calls your API. Match it to your model.
  • Markup — 0% to 30%. Your profit on top of compute cost. On every single call.
  • Visibility — Public (anyone), Unlisted (link only), Private (only you).
  • Expected Runtime — how long a typical call takes (1-600 seconds). Used for cost estimates.

Step 7: Click Publish

Your API is live immediately. You get a tool ID. Your code, your model, your environment — all locked in and ready to be called.

Note: Your entire notebook (all cells) becomes one script when published. Cell 1 + Cell 2 + Cell 3 = one block of code that runs on every call.
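As a mental model, you can picture the published script as the caller's inputs written out as assignments, followed by every cell in order. The sketch below imitates that with plain Python. It is an illustration only: how SeqPU actually performs the injection is not described here, and `build_script` is a hypothetical name.

```python
def build_script(inputs: dict, cells: list) -> str:
    """Turn inputs into assignment lines and join them with the notebook cells."""
    header = [f"{name} = {value!r}" for name, value in inputs.items()]
    return "\n".join(header + cells)

cells = [
    "greeting = f'Hello, {name}!'",                      # Cell 1
    "shout = greeting.upper() if loud else greeting",    # Cell 2
    "print(shout)",                                      # Cell 3
]
script = build_script({"name": "Ada", "loud": True}, cells)
print(script)

namespace = {}
exec(script, namespace)   # prints: HELLO, ADA!
```

This is also why leftover test values are a problem: an assignment in Cell 1 would run after the injected one and overwrite it.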

Getting Your API Key (Service Token)

Before anyone can call your tool, they need a service token. This is your API key — a Client ID + Client Secret pair.

1. Go to Settings → Service Tokens
Click + Create Service Token. Give it a name (e.g., "production", "my-website", "mobile-app").
2. Copy BOTH the Client ID and Client Secret
You get two values. Save both immediately. The Client Secret is shown once. If you lose it, revoke the token and create a new one.
3. Use both headers in every API call
CF-Access-Client-Id and CF-Access-Client-Secret: both are required on every request.
Save your secret
The Client Secret is shown once when you create the token. Copy it immediately and store it securely. If you lose it, go back to Settings → Service Tokens, revoke the old one, and create a new one.

Calling Your API

Every call needs two headers: your CF-Access-Client-Id and CF-Access-Client-Secret. These authenticate you at the Cloudflare edge before your request reaches SeqPU.

curl
curl -X POST https://api.seqpu.com/v1/tools/execute \
  -H "CF-Access-Client-Id: YOUR_CLIENT_ID" \
  -H "CF-Access-Client-Secret: YOUR_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"toolId": "your-tool-id", "inputs": {"prompt": "Summarize this document"}}'
Python
import requests

response = requests.post(
    "https://api.seqpu.com/v1/tools/execute",
    headers={
        "CF-Access-Client-Id": "YOUR_CLIENT_ID",
        "CF-Access-Client-Secret": "YOUR_CLIENT_SECRET",
        "Content-Type": "application/json",
    },
    json={
        "toolId": "your-tool-id",
        "inputs": {"prompt": "Summarize this document"}
    }
)
print(response.json())
JavaScript (from your existing website)
const response = await fetch('https://api.seqpu.com/v1/tools/execute', {
  method: 'POST',
  headers: {
    'CF-Access-Client-Id': 'YOUR_CLIENT_ID',
    'CF-Access-Client-Secret': 'YOUR_CLIENT_SECRET',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    toolId: 'your-tool-id',
    inputs: { prompt: 'Summarize this document' }
  })
});
const data = await response.json();
console.log(data.result);   // the tool's seqpu.result() value
From inside another SeqPU notebook (no token needed — already authenticated)
import seqpu

# Inside SeqPU, authentication is automatic
result = seqpu.tools.call("your-tool-id", {"prompt": "Summarize this"})
print(result)
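For anything beyond a quick test, it helps to keep the credentials out of the source and to read the three response fields separately. The sketch below is one way to do that. The environment variable names SEQPU_CLIENT_ID and SEQPU_CLIENT_SECRET are hypothetical, and the shape of "files" (a list of objects with a "url" key) is an assumption based on the contract described earlier.

```python
import os

API_URL = "https://api.seqpu.com/v1/tools/execute"

def build_request(tool_id: str, inputs: dict, env=os.environ) -> dict:
    """Assemble keyword arguments for requests.post() without hardcoding secrets."""
    return {
        "url": API_URL,
        "headers": {
            "CF-Access-Client-Id": env["SEQPU_CLIENT_ID"],          # hypothetical env var name
            "CF-Access-Client-Secret": env["SEQPU_CLIENT_SECRET"],  # hypothetical env var name
            "Content-Type": "application/json",
        },
        "json": {"toolId": tool_id, "inputs": inputs},
    }

def unpack_response(data: dict) -> tuple:
    """Split a parsed response into (result, logs, file_urls)."""
    files = data.get("files") or []
    # Assumption: "files" is a list of objects that each carry a "url" key
    urls = [f["url"] for f in files if isinstance(f, dict) and "url" in f]
    return data.get("result"), data.get("output", ""), urls

# Usage (needs the requests package and real credentials):
# import requests
# resp = requests.post(**build_request("your-tool-id", {"prompt": "Summarize this document"}), timeout=600)
# result, logs, urls = unpack_response(resp.json())
```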

Input/Output Patterns — Real Examples

String in, string out — the simplest tool

Caller sends text. Your code processes it. seqpu.result() returns it (the caller reads response["result"]).

summarizer.py — inputs: document (string, required)
from vllm import LLM, SamplingParams
import seqpu_sdk as seqpu

llm = LLM(model="Qwen/Qwen3-14B")
result = llm.generate([f"Summarize in 5 bullet points:\n{document}"], SamplingParams(max_tokens=1024))
seqpu.result(result[0].outputs[0].text)
# Caller sends: {"inputs": {"document": "long text..."}}
# Caller gets: {"result": "• Point 1\n• Point 2..."}

Multiple inputs with defaults

Some inputs required, some optional with defaults. Callers only send what they need.

translator.py — inputs: text (required), target_language (required), formality (optional)
from vllm import LLM, SamplingParams
import seqpu_sdk as seqpu

llm = LLM(model="Qwen/Qwen3-14B")
# formality defaults to "neutral" if not sent
tone = formality if formality else "neutral"
result = llm.generate([f"Translate to {target_language} in a {tone} tone:\n{text}"], SamplingParams(max_tokens=1024))
seqpu.result(result[0].outputs[0].text)
# Caller sends: {"inputs": {"text": "Hello world", "target_language": "ja"}}
# Caller gets: {"result": "こんにちは世界"}

File in, structured data out

Caller sends an image as base64. Vision model extracts data. Return as JSON.

receipt_scanner.py — inputs: receipt_image (file, required)
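import base64, json
from pathlib import Path
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import seqpu_sdk as seqpu

# Decode the uploaded image
Path("/data/receipt.jpg").write_bytes(base64.b64decode(receipt_image))

model = Qwen2_5_VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
img = Image.open("/data/receipt.jpg")

# Qwen2.5-VL expects a chat-formatted prompt with an image placeholder
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract vendor, date, line items, and total as JSON. Reply with JSON only."},
]}]
chat = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[chat], images=[img], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt
reply = processor.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
seqpu.result(json.loads(reply))   # raises if the model did not return pure JSON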
# Caller sends: {"inputs": {"receipt_image": "base64string..."}}
# Caller gets: {"result": {"vendor": "Costco", "total": 142.50, ...}}

Text in, file out — generate images, PDFs, audio

Save files to /outputs/. The caller gets a public url per file in response["files"] (header-free, any size).

image_generator.py — inputs: description (string, required)
from diffusers import StableDiffusionXLPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
image = pipe(description).images[0]
image.save("/outputs/generated.png")
print("Image generated")
# Caller sends: {"inputs": {"description": "a cat wearing a tiny hat"}}
# Caller gets: response["files"] with an entry for generated.png, including its public url

No AI — just logic on CPU

Not everything needs a model. Process data, call APIs, format reports. Runs on CPU for $0.047/hr.

csv_analyzer.py — inputs: csv_data (string, required), column (string, required)
import pandas as pd
from io import StringIO
import seqpu_sdk as seqpu

df = pd.read_csv(StringIO(csv_data))
stats = df[column].describe().to_string()
seqpu.result(f"Stats for {column}:\n{stats}")
# No GPU. No model. Just pandas on CPU. $0.00003 per call.

Telegram bot inputs — the special case

When called from a Telegram bot (section 05), the inputs are always: task, context, telegram_chat_id. Use seqpu.notify() instead of print().

telegram_tool.py — inputs: task, telegram_chat_id (required), context (optional)
from vllm import LLM, SamplingParams
import seqpu
llm = LLM(model="Qwen/Qwen3-14B")
result = llm.generate([task], SamplingParams(max_tokens=1024))
seqpu.notify(result[0].outputs[0].text, chat_id=telegram_chat_id, platform="telegram")
# This is what your Telegram bot calls. Same API. Same flow.

Add AI to Your Existing Website

Don't build a new site. Don't migrate. Don't rewrite. Just add one button to what you already have.

Add this to any website — HTML + JavaScript
<!-- Add this button anywhere on your existing site -->
<textarea id="input" placeholder="Paste text here..."></textarea>
<button onclick="callSeqPU()">Summarize</button>
<div id="result"></div>

<script>
async function callSeqPU() {
  const text = document.getElementById('input').value;
  const resp = await fetch('https://api.seqpu.com/v1/tools/execute', {
    method: 'POST',
    headers: {
      'CF-Access-Client-Id': 'YOUR_CLIENT_ID',
      'CF-Access-Client-Secret': 'YOUR_CLIENT_SECRET',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({toolId: 'your-tool-id', inputs: {document: text}})
  });
  const data = await resp.json();
  document.getElementById('result').innerText = data.result;
}
</script>

Your WordPress site, your Shopify store, your React app, your static HTML page — if it can make an HTTP request, it can call SeqPU. You just added AI to what you already have.

Sell Your Models Like the Big Boys

OpenAI charges per token. Anthropic charges per token. Now you can too — but you own the model, you set the price, and you keep the margin.

Download a model from HuggingFace. Fine-tune it on your data. Publish it as a headless API. Set 30% markup. Every developer, every app, every website that calls your endpoint pays compute + your cut. You're running an inference service — without owning a single server.

Monetization

Set a markup when you publish. The caller pays compute + your markup on every call.

Tool           Hardware     Cost/call   Your markup   You keep   At 1K calls/day
Translation    T4           $0.001      20%           $0.0002    $6/month
Summarizer     A100 80GB    $0.006      25%           $0.0015    $45/month
Image gen      L40S         $0.03       30%           $0.009     $270/month
Receipt scan   L40S         $0.02       20%           $0.004     $120/month
Code review    CPU          $0.0001     30%           $0.00003   $0.90/month
The difference
With OpenAI, you pay THEM. With SeqPU, THEY pay YOU. Download a model. Publish it. Set your markup. Get paid every time someone calls your endpoint.
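The figures in the table above follow from one multiplication: cost per call times markup, times calls per day, times days. A short sketch that reproduces them, assuming a 30-day month (the function name is hypothetical):

```python
def monthly_earnings(cost_per_call: float, markup_pct: float,
                     calls_per_day: int = 1000, days: int = 30) -> float:
    """Markup kept per call, multiplied out over a month."""
    kept_per_call = cost_per_call * markup_pct / 100
    return round(kept_per_call * calls_per_day * days, 2)

rows = [("Translation", 0.001, 20), ("Summarizer", 0.006, 25), ("Image gen", 0.03, 30),
        ("Receipt scan", 0.02, 20), ("Code review", 0.0001, 30)]
for tool, cost, markup in rows:
    print(f"{tool}: ${monthly_earnings(cost, markup):.2f}/month")
```

Swap in your own cost per call and expected traffic to estimate a tool before publishing it.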

Ideas That Make Money

Research Agent as a Service

Build the deep research agent from section 05. Publish it as a headless API. Companies pay $0.50 per research query. Five clients sending 100 queries/day each = $250/day. Inputs: question (string), depth (string: "quick" or "deep"). H200 for deep, T4 for quick.

Prompt for Claude
"Write a SeqPU headless API tool that takes a 'question' string input and a 'depth' string input (quick or deep). If depth is 'quick', use Qwen3-8B on T4 to answer from its knowledge. If depth is 'deep', use seqpu.run() to dispatch to an H200 with a research model that searches the web via Serper API (key in secrets as SERPER_API_KEY), fetches pages, and synthesizes an answer with citations. Print the answer."

Content Pipeline — Topic to Published Article

3 tools chained: Topic Generator (T4, $0.001) → Article Writer (A100, $0.01) → Image Generator (L40S, $0.03). Total: $0.041/article. Sell to content agencies at $1/article. 100 articles/day = $96/day profit.

Customer Support Agent

Classify intent (3B on T4, $0.0002) → route to specialist response (14B on A100, $0.005) → translate if needed (7B on T4, $0.0003). Sell to SaaS companies at $99/month for 1,000 tickets. Your cost: ~$5.50/month. Margin: $93.50.
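The ~$5.50 figure assumes every ticket goes through all three stages. Since translation only runs when needed, the real cost is somewhat lower. A small sketch of the arithmetic, where `translate_share` (the fraction of tickets that need translation) is a hypothetical parameter:

```python
# Per-call stage costs taken from the paragraph above
STAGE_COSTS = {"classify": 0.0002, "respond": 0.005, "translate": 0.0003}

def monthly_cost(tickets: int, translate_share: float = 1.0) -> float:
    """Compute cost for a month of tickets; translate_share is the fraction needing translation."""
    per_ticket = (STAGE_COSTS["classify"] + STAGE_COSTS["respond"]
                  + STAGE_COSTS["translate"] * translate_share)
    return round(per_ticket * tickets, 2)

price = 99.00
cost = monthly_cost(1000)
print(f"cost ${cost:.2f}, margin ${price - cost:.2f}")
```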

Document Processing Service

Receipt scanner, invoice processor, contract reviewer — each a separate headless tool. Accounting firms call them via API. $0.02-0.10/document. One firm with 500 documents/day = $10-50/day.

Personal AI Assistant on Telegram

Publish a Telegram bot tool that manages tasks, answers questions, sends reminders. Charge $9.99/month per user. Runs on T4 ($0.59/hr). At 100 messages/day per user, cost is ~$0.60/month. Margin: $9.39/user.

Language Tutor Bot

Telegram bot that teaches any language. Remembers what you struggle with. Generates practice sentences. $4.99/month. Runs on T4. Cost: ~$0.20/month per user. Almost pure margin.

Real Estate Listing Generator

Takes property details + photos → generates descriptions for Zillow, Realtor.com, MLS, social media. Sell to agents at $5/listing. 50 listings/day = $250/day. Vision model on L40S for photos, text model on T4 for descriptions.

Medical Transcription

Whisper transcribes doctor's recording → LLM formats into SOAP notes → saves as structured data. Sell to clinics at $0.10/minute of audio. 500 minutes/day = $50/day. All data stays private on GPU — never touches a third party.

Prompt for Claude
"Write a SeqPU headless API tool for medical transcription. Input: 'audio_file' (file/base64, required). Cell 1: use Whisper large-v3 on T4 to transcribe the audio. Cell 2: use Qwen3-14B on A100 80GB to format the transcript into SOAP notes (Subjective, Objective, Assessment, Plan). Print the formatted SOAP notes. Save the raw transcript to /outputs/transcript.txt. I want to publish on A100 80GB with 25% markup."

This Powers Everything

The headless URL is the foundation of everything on SeqPU:

  • Your Telegram bot (section 05) — calls a headless tool under the hood
  • Your UI site (section 07) — runs headless on every click
  • Your agent's tools — each one is a headless API
  • Your scheduled jobs — headless with a cron trigger
  • Your website button — one fetch() call to a headless URL

Everything on SeqPU is headless. The UI, the bot, the cron, the website button — they're all just different ways to call the same API.

Chaining Tools

Every published tool can call every other published tool. Build microservices for AI — each does one thing well, on its ideal hardware, at its optimal cost.

Pipeline — 3 tools, 3 GPUs, one flow
import seqpu

# Step 1: Extract text from PDF (CPU, $0.00003)
extracted = seqpu.tools.call("pdf-extractor", {"file": pdf_data})

# Step 2: Analyze with large model (A100 80GB, $0.006)
analysis = seqpu.tools.call("deep-analyzer", {"text": extracted["content"]})

# Step 3: Generate report (T4, $0.001)
report = seqpu.tools.call("report-writer", {"analysis": analysis["insights"]})
print(report)
Prompt for Claude
"Write a SeqPU headless API tool that takes a 'document' string input (required) and a 'style' string input (optional, default 'bullets'). Load Qwen3-32B with vLLM on A100 80GB. Summarize the document in the requested style. Print the result. I want to publish this with inputs: document (string, required), style (string, optional). Hardware: A100 80GB. Markup: 25%."
07 — UI Site

Your model.
Your interface.
Your product.

GPUs are expensive. A UI site makes them impossible to waste. Visitors fill in inputs, review their settings, adjust everything — all free, no compute. Then one click: the GPU fires for exactly the seconds it needs, the result appears, done. Billed by the second. No waste. No confusion. A full product with your URL, your brand, your pricing.

This Starts in the Notebook

Write your code. Test it with Run All. When it works, click Publish in the header bar. Choose "With UI". The HTML builder opens. Everything you built — your model, your environment, your GPU selection — carries over. You're just adding an interface on top of working code.

The same code from section 06 (Headless URL) works here. Every headless tool can be a UI site. Same Python. You just add HTML.

The Three Rules

The HTML builder uses three special attributes. That's the entire system.

1. data-seqpu-input="variable_name"
Put this on any HTML input element. The value becomes a Python variable in your code with that name. An <input data-seqpu-input="prompt"> means your code gets prompt = "whatever they typed".
2. data-seqpu-output="name"
Put this on any HTML element where you want results to appear. Whatever your code print()s shows up here. Files saved to /outputs/ render here as images, video, audio, or download links.
3. data-seqpu-generate
Put this on the submit button. When the visitor clicks it, the GPU fires, your code runs with their input values, and the output appears. This is the only moment that costs compute.
The visitor does everything for free
Filling in text, uploading files, adjusting sliders, picking options — all free. Zero compute. The GPU only fires when they click the generate button. One click, billed by the second, result appears, done.

Input Types You Can Use

These are the HTML elements visitors interact with. Each one feeds a Python variable.

  • Text input — single line text field. Product names, questions, short prompts.
  • Textarea — multi-line text. Documents, emails, long content.
  • Number — number input with min/max/step. Word counts, temperatures, quantities.
  • File upload — drag-and-drop zone. PDFs, CSVs, any file. Sent as base64 to your code.
  • Image upload — drag-and-drop with preview. Photos, receipts, screenshots.
  • Audio upload — meeting recordings, voice memos.
  • Video upload — clips to process.
  • Dropdown select — predefined options. Languages, styles, categories.
  • Checkbox — yes/no toggles. Include sources? Formal tone?
  • Slider — adjustable range. Creativity level, word count, number of results.
  • Canvas — a full drawing tool with pencil, eraser, colors, brush sizes. Draw a sketch, the AI processes it.
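File, image, audio, and video uploads reach your code as base64 text, so the first step is usually to decode them into bytes. A minimal sketch, assuming the payload is either plain base64 or a data URL with a "data:...;base64," prefix (the helper name decode_upload is hypothetical, not part of SeqPU; check what your tool actually receives):

```python
import base64

def decode_upload(payload: str) -> bytes:
    """Decode a base64 upload into raw bytes.

    Assumption: the payload is either plain base64 or a data URL such as
    "data:image/png;base64,....".
    """
    if payload.startswith("data:") and "," in payload:
        # Drop the "data:<mime>;base64," header and keep only the encoded part.
        payload = payload.split(",", 1)[1]
    return base64.b64decode(payload)

# Stand-in value instead of a real upload
sample = base64.b64encode(b"hello").decode()
print(decode_upload(sample))                              # b'hello'
print(decode_upload("data:text/plain;base64," + sample))  # b'hello'
```

Once you have bytes, write them to a file the way the examples further down do, then open that file with PIL, Whisper, or whatever the job needs.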

Output Types Visitors See

What appears after the GPU runs:

  • Text — rendered text from print(). Summaries, translations, answers.
  • Image — displayed with download button. Generated images, charts, diagrams.
  • Image gallery — multiple images automatically display in a grid with lightbox. Click to enlarge, arrows to browse.
  • Video — player with controls, autoplay, loop. Generated animations, edits.
  • Audio — player. Generated speech, music, sound effects.
  • JSON — structured data display. Extracted data, analysis results.
  • File — downloadable. PDFs, CSVs, model weights, anything.

How to Publish — Step by Step

1
Write and test your code in the notebook
Use input variables (prompt, text, image, etc). Test with hardcoded values. Run All. Verify it works.
2
Click Publish → "With UI"
The HTML builder opens with three tabs: HTML, CSS, Preview.
3
Write your HTML
Add input elements with data-seqpu-input, output areas with data-seqpu-output, and a button with data-seqpu-generate. Or click "Basic Template" to start with a working template.
4
Style with CSS
Switch to the CSS tab. Click "Dark Theme" or "Minimal" for a starting template. Customize however you want.
5
Preview
Switch to Preview tab. See exactly what visitors will see. Test the layout.
6
Set hardware, markup, visibility
Same as headless: pick GPU, set your profit margin (0-30%), choose Public/Team/Private/Password.
7
Publish
Your site is live at seqpu.com/tool/your-tool-id. Share the URL. Start charging.
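Step 1 says to test with hardcoded values. One way to keep those test values in the notebook without overriding what a visitor typed is a small fallback helper. This is only a sketch: it assumes inputs show up as global variables before your code runs, and get_input is a hypothetical helper, not a SeqPU function.

```python
def get_input(name, default):
    """Return the injected input if present and non-empty, else a test default."""
    value = globals().get(name)
    return value if value not in (None, "") else default

# In the notebook nothing is injected, so the defaults are used.
document = get_input("document", "SeqPU bills GPU time by the second.")
style = get_input("style", "bullets")

prompt = f"Write a {style} summary:\n{document}"
print(prompt)
```

The same pattern covers optional inputs: an empty dropdown or blank text field falls back to the default instead of producing an empty prompt.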

Connecting HTML to Python — The Bridge

The variable names in your HTML = the variable names in your Python. That's the whole bridge.

HTML (in the builder)
<!-- data-seqpu-input="variable_name" → becomes a Python variable -->
<textarea data-seqpu-input="document" placeholder="Paste your document here..."></textarea>

<select data-seqpu-input="style">
  <option value="bullets">Bullet Points</option>
  <option value="paragraph">Paragraph</option>
  <option value="executive">Executive Summary</option>
</select>

<!-- data-seqpu-generate → the Run button -->
<button data-seqpu-generate>Summarize</button>

<!-- data-seqpu-output="name" → results appear here -->
<div data-seqpu-output="result"></div>
Python (your notebook — same code as headless)
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)

# "document" and "style" come from the HTML form
result = llm.generate([f"Write a {style} summary:\n{document}"], params)
print(result[0].outputs[0].text)
# print() output appears in the data-seqpu-output="result" div
Same code as headless
The Python is identical to a headless API tool. The only difference is the HTML on top. Every headless tool from section 06 can be a UI site — same code, just add HTML.
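Because the bridge is nothing more than matching names, a mismatch between the HTML and the Python is the easiest mistake to make. The sketch below uses only the standard library to list the names declared in a page and flag any input the Python never mentions. The class name and the substring check are illustrative choices, not something the builder provides.

```python
from html.parser import HTMLParser

class SeqpuAttrCollector(HTMLParser):
    """Collect the names used in data-seqpu-input / data-seqpu-output attributes."""

    def __init__(self):
        super().__init__()
        self.inputs, self.outputs, self.has_generate = [], [], False

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key == "data-seqpu-input":
                self.inputs.append(value)
            elif key == "data-seqpu-output":
                self.outputs.append(value)
            elif key == "data-seqpu-generate":
                self.has_generate = True

html = """
<textarea data-seqpu-input="document"></textarea>
<select data-seqpu-input="style"><option value="bullets">Bullets</option></select>
<button data-seqpu-generate>Summarize</button>
<div data-seqpu-output="result"></div>
"""

collector = SeqpuAttrCollector()
collector.feed(html)

# Crude check: does each input name appear anywhere in the Python source?
python_code = 'print(f"Write a {style} summary:\\n{document}")'
missing = [name for name in collector.inputs if name not in python_code]
print(collector.inputs, collector.outputs, collector.has_generate, missing)
```

An empty missing list means every form field has a matching name in the code. A non-empty one usually points at a typo in the attribute.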

Access Control

  • Public — anyone with the URL. Share on social media, embed in your website, post on Product Hunt.
  • Unlisted — only people with the direct link can access. Not discoverable, but no login required.
  • Private — only you. Personal tools and dashboards.

Monetization

Every click of the Generate button is revenue. The visitor fills in inputs for free — the GPU only fires when they click. Set a markup (0-30%) and get paid on every single use.

  • Per-use pricing — set markup, charge per click. Great for high-volume tools.
  • Subscription — set to Private or Unlisted, share the link only with paying customers.
  • Free tier — make it public, build an audience. Add paid features later.
  • Internal tool — no markup, runs on your credits. Save your team hours per week.
Full products in minutes
This isn't a demo page. This is a production product. Your URL, your brand, your pricing. Visitors don't know it's SeqPU. They don't know it's a notebook. They see a clean tool, they use it, they pay you.
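To see what a markup is worth per click, work from GPU seconds. The sketch below assumes the visitor pays compute cost plus your markup percentage and that the markup portion is your profit. The 10-second run time and 25% markup are made-up inputs; the $2.50/hour A100 rate is the one quoted in Getting Started.

```python
def per_use_pricing(gpu_hourly_rate, seconds_per_run, markup):
    """Return (compute_cost, price, profit) for a single Generate click.

    Assumption: price = compute cost * (1 + markup), profit = the markup part.
    """
    compute_cost = gpu_hourly_rate * seconds_per_run / 3600
    profit = compute_cost * markup
    return compute_cost, compute_cost + profit, profit

# Hypothetical run: 10 seconds on an A100 at $2.50/hr with a 25% markup
cost, price, profit = per_use_pricing(2.50, 10, 0.25)
print(f"compute ${cost:.4f}, price ${price:.4f}, profit ${profit:.4f}")
print(f"profit at 1,000 uses/day: ${profit * 1000:.2f}")
```

Plug in your own run time from a test in the notebook to get a realistic per-click number before you pick between per-use pricing and a flat subscription.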

Complete Examples — Real Products You Can Build

Each example below is a complete product. The HTML goes in the builder's HTML tab, the CSS goes in the CSS tab. The Python is your notebook code. Copy any of these, publish, start charging.

1. Translation Tool — Your Own Google Translate

A freelance translator's dream. Professional interface, 8 languages, tone control. Sell to businesses at $29/month or $0.01/translation. Runs on T4 for $0.59/hr — costs you $0.001 per translation.

HTML
<div class="tool">
  <h1>Instant Translation</h1>
  <p class="subtitle">Professional translation in 8 languages. Powered by AI.</p>

  <label>Your Text</label>
  <textarea data-seqpu-input="text" rows="6"
    placeholder="Paste the text you want to translate..."></textarea>

  <div style="display:flex;gap:1rem;margin-top:1rem">
    <div style="flex:1">
      <label>Translate to</label>
      <select data-seqpu-input="target_language">
        <option value="es">Spanish</option>
        <option value="fr">French</option>
        <option value="ja">Japanese</option>
        <option value="de">German</option>
        <option value="zh">Chinese</option>
        <option value="pt">Portuguese</option>
        <option value="ko">Korean</option>
        <option value="ar">Arabic</option>
      </select>
    </div>
    <div style="flex:1">
      <label>Tone</label>
      <select data-seqpu-input="formality">
        <option value="neutral">Neutral</option>
        <option value="formal">Formal</option>
        <option value="casual">Casual</option>
      </select>
    </div>
  </div>

  <button data-seqpu-generate class="btn">Translate</button>

  <label>Translation</label>
  <div data-seqpu-output="result" class="output"></div>
</div>
CSS (paste in the CSS tab)
.tool { max-width:680px; margin:0 auto; padding:2rem; font-family:'Inter',sans-serif; color:#f0f0f0; }
h1 { font-size:28px; margin-bottom:.5rem; }
.subtitle { color:#888; margin-bottom:2rem; }
label { display:block; font-size:12px; font-weight:600; color:#aaa; margin:1rem 0 .5rem;
  text-transform:uppercase; letter-spacing:.05em; }
textarea, select { width:100%; padding:12px; background:#1a1a1a; border:1px solid #333;
  color:#f0f0f0; border-radius:6px; font-size:15px; }
textarea:focus, select:focus { border-color:#5eead4; outline:none; }
.btn { width:100%; padding:14px; margin-top:1.5rem; background:#5eead4; color:#000;
  border:none; border-radius:6px; font-size:16px; font-weight:700; cursor:pointer;
  text-transform:uppercase; letter-spacing:.1em; }
.btn:hover { opacity:.9; }
.output { background:#1a1a1a; border:1px solid #333; border-radius:6px;
  padding:16px; min-height:100px; white-space:pre-wrap; line-height:1.6; }
Python
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
# formality defaults to "neutral" if not sent
tone = formality if formality else "neutral"
result = llm.generate([f"Translate to {target_language} in a {tone} tone:\n{text}"], SamplingParams(max_tokens=1024))
print(result[0].outputs[0].text)

Revenue: At $0.01/translation with 20% markup — 1,000 translations/day = $2/day profit. Or sell as $29/month unlimited to businesses.

2. Image Generator — Your Own Midjourney

A creative studio. Visitors describe what they want, pick a style, adjust how many images. Gallery output with lightbox — click to enlarge, download any. Sell to content creators at $0.10/image or $49/month unlimited.

HTML
<div class="tool">
  <h1>AI Image Studio</h1>
  <p class="subtitle">Describe what you want. Get professional images in seconds.</p>

  <label>Describe your image</label>
  <textarea data-seqpu-input="description" rows="3"
    placeholder="e.g. product photo of handmade soap on marble, soft natural lighting, minimal background"></textarea>

  <div style="display:flex;gap:1rem;margin-top:1rem">
    <div style="flex:1">
      <label>Style</label>
      <select data-seqpu-input="style_preset">
        <option value="photorealistic">Photorealistic</option>
        <option value="digital art">Digital Art</option>
        <option value="watercolor painting">Watercolor</option>
        <option value="anime illustration">Anime</option>
        <option value="3d render">3D Render</option>
        <option value="oil painting">Oil Painting</option>
      </select>
    </div>
    <div style="flex:1">
      <label>Number of images</label>
      <input type="range" data-seqpu-input="count" min="1" max="4" value="2">
    </div>
  </div>

  <button data-seqpu-generate class="btn">Generate Images</button>

  <div data-seqpu-output="images" class="gallery"></div>
</div>
Python
from diffusers import StableDiffusionXLPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
for i in range(int(count)):
    # Fold the selected style into the prompt so the Style dropdown has an effect
    image = pipe(f"{description}, {style_preset}", num_inference_steps=25).images[0]
    image.save(f"/outputs/image_{i+1}.png")
print(f"Generated {count} images")

Revenue: $0.05/image at 30% markup. 200 images/day = $3/day profit. Or $49/month unlimited for content creators.

3. Receipt Scanner — Automate Bookkeeping

Accountants drag a receipt photo onto the page. Vision model reads it and extracts everything — vendor, date, every line item, tax, total. Clean structured output they can copy into their system. Sell at $0.02/receipt.

HTML
<div class="tool">
  <h1>Receipt Scanner</h1>
  <p class="subtitle">Drop a receipt photo. Get structured data back instantly.</p>

  <div class="upload-zone">
    <input type="file" data-seqpu-input="receipt_image" accept="image/*">
    <p>Drop receipt image here or click to browse</p>
  </div>

  <button data-seqpu-generate class="btn">Scan Receipt</button>

  <label>Extracted Data</label>
  <div data-seqpu-output="extracted_data" class="output"></div>
</div>
Python
import base64
from pathlib import Path
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
Path("/data/receipt.jpg").write_bytes(base64.b64decode(receipt_image))
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct", device_map="auto")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
img = Image.open("/data/receipt.jpg")
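# Qwen2.5-VL needs its chat template so the prompt contains the image placeholder tokens
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract vendor, date, every line item, and total as JSON"}]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(images=[img], text=[prompt], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
answer_ids = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(answer_ids, skip_special_tokens=True))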

Revenue: $0.02/receipt. One accounting firm with 500 receipts/day = $10/day = $300/month from one client.
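Models asked for JSON often wrap it in a sentence or two. If you want the output area to show clean structured data, pull the object out before printing. A minimal sketch, assuming the reply contains one top-level JSON object (extract_json is a hypothetical helper, and the sample reply is made up):

```python
import json

def extract_json(text: str) -> dict:
    """Pull the first {...} object out of model output and parse it.

    Assumption: the reply contains exactly one top-level JSON object,
    possibly surrounded by extra prose.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])

# Stand-in for a model reply; the values are invented for illustration
reply = 'Sure, here is the data: {"vendor": "Cafe Uno", "total": 12.5} Anything else?'
receipt = extract_json(reply)
print(json.dumps(receipt, indent=2))
```

If parsing fails, json.loads raises an error, which is a useful signal to retry the model or show the raw text instead.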

4. Code Reviewer — Replace a $200/month SaaS

Dev teams paste code, pick focus area, get a professional review. Runs on CPU calling Claude — costs you basically nothing. Sell at $0.03/review or $19/month per dev.

HTML
<div class="tool">
  <h1>Code Review AI</h1>
  <p class="subtitle">Paste your code. Get a professional review in seconds.</p>

  <label>Your Code</label>
  <textarea data-seqpu-input="code" rows="12"
    style="font-family:'Space Mono',monospace;font-size:13px"
    placeholder="Paste your code here..."></textarea>

  <div style="display:flex;gap:1rem;margin-top:1rem">
    <div style="flex:1">
      <label>Language</label>
      <select data-seqpu-input="language">
        <option value="python">Python</option>
        <option value="javascript">JavaScript</option>
        <option value="typescript">TypeScript</option>
        <option value="go">Go</option>
        <option value="rust">Rust</option>
        <option value="java">Java</option>
      </select>
    </div>
    <div style="flex:1">
      <label>Focus</label>
      <select data-seqpu-input="focus">
        <option value="security">Security Issues</option>
        <option value="performance">Performance</option>
        <option value="readability">Readability</option>
        <option value="all">Everything</option>
      </select>
    </div>
  </div>

  <button data-seqpu-generate class="btn">Review Code</button>

  <label>Review</label>
  <div data-seqpu-output="review" class="output"></div>
</div>
Python (CPU — calls Claude API)
from anthropic import Anthropic
import os
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.messages.create(
    model="claude-sonnet-4-20250514", max_tokens=2048,
    messages=[{"role": "user", "content": f"Review this {language} code for {focus} issues. Be specific. Cite line numbers.\n\n{code}"}]
)
print(response.content[0].text)

Revenue: $0.03/review. A 10-person dev team reviewing 50 PRs/day = $1.50/day. Or $19/month per dev = $190/month per team.

5. Meeting Transcriber — Save 30 Minutes Per Meeting

Upload a recording. Get the full transcript AND organized action items with who's responsible. Every project manager needs this. $0.10/minute or $49/month unlimited.

HTML
<div class="tool">
  <h1>Meeting Transcriber</h1>
  <p class="subtitle">Upload a recording. Get transcript + action items in 60 seconds.</p>

  <div class="upload-zone">
    <input type="file" data-seqpu-input="audio_file" accept="audio/*">
    <p>Drop audio file here (MP3, WAV, M4A)</p>
  </div>

  <button data-seqpu-generate class="btn">Transcribe &amp; Summarize</button>

  <label>Results</label>
  <div data-seqpu-output="result" class="output"></div>
</div>
Python
import whisper, base64
from pathlib import Path
from vllm import LLM, SamplingParams
Path("/data/audio.mp3").write_bytes(base64.b64decode(audio_file))
model = whisper.load_model("large-v3")
transcript = model.transcribe("/data/audio.mp3")["text"]
llm = LLM(model="Qwen/Qwen3-14B")
result = llm.generate([f"Extract action items, decisions, and follow-ups:\n{transcript}"], SamplingParams(max_tokens=1024))
print(f"TRANSCRIPT:\n{transcript}\n\nACTION ITEMS:\n{result[0].outputs[0].text}")

Revenue: $0.10/minute of audio. A law firm with 10 hours of depositions/week = $60/week = $240/month from one client.

6. Product Photo → Marketing Copy — 10 Hours Saved Per Week

E-commerce sellers upload a product photo, type the name, click Generate. They get Instagram captions with hashtags, tweets, LinkedIn posts, and an SEO description. All at once. $0.03/photo or $29/month unlimited for Etsy sellers.

HTML
<div class="tool">
  <h1>Social Media Copy Generator</h1>
  <p class="subtitle">Upload a product photo. Get ready-to-post content for every platform.</p>

  <div class="upload-zone">
    <input type="file" data-seqpu-input="product_photo" accept="image/*">
    <p>Drop product photo here</p>
  </div>

  <label>Product Name</label>
  <input type="text" data-seqpu-input="product_name"
    placeholder="e.g. Lavender Honey Soap, Handmade Leather Wallet...">

  <button data-seqpu-generate class="btn">Generate Copy</button>

  <label>Your Marketing Copy</label>
  <div data-seqpu-output="copy" class="output"></div>
</div>
Python
import base64
from pathlib import Path
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
Path("/data/product.jpg").write_bytes(base64.b64decode(product_photo))
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct", device_map="auto")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
img = Image.open("/data/product.jpg")
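# Qwen2.5-VL needs its chat template so the prompt contains the image placeholder tokens
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": f"Product: {product_name}. Write 3 Instagram captions with hashtags, 2 tweets, 1 LinkedIn post, and an SEO product description."}]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(images=[img], text=[prompt], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the echoed prompt
answer_ids = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(answer_ids, skip_special_tokens=True))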

Revenue: $0.03/photo. An Etsy seller with 20 products/week = $0.60/week. Sell as $29/month unlimited — 100 sellers = $2,900/month.

7. Sketch → Illustration — Draw and Watch AI Finish It

The wow factor. Visitors draw on a canvas — rough sketch, stick figure, whatever. Pick a style. AI transforms it into a finished illustration. Kids' apps, design tools, creative platforms. $0.10/drawing or $4.99/month for parents.

HTML
<div class="tool">
  <h1>Sketch to Art</h1>
  <p class="subtitle">Draw anything. AI turns it into a masterpiece.</p>

  <label>Draw your sketch</label>
  <canvas data-seqpu-input="sketch" width="512" height="512"
    style="border:1px solid #333;border-radius:6px;cursor:crosshair"></canvas>

  <label>Art Style</label>
  <select data-seqpu-input="style">
    <option value="watercolor painting">Watercolor</option>
    <option value="digital art">Digital Art</option>
    <option value="oil painting">Oil Painting</option>
    <option value="anime illustration">Anime</option>
    <option value="pencil sketch refined">Refined Pencil</option>
    <option value="children's book illustration">Children's Book</option>
  </select>

  <button data-seqpu-generate class="btn">Transform My Sketch</button>

  <label>Your Illustration</label>
  <div data-seqpu-output="illustration"></div>
</div>
Python
import base64
from pathlib import Path
from diffusers import StableDiffusionXLImg2ImgPipeline
import torch
from PIL import Image
Path("/data/sketch.png").write_bytes(base64.b64decode(sketch))
# Canvas PNGs can carry an alpha channel; the img2img pipeline expects RGB
init_image = Image.open("/data/sketch.png").convert("RGB").resize((1024, 1024))
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
result = pipe(prompt=f"{style} illustration based on this sketch", image=init_image, strength=0.75).images[0]
result.save("/outputs/illustration.png")
print("Illustration generated")

Revenue: $0.10/drawing. $4.99/month for kids. Parents buy it because their kids love watching sketches transform. 200 subscribers = $998/month.

Prompt for Claude
"Write a SeqPU UI site — give me both the HTML and the Python. The HTML should have a professional dark theme with a title, subtitle, a file upload for images (data-seqpu-input='photo'), a text input for product name (data-seqpu-input='name'), a dropdown for platform with options Instagram/Twitter/LinkedIn/SEO (data-seqpu-input='platform'), a styled generate button (data-seqpu-generate), and an output div (data-seqpu-output='copy'). Include CSS with the SeqPU dark theme. The Python should use Qwen2.5-VL-7B-Instruct to analyze the photo and generate platform-specific marketing copy."
08 — Agents

Build it once.
Let it run forever.

Agents are scripts that make decisions. They read input, decide what to do, take action, check the result, and decide what to do next. The model is the brain. Your published tools are the hands. The GPU is the muscle. You don't pay for tokens — you pay for compute by the second. At scale, that's 6-50x cheaper than API pricing. And your data never leaves your server.

The Token Tax

Every time you call an AI API, you're paying a markup on compute. Here's what the providers charge vs what it actually costs to run the same work on your own hardware:

Approach | Cost per 1M tokens | Your markup potential
GPT-4o API | $2.50 input / $10 output | You pay them
Claude Sonnet API | $3 input / $15 output | You pay them
Gemini Pro API | $1.25 input / $5 output | You pay them
Your 7B on T4 | ~$2 total (compute only) | You set the price
Your 14B on A100 | ~$8 total | You set the price
Your 32B on A100 | ~$17 total | You set the price + own the data

For simple tasks (classification, extraction, routing, Q&A), a 7B on T4 handles it at about $2 per 1M tokens, versus $12.50 on GPT-4o for 1M input plus 1M output tokens. That's about 6x cheaper. At scale with vLLM batching, the gap widens to 10-50x. And your data never touches a third party.
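The self-hosted figures come down to one formula: hourly GPU price divided by tokens generated per hour. The sketch below shows that arithmetic and the throughput the ~$2 figure implies on a T4. The throughput is derived from the numbers on this page, not measured; real throughput depends on the model, batch size, and prompt lengths.

```python
def cost_per_million_tokens(gpu_hourly_rate: float, tokens_per_second: float) -> float:
    """Compute cost in dollars to generate 1M tokens at a steady throughput."""
    seconds_needed = 1_000_000 / tokens_per_second
    return gpu_hourly_rate * seconds_needed / 3600

def implied_throughput(gpu_hourly_rate: float, dollars_per_million: float) -> float:
    """Tokens/second a GPU must sustain to hit a target $/1M tokens."""
    return gpu_hourly_rate * 1_000_000 / (dollars_per_million * 3600)

# T4 at $0.59/hr: throughput implied by the ~$2 per 1M tokens figure
print(round(implied_throughput(0.59, 2.0), 1))  # 81.9
# GPT-4o list price for 1M input plus 1M output tokens, relative to $2
print((2.50 + 10.00) / 2.0)                     # 6.25
```

Measure your own tokens per second with a test run and feed it to cost_per_million_tokens to check where your workload actually lands.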

The markup opportunity
The companies charging per token are marking up 50-200x on compute. Now you can be the one setting the markup. Download a model. Run it on SeqPU. Publish it. Charge whatever the market will bear.

CPU Is Dirt Cheap — Most Agent Work Lives Here

Here's what most people miss: 80-90% of agent work is CPU. API calls, web scraping, database queries, formatting, routing decisions, sending emails, Telegram messages — all CPU. $0.0000131/core/second = $0.047/hour.

An agent that handles 1,000 messages/day on CPU costs about $1.13/day per core, even if it stays awake around the clock (24 hours × $0.047). The GPU only fires for the 10-20% of the workflow that actually needs a model. Your agent is awake 24/7 but only turns on the GPU when it actually needs to think.

CPU-only agent — no GPU, no model loading, $0.047/hr
import seqpu, requests, os
from anthropic import Anthropic

# Everything runs on CPU — no GPU, no model loading
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Step 1: Check inventory API
inventory = requests.get("https://api.mystore.com/inventory",
    headers={"Authorization": f"Bearer {os.environ['STORE_KEY']}"}).json()

# Step 2: Ask Claude to analyze
low_items = [i for i in inventory if i["quantity"] < 10]
response = client.messages.create(
    model="claude-sonnet-4-20250514", max_tokens=1024,
    messages=[{"role": "user", "content": f"Write a reorder report for these low-stock items:\n{low_items}"}]
)

# Step 3: Send to Telegram
seqpu.notify(response.content[0].text, chat_id="123456", platform="telegram")
# Cost: ~$0.00003 for SeqPU compute + Claude API cost. No GPU.

10 lines. CPU only. Checks inventory, generates a report, sends it to your phone. Schedule it to run every morning. Total compute cost: $0.00003/run. That's 33,000 runs for $1.
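The per-run figure follows directly from per-second billing. A quick check of the arithmetic, using only the CPU rate quoted above (the 1-core default is an assumption):

```python
CPU_RATE_PER_CORE_SECOND = 0.0000131  # dollars, from the pricing above

def run_cost(seconds: float, cores: int = 1) -> float:
    """Compute cost of one CPU run, billed by the second."""
    return CPU_RATE_PER_CORE_SECOND * cores * seconds

hourly = run_cost(3600)
print(f"${hourly:.3f} per core-hour")           # $0.047 per core-hour

# A $0.00003 run corresponds to roughly this many seconds on one core
seconds_per_run = 0.00003 / CPU_RATE_PER_CORE_SECOND
print(f"{seconds_per_run:.1f} s per run")       # 2.3 s per run
print(f"{int(1 / 0.00003):,} runs per dollar")  # 33,333 runs per dollar
```

So the $0.00003 estimate assumes the script finishes in a couple of seconds. A run that waits longer on slow APIs costs proportionally more, and the Claude API charge is separate.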

Intention > Scale — 10 Small Models Beat 1 Giant

A 480B model gets a vague prompt and produces a vague answer. Chain 10 purpose-built 3B-7B models instead, each handling one specific task, and every step is done by a specialist. The classifier knows what kind of request it's seeing. The extractor knows what data to pull. The formatter knows how to present it.

Cost: 10 T4 calls at $0.0002 each = $0.002 total. One 480B API call = $0.10+. Quality: better, because each step was designed with intention.

The secret
It's not about model size. It's about intention. A well-prompted 7B with the right context beats a lazy 480B every single time. And it costs 50x less.

The SeqPU SDK

Every notebook has import seqpu available automatically. The SDK lets your running code spawn jobs on other GPUs, call published tools, send messages to chat platforms, and run agentic loops.

What's available
import seqpu

seqpu.run(code, gpu)           # Spawn a job on any GPU
seqpu.tools.call(id, inputs)   # Call a published tool
seqpu.tools.list()             # List your published tools
seqpu.notify(msg, chat_id)     # Send message to Telegram/Discord/Slack/WhatsApp
seqpu.notify_file(...)         # Send a file to chat
seqpu.credits.balance()        # Check your credit balance
seqpu.gpu.pricing()            # Get current GPU pricing
seqpu.snapshot.create(...)     # Create a StateSave environment
seqpu.agent.loop(...)          # Run an agentic tool-calling loop

seqpu.run() — Spawn Sub-Jobs

Run code on a different GPU from inside your notebook. An orchestrator on CPU ($0.047/hr) can dispatch heavy work to expensive GPUs only when needed.

Orchestrator on CPU dispatches to GPU
import seqpu
from pathlib import Path

# This notebook runs on CPU ($0.047/hr)
# Heavy inference dispatched to A100 only when needed

tasks = ["Summarize this report", "Translate to Spanish", "Extract action items"]

for task in tasks:
    # {task!r} embeds the task as a quoted Python literal, so quotes inside it cannot break the sub-job code
    result = seqpu.run(f"""
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
result = llm.generate([{task!r}], SamplingParams(max_tokens=512))
print(result[0].outputs[0].text)
""", gpu="a100-80gb")
    print(f"Done: {task}")
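Since seqpu.run() takes code as a string, any value you paste into that string has to survive as valid Python. Quotes or newlines in the value are the usual way this breaks. A minimal sketch of a safer pattern using repr formatting, runnable without SeqPU (build_job_code is a hypothetical helper, and the generated code is a toy stand-in for a real sub-job):

```python
def build_job_code(task: str) -> str:
    """Return Python source for a sub-job with the task embedded as a literal."""
    # !r inserts repr(task), a valid Python string literal with quotes escaped.
    return f"prompt = {task!r}\nprint(prompt)\n"

task = 'Summarize the "Q4" report, then say it\'s done'
code = build_job_code(task)

compile(code, "<sub-job>", "exec")  # raises SyntaxError if the source is malformed

namespace = {}
exec(code, namespace)               # run locally to confirm the value round-trips
assert namespace["prompt"] == task
```

Compiling the generated source locally before dispatching it catches syntax problems on cheap CPU time instead of after a GPU has spun up.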

seqpu.tools.call() — Chain Published Tools

Call any published tool from inside your code. Build pipelines where each step runs on different hardware, built by different people.

Multi-tool pipeline
import seqpu

# Each tool is a published notebook running on its own GPU
raw_text = seqpu.tools.call("pdf-extractor", {"url": "https://example.com/report.pdf"})
summary = seqpu.tools.call("summarizer-32b", {"text": raw_text["content"]})
translated = seqpu.tools.call("translator", {"text": summary["summary"], "lang": "ja"})

print(f"Japanese summary: {translated['result']}")

seqpu.notify() — Send Results to Chat

Send messages and files to Telegram, Discord, Slack, or WhatsApp from your running code. Your AI reports back to you wherever you are.

Send results to Telegram
import seqpu, base64

# Send a text message
seqpu.notify(
    "Daily report complete. Revenue: $42,150 (+8.3%)",
    chat_id="123456789",
    platform="telegram"
)

# Send a file (chart, PDF, image)
with open("/data/chart.png", "rb") as f:
    seqpu.notify_file(
        base64.b64encode(f.read()).decode(),
        "image/png", "daily_chart.png",
        chat_id="123456789",
        caption="Revenue trend — last 30 days"
    )

seqpu.agent.loop() — Agentic Tool Calling

Give a model access to tools and let it decide what to do. The agent loop runs the model, detects tool calls, executes them, feeds results back, and repeats until the model has an answer.

Agent with tools
import seqpu
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

# Get your published tools as callable definitions
tools = seqpu.agent.get_tool_definitions(format="qwen")

# The agent decides which tools to call and in what order
response = seqpu.agent.loop(
    model=model, tokenizer=tokenizer,
    messages=[{"role": "user", "content": "Research NVIDIA's Q4 earnings"}],
    tools=tools, max_iterations=10
)
print(response)
How the loop works
Model generates a response → if it contains a tool call, SDK executes the tool → result fed back to model → model decides next step → repeat until done. The model orchestrates everything.
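The loop is easier to reason about with the model swapped for a stub. The sketch below is not the SeqPU SDK implementation. It is a toy version of the same cycle, with a fake model that requests one tool call and then answers. The message format, toy_model, and TOOLS are all made up for illustration.

```python
def toy_model(messages):
    """Stand-in for an LLM: asks for a tool once, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "answer", "content": f"The total is {last['content']}."}
    return {"type": "tool_call", "name": "add", "arguments": {"a": 2, "b": 3}}

TOOLS = {"add": lambda a, b: a + b}  # hypothetical tool registry

def agent_loop(model, messages, tools, max_iterations=10):
    messages = list(messages)  # do not mutate the caller's list
    for _ in range(max_iterations):
        step = model(messages)
        if step["type"] == "answer":
            return step["content"]
        # Execute the requested tool and feed the result back to the model.
        result = tools[step["name"]](**step["arguments"])
        messages.append({"role": "tool", "content": str(result)})
    return "Stopped: iteration limit reached"

answer = agent_loop(toy_model, [{"role": "user", "content": "What is 2 + 3?"}], TOOLS)
print(answer)  # The total is 5.
```

The iteration cap plays the same role as max_iterations in the example above: it bounds how much compute a confused model can burn before the loop gives up.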

Scheduled Jobs

Publish your notebook with a cron schedule. It runs automatically — no human needed, no browser open.

Scheduled daily job — cron: 0 7 * * *
# Runs every morning at 7am automatically
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)
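# Read every .txt file in the inbox folder into one prompt
emails = "\n\n".join(p.read_text() for p in sorted(Path("/data/inbox").glob("*.txt")))
result = llm.generate([f"Extract action items:\n{emails}"], params)
out_dir = Path("/data/output")
out_dir.mkdir(parents=True, exist_ok=True)  # make sure the folder exists before writing
(out_dir / "digest.txt").write_text(result[0].outputs[0].text)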

Common schedules:

  • 0 7 * * * — every day at 7am
  • */15 * * * * — every 15 minutes
  • 0 0 * * 1 — every Monday at midnight
  • 0 9 1 * * — first day of every month at 9am
  • 0 */6 * * * — every 6 hours
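Each schedule is five fields: minute, hour, day of month, month, day of week. To sanity-check an expression before publishing, a small matcher helps. This sketch handles only the forms used in the list above (a star, a star with a step, or a single number) and requires every field to match, which is a simplification of full cron behavior. Which timezone SeqPU uses for schedules is not covered here.

```python
from datetime import datetime

FIELDS = [("minute", 0), ("hour", 0), ("day", 1), ("month", 1), ("weekday", 0)]

def field_matches(spec: str, value: int, minimum: int) -> bool:
    """Support only '*', '*/n', and a single number (a subset of cron syntax)."""
    if spec == "*":
        return True
    if spec.startswith("*/"):
        return (value - minimum) % int(spec[2:]) == 0
    return int(spec) == value

def cron_matches(expr: str, when: datetime) -> bool:
    specs = expr.split()
    if len(specs) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    values = [
        when.minute, when.hour, when.day, when.month,
        (when.weekday() + 1) % 7,  # cron counts Sunday as 0; Python counts Monday as 0
    ]
    return all(field_matches(s, v, m) for s, v, (_, m) in zip(specs, values, FIELDS))

monday_7am = datetime(2025, 1, 6, 7, 0)
print(cron_matches("0 7 * * *", monday_7am))     # True
print(cron_matches("*/15 * * * *", monday_7am))  # True
print(cron_matches("0 0 * * 1", monday_7am))     # False (hour is 7, not 0)
```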
Price monitor — cron: */15 * * * *
import seqpu, requests

# Checks every 15 minutes, alerts if threshold breached
prices = requests.get("https://api.example.com/prices").json()

for item in prices:
    if item["price"] < item["alert_threshold"]:
        seqpu.notify(
            f"PRICE ALERT: {item['name']} dropped to ${item['price']}",
            chat_id="123456789", platform="telegram"
        )

Multi-Model Pipelines

Different models are good at different things. Chain them — small models for cheap work, big models for hard work. Only use expensive GPUs when the task actually needs them.

3-stage pipeline — right model for each job
import seqpu

# The text to triage. Example value; in a published tool this would be an input.
message = "Checkout is failing for every customer since 9am"

# Stage 1: Small model classifies (T4, $0.59/hr, fractions of a cent)
# {message!r} embeds the text as a quoted literal, assuming the sub-job cannot see this notebook's variables
classification = seqpu.run(f"""
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-4B")
result = llm.generate(["Classify as urgent/normal/spam: " + {message!r}],
    SamplingParams(max_tokens=10))
print(result[0].outputs[0].text)
""", gpu="t4")

# Stage 2: Only if urgent, large model responds (A100, $2.50/hr)
if "urgent" in classification.lower():
    response = seqpu.run(f"""
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-32B")
result = llm.generate(["Draft an urgent response to: " + {message!r}],
    SamplingParams(max_tokens=1024))
print(result[0].outputs[0].text)
""", gpu="a100-80gb")

    # Stage 3: Notify on Telegram
    seqpu.notify(f"Urgent handled: {response[:200]}", chat_id="123456789")
Why this saves money
The T4 classification costs fractions of a cent. The A100 only fires when needed. Running everything on A100 would cost 4× more for the same result.
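The saving is per-second pricing applied to routing. The sketch below compares classifying on a T4 against classifying on an A100, with only urgent messages reaching the large model in both cases. The message count, urgent share, and run times are hypothetical; only the hourly rates come from this page.

```python
T4_HOURLY, A100_HOURLY = 0.59, 2.50  # dollars per hour, from this page

def pipeline_cost(n_messages, urgent_fraction, classify_hourly,
                  classify_seconds, respond_seconds):
    """Classify every message, then answer only the urgent ones on the A100."""
    classify = n_messages * classify_seconds * classify_hourly / 3600
    respond = n_messages * urgent_fraction * respond_seconds * A100_HOURLY / 3600
    return classify + respond

print(round(A100_HOURLY / T4_HOURLY, 2))  # 4.24, the per-second price ratio

# Hypothetical workload: 1,000 messages, 10% urgent, 1 s to classify, 5 s to respond
routed = pipeline_cost(1000, 0.10, T4_HOURLY, 1, 5)
all_a100 = pipeline_cost(1000, 0.10, A100_HOURLY, 1, 5)
print(f"classify on T4: ${routed:.2f}   classify on A100: ${all_a100:.2f}")
```

The classification step alone is about 4x cheaper on the T4. How much that moves the total depends on how often the expensive stage fires.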

GPU Agent — Private Data, Local Model

When your data can't leave the server — medical, legal, financial, proprietary — run the model locally. No API calls. No third party.

Private document processor — A100 80GB
from vllm import LLM, SamplingParams
import seqpu
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)

Path("/data/processed").mkdir(parents=True, exist_ok=True)  # make sure the output folder exists
for doc in Path("/data/inbox").glob("*.txt"):
    text = doc.read_text()
    result = llm.generate([f"Classify as urgent/normal/spam and draft a response:\n{text}"], params)
    response = result[0].outputs[0].text
    if "urgent" in response.lower():
        seqpu.notify(f"URGENT: {doc.name}\n{response}", chat_id="123456", platform="telegram")
    Path(f"/data/processed/{doc.name}").write_text(response)
print("All documents processed")

A100 80GB for 2 minutes = $0.08. Processed 50 emails. $0.0016/email. Your data never left the server.

Chained Agent — Button Clicks to Production

This is the full power. Each capability is a separate notebook you published as a headless tool. The orchestrator runs on CPU and chains them with seqpu.tools.call(). Each tool was: write code → test → click Publish → done.

Customer support — 4 tools, 3 GPUs, CPU orchestrator
import seqpu

# This orchestrator runs on CPU ($0.047/hr)
# Each tool was published separately: 10 lines each, click Publish
# "task" and "telegram_chat_id" are assumed to be this tool's published inputs

# Step 1: Classify the request (3B on T4, $0.0002)
category = seqpu.tools.call("classifier-id", {"text": task})

# Step 2: Route to the right specialist
if category["result"] == "billing":
    response = seqpu.tools.call("billing-agent", {"question": task})      # 7B on T4, $0.0003
elif category["result"] == "technical":
    response = seqpu.tools.call("tech-support", {"question": task})       # 14B on A100, $0.005
else:
    response = seqpu.tools.call("general-agent", {"question": task})       # 7B on T4, $0.0003

# Step 3: Send back via Telegram
seqpu.notify(response["answer"], chat_id=telegram_chat_id, platform="telegram")

Each tool was 10 lines + click Publish. The orchestrator chains them. Total cost per ticket: $0.001-0.006. A human support agent costs $0.75/ticket. You charge the company $0.10/ticket. They save $0.65. You make $0.094. At 500 tickets/day = $47/day = $1,410/month from one client.
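As the number of specialists grows, the if/elif chain in the orchestrator gets harder to maintain. A lookup table is one alternative. This is a sketch that reuses the tool IDs from the example above; it assumes the classifier returns a plain label, possibly with stray whitespace or capitals.

```python
ROUTES = {
    "billing": "billing-agent",
    "technical": "tech-support",
}
DEFAULT_TOOL = "general-agent"

def pick_tool(category: str) -> str:
    """Map a classifier label to a tool ID, tolerating case and whitespace."""
    return ROUTES.get(category.strip().lower(), DEFAULT_TOOL)

for label in ["billing", " Technical\n", "refund request"]:
    print(repr(label), "->", pick_tool(label))
```

Adding a new specialist then means publishing one more tool and adding one line to ROUTES. Anything the classifier labels unexpectedly still lands on the general agent.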

How to Build an Agent — Start to Finish

This is sections 05, 06, and 07 combined. Your agent is published tools + an orchestrator + a delivery channel.

1. Build each capability as a separate notebook
   The classifier. The responder. The translator. Each one does one thing well. Test each with Run All.
2. Publish each as a headless API (section 06)
   Click Publish → API. Define inputs/outputs. Pick the GPU. Each tool runs on its ideal hardware. Done.
3. Write the orchestrator
   A notebook on CPU that chains your tools with seqpu.tools.call(). It decides which tool to call based on the input.
4. Publish the orchestrator
   Click Publish → API. This becomes your agent's brain.
5. Connect to Telegram (section 05)
   Settings → Connections → paste BotFather token → select the orchestrator tool. Or publish as a UI site (section 07).
6. Set markup. Start charging.
   0-30% on compute. Or sell access as a subscription. Your agent, your pricing.
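Before publishing the orchestrator, the routing logic can be exercised locally with a stand-in for the tool call. The sketch below is not SeqPU code: `fake_call` is a hypothetical stub that mimics the `{"result": ...}` and `{"answer": ...}` shapes used in the customer support example, and the tool names are taken from that example.

```python
# Local test harness for orchestrator routing.
# `fake_call` is a hypothetical stub standing in for seqpu.tools.call(),
# so the routing can be checked without running any GPU.

ROUTES = {
    "billing": "billing-agent",
    "technical": "tech-support",
}
DEFAULT_TOOL = "general-agent"  # fallback for any category not listed above


def route(task, call):
    """Classify the task, then forward it to the matching specialist tool."""
    category = call("classifier-id", {"text": task})["result"]
    tool = ROUTES.get(category, DEFAULT_TOOL)
    answer = call(tool, {"question": task})["answer"]
    return tool, answer


def fake_call(tool_id, payload):
    """Stub: a keyword 'classifier' and specialists that just echo the question."""
    if tool_id == "classifier-id":
        text = payload["text"].lower()
        if "invoice" in text or "refund" in text:
            return {"result": "billing"}
        if "error" in text or "crash" in text:
            return {"result": "technical"}
        return {"result": "general"}
    return {"answer": f"[{tool_id}] handled: {payload['question']}"}


for task in ["I need a refund", "The app shows an error", "What are your hours?"]:
    print(route(task, fake_call))
```

Passing the call function in as an argument is a design choice: the same `route` can be handed the real tool call once each tool is published.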

Real Agents, Real Revenue

The Lawyer's Paralegal — $50/contract, 99.99% margin

Client sends a contract PDF to the Telegram bot. Agent extracts text (CPU, $0.00003), sends to 14B for clause analysis ($0.005), formats the review (CPU, $0.00003), sends back to Telegram. Total: about $0.005 per contract. Lawyer charges $50. A junior paralegal costs $70 per contract and takes 2 hours. The agent does it in 30 seconds for half a cent.

E-commerce Support — $1,495/month per client

"Where's my order 12345?" Agent calls Shopify API (CPU), looks up the order, generates a friendly response with tracking link (7B on T4, $0.0003). $0.0003/message. Human agent: $0.75/ticket. Charge $0.10/ticket. 500 tickets/day × $0.0997 profit = $49.85/day = $1,495/month from ONE client.

Content Creator's Daily Machine — $2,900/month

6am cron job: scrape 20 news sources (CPU), summarize top 5 (7B on T4, $0.002), generate LinkedIn + tweets + Instagram (14B on A100, $0.005). Send everything to Telegram. Daily cost: $0.008. Sell at $29/month per creator. 100 creators = $2,900/month. Compute: $24/month. Margin: 99.2%.

Real Estate Lead Qualifier — $4,950/month

Lead fills out a form. Agent asks qualifying questions (CPU + Claude, $0.00003). Scores budget, timeline, area. Sends qualified leads to realtor's Telegram with summary. Unqualified get a polite auto-response. $99/month per realtor × 50 realtors = $4,950/month. Compute: ~$250/month.

Student Study Agent — $4,990/month

Student messages: "explain the Krebs cycle." Agent finds the relevant textbook section (embedding search, CPU), explains using the source material (7B on T4, $0.0003), generates practice questions, remembers what they got wrong. $4.99/month. Cost: $0.15/month per student. 1,000 students = $4,990/month. The agent never gets impatient and never judges.

The Document Factory — $6,000/month

Law firms, accounting firms, medical offices — they all have documents to process. Build one receipt scanner, one contract reviewer, one medical transcriber. Publish each as a headless tool. $200/month per firm × 30 firms = $6,000/month. Each firm processes hundreds of documents. Your cost: GPU seconds per document.

The window is open
Every wave creates a window. Websites in 1996. Mobile apps in 2008. SaaS in 2012. Agents are in that window right now. The technology works. The infrastructure exists. The cost is dirt cheap. If you can figure out a use case, build the agent, and find 100 customers at $29/month — that's $2,900/month from something you built in an afternoon. The window won't stay open forever.
Prompt for Claude
"I want to build a SeqPU agent for customer support. Write me 4 separate notebooks: (1) A classifier that takes 'text' as input and outputs 'billing', 'technical', 'sales', or 'general' — use Qwen3-4B on T4. (2) A billing specialist that answers billing questions — use Qwen3-7B on T4. (3) A technical support specialist for complex issues — use Qwen3-14B on A100 80GB. (4) An orchestrator on CPU that calls the classifier, routes to the right specialist via seqpu.tools.call(), and sends the response to Telegram via seqpu.notify(). Give me all 4 notebooks plus the inputs I need to define when publishing each one."
Text to Compute
Describe the whole system to Claude. "I want an agent that monitors competitor prices, analyzes trends, and sends me a daily Telegram report." Claude writes every notebook. You paste, test, publish, connect. That's the whole process. You don't need a Mac Mini. You don't need a data center. You need an idea and SeqPU.
09 — How To Make Money

From thought to
passive income.

There has never been a moment like this in history. A single person with an idea can build, deploy, and sell an AI-powered product in an afternoon — without infrastructure, without a team, without raising money. The tools exist. The models are free. The compute is serverless. The only thing between you and recurring revenue is 25 minutes and an idea.

The New Gig Economy

The old gig economy — Uber, DoorDash — you trade your time and your car for $15/hour. Clock in. Clock out. No leverage. The AI gig economy is different: you trade one afternoon of setup for income that runs while you sleep. You don't clock in. You don't clock out. Your agent works 24/7 and you collect the margin.

This isn't a fantasy. This is compute. The cheapest, most scalable product you can sell. No inventory. No shipping. No physical limits. Just CPU cycles and GPU seconds — billed by the second, marked up by you.

Five Ways to Get Paid

1. Publish a headless API with markup
   Other people's code calls your endpoint. You charge per call. They pay compute + your 0-30% markup. You keep the margin on every request, 24/7, while you sleep.
2. Publish a UI site
   Visitors use your tool through a browser. Per-use or subscription. They never know it's SeqPU. Your brand. Your URL. Your product.
3. Connect a Telegram bot
   Users message your bot from their phone. Your agent runs, responds, charges. $9.99-49/month per subscriber.
4. Sell day-one model access
   A new model drops on HuggingFace. You download it, publish it, charge per call. Everyone else waits weeks for API access from the big providers. You're already live.
5. Build an agent, sell it as a service
   A multi-tool pipeline that solves a real business problem. Customer support, document processing, research. Sell to businesses at $99-499/month.

The Path — Idea to Revenue in One Afternoon

  • 10 minutes — describe what you want to Claude. Get working code. Paste into SeqPU. Hit Run All. It works.
  • 5 minutes — click Publish. Define inputs/outputs. Pick GPU. Set markup. Your API is live.
  • 5 minutes — create a service token. Or connect a Telegram bot. Or publish as a UI site.
  • 5 minutes — tell 10 people. Post on Twitter. Share in a Discord. Email 5 businesses.
  • That evening — your first API call comes in. You made money from something you built during lunch.

No investors. No team. No server to maintain. No bill when nobody's using it. Just an idea, Claude, SeqPU, and 25 minutes.

Day One Advantage — Sell the Newest Model

This is unique to SeqPU. A new model drops on HuggingFace — Qwen4, Llama 5, whatever comes next. You download it in the notebook. First run caches it. You test it — it works. Click Publish with 25% markup.

Everyone else is waiting. Waiting for OpenAI to offer API access. Waiting for Anthropic to update their SDK. Waiting for Google to announce pricing. You're already live. Developers who want to try the new model call YOUR endpoint. First mover. First revenue.

The window
The big providers take weeks to months to offer API access to new open models. You do it the same day. That first-mover window is pure profit.

The Volume Game — Think in Billions

You don't need to charge a lot. You need to charge a little on something that happens A LOT.

There are things that happen billions of times a day. Emails sent. Messages translated. Images resized. Documents summarized. Receipts scanned. Code reviewed. Every single one is a compute job someone will pay for.

Micro-service | Global volume | You capture | Your price | Monthly revenue
Email subject optimizer | 300B emails/day | 0.0001% | $0.001 | $9,000
Translation endpoint | 100B messages/day | 0.00005% | $0.001 | $1,500
Receipt OCR API | Millions/day | 0.01% | $0.002 | $6,000
Sentiment analysis | Millions of posts/hr | 0.001% | $0.001 | $2,160
Image background remover | Tens of millions/day | 0.01% | $0.005 | $1,500

The companies doing $1B ARR aren't charging $1,000 per customer. They're charging $0.001 to a billion transactions. Compute is the same game. Find the thing that happens a billion times. Charge almost nothing. Let volume do the math.
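The volume table follows one formula: daily volume × capture share × price × days. A minimal sketch of that formula, checked against the two rows that state an exact global volume. The function name and the 30-day month are assumptions.

```python
# Sketch of the volume-game formula. `monthly_revenue` is an illustrative name.

def monthly_revenue(daily_volume, capture_percent, price_per_call, days=30):
    """Revenue from capturing `capture_percent` percent of a daily global volume."""
    calls_per_day = daily_volume * capture_percent / 100
    return calls_per_day * price_per_call * days


# Email subject optimizer: 300B emails/day, 0.0001% captured, $0.001 per call
email = monthly_revenue(300e9, 0.0001, 0.001)

# Translation endpoint: 100B messages/day, 0.00005% captured, $0.001 per call
translation = monthly_revenue(100e9, 0.00005, 0.001)

print(f"Email optimizer: ${email:,.0f}/month")
print(f"Translation:     ${translation:,.0f}/month")
```

The rows with approximate volumes ("Millions/day") cannot be checked this way without picking a concrete number for the volume.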

The AI Companies' Playbook — And How You Copy It

Look at how the AI companies got rich: OpenAI built a model. Published an API. Charges per call. That's literally what SeqPU lets you do. Same playbook. Same margins.

The difference:

  • They spent $100M training their models. You download one from HuggingFace for free.
  • They built global infrastructure. You click a button on SeqPU.
  • They have 500 engineers. You have Claude.
  • They charge $15/M tokens. A 7B on T4 costs $2/M tokens. You charge $5 and everyone's happy.

The Margin Math

What you build | Your cost | You charge | Customers | Monthly profit
Translation API (T4) | $0.001/call | $0.01/call | 10K calls/day | $2,700
Telegram study bot (T4) | $0.15/user/mo | $4.99/mo | 500 students | $2,420
Document summarizer (A100) | $0.006/call | $0.05/call | 2K calls/day | $2,640
Support agent (chained) | $5/client/mo | $99/mo | 30 businesses | $2,820
Content generator (A100) | $0.008/run | $1/article | 100/day | $2,976
New model API (day 1) | $0.005/call | $0.02/call | 5K calls/day | $2,250

Every row is something you can build in an afternoon. Every row is passive income.
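Each figure in the margin table is (price − cost) × units sold per month. The sketch below recomputes all six rows, assuming a 30-day month for the per-day rows. The row tuples restate the table, and `monthly_profit` is an illustrative helper.

```python
# Recompute the Margin Math table. Per-day volumes are multiplied by an assumed 30-day month.

ROWS = [
    # (name, cost per unit, price per unit, units per month)
    ("Translation API", 0.001, 0.01, 10_000 * 30),
    ("Telegram study bot", 0.15, 4.99, 500),
    ("Document summarizer", 0.006, 0.05, 2_000 * 30),
    ("Support agent", 5.00, 99.00, 30),
    ("Content generator", 0.008, 1.00, 100 * 30),
    ("New model API", 0.005, 0.02, 5_000 * 30),
]


def monthly_profit(cost, price, units):
    """Profit after compute cost for one month of usage."""
    return (price - cost) * units


profits = {name: round(monthly_profit(cost, price, units)) for name, cost, price, units in ROWS}
for name, value in profits.items():
    print(f"{name:<22} ${value:,}/month")
```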

Passive Income — Build Once, Earn Forever

This isn't trading hours for dollars. You build it once. It runs forever.

  • Your published tool runs 24/7 without you
  • Credits checked and deducted automatically — you don't manage billing
  • Auto-scales — GPUs spin up when calls come in, spin down when idle
  • Zero cost when nobody's calling — serverless means you only pay when revenue comes in
  • Update anytime — edit your notebook, re-publish, the endpoint updates

Stack them. Your first tool makes $200/month. Your second makes $500. Your third makes $1,000. By tool five you're at $3,000/month passive. Each one took an afternoon. No boss. No schedule. No commute. No cap on earnings.

Web 3.0 — The Compute Web

Forget what you've heard about Web3. This is what Web 3.0 actually is:

  • Web 1.0 — the read web. Static pages. You consumed content.
  • Web 2.0 — the social web. People interacting with people. You posted, others responded.
  • Web 3.0 — the compute web. Machines interacting with machines. AI agents calling other AI agents. Services processing, deciding, acting — without a human clicking anything.

The websites of Web 3.0 aren't pages you visit. They're endpoints you call. They're agents. They're pipelines. And they charge per call.

Every headless tool you publish on SeqPU is a Web 3.0 service. It exists on the internet, it responds to requests, it charges money, and it runs without you. That's not a side project. That's a business.

We Are Here With You

This is new for everyone. Nobody was born knowing how to build AI agents. We are building this community from the ground up — hackathons, shared tools, open examples, real support. The people in this community right now are the early ones. The ones who will look back and say "I was there before it was obvious."

If you're reading this and you feel it, not just understand it but feel it, this is your moment. The normal person can now play the same games the hyperscalers play. The door is open. You just need to know what to do with it.

Start Now

Open the notebook. Describe what you want to Claude. Paste the code. Hit Run All. Click Publish. Share the link. Start charging.

That's the whole path. No servers. No DevOps. No infrastructure. No token bills. Just ideas, compute, and you.

The window is open
Every wave creates millionaires. Websites in 1996. Apps in 2008. SaaS in 2012. Agents and AI tools are in that window right now. The technology works. The cost is dirt cheap. The demand is exploding. Build something. Publish it. See what happens. The worst case is you learn. The best case is you build something people pay for every month while you sleep. There is no risk. There is only upside.
Every example in these docs
Pick any example from any section. Copy the code. Paste it. Run All. Publish. Connect a Telegram bot or share the URL. You just built a product. Start charging.
10 — I Have No Idea

I have no idea
where to start.

You don't need to know how to code. You don't need a business plan. You don't need to understand GPUs. Pick one of these. Copy the code. Hit Run All. See what happens. Then hit Publish — and it's live for the world.

T4 · 7B
~$0.002/doc
Summarize that 40-page PDF, then email the summary to your team
Drop a PDF in. Model reads it, writes the summary, emails it to your team automatically. One click. ChatGPT can summarize — it can't email. That's the difference between a chatbot and a system. Hit Publish → UI Site. Now your entire company drops PDFs in and gets summaries. Charge departments $29/month. Or publish as Headless URL — other companies call your API. You're running a document intelligence service.
summarize_and_email.py
from vllm import LLM, SamplingParams
from pypdf import PdfReader  # pip install pypdf
import smtplib, os
from email.mime.text import MIMEText

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(max_tokens=1024)

# Step 1: Extract the PDF text (a PDF is binary, so it cannot be read as plain text) and summarize
doc = "\n".join(page.extract_text() or "" for page in PdfReader("/data/report.pdf").pages)
# Only the first 8,000 characters are sent to the model
result = llm.generate([f"Summarize this document in 5 key bullet points:\n{doc[:8000]}"], params)
summary = result[0].outputs[0].text

# Step 2: Email it
msg = MIMEText(f"Here's your summary:\n\n{summary}")
msg["Subject"] = "Document Summary - Ready to Review"
msg["From"] = os.environ["EMAIL"]
msg["To"] = "team@example.com"  # placeholder address; replace with your team's

with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(os.environ["EMAIL"], os.environ["EMAIL_PASS"])
    server.send_message(msg)
print("Summary sent to team")
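The script passes only the first 8,000 characters of the document to the model, so most of a 40-page PDF is never read. One common workaround is to split the text into overlapping chunks, summarize each chunk, then summarize the summaries. Below is a minimal sketch of the splitting step only. `chunk_text` and its default sizes are illustrative assumptions, not part of vLLM or SeqPU.

```python
# Sketch: split a long document into overlapping chunks for chunk-by-chunk summarization.
# `chunk_text` is a hypothetical helper; the sizes are assumptions to tune per model.

def chunk_text(text, size=8000, overlap=200):
    """Split text into chunks of at most `size` characters.

    Each chunk after the first repeats the last `overlap` characters of the
    previous chunk, so a sentence cut at a boundary appears whole in one of them.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks


sample = "word " * 4000  # 20,000 characters of stand-in text
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces])
```

Each chunk would go through the same `llm.generate` call as above, and the partial summaries would be joined and summarized once more to produce the final five bullet points.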
CPU · Cron
~$0/hr
Monitor your kid's school website for updates, text you
CPU checks the school website every hour for basically nothing. New assignment? Snow day? Permission slip due? Model reads what changed and sends you a Telegram alert. You never miss another form. Hit Publish → share the URL with every parent in the class. Charge $3/month. 200 parents at one school = $600/month. Expand to 10 schools = $6,000/month. The script is identical — just change the URL.
school_monitor.py
import requests, os, hashlib
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
SCHOOL_URL = "https://your-school.edu/announcements"

# Step 1: Fetch page
page = requests.get(SCHOOL_URL, timeout=10).text
new_hash = hashlib.md5(page.encode()).hexdigest()
old_hash = open("/data/last_hash.txt", "r").read().strip() if os.path.exists("/data/last_hash.txt") else ""

# Step 2: If changed, analyze what's new
if new_hash != old_hash:
    result = llm.generate([f"What's new on this school page? Summarize any new announcements, assignments, or deadlines:\n{page[:4000]}"], params)
    requests.post(WEBHOOK, json={"text": f"School update:\n{result[0].outputs[0].text}"})
    open("/data/last_hash.txt", "w").write(new_hash)
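Hashing the raw HTML treats any byte change as an update, so a rotating ad tag or a timestamp inside a script would trigger an alert. One hedged alternative is to hash only the visible text. `visible_text_hash` below is a hypothetical helper built on the standard library's `html.parser`. Real pages may still need site-specific filtering.

```python
# Sketch: hash only the visible text of a page so markup-only changes are ignored.
# `visible_text_hash` is an illustrative helper, not part of requests or SeqPU.
import hashlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text_hash(html):
    """SHA-256 of the page's visible text with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = visible_text_hash('<div class="a"><p>Snow day   Friday</p><script>var t=1;</script></div>')
b = visible_text_hash('<div class="b">\n<p>Snow day Friday</p><script>var t=2;</script></div>')
c = visible_text_hash('<div class="a"><p>Permission slip due Monday</p></div>')
print(a == b, a == c)
```

In the monitor script, `visible_text_hash(page)` would take the place of the MD5 of the raw page.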
T4 · 7B · Search
~$0.01/trip
Find cheapest flight, check reviews, build your trip
Dates and destination in. Step 1 searches flights. Step 2 reads reviews of that airline on that route. Step 3 checks weather and generates a packing list + day-by-day itinerary. Three steps, one input. Travel agents charge $200 for worse. Hit Publish → UI Site. Charge $5/trip plan. Share on travel Facebook groups. 20 plans/day = $100/day passive income.
trip_planner.py
import requests, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(max_tokens=2048)

dest = "Tokyo"
days = 7
budget = "$3000"
interests = "food, temples, anime, nightlife"

# Step 1: Search for flight info
flights = requests.post("https://google.serper.dev/search",
    json={"q": f"cheapest flights to {dest} from LAX next month"},
    headers={"X-API-KEY": os.environ["SERPER_KEY"]}).json()
flight_info = "\n".join([r.get("snippet", "") for r in flights.get("organic", [])[:3]])

# Step 2: Build the full plan (no weather API is called here; the model relies on what it knows about typical weather)
result = llm.generate([f"""Plan a {days}-day trip to {dest}.
Budget: {budget}
Interests: {interests}
Flight info found:\n{flight_info}

Generate:
1. Day-by-day itinerary with specific places
2. Restaurant recommendations for each day
3. Transportation tips
4. Packing list based on weather
5. Budget breakdown"""], params)

with open("/outputs/trip_plan.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Trip plan ready")
T4 · 7B
~$0.005/memo
Voice memo → blog post → 10 social posts
Record a 5-minute voice memo on your phone. Upload. Step 1 transcribes it. Step 2 rewrites as a polished blog post. Step 3 chops into 10 tweets, 3 LinkedIn posts, 2 Instagram captions. One rambling voice memo becomes a week of content. Hit Publish → Headless URL. Sell to content creators at $49/month — they upload voice memos, get content back. 100 creators = $4,900/month.
voice_to_content.py
import whisper
from vllm import LLM, SamplingParams

# Step 1: Transcribe
model_whisper = whisper.load_model("base")
transcript = model_whisper.transcribe("/data/voice_memo.m4a")["text"]
print(f"Transcribed: {len(transcript)} chars")

# Step 2: Blog post
llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.7, max_tokens=2048)
blog = llm.generate([f"Rewrite this voice memo as a polished blog post with a title, subheadings, and clean paragraphs:\n{transcript}"], params)

# Step 3: Social posts
social = llm.generate([f"From this blog post, generate:\n- 10 tweets (under 280 chars each)\n- 3 LinkedIn posts\n- 2 Instagram captions\n\nBlog:\n{blog[0].outputs[0].text[:3000]}"], params)

with open("/outputs/blog.txt", "w") as f:
    f.write(blog[0].outputs[0].text)
with open("/outputs/social.txt", "w") as f:
    f.write(social[0].outputs[0].text)
print("Blog + 15 social posts generated from one voice memo")
CPU · Cron
~$0.001/check
Watch your competitor's website, tell you what changed
CPU checks their site daily. Model diffs old vs new. Analysis step explains what the change means for YOUR business. "They added a $49/month tier targeting small teams. This undercuts your starter plan by $10." Not just alerts — intelligence. Hit Publish → sell competitive monitoring to businesses at $99/month per competitor tracked. 50 businesses watching 2 competitors = $9,900/month.
competitor_watch.py
import requests, os, difflib
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
URL = "https://competitor.com/pricing"

# Step 1: Fetch current page
current = requests.get(URL, timeout=10).text
prev_file = Path("/data/competitor_last.txt")
previous = prev_file.read_text() if prev_file.exists() else ""

# Step 2: Diff the full pages (the diff text is truncated before it reaches the model)
if current != previous:
    diff = "\n".join(difflib.unified_diff(previous.splitlines(), current.splitlines(), lineterm=""))
    
    # Step 3: Analyze what changed and why it matters
    result = llm.generate([f"A competitor's website changed. Analyze what's different and what it means:\n{diff[:3000]}"], params)
    requests.post(WEBHOOK, json={"text": f"Competitor change detected:\n{result[0].outputs[0].text[:500]}"})
    prev_file.write_text(current)
CPU + 7B · Cron
~$0.003/day
Read every review, draft responses, track sentiment
Step 1 scrapes new reviews. Step 2 drafts personalized responses for each one. Step 3 scores sentiment and tracks it over time. Step 4 sends a weekly report — "Sentiment up 12% since you started responding. Wait time complaints down 40%." That's not a chatbot, that's a reputation management system. Hit Publish → sell to restaurants, dentists, salons at $79/month. Walk into any local business that has unanswered Google reviews. They all do.
review_system.py
import json, requests, os
from vllm import LLM, SamplingParams
from datetime import datetime

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.5, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

reviews = json.load(open("/data/new_reviews.json"))
sentiment_log = json.load(open("/data/sentiment_log.json")) if os.path.exists("/data/sentiment_log.json") else []

for review in reviews:
    # Step 1: Draft response
    resp = llm.generate([f"Write a professional response to this {review['stars']}-star review: {review['text'][:500]}"], params)
    print(f"{'⭐' * review['stars']} Response: {resp[0].outputs[0].text[:200]}")
    
    # Step 2: Score sentiment
    score = llm.generate([f"Score sentiment 1-10: {review['text'][:300]}\nReturn just the number."], params)
    sentiment_log.append({"date": datetime.now().isoformat(), "score": score[0].outputs[0].text.strip(), "stars": review["stars"]})

# Step 3: Save tracking data
json.dump(sentiment_log, open("/data/sentiment_log.json", "w"))
print(f"Processed {len(reviews)} reviews, {len(sentiment_log)} total tracked")
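The weekly report needs a trend number such as "sentiment up 12%". Here is a minimal sketch of that calculation, assuming log entries shaped like the ones `review_system.py` writes (an ISO `date` and a `score` stored as a string). `sentiment_change` is a hypothetical helper, and the 7-day windows are an assumption.

```python
# Sketch: percent change in average sentiment between two consecutive windows.
# `sentiment_change` is illustrative; entry shape follows the sentiment_log above.
from datetime import datetime, timedelta


def sentiment_change(log, now, window_days=7):
    """Percent change in mean score from the previous window to the latest one.

    Returns None when either window has no usable scores.
    """
    def mean_between(start, end):
        scores = []
        for entry in log:
            try:
                value = float(entry["score"])
            except ValueError:
                continue  # the model returned something that is not a number
            when = datetime.fromisoformat(entry["date"])
            if start <= when < end:
                scores.append(value)
        return sum(scores) / len(scores) if scores else None

    window = timedelta(days=window_days)
    current = mean_between(now - window, now)
    previous = mean_between(now - 2 * window, now - window)
    if current is None or previous is None or previous == 0:
        return None
    return (current - previous) / previous * 100


example_log = [
    {"date": "2025-01-03T10:00:00", "score": "5", "stars": 3},
    {"date": "2025-01-10T10:00:00", "score": "6", "stars": 4},
    {"date": "2025-01-12T10:00:00", "score": "not sure", "stars": 4},
]
change = sentiment_change(example_log, now=datetime(2025, 1, 15))
print(f"Sentiment change: {change:+.0f}%")
```

Entries whose score is not a number are skipped rather than crashing the report, since the model is only asked, not forced, to return a bare number.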
7B · CPU
~$0.002/plan
Scan your fridge, plan meals, generate the grocery list
List what's in your fridge. Step 1 plans 7 dinners using what you have. Step 2 identifies what's missing. Step 3 generates a grocery list grouped by aisle. Step 4 estimates total cost. Four steps from one input. Hit Publish → UI Site. Charge $4.99/month. Every parent hates meal planning. Market on mom Facebook groups, TikTok cooking community. 2000 subscribers = $9,980/month.
meal_planner.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.7, max_tokens=2048)

fridge = "chicken thighs, rice, broccoli, eggs, butter, garlic, soy sauce, pasta, canned tomatoes, onions, cheese, tortillas"
people = 4
restrictions = "no shellfish"

result = llm.generate([f"""I have these ingredients: {fridge}
Cooking for: {people} people
Restrictions: {restrictions}

Generate:
1. 7 dinner recipes using ONLY these ingredients where possible
2. For each meal: name, cook time, difficulty (easy/medium)
3. List of missing ingredients needed
4. Grocery list grouped by store aisle
5. Estimated grocery cost for missing items"""], params)

with open("/outputs/meal_plan.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Week of meals planned")
14B · A100 80GB · UI Site
~$0.02/session
Homework helper — explains, practices, grades, repeats
Not a one-shot answer. A learning SEQUENCE. Step 1 explains the concept. Step 2 generates 3 practice problems. Step 3 grades their answers. Step 4 re-explains what they got wrong. Loop until they understand. That's teaching, not answering. Hit Publish → UI Site. Charge parents $9.99/month per subject. 500 families = $4,995/month. Build 5 subjects = $24,975/month. No tutors hired.
homework_tutor.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.5, max_tokens=1024)

question = "How do I solve 3x + 7 = 22?"
subject = "Algebra"

# Step 1: Explain
explanation = llm.generate([f"You are a patient {subject} tutor. Explain how to solve this step by step. Don't give the final answer — guide them:\n{question}"], params)
print("EXPLANATION:", explanation[0].outputs[0].text)

# Step 2: Practice problems
problems = llm.generate([f"Generate 3 similar practice problems to: {question}\nJust the problems, no answers."], params)
print("\nPRACTICE:", problems[0].outputs[0].text)

# Step 3: Grade (student submits answers)
student_answers = "1) x=4  2) x=7  3) x=3"
grading = llm.generate([f"Grade these answers:\nProblems: {problems[0].outputs[0].text}\nStudent answers: {student_answers}\n\nFor each: correct/incorrect, show the right answer, explain the mistake if wrong."], params)
print("\nGRADING:", grading[0].outputs[0].text)
CPU + 7B · Cron
~$0.005/day
Track jobs, match resume, rewrite resume, draft cover letter — while you sleep
CPU scans job boards daily. Model scores relevance to YOUR skills. For top matches, rewrites your resume targeted to THAT specific job. Then writes the cover letter. You wake up to 3 ready-to-submit applications. Every morning. Hit Publish → sell to job seekers at $29/month. Everyone job hunting needs this. 1000 subscribers = $29,000/month. Market on LinkedIn, r/jobs, career coaching communities.
job_pipeline.py
import json, os, requests
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.4, max_tokens=2048)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

my_resume = Path("/data/my_resume.txt").read_text()
jobs = json.load(open("/data/scraped_jobs.json"))

applications = []
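for job in jobs[:20]:
    # Step 1: Score match (skip the job if the model does not return a bare number)
    score = llm.generate([f"Score this job match 0-100 for this resume. Return just the number.\nResume: {my_resume[:1500]}\nJob: {job['title']} - {job['desc'][:500]}"], params)
    try:
        match = int(score[0].outputs[0].text.strip().split()[0])
    except (ValueError, IndexError):
        continue
    if match < 70:
        continue

    # Step 2: Rewrite resume for this job
    resume = llm.generate([f"Rewrite this resume to target this specific job. Match keywords naturally.\nResume: {my_resume[:2000]}\nJob: {job['desc'][:1000]}"], params)

    # Step 3: Write cover letter
    cover = llm.generate([f"Write a cover letter for {job['title']} at {job['company']}. Reference their specific work. Under 250 words.\nResume: {my_resume[:1000]}\nJob: {job['desc'][:800]}"], params)

    # Step 4: Save both documents so the application is ready to submit
    out_dir = Path("/outputs") / f"application_{len(applications) + 1}"
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "resume.txt").write_text(resume[0].outputs[0].text)
    (out_dir / "cover_letter.txt").write_text(cover[0].outputs[0].text)
    applications.append(f"{job['title']} at {job['company']}")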

if applications:
    requests.post(WEBHOOK, json={"text": f"Good morning! {len(applications)} applications ready:\n" + "\n".join(applications)})
Any
$0
Take any of these and publish it — you just built a business
Everything above? Hit Publish. Choose Headless URL (other apps call it) or UI Site (people visit it). Add a markup — 10%, 20%, whatever you want. You're now charging per use. No server to manage. No code to deploy. No App Store approval. SeqPU handles the compute, the hosting, the billing. You built a product. A real product that people pay for. From code you copied and pasted. The resume writer alone — charge $15 on Fiverr, costs you half a penny. Do the math.
publish_it.py
# This isn't a script. This is the moment.
#
# 1. Pick any example from this page
# 2. Paste the code into a SeqPU notebook cell
# 3. Hit Run All — watch it work
# 4. Hit Publish → choose UI Site or Headless URL
# 5. Set your markup (10-30%)
# 6. Share the link
#
# You're now running a business.
#
# The meal planner? $4.99/month. 2000 users = $9,980/month.
# The resume rewriter? $15/rewrite on Fiverr. Cost: $0.005.
# The review responder? $79/month per business. 30 businesses = $2,370/month.
# The job pipeline? $29/month. 1000 job seekers = $29,000/month.
#
# The code is the same whether 1 person uses it or 10,000.
# The GPU scales automatically.
# You never touch a server.
#
# What are you waiting for?
11 — Business Tools

Business Tools

Business Tools — Low Hanging Fruit
CPU · 7B
~$0.001/meeting
Meeting notes → action items with owners
Paste a transcript. Get structured action items, owners, and deadlines. Replaces Otter.ai, Fireflies, and every meeting SaaS tool.
meeting_actions.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(max_tokens=1024)
transcript = Path("/data/meeting.txt").read_text()
result = llm.generate([f"""Extract all action items.
Format: [OWNER] - [ACTION] - [DEADLINE if mentioned]
Transcript: {transcript[:6000]}"""], params)
print(result[0].outputs[0].text)
T4 · 7B
~$0.003/video
YouTube summarizer — simple version
Paste a transcript. Get chapters, key points, and a summary. One cell, replaces any SaaS tool you're paying for.
youtube_simple.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(max_tokens=1024)
transcript = Path("/data/transcript.txt").read_text()
result = llm.generate([f"""Analyze this YouTube transcript:
1. Chapters with timestamps
2. 5 key takeaways
3. One paragraph summary
Transcript: {transcript[:8000]}"""], params)
print(result[0].outputs[0].text)
The Cascade · CPU → T4 → 32B
~$0.001 most videos
YouTube summarizer — cascade version
CPU watches your subscriptions for new uploads. T4 handles standard summaries. A 32B specialist only activates for videos in your exact domain — with full context about what you're building, what you've already learned, and what contradicts your existing knowledge base. A private research assistant watching the internet for you.
youtube_cascade.py
# Layer 1: CPU watches RSS feeds — almost free
import feedparser, requests, os

MY_DOMAINS = ["GPU compute", "AI infrastructure", "LLM inference"]
LAYER2 = os.environ["LAYER2_ENDPOINT"]

feed = feedparser.parse("https://youtube.com/feeds/videos.xml?channel_id=YOUR_ID")
for entry in feed.entries[:10]:
    if any(d.lower() in entry.title.lower() for d in MY_DOMAINS):
        requests.post(LAYER2, json={"url": entry.link, "title": entry.title})
# Layer 2 — T4 transcribes + summarizes
# Layer 3 — 32B specialist with your full knowledge base
12 — Infrastructure

Infrastructure

CPU Only · Cron
~$0.001/run
Cron job with safe credentials
Save API keys in Secrets, set a schedule, let a script run unattended while you sleep. The only way to have a truly private 24/7 running process — credentials never touch an external service.
cron_job.py
# Schedule: "0 7 * * *" in Publish → Schedule
# Add keys in Secrets panel — never in code
import os
from vllm import LLM, SamplingParams
from pathlib import Path

api_key = os.environ["MY_API_KEY"]
llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(max_tokens=512)
data = Path("/data/input.txt").read_text()
result = llm.generate([f"Process this: {data}"], params)
Path("/data/output.txt").write_text(result[0].outputs[0].text)
13 — Agents

Agents

14B · A100 80GB · Schedule
~$0.02/run
14B email agent — reads, drafts, sends on approval
Runs on a schedule. Reads your inbox, drafts replies in your voice, saves them for approval. Completely private — your emails never touch an external AI API.
email_agent.py
import imaplib, email, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.3, max_tokens=1024)
SYSTEM = """Draft email replies on behalf of [Your Name].
Tone: direct, friendly, gets to the point fast.
Flag anything needing human judgment."""

mail = imaplib.IMAP4_SSL("imap.gmail.com")
mail.login(os.environ["EMAIL"], os.environ["EMAIL_PASS"])
mail.select("inbox")
_, msgs = mail.search(None, "UNSEEN")
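def plain_text(msg):
    # Multipart emails return None from get_payload(decode=True), so find the text/plain part
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload:
                return payload.decode(part.get_content_charset() or "utf-8", errors="replace")
    return ""

for uid in msgs[0].split()[:10]:
    _, data = mail.fetch(uid, "(RFC822)")
    body = plain_text(email.message_from_bytes(data[0][1]))
    draft = llm.generate([f"{SYSTEM}\n\nEmail:\n{body[:2000]}"], params)
    print(draft[0].outputs[0].text)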
SeqPU Pattern · 10× 3B · Multi-model
~$0.005/query
10× 3B expert system — no training, just job separation
Ten 3B models, each owning a domain. A coordinator routes the question to the right specialist. Expert-level outputs at a fraction of one large model. A SeqPU-native pattern — impossible to do economically anywhere else.
expert_system.py
from vllm import LLM, SamplingParams
import json

SPECIALISTS = {
    "legal":   "You are a legal expert. Answer only legal questions...",
    "finance": "You are a financial analyst. Answer only finance questions...",
    "medical": "You are a medical expert. Answer only medical questions...",
    "tech":    "You are a senior engineer. Answer only technical questions...",
}
ROUTER = """Which domain? legal, finance, medical, tech
Return JSON: {"domain": str}"""

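# Load the 3B model once and reuse it: the system prompt is what turns each call into a specialist
llm = LLM(model="Qwen/Qwen3-3B")
params = SamplingParams(max_tokens=512)

question = "What are the tax implications of issuing employee stock options?"
route = llm.generate([f"{ROUTER}\nQuestion: {question}"], params)
try:
    domain = json.loads(route[0].outputs[0].text)["domain"]
except (json.JSONDecodeError, KeyError, TypeError):
    domain = None
if not isinstance(domain, str) or domain not in SPECIALISTS:
    domain = "tech"  # assumed fallback when the router output is not usable
answer = llm.generate([f"{SPECIALISTS[domain]}\nQ: {question}"], params)
print(f"[{domain.upper()}]", answer[0].outputs[0].text)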
14 — Live Feeds & IOT

Live Feeds & IOT

Two modes
Live GPU: feed comes in, GPU processes in real time. CPU Storage + Async: feed stored cheaply on CPU, GPU spins up later. SeqPU StateSave makes 24/7 always-on inference economically viable.
CPU · 24/7
~$0/hr idle
Baby monitor — sound detection + alert
CPU listens to the audio feed continuously. Detects sounds above a decibel threshold. Pings you via Telegram instantly. Only escalates to GPU if it needs to classify the type of distress.
baby_monitor.py
import pyaudio, numpy as np, requests, os

WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
THRESHOLD = 65  # decibels

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1,
                 rate=44100, input=True, frames_per_buffer=1024)
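while True:
    # exception_on_overflow=False: drop late buffers instead of raising
    raw = stream.read(1024, exception_on_overflow=False)
    # Cast before squaring: int16 ** 2 overflows and corrupts the mean
    data = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
    # Uncalibrated level in raw sample units, not dB SPL; tune THRESHOLD by ear
    db = 20 * np.log10(np.sqrt(np.mean(data**2) + 1e-10))
    if db > THRESHOLD:
        requests.post(WEBHOOK, json={"text": f"Alert: {db:.0f}dB"}, timeout=10)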
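As written, the loop posts an alert for every loud buffer, which can mean dozens of messages per second. A minimal sketch of the level calculation plus a cooldown, using only the standard library; `level_db` and `Cooldown` are hypothetical helper names, and the 30-second window is an assumption to tune:

```python
import array
import math

def level_db(raw: bytes) -> float:
    """RMS level of 16-bit mono PCM, in dB relative to one sample unit."""
    samples = array.array("h", raw)  # signed 16-bit, native byte order
    if not samples:
        return 0.0
    # Python ints never overflow, so squaring is safe here.
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square + 1e-10)

class Cooldown:
    """Allow one alert, then suppress further alerts for `seconds`."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self.last = None

    def ready(self, now: float) -> bool:
        if self.last is None or now - self.last >= self.seconds:
            self.last = now
            return True
        return False
```

In the loop, this could look like `if db > THRESHOLD and cooldown.ready(time.monotonic()):` with `cooldown = Cooldown(30)` created once before the loop.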
The Cascade · CPU → T4 → H100
~$0.001/event for 99% of events
Trail cam / security camera — cascading intelligence
CPU watches the feed 24/7 for almost nothing. T4 classifies whatever triggered the motion event. If T4 isn't confident, the specialist model on the H100, with full domain context, resolves it. You only pay for heavy compute on the 1% of events that actually need it.
trail_cam.py
# Layer 1 — CPU watcher (always on, always cheap)
import cv2, requests, os

LAYER2 = os.environ["LAYER2_ENDPOINT"]
cap = cv2.VideoCapture("/data/feed.mp4")  # or RTSP URL
prev = None
while True:
    ret, frame = cap.read()
    if not ret: break
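    if prev is not None:
        # Frames are uint8 (0-255); normalize so the threshold is a 0-1 fraction
        diff = cv2.absdiff(prev, frame).mean() / 255.0
        if diff > 0.15:  # starting point; tune against your own footage
            requests.post(LAYER2, json={"event": "motion"}, timeout=10)
    prev = frame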

# Layer 2 — T4 classifier (wakes on demand)
# Domain context — not a naive general model
# Classifies: deer, elk, bear, human, vehicle, unknown
# Escalates to Layer 3 if confidence < 0.8
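Layer 2 is only described in comments above. The escalation rule itself is easy to sketch. This assumes the T4 classifier returns a score per label; `decide`, `Decision`, and the choice to always escalate `unknown` are hypothetical, while the 0.8 threshold comes from the notes above:

```python
from dataclasses import dataclass

LABELS = ("deer", "elk", "bear", "human", "vehicle", "unknown")
ESCALATE_BELOW = 0.8  # confidence threshold from the Layer 2 notes

@dataclass
class Decision:
    label: str
    confidence: float
    escalate: bool

def decide(scores: dict[str, float]) -> Decision:
    """Pick the top label and flag the event for Layer 3 when unsure."""
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if label not in LABELS:
        label = "unknown"
    # Assumption: an "unknown" result is escalated even when confident.
    escalate = confidence < ESCALATE_BELOW or label == "unknown"
    return Decision(label, confidence, escalate)
```

Only events with `escalate=True` would be forwarded to the Layer 3 endpoint, which keeps the expensive GPU idle for the common case.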
15 — Creative

Creative

H100 · Diffusion
~$0.10/video
Image generation loop → FX → observer model → video
Generate frames with a diffusion model, apply effects, blend them together, run an observer model that scores each frame and rejects bad ones, then assemble into a final video. A full creative pipeline that would cost a fortune via API — cheap on your own compute.
creative_pipeline.py
# SDXL checkpoints need the XL pipeline class, not StableDiffusionPipeline
from diffusers import StableDiffusionXLPipeline
from vllm import LLM, SamplingParams
import torch, json

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

observer = LLM(model="Qwen/Qwen3-7B-VL")
obs_params = SamplingParams(max_tokens=50)

prompts = [
    "a futuristic city at dawn, cinematic",
    "the same city at noon, golden hour",
    "the city at night, neon reflections",
]

good_frames = []
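for prompt in prompts:
    image = pipe(prompt, num_inference_steps=30).images[0]
    # Hand the frame to the observer. The prompt must also carry the image
    # placeholder tokens from the observer model's chat template.
    score_raw = observer.generate(
        {"prompt": 'Score this image 1-10 for quality. Return JSON: {"score": int}',
         "multi_modal_data": {"image": image}},
        obs_params)
    score = json.loads(score_raw[0].outputs[0].text)["score"]
    if score >= 7:
        good_frames.append(image)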
print(f"Kept {len(good_frames)}/{len(prompts)} frames")
H100 · Audio + Video
~$0.20/production
Audio generation + video assembly
Generate a voiceover from a script, generate background music, mix them together, then sync to your video frames. Full end-to-end media production pipeline — no Adobe, no Premiere, no subscription.
audio_video_pipeline.py
from TTS.api import TTS
from pathlib import Path
import subprocess

# Generate voiceover from script
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
script = Path("/data/script.txt").read_text()
tts.tts_to_file(text=script, file_path="/data/voiceover.wav")

# Assemble frames + audio into final video via ffmpeg
subprocess.run([
    "ffmpeg",
    "-framerate", "24",
    "-i", "/data/frames/frame_%04d.png",
    "-i", "/data/voiceover.wav",
    "-c:v", "libx264",
    "-c:a", "aac",
    "-shortest",
    "/data/output/final.mp4"
], check=True)  # raise if ffmpeg fails instead of printing success
print("Video assembled at /data/output/final.mp4")
16 — Money Makers

Money Makers
Start a Business Today

Every example below is a starter business. Copy the code, hit Run All, publish it, start charging. The math is in every description.
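The math in each description follows the same pattern: price minus compute cost, times volume. A minimal sketch of that arithmetic, assuming a fixed price per job and ignoring platform fees, payment processing, and your time; `unit_economics` is a hypothetical helper:

```python
def unit_economics(price: float, cost: float, orders_per_day: int, days: int = 30) -> dict:
    """Margin and profit for a fixed-price service with a per-job compute cost."""
    profit_per_order = price - cost
    return {
        "margin_pct": round(100 * profit_per_order / price, 2),
        "daily_profit": round(profit_per_order * orders_per_day, 2),
        "monthly_profit": round(profit_per_order * orders_per_day * days, 2),
    }

# Content writer card: $25/article, ~$0.03 compute, 10 orders/day
print(unit_economics(25, 0.03, 10))
```

Plug in the numbers from any card below to sanity-check its claim before you build.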

14B · A100 80GB
~$0.03/article
AI content writer service
Build a content writing API in one cell. Publish as headless endpoint with 15% markup. List on Fiverr as "AI-powered blog writing — 2000 words in 5 minutes." Charge $25-50/article. Your cost is $0.03. At 10 orders/day that's $250-500/day profit. The model writes better SEO content than 90% of freelancers because it follows the exact brief every time. Scale to 100 orders/day without hiring anyone. Sell on: Fiverr, Upwork, your own site via Stripe, or white-label to agencies.
content_writer.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=4096)

topic = "10 Ways to Improve Your Morning Routine"
tone = "conversational, actionable, SEO-optimized"

result = llm.generate([f"""Write a 2000-word blog post.
Topic: {topic}
Tone: {tone}
Include: intro hook, subheadings, bullet points, conclusion with CTA
Optimize for SEO with natural keyword placement."""], params)

with open("/outputs/article.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Article written:", len(result[0].outputs[0].text), "chars")
H100 · Diffusion
~$0.10/generation
Logo generator service
Client describes their business, gets 5 logo concepts with variations. Publish as UI site with password protection. Charge $50-150 per logo package. Costs $0.10 to generate. That's a margin above 99%. Sell on: Fiverr ($50-100 gigs), as a 99designs alternative, your own branded site. 5 clients/day at $75 each = $375/day, about $11K/month. You're a design agency with zero designers.
logo_generator.py
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

business = "Modern coffee shop called Brew Lab"
styles = ["minimal flat", "vintage hand-drawn", "geometric abstract",
          "lettermark bold", "mascot playful"]

for i, style in enumerate(styles):
    prompt = f"Professional logo design for {business}, {style} style, white background, vector clean"
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"/outputs/logo_{i+1}_{style.replace(' ','_')}.png")
    print(f"Generated: {style}")
print("5 logo concepts saved to /outputs/")
7B · T4
~$0.005/rewrite
Resume rewriter
Client pastes their resume + the job posting they're applying to. Model rewrites the resume targeted to that specific job — keywords matched, experience reframed, format cleaned. Charge $15-25 per rewrite. Cost: half a penny. Sell on: Fiverr (search "resume rewrite" — thousands of buyers daily), LinkedIn services, your own landing page. At 20 rewrites/day at $20 each = $400/day. Hire a VA to handle customer service for $5/hr and you never touch it.
resume_rewriter.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.3, max_tokens=2048)

resume = Path("/data/resume.txt").read_text()
job_posting = Path("/data/job_posting.txt").read_text()

result = llm.generate([f"""Rewrite this resume to perfectly target the job posting below.
- Match keywords from the posting naturally
- Reframe experience to highlight relevant skills
- Keep it honest — enhance, don't fabricate
- Professional format, clean and scannable

RESUME:
{resume[:3000]}

JOB POSTING:
{job_posting[:2000]}"""], params)

with open("/outputs/rewritten_resume.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Resume rewritten and saved")
14B · A100 80GB
~$0.02/listing
Product description writer
Ecommerce sellers paste product details, get SEO-optimized listings for Amazon, Etsy, Shopify, eBay. Publish as headless API. Charge $5-10 per listing or $99/month unlimited. Cost: $0.02 per listing. Sell to: Amazon FBA sellers (millions of them), Shopify store owners, dropshippers. One seller with 500 products = $2,500-5,000 one-time job. Find them in Amazon seller Facebook groups, r/FulfillmentByAmazon, Shopify forums.
product_descriptions.py
from vllm import LLM, SamplingParams
import json

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=1024)

product = {
    "name": "Bamboo Cutting Board Set",
    "features": "3 sizes, juice groove, non-slip feet, organic bamboo",
    "target": "home cooks, gift buyers",
    "platform": "Amazon"
}

result = llm.generate([f"""Write a {product['platform']} product listing.
Product: {product['name']}
Features: {product['features']}
Target buyer: {product['target']}

Include: SEO title (under 200 chars), 5 bullet points, description paragraph.
Use natural keywords. No keyword stuffing."""], params)

print(result[0].outputs[0].text)
H100 · TTS
~$0.15/minute
Voice cloning service
Client uploads 30 seconds of their voice. Model clones it. They can generate unlimited voiceovers — for YouTube, podcasts, courses, audiobooks. Charge $1-3 per minute of generated audio or $49/month subscription. Cost: $0.15/minute. Sell to: YouTubers, course creators, podcast producers, audiobook narrators. A course creator needs 10 hours of audio = $600-1800. Sell on: your own site, Fiverr, direct outreach to online course platforms.
voice_clone.py
from TTS.api import TTS
from pathlib import Path

# Clone voice from sample
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

voice_sample = "/data/client_voice_sample.wav"
script = Path("/data/script.txt").read_text()

tts.tts_to_file(
    text=script,
    file_path="/outputs/voiceover.wav",
    speaker_wav=voice_sample,
    language="en"
)
print(f"Voiceover generated: {len(script)} chars of script")
14B · A100 80GB · UI Site
~$0.02/session
AI tutor marketplace
Build subject-specific tutors — math, Spanish, chemistry, SAT prep. Each one is a published UI site. Charge $5-15/session or $29/month per subject. Cost: $0.02 per conversation. Sell to: parents (Facebook groups, Nextdoor, school newsletters), college students (Reddit, Discord servers), homeschool communities. 200 subscribers at $29/month = $5,800/month. Build 10 subjects and you have a tutoring company with no tutors.
ai_tutor.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.5, max_tokens=2048)

SUBJECT = "Algebra 1"
SYSTEM = f"""You are a patient, encouraging {SUBJECT} tutor.
- Explain concepts step by step
- Use simple language, then build complexity
- Give practice problems after each concept
- If the student is wrong, guide them — don't give the answer
- Celebrate when they get it right"""

student_question = "I don't understand how to solve 2x + 5 = 13"

result = llm.generate([f"{SYSTEM}\n\nStudent: {student_question}"], params)
print(result[0].outputs[0].text)
7B · T4
~$0.005/speech
Wedding speech writer
Describe your relationship to the couple, share some stories and inside jokes. Model writes a speech that's funny, touching, and the right length. Charge $15-30/speech on Fiverr — search "wedding speech" and see the demand. Wedding season is printing money. Cost: half a penny. At 10 orders/day during peak season at $25 each = $250/day. Sell on: Fiverr, Etsy (digital products), your own landing page, wedding planning Facebook groups.
wedding_speech.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.7, max_tokens=1024)

details = {
    "role": "Best man",
    "couple": "Jake and Sarah",
    "relationship": "Jake's college roommate, 8 years",
    "stories": "Road trip where we got lost in Montana, Jake's terrible cooking phase",
    "tone": "funny but heartfelt, not roast-level",
    "length": "3-4 minutes"
}

result = llm.generate([f"""Write a {details['role']} speech for {details['couple']}'s wedding.
Relationship: {details['relationship']}
Stories to include: {details['stories']}
Tone: {details['tone']}
Length: {details['length']}

Structure: opening hook, 1-2 stories, transition to sincerity, toast."""], params)
print(result[0].outputs[0].text)
7B · CPU
~$0.001/note
Thank you note writer
Paste what the gift was and who gave it, get a personalized thank you note. Not generic — references the specific gift and the relationship. Sell as UI site for $1/note or $9.99/month unlimited. Post-wedding, post-baby-shower, post-graduation — people need 50+ at once. One bride with 200 guests = $200. Sell on: Etsy, wedding planning sites, Gumroad.
thank_you_notes.py
from vllm import LLM, SamplingParams
import json

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.7, max_tokens=256)

gifts = json.load(open("/data/gift_list.json"))
for gift in gifts:
    result = llm.generate([f"""Write a warm, personal thank you note.
From: Sarah and Jake
To: {gift['from']}
Gift: {gift['item']}
Relationship: {gift['relationship']}

Rules: mention the specific gift, reference the relationship,
keep under 80 words, sound genuine not generic."""], params)
    print(f"To {gift['from']}:")
    print(result[0].outputs[0].text)
    print("---")
7B · T4
~$0.003/listing
Clothing listing optimizer
Resellers on Poshmark, Depop, ThredUp, eBay paste item details — brand, condition, size. Model writes listings that actually sell — SEO keywords, compelling descriptions, pricing suggestions based on sold comps. Charge $0.50/listing or $29/month unlimited. A reseller with 500 items = $250. Find them on r/poshmark, r/Depop, reseller Facebook groups. Millions of resellers worldwide need better listings.
listing_optimizer.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.5, max_tokens=512)

item = {
    "brand": "Lululemon",
    "type": "Align High-Rise Pant 25\"",
    "size": "6",
    "color": "Dark Olive",
    "condition": "Like new, worn twice",
    "flaws": "None",
    "platform": "Poshmark"
}

result = llm.generate([f"""Write a {item['platform']} listing that sells.
Brand: {item['brand']}
Item: {item['type']}
Size: {item['size']}, Color: {item['color']}
Condition: {item['condition']}
Flaws: {item['flaws']}

Include: SEO title (what buyers search), detailed description,
measurements reminder, styling suggestion. Make them click Buy Now."""], params)
print(result[0].outputs[0].text)
14B · A100 80GB
~$0.02/claim
Insurance claim writer
Describe what happened — car accident, water damage, theft, medical. Model writes a properly formatted insurance claim that maximizes your payout. Adjusters respond to well-written claims faster and settle for more. Charge $25/claim. People don't know how to write these and lose thousands. Sell on: your own site, legal forums, r/insurance, Facebook groups for car accidents/home damage.
insurance_claim.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.3, max_tokens=2048)

incident = {
    "type": "Water damage - burst pipe",
    "date": "March 15, 2026",
    "location": "Kitchen and living room",
    "damage": "Hardwood floors warped, drywall water stained, cabinet bottoms swollen",
    "actions_taken": "Shut off water main, called plumber, took photos",
    "estimated_cost": "$8,000-12,000"
}

result = llm.generate([f"""Write a formal insurance claim letter.
Incident: {incident['type']}
Date: {incident['date']}
Location: {incident['location']}
Damage: {incident['damage']}
Actions taken: {incident['actions_taken']}
Estimated cost: {incident['estimated_cost']}

Include: policy reference placeholder, chronological account,
itemized damage list, documentation references, requested next steps.
Professional tone. Maximize clarity for the adjuster."""], params)
with open("/outputs/claim.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Insurance claim drafted")
17 — Trading & Markets

Trading & Markets
Be Smarter Than the Crowd

Read filings before Bloomberg. Score sentiment before the price moves. Explain options flow in plain English. Your private edge.

32B · A100 80GB
~$0.05/transcript
Earnings call analyzer
Feed an earnings call transcript, model extracts sentiment shift, key numbers vs expectations, guidance changes, executive tone, and flags surprises — before the market fully prices it in. Most retail traders can't read a 40-page transcript in time. You can offer this in under 60 seconds. Charge $10/report or $99/month for all companies. Sell to: day traders (StockTwits, Discord trading servers, Twitter/X finance community), small hedge funds. 200 subscribers at $99 = $19,800/month.
earnings_analyzer.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.2, max_tokens=2048)

transcript = Path("/data/earnings_call.txt").read_text()

result = llm.generate([f"""Analyze this earnings call transcript.
Extract:
1. Revenue vs expectations (beat/miss/in-line)
2. EPS vs expectations
3. Guidance changes (raised/lowered/maintained)
4. Sentiment shift from last quarter
5. Key surprises the market hasn't priced in
6. Executive tone (confident/cautious/defensive)
7. Red flags or unusual language

Transcript:
{transcript[:12000]}"""], params)

with open("/outputs/earnings_report.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Earnings analysis complete")
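The script above sends only the first 12,000 characters, so the Q&A at the end of a long call never reaches the model. One option is to split the transcript into overlapping windows and analyze each one. A minimal sketch; `chunk_text` and the 500-character overlap are assumptions:

```python
def chunk_text(text: str, size: int = 12000, overlap: int = 500) -> list[str]:
    """Split text into windows of at most `size` chars that overlap by `overlap`."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end
        start += size - overlap
    return chunks
```

`llm.generate` takes a list of prompts, so every chunk can go out in one batch. A final pass can then merge the per-chunk notes into one report.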
CPU + T4 · Cascade · Cron
~$0.001/scan
SEC filing scanner
Scheduled job watches EDGAR for new 8-K, 10-Q, and 13F filings. CPU layer monitors the RSS feed 24/7 for basically nothing. When a new filing drops, T4 reads it, extracts material changes, insider transactions, risk factor updates. Alerts you via Telegram before the news hits Bloomberg. Sell the alert service: $49/month to retail traders. Or just use it yourself: knowing about a filing 30 minutes before everyone else is worth more than any subscription.
sec_scanner.py
import requests, os, feedparser
from vllm import LLM, SamplingParams

EDGAR_RSS = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=8-K&dateb=&owner=include&count=20&search_text=&start=0&output=atom"
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(max_tokens=512)

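# SEC asks automated clients to identify themselves; use your own contact details
HEADERS = {"User-Agent": "Your Name your.email@example.com"}
feed = feedparser.parse(requests.get(EDGAR_RSS, headers=HEADERS, timeout=15).content)
for entry in feed.entries[:5]:
    filing_url = entry.link
    resp = requests.get(filing_url, headers=HEADERS, timeout=15)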
    if resp.ok:
        result = llm.generate([f"""Read this SEC filing. Extract:
1. Company name and ticker
2. Filing type
3. Material changes or events
4. Is this market-moving? (yes/no and why)

Filing:
{resp.text[:8000]}"""], params)
        analysis = result[0].outputs[0].text
        if "yes" in analysis.lower():
            requests.post(WEBHOOK, json={"text": f"SEC ALERT:\n{analysis[:500]}"})
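Run on a schedule, the scanner re-reads the same five filings every time and re-sends the same alerts. A minimal sketch of remembering what has been processed, assuming each entry's link (or ID) is stable; the helper names and the storage file are hypothetical:

```python
import json
from pathlib import Path

def load_seen(path: Path) -> set[str]:
    """Return the IDs recorded by earlier runs, or an empty set on first run."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def filter_new(ids: list[str], seen: set[str]) -> list[str]:
    """Keep only unseen IDs, preserving feed order and dropping repeats."""
    fresh = []
    for item in ids:
        if item not in seen and item not in fresh:
            fresh.append(item)
    return fresh

def save_seen(path: Path, seen: set[str]) -> None:
    """Persist the processed IDs for the next scheduled run."""
    path.write_text(json.dumps(sorted(seen)))
```

Call `load_seen` at the start of a run, pass the feed links through `filter_new`, and call `save_seen` after the alerts go out.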
CPU + 7B · Cron
~$0.003/scan
Crypto sentiment tracker
Scrape crypto Twitter, Reddit, Telegram channels on a schedule. Model scores sentiment per coin — bullish/bearish/neutral with confidence. Track shifts over time. When sentiment flips, you know before the price moves. Sell as: Telegram bot subscription ($19/month), API for trading bots ($99/month), or dashboard UI site ($49/month). 500 crypto degens at $19/month = $9,500/month.
crypto_sentiment.py
import requests, json, os
from vllm import LLM, SamplingParams
from datetime import datetime

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

coins = ["BTC", "ETH", "SOL", "DOGE"]
for coin in coins:
    # Fetch recent mentions (replace with your scraper)
    resp = requests.get(f"https://your-scraper.com/api/{coin}/mentions?limit=50")
    mentions = resp.json()
    text_block = "\n".join([m["text"][:200] for m in mentions[:20]])
    
    result = llm.generate([f"""Score sentiment for {coin}.
Return JSON: {{"coin": str, "sentiment": "bullish"/"bearish"/"neutral", "confidence": 0-100, "reason": str}}

Recent mentions:
{text_block}"""], params)
    score = json.loads(result[0].outputs[0].text)
    if score["confidence"] > 80:
        requests.post(WEBHOOK, json={"text": f"{coin}: {score['sentiment']} ({score['confidence']}%) - {score['reason']}"})
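The script alerts on any high-confidence score, while the description promises alerts when sentiment flips. Detecting a flip needs a history of earlier readings. A minimal sketch, assuming each reading is the JSON dict the model returns and that you append readings per coin between runs; `detect_flip` is a hypothetical helper:

```python
from typing import Optional

def detect_flip(history: list, min_confidence: int = 80) -> Optional[tuple]:
    """Return (old, new) if the latest confident reading reverses the previous one."""
    def strong(reading: dict) -> bool:
        return (reading["confidence"] >= min_confidence
                and reading["sentiment"] in ("bullish", "bearish"))

    if not history or not strong(history[-1]):
        return None
    latest = history[-1]
    # Walk back to the most recent earlier confident, non-neutral reading.
    for reading in reversed(history[:-1]):
        if strong(reading):
            if reading["sentiment"] != latest["sentiment"]:
                return reading["sentiment"], latest["sentiment"]
            return None
    return None
```

Neutral and low-confidence readings are skipped, so a noisy reading in between does not hide or fake a flip.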
14B · A100 80GB
~$0.03/analysis
Options flow interpreter
Feed unusual options activity data (from CBOE, Unusual Whales, or free sources). Model reads the flow and explains in plain English what the trades suggest — is someone betting on a crash? Loading calls before earnings? Hedging a massive position? Retail traders pay $50-200/month for options flow tools. Build a better one that EXPLAINS the flow, not just shows it. Sell on: your own site, Discord server with paid tier, Twitter/X with premium access.
options_flow.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.3, max_tokens=1024)

flow_data = Path("/data/options_flow.csv").read_text()

result = llm.generate([f"""Analyze this unusual options activity.
For each significant trade explain:
1. What the trade is (calls/puts, strike, expiry)
2. What it likely means (bullish bet, hedge, income)
3. Why it matters (size relative to open interest)
4. What the trader might know

Be specific. No generic disclaimers.

Flow data:
{flow_data[:6000]}"""], params)

print(result[0].outputs[0].text)
32B · A100 80GB · The Cascade
~$0.08/analysis
Polymarket edge finder
Scrape Polymarket odds. Model searches the web for news that hasn't been priced into the prediction market yet. Flags mispriced contracts where the real probability is significantly different from the market odds. Use it yourself: if a contract is priced at 30% but your model's evidence says 70%, the expected payout is about 2.3x your stake (0.70 / 0.30), provided the model's estimate is right. Or sell the analysis as a subscription to Polymarket traders. Even 10% edge compounded across 100 bets is life-changing money.
polymarket_edge.py
import os, requests, json
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.3, max_tokens=1024)

# Fetch active Polymarket contracts
markets = requests.get("https://gamma-api.polymarket.com/markets?active=true&limit=20").json()

for market in markets:
    question = market["question"]
    current_odds = market["outcomePrices"]
    
    # Search for recent news
    news = requests.post("https://google.serper.dev/search",
        json={"q": question, "num": 3},
        headers={"X-API-KEY": os.environ["SERPER_KEY"]}, timeout=15).json()
    snippets = "\n".join([r["snippet"] for r in news.get("organic", [])])
    
    result = llm.generate([f"""Prediction market question: {question}
Current odds: {current_odds}
Recent news:\n{snippets}

Is this market mispriced? What probability would you assign based on the news?
Return JSON: {{"mispriced": bool, "market_odds": float, "your_odds": float, "edge": float, "reasoning": str}}"""], params)
    
    analysis = json.loads(result[0].outputs[0].text)
    if analysis.get("mispriced"):
        print(f"EDGE: {question[:60]} | Market: {analysis['market_odds']} | Yours: {analysis['your_odds']}")
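The 30% versus 70% example in the description is a plain expected-value calculation. A minimal sketch, assuming a YES share costs the market price and pays $1 if it resolves YES, with fees and slippage ignored; both function names are hypothetical:

```python
def expected_multiple(market_price: float, true_prob: float) -> float:
    """Expected payout per $1 staked on YES at `market_price` per share."""
    if not 0 < market_price < 1:
        raise ValueError("market_price must be between 0 and 1")
    return true_prob / market_price

def edge(market_price: float, true_prob: float) -> float:
    """Expected profit per share: what it is worth minus what it costs."""
    return true_prob - market_price

print(round(expected_multiple(0.30, 0.70), 2), round(edge(0.30, 0.70), 2))
```

The result is only as good as `true_prob`. If the model's estimate is off, so is the edge.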
32B · A100 80GB
~$0.05/screen
Stock screener with reasoning
Not just "P/E under 15 and revenue growing." The model reads actual financials, understands the business, and EXPLAINS why each stock matches your criteria. "Company X has a P/E of 12 but that's misleading because of a one-time charge — adjusted P/E is 18." Charge $29-99/month. Sell to: retail investors tired of screeners that don't think. Fintwit community, investing subreddits, Seeking Alpha crowd.
smart_screener.py
from vllm import LLM, SamplingParams
import json

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.3, max_tokens=2048)

criteria = "Value stocks: P/E under 20, revenue growing, positive free cash flow, not financials"
stocks_data = open("/data/stock_fundamentals.json").read()

result = llm.generate([f"""Screen these stocks against the criteria.
For each that passes, explain WHY.
For each that almost passes, explain what disqualifies it.
Look deeper than surface numbers.

Criteria: {criteria}

Stock data:
{stocks_data[:10000]}"""], params)
print(result[0].outputs[0].text)
CPU · 24/7
~$0.001/scan
Price arbitrage finder
CPU scrapes the same product across 10 stores — Amazon, Walmart, Target, Best Buy, eBay. Finds price differences. Buy low on Store A, sell on Store B. Model filters for real opportunities vs shipping cost traps. Run 24/7 for basically nothing. Some people make $2-5K/month just flipping retail arbitrage. This bot finds the opportunities automatically.
arbitrage_finder.py
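import requests, json, os, re
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

def parse_price(html, pattern):
    # JSON can't hold functions, so each store carries a regex instead.
    # "price_regex" is an assumed field: one capture group around the number.
    m = re.search(pattern, html)
    return float(m.group(1).replace(",", "")) if m else None

products = json.load(open("/data/tracked_products.json"))
for product in products:
    prices = {}
    for store in product["stores"]:
        resp = requests.get(store["url"], headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
        price = parse_price(resp.text, store["price_regex"])
        if price is not None:
            prices[store["name"]] = price
    if len(prices) < 2:
        continue  # need at least two stores to compare

    spread = max(prices.values()) - min(prices.values())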
    if spread > 10:
        cheapest = min(prices, key=prices.get)
        highest = max(prices, key=prices.get)
        result = llm.generate([f"""Arbitrage opportunity:
Product: {product['name']}
Buy at {cheapest}: ${prices[cheapest]}
Sell at {highest}: ${prices[highest]}
Spread: ${spread}

Is this real? Account for shipping, fees, return risk.
Return JSON: {{"real": bool, "net_profit": float, "risk": str}}"""], params)
        print(json.loads(result[0].outputs[0].text))
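The prompt asks the model to account for shipping, fees, and return risk, but that part is arithmetic and can be computed directly. A minimal sketch; `net_profit` is a hypothetical helper, and the default fee and return rates are placeholder assumptions, not real marketplace figures:

```python
def net_profit(buy_price: float, sell_price: float, shipping: float = 0.0,
               fee_rate: float = 0.13, return_rate: float = 0.05) -> float:
    """Expected profit per unit after marketplace fees, shipping, and returns."""
    fees = sell_price * fee_rate
    gross = sell_price - fees - buy_price - shipping
    # Treat a return as losing the whole sale: a deliberately pessimistic simplification.
    expected_return_loss = return_rate * sell_price
    return round(gross - expected_return_loss, 2)
```

A $10 spread can still lose money. Buying at $40 and selling at $50 with $5 shipping, a 15% fee, and a 5% return rate nets a loss of $5 per unit.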
CPU · Cron
~$0.002/scan
Domain name flipper
CPU monitors expiring domains daily — thousands expire every day. Model scores each one for brandability, keyword value, industry relevance, extension quality. Alert you when a $10 domain could sell for $500+. Domain flippers make $500-5000/month buying expired domains and reselling on Afternic, Sedo, or Dan.com. The hard part is finding good ones — this bot does that 24/7.
domain_flipper.py
import requests, json, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

# Fetch expiring domains (replace with your source)
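resp = requests.get("https://member.expireddomains.net/export/expiring/", timeout=30)
# splitlines + strip drops blank lines and stray whitespace before scoring
domains = [d.strip() for d in resp.text.splitlines() if d.strip()]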

for domain in domains[:100]:
    result = llm.generate([f"""Score this expiring domain for flip potential.
Domain: {domain}

Score 0-100 on: brandability, keyword value, industry relevance,
memorability, extension quality.
Return JSON: {{"domain": str, "score": int, "estimated_value": str, "best_industry": str, "reasoning": str}}"""], params)
    score = json.loads(result[0].outputs[0].text)
    if score.get("score", 0) > 75:
        requests.post(WEBHOOK, json={"text": f"DOMAIN: {domain}\nScore: {score['score']}/100\nValue: {score['estimated_value']}\n{score['reasoning']}"})
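Thousands of domains expire each day, and most are obviously not worth a model call. A cheap rule-based filter on the CPU can cut the list first. A minimal sketch; `worth_scoring`, the extension shortlist, and the length limits are all assumptions to adjust:

```python
GOOD_TLDS = (".com", ".io", ".ai", ".co")  # assumed shortlist, not a market fact

def worth_scoring(domain: str, max_len: int = 15) -> bool:
    """Cheap rule-based filter to run before spending a model call."""
    domain = domain.strip().lower()
    if "." not in domain:
        return False
    name, _, tld = domain.partition(".")
    if "." + tld not in GOOD_TLDS:
        return False
    if not 3 <= len(name) <= max_len:
        return False
    # Letters only: hyphens and digits are rejected here.
    return name.isascii() and name.isalpha()
```

Filter the list with `[d for d in domains if worth_scoring(d)]` and send only the survivors to the model.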
18 — Freelancer Toolkit

Freelancer Toolkit
Replace Your SaaS Stack

Stop paying monthly for tools you can build in one cell. Proposals, invoices, outreach, contracts — all private, all yours.

14B · A100 80GB
~$0.02/proposal
Client proposal generator
Describe the project — "website redesign for a dentist, 6 pages, 4 week timeline." Model writes a professional proposal with scope of work, deliverables, timeline, pricing breakdown, terms. Looks like you spent an hour on it. Took 30 seconds. Stop losing jobs because your proposals suck. Or sell proposal templates/generation as a service to other freelancers — $9/month. Sell on: your own site, Gumroad, freelancer communities.
proposal_generator.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.5, max_tokens=2048)

project = "Website redesign for a dental practice, 6 pages, modern design, mobile responsive"
timeline = "4 weeks"
rate = "$75/hour"

result = llm.generate([f"""Write a professional client proposal.
Project: {project}
Timeline: {timeline}
Rate: {rate}

Include:
1. Executive summary (2 sentences)
2. Scope of work (detailed deliverables)
3. Timeline with milestones
4. Pricing breakdown by phase
5. Terms (payment schedule, revisions, ownership)

Tone: professional but warm. Make the client feel confident."""], params)

with open("/outputs/proposal.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Proposal generated")
7B · CPU
~$0.001/invoice
Invoice generator
Describe the work done, model generates a formatted invoice with line items, totals, payment terms, your branding. Output it as text or feed it into a PDF template. Never pay for FreshBooks or Wave again. Or build it as a UI site and offer free invoicing to freelancers, monetized with premium features. Cost: basically zero. Replace a $15/month SaaS with your own free tool.
invoice_generator.py
from vllm import LLM, SamplingParams
from datetime import datetime

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=1024)

work = "Built 3 landing pages, 2 rounds of revisions, 1 logo concept"
client = "Acme Corp"
rate = 75
hours = 24

result = llm.generate([f"""Generate a professional invoice.
From: [Your Name], [Your Address]
To: {client}
Date: {datetime.now().strftime('%B %d, %Y')}
Due: Net 30

Work performed: {work}
Rate: ${rate}/hr
Hours: {hours}

Format as a clean, professional invoice with line items and total."""], params)

with open("/outputs/invoice.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print(f"Invoice: ${rate * hours} for {client}")
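Language models can slip on arithmetic, and an invoice total has to be exact. One option is to compute the line items and totals in code and let the model handle only the wording. A minimal sketch using `Decimal`; `invoice_lines` and the way the 24 hours are split across items are hypothetical:

```python
from decimal import Decimal, ROUND_HALF_UP

def invoice_lines(items, tax_rate: str = "0"):
    """items: (description, hours, hourly_rate) tuples.

    Returns (formatted lines, subtotal, tax, total) with cent-exact Decimals.
    """
    cents = Decimal("0.01")
    lines, subtotal = [], Decimal("0")
    for desc, hours, rate in items:
        amount = (Decimal(str(hours)) * Decimal(str(rate))).quantize(cents, ROUND_HALF_UP)
        subtotal += amount
        lines.append(f"{desc}: {hours} h x ${rate}/h = ${amount}")
    tax = (subtotal * Decimal(tax_rate)).quantize(cents, ROUND_HALF_UP)
    return lines, subtotal, tax, subtotal + tax

# Hypothetical split of the 24 hours at $75/hr from the example above
items = [("Landing pages", 18, 75), ("Revisions", 4, 75), ("Logo concept", 2, 75)]
lines, subtotal, tax, total = invoice_lines(items)
print("\n".join(lines), f"\nTotal: ${total}")
```

Paste the computed lines and total into the prompt and tell the model to format them without changing any number.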
14B · A100 80GB
~$0.02/email
Email outreach writer
Paste target company info — their website, what they do, recent news. Model writes a personalized cold email that references THEIR specific situation, not generic "I'd love to connect" garbage. Sell to: sales teams ($49/month for API access), recruiting firms ($99/month), or use it yourself to land clients. A personalized email gets 3-5x the response rate of templates. At $49/month with 100 sales teams = $4,900/month.
cold_email.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=512)

target = {
    "company": "Bloom Dental",
    "what_they_do": "3-location dental practice in Austin",
    "recent_news": "Just opened their third location",
    "your_service": "website design and SEO"
}

result = llm.generate([f"""Write a cold email to {target['company']}.
They are: {target['what_they_do']}
Recent: {target['recent_news']}
You offer: {target['your_service']}

Rules:
- Reference something specific about THEM (not generic)
- One clear value proposition
- One specific CTA (not "let me know if you're interested")
- Under 150 words
- Sound like a human, not a template"""], params)

print(result[0].outputs[0].text)
32B · A100 80GB
~$0.04/site
Portfolio site builder
Describe your work, your style, your projects. Model generates a complete portfolio website — HTML, CSS, responsive, professional. Publish directly as a SeqPU UI site or download the HTML. Charge other freelancers $50-200 for a custom portfolio. Cost: $0.04. Sell on: Fiverr, Twitter/X, design communities, r/webdev. Or offer "portfolio in 5 minutes" as a UI site — charge $29 per generation.
portfolio_builder.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.7, max_tokens=8192)

freelancer = {
    "name": "Sarah Chen",
    "role": "Product Designer",
    "projects": ["Fintech app redesign", "E-commerce checkout flow", "SaaS dashboard"],
    "style": "minimal, clean, lots of whitespace",
    "colors": "black, white, one accent color"
}

result = llm.generate([f"""Build a complete portfolio website in HTML + CSS.
Freelancer: {freelancer['name']} — {freelancer['role']}
Projects: {', '.join(freelancer['projects'])}
Style: {freelancer['style']}
Colors: {freelancer['colors']}

Single HTML file with embedded CSS. Responsive. Professional.
Include: hero section, about, projects grid, contact form, footer."""], params)

with open("/outputs/portfolio.html", "w") as f:
    f.write(result[0].outputs[0].text)
print("Portfolio site generated")
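Models often wrap generated HTML in a Markdown code fence or add a sentence before it. Written straight to `portfolio.html`, that text shows up on the page. A minimal sketch of cleaning the output first; `extract_html` is a hypothetical helper:

```python
import re

TICKS = "`" * 3  # Markdown code fence marker
FENCE = re.compile(TICKS + r"[a-zA-Z]*\s*\n(.*?)" + TICKS, re.DOTALL)

def extract_html(raw: str) -> str:
    """Return the first fenced block if present, trimmed to start at the first '<'."""
    match = FENCE.search(raw)
    text = match.group(1) if match else raw
    start = text.find("<")
    return text[start:].strip() if start != -1 else text.strip()
```

Use it when saving: `f.write(extract_html(result[0].outputs[0].text))`.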
32B · A100 80GB
~$0.05/review
Contract reviewer
Paste a client contract, model reads every clause, flags risky terms (unlimited revisions, IP assignment, non-compete overreach), suggests specific edits, explains in plain English. Lawyers charge $200-500/hr. Charge $25-50 per review. Sell on Fiverr — "contract review" gigs go for $50-150. Your cost: 5 cents.
contract_reviewer.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.2, max_tokens=2048)

contract = Path("/data/contract.txt").read_text()

result = llm.generate([f"""Review this contract from a freelancer's perspective.
Flag every clause that could hurt the freelancer.

For each issue:
1. Quote the exact clause
2. Explain the risk in plain English
3. Suggest a specific rewrite
4. Rate severity: LOW / MEDIUM / HIGH

Contract:
{contract[:8000]}"""], params)
with open("/outputs/contract_review.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Contract review complete")
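The script above sends only the first 8,000 characters (`contract[:8000]`), so clauses past that point are never reviewed. One way to cover the whole document is to split it into overlapping chunks and review each one. A minimal sketch, assuming character-based chunks are good enough. `chunk_text`, the chunk size, and the overlap are illustrative choices, not SeqPU or vLLM settings:

```python
def chunk_text(text: str, size: int = 8000, overlap: int = 500) -> list[str]:
    """Split text into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters, so a clause that
    straddles a boundary appears whole in at least one chunk as long
    as the clause is shorter than the overlap.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks
```

Each chunk would go into the same review prompt as its own entry in the list passed to `llm.generate`, and the per-chunk reviews can be concatenated into the output file.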
7B · T4
~$0.005/letter
HOA violation letter responder
Paste the violation letter you received, model drafts a professional response or appeal. Cites common HOA law, addresses the specific violation, maintains a respectful but firm tone. Homeowners get these constantly and pay lawyers $200+ to respond. Charge $10/response. Market on HOA complaint Facebook groups — there are dozens with 50K+ members all furious about their HOA.
hoa_responder.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.3, max_tokens=1024)

violation = Path("/data/hoa_letter.txt").read_text()

result = llm.generate([f"""Draft a professional response to this HOA violation letter.

Rules:
- Address the specific violation cited
- If disputable, cite common HOA governance rules
- Respectful but firm tone
- Request specific evidence if not provided
- Propose a reasonable resolution timeline
- Never threatening, always professional

Violation letter:
{violation[:3000]}"""], params)
with open("/outputs/hoa_response.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("HOA response drafted")
14B · A100 80GB
~$0.02/filing
Small claims court document generator
Describe your dispute — who owes you money, what happened, what evidence you have. Model generates the filing paperwork formatted for your jurisdiction. People avoid small claims because the paperwork is intimidating — this removes that barrier. Charge $15/filing. Market on r/legaladvice, r/smallclaims, local community Facebook groups.
small_claims.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.3, max_tokens=2048)

dispute = {
    "plaintiff": "Your name",
    "defendant": "Contractor name",
    "amount": "$3,500",
    "state": "California",
    "description": "Paid contractor for bathroom remodel, work never completed, won't return calls",
    "evidence": "Contract, bank statements showing payment, photos of unfinished work, text messages"
}

result = llm.generate([f"""Generate small claims court filing documents for {dispute['state']}.

Plaintiff: {dispute['plaintiff']}
Defendant: {dispute['defendant']}
Amount: {dispute['amount']}
Description: {dispute['description']}
Evidence available: {dispute['evidence']}

Include: claim statement, factual summary, damages calculation,
list of evidence to attach. Format for the court."""], params)
with open("/outputs/small_claims_filing.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Filing documents generated")
19 — Content Creator

Content Creator
Scale Your Output

One blog post becomes 30 social posts. One topic becomes a full script. One keyword becomes a 2000-word article. Scale without hiring.

14B · A100 80GB
~$0.03/script
YouTube script writer
Give it a topic + your channel's style. Model writes a full script with hook (first 30 seconds), chapters, talking points, B-roll suggestions, CTAs, end screen script. Sell script writing as a service to YouTubers — $50-200/script. Or use it yourself and go from 1 video/week to 5. At $100/script and 5 clients = $500/day. Sell on: Fiverr, Twitter/X, YouTube creator Discords, r/NewTubers.
youtube_script.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=4096)

topic = "Why Most People Fail at Meal Prep (And How to Fix It)"
style = "casual, funny, fast-paced, lots of examples"
length = "10 minutes"

result = llm.generate([f"""Write a YouTube script.
Topic: {topic}
Style: {style}
Target length: {length}

Structure:
1. HOOK (first 30 sec) — pattern interrupt, make them stay
2. INTRO — what they'll learn, why it matters
3. CHAPTERS — 3-5 main points with examples
4. B-ROLL SUGGESTIONS in [brackets]
5. CTA — subscribe, comment prompt
6. END SCREEN — what to watch next

Write exactly how someone would SPEAK, not read."""], params)

with open("/outputs/script.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Script written")
7B · T4
~$0.005/batch
Social media repurposer
Paste one blog post or video transcript. Get 10 tweets, 3 LinkedIn posts, 2 Instagram captions, 1 TikTok script — all in your voice, all formatted for each platform. Schedule it daily. Charge $99-299/month as a "social media content engine" for businesses. They give you one piece of content per week, you give them 30 pieces of social content. Sell to: small businesses, coaches, consultants, agencies. 20 clients at $199 = $3,980/month.
social_repurposer.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.7, max_tokens=2048)

content = Path("/data/blog_post.txt").read_text()

result = llm.generate([f"""Repurpose this content for social media.
Generate ALL of these:

## TWITTER (10 tweets)
- Each under 280 chars
- Mix: quotes, stats, hot takes, questions
- Include relevant hashtags

## LINKEDIN (3 posts)
- Professional tone, storytelling format
- 150-300 words each

## INSTAGRAM (2 captions)
- Engaging, emoji-friendly
- Include CTA and hashtags

## TIKTOK (1 script)
- Short, spoken style
- Hook in the first line

Original content:
{content[:4000]}"""], params)

with open("/outputs/social_content.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Social content generated from 1 article")
14B · A100 80GB
~$0.03/newsletter
Newsletter writer
Feed it your notes + links from the week. Model writes your weekly newsletter in YOUR voice, matched to samples of your past writing that you include in the prompt. Send to your list via Mailchimp/Beehiiv/Substack. Charge subscribers $9/month or sell sponsorships at $50-500 per issue. Or sell newsletter writing as a service to other creators at $200-500/month per client. Costs you $0.03 and 5 minutes to review.
newsletter_writer.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=2048)

notes = Path("/data/weekly_notes.txt").read_text()
voice_examples = Path("/data/past_newsletters.txt").read_text()

result = llm.generate([f"""Write this week's newsletter.

My writing style (match this voice):
{voice_examples[:2000]}

This week's notes and links:
{notes[:3000]}

Structure:
- Punchy subject line
- Opening hook (1-2 sentences)
- 3-5 sections with headers
- Each section: insight + link + why it matters
- Closing with personal note
- P.S. with one ask"""], params)

with open("/outputs/newsletter.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Newsletter draft ready for review")
32B · A100 80GB · Search
~$0.08/article
SEO article generator
Give it a target keyword. Model searches the web for current top-ranking articles, analyzes what they cover, writes a 2000-word article that covers everything they do PLUS gaps they missed. Proper headers, internal link suggestions, meta description. Sell to: SEO agencies ($25-75/article in bulk), blog owners, affiliate marketers. An SEO agency needs 100 articles/month — that's $2,500-7,500/month from one client. Sell on: SEO agency Slack groups, r/SEO, Upwork.
seo_article.py
from vllm import LLM, SamplingParams
import requests, os

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.7, max_tokens=4096)

keyword = "best project management tools for small teams"

# Search for top-ranking articles
resp = requests.post("https://google.serper.dev/search",
    json={"q": keyword, "num": 5},
    headers={"X-API-KEY": os.environ["SERPER_KEY"]})
competitors = "\n".join([f"- {r['title']}: {r['snippet']}" for r in resp.json().get("organic", [])])

result = llm.generate([f"""Write a 2000-word SEO article targeting: "{keyword}"

Top-ranking competitors cover:
{competitors}

Your article must:
1. Cover everything competitors cover
2. Add sections they missed
3. Use the keyword naturally (not stuffed)
4. Include H2/H3 subheadings
5. Write a meta description (under 160 chars)
6. Suggest 3 internal link anchor texts"""], params)

with open("/outputs/seo_article.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print(f"SEO article for '{keyword}' complete")
20 — Automation

Automation
Set It and Forget It

Schedule it. Let it run. Wake up to results. Every script here runs unattended on a cron — your AI workforce that never sleeps.

CPU + 7B · Cron
~$0.002/briefing
Daily news briefing
Scheduled 6am. Scrapes YOUR industry's news sources — not generic news. Model writes a 5-bullet summary with links. Sends to Telegram, email, or Slack. Pick ANY niche — plumbing industry news, dental practice trends, cryptocurrency regulations, real estate market updates. Charge $9-19/month. Newsletter platforms like Beehiiv make this easy to monetize. 1000 subscribers at $9/month = $9,000/month. You never write a word. Publish on: Substack, Beehiiv, your own email list, Telegram channel.
daily_briefing.py
import requests, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.5, max_tokens=1024)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

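# Scrape industry news (replace with your sources). The loop below assumes each URL
# returns a JSON list of items with 'title' and 'summary' keys; an XML RSS feed needs parsing first.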
sources = [
    "https://your-industry-rss.com/feed",
    "https://another-source.com/api/latest"
]
articles = []
for src in sources:
    resp = requests.get(src, timeout=10)
    articles.extend(resp.json()[:10])

headlines = "\n".join([f"- {a['title']}: {a['summary'][:100]}" for a in articles[:15]])

result = llm.generate([f"""Write a morning industry briefing.
5 bullets. Each bullet: what happened + why it matters.
Link to source. Keep it under 300 words total.

Today's news:
{headlines}"""], params)

briefing = result[0].outputs[0].text
requests.post(WEBHOOK, json={"text": f"Good morning. Here's your briefing:\n\n{briefing}"})
print("Briefing sent")
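The loop above calls `resp.json()`, which works for JSON APIs but not for an RSS feed, which is XML. A minimal sketch of turning RSS 2.0 into the `title`/`summary` dicts the script expects, using only the standard library. `parse_rss` is a hypothetical helper, and it assumes a plain RSS 2.0 layout (`item` elements with `title`, `description`, `link`), so Atom feeds would need different tag names:

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str, limit: int = 10) -> list[dict]:
    """Extract up to `limit` items from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": (item.findtext("title") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
        if len(articles) >= limit:
            break
    return articles
```

For feed URLs, `articles.extend(parse_rss(resp.text))` would stand in for the `resp.json()` line, and the extra `link` key gives the model something to cite in each bullet.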
CPU + 7B · Cascade
~$0.001/check
Price drop alerter
Monitor product pages on Amazon, Best Buy, Walmart. CPU checks prices hourly. Model analyzes whether a "deal" is actually a deal — detects fake sales (price raised then "discounted"), historical price comparison, review quality check. Alerts you only on real deals. Use it yourself to save money, or build a deals Telegram channel/Twitter account. Deals accounts with 50K+ followers make $2-10K/month from affiliate links. Every "deal" link can be an affiliate link.
price_alerter.py
import requests, json, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

watchlist = json.load(open("/data/watchlist.json"))
for item in watchlist:
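    # Fetch the product page. Extracting the live price from resp.text is site-specific
    # and not shown, so the prompt below relies on the prices already stored in the watchlist.
    resp = requests.get(item["url"], headers={"User-Agent": "Mozilla/5.0"}, timeout=10)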
    result = llm.generate([f"""Is this a real deal or fake?
Product: {item['name']}
Current price: {item['current_price']}
Historical avg: {item['avg_price']}
Listed discount: {item.get('discount', 'unknown')}

Return JSON: {{"real_deal": bool, "savings_pct": float, "verdict": str}}"""], params)
    analysis = json.loads(result[0].outputs[0].text)
    if analysis.get("real_deal") and analysis.get("savings_pct", 0) > 20:
        requests.post(WEBHOOK, json={"text": f"DEAL: {item['name']} - {analysis['verdict']}"})
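Whether a discount is real is mostly arithmetic, so it can be checked in plain code before (or instead of) asking the model. A minimal sketch, assuming `current_price` and `avg_price` are numbers in the same currency. The function names are hypothetical, and the 20% threshold simply mirrors the script above:

```python
def real_savings_pct(current_price: float, avg_price: float) -> float:
    """Percent saved versus the historical average, not the listed 'was' price.

    A negative value means the item costs more than it usually does,
    which is how a raised-then-discounted price shows up.
    """
    if avg_price <= 0:
        raise ValueError("avg_price must be positive")
    return round((avg_price - current_price) / avg_price * 100, 1)


def is_real_deal(current_price: float, avg_price: float, threshold: float = 20.0) -> bool:
    """True when the saving against the historical average beats the threshold."""
    return real_savings_pct(current_price, avg_price) > threshold
```

Running this first means the model is only called for items that already pass the numeric check, which keeps the per-check cost down.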
CPU + 7B · Cron
~$0.002/scan
Job posting monitor
Watches LinkedIn, Indeed, company career pages for jobs matching YOUR criteria. Model scores relevance — not just keyword matching but actually reads the job description and compares to your skills/preferences. Sends top 5 matches daily to Telegram. Use it yourself to never miss a perfect job. Or build it as a service — "AI job matcher" at $9/month for job seekers. Sell on: Reddit job boards, LinkedIn, Twitter/X. 500 subscribers = $4,500/month.
job_monitor.py
import requests, os, json
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

my_profile = "Senior frontend developer, React/TypeScript, 5 years, remote only, $150K+ target"
jobs = json.load(open("/data/scraped_jobs.json"))

top_matches = []
for job in jobs[:20]:
    result = llm.generate([f"""Score this job match 0-100.
My profile: {my_profile}
Job: {job['title']} at {job['company']}
Description: {job['desc'][:1000]}

Return JSON: {{"score": int, "reason": str, "red_flags": str}}"""], params)
    score = json.loads(result[0].outputs[0].text)
    if score.get("score", 0) > 75:
        top_matches.append(f"{score['score']}/100 — {job['title']} at {job['company']}\n{score['reason']}")

if top_matches:
    msg = "Today's top job matches:\n\n" + "\n\n".join(top_matches[:5])
    requests.post(WEBHOOK, json={"text": msg})
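Several scripts here pass the raw model output straight to `json.loads`, which raises if the model adds any text around the JSON. A minimal sketch of a more forgiving parser, assuming the reply contains a JSON object somewhere in it. `parse_model_json` is a hypothetical helper, not a vLLM function:

```python
import json


def parse_model_json(text: str):
    """Return the first JSON object found in `text`, or None.

    Tries the whole string first, then scans for a "{" from which a
    complete object can be decoded.
    """
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            obj, _ = decoder.raw_decode(text, pos)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        pos = text.find("{", pos + 1)
    return None
```

Writing `score = parse_model_json(result[0].outputs[0].text) or {}` keeps the later `.get` calls working and lets the loop skip a bad reply instead of crashing an unattended run.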
14B · A100 80GB
~$0.02/response
Review responder
Reads new Google/Yelp reviews for a business. Drafts professional responses — thankful for 5-stars, empathetic and solution-oriented for 1-stars. Queues for approval. Sell to: local businesses who never respond to reviews (most of them). Charge $49-99/month per business. A restaurant, dentist, plumber — they all need this. Walk into any local business and pitch "I'll make sure every review gets a professional response within 24 hours." 30 businesses at $79/month = $2,370/month.
review_responder.py
from vllm import LLM, SamplingParams
import json

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.5, max_tokens=512)

reviews = json.load(open("/data/new_reviews.json"))
business = "Sunrise Dental — family dental practice in Phoenix"

for review in reviews:
    result = llm.generate([f"""Write a response to this review for {business}.

Rating: {review['stars']} stars
Review: {review['text']}

Rules:
- 5 stars: thank them specifically for what they mentioned
- 3-4 stars: thank them, address their concern directly
- 1-2 stars: empathize, apologize, offer to make it right
- Always professional, never defensive
- Under 100 words
- Sign as "The Sunrise Dental Team" """], params)
    print(f"{'⭐' * review['stars']} Response:")
    print(result[0].outputs[0].text)
    print("---")
CPU + 7B · Cron
~$0.003/check
Inventory reorder bot
Monitors stock levels. Model predicts when to reorder based on sales velocity, seasonal trends, lead times. Sends purchase order drafts for approval. Sell to: small ecommerce sellers who constantly run out of stock or over-order. Charge $49-99/month. An Amazon seller losing $500/month from stockouts will happily pay $99 to fix it.
inventory_bot.py
import json, requests, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

inventory = json.load(open("/data/inventory.json"))
for item in inventory:
    result = llm.generate([f"""Analyze inventory for: {item['name']}
Current stock: {item['stock']} units
Daily sales avg: {item['daily_sales']} units
Lead time: {item['lead_days']} days

Should we reorder? If yes, how many units?
Return JSON: {{"reorder": bool, "quantity": int, "urgency": "low"/"medium"/"high", "reason": str}}"""], params)
    analysis = json.loads(result[0].outputs[0].text)
    if analysis.get("reorder"):
        requests.post(WEBHOOK, json={"text": f"REORDER: {item['name']} - Qty: {analysis['quantity']}"})
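The reorder decision itself follows a standard inventory formula, so the model's answer can be sanity-checked in plain code. A minimal sketch using the classic reorder point (expected demand during the lead time plus a safety buffer). `safety_days` and `cover_days` are illustrative parameters that do not appear in the script above:

```python
import math


def reorder_plan(stock: int, daily_sales: float, lead_days: int,
                 safety_days: int = 3, cover_days: int = 30) -> dict:
    """Decide whether to reorder and how much.

    reorder point  = daily_sales * (lead_days + safety_days)
    order quantity = cover_days of average sales, rounded up
    """
    reorder_point = daily_sales * (lead_days + safety_days)
    reorder = stock <= reorder_point
    quantity = math.ceil(daily_sales * cover_days) if reorder else 0
    days_left = stock / daily_sales if daily_sales > 0 else math.inf
    return {
        "reorder": reorder,
        "quantity": quantity,
        "reorder_point": reorder_point,
        "days_of_stock_left": days_left,
    }
```

One way to combine the two: alert only when the formula and the model agree, and let the model explain seasonal adjustments the formula cannot see.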
CPU · Cron
~$0.002/night
Airbnb pricing optimizer
CPU scrapes comparable listings in your area daily. Model analyzes demand patterns, local events, seasonal trends, day of week. Tells you exactly what to charge tonight. Most hosts set a flat rate and leave money on the table: they miss $50-200/night during events and should charge $20-30 less on slow weekdays. Charge hosts $29/month. Pays for itself the first night. Sell on: Airbnb host Facebook groups, r/airbnb, BiggerPockets forums.
airbnb_pricer.py
import requests, json, os
from vllm import LLM, SamplingParams
from datetime import datetime

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=512)

my_listing = {"type": "1BR apartment", "location": "Austin, TX", "base_price": 120}
comps = json.load(open("/data/comparable_listings.json"))
events = json.load(open("/data/local_events.json"))

result = llm.generate([f"""Optimize tonight's Airbnb price.
My listing: {my_listing['type']} in {my_listing['location']}
Base price: ${my_listing['base_price']}
Day: {datetime.now().strftime('%A')}

Comparable listings tonight:
{json.dumps(comps[:10], indent=2)}

Local events this week:
{json.dumps(events[:5], indent=2)}

Return JSON: {{"recommended_price": int, "reasoning": str, "demand_level": "low"/"medium"/"high"}}"""], params)
print(json.loads(result[0].outputs[0].text))
CPU + 7B · Cron
~$0.002/scan
App Store review monitor
CPU watches your competitors' app reviews daily. Model categorizes complaints and feature requests by theme. Know what users hate about your competitor before anyone else — then build what they're asking for. Charge SaaS companies $99/month per competitor monitored. 20 companies monitoring 3 competitors each = $5,940/month.
app_review_monitor.py
import requests, json, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

competitors = ["com.competitor.app1", "com.competitor.app2"]
for app_id in competitors:
    reviews = requests.get(f"https://your-scraper.com/api/reviews/{app_id}?days=1").json()
    if not reviews: continue
    review_text = "\n".join([f"{'⭐' * r['rating']} {r['text'][:150]}" for r in reviews[:20]])
    
    result = llm.generate([f"""Analyze these app reviews from the last 24 hours.
App: {app_id}

Categorize into:
1. BUGS — what's broken
2. FEATURE REQUESTS — what they want
3. COMPLAINTS — what they hate
4. PRAISE — what they love

Reviews:
{review_text}"""], params)
    requests.post(WEBHOOK, json={"text": f"Review digest for {app_id}:\n{result[0].outputs[0].text[:500]}"})
CPU + 7B · Cron
~$0.003/scan
Expired listing prospector
CPU watches MLS for expired and withdrawn listings daily. Model writes a personalized pitch email to the homeowner explaining why their home didn't sell and how you can help. Real estate agents pay $200-500/month for lead gen tools worse than this. Build it for agents at $99/month — or use it yourself if you're an agent. 50 agent subscribers = $4,950/month.
expired_listings.py
import requests, json, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.5, max_tokens=512)

expired = json.load(open("/data/expired_listings.json"))
for listing in expired[:10]:
    result = llm.generate([f"""Write a personalized letter to a homeowner whose listing just expired.
Address: {listing['address']}
Days on market: {listing['days_on_market']}
List price: ${listing['price']}

Rules: empathetic not salesy, reference their specific property,
mention one possible reason it didn't sell, offer a free market analysis,
under 200 words."""], params)
    print(f"Letter for {listing['address']}:")
    print(result[0].outputs[0].text)
21 — Agents & Scrapers

Agents & Scrapers
The Internet Works for You

CPU agents that watch the internet 24/7 for almost nothing. When they find something, they think about it, then alert you. Your private intelligence network.

CPU · 24/7
~$0.001/scan
Craigslist/FB Marketplace deal hunter
CPU scrapes listings every hour. Model identifies underpriced items based on retail value — furniture, electronics, vehicles, collectibles. Alerts within minutes of posting so you get there first. Flippers make $1-5K/month buying underpriced items and reselling. This bot finds the deals before anyone else sees them. Use it yourself or build a deals Telegram channel with affiliate links.
deal_hunter.py
import requests, json, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

listings = json.load(open("/data/scraped_listings.json"))
for item in listings:
    result = llm.generate([f"""Is this underpriced?
Item: {item['title']}
Price: ${item['price']}
Condition: {item['condition']}
Location: {item['location']}

Estimate retail/market value. Is this a deal worth driving for?
Return JSON: {{"retail_value": int, "deal_score": 1-10, "profit_estimate": int, "verdict": str}}"""], params)
    analysis = json.loads(result[0].outputs[0].text)
    if analysis.get("deal_score", 0) >= 7:
        requests.post(WEBHOOK, json={"text": f"DEAL: {item['title']} - ${item['price']}\nValue: ${analysis['retail_value']}\nProfit: ~${analysis['profit_estimate']}\n{item['url']}"})
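Because the scan repeats every hour, the same listing would be scored and alerted on every run unless seen listings are remembered. A minimal sketch of that bookkeeping, assuming each listing has a stable `url`. The `filter_new` name and the state-file path are hypothetical:

```python
import json
from pathlib import Path


def filter_new(listings: list[dict], state_path: str) -> list[dict]:
    """Return listings whose URL has not been seen before, and record them."""
    state_file = Path(state_path)
    seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()

    fresh = []
    for item in listings:
        if item["url"] not in seen:
            seen.add(item["url"])
            fresh.append(item)

    # Persist the updated set so the next scheduled run skips these.
    state_file.write_text(json.dumps(sorted(seen)))
    return fresh
```

Looping over `filter_new(listings, "/data/seen_listings.json")` instead of `listings` would send each deal once and avoid paying to re-score old posts.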
CPU + 7B · Cron
~$0.002/scan
Government contract scanner
CPU watches SAM.gov for new contracts daily. Model reads requirements, matches to your capabilities, scores fit. Small businesses miss contracts because they don't know they exist — billions in government work goes uncontested. Charge $49/month to small contractors. Or use it yourself to find contracts your business qualifies for.
gov_contracts.py
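import requests, json, os
from datetime import datetime, timedelta
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

my_capabilities = "IT consulting, web development, cloud migration, cybersecurity, small business"
# SAM.gov's search takes a posted-date window (MM/dd/yyyy); this looks back one week
today = datetime.now()
contracts = requests.get("https://api.sam.gov/opportunities/v2/search", params={
    "api_key": os.environ["SAM_KEY"], "limit": 20,
    "postedFrom": (today - timedelta(days=7)).strftime("%m/%d/%Y"),
    "postedTo": today.strftime("%m/%d/%Y"),
}, timeout=30).json()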

for contract in contracts.get("opportunitiesData", []):
    result = llm.generate([f"""Score this government contract for fit.
My capabilities: {my_capabilities}

Contract: {contract.get('title', '')}
Agency: {contract.get('fullParentPathName', '')}
Type: {contract.get('type', '')}
Description: {contract.get('description', '')[:1000]}

Return JSON: {{"fit_score": 0-100, "relevant_capabilities": list, "deadline": str, "estimated_value": str, "recommendation": str}}"""], params)
    score = json.loads(result[0].outputs[0].text)
    if score.get("fit_score", 0) > 60:
        requests.post(WEBHOOK, json={"text": f"CONTRACT: {contract['title'][:80]}\nFit: {score['fit_score']}/100\n{score['recommendation']}"})
CPU + 7B · Cron
~$0.002/scan
Scholarship finder for students
CPU scrapes scholarship databases — Fastweb, Scholarships.com, college-specific pages. Model matches requirements to student profile — GPA, major, ethnicity, state, activities. Parents pay $9/month instantly. Students miss thousands in free money because they can't find the right scholarships. 1000 subscribers = $9,000/month. Sell on: parenting Facebook groups, high school counselor networks, college prep subreddits.
scholarship_finder.py
import requests, json, os
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.2, max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]

student = {
    "gpa": 3.5, "major": "Computer Science", "state": "Texas",
    "ethnicity": "Hispanic", "activities": "robotics club, volunteer tutoring",
    "year": "Junior", "financial_need": True
}

scholarships = json.load(open("/data/scholarships.json"))
matches = []
for s in scholarships:
    result = llm.generate([f"""Does this student qualify for this scholarship?
Student: {json.dumps(student)}
Scholarship: {s['name']}
Requirements: {s['requirements']}
Amount: {s['amount']}
Deadline: {s['deadline']}

Return JSON: {{"qualifies": bool, "match_score": 0-100, "missing": str}}"""], params)
    score = json.loads(result[0].outputs[0].text)
    if score.get("qualifies"):
        matches.append(f"{s['name']} — {s['amount']} (due {s['deadline']})")

if matches:
    requests.post(WEBHOOK, json={"text": f"Found {len(matches)} scholarships:\n" + "\n".join(matches[:10])})
7B · T4
~$0.005/letter
Landlord/tenant letter generator
Works for BOTH sides. Landlords: generate lease violation notices, late rent letters, move-out checklists. Tenants: generate repair requests, security deposit demands, habitability complaints. Legally informed language without paying a lawyer. Charge $10/letter. Both landlords AND tenants need this — constant demand. Sell on: r/landlord, r/tenanthelp, property management Facebook groups.
landlord_tenant.py
import json
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.3, max_tokens=1024)

letter_type = "Security deposit demand (tenant)"
details = {
    "tenant": "Jane Smith",
    "landlord": "ABC Properties LLC",
    "address": "123 Main St, Apt 4B",
    "move_out_date": "March 1, 2026",
    "deposit": "$2,400",
    "issue": "Landlord has not returned deposit after 45 days, no itemized deductions provided"
}

result = llm.generate([f"""Write a formal {letter_type}.
Details: {json.dumps(details)}

Rules:
- Reference applicable state law (mention tenant should verify their state)
- Professional, firm, not threatening
- Include specific deadline for response
- Mention next steps if no response (small claims)
- Format as a proper business letter"""], params)
print(result[0].outputs[0].text)
14B · A100 80GB
~$0.03/return
AI tax preparer assistant
Upload W2s, 1099s, receipts. Model organizes everything, finds deductions you missed, generates a summary for your accountant. Or use it yourself for simple returns. Charge $29/year during tax season — cheaper than TurboTax, smarter than doing it yourself. Market January through April. 5000 users at $29 = $145K in four months.
tax_assistant.py
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.2, max_tokens=2048)

tax_docs = Path("/data/tax_documents.txt").read_text()

result = llm.generate([f"""Analyze these tax documents and prepare a summary.
1. Categorize all income sources (W2, 1099, other)
2. Identify potential deductions (home office, vehicle, business expenses, education, charitable)
3. Flag anything unusual that needs an accountant
4. Estimate tax liability vs payments made

Documents:
{tax_docs[:8000]}

IMPORTANT: This is for preparation only, not legal tax advice."""], params)
with open("/outputs/tax_summary.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Tax prep summary complete")
22 — Automotive

Automotive

Every car owner needs help. Evaluating deals, understanding codes, planning maintenance. Build the tools mechanics charge $100/hr for.

7B · T4
~$0.005/evaluation
Used car price evaluator
Paste a car listing from Craigslist, FB Marketplace, or a dealer. Model estimates a fair market range from the listing details, tells you if it's a good deal and what to offer. Save buyers $500-2000 per purchase. Charge $5/evaluation or $19/month unlimited. Sell on: car buying forums, r/whatcarshouldIbuy, r/askcarsales, Facebook car groups. Every single person buying a used car needs this.
car_evaluator.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.3, max_tokens=1024)

listing = {
    "year": 2020, "make": "Toyota", "model": "Camry SE",
    "mileage": 45000, "price": 22500,
    "condition": "Clean title, one owner, no accidents",
    "location": "Dallas, TX"
}

result = llm.generate([f"""Evaluate this used car listing.
Car: {listing['year']} {listing['make']} {listing['model']}
Mileage: {listing['mileage']:,}
Asking price: ${listing['price']:,}
Condition: {listing['condition']}
Location: {listing['location']}

Analyze:
1. Fair market value range for this car
2. Is this price good, fair, or high?
3. What to offer
4. Red flags to check
5. Expected maintenance costs in next 12 months"""], params)
print(result[0].outputs[0].text)
7B · CPU
~$0.001/code
OBD-II code interpreter
Every car owner googles OBD codes. Build the better answer. Plug in your code reader, feed the codes to the model, get plain English diagnosis + estimated repair cost + "is this safe to drive." Charge $2.99/month or $0.50/lookup. Build as a UI site — car owner types in their code, gets an instant answer. Market on car forums, TikTok car content, r/MechanicAdvice.
obd_interpreter.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.3, max_tokens=512)

codes = ["P0420", "P0171"]
car = "2018 Honda Civic 1.5T"

result = llm.generate([f"""Diagnose these OBD-II codes for a {car}.
Codes: {', '.join(codes)}

For each code explain:
1. What it means in plain English
2. Common causes for THIS specific car
3. Severity: safe to drive? how urgent?
4. Estimated repair cost range
5. Can I DIY this or need a mechanic?

Be specific to the {car}, not generic."""], params)
print(result[0].outputs[0].text)
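For a UI site where owners type codes in by hand, it helps to validate the input before spending compute on it. A minimal sketch, assuming the standard five-character OBD-II trouble code layout (a system letter, a digit 0-3, then three hex digits). `normalize_codes` is a hypothetical helper:

```python
import re

# System letter, a digit 0-3, then three hex digits.
DTC_PATTERN = re.compile(r"^[PCBU][0-3][0-9A-F]{3}$")

SYSTEMS = {"P": "powertrain", "C": "chassis", "B": "body", "U": "network"}


def normalize_codes(raw_codes: list[str]) -> tuple[list[str], list[str]]:
    """Split user input into (valid, rejected) OBD-II codes.

    Valid codes are upper-cased and de-duplicated, keeping input order.
    """
    valid, rejected = [], []
    for raw in raw_codes:
        code = raw.strip().upper()
        if DTC_PATTERN.match(code):
            if code not in valid:
                valid.append(code)
        else:
            rejected.append(raw)
    return valid, rejected
```

The `valid` list can go straight into the `codes` variable of the script above, and `rejected` can be echoed back to the user as a typo warning.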
23 — Agriculture

Agriculture

500 million smallholder farmers worldwide. Most lose 20-40% of crops to problems they identify too late. AI changes that equation.

7B · T4 · Vision
~$0.005/photo
Crop disease identifier
Upload a photo of your plant, model identifies the disease and treatment. There are 500 million smallholder farmers globally who lose 20-40% of crops to disease they can't identify in time. Build as a WhatsApp bot — $5/month. Or free tier with ads. Sell through agricultural extension programs, farming co-ops, ag supply stores. One region with 10,000 farmers at $5 = $50K/month.
crop_disease.py
from vllm import LLM, SamplingParams
import base64

llm = LLM(model="Qwen/Qwen3-7B-VL")
params = SamplingParams(temperature=0.3, max_tokens=512)

with open("/data/plant_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

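# Send the photo along with the prompt; a text-only generate() call never shows the model the image
messages = [{"role": "user", "content": [
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    {"type": "text", "text": """Identify any disease in this plant photo.

1. Plant species (if identifiable)
2. Disease name and stage
3. Cause (fungal, bacterial, viral, nutrient deficiency)
4. Treatment options (organic and chemical)
5. Prevention for next season
6. Urgency: how fast does this spread?

Be specific and actionable for a farmer."""},
]}]
result = llm.chat(messages, params)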
print(result[0].outputs[0].text)
24 — Fashion

Fashion

Style is personal. AI that understands your closet, your body, your taste — not generic "wear blue with khaki" advice.

7B · T4 · Vision
~$0.005/outfit
Outfit recommender from closet photos
Photograph your closet — or just describe what you own. Model generates outfit combinations you never thought of, matched by color theory, occasion, season, and style. Build as UI site, charge $4.99/month. Fashion influencers would promote this for free because their followers would love it. 5000 subscribers at $4.99 = $24,950/month. Sell on: Instagram, TikTok fashion community, Pinterest.
outfit_recommender.py
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-7B")
params = SamplingParams(temperature=0.7, max_tokens=1024)

closet = {
    "tops": ["white oxford shirt", "navy crew neck sweater", "olive bomber jacket", "black turtleneck"],
    "bottoms": ["dark wash jeans", "khaki chinos", "black dress pants", "grey joggers"],
    "shoes": ["white sneakers", "brown chelsea boots", "black dress shoes"],
    "occasion": "casual Friday at work",
    "season": "fall"
}

result = llm.generate([f"""Generate 3 outfit combinations from this closet.
Tops: {', '.join(closet['tops'])}
Bottoms: {', '.join(closet['bottoms'])}
Shoes: {', '.join(closet['shoes'])}
Occasion: {closet['occasion']}
Season: {closet['season']}

For each outfit:
1. The combination
2. Why it works (color, proportion, vibe)
3. One accessory suggestion to elevate it"""], params)
print(result[0].outputs[0].text)
25 — Data & Privacy

Your code is yours.
Your data is yours.
This is the only way
to keep it that way.

Every time you paste code into ChatGPT, send a prompt to an API, or use an AI browser extension, your data leaves your control. Your code, your customer data, your proprietary logic: it's on someone else's server, training someone else's model, building someone else's competitive moat. SeqPU is built so that never happens. Your servers talk to your rented servers. Everything stays inside your Cloudflare edge. We do not train on your code. All your code and data belong to you. We are here to support creators, not to take from them.

The problem with every AI platform
You paste code into ChatGPT — it's on OpenAI's servers. You call Claude's API — it's on Anthropic's servers. You use GitHub Copilot — it's on Microsoft's servers. One prompt at a time, your intellectual property drips out through a leaky bucket. You have no control over what happens next. SeqPU is the closed loop. Your server, your rented GPU, your Cloudflare edge. Nothing leaks because there's nowhere for it to leak to.

Why This Is Truly Private

When you run on SeqPU, here's what actually happens: your browser connects to Cloudflare's edge network — encrypted, under your control. Our CloudSafe edge worker validates your identity and injects a cryptographic origin proof. Without that proof, our backend rejects the request outright. Your job is created, your secrets are encrypted, and the work is queued to an isolated GPU container with an authenticated service token. That container is your GPU, your filesystem, your environment. It runs your code. It writes results back to your own database record. Your frontend picks up the results in real time.

At no point does a third party see your data. There is no shared pipe. There is no middleman with a backdoor into your execution environment. It's your server talking to your rented server over infrastructure you control. That's it.

Compare that to any AI API: you send a prompt, it goes to their server, it's processed on their hardware, stored in their logs, potentially used for their training. You have no visibility and no control. With SeqPU, you run the model yourself on your rented GPU. The data never leaves infrastructure you control.

Cloudflare Zero Trust & CloudSafe

Every request to SeqPU passes through Cloudflare Zero Trust and our CloudSafe edge worker — the same architecture that protects banks, governments, and the largest enterprises on the internet. This isn't bolted-on security. It's the foundation everything is built on.

CloudSafe is the front door to everything. Every single request — browser sessions, SDK calls, headless API, Telegram bots, Discord bots, Slack integrations, WhatsApp integrations — all hit CloudSafe first. Nothing reaches your compute without passing through it.

  • Two authentication paths — browser users get Firebase token validation. SDK and headless users get Cloudflare Access service tokens with JWT assertion. Both paths inject a cryptographic origin proof that the backend requires. No proof, no access.
  • Every request is traced — origin proof, request type, real client IP, unique request ID. Full accountability at every hop.
  • DDoS dies at the edge — Cloudflare's global network absorbs volumetric attacks. Your GPU never sees them.
  • Malicious payloads die at CloudSafe — validated and filtered before they ever reach your execution environment.
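
The "no proof, no access" rule above can be pictured as a signed-header check. The following is only a minimal sketch and not SeqPU's actual implementation: the shared key, the signed fields, the 30-second freshness window, and the function names `make_origin_proof` and `verify_origin_proof` are all assumptions made for illustration, built on HMAC-SHA256 from Python's standard library.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared secret known only to the edge worker and the backend.
EDGE_KEY = b"example-key-do-not-use"
MAX_AGE_SECONDS = 30  # assumed freshness window, not a documented value


def make_origin_proof(request_id: str, timestamp: int, key: bytes = EDGE_KEY) -> str:
    """Edge side: sign the request ID and timestamp."""
    message = f"{request_id}:{timestamp}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_origin_proof(request_id: str, timestamp: int, proof: str,
                        now: Optional[int] = None, key: bytes = EDGE_KEY) -> bool:
    """Backend side: reject missing, stale, or forged proofs."""
    if not proof:
        return False
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > MAX_AGE_SECONDS:
        return False
    expected = make_origin_proof(request_id, timestamp, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), proof.encode())
```

In this sketch a request that skips the edge has no way to produce a valid proof, so the backend check fails before any job is created.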

The 10-Layer Security System

CloudSafe is just the perimeter. Behind it, SeqPU runs a 10-layer deep security architecture from edge to execution. We don't publish the specifics of each layer because that's how security works — the less an attacker knows about what's between them and your data, the better.

What we will tell you: every hop is authenticated. Every payload is validated. Every secret is encrypted. Every container is isolated. Every session is ephemeral. Ten layers, each independent, each capable of stopping an attack on its own. You'd have to break all ten to reach anything — and by the time you've hit the second, we already know you're there.

We Do Not Train On Your Code

This is not a qualified statement. This is not "we don't train on your code unless you opt in." This is not "we anonymize your data for research." We do not train on your code. Period.

Your code, your data, your prompts, your model outputs — all of it belongs to you. The architecture has no collection mechanism. Your code lives in the job payload, executes on your rented GPU, and results write back to your own database record. There is literally nowhere for training data to go. No pipeline, no ingestion, no interest in building one.

Your IP stays yours
Every line of code you write. Every model you fine-tune. Every dataset you upload. Every workflow you build. Every product you publish. 100% yours. We don't claim rights. We don't share it. We don't peek at it. You are the creator. You own what you create.

MCO — When You Do Need an External AI

Running models locally on SeqPU keeps everything private by default. But sometimes you need to call an external AI — OpenAI, Anthropic, Google, Mistral, Groq. The moment you do, your data leaves your environment. That's where MCO (Model Context Orchestration) comes in.

MCO is a product that lives at the Cloudflare edge — between you and any AI provider. It scans your requests before they leave and scans responses before they come back. It's the security layer that protects you when your data has to cross a boundary.

  • Prompt injection defense — catches jailbreak attempts, instruction overrides, roleplay attacks, and chat markup injection before they reach the model.
  • Secret scanning — detects API keys, tokens, private keys, and credentials from every major provider that might accidentally be in your prompts. Catches them before they leave your environment.
  • PII detection — flags emails, phone numbers, Social Security numbers, and credit card numbers before they're sent to a third-party AI.
  • SSRF & path traversal prevention — blocks attempts to reach internal infrastructure or escape the intended scope.
  • Bring Your Own Key — use your own API keys for any supported provider. MCO scans input, proxies to the provider, scans the output, returns it clean. AI safety as a layer, not a replacement.
  • Four strictness levels — low, medium, high, paranoid. You choose how aggressive the scanning is based on your risk tolerance.
  • Atomic billing — MCO credits run on dedicated infrastructure with single-threaded access per user. No race conditions. No overbilling. $0.001 per 1K tokens scanned.
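
To make the scanning idea concrete, here is a rough sketch of an outbound prompt check. It is not MCO's code: the regular expressions, the mapping of patterns to the four strictness levels, and the `scan_prompt` helper are assumptions chosen for illustration, and a real scanner would cover many more formats.

```python
import re
from typing import List

# Illustrative patterns only (assumed, not MCO's real rule set).
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

# Hypothetical mapping: each level enables everything from the level below it.
LEVELS = {
    "low": ["aws_access_key", "private_key"],
    "medium": ["aws_access_key", "private_key", "ssn", "credit_card"],
    "high": ["aws_access_key", "private_key", "ssn", "credit_card", "email"],
    "paranoid": ["aws_access_key", "private_key", "ssn", "credit_card", "email", "phone"],
}


def scan_prompt(text: str, strictness: str = "medium") -> List[str]:
    """Return the names of every enabled pattern found in the text."""
    return [name for name in LEVELS[strictness] if PATTERNS[name].search(text)]


if __name__ == "__main__":
    findings = scan_prompt("Contact bob@example.com, SSN 123-45-6789", "high")
    print(findings)  # a non-empty list means: hold the request before it leaves
```

A caller would run a check like this before proxying to the provider and again on the response, blocking or redacting whenever the list is non-empty.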

Every other platform lets your data flow to AI providers unchecked. MCO is the checkpoint. The only product purpose-built to defend against LLM-vectored attacks on your data in transit.

Server-to-Server — The Closed Loop

Here's what makes this fundamentally different from every other AI platform: it's your servers talking to your rented servers. That's the entire data flow. No third party sits in the middle.

Your browser connects to your Cloudflare edge. CloudSafe authenticates you and forwards to backend services. Your job is created, your secrets are encrypted, your code is queued to an isolated GPU container. The container runs your code, writes results to your own database record, and your frontend picks them up in real time. The model runs on your rented hardware. The data stays on your rented hardware.

This is the same privacy model as owning a physical server in a data center — except it scales from CPU to 384GB of GPU in seconds and costs nothing when it's not running. The privacy of on-premise. The flexibility of cloud. The cost of serverless.

Using external API keys
If you add an API key to Secrets and call an external service (OpenAI, Anthropic, etc.), that data leaves your environment and goes to their servers. That is your explicit choice, not ours. When you run local models, nothing leaves. When you call external APIs, standard third-party terms apply. Use MCO to scan and protect those calls. We make the boundary clear so you always know where your data is going.

Why We Built It This Way

Most platforms treat your data as the product. They offer free tiers and cheap access because your usage is the value. Your prompts train their models. Your workflows inform their roadmap. Your data feeds their competitive moat.

We built SeqPU the opposite way. You pay for compute by the second. That's our revenue. We have zero incentive to touch your data because your data has zero value to our business model. Our incentive is to make your compute faster, cheaper, and more private — because that's what keeps you paying for seconds.

We are here to support creators. To give builders a platform where they can innovate without looking over their shoulder. To make sure the people who create the value are the people who capture the value. Create. Innovate. Profit. That's the deal.

26 — On Premise

What do you have?
How do we unlock it?
How do we make it faster?

Our first question is never "what do you want to buy." It's what compute do you already own, how do we unlock its hidden potential, and how do we start organizing your data to make your systems faster and smarter from day one.

Your Data Never Leaves

With every AI API you use (ChatGPT, Claude, Gemini), your data flows through their servers. Every prompt. Every document. Every customer record. One prompt at a time, dripping through a leaky bucket.

Securing your data is only half the battle. It still needs to be used securely. API calls, productivity platforms, browser extensions — anything can be the leak. We turn your existing compute infrastructure into a private internet of compute where you can run models as powerful as Claude on your own hardware, so your data never leaves.

For regulated industries — healthcare (HIPAA), legal (privilege), financial (SOX/PCI), government (ITAR/CMMC) — this isn't a feature. It's a requirement.

CPU First — Play With What You Have

We start by assessing your existing on-premise compute infrastructure, cloud credits, and API credits. Most organizations have more compute under the hood than they realize.

  • CPUs are cheap and already inside your business — we put them to work as the backbone of your private compute network, handling tasks and inference without touching a GPU.
  • Most AI tasks don't need a GPU. Classification, extraction, routing, simple Q&A — a 7B model on a CPU handles it. No GPU needed.
  • When the work demands it, we move to GPUs — large model inference, image generation, video processing, 3D rendering. The heavy creative and analytical work that no CPU can touch.
  • When a specialized API is the right call, we act as the buffer. An orchestrator model sanitizes your data before it ever leaves your network. Then we work on bringing that capability securely in-house — built for you, running internally, tuned to your needs.
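
The triage in the list above can be sketched as a small routing function. The task names come from the list; the `pick_backend` helper, the label strings it returns, and the choice to raise on unknown task types are assumptions for illustration rather than a description of any shipped router.

```python
# Task categories taken from the list above; identifiers are illustrative.
CPU_TASKS = {"classification", "extraction", "routing", "simple_qa"}
GPU_TASKS = {"large_model_inference", "image_generation",
             "video_processing", "3d_rendering"}
EXTERNAL_TASKS = {"specialized_api"}


def pick_backend(task_type: str) -> str:
    """Choose the cheapest backend that can handle the task."""
    if task_type in CPU_TASKS:
        return "cpu"
    if task_type in GPU_TASKS:
        return "gpu"
    if task_type in EXTERNAL_TASKS:
        # Data is sanitized by an orchestrator model before it leaves the network.
        return "sanitize_then_external_api"
    raise ValueError(f"unknown task type: {task_type}")


if __name__ == "__main__":
    for task in ("classification", "image_generation", "specialized_api"):
        print(task, "->", pick_backend(task))
```

Trying the CPU set first reflects the "CPU first" ordering: a GPU or an external call is only chosen when the task type requires it.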

The Three Stages

1
Max Out What You Have
Every server counts. That 20-year-old CPU rack? It has value. High-value tasks run on your best hardware. Low-value tasks run on everything else. Nothing sits idle. We audit your full infrastructure and extract every unit of inference hidden inside it — before we ever suggest buying anything new.
2
Encrypted Cloud, Still Yours
When on-prem capacity isn't enough, we extend into encrypted cloud environments that only you control. Your data integrity is maintained. Your security model doesn't change. You get burst capacity without losing the walled-off isolation that makes your data yours. SeqPU's serverless GPUs are your cloud extension — pay per second, zero idle cost, Cloudflare encryption.
3
Capex Only When It Actually Pencils
The math is brutal. On a 4-year hardware cycle, a GPU needs to run 12 hours a day, every single day, for all 4 years to beat serverless pricing. Most don't. Most real workloads run 2-4 hours/day. We show you the numbers before you spend the money.
The 4-year capex math | Reality
To beat serverless pricing | 12 hrs/day × 4 yrs
Most real workloads run at | 2–4 hrs/day
Your capex risk with SeqPU | $0
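
The break-even claim can be checked with simple arithmetic. This is a sketch under stated assumptions: it uses the A100 80GB serverless rate of $2.50/hour as the comparison point, and the total ownership cost passed to `break_even_hours_per_day` is a hypothetical input (purchase, power, hosting, and staff), not a figure from this page.

```python
HOURS_PER_YEAR_AT_ONE_HOUR_PER_DAY = 365


def serverless_spend(hours_per_day: float, rate_per_hour: float, years: int = 4) -> float:
    """Total serverless cost for a steady daily workload over the hardware cycle."""
    return hours_per_day * HOURS_PER_YEAR_AT_ONE_HOUR_PER_DAY * years * rate_per_hour


def break_even_hours_per_day(total_ownership_cost: float, rate_per_hour: float,
                             years: int = 4) -> float:
    """Daily usage at which owning the GPU costs the same as renting it."""
    return total_ownership_cost / (rate_per_hour * HOURS_PER_YEAR_AT_ONE_HOUR_PER_DAY * years)


if __name__ == "__main__":
    # 12 hrs/day for 4 years is 17,520 hours of serverless time.
    print(serverless_spend(12, 2.50))
    # A workload of 3 hrs/day over the same period.
    print(serverless_spend(3, 2.50))
    # Hypothetical ownership cost: how many hours per day justify it?
    print(break_even_hours_per_day(43_800, 2.50))
```

Under these assumptions, a 12 hrs/day break-even corresponds to about $43,800 of total ownership cost per GPU over 4 years, while a 3 hrs/day workload would spend about a quarter of that on serverless.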

What We Build For You

We see AI as a dynamic new member of your team. There are core skills every system needs, but the rest is built around YOUR business.

Every business creates its own unique AI advantages. With access to your data — sales numbers, contracts, customer records, operational data, internal docs — you can generate insights, automations, and competitive advantages no one else has.

The biggest wins we find: communication breakdowns costing you money. Let AI monitor across your business processes, catching missed steps, keeping everyone in the loop before small things become big problems.

Real Examples

  • Pool idle hospital CPUs → scale AI inference for clinical teams. HIPAA compliant. Data never leaves the building.
  • Existing business servers → internal AI chatbot for your team, zero new hardware. Answers questions about your own docs and processes.
  • Legacy infrastructure → modern agentic workflows layered on top. Your old servers learn new tricks.
  • Private data + local models → sovereign IP pipeline, fully on-prem. Your competitive intelligence stays yours.

Deployment Options

A
Full On-Premise
Everything runs on your hardware. Nothing touches the internet. Maximum security. Maximum control.
B
Complete Cloud
SeqPU handles everything. Serverless GPUs. Cloudflare encryption. Firebase auth. Zero maintenance on your end.
C
Hybrid
Sensitive data processes on-premise. Heavy compute bursts to cloud. Best of both worlds. Most common choice.

The Shift

In 1995, the problem wasn't "where do I get a server." Servers were everywhere. The real problem was how do you make servers work together to build something none of them could build alone. That's what TCP/IP, HTTP, and the web browser solved — not more servers, not cheaper servers. A protocol for collaboration.

Every GPU company today is solving the 1995 server problem. SeqPU is solving the 1996 internet problem: "Given that GPUs exist everywhere, how do I make them think together?"

Old World | New World
"I need a GPU" | "I need a problem solved"
Rent hardware | Orchestrate capability
One model, one GPU, one call | Many specialists, many chips, one pipeline
You manage the infrastructure | The network manages the infrastructure
Pay for hardware time | Pay for thinking

Why Nobody Else Landed Here

SeqPU — Zero capex · Pass-through pricing
We own no hardware. We push no software seats. Revenue comes from being the network that optimizes compute usage. If we help you use 15% less compute by right-sizing models, we take 1% of the savings. Our revenue goes up when you spend less.
  • GPU Clouds (CoreWeave, Lambda, Hyperscalers) — bought 100,000 GPUs. Incentivized to sell you the biggest model on the biggest GPU for the longest time. Efficiency is their enemy.
  • Marketplaces (Akash, io.net) — take a cut of every transaction. They don't care if you're wasting compute. They still get their cut.
  • Token APIs (OpenAI, Anthropic, Together AI) — charge per token. Sending a 50K-token page to a 70B instead of a 7B first? 100× more revenue for them.
Our alignment
We are locked in to one thing only: your results. Not hardware. Not software. Not a cloud provider. Hardware agnostic. Software agnostic. We work with what you have and optimize from there. From text to compute, to full agentic networks, to consuming and storing all your valuable IP — we are the solution for organizations that want to respect the integrity of their innovations and data without compromising on capability or cost.
27 — Pricing

Pay for compute.
Not tokens.

You pay for the GPU time your code uses — by the second. When it's not running, you pay nothing. No seat fees. No monthly minimums. No idle cost. No token tax. You're paying for the machine, not the middleman.

How Billing Works

You click Run All. The GPU spins up. The meter starts. Your code runs. Your code finishes. The GPU spins down. The meter stops. You're billed for exactly the seconds it ran. Not the minute. Not the hour. The second.

Nothing running? $0. Nobody calling your API? $0. Tuesday at 3am and nobody's using your tool? $0. You only pay when compute is actively running.

GPU Pricing

GPU | VRAM | Per Second | Per Hour
T4 | 16 GB | $0.000164 | $0.59
L4 | 24 GB | $0.000222 | $0.80
A10G | 24 GB | $0.000306 | $1.10
L40S | 48 GB | $0.000542 | $1.95
A100 40GB | 40 GB | $0.000583 | $2.10
A100 80GB | 80 GB | $0.000694 | $2.50
RTX PRO 6000 | (not listed) | $0.000842 | $3.03
H100 | 80 GB | $0.001097 | $3.95
H200 | 141 GB | $0.001261 | $4.54
B200 | 192 GB | $0.001736 | $6.25

CPU Pricing

$0.0000131 per core per second — that's $0.047 per core per hour.

Minimum 0.125 cores per container. CPU handles everything that doesn't need a GPU: API calls, web scraping, data processing, orchestration, email, file manipulation. Most agent work is CPU. 1,000 messages/day on CPU costs about $1.40/day.

Memory Pricing

$0.00000222 per GiB per second

  • Per hour: $0.008/GiB
  • Per day: $0.19/GiB

Memory is your running container's RAM — separate from GPU VRAM which is included in the GPU price. Most jobs use the default allocation. You don't think about this unless running something very memory-heavy.
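
For a CPU-only container, the two rates above combine into one per-second figure. The sketch below shows that arithmetic; the `container_cost` helper and the example allocations are illustrative assumptions, and the actual default memory allocation is not stated here.

```python
CPU_RATE = 0.0000131   # dollars per core per second (from CPU Pricing)
MEM_RATE = 0.00000222  # dollars per GiB per second (from Memory Pricing)
MIN_CORES = 0.125      # minimum cores per container


def container_cost(cores: float, mem_gib: float, seconds: float) -> float:
    """Cost of a CPU container: cores and memory are billed per second."""
    billed_cores = max(cores, MIN_CORES)
    return (billed_cores * CPU_RATE + mem_gib * MEM_RATE) * seconds


if __name__ == "__main__":
    # One core with 1 GiB of RAM running for a full hour.
    print(f"${container_cost(1, 1, 3600):.4f}")
    # Two cores for two seconds, memory left out of the estimate.
    print(f"${container_cost(2, 0, 2):.7f}")
```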

What Things Actually Cost

What you do | GPU | Time | Cost
Summarize a document | A100 80GB | 8 sec | $0.006
Translate a paragraph | T4 | 3 sec | $0.0005
Generate an image | L40S | 15 sec | $0.008
Transcribe 1 hour of audio | T4 | 5 min | $0.05
CPU script (API call + format) | CPU 2 cores | 2 sec | $0.00005
Process 100 receipts (vision) | L40S | 10 min | $0.33
Deep research query | H200 | 30 sec | $0.038
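
Each GPU row above is the per-second rate from the GPU Pricing table multiplied by the run time. A minimal sketch of that calculation follows; the `gpu_job_cost` helper is an illustrative name, and only the four GPUs used in the table are included.

```python
# Per-second rates copied from the GPU Pricing table.
GPU_RATE_PER_SECOND = {
    "T4": 0.000164,
    "L40S": 0.000542,
    "A100 80GB": 0.000694,
    "H200": 0.001261,
}


def gpu_job_cost(gpu: str, seconds: float) -> float:
    """Billed amount for a run: rate per second times seconds of GPU time."""
    return GPU_RATE_PER_SECOND[gpu] * seconds


if __name__ == "__main__":
    jobs = [
        ("Summarize a document", "A100 80GB", 8),
        ("Transcribe 1 hour of audio", "T4", 5 * 60),
        ("Process 100 receipts (vision)", "L40S", 10 * 60),
    ]
    for name, gpu, seconds in jobs:
        print(f"{name}: ${gpu_job_cost(gpu, seconds):.4f}")
```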

The Economics — API vs Your Own Model

API pricing is a premium. You're paying for convenience — and giving up your data. Every prompt, every document, every customer record flows through someone else's servers.

Approach | Cost per 1M tokens | Your data
GPT-4o | $2.50–10/M | Sent to OpenAI
Claude Sonnet | $3–15/M | Sent to Anthropic
Gemini Pro | $1.25–5/M | Sent to Google
Your 7B on T4 | ~$2/M | Stays on your server
Your 14B on A100 | ~$8/M | Stays on your server
Your 32B on A100 | ~$17/M | Stays on your server

API calls carry a premium, so they had better be worth it, because they also take your data. A specific AI built for YOUR task, on YOUR hardware, with YOUR data, will usually pull better results than a general AI built for everyone. A 7B model tuned on your customer support data can outperform a general-purpose model like GPT-4 on your support tasks, and at ~$2 per 1M tokens on a T4 it costs up to 5x less than GPT-4o's $2.50–10 range in the table above.
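
The self-hosted figures in the table depend on how many tokens per second the model sustains, which this page does not state. The sketch below shows the relationship; the function names are illustrative and any throughput you plug in is your own measurement or assumption.

```python
def cost_per_million_tokens(rate_per_second: float, tokens_per_second: float) -> float:
    """Dollars per 1M generated tokens at a given GPU rate and sustained throughput."""
    return rate_per_second / tokens_per_second * 1_000_000


def implied_tokens_per_second(rate_per_second: float, cost_per_million: float) -> float:
    """Throughput needed to hit a target cost per 1M tokens."""
    return rate_per_second * 1_000_000 / cost_per_million


if __name__ == "__main__":
    t4_rate = 0.000164  # dollars per second, from the GPU Pricing table
    # Throughput a 7B model on a T4 would need to reach about $2 per 1M tokens.
    print(implied_tokens_per_second(t4_rate, 2.0))
    # Cost per 1M tokens if the same setup only sustains 40 tokens/second (hypothetical).
    print(cost_per_million_tokens(t4_rate, 40))
```

Because cost scales inversely with throughput, batching requests generally lowers the cost per token, while a low-traffic single-stream deployment raises it.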

The Markup — How You Earn

Set 0-30% markup when you publish a tool. The caller pays compute + your markup. You keep the difference on every single call, 24/7.

  • You build a translation API on T4. Compute: $0.001/call. You set 20% markup. Caller pays $0.0012. You keep $0.0002.
  • At 10,000 calls/day: $2/day = $60/month passive.
  • You build a document summarizer on A100. Compute: $0.006/call. You set 25%. Caller pays $0.0075. You keep $0.0015.
  • At 2,000 calls/day: $3/day = $90/month passive.
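
The markup examples above follow one formula, sketched here. The `markup_earnings` helper and the 30-day month are illustrative assumptions; the 0-30% range is the limit stated above.

```python
def markup_earnings(compute_cost: float, markup_pct: float,
                    calls_per_day: int, days: int = 30) -> dict:
    """Per-call and recurring earnings for a published tool."""
    if not 0 <= markup_pct <= 30:
        raise ValueError("markup must be between 0 and 30 percent")
    you_keep = compute_cost * markup_pct / 100
    return {
        "caller_pays": compute_cost + you_keep,
        "you_keep": you_keep,
        "per_day": you_keep * calls_per_day,
        "per_month": you_keep * calls_per_day * days,
    }


if __name__ == "__main__":
    # Translation API on T4: $0.001 compute, 20% markup, 10,000 calls/day.
    print(markup_earnings(0.001, 20, 10_000))
    # Document summarizer on A100: $0.006 compute, 25% markup, 2,000 calls/day.
    print(markup_earnings(0.006, 25, 2_000))
```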

The same markup game the API providers play — except now you're the provider.

Pricing

Pay-as-you-go billing, by the second. $0 when idle. Add credits via Stripe to start. Below: what $1.00 of credits (1,000,000 micro-dollars) buys you on each hardware tier.

  • ~180 calls on A100 80GB at 8 seconds each
  • ~2,000 calls on T4 at 3 seconds each
  • ~33,000 CPU runs at 2 seconds each

Enough to build something, test it, and prove it works.

Storage

What | How long | Cost
Code, secrets, GPU selection | Forever | Free
Model cache | Persists across all projects | Included
Uploaded files | Persists for the project | Storage rate/GiB
Output files | Until you clear them | Storage rate/GiB
Nothing running |  | $0
The bottom line
No token tax. No seat fees. No idle costs. Pay for the machine by the second. Keep your data. Set your markup. The compute is the cheapest part — the margin is yours.