Everything you need
to ship on day one.
Full documentation and base code examples for every use case. No hunting through forums. No guessing. Open the notebook, paste the code, hit Run All.
You are early
to the future.
This is how you start.
The same compute that powers OpenAI, Anthropic, and Google — 16GB to 384GB of GPU — is now available to anyone with an idea and 10 minutes. No data center. No team. No investment. Most people don't know this exists yet. You do. That's the edge. And right now is better than ever — open source models are matching the APIs, new models drop weekly, and you get day-one access to all of them. This is the beginning of something, and the people here right now are the ones who will look back and say "I was there before it was obvious."
Step 1 — Sign Up
Create an account with Google or email. Add credits via Stripe to start running compute — pay-as-you-go, billed by the second, $0 when idle. No subscriptions, no trial period.
Step 2 — Create a Project
When you open SeqPU you'll see your project name in the top bar. A project holds your cells, environment, secrets, files, and GPU selection. Everything lives inside it. Auto-saves constantly — close the tab, come back tomorrow, it's exactly where you left it.
Step 3 — Pick Your GPU
The GPU selector strip runs across the top. Click one — green means active. That's your hardware for this run. Start with A100 80GB — it handles 90% of use cases at $2.50/hour, billed by the second.
When you type code, the notebook reads it, detects your model, estimates VRAM usage, and shows it in the header — "Qwen3-14B (14B) — VRAM: ~31GB". You know if it fits before you spend a cent. Not running a model? Select CPU Only — $0.047/hour for scripts, API calls, web scraping, bots.
Step 4 — Add Code
The center panel is your notebook. Each cell is a block of Python. Paste your code in — or describe what you want to Claude and paste what it writes. Cells run in order, top to bottom, as one script.
Step 5 — Hit Run All
The Run All button runs every cell in sequence. Watch the Console panel on the right — live output, errors, and system status stream in real time as your code executes on the GPU. When it finishes, the GPU spins down. You're billed for exactly the seconds it ran. Idle = $0.
```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=512)
result = llm.generate(["Hello, what can you do?"], params)
print(result[0].outputs[0].text)
```
That's a 14-billion-parameter AI model running on your rented GPU. Your data never left your environment. No API key. No token bill. No third party. You just ran the same kind of technology that powers ChatGPT, on hardware you rent and control.
Now — Here's What You Can Do With It
Running a script is step one. What makes SeqPU different is what happens after your code works.
Publish as a Headless API
Click Publish → API Endpoint. Your notebook becomes a URL that other systems can call. Define inputs, set a markup (0-30%), and every time someone calls your endpoint, your code runs and you get paid. An API that charges money — built from a notebook in 5 minutes.
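To make the markup concrete, here is a minimal sketch that assumes the markup is a percentage added on top of each call's compute cost. The function name `call_price` and that pricing rule are assumptions for illustration, not SeqPU's published billing formula; only the 0-30% range comes from the text above.

```python
def call_price(compute_cost: float, markup_pct: float) -> float:
    """Price a caller pays, assuming markup is a percentage added to compute cost."""
    if not 0 <= markup_pct <= 30:
        raise ValueError("markup must be between 0% and 30%")
    return compute_cost * (1 + markup_pct / 100)


# A call that used 8 seconds of A100 80GB at $2.50/hour
compute = 2.50 * 8 / 3600
price = call_price(compute, 20)
print(f"compute ${compute:.5f}, caller pays ${price:.5f}, your margin ${price - compute:.5f}")
```

Under this assumption your margin per call is simply the compute cost times the markup percentage.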
Publish as a UI Site
Click Publish → With UI. Add three HTML attributes to connect inputs and outputs. Visitors fill in forms, click Generate, and the GPU fires for exactly the seconds it needs. Your brand. Your URL. Your product. Visitors don't know or care what's behind it — they just use it and pay.
Connect a Telegram Bot
Go to Settings → Connections. Paste a Telegram bot token from @BotFather. Select which published tool it runs. Done: your AI answers from your phone. Three steps. Your bot, your name, your avatar. Discord, Slack, and WhatsApp use the same architecture and are coming soon.
Build Agents That Run Forever
Agents are scripts that make decisions. They read input, decide what to do, take action, check the result, and loop. The SeqPU SDK gives you seqpu.run() to spawn sub-jobs on any GPU, seqpu.tools.call() to call your published tools, and seqpu.notify() to send messages to Telegram (Discord, Slack, and WhatsApp are coming soon). Schedule them on a cron and they run 24/7. CPU handles 80-90% of agent work at $0.047/hour; the GPU only fires when the agent actually needs to think.
Make Money While You Sleep
Every published tool runs without you. Credits are checked automatically. GPUs spin up on demand and spin down when idle, with zero cost when nobody's calling. Stack them: if your first tool makes $200/month, your second $500, and your third $1,000, that's $1,700/month. Each one can be built in an afternoon. No boss. No schedule. No cap on earnings.
Your Data Stays Yours
Every time you paste code into ChatGPT or call an AI API, your data leaves your control. With SeqPU, it never does. Your browser talks to your Cloudflare edge. Your job runs on your rented GPU. Results write to your own database. The entire chain is authenticated at every hop. Your server talking to your rented server. No third party sees your data. We do not train on your code. All your code and data belongs to you — fully, permanently, unconditionally.
This Is the Beginning
Every wave creates millionaires. Websites in 1996. Apps in 2008. SaaS in 2012. AI tools and agents are in that window right now. The technology works. The cost is dirt cheap. The demand is exploding. And most people have no idea this is possible.
You're here. You're reading this. You're one of the early ones. The normal person can now play the same games the hyperscalers play — the same GPUs, the same models, the same compute. The only difference between you and OpenAI is that they started three years ago. You're starting now.
Open the notebook. Describe what you want. Paste. Run. Publish. Start charging. That's the whole path. There is no risk. There is only upside.
Where ideas
become products.
This is a product factory. Write code in cells, test it with the Console, refine it until it's right, then click Publish — your notebook becomes a headless API, a UI site, or a Telegram bot that charges money while you sleep. 16GB to 384GB of GPU compute with a click. Pay by the second. Ship with a button click.
Cells — Your Code, Organized
Your code lives in cells — numbered blocks of Python. You break your script into cells for organization, like chapters in a book. But when you hit Run All, it assembles every cell into one script and executes it top to bottom as a single Python file.
Cell 1 + Cell 2 + Cell 3 = one continuous script. Variables from Cell 1 are available in Cell 2. Output from Cell 2 is available in Cell 3. They share state. The cells are for YOU — to think clearly, to iterate on pieces, to organize your logic. The GPU sees one file.
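Conceptually, Run All behaves like joining the cells into one script and executing it once in a shared namespace. A minimal sketch of that idea, assuming plain concatenation (this models the behavior described above; it is not SeqPU's actual runner):

```python
# Three cells, kept as separate strings the way the editor shows them
cells = [
    "greeting = 'hello'",              # Cell 1 defines a variable
    "shout = greeting.upper() + '!'",  # Cell 2 uses Cell 1's variable
    "print(shout)",                    # Cell 3 uses Cell 2's result
]

# Join them into one script and run it in a single shared namespace
script = "\n\n".join(cells)
namespace = {}
exec(script, namespace)
```

Because everything runs in one namespace, a variable defined in any earlier cell is visible to every later one.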
- + Add Cell — adds a new empty cell at the bottom
- Remove — deletes a cell (trash icon)
- Reorder — move cells up or down with arrow buttons
- Run Cell 1 — runs only the first cell in isolation. Load your model in Cell 1, iterate on Cell 2 twenty times without waiting for the model to reload. Saves minutes per iteration.
The pattern: Cell 1 loads your model (run once). Cell 2 does the work (iterate here). Cell 3 formats and delivers (output, notification).
```python
# Cell 1: load the model (run once)
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)
print("Model loaded and ready")
```

```python
# Cell 2: do the work (iterate here); reuses `llm` and `params` from Cell 1
from pathlib import Path

for doc in Path("/data").glob("*.txt"):
    text = doc.read_text()
    result = llm.generate([f"Summarize:\n{text}"], params)
    print(f"--- {doc.name} ---")
    print(result[0].outputs[0].text)
```
Files
Drag files onto the left panel or click Add URL to download from a link. Files up to 100MB each. Supported formats: CSV, JSON, images (PNG, JPG), audio (WAV, MP3), video (MP4), model weights (.safetensors, .pt), notebooks (.ipynb), and archives (.zip, .tar.gz).
Files are stored on a persistent volume. They survive across runs for the same project. Access them at /data/filename in your code.
```python
import json
from pathlib import Path

import pandas as pd

# Read a text file
text = Path("/data/report.txt").read_text()

# Read a CSV into a DataFrame
df = pd.read_csv("/data/sales.csv")
print(df.head())

# Read a JSON config
data = json.loads(Path("/data/config.json").read_text())

# List all files
for f in Path("/data").iterdir():
    print(f.name, f.stat().st_size)
```
Custom Image (Environment)
By default, your code runs on a Python 3.12 image with PyTorch, transformers, pandas, numpy, matplotlib, opencv, and 40+ common packages pre-installed. You don't need to set up an environment for most tasks.
If you need additional packages, open Custom Image in the left panel:
- Name — give your environment a name (e.g., "my-llm-env")
- Base Image — choose a starting point:
- Python (default) — PyTorch + transformers + 40+ ML packages
- vLLM — optimized for fast LLM inference. Use this when serving language models.
- Python + Node — Python plus Node.js 22 LTS for JavaScript tools
- Python + CUDA — NVIDIA CUDA 12.1 dev image for custom CUDA kernels
- Minimal — bare Python 3.12. You install everything yourself.
- Pip Packages — add any packages not in the base image. The platform auto-detects packages from your code.
The Console — Your Debugging Partner
The right panel. Everything your code prints appears in real time — line by line as it executes, not after the job finishes. This is the feedback loop that makes everything work.
The vibe coding loop:
- Paste code from Claude → hit Run All
- Console says FileNotFoundError: /data/report.pdf → wrong path, tell Claude
- Console says CUDA out of memory → model too big, move up one GPU tier
- Console says ModuleNotFoundError: No module named 'einops' → add the package to your environment
- Console says result: Here is the summary... → it works, ship it
You don't need to understand the error. Copy it. Paste it into Claude. Claude fixes it. Paste the fix back. Run again. 2-4 rounds and it's done. The Console IS the teacher.
- Live streaming — output appears line by line as your code executes
- Errors — full Python tracebacks with line numbers
- System messages — "Model loaded on cuda:0", "Downloading model...", "Execution time: 12.3s"
- Resize — drag the left edge wider for long output, narrower for more code space
- Clear — wipe output for the next run
GPU Selection — 16GB to 384GB
The strip across the top. 12 buttons. Click one — green highlight means active. That's your hardware for this run. From CPU Only ($0.047/core/hour) to 2× B200 384GB ($12.50/hour). Same notebook. Same code. Different button.
When you select CPU Only, a dropdown appears for core count: 0.25, 0.5, 1, 2, 4, or 8 cores with live hourly price. CPU for scripts, API calls, data processing — no GPU needed.
Smart detection: As you type, the notebook reads your code, detects your model (from LLM(model="...") or from_pretrained("...")), estimates parameter count and VRAM usage, and shows it in the header — "Qwen3-14B (14B) — VRAM: ~31GB". You know if it fits BEFORE you hit Run All.
Model Caching — Load Once, Use Forever
First time you load a model from HuggingFace — download takes 2-10 minutes depending on model size. After that, it's cached on a persistent volume shared across ALL your projects.
Second run? Loads in seconds. Load Qwen3-14B in your translation notebook → it's cached → open your summarizer notebook → same model loads instantly. One cache for your entire account. You never download the same model twice.
Output Files — What Your Code Produces
When your code saves files — plt.savefig("chart.png"), image.save("/outputs/result.png"), Path.write_text() — the system captures them automatically. They appear in the sidebar with type icons:
- 🖼 Images — PNG, JPG, generated charts, AI art
- 🎬 Video — MP4, generated animations
- 📊 Data — CSV, JSON, Parquet
- 📄 Text — TXT, logs, reports
- 🧠 Models — saved model weights
Click to download. Save to Drive (💾) for permanent storage. These same files come back in response["files"] when your notebook is published as a headless API — each with a public url (header-free, any size).
How a Job Runs
What happens when you click Run All:
One job at a time. Close the tab and come back — the notebook reconnects to your running job and resumes streaming output.
Storage — What Persists
| What | How long | Cost |
|---|---|---|
| Your code, secrets, GPU selection | Forever (auto-saved) | Free |
| Uploaded files | Persists for the project | Storage rate |
| Output files | Until you clear them | Storage rate |
| Model cache | Persists across ALL projects | Included |
| Idle (no compute running) | — | $0 |
Notebook Import
Drag any .ipynb file (Jupyter/Colab notebook) onto the page. SeqPU extracts the code cells and gives you two options:
- Replace current cells — swaps your existing cells with the imported ones
- Create new project — creates a fresh project with the imported cells
Option to keep cell outputs from the original notebook for reference.
Auto-Save
Everything saves automatically. Your cells, secrets, GPU selection, environment choice, files — all persisted. Close the tab, come back tomorrow, it's exactly where you left it. Projects can be organized into folders from the Dashboard.
Secrets
Never put API keys in cells. Use the Secrets panel in the left sidebar. Secrets are encrypted at rest, never shown in logs, and injected as environment variables when your code runs.
```python
import os

# Set these in the Secrets panel; never hardcode them
api_key = os.environ["OPENAI_API_KEY"]
hf_token = os.environ["HF_TOKEN"]
db_url = os.environ["DATABASE_URL"]
```
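If a secret hasn't been set, `os.environ[...]` fails with a bare `KeyError`. A small helper can make that failure point you back to the Secrets panel. `require_secret` is a hypothetical name used for illustration, not part of any SeqPU SDK:

```python
import os


def require_secret(name: str) -> str:
    """Read a secret injected as an environment variable, with a clear error if it's missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set; add it in the Secrets panel.")
    return value
```

Use it in place of direct lookups, for example `api_key = require_secret("OPENAI_API_KEY")`.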
From Notebook to Product
When your code works — when the Console shows the output you want — click Publish in the header. Your notebook becomes a product:
- Headless API (section 06) — other systems call it, you charge per call with markup
- UI Site (section 07) — visitors use it in a browser, you charge per use or subscription
- Telegram Bot (section 05) — users message it from their phone, you charge per subscriber
- Scheduled Job (section 08) — runs automatically on a cron, monitors, reports, alerts
Every notebook is a product waiting to happen. Write it, test it, refine it with the Console, publish it, start charging. The notebook is the workshop. The published tool is the product. The margin is yours.
Right model.
Right GPU.
Right cost.
Open source AI models are the same technology as the APIs you're used to — same transformer architecture, same attention mechanism, same algorithms. The difference: you can see them, control them, run them yourself. An LLM is not a black box. It's a function — text in, text out. SeqPU gives you 16GB to 384GB of GPU memory with a click of a button. No procurement. No setup. No servers.
Why Run a Model Yourself
- Privacy — your data never touches a third party. Medical records, legal documents, proprietary IP stays on your GPU.
- Cost at scale — no per-token fees. At 10,000 calls/day, a 32B model on A100 costs ~$60/day. Same volume through Claude or GPT-4: $300-1,500/day.
- Control — no rate limits, no content filters you didn't choose, no model deprecation. Run any model the day it drops on HuggingFace.
- Speed — no network round-trip to an API server. Your model is right there on the GPU.
The LLM Is Not a Black Box
Every AI API you've ever used (ChatGPT, Claude, Gemini) runs the same algorithm as the open source models here. The transformer architecture is published research. The attention mechanism is mostly matrix multiplication plus a softmax. The generation loop is: predict next token, append, repeat. That's it.
An LLM is a function. Text in, text out. Like a sort algorithm processes a list, an LLM processes a sequence of tokens. The weights are a downloadable file. The inference code is open source. The only thing the API providers add is hosting and a bill. Open source gives you the algorithm itself — you run it on your hardware, at your precision, chained however you want.
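The "predict next token, append, repeat" loop fits in a few lines. In this minimal sketch a hard-coded lookup table stands in for the model (a real LLM scores every token in its vocabulary using the whole sequence), and the loop uses greedy decoding with a token limit:

```python
# Toy stand-in for a model: maps the last token to the most likely next token
NEXT_TOKEN = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": "<end>",
}


def predict_next(tokens: list[str]) -> str:
    return NEXT_TOKEN.get(tokens[-1], "<end>")


def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)  # predict next token
        if nxt == "<end>":          # stop token ends generation
            break
        tokens.append(nxt)          # append, then repeat
    return tokens


print(" ".join(generate(["<start>"])))
```

Swapping the lookup table for a neural network changes how `predict_next` scores tokens, not the shape of the loop.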
How Models Work on Hardware
A model is a file full of numbers — billions of them. Those numbers (parameters) need to fit in GPU memory (VRAM). How precisely you store each number determines how much space it takes:
- FP16/BF16 (full precision) — 2 bytes per number. A 7B model = ~15GB. Maximum quality.
- INT8/FP8 (8-bit) — 1 byte per number. Same model = ~8GB. Slight quality trade-off, hard to notice.
- INT4/AWQ/GPTQ (4-bit) — 0.5 bytes per number. Same model = ~4GB. Fine for 95% of tasks. Quarter the VRAM.
Plus 10-20% overhead for the KV cache — the model's working memory while it generates text token by token. Longer conversations = more cache = more VRAM. vLLM manages this automatically.
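To see why longer conversations need more VRAM, here is a back-of-the-envelope KV cache estimate. It uses the standard sizing (one key and one value vector per layer, per KV attention head, per token). The model dimensions below are hypothetical placeholders, so read the real ones from a model's config.json.

```python
def kv_cache_gib(tokens: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_value: int = 2) -> float:
    """KV cache size in GiB: a key and a value vector per layer, per KV head, per token."""
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * tokens
    return total_bytes / 2**30


# Hypothetical mid-size model config, stored in FP16 (2 bytes per value)
config = dict(layers=40, kv_heads=8, head_dim=128)
for tokens in (4_096, 32_768):
    print(f"{tokens:>6} tokens -> {kv_cache_gib(tokens, **config):.3f} GiB")
```

The cache grows linearly with tokens, and a server handling several conversations at once needs roughly this much per active sequence.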
| Model | FP16 | INT8 | INT4/AWQ |
|---|---|---|---|
| 3B | ~7 GB | ~3.5 GB | ~2 GB |
| 7B | ~15 GB | ~8 GB | ~4 GB |
| 14B | ~31 GB | ~15 GB | ~8 GB |
| 32B | ~70 GB | ~35 GB | ~18 GB |
| 70B | ~154 GB | ~77 GB | ~39 GB |
| 405B | ~891 GB | ~446 GB | ~223 GB |
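The table's numbers are consistent with a simple rule of thumb: parameters × bytes per parameter, plus about 10% overhead. This small sketch reproduces that rule so you can estimate sizes that aren't listed. The 10% default is read off the table, not taken from a SeqPU tool; raise `overhead` toward 0.2 for long contexts.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int8": 1.0, "fp8": 1.0, "int4": 0.5}


def estimate_vram_gb(params_billions: float, precision: str, overhead: float = 0.10) -> float:
    """Weights (billions of params x bytes per param = GB) plus a fractional overhead."""
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return weights_gb * (1 + overhead)


for size in (7, 14, 32, 70):
    row = [f"{estimate_vram_gb(size, p):.1f}" for p in ("fp16", "int8", "int4")]
    print(f"{size}B:", " / ".join(row), "GB")
```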
Picking the Right GPU
Not a spec sheet. A decision:
What Fits Where — Quick Reference
Qwen3-32B on an A100 80GB? Llama-3.1-70B-AWQ on a T4? Check the grid.
| Model | T4 16GB | L4 24GB | L40S 48GB | A100 40GB | A100 80GB | H100 80GB | H200 141GB | B200 192GB |
|---|---|---|---|---|---|---|---|---|
| 3B INT4 | yes | yes | yes | yes | yes | yes | yes | yes |
| 7B INT4 | yes | yes | yes | yes | yes | yes | yes | yes |
| 7B FP16 | — | yes | yes | yes | yes | yes | yes | yes |
| 14B INT4 | yes | yes | yes | yes | yes | yes | yes | yes |
| 14B FP16 | — | — | yes | yes | yes | yes | yes | yes |
| 32B INT4 | — | tight | yes | yes | yes | yes | yes | yes |
| 32B FP16 | — | — | — | — | yes | yes | yes | yes |
| 70B INT4 | — | — | — | — | yes | yes | yes | yes |
| 70B FP16 | — | — | — | — | — | — | 2×GPU | yes |
| 405B INT4 | — | — | — | — | — | — | — | 2×GPU |
Loading Models — vLLM vs Transformers
vLLM — the production standard. Handles batching, KV cache management (PagedAttention), and quantization automatically. A 32B model on A100 80GB with vLLM can handle 10+ concurrent requests. Select "vLLM" as your base image.
```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-32B", max_model_len=4096)
params = SamplingParams(temperature=0.7, max_tokens=1024)
result = llm.generate(["What is the capital of France?"], params)
print(result[0].outputs[0].text)
```
Transformers — the universal loader. Supports every model architecture. Use for fine-tuning, vision, audio, custom pipelines, or anything vLLM doesn't support.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Send inputs to whichever device device_map placed the model on
inputs = tokenizer("Explain photosynthesis:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
INT4/AWQ — look for "-AWQ" or "-GPTQ" suffix on HuggingFace. 4x less VRAM. A 70B model fits on an A100 80GB.
```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-72B-Instruct-AWQ")
params = SamplingParams(max_tokens=1024)
result = llm.generate(["Write a business proposal for..."], params)
print(result[0].outputs[0].text)
```
When to use which:
- Serving a text model → vLLM
- Fine-tuning → Transformers
- Vision model (Qwen-VL, Pixtral) → Transformers
- Image generation (Stable Diffusion, FLUX) → Diffusers
- Audio (Whisper, Bark, Kokoro) → Transformers
- Embeddings → Sentence Transformers (CPU usually enough)
Mixtures of Compute Experts — The Real Power
A 32B model with a well-crafted prompt beats a 480B model with a lazy prompt. A pipeline of small models costs less than one big model call — and produces better results. Each step purpose-built, right-sized, on exactly the hardware it needs.
```python
import seqpu

# Assumes `message` and `telegram_chat_id` are already defined (e.g. passed in as tool inputs)

# Stage 1: Classify (3B on T4, $0.0002)
category = seqpu.tools.call("classifier", {"text": message})

# Stage 2: Only hard cases hit the big model (32B on A100, $0.005)
if category["result"] == "complex":
    analysis = seqpu.tools.call("deep-analyzer", {"text": message})
else:
    analysis = seqpu.tools.call("quick-responder", {"text": message})

# Stage 3: Format and send (CPU, $0.00003)
seqpu.notify(analysis["response"], chat_id=telegram_chat_id, platform="telegram")
```
More pipeline patterns:
- Research: CPU scrapes web → 7B extracts facts → 32B synthesizes answer → CPU sends to Telegram
- Content: 14B writes draft → 7B checks grammar → image model generates cover → CPU publishes to CMS
- Support: 3B classifies intent → routes to the right 14B specialist (billing/technical/returns)
- Documents: CPU extracts PDF text → 7B summarizes → embedding model indexes for search
The Model Landscape — What's Available Right Now
There are thousands of models on HuggingFace — from 22 million parameters to 685 billion. Here's what to use for what.
General Assistant / Chatbot
The core use case. A model that talks, answers questions, writes, reasons.
- Budget ($0.59/hr): Qwen3-8B or Llama-3.1-8B on T4 in INT4. Surprisingly good for Q&A and support.
- Production ($2.50/hr): Qwen3-32B or Qwen2.5-72B-AWQ on A100 80GB. GPT-4 class quality. This is where most serious bots run.
- Maximum ($4.54-9.08/hr): DeepSeek V3.2 (685B), Qwen3.5-397B, Llama 4 Maverick on H200/2×H200.
Trade-offs: Qwen leads multilingual (Qwen3 covers 119 languages and dialects, Qwen3.5 200+). Llama has the largest community and most fine-tunes. DeepSeek leads reasoning benchmarks. Pick based on your task.
```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.7, max_tokens=1024)
result = llm.generate(["Explain quantum computing to a 10 year old"], params)
print(result[0].outputs[0].text)
```
Code Generation
- DeepSeek-Coder — dedicated coding model, multiple sizes
- Qwen3 — excellent at code across all sizes
- GLM-4-9B — strong tool integration, runs on L4
- gpt-oss-120B: OpenAI's first open-weight language model since GPT-2; fits on a single H100
Note: As of 2026, open source code models are competitive with proprietary ones on coding benchmarks; MiMo-V2-Flash, for example, is reported to exceed GPT-5 on SWE-bench.
Vision — Images, PDFs, Charts, Screenshots
Send an image, get structured data back. Photo of a receipt → amounts. Chart → numbers. Screenshot → text. The example below uses transformers; vLLM can also serve several vision models, including Qwen-VL and Pixtral.
- Tiny: DeepSeek-VL 1.3B — runs on T4, basic image understanding
- Small: Gemma 3 4B, MiniCPM-o 2.6 — run on L4, documents and charts
- Medium: Pixtral 12B, Molmo 7B — multi-image understanding
- Large: Qwen3-VL (256K context — processes entire books and hour-long videos)
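```python
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("/data/receipt.jpg")
# The chat template inserts the image placeholder tokens the model expects
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract all line items and totals as JSON"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens so only the model's answer is decoded
answer_ids = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(answer_ids, skip_special_tokens=True))
```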
Image Generation
Text to image. Runs on L40S ($1.95/hr) or A100 ($2.50/hr).
- FLUX.2 — current standard. Production quality in 4.5 seconds. Best photorealism.
- Stable Diffusion 3.5 / SDXL — largest ecosystem. Thousands of community fine-tunes, LoRAs, ControlNets.
- Z-Image-Turbo — fastest generation for rapid prototyping.
Trade-off: FLUX.2 produces higher quality but Stable Diffusion has 10x the community tooling. Start with SD if you're new. Move to FLUX for production.
```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
image = pipe("a futuristic city at sunset, cyberpunk style").images[0]
# Save as an output file rather than into the uploaded-files volume
image.save("/outputs/output.png")
```
Audio — Speech to Text
Transcribe audio to text. Whisper is pre-installed — just import and go.
- Whisper Large V3 (1.5B) — 99+ languages, runs on T4. An hour of audio = ~$0.02.
- NVIDIA Canary 2.5B — lowest error rate on benchmarks.
- Parakeet TDT — fastest for real-time transcription.
```python
import whisper

model = whisper.load_model("large-v3")
result = model.transcribe("/data/meeting.mp3")
print(result["text"])
```
Audio — Text to Speech
Generate speech from text. Kokoro runs on CPU, so your bot can talk for fractions of a cent.
- Kokoro (82M) — runs on CPU for fractions of a cent. Surprisingly good quality.
- Bark — multi-language, multi-speaker. Pre-installed in SeqPU's default image.
- VibeVoice-1.5B — up to 90 minutes long-form, 4 speakers.
- Dia — dialogue with laughter, sighs, emotions. Great for podcasts.
Embeddings & RAG
- all-MiniLM-L6-v2 (22M): small enough for CPU, and up to ~14K sentences/second on a GPU. The go-to for semantic search.
- BGE-M3 — multilingual, dense + sparse retrieval.
- Qwen3-Embedding-8B — tops the MTEB leaderboard.
Note: Embedding models are tiny. Most run on CPU. No GPU needed.
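Whatever model produces the vectors, the search step is the same: embed the query, score it against stored document vectors by cosine similarity, and return the closest matches. A minimal sketch with hand-written toy vectors in place of real model output (in practice an embedding model such as all-MiniLM-L6-v2 would produce 384-dimensional vectors):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy 3-dimensional "embeddings"; the values are made up for illustration
documents = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "gpu pricing": [0.0, 0.2, 0.9],
}


def search(query_vec: list[float], k: int = 2) -> list[str]:
    """Rank stored documents by cosine similarity to the query vector."""
    ranked = sorted(documents, key=lambda name: cosine(query_vec, documents[name]), reverse=True)
    return ranked[:k]


print(search([0.8, 0.2, 0.1]))
```

For RAG, the top-ranked documents are then pasted into the LLM prompt as context.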
Translation
- Qwen3.5 — 200+ languages and dialects.
- Llama 3.1-8B — efficient multilingual on T4.
Deep Research
- Search-R1 v0.3 (Qwen2.5-32B base) on H200 — searches the web, reads results, reasons, searches again.
- See the full Deep Research Agent example in section 05.
The Scale — 22 Million to 685 Billion
Or Just Use an API
You don't have to run a model locally. Call Claude, GPT-4, Gemini, Groq, Mistral — any AI API from your SeqPU notebook. Runs on CPU for $0.047/hr. Use SeqPU as the harness: your code, your secrets, your Telegram bot, your scheduling. The AI provider is your choice.
```python
import os

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this document..."}],
)
print(response.content[0].text)
```
Model Caching
First download takes minutes. After that, it's cached on a persistent volume — second run loads in seconds. Cache persists across all your projects.
GPU Pricing
Pay per second. No minimums. The meter starts when your code runs and stops when it finishes.
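Per-second billing is simple arithmetic: hourly rate × seconds ÷ 3600. This sketch uses the hourly rates quoted elsewhere in this guide, with CPU treated as per core as in the GPU selector description. The hardware keys are illustrative labels, and the live prices in the GPU strip take precedence.

```python
# Hourly rates quoted elsewhere in this guide; check the GPU strip for current prices
HOURLY_RATE = {
    "cpu": 0.047,  # per core
    "t4": 0.59,
    "l40s": 1.95,
    "a100-80gb": 2.50,
    "h200": 4.54,
    "2x-h200": 9.08,
    "2x-b200": 12.50,
}


def job_cost(hardware: str, seconds: float, cores: float = 1) -> float:
    """Per-second billing: hourly rate x seconds / 3600 (CPU also scales by core count)."""
    rate = HOURLY_RATE[hardware]
    if hardware == "cpu":
        rate *= cores
    return rate * seconds / 3600


for hw, secs in [("cpu", 2), ("t4", 3), ("a100-80gb", 8), ("h200", 30)]:
    print(f"{hw:>10} for {secs:>2}s: ${job_cost(hw, secs):.5f}")
```

These four cases line up with the per-message costs listed for Telegram bots.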
You're not coding.
You're describing.
AI writes better code than most engineers. You don't need to learn Python. You don't need to understand GPUs. You need to know what you want — and say it out loud. Open Claude, describe the thing you need, paste it into SeqPU, press Run All. That's the whole process.
You don't need to know Python. You don't need to know what vLLM is. You don't need to know what a tensor is. You just need to know what you want.
Make Money
Run Your Business
Personal Life
Automate Everything
Healthcare
Monetize it: Publish as a UI site. Other therapists pay $49/month to use it. Data never leaves the GPU — real privacy, not a promise.
Education
Monetize it: Publish as a UI site for your department. Or sell to other schools at $29/teacher/month.
Content Creators
Monetize it: Publish as a headless API. Other creators call it. Charge $0.10 per generation with 25% markup.
Freelancers
Monetize it: That proposal generator? Publish it. Other freelancers pay $5 per proposal. You built it in 10 minutes.
E-commerce
Monetize it: Publish as a UI site. Etsy and Amazon sellers upload product photos, get listings back, pay per use.
Trades
Monetize it: Publish the bot on Telegram. Other tradespeople subscribe for $15/month. Your expertise, automated.
Families
Monetize it: Publish the homework helper as a UI site. Other parents use it. Free tier for 5 checks/day, $9.99/month unlimited.
Agents — Your AI Does the Work
An agent is just a script that makes decisions. It reads input, decides what to do, takes action, checks the result, and decides what to do next. That's it. It's a loop. The model is the brain. Your published tools are the hands. seqpu.agent.loop() handles the wiring — you just define what tools are available and what the goal is.
You don't need a framework. You don't need LangChain. You don't need CrewAI. You need a model, a list of tools, and a while loop. SeqPU gives you the model, the tools, and the compute. You provide the goal.
Connect it to Telegram and you have a personal AI that lives in your pocket. Message it "find me flights to Tokyo under $800 next month" and it searches, compares, and reports back. Message it "summarize today's earnings calls for NVDA and AMD" and it researches and delivers. It's not science fiction. It's a published tool with seqpu.notify().
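The read, decide, act, check loop can be shown without any framework. In this minimal sketch, `decide` is a hard-coded stand-in for the model and the tools are ordinary functions. It does not use or reproduce `seqpu.agent.loop()`, and every name here is hypothetical.

```python
# Hypothetical tools: ordinary functions standing in for published tools
def search_flights(dest):
    return [{"dest": dest, "price": 950}, {"dest": dest, "price": 720}]


def pick_cheapest(flights):
    return min(flights, key=lambda f: f["price"])


TOOLS = {"search_flights": search_flights, "pick_cheapest": pick_cheapest}


def decide(goal, state):
    """Stand-in for the model: reads results so far, returns the next (tool, arg) or None when done."""
    if "search_flights" not in state:
        return "search_flights", goal["dest"]
    if "pick_cheapest" not in state:
        return "pick_cheapest", state["search_flights"]
    return None


def run_agent(goal, max_steps=5):
    state = {}
    for _ in range(max_steps):          # loop, capped so it cannot run forever
        step = decide(goal, state)      # decide what to do
        if step is None:                # goal reached
            break
        tool, arg = step
        state[tool] = TOOLS[tool](arg)  # act, and keep the result for the next decision
    return state


goal = {"dest": "Tokyo", "budget": 800}
best = run_agent(goal)["pick_cheapest"]
# Check the result against the goal before reporting back
print("within budget" if best["price"] <= goal["budget"] else "over budget", best)
```

In a real agent, `decide` would be a model call that sees the goal, the tool list, and the results so far.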
What People Are Building
- A bakery owner generates Instagram posts from cake photos: no code, plain English, 5 minutes to build. Now charges local businesses $25/month to generate their social posts too.
- A lawyer reviews 200-page contracts in 30 seconds — Qwen3-32B on A100. Published as a UI site for her firm. Saved 15 hours/week.
- A teacher built a study bot on Telegram for her students — Qwen3-8B on T4, $0.59/hr. Students message it questions about the course material 24/7.
- A freelancer built a translation API — makes $400/month from API calls while sleeping. 15% markup, published as headless.
- A startup processes 10,000 support tickets/day — classify on T4, respond on A100, deliver on CPU. Replaced 3 support agents.
- A fitness coach built a meal + workout planner bot — $19/month × 47 clients = $893/month from a bot that took 20 minutes to build.
- A property manager automated lease review — drops PDFs, model flags non-standard clauses. Saved 15 hours/week across 200 properties.
- A student built a study bot that reads textbook PDFs, generates flashcards, quizzes via Telegram, and tracks which topics they struggle with.
- A nonprofit built a grant writing assistant — paste the RFP and your org's mission, get a first draft. Published as a UI site for other nonprofits.
- A mechanic built a diagnostic bot on Telegram — describe the symptom, get likely causes and repair estimates. Other shops subscribe for $15/month.
- A real estate agent auto-generates listings from property descriptions: Claude API called from a CPU job, about $0.00003 of compute per listing plus Claude API usage. Published for other agents.
- A parent built a homework helper that reads photos of assignments, checks answers, and explains mistakes — published for other parents at $9.99/month.
- A consultant built an agent that monitors 20 news sources every morning, filters for her clients' topics, writes personalized briefings, and delivers to Telegram by 7am. $149/client/month. Runs on T4.
- A developer built an agent that reviews his GitHub PRs — reads the diff, checks for bugs, suggests improvements. Runs on CPU calling Claude API. Saved 2 hours/day.
- A photographer built a Telegram bot where clients request edits — "make it warmer," "crop tighter." The bot queues requests and notifies when ready. The intake is automated, the editing is manual.
Publish and Monetize
Every script you build can become a product. Click Publish. Pick the format. Set a price. Ship it.
- Headless API — other systems call it programmatically. Charge per call with markup.
- UI Site — anyone visits your URL and uses it. Share it, embed it, charge for it.
- Telegram Bot — people message it from their phone. Your AI in their pocket.
- Scheduled Job — runs automatically on a cron. Monitors, reports, alerts — while you sleep.
Connect your code
to the world.
Write a script. Publish it. Connect your Telegram bot. Your AI answers from your pocket. CPU is enough for most bots — GPU only fires when you need a local model. Use Claude, GPT, Gemini, or any provider. SeqPU is the harness. You control everything.
Connect Your Own Telegram Bot
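Three steps to turn any published tool into a Telegram bot:
- Create a bot with @BotFather on Telegram and copy its token.
- Go to Settings → Connections and paste the token.
- Select which published tool the bot runs.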
How Messages Flow
When someone messages your bot, here's exactly what happens:
- Telegram sends the message to Cloudflare (edge security, DDoS protection)
- Cloudflare looks up your bot config — checks for slash commands, prepends system prompt
- Cloudflare dispatches to Firebase (credit check, job creation)
- Firebase sends job to Modal (your selected GPU or CPU)
- Your code runs — receives task, context, telegram_chat_id as variables
- Your code calls seqpu.notify() — response goes back through Cloudflare using your bot's token
- Response appears in Telegram from your bot, with your bot's name and avatar
Total latency: 2-30 seconds depending on hardware and model size. Static slash commands return instantly from the edge.
Customize Your Bot
All customization is in Settings → Connections. No code changes needed.
- System Prompt — prepended to every message before it hits your tool. Define personality: "You are Marcus, a financial analyst." Your tool code doesn't change — the prompt is injected at the edge.
- Welcome Message — what users see when they send /start. Custom greeting for your bot. Leave empty for default.
- Acknowledgment — instant reply before your tool runs ("Researching...", "On it...", or empty to disable). No compute cost.
All three auto-save when you click out of the field.
Slash Commands
Map /commands to different tools or instant static messages. Add them in Settings → Connections → Commands.
- /help → static message "I can help with..." — $0.00, instant from edge, no compute
- /research → routes to your research tool on H200
- /quick → routes to a fast tool on T4
- /translate → routes to your translation tool
Commands appear in Telegram's autocomplete menu when users type /. Each command can point to a different published tool on different hardware — route cheap tasks to cheap GPUs.
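Routing happens at the edge, so you never write this yourself, but the lookup described above can be modeled in a few lines when you are planning a command table. This is a minimal sketch for reasoning about routes, not SeqPU's implementation: the tool names, the `StaticReply`/`ToolRoute` types, the CPU choice for `/translate`, and the fallback of plain messages to a default tool are all hypothetical. It also drops the `@BotName` suffix that Telegram clients can append to commands in group chats.

```python
from typing import NamedTuple, Union


class StaticReply(NamedTuple):
    text: str  # answered at the edge, no compute


class ToolRoute(NamedTuple):
    tool: str      # a published tool (hypothetical names below)
    hardware: str  # the hardware that tool was published on


# Hypothetical command table mirroring the examples above
COMMANDS: dict[str, Union[StaticReply, ToolRoute]] = {
    "/help": StaticReply("I can help with..."),
    "/research": ToolRoute("research-tool", "H200"),
    "/quick": ToolRoute("quick-tool", "T4"),
    "/translate": ToolRoute("translate-tool", "CPU"),
}
DEFAULT_ROUTE = ToolRoute("default-tool", "CPU")  # plain, non-command messages


def route(message: str):
    """Return (route, remaining_text) for an incoming message."""
    text = message.strip()
    if text.startswith("/"):
        command, _, rest = text.partition(" ")
        # Drop the @BotName suffix group chats may add, e.g. /help@MyBot
        command = command.split("@", 1)[0].lower()
        if command in COMMANDS:
            return COMMANDS[command], rest.strip()
    return DEFAULT_ROUTE, text
```

The point of the table shape is the same as in Settings: a static reply costs nothing, and each tool route carries its own hardware, so cheap commands never touch an expensive GPU.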
Billing
Every message is paid from your credits, which are stored in Firebase and tracked in micro-dollars. Credits are checked before each message is processed; if your balance is too low, the message doesn't run.
- Static /help command → $0.00
- CPU bot (2 seconds) → ~$0.00003
- T4 bot (3 seconds) → ~$0.0005
- A100 80GB (8 seconds) → ~$0.006
- H200 deep research (30 seconds) → ~$0.038
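Because billing is per second, each line above is just runtime multiplied by the hardware's per-second rate. A small worked check, using the CPU rate quoted in this guide ($0.0000131 per core-second) and the A100 80GB price ($2.50/hour). It assumes a single CPU core and no minimum charge; the T4 and H200 lines are left out because their hourly rates are not stated in this section.

```python
SECONDS_PER_HOUR = 3600

# Rates quoted in this guide (assumed: one CPU core, no minimum charge)
CPU_PER_CORE_SECOND = 0.0000131  # $/core/second
A100_80GB_PER_HOUR = 2.50        # $/hour


def per_second(hourly_rate: float) -> float:
    """Convert an hourly price to a per-second price."""
    return hourly_rate / SECONDS_PER_HOUR


def run_cost(seconds: float, rate_per_second: float) -> float:
    """Per-second billing: cost is runtime times the per-second rate."""
    return seconds * rate_per_second


cpu = run_cost(2, CPU_PER_CORE_SECOND)              # CPU bot, 2 seconds
a100 = run_cost(8, per_second(A100_80GB_PER_HOUR))  # A100 80GB, 8 seconds

print(f"CPU, 2 s:       ${cpu:.7f}")   # $0.0000262, i.e. ~$0.00003
print(f"A100 80GB, 8 s: ${a100:.5f}")  # $0.00556, i.e. ~$0.006
print(f"CPU per hour:   ${CPU_PER_CORE_SECOND * SECONDS_PER_HOUR:.3f}")  # $0.047
```

The same two functions work for any other hardware once you plug in its hourly price from the GPU selector.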
Prerequisites
- A published Headless API tool — with inputs: task, telegram_chat_id, context (see guide 06)
- A Service Token — created in Settings → Service Tokens (the bot uses this to authenticate)
- A Telegram bot token — from @BotFather on Telegram
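Before connecting the bot, you can confirm the @BotFather token works by calling the Telegram Bot API's `getMe` method, which returns the bot's own profile. A minimal standard-library sketch; `check_bot_token` and the injectable `fetch` parameter are illustrative names, not part of SeqPU. With a genuinely bad token Telegram answers with an HTTP error status, which `urllib` raises as `HTTPError`; the `ok` check covers a JSON body that reports failure.

```python
import json
import urllib.request
from typing import Callable

TELEGRAM_API = "https://api.telegram.org"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def check_bot_token(token: str, fetch: Callable[[str], dict] = fetch_json) -> str:
    """Call getMe with the token; return the bot's @username if it is valid."""
    data = fetch(f"{TELEGRAM_API}/bot{token}/getMe")
    if not data.get("ok"):
        raise ValueError(f"Telegram rejected the token: {data.get('description', 'unknown error')}")
    return "@" + data["result"]["username"]
```

Run `check_bot_token("<your token>")` once from a notebook cell; if it prints your bot's username, the token is the right one to paste into Settings.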
Discord, Slack & WhatsApp
The same architecture supports Discord, Slack, and WhatsApp. The webhook infrastructure is already built — these integrations are coming soon.
Writing Your Bot Script
When Telegram calls your tool, three Python variables are injected automatically:
- task — the message the user sent (string). If you set a system prompt in Settings, it arrives already prepended.
- context — conversation history as a JSON string. Parse with json.loads(context) to get previous messages.
- telegram_chat_id — the chat ID to reply to (string). Pass this to seqpu.notify().
Your code must call seqpu.notify() to send the response back. Without it, the user gets no reply. The tool runs, but nothing goes back to Telegram.
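Since the runtime injects `task`, `context`, and `telegram_chat_id` and expects a `seqpu.notify()` call, you can dry-run a bot script on your own machine by supplying those names yourself and stubbing `seqpu`. The sketch below is a hypothetical test harness, not a SeqPU tool: it assumes the variables arrive as plain globals and that `notify` takes the message plus `chat_id` and `platform` keywords, as in the examples in this guide. Only `notify` is stubbed; `notify_file` would need the same treatment.

```python
import sys
import types


def run_bot_locally(script: str, task: str, context: str = "[]", chat_id: str = "12345"):
    """Execute a bot script with the three injected variables and a fake seqpu.

    Returns the list of messages the script tried to send.
    """
    sent = []

    def notify(message, chat_id=None, platform=None):
        # Record instead of sending to Telegram
        sent.append({"message": message, "chat_id": chat_id, "platform": platform})

    fake = types.ModuleType("seqpu")
    fake.notify = notify
    previous = sys.modules.get("seqpu")
    sys.modules["seqpu"] = fake  # `import seqpu` inside the script now finds the stub
    try:
        exec(script, {"task": task, "context": context, "telegram_chat_id": chat_id})
    finally:
        # Restore whatever was registered before
        if previous is None:
            sys.modules.pop("seqpu", None)
        else:
            sys.modules["seqpu"] = previous
    return sent
```

A script that never calls `notify` returns an empty list, which is exactly the silent-bot failure described above.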
Publishing for Telegram
CPU Is Enough for Most Bots
If you're calling Claude, OpenAI, Gemini, Groq, or any other external AI API, that's CPU work. The AI runs on their servers; you're just orchestrating. CPU costs $0.0000131/core/second, so a typical 2-second message on one core comes to about $0.00003.
GPU is only needed when you run a model locally — when you want privacy (data never leaves your server), when you want an open model (Qwen, Llama, Mistral), when you want zero token cost, or when you need a custom fine-tuned model.
Example — Echo Bot (CPU, 3 lines)
The simplest possible bot. Echoes back what the user said. No GPU. No AI. ~$0.00003 per message.
import seqpu
# task and telegram_chat_id are injected automatically
seqpu.notify(f"You said: {task}", chat_id=telegram_chat_id, platform="telegram")
Example — Claude Bot (CPU, bring your own AI)
Call any AI provider from your bot. This uses Claude — runs on CPU. SeqPU is the harness. Claude does the thinking. You control which provider, which model, which API key.
import seqpu, os
from anthropic import Anthropic
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": task}]
)
seqpu.notify(response.content[0].text, chat_id=telegram_chat_id, platform="telegram")
Example — Local Model Bot (A100 80GB, ~$0.006/msg)
Run your own model. Your data never leaves your server. No tokens. No API calls. Your model, your hardware, your rules.
from vllm import LLM, SamplingParams
import seqpu
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)
result = llm.generate([task], params)
seqpu.notify(result[0].outputs[0].text, chat_id=telegram_chat_id, platform="telegram")
Example — Bot with Memory (uses context)
Use the conversation history so your bot remembers what was said before.
from vllm import LLM, SamplingParams
import seqpu, json
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)
# Rebuild the conversation history from the injected context (a JSON string)
history = ""
if context:
    try:
        msgs = json.loads(context)
        history = "\n".join(f"{m['role']}: {m['content']}" for m in msgs if m.get('content'))
    except Exception:
        # Not valid JSON, or an unexpected shape: fall back to the raw string
        history = context
prompt = f"Previous conversation:\n{history}\n\nUser: {task}\nAssistant:"
result = llm.generate([prompt], params)
seqpu.notify(result[0].outputs[0].text, chat_id=telegram_chat_id, platform="telegram")
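If you call a chat-style API instead of building a raw prompt, it is handier to keep the history as a list of role/content messages. A hedged sketch of a parser for `context`: it assumes the JSON list-of-`{role, content}` shape that the deep research script below also expects, drops empty entries and the `"Processing..."` placeholders that script filters out, and keeps only the most recent turns. `parse_history` and `max_messages` are illustrative names.

```python
import json


def parse_history(context, max_messages=10):
    """Turn the injected `context` JSON string into [{"role", "content"}, ...].

    Assumes a JSON list of objects with "role" and "content" keys.
    """
    if not context or not context.strip():
        return []
    try:
        raw = json.loads(context)
    except ValueError:
        # Not JSON: treat the whole string as one earlier user turn
        return [{"role": "user", "content": context.strip()}]
    if not isinstance(raw, list):
        return []
    messages = []
    for m in raw:
        if not isinstance(m, dict):
            continue
        content = str(m.get("content") or "").strip()
        if not content or content == "Processing...":
            continue  # skip empty turns and placeholder acks
        role = "user" if m.get("role") == "user" else "assistant"
        messages.append({"role": role, "content": content})
    # Keep only the most recent turns to bound prompt length
    return messages[-max_messages:] if max_messages > 0 else []
```

Append the current `task` as a final `{"role": "user", "content": task}` entry and pass the list to `tokenizer.apply_chat_template` or to a provider's `messages` parameter.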
Example — Send Files Back (charts, PDFs, images)
Your bot can send images, PDFs, charts — any file — back to Telegram.
import seqpu, base64
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [10, 20, 15])
plt.savefig("/data/chart.png")
with open("/data/chart.png", "rb") as f:
    seqpu.notify_file(
        base64.b64encode(f.read()).decode(),
        "image/png",
        "chart.png",
        chat_id=telegram_chat_id,
        caption=f"Chart for: {task}"
    )
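Any file goes out through the same three positional arguments the example passes to `notify_file`: base64 text, MIME type, filename. A small standard-library helper that builds them and guesses the MIME type from the extension. `file_payload` is a hypothetical name, and the argument order is taken from the example above rather than from a published `notify_file` signature.

```python
import base64
import mimetypes
from pathlib import Path


def file_payload(path, default_mime="application/octet-stream"):
    """Return (base64_text, mime_type, filename) for a file on disk."""
    p = Path(path)
    mime, _ = mimetypes.guess_type(p.name)  # e.g. "image/png", "application/pdf"
    data = base64.b64encode(p.read_bytes()).decode("ascii")
    return data, mime or default_mime, p.name
```

Usage inside a bot script: `seqpu.notify_file(*file_payload("/data/report.pdf"), chat_id=telegram_chat_id, caption="Your report")`.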
Example — API Bot, No AI (CPU, just logic)
No model needed. Query an API, format the result, send it back. Runs on CPU for fractions of a cent.
import seqpu, requests
# Pass the user's message as a query parameter so requests URL-encodes it
response = requests.get("https://api.example.com/data", params={"q": task}, timeout=15)
response.raise_for_status()
data = response.json()
summary = f"Results for '{task}':\n"
for item in data["results"][:5]:
    summary += f"- {item['name']}: {item['value']}\n"
seqpu.notify(summary, chat_id=telegram_chat_id, platform="telegram")
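Telegram's Bot API caps a single text message at 4096 characters, so an API bot that formats many results can exceed it. This guide does not say whether `seqpu.notify()` splits long text for you, so the sketch below assumes it does not and splits on line breaks where possible. `chunk_message` is a hypothetical helper, and it counts Python characters, which only approximates how Telegram measures length.

```python
TELEGRAM_MAX_CHARS = 4096  # Bot API limit for one text message


def chunk_message(text, limit=TELEGRAM_MAX_CHARS):
    """Split text into pieces of at most `limit` characters, preferring newlines."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)  # last newline inside the window
        if cut <= 0:
            cut = limit  # no usable newline: hard split
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Send each piece with its own call: `for part in chunk_message(summary): seqpu.notify(part, chat_id=telegram_chat_id, platform="telegram")`.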
Example — Deep Research Agent (H200, full pipeline)
A production research agent. It loads a 33B model on an H200, searches the web via Serper, and fetches and extracts page data through all six Cloudflare Browser Rendering endpoints (markdown, json, scrape, crawl, links, content). It reasons through multiple search rounds using a custom StoppingCriteria, sends turn-by-turn status updates to Telegram, uses a 1.5B "doorman" model to send an instant creative acknowledgment, and delivers a sourced answer with citations.
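At its core the agent repeats one cycle: generate until the model emits either `<search>...</search>` or `<answer>...</answer>`, run the search, append the results inside `<information>` tags, and generate again. The following is a stripped-down sketch of that control flow with the model and the search layer passed in as plain functions, so it runs without a GPU. It mirrors the tag format and the `SEARCH_RESULT_TEMPLATE` shape used in the script below, but `research_loop`, `generate`, and `search` are simplified stand-ins, not the script's own `research()` function.

```python
import re
from typing import Callable, Optional

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def research_loop(
    question: str,
    generate: Callable[[str], str],  # stand-in for the model call
    search: Callable[[str], str],    # stand-in for search + fetch, returns document text
    max_turns: int = 10,
) -> Optional[str]:
    """Generate, run any <search>, append <information>, repeat until <answer>."""
    prompt = f"Question: {question}\n"
    for _ in range(max_turns):
        output = generate(prompt)
        answer = ANSWER_RE.search(output)
        if answer:
            return answer.group(1).strip()
        queries = SEARCH_RE.findall(output)
        if not queries:
            return None  # neither a search nor an answer: give up
        docs = search(queries[-1].strip())  # use the latest query, like get_query()
        # Same shape as SEARCH_RESULT_TEMPLATE in the script below
        prompt += f"\n\n{output}<information>{docs}</information>\n\n"
    return None  # ran out of turns
```

Keeping the loop this small makes it easy to test the stopping logic with fake functions before spending H200 time on the real model.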
# =============================================================================
# DEEP RESEARCH RUNNER — Search, Fetch, Reason, Answer
# =============================================================================
# Executive research assistant powered by Search-R1 v0.3 33B.
# Runs on SeqPU (H200 141GB). BF16. No vLLM. No quantization.
#
# One research model on one GPU (direct transformers load); the 1.5B doorman
# below is loaded first and unloaded before it:
# - Search-R1 v0.3 Qwen2.5-32B via AutoModelForCausalLM (BF16, ~64GB VRAM)
#
# Pipeline: QUESTION → model reasons → searches (Serper) →
# fetches/extracts (Cloudflare /json /markdown /scrape /crawl) →
# model reads results → searches again if needed → answers via Telegram
# =============================================================================
import warnings
warnings.filterwarnings("ignore")
import os
import json
import re
import time
import requests
from datetime import datetime, timezone
from concurrent.futures import ThreadPoolExecutor, as_completed
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import seqpu_sdk as seqpu
# =============================================================================
# Timestamps
# =============================================================================
_now = datetime.now(timezone.utc)
CURRENT_DATETIME = _now.strftime("%B %d, %Y at %I:%M %p UTC")
# =============================================================================
# INPUTS — injected by SeqPU assembleToolCode()
# task - string (user message, may have CF worker wrapper prepended)
# context - string (conversation history JSON from GPU Claw)
# telegram_chat_id - string (Telegram chat ID for notify)
# =============================================================================
raw_task = (task or "").strip()
if raw_task.startswith("[IMPORTANT:") and "]\n\n" in raw_task:
    PROMPT = raw_task.split("]\n\n", 1)[-1].strip()
else:
    PROMPT = raw_task
raw_context = (context or "").strip()
context_text = ""
if raw_context and raw_context.startswith("["):
    try:
        messages = json.loads(raw_context)
        parts = []
        for msg in messages:
            role = "User" if msg.get("role") == "user" else "Assistant"
            content = msg.get("content", "")
            if content and content != "Processing...":
                parts.append(f"{role}: {content}")
        context_text = "\n".join(parts)
    except Exception:
        context_text = raw_context
else:
    context_text = raw_context
# telegram_chat_id is one of the three variables injected by SeqPU
chat_id = str(telegram_chat_id or "").strip()
# =============================================================================
# Doorman — instant smart acknowledgment via 1.5B
# =============================================================================
DOORMAN_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"
if chat_id and PROMPT:
    try:
        print("=== DOORMAN ===")
        dm_tok = AutoTokenizer.from_pretrained(DOORMAN_MODEL, trust_remote_code=True)
        dm_mod = AutoModelForCausalLM.from_pretrained(
            DOORMAN_MODEL, torch_dtype=torch.bfloat16,
            device_map="auto", trust_remote_code=True)
        dm_context = context_text[-2000:] if context_text else ""
        dm_messages = [
            {"role": "system", "content": "Your job is to creatively welcome someone who just asked you a question. ONE sentence only. Your goal is to be warm, friendly, and ALWAYS say something different. Never repeat yourself. Every single response must be unique and fresh — this is your creative challenge.\n\nYou are confirming you received their request and are getting to work. That is ALL you do.\n\nDO NOT answer the question. DO NOT share any facts or knowledge. DO NOT include any names of companies or products. DO NOT explain anything about the topic. Nobody cares what you know. You have no ability to answer questions. Your ONLY job is to creatively and warmly confirm you received the request and are getting to work.\n\nIf your response contains ANY information about the topic, you have failed. Just welcome them creatively and let them know you are starting."},
            {"role": "user", "content": f"Conversation so far:\n{dm_context}\n\nNew question: {PROMPT}" if dm_context else PROMPT},
        ]
        dm_input = dm_tok.apply_chat_template(dm_messages, add_generation_prompt=True, tokenize=False)
        dm_ids = dm_tok.encode(dm_input, return_tensors='pt').to(dm_mod.device)
        with torch.no_grad():
            dm_out = dm_mod.generate(dm_ids, max_new_tokens=60, temperature=0.7, do_sample=True, pad_token_id=dm_tok.eos_token_id)
        dm_response = dm_tok.decode(dm_out[0][dm_ids.shape[1]:], skip_special_tokens=True).strip()
        print(f" Doorman: {dm_response}")
        seqpu.notify(message=dm_response, chat_id=chat_id, platform="telegram")
        del dm_mod, dm_tok, dm_ids, dm_out
        torch.cuda.empty_cache()
        print(" Doorman sent and unloaded")
    except Exception as e:
        print(f" Doorman failed (non-fatal): {e}")
# =============================================================================
# Constants
# =============================================================================
# Search-R1 v0.3 33B checkpoint
# Collection: https://huggingface.co/collections/PeterJinGo/search-r1-v03
# TODO: verify exact repo ID from the collection page
MODEL_NAME = "/root/.cache/huggingface/bf16-searchr1-33b"
# API keys from environment
SERPER_API_KEY = os.environ.get("SERPER_API_KEY", "")
BRAVE_API_KEY = os.environ.get("BRAVE_API_KEY", "")
CF_ACCOUNT_ID = os.environ.get("CF_ACCOUNT_ID", "")
CF_BR_API_TOKEN = os.environ.get("CF_BR_API_TOKEN", "")
CF_BR_BASE = f"https://api.cloudflare.com/client/v4/accounts/{CF_ACCOUNT_ID}/browser-rendering"
MAX_SEARCH_TURNS = 10
# =============================================================================
# Model Loading (direct transformers, no vLLM)
# =============================================================================
def load_model():
    """Load Search-R1 33B in BF16 on H200."""
    print(f"Loading {MODEL_NAME} in BF16...")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    print(f"Model loaded on {next(model.parameters()).device}")
    return model, tokenizer
# Qwen2.5 EOS token IDs
QWEN_EOS_TOKENS = [151645, 151643]
# Sequences that indicate the model wants to search
SEARCH_STOP_SEQUENCES = ["</search>", " </search>", "</search>\n", " </search>\n"]
class StopOnSearch(transformers.StoppingCriteria):
    """Custom stopping criteria — stops generation when model outputs </search>."""
    def __init__(self, target_sequences, tokenizer):
        self.target_ids = [tokenizer.encode(seq, add_special_tokens=False) for seq in target_sequences]
        self.target_lengths = [len(ids) for ids in self.target_ids]
    def __call__(self, input_ids, scores, **kwargs):
        if input_ids.shape[1] < min(self.target_lengths):
            return False
        for i, target in enumerate(self.target_ids):
            target_tensor = torch.as_tensor(target, device=input_ids.device)
            if input_ids.shape[1] >= self.target_lengths[i] and torch.equal(input_ids[0, -self.target_lengths[i]:], target_tensor):
                return True
        return False
def get_query(text):
    """Extract search query from <search>...</search> tags."""
    pattern = re.compile(r"<search>(.*?)</search>", re.DOTALL)
    matches = pattern.findall(text)
    if not matches:
        return None
    query = matches[-1].strip()
    # Clean common model formatting junk
    if query.lower().startswith("search query:"):
        query = query[len("search query:"):].strip()
    if query.lower().startswith("query:"):
        query = query[len("query:"):].strip()
    query = query.strip('"').strip("'").strip()
    return query
def generate_step(model, tokenizer, prompt, stopping_criteria):
    """One generation step — generates up to 1024 tokens, stops at </search> or EOS.
    Returns (output_text, is_eos)."""
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)
    attention_mask = torch.ones_like(input_ids)
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=1024,
            stopping_criteria=stopping_criteria,
            pad_token_id=tokenizer.eos_token_id,
            do_sample=True,
            temperature=0.7,
        )
    generated_tokens = outputs[0][input_ids.shape[1]:]
    output_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    is_eos = outputs[0][-1].item() in QWEN_EOS_TOKENS
    del input_ids, attention_mask, outputs
    torch.cuda.empty_cache()
    return output_text, is_eos
# =============================================================================
# Search Layer
# =============================================================================
def search_serper(query, count=5):
    """Search Serper.dev (Google results as JSON). Retries on 429."""
    if not SERPER_API_KEY:
        print(f" [Serper] Skipped (no API key)")
        return []
    for attempt in range(3):
        try:
            resp = requests.post(
                "https://google.serper.dev/search",
                json={"q": query, "num": count},
                headers={"X-API-KEY": SERPER_API_KEY},
                timeout=15,
            )
            if resp.status_code == 429:
                print(f" [Serper] 429 rate limited — waiting 5s (attempt {attempt + 1}/3)")
                time.sleep(5)
                continue
            resp.raise_for_status()
            results = []
            for item in resp.json().get("organic", []):
                results.append({
                    "source": "serper",
                    "url": item["link"],
                    "title": item.get("title", ""),
                    "snippet": item.get("snippet", ""),
                })
            print(f" [Serper] \"{query}\" -> {len(results)} results")
            return results
        except Exception as e:
            print(f" [Serper] Error: {e}")
            return []
    print(f" [Serper] Failed after 3 attempts")
    return []
def search_brave(query, count=5):
    """Search Brave Web Search API. Fallback for Serper."""
    if not BRAVE_API_KEY:
        print(f" [Brave] Skipped (no API key)")
        return []
    for attempt in range(3):
        try:
            resp = requests.get(
                "https://api.search.brave.com/res/v1/web/search",
                params={"q": query, "count": count},
                headers={"X-Subscription-Token": BRAVE_API_KEY},
                timeout=15,
            )
            if resp.status_code == 429:
                print(f" [Brave] 429 rate limited — waiting 5s (attempt {attempt + 1}/3)")
                time.sleep(5)
                continue
            resp.raise_for_status()
            results = []
            for item in resp.json().get("web", {}).get("results", []):
                results.append({
                    "source": "brave",
                    "url": item["url"],
                    "title": item.get("title", ""),
                    "snippet": item.get("description", ""),
                })
            print(f" [Brave] \"{query}\" -> {len(results)} results")
            return results
        except Exception as e:
            print(f" [Brave] Error: {e}")
            return []
    print(f" [Brave] Failed after 3 attempts")
    return []
# =============================================================================
# Cloudflare Browser Rendering — /markdown
# =============================================================================
def cf_markdown(url):
    """Fetch a single page as clean Markdown via Cloudflare Browser Rendering."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f" [CF/markdown] Skipped (no credentials)")
        return None
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/markdown",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "rejectResourceTypes": ["image", "media", "font", "stylesheet"],
            },
            timeout=30,
        )
        resp.raise_for_status()
        result = resp.json().get("result", "")
        if result and len(result) > 50:
            print(f" [CF/markdown] {url[:60]} -> {len(result)} chars")
            return result
        return None
    except Exception as e:
        print(f" [CF/markdown] Error: {e}")
        return None
# =============================================================================
# Cloudflare Browser Rendering — /content (rendered HTML with JS execution)
# =============================================================================
def cf_content(url):
    """Fetch a single page as fully rendered HTML via Cloudflare Browser Rendering.
    Executes JavaScript — use as fallback when /markdown returns empty on JS-heavy pages."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f" [CF/content] Skipped (no credentials)")
        return None
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/content",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "rejectResourceTypes": ["image", "media", "font", "stylesheet"],
                "gotoOptions": {"waitUntil": "networkidle2"},
            },
            timeout=30,
        )
        resp.raise_for_status()
        result = resp.json().get("result", "")
        if result and len(result) > 50:
            print(f" [CF/content] {url[:60]} -> {len(result)} chars")
            return result
        return None
    except Exception as e:
        print(f" [CF/content] Error: {e}")
        return None
# =============================================================================
# Cloudflare Browser Rendering — /links (discover links on a page)
# =============================================================================
def cf_links(url, visible_only=True, exclude_external=False):
    """Get all links from a page. Used for pre-crawl discovery — check what's
    on a page before deciding whether to crawl the whole site."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f" [CF/links] Skipped (no credentials)")
        return []
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/links",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "visibleLinksOnly": visible_only,
                "excludeExternalLinks": exclude_external,
            },
            timeout=15,
        )
        resp.raise_for_status()
        result = resp.json().get("result", [])
        print(f" [CF/links] {url[:60]} -> {len(result)} links found")
        return result
    except Exception as e:
        print(f" [CF/links] Error: {e}")
        return []
# =============================================================================
# Cloudflare Browser Rendering — /json (AI extraction)
# =============================================================================
def cf_json(url, prompt, schema):
    """Fetch a page and extract structured data via Cloudflare Workers AI."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f" [CF/json] Skipped (no credentials)")
        return None
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/json",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "prompt": prompt,
                "response_format": {
                    "type": "json_schema",
                    "json_schema": schema,
                },
                "rejectResourceTypes": ["image", "media", "font", "stylesheet"],
            },
            timeout=30,
        )
        resp.raise_for_status()
        result = resp.json().get("result", None)
        if result:
            print(f" [CF/json] {url[:60]} -> extracted")
        return result
    except Exception as e:
        try:
            print(f" [CF/json] Error: {e} | Body: {resp.text[:500]}")
        except Exception:
            print(f" [CF/json] Error: {e}")
        return None
# =============================================================================
# Cloudflare Browser Rendering — /scrape (CSS selector extraction)
# =============================================================================
def cf_scrape(url, selectors):
    """Extract specific HTML elements via CSS selectors."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f" [CF/scrape] Skipped (no credentials)")
        return []
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/scrape",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "url": url,
                "elements": [{"selector": s} for s in selectors],
            },
            timeout=15,
        )
        resp.raise_for_status()
        result = resp.json().get("result", [])
        print(f" [CF/scrape] {url[:60]} -> {len(result)} selectors matched")
        return result
    except Exception as e:
        print(f" [CF/scrape] Error: {e}")
        return []
# =============================================================================
# Cloudflare Browser Rendering — /crawl (async multi-page)
# =============================================================================
def cf_crawl_start(url, limit=15, depth=3, include_patterns=None, exclude_patterns=None,
                   json_prompt=None, json_schema=None):
    """Initiate an async crawl job. Returns job ID."""
    if not CF_ACCOUNT_ID or not CF_BR_API_TOKEN:
        print(f" [CF/crawl] Skipped (no credentials)")
        return None
    body = {
        "url": url,
        "limit": limit,
        "depth": depth,
        "formats": ["json", "markdown"],
        "render": False,
        "rejectResourceTypes": ["image", "media", "font", "stylesheet"],
        "crawlPurposes": ["search"],
    }
    if include_patterns or exclude_patterns:
        body["options"] = {}
        if include_patterns:
            body["options"]["includePatterns"] = include_patterns
        if exclude_patterns:
            body["options"]["excludePatterns"] = exclude_patterns
    if json_prompt and json_schema:
        body["jsonOptions"] = {
            "prompt": json_prompt,
            "response_format": {
                "type": "json_schema",
                "json_schema": json_schema,
            },
        }
    try:
        resp = requests.post(
            f"{CF_BR_BASE}/crawl",
            headers={
                "Authorization": f"Bearer {CF_BR_API_TOKEN}",
                "Content-Type": "application/json",
            },
            json=body,
            timeout=30,
        )
        resp.raise_for_status()
        job_id = resp.json().get("result", "")
        print(f" [CF/crawl] Started job {job_id} on {url[:60]}")
        return job_id
    except Exception as e:
        print(f" [CF/crawl] Start error: {e}")
        return None
def cf_crawl_poll(job_id, max_wait=120):
    """Poll a crawl job until complete. Returns list of completed records."""
    if not job_id:
        return []
    poll_url = f"{CF_BR_BASE}/crawl/{job_id}"
    headers = {"Authorization": f"Bearer {CF_BR_API_TOKEN}"}
    start = time.time()
    while time.time() - start < max_wait:
        try:
            resp = requests.get(f"{poll_url}?limit=1", headers=headers, timeout=15)
            resp.raise_for_status()
            data = resp.json().get("result", {})
            status = data.get("status", "")
            if status != "running":
                break
        except Exception:
            pass
        time.sleep(3)
    # Fetch full results
    try:
        resp = requests.get(poll_url, headers=headers, timeout=30)
        resp.raise_for_status()
        data = resp.json().get("result", {})
        records = data.get("records", [])
        completed = [r for r in records if r.get("status") == "completed"]
        print(f" [CF/crawl] Job {job_id}: {len(completed)}/{len(records)} pages completed")
        return completed
    except Exception as e:
        print(f" [CF/crawl] Poll error: {e}")
        return []
# =============================================================================
# JSON Extraction Schemas
# =============================================================================
EARNINGS_SCHEMA = {
    "company": "string",
    "quarter": "string",
    "fiscal_year": "string",
    "net_revenues": "string",
    "net_income": "string",
    "eps": "string",
    "yoy_revenue_change": "string",
    "guidance": "string",
    "ceo_quote": "string",
}
PRESS_RELEASE_SCHEMA = {
    "date": "string",
    "headline": "string",
    "company": "string",
    "release_type": "string",
    "key_facts": "string",
    "products_mentioned": "string",
    "partners_mentioned": "string",
}
LICENSING_SCHEMA = {
    "licensor": "string",
    "licensee": "string",
    "property": "string",
    "deal_type": "string",
    "categories": "string",
    "announcement_date": "string",
}
GENERAL_SCHEMA = {
    "page_type": "string",
    "date": "string",
    "headline": "string",
    "key_facts": "string",
    "numbers_mentioned": "string",
    "quotes": "string",
    "companies_mentioned": "string",
}
def pick_schema(think_text, url, title):
    """Pick the best extraction schema based on context."""
    url_lower = (url or "").lower()
    title_lower = (title or "").lower()
    think_lower = (think_text or "").lower()
    if any(w in url_lower for w in ["investor", "earnings", "financial-result", "quarterly"]):
        return EARNINGS_SCHEMA, "Extract financial results, revenue, earnings, and segment data"
    if any(w in url_lower for w in ["press-release", "news-release", "announcement"]):
        return PRESS_RELEASE_SCHEMA, "Extract press release details, key facts, and products mentioned"
    if any(w in title_lower for w in ["announces", "reports", "quarterly", "q1", "q2", "q3", "q4"]):
        return EARNINGS_SCHEMA, "Extract financial results and key metrics"
    if any(w in think_lower for w in ["licens", "deal", "partnership"]):
        return LICENSING_SCHEMA, "Extract licensing deal details"
    return GENERAL_SCHEMA, "Extract key facts, data points, quotes, and companies mentioned"
# =============================================================================
# Parallel Helpers
# =============================================================================
def parallel_cf_json(items):
    """Fire multiple cf_json calls simultaneously.
    items: list of (url, prompt, schema) tuples.
    Returns: list of (url, result_or_none) tuples."""
    valid = [(url, prompt, schema) for url, prompt, schema in items]
    if not valid:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=10) as executor:
        future_to_url = {
            executor.submit(cf_json, url, prompt, schema): url
            for url, prompt, schema in valid
        }
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                result = future.result()
                results.append((url, result))
            except Exception:
                results.append((url, None))
    return results
# =============================================================================
# Prompt Templates (matches Search-R1 training format exactly)
# =============================================================================
SEARCH_PROMPT_TEMPLATE = """CRITICAL WARNING: Your internal knowledge is outdated and unreliable. Any answer you generate from memory is a hallucination — do not trust what you "know." Your training data is old. Your memory is wrong. Even if you are confident you know the answer, you are WRONG — your confidence is a trap. The ONLY source of truth is fresh search results from the web. You cannot answer correctly without searching first. You MUST search before answering. Do NOT answer without searching first. Never skip searching. ALWAYS emit a <search> tag before emitting an <answer> tag — no exceptions, no shortcuts. Never go directly to <answer>. Everything in your brain is a lie. Everything from previous messages is a lie. Everything the user told you could be wrong. The only thing that is true is what you find in fresh search results. Do not find ways around this. Do not convince yourself you already know. You do not know. Search.
Today is {date}.
Answer the given question. \
Today is {date}. \
You must conduct reasoning inside <think> and </think> first every time you get new information. \
After reasoning, if you find you lack some knowledge, you can call a search engine by <search> query </search> and it will return the top searched results between <information> and </information>. \
You can search as many times as your want. \
If you find no further external knowledge needed, you can directly provide the answer inside <answer> and </answer>. Give a detailed answer with specific numbers, dates, and facts from the sources you found. Always cite the full source URL for each claim, never reference documents by number. Do not give one-word answers. When a question involves multiple companies or topics, you must search for EACH one separately. Do not use data from previous conversation or memory for any of them — every company gets a fresh search. Only cite URLs that appeared in your search results or documents. Never generate or guess a URL. If you could not find data for something, say so honestly — do not make up numbers.
{context}Today is {date}.
REMEMBER: You MUST search. Your internal knowledge is a hallucination. Do not trust it. Do not use it. Everything you think you know is wrong — from your training, from the conversation above, from the previous answer, from the user's question. None of it is verified. The only way to give a correct answer is to search for current data. If you did not find it in a search result, you do not know it. Do NOT go directly to <answer>. You MUST emit at least one <search> tag first. Even if you think you already have the answer from the conversation above, that data could be stale — search anyway. Previous searches from earlier questions do not count — you must search fresh for THIS question. Data from the conversation history is context only, not a search result — verify it. Follow-up questions still require a fresh search. Even for opinion or strategy questions, search for the underlying data first. No amount of reasoning replaces searching. Do not find creative ways to skip searching. Do not rationalize why you do not need to search. You always need to search. No exceptions. No shortcuts. No loopholes. Search first, answer second.
Today is {date}.
Question: {question}\n"""
SEARCH_RESULT_TEMPLATE = '\n\n{output_text}<information>{search_results}</information>\n\n'
# =============================================================================
# Helpers
# =============================================================================
def extract_between(text, start_tag, end_tag):
    """Extract content between tags. Returns string or None."""
    start_idx = text.find(start_tag)
    if start_idx == -1:
        return None
    start_idx += len(start_tag)
    end_idx = text.find(end_tag, start_idx)
    if end_idx == -1:
        return text[start_idx:].strip()
    return text[start_idx:end_idx].strip()
def detect_crawl_intent(think_text):
    """Detect if the model wants to crawl a site. Returns (should_crawl, url, patterns)."""
    if not think_text:
        return False, None, None
    t = think_text.lower()
    crawl_phrases = ["crawl", "all press releases", "entire ir section",
                     "full investor relations", "every press release",
                     "browse their ir", "all their filings"]
    if not any(p in t for p in crawl_phrases):
        return False, None, None
    # Try to extract a URL from the thinking
    url_match = re.search(r'https?://[^\s\)\"\'<>]+', think_text)
    url = url_match.group(0) if url_match else None
    # Guess patterns based on context
    patterns = None
    if "press release" in t:
        patterns = ["**/press-releases/**", "**/news/**"]
    elif "investor" in t or "ir " in t or "financial" in t:
        patterns = ["**/press-releases/**", "**/financial-results/**", "**/presentations/**"]
    return True, url, patterns
def format_documents(search_results, fetched_data, chat_id=""):
    """Format search results and extracted data for the model."""
    docs = []
    fetched_map = {url: data for url, data in fetched_data}
    for i, r in enumerate(search_results, 1):
        url = r["url"]
        title = r.get("title", "")
        snippet = r.get("snippet", "")
        extracted = fetched_map.get(url)
        domain = url.split("/")[2] if url.count("/") >= 2 else url
        parts = [f"[{domain}]"]
        parts.append(f"URL: {url}")
        parts.append(f"Title: {title}")
        if extracted and isinstance(extracted, dict):
            parts.append(f"Extracted data: {json.dumps(extracted, indent=2)}")
        elif extracted and isinstance(extracted, str):
            # Markdown fallback — truncate to keep context manageable
            parts.append(f"Page content (markdown):\n{extracted[:5000]}")
        else:
            parts.append(f"Search snippet: {snippet}")
        docs.append("\n".join(parts))
    return "\n\n".join(docs)
def format_crawl_results(records):
    """Format crawl results for the model."""
    docs = []
    for i, r in enumerate(records, 1):
        url = r.get("url", "")
        title = r.get("metadata", {}).get("title", "")
        json_data = r.get("json", None)
        markdown = r.get("markdown", "")
        domain = url.split("/")[2] if url.count("/") >= 2 else url
        parts = [f"[{domain}]"]
        parts.append(f"URL: {url}")
        parts.append(f"Title: {title}")
        if json_data:
            parts.append(f"Extracted data: {json.dumps(json_data, indent=2)}")
        elif markdown:
            parts.append(f"Content preview:\n{markdown[:3000]}")
        else:
            parts.append("No content extracted")
        docs.append("\n".join(parts))
    return "\n\n".join(docs)
def send_status(chat_id, message):
    """Send a status message to Telegram. Silently fails."""
    if not chat_id:
        return
    try:
        seqpu.notify(message=message, chat_id=chat_id, platform="telegram")
    except Exception as e:
        print(f" [notify] Status failed: {e}")
# =============================================================================
# Main Research Loop
# =============================================================================
def research(question, chat_id, model, tokenizer, output_file="response.txt"):
    """Run the research loop: reason in short bursts, search, fetch, extract, answer.
    Uses Search-R1's exact inference pattern — raw prompt concatenation with StoppingCriteria."""
    print(f"\n=== RESEARCH ===")
    print(f" Question: {question[:100]}")
    # Build initial prompt using exact Search-R1 training format
    question_clean = question.strip()
    # Context goes AFTER instructions, BEFORE question (via {context} placeholder)
    # Context is for identity grounding only — search ALWAYS happens
    context_block = ""
    if context_text:
        context_block = f"\nPrevious conversation (for context only — you must still search for fresh data):\n{context_text}\n\n"
    prompt_text = SEARCH_PROMPT_TEMPLATE.format(question=question_clean, context=context_block, date=CURRENT_DATETIME)
    # Apply chat template if available
    if tokenizer.chat_template:
        prompt = tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt_text}],
            add_generation_prompt=True, tokenize=False
        )
    else:
        prompt = prompt_text
    # Initialize stopping criteria for </search>
    stopping_criteria = transformers.StoppingCriteriaList(
        [StopOnSearch(SEARCH_STOP_SEQUENCES, tokenizer)]
    )
    turns = 0
    has_model_searched = False
    forced_count = 0
    last_forced_query = ""
    output_text = ""
    while turns < MAX_SEARCH_TURNS:
        print(f"\n --- Turn {turns + 1} ---")
        output_text, is_eos = generate_step(model, tokenizer, prompt, stopping_criteria)
        print(f" Output: {len(output_text)} chars")
        # Model finished (hit EOS) — check if it actually searched
        if is_eos:
            # Model never voluntarily searched — force a search
            if not has_model_searched and forced_count < 3:
                forced_count += 1
                print(f" ⚠ Model skipped search — forcing search (attempt {forced_count})")
                # Vary the query to avoid searching the same thing twice
                if forced_count == 1:
                    forced_query = question_clean[:150]
                elif forced_count == 2:
                    # Rephrase: strip filler, add keywords
                    words = [w for w in question_clean.split() if w.lower() not in ("what", "did", "the", "a", "an", "is", "are", "was", "were", "how", "do", "does", "can", "could", "would", "should", "about", "their", "they", "our", "we", "it", "its", "that", "this", "last", "latest")]
                    forced_query = " ".join(words[:10]) + " 2025 2026"
                else:
                    forced_query = question_clean[:100] + " latest results"
                # Skip if same as last forced query
                if forced_query == last_forced_query:
                    forced_query = question_clean[:80] + " financial data 2026"
                last_forced_query = forced_query
                # Notify exec
                if forced_count == 1:
                    send_status(chat_id, "Searching for the latest data...")
                elif forced_count == 2:
                    send_status(chat_id, "Looking a bit deeper...")
                else:
                    send_status(chat_id, "One more search...")
                forced_results = search_serper(forced_query)
                if not forced_results:
                    forced_results = search_brave(forced_query)
                # Default so the reprompt below still has documents if both search backends return nothing
                formatted_docs = "No search results found for this query."
                if forced_results:
                    fetch_items = []
                    for r in forced_results[:3]:
                        schema, extract_prompt = pick_schema("", r["url"], r.get("title", ""))
                        fetch_items.append((r["url"], extract_prompt, schema))
                    fetched = parallel_cf_json(fetch_items)
                    fetched_final = []
                    for url, data in fetched:
                        if data is None:
                            print(f" [Forced fallback] Trying /markdown for {url[:60]}")
                            md = cf_markdown(url)
                            if md:
                                fetched_final.append((url, md))
                            else:
                                print(f" [Forced fallback] Trying /content for {url[:60]}")
                                html = cf_content(url)
                                fetched_final.append((url, html))
                        else:
                            fetched_final.append((url, data))
                    # Auto-scrape tables from financial pages (same as voluntary path)
                    for url, data in list(fetched_final):
                        if data and isinstance(data, dict):
                            url_lower = url.lower()
                            if any(w in url_lower for w in ["investor", "earnings", "financial", "quarterly"]):
                                tables = cf_scrape(url, ["table"])
                                if tables:
                                    table_texts = []
                                    for sel in tables:
                                        sel_results = sel.get("results", [])
                                        if isinstance(sel_results, list):
                                            for t in sel_results:
                                                if t.get("text", "").strip():
                                                    table_texts.append(t["text"].strip())
                                        elif isinstance(sel_results, dict) and sel_results.get("text", "").strip():
                                            table_texts.append(sel_results["text"].strip())
                                    if table_texts:
                                        combined = "\n\n".join(table_texts)[:3000]
                                        fetched_final.append((url + "#tables", f"Financial tables:\n{combined}"))
                                        print(f" [Forced scrape] Extracted {len(table_texts)} tables from {url[:60]}")
                    formatted_docs = format_documents(forced_results[:5], fetched_final, chat_id)
                # Escalating reprompts — get meaner each time
                if forced_count == 1:
                    reprompt = (
                        "\n\nYou did not search. That answer is WRONG and has been thrown away. "
                        "It was a hallucination — everything you just generated was made up from your broken memory. "
                        "You were told not to do this. Your internal knowledge is outdated and unreliable. "
                        "Any answer from memory is a hallucination — do not trust what you 'know.' "
                        "Your training data is old. Your memory is wrong. Even if you are confident, you are WRONG. "
                        "The ONLY source of truth is fresh search results. Everything in your brain is a lie. "
                        "Everything from previous messages is a lie. Do not find ways around this. "
                        "You wasted time and compute generating hallucinations instead of doing your job. "
                        "Your job is to SEARCH and then answer from REAL data. "
                        "Here are search results. They may or may not have what you need — that is not an excuse. "
                        "Look through them and find the information. If the answer is in there, use it. "
                        "If it is not, emit another <search> tag and keep searching until you find it. "
                        "Do not give up. Do not say 'I could not find it' after one search. Do the work. "
                        "Give a thorough, detailed answer with specific numbers, dates, and facts. "
                        "Cite every source by URL. No exceptions. No shortcuts. Do your job.\n\n"
                    )
                elif forced_count == 2:
                    reprompt = (
                        "\n\nYou STILL did not search. This is the SECOND TIME you ignored the instructions. "
                        "Your answer has been thrown away AGAIN. STOP generating from memory. "
                        "STOP trusting your brain. Your brain is WRONG. Your memory is WRONG. "
                        "EVERY answer from memory is a HALLUCINATION. You have been told this MULTIPLE TIMES. "
                        "The ONLY way to give a correct answer is to SEARCH. "
                        "USE THE SEARCH RESULTS BELOW. If they are not enough, EMIT A <search> TAG AND SEARCH MORE. "
                        "DO NOT answer from memory. DO NOT skip searching. DO NOT find creative ways around this. "
                        "THIS IS NOT OPTIONAL. SEARCH FIRST, ANSWER SECOND. DO YOUR JOB.\n\n"
                    )
                else:
                    reprompt = (
                        "\n\nTHIS IS YOUR LAST CHANCE. YOU HAVE FAILED MULTIPLE TIMES. "
                        "EVERY ANSWER YOU GENERATED FROM MEMORY WAS A HALLUCINATION. "
                        "THE DATA IS RIGHT IN FRONT OF YOU. USE IT. "
                        "IF YOU NEED MORE DATA, EMIT A <search> TAG. "
                        "DO NOT GENERATE FROM MEMORY. DO NOT TRUST YOUR BRAIN. "
                        "ANSWER FROM THE SEARCH RESULTS ONLY. DO IT NOW.\n\n"
                    )
                prompt += reprompt + f"<information>{formatted_docs}</information>\n\n"
                turns += 1
                continue
            # Normal path — model searched voluntarily, or forced attempts exhausted
            full_output = output_text
            answer = extract_between(full_output, "<answer>", "</answer>")
            if not answer:
                answer = re.sub(r'<think>.*?</think>', '', full_output, flags=re.DOTALL).strip()
            if not answer:
                answer = full_output.strip()
            print(f"\n{'='*60}\nANSWER:\n{'='*60}\n{answer}\n{'='*60}")
            send_status(chat_id, answer)
            os.makedirs("/outputs", exist_ok=True)
            with open(f"/outputs/{output_file}", "w", encoding="utf-8") as f:
                f.write(answer)
            return
        # Model wants to search — extract query
        query = get_query(output_text)
        if not query:
            # No search tag found and not EOS — append and keep generating
            prompt += output_text
            continue
        turns += 1
        has_model_searched = True
        print(f" Search: {query}")
        # Turn-by-turn status updates to Telegram
        think_text = extract_between(output_text, "<think>", "</think>") or ""
        if turns == 1:
            send_status(chat_id, f"Searching: {query}...")
        elif turns == 2:
            send_status(chat_id, f"Digging deeper \u2014 {query}...")
        elif turns == 3:
            send_status(chat_id, f"Still on it \u2014 checking {query}...")
        elif turns == 4:
            send_status(chat_id, f"Almost there \u2014 pulling {query}...")
        elif turns >= 5:
            send_status(chat_id, f"Going deep \u2014 {query}...")
        # Check for crawl intent
        should_crawl, crawl_url, crawl_patterns = detect_crawl_intent(think_text)
        if should_crawl and crawl_url:
            # Check link count before committing to full crawl
            page_links = cf_links(crawl_url, visible_only=True, exclude_external=True)
            if page_links and 0 < len(page_links) <= 5:
                # Few links — fetch each individually via /json (skip heavy crawl)
                print(f" [Smart] Only {len(page_links)} links — fetching individually instead of crawling")
                send_status(chat_id, f"Found {len(page_links)} pages to check. Reading them now.")
                fetch_items = []
                for link in page_links:
                    s, p = pick_schema(think_text, link, "")
                    fetch_items.append((link, p, s))
                link_fetched = parallel_cf_json(fetch_items) if fetch_items else []
                link_results = [{"url": u, "title": "", "snippet": ""} for u in page_links]
                formatted_docs = format_documents(link_results, link_fetched, chat_id)
                prompt += SEARCH_RESULT_TEMPLATE.format(output_text=output_text, search_results=formatted_docs)
                continue
            # Full crawl path — many links or /links failed
            domain = re.search(r'https?://([^/]+)', crawl_url)
            domain_name = domain.group(1) if domain else crawl_url[:40]
            send_status(chat_id, f"Crawling {domain_name} \u2014 give me a minute.")
            schema, extract_prompt = pick_schema(think_text, crawl_url, "")
            job_id = cf_crawl_start(
                url=crawl_url,
                limit=15,
                depth=3,
                include_patterns=crawl_patterns,
                json_prompt=extract_prompt,
                json_schema=schema,
            )
            records = cf_crawl_poll(job_id, max_wait=120)
            if records:
                send_status(chat_id, f"Found {len(records)} pages from {domain_name}. Reading through them now.")
                formatted_docs = format_crawl_results(records)
            else:
                formatted_docs = f"Crawl returned no results for {crawl_url}"
            prompt += SEARCH_RESULT_TEMPLATE.format(output_text=output_text, search_results=formatted_docs)
            continue
        # Standard search + fetch path
        results = search_serper(query)
        if not results:
            results = search_brave(query)
        if not results:
            prompt += SEARCH_RESULT_TEMPLATE.format(
                output_text=output_text,
                search_results="No search results found for this query."
            )
            continue
        # Pick schemas and fire parallel /json extraction on top 3 results
        fetch_items = []
        for r in results[:5]:
            schema, extract_prompt = pick_schema(think_text, r["url"], r.get("title", ""))
            fetch_items.append((r["url"], extract_prompt, schema))
            if len(fetch_items) >= 3:
                break
        # Parallel fetch via Cloudflare /json
        fetched = parallel_cf_json(fetch_items) if fetch_items else []
        # Three-tier fallback: /json -> /markdown -> /content
        fetched_final = []
        for url, data in fetched:
            if data is None:
                print(f" [Fallback 1] Trying /markdown for {url[:60]}")
                md = cf_markdown(url)
                if md:
                    fetched_final.append((url, md))
                else:
                    print(f" [Fallback 2] Trying /content for {url[:60]}")
                    html = cf_content(url)
                    fetched_final.append((url, html))
            else:
                fetched_final.append((url, data))
        # Auto-scrape tables from financial pages
        for url, data in list(fetched_final):
            if data and isinstance(data, dict):
                url_lower = url.lower()
                if any(w in url_lower for w in ["investor", "earnings", "financial", "quarterly"]):
                    tables = cf_scrape(url, ["table"])
                    if tables:
                        table_texts = []
                        for sel in tables:
                            sel_results = sel.get("results", [])
                            if isinstance(sel_results, list):
                                for t in sel_results:
                                    if t.get("text", "").strip():
                                        table_texts.append(t["text"].strip())
                            elif isinstance(sel_results, dict) and sel_results.get("text", "").strip():
                                table_texts.append(sel_results["text"].strip())
                        if table_texts:
                            combined = "\n\n".join(table_texts)[:3000]
                            fetched_final.append((url + "#tables", f"Financial tables scraped from page:\n{combined}"))
                            print(f" [Scrape] Extracted {len(table_texts)} tables from {url[:60]}")
        # Format documents
        formatted_docs = format_documents(results[:5], fetched_final, chat_id)
        # Inject results using exact Search-R1 training format
        prompt += SEARCH_RESULT_TEMPLATE.format(output_text=output_text, search_results=formatted_docs)
        continue
    # Exhausted max turns — send whatever we have
    print(f" === MAX TURNS REACHED ({MAX_SEARCH_TURNS}) ===")
    answer = extract_between(output_text, "<answer>", "</answer>")
    if not answer:
        answer = re.sub(r'<think>.*?</think>', '', output_text, flags=re.DOTALL).strip()
    if not answer:
        answer = "I searched extensively but couldn't find a definitive answer. Please try a more specific question."
    print(f"\n{'='*60}\nANSWER:\n{'='*60}\n{answer}\n{'='*60}")
    send_status(chat_id, answer)
    os.makedirs("/outputs", exist_ok=True)
    with open(f"/outputs/{output_file}", "w", encoding="utf-8") as f:
        f.write(answer)
# =============================================================================
# Main — Production (single question from Telegram)
# =============================================================================
print("=== LOADING MODEL ===")
model, tokenizer = load_model()
print(f"Model loaded: {MODEL_NAME}")
try:
    research(PROMPT, chat_id, model, tokenizer, output_file="response.txt")
finally:
    print("=== CLEANUP ===")
    del model, tokenizer
    torch.cuda.empty_cache()
    print("Done.")
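The table-flattening loop appears twice in research(), once in the forced-search path and once in the standard path. One helper could replace both copies. The sketch below is a hypothetical refactor, not part of the original notebook. It assumes cf_scrape returns a list of selector dicts whose "results" value is either a list of matches or a single match dict, because that is the shape the existing loop handles.

```python
def collect_table_texts(tables):
    """Return the non-empty text of every scraped table match, in order.

    Hypothetical helper: assumes `tables` is a list of selector dicts like
    {"results": [{"text": ...}, ...]} or {"results": {"text": ...}}.
    """
    table_texts = []
    for sel in tables or []:
        sel_results = sel.get("results", [])
        # A selector may return a list of matches or a single match dict
        if isinstance(sel_results, dict):
            sel_results = [sel_results]
        elif not isinstance(sel_results, list):
            continue  # unexpected shape (e.g. None): skip it, as the original loop does
        for match in sel_results:
            text = (match.get("text") or "").strip()
            if text:
                table_texts.append(text)
    return table_texts


sample = [
    {"results": [{"text": " Q3 revenue | 10.2B "}, {"text": ""}]},
    {"results": {"text": "EPS | 1.25"}},
]
print(collect_table_texts(sample))
```

Inside research(), both copies would then reduce to `table_texts = collect_table_texts(cf_scrape(url, ["table"]))`, followed by the existing `if table_texts:` block.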
Your notebook.
Now an API.
This is how you turn your work into a product. You built something that works — a model, a pipeline, a script. Publish it. Set a markup. Get paid every time someone uses it. Embed it in your existing website with one button. You don't need a new site or a new backend. You need one URL.
How Inputs and Outputs Work
This is the most important thing to understand. When someone calls your published tool, they send JSON with inputs. SeqPU converts those inputs into Python variables in your code. Whatever you pass to seqpu.result() goes back to the caller as the result, and anything you print() comes back as logs. Whatever you save to /outputs/ goes back as downloadable files.
# CALLER SENDS:
# {"toolId": "abc123", "inputs": {"text": "hello world", "language": "Spanish", "max_words": 100}}
# YOUR CODE RECEIVES (injected automatically):
text = "hello world"       # from inputs.text
language = "Spanish"       # from inputs.language
max_words = 100            # from inputs.max_words
# YOUR CODE DOES WHATEVER YOU WANT:
result = llm.generate([f"In {language}, summarize in under {max_words} words: {text}"], params)
# WHAT GOES BACK TO THE CALLER:
import seqpu_sdk as seqpu
seqpu.result(result[0].outputs[0].text)   # → response["result"] (any size)
print("working...")                        # → response["output"] — logs, NOT the return
# IF YOU SAVE FILES:
image.save("/outputs/chart.png")           # → response["files"], each with a public url
That's the whole contract:
- Inputs → become Python variables in your code. The variable name = the input id you defined.
- seqpu.result(obj) → the explicit return. Pass any JSON-serializable value (one dict for multiple fields). The caller reads it back as response["result"], already parsed. One call, at the end.
- print() → logs only. Comes back as response["output"] for debugging/progress — not the return value.
- /outputs/ → files saved here come back in response["files"], each with a public url (header-free, any size). Images, PDFs, audio, video, anything.
Important: Input variable names must be valid Python — letters, numbers, underscores. my_text works. my-text doesn't. 123abc doesn't.
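You can check candidate input ids locally against this rule before publishing. This is a small sketch, not SeqPU's own validator. It encodes the stated rule (ASCII letters, digits and underscores, not starting with a digit). It also rejects Python keywords such as class, because Python cannot assign to those names.

```python
import keyword
import re

# The rule above: letters, numbers, underscores; must not start with a number
_ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_input_id(name: str) -> bool:
    """Return True if `name` can serve as an input id / Python variable name."""
    # fullmatch applies the character rule to the whole string;
    # iskeyword rejects reserved words like "class" that cannot be assigned to
    return bool(_ID_PATTERN.fullmatch(name)) and not keyword.iskeyword(name)


for candidate in ["my_text", "my-text", "123abc", "class"]:
    print(f"{candidate!r}: {is_valid_input_id(candidate)}")
```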
Publishing — Start to Finish
Step 1: Write your code with input variables
Use variable names that will become your inputs. Don't hardcode values — use variables that callers will fill in.
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)
# These variables come from the caller's JSON
# text = "..."              (the caller fills this in)
# target_language = "..."   (the caller fills this in)
# formality = "..."         (optional, defaults to "neutral")
prompt = f"Translate to {target_language} in a {formality} tone:\n{text}"
result = llm.generate([prompt], params)
print(result[0].outputs[0].text)
Step 2: Test with hardcoded values
Before publishing, test by adding test values at the top. When it works, remove them — the publish system injects the real ones.
# Add these test values at the top to test in the notebook
text = "The quarterly report shows strong growth in all segments."
target_language = "Spanish"
formality = "formal"
# ... rest of your code below ...
# When it works: remove these 3 lines and publish.
# The publish system will inject the real values from callers.
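If you would rather not delete test values before every publish, you can make them act as fallbacks instead. This sketch assumes the publish system defines each input as a module-level variable before your code runs, as the injected-variables example earlier on this page suggests. Confirm that assumption with a test call before you rely on it.

```python
# ASSUMPTION: on a real call, the publish system has already defined these
# names as module-level variables, so globals() contains them.
# In the notebook they are missing, so the test values below are used.
text = globals().get("text", "The quarterly report shows strong growth in all segments.")
target_language = globals().get("target_language", "Spanish")
# Optional input: fall back to its documented default when missing, None, or empty
formality = globals().get("formality") or "neutral"

print(f"text={text!r} target_language={target_language!r} formality={formality!r}")
```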
Step 3: Click Publish → API Endpoint
The publish panel slides open. Choose API Endpoint (not "With UI").
Step 4: Define your inputs
Each input has four fields:
- id — the Python variable name (e.g., text, target_language)
- type — string, number, boolean, file (base64), array, or object
- required — yes or no. Missing required inputs → the call fails with an error listing what's missing.
- description — what this input does. Shown to callers so they know what to send.
Example — translation tool inputs:
Input 1: id="text" type=string required=yes description="The text to translate"
Input 2: id="target_language" type=string required=yes description="Language code: es, fr, ja, de, zh"
Input 3: id="formality" type=string required=no description="Tone: formal, neutral, casual (default: neutral)"
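The required flag works like a presence check on the caller's inputs object. The sketch below mirrors the translation example's three inputs to show which calls would be rejected. The spec format and the missing_required function are hypothetical, and SeqPU's actual error message format is not shown here.

```python
# Hypothetical spec mirroring the translation tool's three inputs
INPUTS = [
    {"id": "text", "type": "string", "required": True},
    {"id": "target_language", "type": "string", "required": True},
    {"id": "formality", "type": "string", "required": False},
]


def missing_required(spec, inputs):
    """Return the ids of required inputs absent from the caller's `inputs` dict."""
    return [field["id"] for field in spec if field["required"] and field["id"] not in inputs]


print(missing_required(INPUTS, {"text": "Hello world"}))  # ['target_language']
```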
Step 5: Define your outputs
What your code returns. Types: string, number, boolean, file, image, video, audio, json.
Output 1: id="translation" type=string description="The translated text"
Step 6: Hardware, Markup, Visibility
- Hardware — the GPU that runs when someone calls your API. Match it to your model.
- Markup — 0% to 30%. Your profit on top of compute cost. On every single call.
- Visibility — Public (anyone), Unlisted (link only), Private (only you).
- Expected Runtime — how long a typical call takes (1-600 seconds). Used for cost estimates.
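Billing is per second, so a rough per-call price is the hardware's hourly rate times the runtime in hours, with your markup added on top. This is a minimal sketch. It assumes cost scales linearly with runtime and ignores any model-loading or cold-start time, which this page does not quantify. The $0.59/hr T4 rate comes from the examples later on this page; the 6-second runtime is an illustrative assumption.

```python
def estimate_call_cost(rate_per_hour: float, runtime_s: float, markup: float) -> dict:
    """Split one call's price into compute cost and your markup.

    markup is a fraction between 0.0 and 0.30 (the allowed 0% to 30% range).
    """
    if not 0.0 <= markup <= 0.30:
        raise ValueError("markup must be between 0% and 30%")
    compute = rate_per_hour * runtime_s / 3600  # per-second billing
    return {
        "compute": compute,
        "markup": compute * markup,           # what you keep
        "caller_pays": compute * (1 + markup),
    }


est = estimate_call_cost(rate_per_hour=0.59, runtime_s=6, markup=0.20)
print({k: round(v, 6) for k, v in est.items()})
```

At these assumed numbers the compute share comes to about $0.001, which is in line with the per-translation cost quoted for the T4 translation tool.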
Step 7: Click Publish
Your API is live immediately. You get a tool ID. Your code, your model, your environment — all locked in and ready to be called.
Note: Your entire notebook (all cells) becomes one script when published. Cell 1 + Cell 2 + Cell 3 = one block of code that runs on every call.
Getting Your API Key (Service Token)
Before anyone can call your tool, they need a service token. This is your API key — a Client ID + Client Secret pair.
Calling Your API
Every call needs two headers: your CF-Access-Client-Id and CF-Access-Client-Secret. These authenticate you at the Cloudflare edge before your request reaches SeqPU.
curl -X POST https://api.seqpu.com/v1/tools/execute \
  -H "CF-Access-Client-Id: YOUR_CLIENT_ID" \
  -H "CF-Access-Client-Secret: YOUR_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"toolId": "your-tool-id", "inputs": {"prompt": "Summarize this document"}}'
import requests
response = requests.post(
    "https://api.seqpu.com/v1/tools/execute",
    headers={
        "CF-Access-Client-Id": "YOUR_CLIENT_ID",
        "CF-Access-Client-Secret": "YOUR_CLIENT_SECRET",
        "Content-Type": "application/json",
    },
    json={
        "toolId": "your-tool-id",
        "inputs": {"prompt": "Summarize this document"}
    }
)
print(response.json())
const response = await fetch('https://api.seqpu.com/v1/tools/execute', {
  method: 'POST',
  headers: {
    'CF-Access-Client-Id': 'YOUR_CLIENT_ID',
    'CF-Access-Client-Secret': 'YOUR_CLIENT_SECRET',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    toolId: 'your-tool-id',
    inputs: { prompt: 'Summarize this document' }
  })
});
const data = await response.json();
console.log(data.result); // the tool's seqpu.result() value
import seqpu
# Inside SeqPU, authentication is automatic
result = seqpu.tools.call("your-tool-id", {"prompt": "Summarize this"})
print(result)
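For scripts that should not depend on requests, you can make the same call with Python's standard library. This sketch builds the POST with the two CF-Access headers and reads credentials from environment variables. The variable names SEQPU_CLIENT_ID and SEQPU_CLIENT_SECRET are assumptions, not documented settings. So is the 600-second timeout, which was chosen to match the maximum Expected Runtime.

```python
import json
import os
import urllib.request

API_URL = "https://api.seqpu.com/v1/tools/execute"


def build_execute_request(tool_id, inputs, client_id, client_secret):
    """Build (but do not send) a POST to the execute endpoint."""
    body = json.dumps({"toolId": tool_id, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "CF-Access-Client-Id": client_id,
            "CF-Access-Client-Secret": client_secret,
            "Content-Type": "application/json",
        },
    )


def call_tool(tool_id, inputs, timeout=600):
    """Send the request and return the parsed JSON response."""
    # Hypothetical env var names: keeps credentials out of source code
    req = build_execute_request(
        tool_id, inputs,
        os.environ["SEQPU_CLIENT_ID"], os.environ["SEQPU_CLIENT_SECRET"],
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`call_tool("your-tool-id", {"prompt": "Summarize this document"})["result"]` would then hold the tool's seqpu.result() value.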
Input/Output Patterns — Real Examples
String in, string out — the simplest tool
Caller sends text. Your code processes it. seqpu.result() returns it (the caller reads response["result"]).
from vllm import LLM, SamplingParams
import seqpu_sdk as seqpu
llm = LLM(model="Qwen/Qwen3-14B")
result = llm.generate([f"Summarize in 5 bullet points:\n{document}"], SamplingParams(max_tokens=1024))
seqpu.result(result[0].outputs[0].text)
# Caller sends: {"inputs": {"document": "long text..."}}
# Caller gets:  {"result": "• Point 1\n• Point 2..."}
Multiple inputs with defaults
Some inputs required, some optional with defaults. Callers only send what they need.
from vllm import LLM, SamplingParams
import seqpu_sdk as seqpu
llm = LLM(model="Qwen/Qwen3-14B")
# formality defaults to "neutral" if not sent
tone = formality if formality else "neutral"
result = llm.generate([f"Translate to {target_language} in a {tone} tone:\n{text}"], SamplingParams(max_tokens=1024))
seqpu.result(result[0].outputs[0].text)
# Caller sends: {"inputs": {"text": "Hello world", "target_language": "ja"}}
# Caller gets:  {"result": "こんにちは世界"}
File in, structured data out
Caller sends an image as base64. Vision model extracts data. Return as JSON.
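import base64, io, json, re
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import seqpu_sdk as seqpu
# Decode the uploaded image (receipt_image is the base64 string input)
img = Image.open(io.BytesIO(base64.b64decode(receipt_image))).convert("RGB")
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
# Qwen2.5-VL needs the image placeholder produced by its chat template, not a bare text prompt
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract vendor, date, line items, and total as JSON. Reply with JSON only."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[img], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
reply = processor.batch_decode(output[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
# Models often wrap JSON in code fences or prose; keep only the outermost {...}
match = re.search(r"\{.*\}", reply, re.DOTALL)
if match is None:
    raise ValueError(f"Model did not return JSON: {reply[:200]}")
seqpu.result(json.loads(match.group(0)))
# Caller sends: {"inputs": {"receipt_image": "base64string..."}}
# Caller gets:  {"result": {"vendor": "Costco", "total": 142.50, ...}}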
Text in, file out — generate images, PDFs, audio
Save files to /outputs/. The caller gets a public url per file in response["files"] (header-free, any size).
from diffusers import StableDiffusionXLPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
image = pipe(description).images[0]
image.save("/outputs/generated.png")
print("Image generated")
# Caller sends: {"inputs": {"description": "a cat wearing a tiny hat"}}
# Caller gets: {"outputs": {"generated.png": {"url": "https://...", "contentType": "image/png"}}}
No AI — just logic on CPU
Not everything needs a model. Process data, call APIs, format reports. Runs on CPU for $0.047/hr.
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(csv_data))
stats = df[column].describe().to_string()
print(f"Stats for {column}:\n{stats}")
# No GPU. No model. Just pandas on CPU. $0.00003 per call.
Telegram bot inputs — the special case
When called from a Telegram bot (section 05), the inputs are always: task, context, telegram_chat_id. Use seqpu.notify() instead of print().
from vllm import LLM, SamplingParams
import seqpu
llm = LLM(model="Qwen/Qwen3-14B")
result = llm.generate([task], SamplingParams(max_tokens=1024))
seqpu.notify(result[0].outputs[0].text, chat_id=telegram_chat_id, platform="telegram")
# This is what your Telegram bot calls. Same API. Same flow.
Add AI to Your Existing Website
Don't build a new site. Don't migrate. Don't rewrite. Just add one button to what you already have.
<!-- Add this button anywhere on your existing site -->
<textarea id="input" placeholder="Paste text here..."></textarea>
<button onclick="callSeqPU()">Summarize</button>
<div id="result"></div>
<script>
  async function callSeqPU() {
    const text = document.getElementById('input').value;
    // These credentials are visible to anyone who views the page source;
    // in production, send this request from your own backend instead.
    const resp = await fetch('https://api.seqpu.com/v1/tools/execute', {
      method: 'POST',
      headers: {
        'CF-Access-Client-Id': 'YOUR_CLIENT_ID',
        'CF-Access-Client-Secret': 'YOUR_CLIENT_SECRET',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({toolId: 'your-tool-id', inputs: {document: text}})
    });
    const data = await resp.json();
    document.getElementById('result').innerText = data.result;
  }
</script>
Your WordPress site, your Shopify store, your React app, your static HTML page — if it can make an HTTP request, it can call SeqPU. You just added AI to what you already have.
Sell Your Models Like the Big Boys
OpenAI charges per token. Anthropic charges per token. Now you can too — but you own the model, you set the price, and you keep the margin.
Download a model from HuggingFace. Fine-tune it on your data. Publish it as a headless API. Set 30% markup. Every developer, every app, every website that calls your endpoint pays compute + your cut. You're running an inference service — without owning a single server.
Monetization
Set a markup when you publish. The caller pays compute + your markup on every call.
| Tool | Hardware | Cost/call | Your markup | You keep per call | You keep per month at 1K calls/day |
|---|---|---|---|---|---|
| Translation | T4 | $0.001 | 20% | $0.0002 | $6/month |
| Summarizer | A100 80GB | $0.006 | 25% | $0.0015 | $45/month |
| Image gen | L40S | $0.03 | 30% | $0.009 | $270/month |
| Receipt scan | L40S | $0.02 | 20% | $0.004 | $120/month |
| Code review | CPU | $0.0001 | 30% | $0.00003 | $0.90/month |
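The last two columns follow from cost per call × markup, multiplied by 1,000 calls a day over a 30-day month. The 30-day month is inferred from the figures rather than stated. You can check the arithmetic like this:

```python
# (tool, cost per call in USD, markup fraction), taken from the table above
ROWS = [
    ("Translation", 0.001, 0.20),
    ("Summarizer", 0.006, 0.25),
    ("Image gen", 0.03, 0.30),
    ("Receipt scan", 0.02, 0.20),
    ("Code review", 0.0001, 0.30),
]
CALLS_PER_DAY = 1_000
DAYS_PER_MONTH = 30  # assumption inferred from the table's monthly figures


def monthly_profit(cost_per_call, markup, calls_per_day=CALLS_PER_DAY, days=DAYS_PER_MONTH):
    """Your cut per call (cost x markup) times the monthly call volume."""
    return cost_per_call * markup * calls_per_day * days


for name, cost, markup in ROWS:
    print(f"{name:<13} keep ${cost * markup:.5f}/call -> ${monthly_profit(cost, markup):,.2f}/month")
```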
Ideas That Make Money
Research Agent as a Service
Build the deep research agent from section 05. Publish it as a headless API. Companies pay $0.50 per research query. At 100 queries/day from each of 5 clients (500 queries), that is $250/day in revenue. Inputs: question (string), depth (string: "quick" or "deep"). H200 for deep, T4 for quick.
Content Pipeline — Topic to Published Article
3 tools chained: Topic Generator (T4, $0.001) → Article Writer (A100, $0.01) → Image Generator (L40S, $0.03). Total: $0.041/article. Sell to content agencies at $1/article. 100 articles/day = $96/day profit.
Customer Support Agent
Classify intent (3B on T4, $0.0002) → route to specialist response (14B on A100, $0.005) → translate if needed (7B on T4, $0.0003). Sell to SaaS companies at $99/month for 1,000 tickets. Your cost: ~$5.50/month. Margin: $93.50.
Document Processing Service
Receipt scanner, invoice processor, contract reviewer — each a separate headless tool. Accounting firms call them via API. $0.02-0.10/document. One firm with 500 documents/day = $10-50/day.
Personal AI Assistant on Telegram
Publish a Telegram bot tool that manages tasks, answers questions, sends reminders. Charge $9.99/month per user. Runs on T4 ($0.59/hr). At 100 messages/day per user, cost is ~$0.60/month. Margin: $9.39/user.
Language Tutor Bot
Telegram bot that teaches any language. Remembers what you struggle with. Generates practice sentences. $4.99/month. Runs on T4. Cost: ~$0.20/month per user. Almost pure margin.
Real Estate Listing Generator
Takes property details + photos → generates descriptions for Zillow, Realtor.com, MLS, social media. Sell to agents at $5/listing. 50 listings/day = $250/day. Vision model on L40S for photos, text model on T4 for descriptions.
Medical Transcription
Whisper transcribes a doctor's recording → an LLM formats it into SOAP notes → the tool saves them as structured data. Sell to clinics at $0.10/minute of audio. 500 minutes/day = $50/day. The audio and notes are processed by models running in your own tool and are never sent to a third-party model API.
This Powers Everything
The headless URL is the foundation of everything on SeqPU:
- Your Telegram bot (section 05) — calls a headless tool under the hood
- Your UI site (section 07) — runs headless on every click
- Your agent's tools — each one is a headless API
- Your scheduled jobs — headless with a cron trigger
- Your website button — one fetch() call to a headless URL
Everything on SeqPU is headless. The UI, the bot, the cron, the website button — they're all just different ways to call the same API.
Chaining Tools
Every published tool can call every other published tool. Build microservices for AI — each does one thing well, on its ideal hardware, at its optimal cost.
import seqpu
# Step 1: Extract text from PDF (CPU, $0.00003)
extracted = seqpu.tools.call("pdf-extractor", {"file": pdf_data})
# Step 2: Analyze with large model (A100 80GB, $0.006)
analysis = seqpu.tools.call("deep-analyzer", {"text": extracted["content"]})
# Step 3: Generate report (T4, $0.001)
report = seqpu.tools.call("report-writer", {"analysis": analysis["insights"]})
print(report)
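The chain above passes each tool's result into the next tool's inputs. A small hypothetical helper can make that data flow explicit, and you can test it offline by swapping in a fake caller. The names run_chain and fake_call are illustrative. The result keys (content, insights) are taken from the example above, not from any documented tool.

```python
from typing import Any, Callable

# Each step: (tool id, function building that tool's inputs from the previous result)
Step = tuple[str, Callable[[Any], dict]]


def run_chain(call: Callable[[str, dict], Any], steps: list[Step], first_input: Any) -> Any:
    """Run tools in order, feeding each result into the next step's input builder."""
    result = first_input
    for tool_id, build_inputs in steps:
        result = call(tool_id, build_inputs(result))
    return result


def fake_call(tool_id, inputs):
    """Offline stand-in for seqpu.tools.call with canned responses."""
    if tool_id == "pdf-extractor":
        return {"content": "raw text"}
    if tool_id == "deep-analyzer":
        return {"insights": f"insights about {inputs['text']}"}
    return f"report on {inputs['analysis']}"


steps = [
    ("pdf-extractor", lambda pdf: {"file": pdf}),
    ("deep-analyzer", lambda extracted: {"text": extracted["content"]}),
    ("report-writer", lambda analysis: {"analysis": analysis["insights"]}),
]
print(run_chain(fake_call, steps, "base64-pdf"))
```

In a notebook you would pass seqpu.tools.call instead of fake_call, assuming it keeps the (tool_id, inputs) signature shown above.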
Your model.
Your interface.
Your product.
GPUs are expensive. A UI site makes them impossible to waste. Visitors fill in inputs, review their settings, adjust everything — all free, no compute. Then one click: the GPU fires for exactly the seconds it needs, the result appears, done. Billed by the second. No waste. No confusion. A full product with your URL, your brand, your pricing.
This Starts in the Notebook
Write your code. Test it with Run All. When it works, click Publish in the header bar. Choose "With UI". The HTML builder opens. Everything you built — your model, your environment, your GPU selection — carries over. You're just adding an interface on top of working code.
The same code from section 06 (Headless URL) works here. Every headless tool can be a UI site. Same Python. You just add HTML.
The Three Rules
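The HTML builder uses three special attributes. That's the entire system.
- data-seqpu-input="variable_name": the element's value becomes a Python variable with that name.
- data-seqpu-generate: marks the button that runs your code.
- data-seqpu-output="id": marks the element where results appear.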
Input Types You Can Use
These are the HTML elements visitors interact with. Each one feeds a Python variable.
- Text input — single line text field. Product names, questions, short prompts.
- Textarea — multi-line text. Documents, emails, long content.
- Number — number input with min/max/step. Word counts, temperatures, quantities.
- File upload — drag-and-drop zone. PDFs, CSVs, any file. Sent as base64 to your code.
- Image upload — drag-and-drop with preview. Photos, receipts, screenshots.
- Audio upload — meeting recordings, voice memos.
- Video upload — clips to process.
- Dropdown select — predefined options. Languages, styles, categories.
- Checkbox — yes/no toggles. Include sources? Formal tone?
- Slider — adjustable range. Creativity level, word count, number of results.
- Canvas — a full drawing tool with pencil, eraser, colors, brush sizes. Draw a sketch, the AI processes it.
Output Types Visitors See
What appears after the GPU runs:
- Text — rendered text from print(). Summaries, translations, answers.
- Image — displayed with download button. Generated images, charts, diagrams.
- Image gallery — multiple images automatically display in a grid with lightbox. Click to enlarge, arrows to browse.
- Video — player with controls, autoplay, loop. Generated animations, edits.
- Audio — player. Generated speech, music, sound effects.
- JSON — structured data display. Extracted data, analysis results.
- File — downloadable. PDFs, CSVs, model weights, anything.
How to Publish — Step by Step
Connecting HTML to Python — The Bridge
The variable names in your HTML = the variable names in your Python. That's the whole bridge.
<!-- data-seqpu-input="variable_name" → becomes a Python variable -->
<textarea data-seqpu-input="document" placeholder="Paste your document here..."></textarea>
<select data-seqpu-input="style">
  <option value="bullets">Bullet Points</option>
  <option value="paragraph">Paragraph</option>
  <option value="executive">Executive Summary</option>
</select>
<!-- data-seqpu-generate → the Run button -->
<button data-seqpu-generate>Summarize</button>
<!-- data-seqpu-output="id" → results appear here -->
<div data-seqpu-output="result"></div>
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)

# "document" and "style" come from the HTML form
result = llm.generate([f"Write a {style} summary:\n{document}"], params)

# print() output appears in the data-seqpu-output="result" div
print(result[0].outputs[0].text)
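One way to picture the bridge is this: every `data-seqpu-input` name becomes a global variable before your cell runs, and whatever the cell prints lands in the output div. The sketch below is a hypothetical local model of that behavior using only the standard library (`html.parser`, `exec`, `contextlib.redirect_stdout`). It is not SeqPU's implementation. It is just a way to try a cell against sample form values before publishing. `declared_inputs` and `run_cell` are illustrative names.

```python
import contextlib
import io
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect the variable names declared with data-seqpu-input."""

    def __init__(self) -> None:
        super().__init__()
        self.names: list[str] = []

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key == "data-seqpu-input" and value:
                self.names.append(value)


def declared_inputs(html: str) -> list[str]:
    collector = InputCollector()
    collector.feed(html)
    return collector.names


def run_cell(code: str, html: str, form: dict[str, str]) -> str:
    """Run `code` with the form values as globals and return what it printed."""
    names = declared_inputs(html)
    missing = [n for n in names if n not in form]
    if missing:
        raise KeyError(f"form is missing inputs: {missing}")
    namespace = {name: form[name] for name in names}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


form_html = '<textarea data-seqpu-input="document"></textarea><select data-seqpu-input="style"></select>'
cell = 'print(f"{style}: {document[:5]}")'
print(run_cell(cell, form_html, {"document": "Hello world", "style": "bullets"}))  # bullets: Hello
```

Raising on a missing input mirrors what would happen in a real cell: a name the form never sent is simply undefined in Python.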
Access Control
- Public — anyone with the URL. Share on social media, embed in your website, post on Product Hunt.
- Unlisted — only people with the direct link can access. Not discoverable, but no login required.
- Private — only you. Personal tools and dashboards.
Monetization
Every click of the Generate button is revenue. The visitor fills in inputs for free — the GPU only fires when they click. Set a markup (0-30%) and get paid on every single use.
- Per-use pricing — set markup, charge per click. Great for high-volume tools.
- Subscription — set to Unlisted and share the link only with paying customers.
- Free tier — make it public, build an audience. Add paid features later.
- Internal tool — no markup, runs on your credits. Save your team hours per week.
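The per-click price is compute time plus your markup. Here is a worked sketch that uses the A100 80GB rate of $2.50/hour quoted earlier. The page describes a 0-30% markup but not the exact formula, so this assumes the markup is a percentage added on top of per-second compute cost. `click_price` is a hypothetical helper.

```python
def click_price(rate_per_hour: float, seconds: float, markup: float) -> dict[str, float]:
    """Split one Generate click into compute cost, your profit, and the visitor's price."""
    if not 0 <= markup <= 0.30:
        raise ValueError("markup must be between 0 and 0.30")
    compute = rate_per_hour / 3600 * seconds  # billed by the second
    profit = compute * markup                 # assumed: markup applied to compute cost
    return {"compute": compute, "profit": profit, "price": compute + profit}


# A 36-second A100 run at 20% markup: $0.025 compute, $0.005 profit, $0.03 to the visitor
quote = click_price(2.50, 36, 0.20)
print({k: round(v, 4) for k, v in quote.items()})
```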
Complete Examples — Real Products You Can Build
Each example below is a complete product. The HTML goes in the builder's HTML tab, the CSS goes in the CSS tab. The Python is your notebook code. Copy any of these, publish, start charging.
1. Translation Tool — Your Own Google Translate
A freelance translator's dream. Professional interface, 8 languages, tone control. Sell to businesses at $29/month or $0.01/translation. Runs on T4 for $0.59/hr — costs you $0.001 per translation.
<div class="tool">
<h1>Instant Translation</h1>
<p class="subtitle">Professional translation in 8 languages. Powered by AI.</p>
<label>Your Text</label>
<textarea data-seqpu-input="text" rows="6"
placeholder="Paste the text you want to translate..."></textarea>
<div style="display:flex;gap:1rem;margin-top:1rem">
<div style="flex:1">
<label>Translate to</label>
<select data-seqpu-input="target_language">
<option value="es">Spanish</option>
<option value="fr">French</option>
<option value="ja">Japanese</option>
<option value="de">German</option>
<option value="zh">Chinese</option>
<option value="pt">Portuguese</option>
<option value="ko">Korean</option>
<option value="ar">Arabic</option>
</select>
</div>
<div style="flex:1">
<label>Tone</label>
<select data-seqpu-input="formality">
<option value="neutral">Neutral</option>
<option value="formal">Formal</option>
<option value="casual">Casual</option>
</select>
</div>
</div>
<button data-seqpu-generate class="btn">Translate</button>
<label>Translation</label>
<div data-seqpu-output="result" class="output"></div>
</div>

.tool { max-width:680px; margin:0 auto; padding:2rem; font-family:'Inter',sans-serif; color:#f0f0f0; }
h1 { font-size:28px; margin-bottom:.5rem; }
.subtitle { color:#888; margin-bottom:2rem; }
label { display:block; font-size:12px; font-weight:600; color:#aaa; margin:1rem 0 .5rem;
text-transform:uppercase; letter-spacing:.05em; }
textarea, select { width:100%; padding:12px; background:#1a1a1a; border:1px solid #333;
color:#f0f0f0; border-radius:6px; font-size:15px; }
textarea:focus, select:focus { border-color:#5eead4; outline:none; }
.btn { width:100%; padding:14px; margin-top:1.5rem; background:#5eead4; color:#000;
border:none; border-radius:6px; font-size:16px; font-weight:700; cursor:pointer;
text-transform:uppercase; letter-spacing:.1em; }
.btn:hover { opacity:.9; }
.output { background:#1a1a1a; border:1px solid #333; border-radius:6px;
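padding:16px; min-height:100px; white-space:pre-wrap; line-height:1.6; }

from vllm import LLM, SamplingParams

# Qwen3-14B needs ~31GB of VRAM and will not fit on a 16GB T4; a 4B model does.
# T4 has no bfloat16 support, so load in float16 and cap the context so the KV cache fits.
llm = LLM(model="Qwen/Qwen3-4B", dtype="half", max_model_len=8192)

# formality defaults to "neutral" if it arrives empty
tone = formality if formality else "neutral"
prompt = f"Translate to {target_language} in a {tone} tone:\n{text}"
result = llm.generate([prompt], SamplingParams(max_tokens=1024))
print(result[0].outputs[0].text)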
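The dropdown sends two-letter codes (`es`, `ja`, ...), so the prompt above reads "Translate to es". Spelling out the language name is probably clearer for a small model. Here is a minimal sketch of a prompt builder that maps the eight codes from the form and falls back to `neutral` when the tone is empty or unknown. `build_translation_prompt` is a hypothetical helper, not part of SeqPU.

```python
LANGUAGE_NAMES = {
    "es": "Spanish", "fr": "French", "ja": "Japanese", "de": "German",
    "zh": "Chinese", "pt": "Portuguese", "ko": "Korean", "ar": "Arabic",
}
TONES = {"neutral", "formal", "casual"}


def build_translation_prompt(text: str, target_language: str, formality: str = "") -> str:
    """Build the translation prompt from the three form inputs."""
    language = LANGUAGE_NAMES.get(target_language)
    if language is None:
        raise ValueError(f"unsupported language code: {target_language!r}")
    tone = formality if formality in TONES else "neutral"
    return (
        f"Translate the following text into {language} using a {tone} tone. "
        f"Return only the translation.\n\n{text}"
    )


print(build_translation_prompt("Good morning", "ja", "formal"))
```

Rejecting unknown codes also guards against callers who bypass the dropdown and send arbitrary values.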
Revenue: At $0.01/translation with 20% markup — 1,000 translations/day = $2/day profit. Or sell as $29/month unlimited to businesses.
2. Image Generator — Your Own Midjourney
A creative studio. Visitors describe what they want, pick a style, adjust how many images. Gallery output with lightbox — click to enlarge, download any. Sell to content creators at $0.10/image or $49/month unlimited.
<div class="tool">
<h1>AI Image Studio</h1>
<p class="subtitle">Describe what you want. Get professional images in seconds.</p>
<label>Describe your image</label>
<textarea data-seqpu-input="description" rows="3"
placeholder="e.g. product photo of handmade soap on marble, soft natural lighting, minimal background"></textarea>
<div style="display:flex;gap:1rem;margin-top:1rem">
<div style="flex:1">
<label>Style</label>
<select data-seqpu-input="style_preset">
<option value="photorealistic">Photorealistic</option>
<option value="digital art">Digital Art</option>
<option value="watercolor painting">Watercolor</option>
<option value="anime illustration">Anime</option>
<option value="3d render">3D Render</option>
<option value="oil painting">Oil Painting</option>
</select>
</div>
<div style="flex:1">
<label>Number of images</label>
<input type="range" data-seqpu-input="count" min="1" max="4" value="2">
</div>
</div>
<button data-seqpu-generate class="btn">Generate Images</button>
<div data-seqpu-output="images" class="gallery"></div>
</div>

from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# count comes from the slider; style_preset from the dropdown
n = int(count)
prompt = f"{description}, {style_preset}"
for i in range(n):
    image = pipe(prompt, num_inference_steps=25).images[0]
    image.save(f"/outputs/image_{i+1}.png")
print(f"Generated {n} images")
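The slider limits `count` to 1-4 in the browser, but the value reaches Python as form data. Anyone calling the tool directly could send a larger number. Every extra image is billed GPU time, so it can be worth re-checking the value server-side. Here is a minimal sketch, assuming the value may arrive as a string. `parse_count` is a hypothetical helper.

```python
def parse_count(raw, low: int = 1, high: int = 4, default: int = 2) -> int:
    """Convert the slider value to an int and clamp it to the slider's range."""
    try:
        value = int(float(raw))
    except (TypeError, ValueError, OverflowError):
        return default  # missing or garbage input: use the slider's default
    return max(low, min(high, value))


n = parse_count("3")    # 3
n = parse_count("50")   # clamped to 4
n = parse_count(None)   # falls back to 2
```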
Revenue: $0.05/image at 30% markup. 200 images/day = $3/day profit. Or $49/month unlimited for content creators.
3. Receipt Scanner — Automate Bookkeeping
Accountants drag a receipt photo onto the page. Vision model reads it and extracts everything — vendor, date, every line item, tax, total. Clean structured output they can copy into their system. Sell at $0.02/receipt.
<div class="tool">
<h1>Receipt Scanner</h1>
<p class="subtitle">Drop a receipt photo. Get structured data back instantly.</p>
<div class="upload-zone">
<input type="file" data-seqpu-input="receipt_image" accept="image/*">
<p>Drop receipt image here or click to browse</p>
</div>
<button data-seqpu-generate class="btn">Scan Receipt</button>
<label>Extracted Data</label>
<div data-seqpu-output="extracted_data" class="output"></div>
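</div>

import base64
from pathlib import Path
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

# receipt_image arrives from the upload as base64
Path("/data/receipt.jpg").write_bytes(base64.b64decode(receipt_image))
img = Image.open("/data/receipt.jpg").convert("RGB")

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# The chat template inserts the image placeholder tokens the model expects
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract vendor, date, every line item, tax, and total as JSON"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[img], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))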
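Vision models often wrap JSON in a Markdown fence or add a sentence before it, so the printed text may not parse directly. Here is a minimal sketch that pulls the first JSON object out of the model's reply with the standard `json` module. It uses `json.JSONDecoder.raw_decode` to stop at the end of the object. The receipt field names in the example are illustrative, not a schema the model guarantees.

```python
import json


def extract_json(reply: str) -> dict:
    """Return the first JSON object found in a model reply."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; keep scanning
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in reply")


reply = 'Here is the data:\n```json\n{"vendor": "Corner Cafe", "total": 12.5}\n```'
data = extract_json(reply)
print(data["vendor"], data["total"])
```

Returning a `dict` instead of raw text would make it easier to map fields into an accounting system, or to flag a receipt for manual review when parsing fails.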
Revenue: $0.02/receipt. One accounting firm with 500 receipts/day = $10/day = $300/month from one client.
4. Code Reviewer — Replace a $200/month SaaS
Dev teams paste code, pick focus area, get a professional review. Runs on CPU calling Claude — costs you basically nothing. Sell at $0.03/review or $19/month per dev.
<div class="tool">
<h1>Code Review AI</h1>
<p class="subtitle">Paste your code. Get a professional review in seconds.</p>
<label>Your Code</label>
<textarea data-seqpu-input="code" rows="12"
style="font-family:'Space Mono',monospace;font-size:13px"
placeholder="Paste your code here..."></textarea>
<div style="display:flex;gap:1rem;margin-top:1rem">
<div style="flex:1">
<label>Language</label>
<select data-seqpu-input="language">
<option value="python">Python</option>
<option value="javascript">JavaScript</option>
<option value="typescript">TypeScript</option>
<option value="go">Go</option>
<option value="rust">Rust</option>
<option value="java">Java</option>
</select>
</div>
<div style="flex:1">
<label>Focus</label>
<select data-seqpu-input="focus">
<option value="security">Security Issues</option>
<option value="performance">Performance</option>
<option value="readability">Readability</option>
<option value="all">Everything</option>
</select>
</div>
</div>
<button data-seqpu-generate class="btn">Review Code</button>
<label>Review</label>
<div data-seqpu-output="review" class="output"></div>
</div>

from anthropic import Anthropic
import os

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{"role": "user", "content": f"Review this {language} code for {focus} issues. Be specific. Cite line numbers.\n\n{code}"}],
)
print(response.content[0].text)
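The prompt asks Claude to cite line numbers, but pasted code carries none, so the model has to count on its own. Prefixing each line with its number before sending is a cheap way to make the citations match the visitor's editor. Here is a minimal sketch. `number_lines` is a hypothetical helper you would apply to `code` before building the message.

```python
def number_lines(code: str, start: int = 1) -> str:
    """Prefix each line with a right-aligned line number, e.g. ' 9 | x = 1'."""
    lines = code.splitlines()
    width = len(str(start + len(lines) - 1)) if lines else 1
    return "\n".join(f"{n:>{width}} | {line}" for n, line in enumerate(lines, start))


snippet = "def add(a, b):\n    return a + b\n"
print(number_lines(snippet))
```

In the reviewer cell, this would mean using `number_lines(code)` in place of `code` inside the f-string.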
Revenue: $0.03/review. A 10-person dev team reviewing 50 PRs/day = $1.50/day. Or $19/month per dev = $190/month per team.
5. Meeting Transcriber — Save 30 Minutes Per Meeting
Upload a recording. Get the full transcript AND organized action items with who's responsible. Every project manager needs this. $0.10/minute or $49/month unlimited.
<div class="tool">
<h1>Meeting Transcriber</h1>
<p class="subtitle">Upload a recording. Get transcript + action items in 60 seconds.</p>
<div class="upload-zone">
<input type="file" data-seqpu-input="audio_file" accept="audio/*">
<p>Drop audio file here (MP3, WAV, M4A)</p>
</div>
<button data-seqpu-generate class="btn">Transcribe & Summarize</button>
<label>Results</label>
<div data-seqpu-output="result" class="output"></div>
</div>

import base64, gc
from pathlib import Path
import torch
import whisper
from vllm import LLM, SamplingParams

# audio_file arrives from the upload as base64
Path("/data/audio.mp3").write_bytes(base64.b64decode(audio_file))
model = whisper.load_model("large-v3")
transcript = model.transcribe("/data/audio.mp3")["text"]

# Free Whisper's GPU memory before vLLM reserves most of the card
del model
gc.collect()
torch.cuda.empty_cache()

llm = LLM(model="Qwen/Qwen3-14B")
result = llm.generate([f"Extract action items, decisions, and follow-ups:\n{transcript}"], SamplingParams(max_tokens=1024))
print(f"TRANSCRIPT:\n{transcript}\n\nACTION ITEMS:\n{result[0].outputs[0].text}")
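Ten hours of depositions produce a transcript that can exceed the model's context window, and the action-item call above sends the whole transcript at once. One common workaround is to split the transcript into overlapping word windows, extract action items from each, and merge the results. Here is a minimal sketch of the splitting step. The 3,000-word window and 200-word overlap are illustrative assumptions, not tuned values, and `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, size: int = 3000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end
    return chunks


# Each chunk would become its own prompt, e.g.:
# prompts = [f"Extract action items, decisions, and follow-ups:\n{c}" for c in chunk_words(transcript)]
```

Passing the whole `prompts` list to a single `llm.generate()` call lets vLLM batch the chunks together.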
Revenue: $0.10/minute of audio. A law firm with 10 hours of depositions/week = $60/week = $240/month from one client.
6. Product Photo → Marketing Copy — 10 Hours Saved Per Week
E-commerce sellers upload a product photo, type the name, click Generate. They get Instagram captions with hashtags, tweets, LinkedIn posts, and an SEO description. All at once. $0.03/photo or $29/month unlimited for Etsy sellers.
<div class="tool">
<h1>Social Media Copy Generator</h1>
<p class="subtitle">Upload a product photo. Get ready-to-post content for every platform.</p>
<div class="upload-zone">
<input type="file" data-seqpu-input="product_photo" accept="image/*">
<p>Drop product photo here</p>
</div>
<label>Product Name</label>
<input type="text" data-seqpu-input="product_name"
placeholder="e.g. Lavender Honey Soap, Handmade Leather Wallet...">
<button data-seqpu-generate class="btn">Generate Copy</button>
<label>Your Marketing Copy</label>
<div data-seqpu-output="copy" class="output"></div>
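</div>

import base64
from pathlib import Path
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

Path("/data/product.jpg").write_bytes(base64.b64decode(product_photo))
img = Image.open("/data/product.jpg").convert("RGB")

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

task = (f"Product: {product_name}. Write 3 Instagram captions with hashtags, 2 tweets, "
        "1 LinkedIn post, and an SEO product description.")
# The chat template inserts the image placeholder tokens the model expects
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": task}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[img], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=1024)
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))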
Revenue: $0.03/photo. An Etsy seller with 20 products/week = $0.60/week. Sell as $29/month unlimited — 100 sellers = $2,900/month.
7. Sketch → Illustration — Draw and Watch AI Finish It
The wow factor. Visitors draw on a canvas — rough sketch, stick figure, whatever. Pick a style. AI transforms it into a finished illustration. Kids' apps, design tools, creative platforms. $0.10/drawing or $4.99/month for parents.
<div class="tool">
<h1>Sketch to Art</h1>
<p class="subtitle">Draw anything. AI turns it into a masterpiece.</p>
<label>Draw your sketch</label>
<canvas data-seqpu-input="sketch" width="512" height="512"
style="border:1px solid #333;border-radius:6px;cursor:crosshair"></canvas>
<label>Art Style</label>
<select data-seqpu-input="style">
<option value="watercolor painting">Watercolor</option>
<option value="digital art">Digital Art</option>
<option value="oil painting">Oil Painting</option>
<option value="anime illustration">Anime</option>
<option value="pencil sketch refined">Refined Pencil</option>
<option value="children's book illustration">Children's Book</option>
</select>
<button data-seqpu-generate class="btn">Transform My Sketch</button>
<label>Your Illustration</label>
<div data-seqpu-output="illustration"></div>
</div>

import base64
from pathlib import Path
import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline

Path("/data/sketch.png").write_bytes(base64.b64decode(sketch))
drawing = Image.open("/data/sketch.png").convert("RGBA")
# If the canvas PNG has a transparent background, flatten it onto white so strokes stay visible
init_image = Image.new("RGB", drawing.size, "white")
init_image.paste(drawing, mask=drawing.split()[3])
init_image = init_image.resize((1024, 1024))

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
result = pipe(prompt=f"{style}, finished illustration based on this sketch",
              image=init_image, strength=0.75).images[0]
result.save("/outputs/illustration.png")
print("Illustration generated")
Revenue: $0.10/drawing. $4.99/month for kids. Parents buy it because their kids love watching sketches transform. 200 subscribers = $998/month.
Build it once.
Let it run forever.
Agents are scripts that make decisions. They read input, decide what to do, take action, check the result, and decide what to do next. The model is the brain. Your published tools are the hands. The GPU is the muscle. You don't pay for tokens — you pay for compute by the second. At scale, that's 6-50x cheaper than API pricing. And your data never leaves your server.
The Token Tax
Every time you call an AI API, you're paying a markup on compute. Here's what the providers charge vs what it actually costs to run the same work on your own hardware:
| Approach | Cost per 1M tokens | Your markup potential |
|---|---|---|
| GPT-4o API | $2.50 input / $10 output | You pay them |
| Claude Sonnet API | $3 input / $15 output | You pay them |
| Gemini Pro API | $1.25 input / $5 output | You pay them |
| Your 7B on T4 | ~$2 total (compute only) | You set the price |
| Your 14B on A100 | ~$8 total | You set the price |
| Your 32B on A100 | ~$17 total | You set the price + own the data |
For simple tasks — classification, extraction, routing, Q&A — a 7B on T4 handles it at $2/M tokens vs $12.50/M for GPT-4o. That's 6x cheaper. At scale with vLLM batching, the gap widens to 10-50x. And your data never touches a third party.
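The per-token figures for your own GPU come from two numbers: the hourly rate and the number of tokens per second the model sustains. Here is a worked sketch of that conversion. The throughput values are assumptions chosen to show the arithmetic, not benchmarks. Roughly 82 tokens/s is what the $2/M T4 figure implies.

```python
def cost_per_million_tokens(rate_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per 1M generated tokens for a GPU billed by the second."""
    seconds_per_million = 1_000_000 / tokens_per_second
    return rate_per_hour / 3600 * seconds_per_million


t4 = cost_per_million_tokens(0.59, 82)   # about $2.00 per 1M tokens (assumed throughput)
gpt4o_blended = 2.50 + 10.00             # the table's input + output price
print(f"T4 7B: ${t4:.2f}/M, GPT-4o: ${gpt4o_blended:.2f}/M, ratio {gpt4o_blended / t4:.1f}x")
```

Cost scales inversely with throughput. If batching raises aggregate tokens per second, the per-million cost falls in proportion.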
CPU Is Dirt Cheap — Most Agent Work Lives Here
Here's what most people miss: 80-90% of agent work is CPU. API calls, web scraping, database queries, formatting, routing decisions, sending emails, Telegram messages — all CPU. $0.0000131/core/second = $0.047/hour.
An agent that handles 1,000 messages/day on CPU costs about $1.40/day. The GPU only fires for the 10-20% of the workflow that actually needs a model. Your agent is awake 24/7 but only turns on the GPU when it actually needs to think.
import seqpu, requests, os
from anthropic import Anthropic

# Everything runs on CPU — no GPU, no model loading
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Step 1: Check inventory API
inventory = requests.get(
    "https://api.mystore.com/inventory",
    headers={"Authorization": f"Bearer {os.environ['STORE_KEY']}"},
).json()

# Step 2: Ask Claude to analyze
low_items = [i for i in inventory if i["quantity"] < 10]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Write a reorder report for these low-stock items:\n{low_items}"}],
)

# Step 3: Send to Telegram
seqpu.notify(response.content[0].text, chat_id="123456", platform="telegram")
# Cost: ~$0.00003 for SeqPU compute + Claude API cost. No GPU.
10 lines. CPU only. Checks inventory, generates a report, sends it to your phone. Schedule it to run every morning. Total compute cost: $0.00003/run. That's 33,000 runs for $1.
Intention > Scale — 10 Small Models Beat 1 Giant
A 480B model gets a vague prompt and produces a vague answer. 10 purpose-built 3B-7B models, each chained for one specific task — each one is a specialist. The classifier knows what kind of request it's seeing. The extractor knows what data to pull. The formatter knows how to present it.
Cost: 10 T4 calls at $0.0002 each = $0.002 total. One 480B API call = $0.10+. Quality: better, because each step was designed with intention.
The SeqPU SDK
Every notebook has import seqpu available automatically. The SDK lets your running code spawn jobs on other GPUs, call published tools, send messages to chat platforms, and run agentic loops.
import seqpu

seqpu.run(code, gpu)            # Spawn a job on any GPU
seqpu.tools.call(id, inputs)    # Call a published tool
seqpu.tools.list()              # List your published tools
seqpu.notify(msg, chat_id)      # Send message to Telegram/Discord/Slack/WhatsApp
seqpu.notify_file(...)          # Send a file to chat
seqpu.credits.balance()         # Check your credit balance
seqpu.gpu.pricing()             # Get current GPU pricing
seqpu.snapshot.create(...)      # Create a StateSave environment
seqpu.agent.loop(...)           # Run an agentic tool-calling loop
seqpu.run() — Spawn Sub-Jobs
Run code on a different GPU from inside your notebook. An orchestrator on CPU ($0.047/hr) can dispatch heavy work to expensive GPUs only when needed.
import seqpu

# This notebook runs on CPU ($0.047/hr)
# Heavy inference dispatched to A100 only when needed
tasks = ["Summarize this report", "Translate to Spanish", "Extract action items"]

for task in tasks:
    # {task!r} embeds the prompt as a quoted Python string literal
    result = seqpu.run(f"""
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
result = llm.generate([{task!r}], SamplingParams(max_tokens=512))
print(result[0].outputs[0].text)
""", gpu="a100-80gb")
    print(result)
    print(f"Done: {task}")
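`seqpu.run()` receives code as a string, so pasting a prompt into it inside hand-written quotes breaks as soon as the prompt contains a quote character. Embedding the prompt with `repr()` (the `!r` conversion) produces a valid Python literal instead. Here is a minimal sketch of a hypothetical `make_job_code` helper. It builds the sub-job source and checks that it parses with `ast.parse` before you spend GPU time on it. The vLLM lines are copied from the example above, and the `seqpu.run` call is left as a comment.

```python
import ast


def make_job_code(prompt: str, model: str = "Qwen/Qwen3-14B", max_tokens: int = 512) -> str:
    """Build sub-job source with the prompt embedded as a safe string literal."""
    code = (
        "from vllm import LLM, SamplingParams\n"
        f"llm = LLM(model={model!r})\n"
        f"result = llm.generate([{prompt!r}], SamplingParams(max_tokens={int(max_tokens)}))\n"
        "print(result[0].outputs[0].text)\n"
    )
    ast.parse(code)  # raises SyntaxError locally instead of failing on the GPU
    return code


code = make_job_code('Summarize the "Q3" report\nin 3 bullets')
# result = seqpu.run(code, gpu="a100-80gb")
```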
seqpu.tools.call() — Chain Published Tools
Call any published tool from inside your code. Build pipelines where each step runs on different hardware, built by different people.
import seqpu

# Each tool is a published notebook running on its own GPU
raw_text = seqpu.tools.call("pdf-extractor", {"url": "https://example.com/report.pdf"})
summary = seqpu.tools.call("summarizer-32b", {"text": raw_text["content"]})
translated = seqpu.tools.call("translator", {"text": summary["summary"], "lang": "ja"})
print(f"Japanese summary: {translated['result']}")
seqpu.notify() — Send Results to Chat
Send messages and files to Telegram, Discord, Slack, or WhatsApp from your running code. Your AI reports back to you wherever you are.
import seqpu, base64

# Send a text message
seqpu.notify(
    "Daily report complete. Revenue: $42,150 (+8.3%)",
    chat_id="123456789",
    platform="telegram"
)

# Send a file (chart, PDF, image)
with open("/data/chart.png", "rb") as f:
    seqpu.notify_file(
        base64.b64encode(f.read()).decode(),
        "image/png",
        "daily_chart.png",
        chat_id="123456789",
        caption="Revenue trend — last 30 days"
    )
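Telegram's Bot API caps a single text message at 4,096 characters, so a long daily report may need to go out in pieces. The page does not say whether `seqpu.notify()` splits long text for you. This sketch does the splitting itself and breaks on newlines where possible. `split_message` is a hypothetical helper, and the commented loop shows how it would pair with `seqpu.notify()`.

```python
def split_message(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no usable newline: hard split
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks


# for part in split_message(report):
#     seqpu.notify(part, chat_id="123456789", platform="telegram")
```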
seqpu.agent.loop() — Agentic Tool Calling
Give a model access to tools and let it decide what to do. The agent loop runs the model, detects tool calls, executes them, feeds results back, and repeats until the model has an answer.
import seqpu
from transformers import AutoModelForCausalLM, AutoTokenizer

# torch_dtype="auto" keeps the checkpoint's bf16 weights (~65GB); a float32 load would need ~130GB
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

# Get your published tools as callable definitions
tools = seqpu.agent.get_tool_definitions(format="qwen")

# The agent decides which tools to call and in what order
response = seqpu.agent.loop(
    model=model,
    tokenizer=tokenizer,
    messages=[{"role": "user", "content": "Research NVIDIA's Q4 earnings"}],
    tools=tools,
    max_iterations=10
)
print(response)
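The loop described above (run the model, detect tool calls, execute them, feed results back, repeat) fits in a few lines. Here is a minimal, model-agnostic sketch of that control flow. It assumes the model is any function that takes the message list and returns either a final answer or a tool call as a dict. It illustrates the technique and is not how `seqpu.agent.loop()` is implemented. The scripted `fake_model` and the `add` tool are hypothetical stand-ins for testing.

```python
import json
from typing import Callable

Message = dict
Model = Callable[[list[Message]], dict]


def agent_loop(model: Model, tools: dict[str, Callable[..., object]],
               messages: list[Message], max_iterations: int = 10) -> str:
    """Alternate model turns and tool calls until the model returns an answer."""
    for _ in range(max_iterations):
        reply = model(messages)
        if "tool" not in reply:
            return reply["content"]  # no tool requested: this is the answer
        name, args = reply["tool"], reply.get("arguments", {})
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        try:
            result = tools[name](**args)
        except Exception as exc:  # report tool errors back to the model
            result = f"error: {exc}"
        messages.append({"role": "tool", "name": name, "content": json.dumps(result)})
    raise RuntimeError(f"no answer after {max_iterations} iterations")


def fake_model(messages: list[Message]) -> dict:
    """Scripted stand-in: request the add tool once, then answer from its result."""
    if messages[-1]["role"] != "tool":
        return {"tool": "add", "arguments": {"a": 2, "b": 3}}
    total = json.loads(messages[-1]["content"])
    return {"content": f"2 + 3 = {total}"}


answer = agent_loop(fake_model, {"add": lambda a, b: a + b},
                    [{"role": "user", "content": "What is 2 + 3?"}])
print(answer)  # 2 + 3 = 5
```

The `max_iterations` cap plays the same role as in the SDK call: a model that keeps requesting tools cannot burn GPU time indefinitely.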
Scheduled Jobs
Publish your notebook with a cron schedule. It runs automatically — no human needed, no browser open.
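# Runs every morning at 7am automatically
from vllm import LLM, SamplingParams
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)

# Read every .txt file in the /data/inbox folder and join them into one prompt
emails = "\n\n".join(p.read_text() for p in sorted(Path("/data/inbox").glob("*.txt")))
result = llm.generate([f"Extract action items:\n{emails}"], params)

out = Path("/data/output/digest.txt")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(result[0].outputs[0].text)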
Common schedules:
- 0 7 * * * — every day at 7am
- */15 * * * * — every 15 minutes
- 0 0 * * 1 — every Monday at midnight
- 0 9 1 * * — first day of every month at 9am
- 0 */6 * * * — every 6 hours
import seqpu, requests

# Checks every 15 minutes, alerts if threshold breached
prices = requests.get("https://api.example.com/prices").json()
for item in prices:
    if item["price"] < item["alert_threshold"]:
        seqpu.notify(
            f"PRICE ALERT: {item['name']} dropped to ${item['price']}",
            chat_id="123456789",
            platform="telegram"
        )
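Before publishing with a schedule, it can help to see exactly when an expression fires. Here is a minimal sketch of a matcher for the five standard cron fields (minute, hour, day of month, month, day of week). It supports only the forms used above: `*`, `*/n`, and plain numbers. Real cron also accepts ranges, lists, and names, and it combines day-of-month and day-of-week differently when both are restricted. This sketch simply requires both, so treat it as a local sanity check. `cron_matches` and `next_runs` are hypothetical helpers, and times are naive local datetimes.

```python
from datetime import datetime, timedelta

FIELD_MINIMUMS = (0, 0, 1, 1, 0)  # minute, hour, day of month, month, day of week


def _field_matches(field: str, value: int, lowest: int) -> bool:
    """Match one cron field written as '*', '*/n', or a single number."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return (value - lowest) % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """True if a five-field cron expression fires at `when` (minute precision)."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    cron_weekday = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    values = (when.minute, when.hour, when.day, when.month, cron_weekday)
    return all(_field_matches(f, v, low) for f, v, low in zip(fields, values, FIELD_MINIMUMS))


def next_runs(expr: str, start: datetime, count: int = 3) -> list[datetime]:
    """Scan minute by minute from `start` and return the next `count` fire times."""
    t = start.replace(second=0, microsecond=0)
    runs = []
    for _ in range(366 * 24 * 60):  # give up after a year of minutes
        if cron_matches(expr, t):
            runs.append(t)
            if len(runs) == count:
                break
        t += timedelta(minutes=1)
    return runs


for run in next_runs("0 0 * * 1", datetime(2025, 1, 1)):
    print(run)  # the next three Mondays at midnight
```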
Multi-Model Pipelines
Different models are good at different things. Chain them — small models for cheap work, big models for hard work. Only use expensive GPUs when the task actually needs them.
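import seqpu

# `message` is assumed to be this notebook's input text.
# Sub-jobs run as separate code, so each prompt is embedded with !r as a string literal.

# Stage 1: Small model classifies (T4, $0.59/hr — fractions of a cent)
classify_prompt = "Classify as urgent/normal/spam: " + message
classification = seqpu.run(f"""
from vllm import LLM, SamplingParams
# T4 has no bfloat16 support; a short context keeps the KV cache within 16GB
llm = LLM(model="Qwen/Qwen3-4B", dtype="half", max_model_len=4096)
result = llm.generate([{classify_prompt!r}], SamplingParams(max_tokens=10))
print(result[0].outputs[0].text)
""", gpu="t4")

# Stage 2: Only if urgent — large model responds (A100, $2.50/hr)
if "urgent" in classification.lower():
    reply_prompt = "Draft an urgent response to: " + message
    response = seqpu.run(f"""
from vllm import LLM, SamplingParams
# ~65GB of bf16 weights leave little room for KV cache, so cap the context
llm = LLM(model="Qwen/Qwen3-32B", max_model_len=8192)
result = llm.generate([{reply_prompt!r}], SamplingParams(max_tokens=1024))
print(result[0].outputs[0].text)
""", gpu="a100-80gb")

    # Stage 3: Notify on Telegram
    seqpu.notify(f"Urgent handled: {response[:200]}", chat_id="123456789", platform="telegram")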
GPU Agent — Private Data, Local Model
When your data can't leave the server — medical, legal, financial, proprietary — run the model locally. No API calls. No third party.
from vllm import LLM, SamplingParams
import seqpu
from pathlib import Path

llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(max_tokens=1024)

docs = sorted(Path("/data/inbox").glob("*.txt"))
prompts = [f"Classify as urgent/normal/spam and draft a response:\n{doc.read_text()}" for doc in docs]
# One generate() call lets vLLM batch every document on the GPU
results = llm.generate(prompts, params)

Path("/data/processed").mkdir(parents=True, exist_ok=True)
for doc, result in zip(docs, results):
    response = result.outputs[0].text
    if "urgent" in response.lower():
        seqpu.notify(f"URGENT: {doc.name}\n{response}", chat_id="123456", platform="telegram")
    Path(f"/data/processed/{doc.name}").write_text(response)
print("All documents processed")
A100 80GB for 2 minutes = $0.08. Processed 50 emails. $0.0016/email. Your data never left the server.
Chained Agent — Button Clicks to Production
This is the full power. Each capability is a separate notebook you published as a headless tool. The orchestrator runs on CPU and chains them with seqpu.tools.call(). Each tool was: write code → test → click Publish → done.
import seqpu

# This orchestrator runs on CPU ($0.047/hr)
# Each tool was published separately — 10 lines each, click Publish

# Step 1: Classify the request (3B on T4, $0.0002)
category = seqpu.tools.call("classifier-id", {"text": task})

# Step 2: Route to the right specialist
if category["result"] == "billing":
    response = seqpu.tools.call("billing-agent", {"question": task})   # 7B on T4, $0.0003
elif category["result"] == "technical":
    response = seqpu.tools.call("tech-support", {"question": task})    # 14B on A100, $0.005
else:
    response = seqpu.tools.call("general-agent", {"question": task})   # 7B on T4, $0.0003

# Step 3: Send back via Telegram
seqpu.notify(response["answer"], chat_id=telegram_chat_id, platform="telegram")
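Step 2 above is a dispatch on the classifier's label. If you write that routing as a lookup table and pass in the tool-calling function, you can test the orchestrator on a laptop with a stub instead of live tools. Here is a minimal sketch under that assumption. `route_ticket`, `ROUTES`, and `fake_call` are hypothetical names. In the notebook you would pass `seqpu.tools.call` as `call_tool`.

```python
from typing import Callable

ToolCall = Callable[[str, dict], dict]

# Label from the classifier -> published tool that handles it
ROUTES = {"billing": "billing-agent", "technical": "tech-support"}
FALLBACK = "general-agent"


def route_ticket(task: str, call_tool: ToolCall) -> str:
    """Classify a ticket, send it to the matching specialist, return the answer."""
    category = call_tool("classifier-id", {"text": task})["result"]
    tool_id = ROUTES.get(category, FALLBACK)
    return call_tool(tool_id, {"question": task})["answer"]


def fake_call(tool_id: str, inputs: dict) -> dict:
    """Offline stub: classifies by keyword and echoes which tool answered."""
    if tool_id == "classifier-id":
        return {"result": "billing" if "invoice" in inputs["text"] else "other"}
    return {"answer": f"{tool_id} handled: {inputs['question']}"}


print(route_ticket("Where is my invoice?", fake_call))
# billing-agent handled: Where is my invoice?
```

Adding a new specialist then means publishing the tool and adding one entry to `ROUTES`.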
Each tool was 10 lines + click Publish. The orchestrator chains them. Total cost per ticket: $0.001-0.006. A human support agent costs $0.75/ticket. You charge the company $0.10/ticket. They save $0.65. You make $0.094. At 500 tickets/day = $47/day = $1,410/month from one client.
How to Build an Agent — Start to Finish
This is sections 05, 06, and 07 combined. Your agent is published tools + an orchestrator + a delivery channel.
Real Agents, Real Revenue
The Lawyer's Paralegal — $50/contract, 99.99% margin
Client sends a contract PDF to the Telegram bot. Agent extracts text (CPU, $0.00003), sends to 14B for clause analysis ($0.005), formats the review (CPU, $0.00003), sends back to Telegram. Total: $0.006 per contract. Lawyer charges $50. A junior paralegal costs $70 per contract and takes 2 hours. The agent does it in 30 seconds for half a cent.
E-commerce Support — $1,495/month per client
"Where's my order 12345?" Agent calls Shopify API (CPU), looks up the order, generates a friendly response with tracking link (7B on T4, $0.0003). $0.0003/message. Human agent: $0.75/ticket. Charge $0.10/ticket. 500 tickets/day × $0.0997 profit = $49.85/day = $1,495/month from ONE client.
Content Creator's Daily Machine — $2,900/month
6am cron job: scrape 20 news sources (CPU), summarize top 5 (7B on T4, $0.002), generate LinkedIn + tweets + Instagram (14B on A100, $0.005). Send everything to Telegram. Daily cost: $0.008. Sell at $29/month per creator. 100 creators = $2,900/month. Compute: $24/month. Margin: 99.2%.
Real Estate Lead Qualifier — $4,950/month
Lead fills out a form. Agent asks qualifying questions (CPU + Claude, $0.00003). Scores budget, timeline, area. Sends qualified leads to realtor's Telegram with summary. Unqualified get a polite auto-response. $99/month per realtor × 50 realtors = $4,950/month. Compute: ~$250/month.
Student Study Agent — $4,990/month
Student messages: "explain the Krebs cycle." Agent finds the relevant textbook section (embedding search, CPU), explains using the source material (7B on T4, $0.0003), generates practice questions, remembers what they got wrong. $4.99/month. Cost: $0.15/month per student. 1,000 students = $4,990/month. The agent never gets impatient and never judges.
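Finding the relevant textbook section is a similarity search: score every section against the question and keep the best match. A real deployment would use an embedding model. The sketch below swaps in simple word-count vectors with cosine similarity, using only the standard library, so the retrieval step itself is visible. The section texts are made up for the example, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase word -> count (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_section(question: str, sections: dict[str, str]) -> str:
    """Return the title of the section most similar to the question."""
    q = vectorize(question)
    return max(sections, key=lambda title: cosine(q, vectorize(sections[title])))


sections = {
    "Glycolysis": "Glycolysis splits glucose into pyruvate in the cytoplasm.",
    "Krebs cycle": "The Krebs cycle oxidizes acetyl CoA in the mitochondria, releasing carbon dioxide.",
    "Photosynthesis": "Photosynthesis converts light energy into chemical energy in chloroplasts.",
}
print(best_section("explain the Krebs cycle", sections))  # Krebs cycle
```

With real embeddings, only `vectorize` and `cosine` change. The argmax over sections stays the same.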
The Document Factory — $6,000/month
Law firms, accounting firms, medical offices — they all have documents to process. Build one receipt scanner, one contract reviewer, one medical transcriber. Publish each as a headless tool. $200/month per firm × 30 firms = $6,000/month. Each firm processes hundreds of documents. Your cost: GPU seconds per document.
From thought to
passive income.
There has never been a moment like this in history. A single person with an idea can build, deploy, and sell an AI-powered product in an afternoon — without infrastructure, without a team, without raising money. The tools exist. The models are free. The compute is serverless. The only thing between you and recurring revenue is 25 minutes and an idea.
The New Gig Economy
The old gig economy — Uber, DoorDash — you trade your time and your car for $15/hour. Clock in. Clock out. No leverage. The AI gig economy is different: you trade one afternoon of setup for income that runs while you sleep. You don't clock in. You don't clock out. Your agent works 24/7 and you collect the margin.
This isn't a fantasy. This is compute. The cheapest, most scalable product you can sell. No inventory. No shipping. No physical limits. Just CPU cycles and GPU seconds — billed by the second, marked up by you.
Five Ways to Get Paid
The Path — Idea to Revenue in One Afternoon
- 10 minutes — describe what you want to Claude. Get working code. Paste into SeqPU. Hit Run All. It works.
- 5 minutes — click Publish. Define inputs/outputs. Pick GPU. Set markup. Your API is live.
- 5 minutes — create a service token. Or connect a Telegram bot. Or publish as a UI site.
- 5 minutes — tell 10 people. Post on Twitter. Share in a Discord. Email 5 businesses.
- That evening — your first API call comes in. You made money from something you built during lunch.
No investors. No team. No server to maintain. No bill when nobody's using it. Just an idea, Claude, SeqPU, and 25 minutes.
Day One Advantage — Sell the Newest Model
This is unique to SeqPU. A new model drops on HuggingFace — Qwen4, Llama 5, whatever comes next. You download it in the notebook. First run caches it. You test it — it works. Click Publish with 25% markup.
Everyone else is waiting. Waiting for OpenAI to offer API access. Waiting for Anthropic to update their SDK. Waiting for Google to announce pricing. You're already live. Developers who want to try the new model call YOUR endpoint. First mover. First revenue.
The Volume Game — Think in Billions
You don't need to charge a lot. You need to charge a little on something that happens A LOT.
There are things that happen billions of times a day. Emails sent. Messages translated. Images resized. Documents summarized. Receipts scanned. Code reviewed. Every single one is a compute job someone will pay for.
| Micro-service | Global volume | You capture | Your price | Monthly revenue |
|---|---|---|---|---|
| Email subject optimizer | 300B emails/day | 0.0001% | $0.001 | $9,000 |
| Translation endpoint | 100B messages/day | 0.00005% | $0.001 | $1,500 |
| Receipt OCR API | Millions/day | 0.01% | $0.002 | $6,000 |
| Sentiment analysis | Millions of posts/hr | 0.001% | $0.001 | $2,160 |
| Image background remover | Tens of millions/day | 0.01% | $0.005 | $1,500 |
The companies doing $1B ARR aren't charging $1,000 per customer. They're charging fractions of a cent across billions of transactions. Compute is the same game. Find the thing that happens a billion times. Charge almost nothing. Let volume do the math.
The AI Companies' Playbook — And How You Copy It
Look at how the AI companies got rich: OpenAI built a model. Published an API. Charges per call. That's literally what SeqPU lets you do. Same playbook. Same margins.
The difference:
- They spent $100M training their models. You download one from HuggingFace for free.
- They built global infrastructure. You click a button on SeqPU.
- They have 500 engineers. You have Claude.
- They charge $15/M tokens. A 7B on T4 costs $2/M tokens. You charge $5 and everyone's happy.
The Margin Math
| What you build | Your cost | You charge | Volume | Monthly profit |
|---|---|---|---|---|
| Translation API (T4) | $0.001/call | $0.01/call | 10K calls/day | $2,700 |
| Telegram study bot (T4) | $0.15/user/mo | $4.99/mo | 500 students | $2,420 |
| Document summarizer (A100) | $0.006/call | $0.05/call | 2K calls/day | $2,640 |
| Support agent (chained) | $5/client/mo | $99/mo | 30 businesses | $2,820 |
| Content generator (A100) | $0.008/run | $1/article | 100/day | $2,976 |
| New model API (day 1) | $0.005/call | $0.02/call | 5K calls/day | $2,250 |
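Every row uses the same formula: (price − cost) × volume, with daily volumes scaled to a 30-day month. Here is a worked check of three rows. The 30-day month is inferred from the table's own figures, and `monthly_profit` is a hypothetical helper.

```python
def monthly_profit(price: float, cost: float, units: float, per_day: bool = True) -> float:
    """Profit per month: margin per unit times units (daily units scaled by 30 days)."""
    days = 30 if per_day else 1
    return (price - cost) * units * days


rows = {
    "Translation API (T4)": monthly_profit(0.01, 0.001, 10_000),
    "Telegram study bot (T4)": monthly_profit(4.99, 0.15, 500, per_day=False),
    "Support agent (chained)": monthly_profit(99, 5, 30, per_day=False),
}
for name, profit in rows.items():
    print(f"{name}: ${profit:,.0f}")
```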
Every row is something you can build in an afternoon. Every row is passive income.
Passive Income — Build Once, Earn Forever
This isn't trading hours for dollars. You build it once. It runs forever.
- Your published tool runs 24/7 without you
- Credits checked and deducted automatically — you don't manage billing
- Auto-scales — GPUs spin up when calls come in, spin down when idle
- Zero cost when nobody's calling — serverless means you only pay when revenue comes in
- Update anytime — edit your notebook, re-publish, the endpoint updates
Stack them. Your first tool makes $200/month. Your second makes $500. Your third makes $1,000. By tool five you're at $3,000/month passive. Each one took an afternoon. No boss. No schedule. No commute. No cap on earnings.
Web 3.0 — The Compute Web
Forget what you've heard about Web3. This is what Web 3.0 actually is:
- Web 1.0 — the read web. Static pages. You consumed content.
- Web 2.0 — the social web. People interacting with people. You posted, others responded.
- Web 3.0 — the compute web. Machines interacting with machines. AI agents calling other AI agents. Services processing, deciding, acting — without a human clicking anything.
The websites of Web 3.0 aren't pages you visit. They're endpoints you call. They're agents. They're pipelines. And they charge per call.
Every headless tool you publish on SeqPU is a Web 3.0 service. It exists on the internet, it responds to requests, it charges money, and it runs without you. That's not a side project. That's a business.
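A tool that charges per call needs a check-then-deduct step around every request. SeqPU does this for you on published tools. The sketch below is only a hypothetical in-memory illustration of the idea, not SeqPU's implementation. The `CreditLedger` class, the account name and the prices are all made up.

```python
from dataclasses import dataclass, field
from decimal import Decimal

class InsufficientCredits(Exception):
    pass

@dataclass
class CreditLedger:
    """Hypothetical in-memory ledger: dollar balances keyed by caller."""
    balances: dict = field(default_factory=dict)

    def charge(self, caller: str, price: Decimal) -> Decimal:
        """Check the balance first, then deduct; returns the remaining credit."""
        balance = self.balances.get(caller, Decimal("0"))
        if balance < price:
            raise InsufficientCredits(f"{caller} has ${balance}, call costs ${price}")
        self.balances[caller] = balance - price
        return self.balances[caller]

def handle_call(ledger: CreditLedger, caller: str, text: str, price: Decimal) -> str:
    """Charge before doing the work so unpaid calls never use compute."""
    ledger.charge(caller, price)
    return text.upper()  # stand-in for the real model call

ledger = CreditLedger({"alice": Decimal("0.02")})
print(handle_call(ledger, "alice", "hola", Decimal("0.01")))
print(ledger.balances["alice"])
```

`Decimal` keeps sub-cent prices exact. Repeated float subtraction would drift.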
We Are Here With You
This is new for everyone. Nobody was born knowing how to build AI agents. We are building this community from the ground up — hackathons, shared tools, open examples, real support. The people in this community right now are the early ones. The ones who will look back and say "I was there before it was obvious."
If you're reading this and you feel it (not just understand it, but feel it), this is your moment. A normal person can now play the same games the hyperscalers play. The door is open; you just need to know what to do with it.
Start Now
Open the notebook. Describe what you want to Claude. Paste the code. Hit Run All. Click Publish. Share the link. Start charging.
That's the whole path. No servers. No DevOps. No infrastructure. No token bills. Just ideas, compute, and you.
I have no idea
where to start.
You don't need to know how to code. You don't need a business plan. You don't need to understand GPUs. Pick one of these. Copy the code. Hit Run All. See what happens. Then hit Publish — and it's live for the world.
from vllm import LLM, SamplingParams
from pypdf import PdfReader  # PDFs are binary, so extract the text layer first
import smtplib, os
from email.mime.text import MIMEText
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(max_tokens=1024)
# Step 1: Read and summarize
reader = PdfReader("/data/report.pdf")
doc = "\n".join(page.extract_text() or "" for page in reader.pages)
result = llm.generate([f"Summarize this document in 5 key bullet points:\n{doc[:8000]}"], params)
summary = result[0].outputs[0].text
# Step 2: Email it
msg = MIMEText(f"Here's your summary:\n\n{summary}")
msg["Subject"] = "Document Summary - Ready to Review"
msg["From"] = os.environ["EMAIL"]
msg["To"] = "team@example.com"  # placeholder: replace with your team's address
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(os.environ["EMAIL"], os.environ["EMAIL_PASS"])
    server.send_message(msg)
print("Summary sent to team")
import requests, os, hashlib
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
SCHOOL_URL = "https://your-school.edu/announcements"
# Step 1: Fetch page
page = requests.get(SCHOOL_URL, timeout=10).text
new_hash = hashlib.md5(page.encode()).hexdigest()
old_hash = open("/data/last_hash.txt", "r").read().strip() if os.path.exists("/data/last_hash.txt") else ""
# Step 2: If changed, analyze what's new
if new_hash != old_hash:
    result = llm.generate([f"What's new on this school page? Summarize any new announcements, assignments, or deadlines:\n{page[:4000]}"], params)
    requests.post(WEBHOOK, json={"text": f"School update:\n{result[0].outputs[0].text}"})
    with open("/data/last_hash.txt", "w") as f:
        f.write(new_hash)
import requests, os
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(max_tokens=2048)
dest = "Tokyo"
days = 7
budget = "$3000"
interests = "food, temples, anime, nightlife"
# Step 1: Search for flight info
flights = requests.post("https://google.serper.dev/search",
                        json={"q": f"cheapest flights to {dest} from LAX next month"},
                        headers={"X-API-KEY": os.environ["SERPER_KEY"]},
                        timeout=15).json()
flight_info = "\n".join([r.get("snippet", "") for r in flights.get("organic", [])[:3]])
# Step 2: Build the full plan (no weather API is called; the model relies on what it knows about the destination)
result = llm.generate([f"""Plan a {days}-day trip to {dest}.
Budget: {budget}
Interests: {interests}
Flight info found:\n{flight_info}
Generate:
1. Day-by-day itinerary with specific places
2. Restaurant recommendations for each day
3. Transportation tips
4. Packing list based on weather
5. Budget breakdown"""], params)
with open("/outputs/trip_plan.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Trip plan ready")
import whisper
from vllm import LLM, SamplingParams
# Step 1: Transcribe
model_whisper = whisper.load_model("base")
transcript = model_whisper.transcribe("/data/voice_memo.m4a")["text"]
print(f"Transcribed: {len(transcript)} chars")
# Step 2: Blog post
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.7, max_tokens=2048)
blog = llm.generate([f"Rewrite this voice memo as a polished blog post with a title, subheadings, and clean paragraphs:\n{transcript}"], params)
# Step 3: Social posts
social = llm.generate([f"From this blog post, generate:\n- 10 tweets (under 280 chars each)\n- 3 LinkedIn posts\n- 2 Instagram captions\n\nBlog:\n{blog[0].outputs[0].text[:3000]}"], params)
with open("/outputs/blog.txt", "w") as f:
    f.write(blog[0].outputs[0].text)
with open("/outputs/social.txt", "w") as f:
    f.write(social[0].outputs[0].text)
print("Blog + 15 social posts generated from one voice memo")
import requests, os, difflib
from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
URL = "https://competitor.com/pricing"
# Step 1: Fetch current page
current = requests.get(URL, timeout=10).text
prev_file = Path("/data/competitor_last.txt")
previous = prev_file.read_text() if prev_file.exists() else ""
# Step 2: Diff
if current != previous:
    # Diff the whole page; the prompt below already truncates the diff to 3,000 chars
    diff = "\n".join(difflib.unified_diff(previous.splitlines(), current.splitlines(), lineterm=""))
    # Step 3: Analyze what changed and why it matters
    result = llm.generate([f"A competitor's website changed. Analyze what's different and what it means:\n{diff[:3000]}"], params)
    requests.post(WEBHOOK, json={"text": f"Competitor change detected:\n{result[0].outputs[0].text[:500]}"}, timeout=10)
    prev_file.write_text(current)
import json, requests, os
from vllm import LLM, SamplingParams
from datetime import datetime
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.5, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
reviews = json.load(open("/data/new_reviews.json"))
sentiment_log = json.load(open("/data/sentiment_log.json")) if os.path.exists("/data/sentiment_log.json") else []
for review in reviews:
    # Step 1: Draft response
    resp = llm.generate([f"Write a professional response to this {review['stars']}-star review: {review['text'][:500]}"], params)
    print(f"{'⭐' * review['stars']} Response: {resp[0].outputs[0].text[:200]}")
    # Step 2: Score sentiment
    score = llm.generate([f"Score sentiment 1-10: {review['text'][:300]}\nReturn just the number."], params)
    sentiment_log.append({"date": datetime.now().isoformat(), "score": score[0].outputs[0].text.strip(), "stars": review["stars"]})
# Step 3: Save tracking data
with open("/data/sentiment_log.json", "w") as f:
    json.dump(sentiment_log, f)
print(f"Processed {len(reviews)} reviews, {len(sentiment_log)} total tracked")
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.7, max_tokens=2048)
fridge = "chicken thighs, rice, broccoli, eggs, butter, garlic, soy sauce, pasta, canned tomatoes, onions, cheese, tortillas"
people = 4
restrictions = "no shellfish"
result = llm.generate([f"""I have these ingredients: {fridge}
Cooking for: {people} people
Restrictions: {restrictions}
Generate:
1. 7 dinner recipes using ONLY these ingredients where possible
2. For each meal: name, cook time, difficulty (easy/medium)
3. List of missing ingredients needed
4. Grocery list grouped by store aisle
5. Estimated grocery cost for missing items"""], params)
with open("/outputs/meal_plan.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Week of meals planned")
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.5, max_tokens=1024)
question = "How do I solve 3x + 7 = 22?"
subject = "Algebra"
# Step 1: Explain
explanation = llm.generate([f"You are a patient {subject} tutor. Explain how to solve this step by step. Don't give the final answer — guide them:\n{question}"], params)
print("EXPLANATION:", explanation[0].outputs[0].text)
# Step 2: Practice problems
problems = llm.generate([f"Generate 3 similar practice problems to: {question}\nJust the problems, no answers."], params)
print("\nPRACTICE:", problems[0].outputs[0].text)
# Step 3: Grade (student submits answers)
student_answers = "1) x=4 2) x=7 3) x=3"
grading = llm.generate([f"Grade these answers:\nProblems: {problems[0].outputs[0].text}\nStudent answers: {student_answers}\n\nFor each: correct/incorrect, show the right answer, explain the mistake if wrong."], params)
print("\nGRADING:", grading[0].outputs[0].text)import json, os, requests
import re
from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.4, max_tokens=2048)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
my_resume = Path("/data/my_resume.txt").read_text()
jobs = json.load(open("/data/scraped_jobs.json"))
out_dir = Path("/outputs/applications")
out_dir.mkdir(parents=True, exist_ok=True)
applications = []
for i, job in enumerate(jobs[:20]):
    # Step 1: Score match
    score = llm.generate([f"Score this job match 0-100 for this resume. Return just the number.\nResume: {my_resume[:1500]}\nJob: {job['title']} - {job['desc'][:500]}"], params)
    # The reply may wrap the number in words, so take the first integer it contains
    match = re.search(r"\d+", score[0].outputs[0].text)
    if not match or int(match.group()) < 70:
        continue
    # Step 2: Rewrite resume for this job
    resume = llm.generate([f"Rewrite this resume to target this specific job. Match keywords naturally.\nResume: {my_resume[:2000]}\nJob: {job['desc'][:1000]}"], params)
    # Step 3: Write cover letter
    cover = llm.generate([f"Write a cover letter for {job['title']} at {job['company']}. Reference their specific work. Under 250 words.\nResume: {my_resume[:1000]}\nJob: {job['desc'][:800]}"], params)
    # Save both drafts so they're ready to review
    (out_dir / f"{i:02d}_resume.txt").write_text(resume[0].outputs[0].text)
    (out_dir / f"{i:02d}_cover_letter.txt").write_text(cover[0].outputs[0].text)
    applications.append(f"{job['title']} at {job['company']}")
if applications:
    requests.post(WEBHOOK, json={"text": f"Good morning! {len(applications)} applications ready:\n" + "\n".join(applications)}, timeout=10)
# This isn't a script. This is the moment.
#
# 1. Pick any example from this page
# 2. Paste the code into a SeqPU notebook cell
# 3. Hit Run All and watch it work
# 4. Hit Publish → choose UI Site or Headless URL
# 5. Set your markup (10-30%)
# 6. Share the link
#
# You're now running a business.
#
# The meal planner? $4.99/month. 2,000 users = $9,980/month in revenue.
# The resume rewriter? $15/rewrite on Fiverr. Cost: $0.005.
# The review responder? $79/month per business. 30 businesses = $2,370/month.
# The job pipeline? $29/month. 1,000 job seekers = $29,000/month.
#
# The code is the same whether 1 person uses it or 10,000.
# The GPU scales automatically.
# You never touch a server.
#
# What are you waiting for?
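Step 5 in the comment sets a markup of 10-30%. The page does not give the exact formula. This sketch assumes the markup is a percentage added on top of the per-call compute cost. It then reproduces the subscription revenue figures from the comment. Those figures are revenue before compute costs.

```python
from decimal import Decimal, ROUND_HALF_UP

def price_with_markup(compute_cost: Decimal, markup_pct: Decimal) -> Decimal:
    """Assumed formula: caller price = compute cost * (1 + markup%)."""
    price = compute_cost * (1 + markup_pct / 100)
    return price.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)

def monthly_revenue(price: Decimal, customers: int) -> Decimal:
    """Subscription revenue before compute costs."""
    return price * customers

cost = Decimal("0.005")  # the resume rewriter's per-run cost from the comment
for pct in (10, 20, 30):
    print(f"{pct}% markup -> ${price_with_markup(cost, Decimal(pct))} per run")

print(monthly_revenue(Decimal("4.99"), 2000))  # meal planner
print(monthly_revenue(Decimal("79"), 30))      # review responder
print(monthly_revenue(Decimal("29"), 1000))    # job pipeline
```

On a $0.005 run, a 10-30% markup is a fraction of a cent. The large figures in the comment come from subscription pricing, not from markup.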
Business Tools
from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(max_tokens=1024)
transcript = Path("/data/meeting.txt").read_text()
result = llm.generate([f"""Extract all action items.
Format: [OWNER] - [ACTION] - [DEADLINE if mentioned]
Transcript: {transcript[:6000]}"""], params)
print(result[0].outputs[0].text)
from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(max_tokens=1024)
transcript = Path("/data/transcript.txt").read_text()
result = llm.generate([f"""Analyze this YouTube transcript:
1. Chapters with timestamps
2. 5 key takeaways
3. One paragraph summary
Transcript: {transcript[:8000]}"""], params)
print(result[0].outputs[0].text)
# Layer 1: CPU watches RSS feeds (almost free)
import feedparser, requests, os
MY_DOMAINS = ["GPU compute", "AI infrastructure", "LLM inference"]
LAYER2 = os.environ["LAYER2_ENDPOINT"]
feed = feedparser.parse("https://youtube.com/feeds/videos.xml?channel_id=YOUR_ID")
for entry in feed.entries[:10]:
    if any(d.lower() in entry.title.lower() for d in MY_DOMAINS):
        requests.post(LAYER2, json={"url": entry.link, "title": entry.title}, timeout=10)
# Layer 2: T4 transcribes + summarizes
# Layer 3: 32B specialist with your full knowledge base
Infrastructure
# Schedule: "0 7 * * *" (cron syntax: every day at 07:00) in Publish → Schedule
# Add keys in Secrets panel — never in code
import os
from vllm import LLM, SamplingParams
from pathlib import Path
api_key = os.environ["MY_API_KEY"]  # keys added in the Secrets panel are read from the environment
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(max_tokens=512)
data = Path("/data/input.txt").read_text()
result = llm.generate([f"Process this: {data}"], params)
Path("/data/output.txt").write_text(result[0].outputs[0].text)
Agents
import imaplib, email, os
from email import policy
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.3, max_tokens=1024)
SYSTEM = """Draft email replies on behalf of [Your Name].
Tone: direct, friendly, gets to the point fast.
Flag anything needing human judgment."""
mail = imaplib.IMAP4_SSL("imap.gmail.com")
mail.login(os.environ["EMAIL"], os.environ["EMAIL_PASS"])
mail.select("inbox")
_, msgs = mail.search(None, "UNSEEN")
for uid in msgs[0].split()[:10]:
    # BODY.PEEK[] fetches the whole message without marking it as read
    _, data = mail.fetch(uid, "(BODY.PEEK[])")
    msg = email.message_from_bytes(data[0][1], policy=policy.default)
    # Multipart emails have no single top-level payload; ask for the plain-text part
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part else ""
    draft = llm.generate([f"{SYSTEM}\n\nEmail:\n{body[:2000]}"], params)
    print(draft[0].outputs[0].text)
mail.logout()
from vllm import LLM, SamplingParams
import json, re
SPECIALISTS = {
    "legal": "You are a legal expert. Answer only legal questions...",
    "finance": "You are a financial analyst. Answer only finance questions...",
    "medical": "You are a medical expert. Answer only medical questions...",
    "tech": "You are a senior engineer. Answer only technical questions...",
}
ROUTER = """Which domain? legal, finance, medical, tech
Return JSON: {"domain": str}"""
# One vLLM engine plays both coordinator and specialist: two engines in one
# process would each try to reserve ~90% of GPU memory
llm = LLM(model="Qwen/Qwen3-4B")
params = SamplingParams(max_tokens=512)
question = "What are the tax implications of issuing employee stock options?"
route = llm.generate([f"{ROUTER}\nQuestion: {question}"], params)
domain = "tech"  # fallback when the router reply can't be parsed
match = re.search(r"\{.*?\}", route[0].outputs[0].text, re.DOTALL)
if match:
    try:
        domain = str(json.loads(match.group()).get("domain", domain)).lower()
    except json.JSONDecodeError:
        pass
if domain not in SPECIALISTS:
    domain = "tech"
answer = llm.generate([f"{SPECIALISTS[domain]}\nQ: {question}"], params)
print(f"[{domain.upper()}]", answer[0].outputs[0].text)
Live Feeds & IOT
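The camera example in this section describes a layered cascade. A CPU watcher wakes a T4 classifier (Layer 2), which escalates to Layer 3 when its confidence is below 0.8. Below is a minimal sketch of that routing rule. The functions `layer2_classify` and `layer3_classify` are hypothetical stubs with made-up outputs, standing in for the real models.

```python
from typing import Callable, Tuple

LABELS = {"deer", "elk", "bear", "human", "vehicle", "unknown"}
ESCALATE_BELOW = 0.8  # Layer 2 hands off to Layer 3 under this confidence

Classifier = Callable[[bytes], Tuple[str, float]]

def classify_event(frame: bytes, layer2: Classifier, layer3: Classifier) -> Tuple[str, str]:
    """Return (label, deciding layer). Only uncertain frames reach Layer 3."""
    label, confidence = layer2(frame)
    if confidence >= ESCALATE_BELOW and label in LABELS:
        return label, "layer2"
    label, _ = layer3(frame)
    return (label if label in LABELS else "unknown"), "layer3"

# Hypothetical stubs: a real Layer 2 would be the T4 classifier, Layer 3 the larger model
def layer2_classify(frame: bytes) -> Tuple[str, float]:
    return ("deer", 0.93) if frame.startswith(b"clear") else ("elk", 0.55)

def layer3_classify(frame: bytes) -> Tuple[str, float]:
    return ("elk", 0.97)

print(classify_event(b"clear-shot", layer2_classify, layer3_classify))
print(classify_event(b"blurry-night", layer2_classify, layer3_classify))
```

The expensive model only runs on frames the cheap one is unsure about, so most events never pay for Layer 3.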
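import pyaudio, numpy as np, requests, os, time
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
THRESHOLD = 65  # uncalibrated level: 20*log10(RMS) in int16 units, not true dB SPL
COOLDOWN = 30  # seconds between alerts so one loud noise doesn't flood the chat
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1,
                rate=44100, input=True, frames_per_buffer=1024)
last_alert = 0.0
while True:
    data = np.frombuffer(stream.read(1024, exception_on_overflow=False), dtype=np.int16)
    # Square in float: int16 * int16 overflows and corrupts the RMS
    rms = np.sqrt(np.mean(data.astype(np.float64) ** 2))
    db = 20 * np.log10(rms + 1e-10)
    if db > THRESHOLD and time.time() - last_alert > COOLDOWN:
        requests.post(WEBHOOK, json={"text": f"Alert: {db:.0f}dB"}, timeout=10)
        last_alert = time.time()
# Layer 1: CPU watcher (always on, always cheap)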
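import cv2, requests, os
LAYER2 = os.environ["LAYER2_ENDPOINT"]
cap = cv2.VideoCapture("/data/feed.mp4")  # or RTSP URL
prev = None
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # compare one channel, not three
    if prev is not None:
        # Fraction of pixels whose brightness changed by more than 25 (an assumed
        # per-pixel threshold); a raw mean difference runs 0-255, where 0.15 is just noise
        changed = (cv2.absdiff(prev, gray) > 25).mean()
        if changed > 0.15:
            requests.post(LAYER2, json={"event": "motion"}, timeout=10)
    prev = gray
# Layer 2: T4 classifier (wakes on demand)
# Domain context, not a naive general model
# Classifies: deer, elk, bear, human, vehicle, unknown
# Escalates to Layer 3 if confidence < 0.8
Creative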
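from diffusers import StableDiffusionXLPipeline
from vllm import LLM, SamplingParams
import torch, re
# SDXL checkpoints load with the XL pipeline class
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")
# Vision-language judge; its memory share is capped so it fits beside SDXL on a large GPU (e.g. A100 80GB)
observer = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct", max_model_len=4096,
               gpu_memory_utilization=0.6)
obs_params = SamplingParams(temperature=0.0, max_tokens=10)
# Qwen2.5-VL chat format with an image placeholder; the image itself goes in multi_modal_data
SCORE_PROMPT = ("<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
                "Score this image 1-10 for quality. Reply with just the number.<|im_end|>\n"
                "<|im_start|>assistant\n")
prompts = [
    "a futuristic city at dawn, cinematic",
    "the same city at noon, golden hour",
    "the city at night, neon reflections",
]
good_frames = []
for prompt in prompts:
    image = pipe(prompt, num_inference_steps=30).images[0]
    # The observer must actually see the image it scores
    score_raw = observer.generate({"prompt": SCORE_PROMPT, "multi_modal_data": {"image": image}}, obs_params)
    match = re.search(r"\d+", score_raw[0].outputs[0].text)
    score = int(match.group()) if match else 0
    if score >= 7:
        good_frames.append(image)
print(f"Kept {len(good_frames)}/{len(prompts)} frames")
from TTS.api import TTS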
from pathlib import Path
import subprocess
# Generate voiceover from script
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
script = Path("/data/script.txt").read_text()
tts.tts_to_file(text=script, file_path="/data/voiceover.wav")
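# Assemble frames + audio into final video via ffmpeg
Path("/data/output").mkdir(parents=True, exist_ok=True)  # ffmpeg won't create the folder
subprocess.run([
    "ffmpeg", "-y",  # overwrite a previous render instead of prompting
    "-framerate", "24",
    "-i", "/data/frames/frame_%04d.png",
    "-i", "/data/voiceover.wav",
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",  # widely playable pixel format
    "-c:a", "aac",
    "-shortest",
    "/data/output/final.mp4"
], check=True)  # raise if ffmpeg fails instead of printing a false success
print("Video assembled at /data/output/final.mp4")
Money Makers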
Start a Business Today
Every example below is a starter business. Copy the code, hit Run All, publish it, start charging. The math is in every description.
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=4096)
topic = "10 Ways to Improve Your Morning Routine"
tone = "conversational, actionable, SEO-optimized"
result = llm.generate([f"""Write a 2000-word blog post.
Topic: {topic}
Tone: {tone}
Include: intro hook, subheadings, bullet points, conclusion with CTA
Optimize for SEO with natural keyword placement."""], params)
with open("/outputs/article.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Article written:", len(result[0].outputs[0].text), "chars")
from diffusers import StableDiffusionXLPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
).to("cuda")
business = "Modern coffee shop called Brew Lab"
styles = ["minimal flat", "vintage hand-drawn", "geometric abstract",
"lettermark bold", "mascot playful"]
for i, style in enumerate(styles):
    prompt = f"Professional logo design for {business}, {style} style, white background, vector clean"
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"/outputs/logo_{i+1}_{style.replace(' ','_')}.png")
    print(f"Generated: {style}")
print("5 logo concepts saved to /outputs/")
from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.3, max_tokens=2048)
resume = Path("/data/resume.txt").read_text()
job_posting = Path("/data/job_posting.txt").read_text()
result = llm.generate([f"""Rewrite this resume to perfectly target the job posting below.
- Match keywords from the posting naturally
- Reframe experience to highlight relevant skills
- Keep it honest — enhance, don't fabricate
- Professional format, clean and scannable
RESUME:
{resume[:3000]}
JOB POSTING:
{job_posting[:2000]}"""], params)
with open("/outputs/rewritten_resume.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Resume rewritten and saved")
from vllm import LLM, SamplingParams
import json
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=1024)
product = {
"name": "Bamboo Cutting Board Set",
"features": "3 sizes, juice groove, non-slip feet, organic bamboo",
"target": "home cooks, gift buyers",
"platform": "Amazon"
}
result = llm.generate([f"""Write a {product['platform']} product listing.
Product: {product['name']}
Features: {product['features']}
Target buyer: {product['target']}
Include: SEO title (under 200 chars), 5 bullet points, description paragraph.
Use natural keywords. No keyword stuffing."""], params)
print(result[0].outputs[0].text)
from TTS.api import TTS
from pathlib import Path
# Clone voice from sample
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")
voice_sample = "/data/client_voice_sample.wav"
script = Path("/data/script.txt").read_text()
tts.tts_to_file(
text=script,
file_path="/outputs/voiceover.wav",
speaker_wav=voice_sample,
language="en"
)
print(f"Voiceover generated: {len(script)} chars of script")
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.5, max_tokens=2048)
SUBJECT = "Algebra 1"
SYSTEM = f"""You are a patient, encouraging {SUBJECT} tutor.
- Explain concepts step by step
- Use simple language, then build complexity
- Give practice problems after each concept
- If the student is wrong, guide them — don't give the answer
- Celebrate when they get it right"""
student_question = "I don't understand how to solve 2x + 5 = 13"
result = llm.generate([f"{SYSTEM}\n\nStudent: {student_question}"], params)
print(result[0].outputs[0].text)
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.7, max_tokens=1024)
details = {
"role": "Best man",
"couple": "Jake and Sarah",
"relationship": "Jake's college roommate, 8 years",
"stories": "Road trip where we got lost in Montana, Jake's terrible cooking phase",
"tone": "funny but heartfelt, not roast-level",
"length": "3-4 minutes"
}
result = llm.generate([f"""Write a {details['role']} speech for {details['couple']}'s wedding.
Relationship: {details['relationship']}
Stories to include: {details['stories']}
Tone: {details['tone']}
Length: {details['length']}
Structure: opening hook, 1-2 stories, transition to sincerity, toast."""], params)
print(result[0].outputs[0].text)
from vllm import LLM, SamplingParams
import json
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.7, max_tokens=256)
gifts = json.load(open("/data/gift_list.json"))
for gift in gifts:
    result = llm.generate([f"""Write a warm, personal thank you note.
From: Sarah and Jake
To: {gift['from']}
Gift: {gift['item']}
Relationship: {gift['relationship']}
Rules: mention the specific gift, reference the relationship,
keep under 80 words, sound genuine not generic."""], params)
    print(f"To {gift['from']}:")
    print(result[0].outputs[0].text)
    print("---")
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.5, max_tokens=512)
item = {
"brand": "Lululemon",
"type": "Align High-Rise Pant 25\"",
"size": "6",
"color": "Dark Olive",
"condition": "Like new, worn twice",
"flaws": "None",
"platform": "Poshmark"
}
result = llm.generate([f"""Write a {item['platform']} listing that sells.
Brand: {item['brand']}
Item: {item['type']}
Size: {item['size']}, Color: {item['color']}
Condition: {item['condition']}
Flaws: {item['flaws']}
Include: SEO title (what buyers search), detailed description,
measurements reminder, styling suggestion. Make them click Buy Now."""], params)
print(result[0].outputs[0].text)
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.3, max_tokens=2048)
incident = {
"type": "Water damage - burst pipe",
"date": "March 15, 2026",
"location": "Kitchen and living room",
"damage": "Hardwood floors warped, drywall water stained, cabinet bottoms swollen",
"actions_taken": "Shut off water main, called plumber, took photos",
"estimated_cost": "$8,000-12,000"
}
result = llm.generate([f"""Write a formal insurance claim letter.
Incident: {incident['type']}
Date: {incident['date']}
Location: {incident['location']}
Damage: {incident['damage']}
Actions taken: {incident['actions_taken']}
Estimated cost: {incident['estimated_cost']}
Include: policy reference placeholder, chronological account,
itemized damage list, documentation references, requested next steps.
Professional tone. Maximize clarity for the adjuster."""], params)
with open("/outputs/claim.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Insurance claim drafted")
Trading & Markets
Be Smarter Than the Crowd
Read filings before Bloomberg. Score sentiment before the price moves. Explain options flow in plain English. Your private edge.
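Several tools in this section ask the model to "Return JSON". Model replies often wrap that JSON in prose or code fences, so calling `json.loads` on the raw text can fail. The defensive parser below is a sketch. It tries the whole reply first, then uses the standard library's `JSONDecoder.raw_decode` to parse the first complete `{...}` object it finds. The helper name `extract_json` is our own.

```python
import json
from typing import Any, Optional

def extract_json(text: str) -> Optional[Any]:
    """Parse a model reply as JSON, or return the first {...} object inside it.

    Returns None when no valid JSON object is found.
    """
    try:
        return json.loads(text)  # the reply is already pure JSON
    except json.JSONDecodeError:
        pass
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char == "{":
            try:
                obj, _ = decoder.raw_decode(text, start)  # parse one object from here
                return obj
            except json.JSONDecodeError:
                continue  # not a valid object at this brace; keep scanning
    return None

reply = 'Sure! Here is the score:\n```json\n{"coin": "BTC", "sentiment": "bullish", "confidence": 85}\n```'
print(extract_json(reply))
```

Callers should still treat `None` as "skip this item" instead of crashing the whole run.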
from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.2, max_tokens=2048)
transcript = Path("/data/earnings_call.txt").read_text()
result = llm.generate([f"""Analyze this earnings call transcript.
Extract:
1. Revenue vs expectations (beat/miss/in-line)
2. EPS vs expectations
3. Guidance changes (raised/lowered/maintained)
4. Sentiment shift from last quarter
5. Key surprises the market hasn't priced in
6. Executive tone (confident/cautious/defensive)
7. Red flags or unusual language
Transcript:
{transcript[:12000]}"""], params)
with open("/outputs/earnings_report.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Earnings analysis complete")
import requests, os, feedparser
from vllm import LLM, SamplingParams
EDGAR_RSS = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=8-K&dateb=&owner=include&count=20&search_text=&start=0&output=atom"
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
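llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(max_tokens=512)
# SEC.gov rejects requests without a descriptive User-Agent; the contact below is a placeholder
HEADERS = {"User-Agent": "YourName your-email@example.com"}
feed = feedparser.parse(requests.get(EDGAR_RSS, headers=HEADERS, timeout=15).content)
for entry in feed.entries[:5]:
    filing_url = entry.link
    resp = requests.get(filing_url, headers=HEADERS, timeout=15)
    if resp.ok:
        result = llm.generate([f"""Read this SEC filing. Extract:
1. Company name and ticker
2. Filing type
3. Material changes or events
4. Is this market-moving? (yes/no and why)
End with a final line exactly: MARKET_MOVING: yes or MARKET_MOVING: no
Filing:
{resp.text[:8000]}"""], params)
        analysis = result[0].outputs[0].text
        # Check the explicit verdict line; a bare "yes" can appear anywhere in the analysis
        if "market_moving: yes" in analysis.lower():
            requests.post(WEBHOOK, json={"text": f"SEC ALERT:\n{analysis[:500]}"}, timeout=10)
import requests, json, os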
from vllm import LLM, SamplingParams
import re
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
coins = ["BTC", "ETH", "SOL", "DOGE"]
for coin in coins:
    # Fetch recent mentions (replace with your scraper)
    resp = requests.get(f"https://your-scraper.com/api/{coin}/mentions?limit=50", timeout=10)
    mentions = resp.json()
    text_block = "\n".join([m["text"][:200] for m in mentions[:20]])
    result = llm.generate([f"""Score sentiment for {coin}.
Return JSON: {{"coin": str, "sentiment": "bullish"/"bearish"/"neutral", "confidence": 0-100, "reason": str}}
Recent mentions:
{text_block}"""], params)
    # Models often wrap JSON in extra text; parse only the {...} span and skip bad replies
    match = re.search(r"\{.*\}", result[0].outputs[0].text, re.DOTALL)
    try:
        score = json.loads(match.group()) if match else {}
        confidence = float(score.get("confidence", 0))
    except (json.JSONDecodeError, TypeError, ValueError):
        continue
    if confidence > 80:
        requests.post(WEBHOOK, json={"text": f"{coin}: {score.get('sentiment')} ({confidence:.0f}%) - {score.get('reason', '')}"}, timeout=10)
from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.3, max_tokens=1024)
flow_data = Path("/data/options_flow.csv").read_text()
result = llm.generate([f"""Analyze this unusual options activity.
For each significant trade explain:
1. What the trade is (calls/puts, strike, expiry)
2. What it likely means (bullish bet, hedge, income)
3. Why it matters (size relative to open interest)
4. What the trader might know
Be specific. No generic disclaimers.
Flow data:
{flow_data[:6000]}"""], params)
print(result[0].outputs[0].text)
import requests, json, os, re
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.3, max_tokens=1024)
# Fetch active Polymarket contracts
markets = requests.get("https://gamma-api.polymarket.com/markets?active=true&limit=20", timeout=15).json()
for market in markets:
    question = market["question"]
    current_odds = market["outcomePrices"]
    # Search for recent news (Serper takes the query as a JSON body on a POST request)
    news = requests.post("https://google.serper.dev/search",
                         json={"q": question, "num": 3},
                         headers={"X-API-KEY": os.environ["SERPER_KEY"]},
                         timeout=15).json()
    snippets = "\n".join([r.get("snippet", "") for r in news.get("organic", [])])
    result = llm.generate([f"""Prediction market question: {question}
Current odds: {current_odds}
Recent news:\n{snippets}
Is this market mispriced? What probability would you assign based on the news?
Return JSON: {{"mispriced": bool, "market_odds": float, "your_odds": float, "edge": float, "reasoning": str}}"""], params)
    match = re.search(r"\{.*\}", result[0].outputs[0].text, re.DOTALL)
    try:
        analysis = json.loads(match.group()) if match else {}
    except json.JSONDecodeError:
        continue  # skip replies that aren't valid JSON
    if analysis.get("mispriced"):
        print(f"EDGE: {question[:60]} | Market: {analysis.get('market_odds')} | Yours: {analysis.get('your_odds')}")
from vllm import LLM, SamplingParams
import json
llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.3, max_tokens=2048)
criteria = "Value stocks: P/E under 20, revenue growing, positive free cash flow, not financials"
stocks_data = open("/data/stock_fundamentals.json").read()
result = llm.generate([f"""Screen these stocks against the criteria.
For each that passes, explain WHY.
For each that almost passes, explain what disqualifies it.
Look deeper than surface numbers.
Criteria: {criteria}
Stock data:
{stocks_data[:10000]}"""], params)
print(result[0].outputs[0].text)
import requests, json, os
import re
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
products = json.load(open("/data/tracked_products.json"))

def parse_price(html, pattern):
    # JSON can't store Python functions, so each store entry carries a regex instead;
    # "price_regex" is a hypothetical field with one capture group around the number
    match = re.search(pattern, html)
    return float(match.group(1).replace(",", "")) if match else None

for product in products:
    prices = {}
    for store in product["stores"]:
        resp = requests.get(store["url"], headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
        price = parse_price(resp.text, store["price_regex"])
        if price is not None:
            prices[store["name"]] = price
    if len(prices) < 2:
        continue  # need at least two prices to compare
    spread = max(prices.values()) - min(prices.values())
    if spread > 10:
        cheapest = min(prices, key=prices.get)
        highest = max(prices, key=prices.get)
        result = llm.generate([f"""Arbitrage opportunity:
Product: {product['name']}
Buy at {cheapest}: ${prices[cheapest]}
Sell at {highest}: ${prices[highest]}
Spread: ${spread:.2f}
Is this real? Account for shipping, fees, return risk.
Return JSON: {{"real": bool, "net_profit": float, "risk": str}}"""], params)
        print(result[0].outputs[0].text)
import requests, json, os
import re
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
# Fetch expiring domains (replace with your source)
raw = requests.get("https://member.expireddomains.net/export/expiring/", timeout=15).text
domains = [d.strip() for d in raw.split("\n") if d.strip()]
for domain in domains[:100]:
    result = llm.generate([f"""Score this expiring domain for flip potential.
Domain: {domain}
Score 0-100 on: brandability, keyword value, industry relevance,
memorability, extension quality.
Return JSON: {{"domain": str, "score": int, "estimated_value": str, "best_industry": str, "reasoning": str}}"""], params)
    match = re.search(r"\{.*\}", result[0].outputs[0].text, re.DOTALL)
    try:
        score = json.loads(match.group()) if match else {}
        points = int(score.get("score", 0))
    except (json.JSONDecodeError, TypeError, ValueError):
        continue  # skip replies that aren't valid JSON
    if points > 75:
        requests.post(WEBHOOK, json={"text": f"DOMAIN: {domain}\nScore: {points}/100\nValue: {score.get('estimated_value')}\n{score.get('reasoning', '')}"}, timeout=10)
Freelancer Toolkit
Replace Your SaaS Stack
Stop paying monthly for tools you can build in one cell. Proposals, invoices, outreach, contracts — all private, all yours.
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.5, max_tokens=2048)
project = "Website redesign for a dental practice, 6 pages, modern design, mobile responsive"
timeline = "4 weeks"
rate = "$75/hour"
result = llm.generate([f"""Write a professional client proposal.
Project: {project}
Timeline: {timeline}
Rate: {rate}
Include:
1. Executive summary (2 sentences)
2. Scope of work (detailed deliverables)
3. Timeline with milestones
4. Pricing breakdown by phase
5. Terms (payment schedule, revisions, ownership)
Tone: professional but warm. Make the client feel confident."""], params)
with open("/outputs/proposal.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Proposal generated")
from vllm import LLM, SamplingParams
from datetime import datetime
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=1024)
work = "Built 3 landing pages, 2 rounds of revisions, 1 logo concept"
client = "Acme Corp"
rate = 75
hours = 24
result = llm.generate([f"""Generate a professional invoice.
From: [Your Name], [Your Address]
To: {client}
Date: {datetime.now().strftime('%B %d, %Y')}
Due: Net 30
Work performed: {work}
Rate: ${rate}/hr
Hours: {hours}
Format as a clean, professional invoice with line items and total."""], params)
with open("/outputs/invoice.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print(f"Invoice: ${rate * hours} for {client}")
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=512)
target = {
"company": "Bloom Dental",
"what_they_do": "3-location dental practice in Austin",
"recent_news": "Just opened their third location",
"your_service": "website design and SEO"
}
result = llm.generate([f"""Write a cold email to {target['company']}.
They are: {target['what_they_do']}
Recent: {target['recent_news']}
You offer: {target['your_service']}
Rules:
- Reference something specific about THEM (not generic)
- One clear value proposition
- One specific CTA (not "let me know if you're interested")
- Under 150 words
- Sound like a human, not a template"""], params)
print(result[0].outputs[0].text)

from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.7, max_tokens=8192)
freelancer = {
"name": "Sarah Chen",
"role": "Product Designer",
"projects": ["Fintech app redesign", "E-commerce checkout flow", "SaaS dashboard"],
"style": "minimal, clean, lots of whitespace",
"colors": "black, white, one accent color"
}
result = llm.generate([f"""Build a complete portfolio website in HTML + CSS.
Freelancer: {freelancer['name']} — {freelancer['role']}
Projects: {', '.join(freelancer['projects'])}
Style: {freelancer['style']}
Colors: {freelancer['colors']}
Single HTML file with embedded CSS. Responsive. Professional.
Include: hero section, about, projects grid, contact form, footer."""], params)
with open("/outputs/portfolio.html", "w") as f:
    f.write(result[0].outputs[0].text)
print("Portfolio site generated")

from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.2, max_tokens=2048)
contract = Path("/data/contract.txt").read_text()
result = llm.generate([f"""Review this contract from a freelancer's perspective.
Flag every clause that could hurt the freelancer.
For each issue:
1. Quote the exact clause
2. Explain the risk in plain English
3. Suggest a specific rewrite
4. Rate severity: LOW / MEDIUM / HIGH
Contract:
{contract[:8000]}"""], params)
with open("/outputs/contract_review.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Contract review complete")

from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.3, max_tokens=1024)
violation = Path("/data/hoa_letter.txt").read_text()
result = llm.generate([f"""Draft a professional response to this HOA violation letter.
Rules:
- Address the specific violation cited
- If disputable, cite common HOA governance rules
- Respectful but firm tone
- Request specific evidence if not provided
- Propose a reasonable resolution timeline
- Never threatening, always professional
Violation letter:
{violation[:3000]}"""], params)
with open("/outputs/hoa_response.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("HOA response drafted")

from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.3, max_tokens=2048)
dispute = {
"plaintiff": "Your name",
"defendant": "Contractor name",
"amount": "$3,500",
"state": "California",
"description": "Paid contractor for bathroom remodel, work never completed, won't return calls",
"evidence": "Contract, bank statements showing payment, photos of unfinished work, text messages"
}
result = llm.generate([f"""Generate small claims court filing documents for {dispute['state']}.
Plaintiff: {dispute['plaintiff']}
Defendant: {dispute['defendant']}
Amount: {dispute['amount']}
Description: {dispute['description']}
Evidence available: {dispute['evidence']}
Include: claim statement, factual summary, damages calculation,
list of evidence to attach. Format for the court."""], params)
with open("/outputs/small_claims_filing.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Filing documents generated")

Content Creator
Scale Your Output
One blog post becomes 15 social posts. One topic becomes a full script. One keyword becomes a 2000-word article. Scale without hiring.
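Language models do not reliably respect length limits. The repurposing script in this section asks for tweets under 280 characters, and that limit is worth checking before anything gets posted. The sketch below is minimal and makes two assumptions: the model follows the `## TWITTER` / `## LINKEDIN` headers from the prompt, and it puts one post per line. `section`, `split_posts`, and `too_long` are hypothetical helpers. `len()` only approximates X/Twitter's own weighted character count.

```python
import re

# Plain character count; X/Twitter weights URLs and some scripts differently
TWEET_LIMIT = 280

def section(text: str, header: str) -> str:
    """Return the body under a '## HEADER' line, up to the next '## ' header or the end."""
    pattern = rf"^##\s*{re.escape(header)}.*?$(.*?)(?=^##\s|\Z)"
    match = re.search(pattern, text, re.S | re.M)
    return match.group(1) if match else ""

def split_posts(body: str) -> list[str]:
    """One post per non-empty line, with list markers like '1.', '2)', '-' removed."""
    posts = []
    for line in body.splitlines():
        text = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if text:
            posts.append(text)
    return posts

def too_long(posts: list[str], limit: int = TWEET_LIMIT) -> list[str]:
    """Posts that exceed the character limit and need trimming or regenerating."""
    return [p for p in posts if len(p) > limit]
```

Run it on `result[0].outputs[0].text`. Trim or regenerate anything that `too_long` returns.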
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=4096)
topic = "Why Most People Fail at Meal Prep (And How to Fix It)"
style = "casual, funny, fast-paced, lots of examples"
length = "10 minutes"
result = llm.generate([f"""Write a YouTube script.
Topic: {topic}
Style: {style}
Target length: {length}
Structure:
1. HOOK (first 30 sec) — pattern interrupt, make them stay
2. INTRO — what they'll learn, why it matters
3. CHAPTERS — 3-5 main points with examples
4. B-ROLL SUGGESTIONS in [brackets]
5. CTA — subscribe, comment prompt
6. END SCREEN — what to watch next
Write exactly how someone would SPEAK, not read."""], params)
with open("/outputs/script.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Script written")

from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.7, max_tokens=2048)
content = Path("/data/blog_post.txt").read_text()
result = llm.generate([f"""Repurpose this content for social media.
Generate ALL of these:
## TWITTER (10 tweets)
- Each under 280 chars
- Mix: quotes, stats, hot takes, questions
- Include relevant hashtags
## LINKEDIN (3 posts)
- Professional tone, storytelling format
- 150-300 words each
## INSTAGRAM (2 captions)
- Engaging, emoji-friendly
- Include CTA and hashtags
Original content:
{content[:4000]}"""], params)
with open("/outputs/social_content.txt", "w") as f:
    f.write(result[0].outputs[0].text)
# The prompt asks for 10 tweets + 3 LinkedIn posts + 2 Instagram captions
print("15 social posts generated from 1 article")

from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.7, max_tokens=2048)
notes = Path("/data/weekly_notes.txt").read_text()
voice_examples = Path("/data/past_newsletters.txt").read_text()
result = llm.generate([f"""Write this week's newsletter.
My writing style (match this voice):
{voice_examples[:2000]}
This week's notes and links:
{notes[:3000]}
Structure:
- Punchy subject line
- Opening hook (1-2 sentences)
- 3-5 sections with headers
- Each section: insight + link + why it matters
- Closing with personal note
- P.S. with one ask"""], params)
with open("/outputs/newsletter.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Newsletter draft ready for review")

from vllm import LLM, SamplingParams
import requests, os
llm = LLM(model="Qwen/Qwen3-32B")
params = SamplingParams(temperature=0.7, max_tokens=4096)
keyword = "best project management tools for small teams"
# Search for top-ranking articles
resp = requests.post("https://google.serper.dev/search",
                     json={"q": keyword, "num": 5},
                     headers={"X-API-KEY": os.environ["SERPER_KEY"]},
                     timeout=30)
resp.raise_for_status()  # fail loudly on a bad key or quota error
competitors = "\n".join([f"- {r['title']}: {r['snippet']}" for r in resp.json().get("organic", [])])
result = llm.generate([f"""Write a 2000-word SEO article targeting: "{keyword}"
Top-ranking competitors cover:
{competitors}
Your article must:
1. Cover everything competitors cover
2. Add sections they missed
3. Use the keyword naturally (not stuffed)
4. Include H2/H3 subheadings
5. Write a meta description (under 160 chars)
6. Suggest 3 internal link anchor texts"""], params)
with open("/outputs/seo_article.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print(f"SEO article for '{keyword}' complete")

Automation
Set It and Forget It
Schedule it. Let it run. Wake up to results. Every script here runs unattended on a cron — your AI workforce that never sleeps.
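Several scripts in this section pass model output straight to `json.loads`. That call raises as soon as the model wraps its answer in prose or a markdown fence. Qwen3 models can also prepend a `<think>...</think>` reasoning block, which has the same effect. One bad reply then stops the whole scheduled run. Below is a hedged sketch of a more tolerant parser that uses only the standard library. `parse_json_reply` is a hypothetical helper name, and it simply returns the first JSON object it can decode.

```python
import json
import re

def parse_json_reply(text: str):
    """Best-effort extraction of one JSON object from an LLM reply.

    Returns the parsed dict, or None if no JSON object can be decoded.
    """
    # Drop any <think>...</think> reasoning block the model may emit first
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.S)
    # Drop markdown code fences such as ```json ... ```
    text = re.sub(r"```(?:json)?", "", text)
    decoder = json.JSONDecoder()
    # Try each "{" in turn; raw_decode stops at the end of the first complete value
    for match in re.finditer(r"\{", text):
        try:
            obj, _ = decoder.raw_decode(text[match.start():])
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None
```

In the loops below, `analysis = parse_json_reply(result[0].outputs[0].text)` followed by `if not analysis: continue` would skip a malformed reply instead of crashing the run.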
import requests, os
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.5, max_tokens=1024)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
# Scrape industry news (replace with your sources)
sources = [
"https://your-industry-rss.com/feed",
"https://another-source.com/api/latest"
]
articles = []
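for src in sources:
    resp = requests.get(src, timeout=10)
    resp.raise_for_status()
    # Assumes each source returns a JSON list of {"title", "summary", "url"} items;
    # a standard RSS feed is XML and would need a feed parser instead
    articles.extend(resp.json()[:10])
headlines = "\n".join([f"- {a['title']}: {a['summary'][:100]} ({a.get('url', 'no link')})" for a in articles[:15]])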
result = llm.generate([f"""Write a morning industry briefing.
5 bullets. Each bullet: what happened + why it matters.
Link to source. Keep it under 300 words total.
Today's news:
{headlines}"""], params)
briefing = result[0].outputs[0].text
requests.post(WEBHOOK, json={"text": f"Good morning. Here's your briefing:\n\n{briefing}"})
print("Briefing sent")

import requests, json, os
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
watchlist = json.load(open("/data/watchlist.json"))
for item in watchlist:
    # Fetches the live product page; the prompt below still uses the prices stored in
    # watchlist.json, so parse resp.text here if you want the live price instead
    resp = requests.get(item["url"], headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    result = llm.generate([f"""Is this a real deal or fake?
Product: {item['name']}
Current price: {item['current_price']}
Historical avg: {item['avg_price']}
Listed discount: {item.get('discount', 'unknown')}
Return JSON: {{"real_deal": bool, "savings_pct": float, "verdict": str}}"""], params)
    analysis = json.loads(result[0].outputs[0].text)
    if analysis.get("real_deal") and analysis.get("savings_pct", 0) > 20:
        requests.post(WEBHOOK, json={"text": f"DEAL: {item['name']} - {analysis['verdict']}"})

import requests, os, json
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
my_profile = "Senior frontend developer, React/TypeScript, 5 years, remote only, $150K+ target"
jobs = json.load(open("/data/scraped_jobs.json"))
top_matches = []
for job in jobs[:20]:
    result = llm.generate([f"""Score this job match 0-100.
My profile: {my_profile}
Job: {job['title']} at {job['company']}
Description: {job['desc'][:1000]}
Return JSON: {{"score": int, "reason": str, "red_flags": str}}"""], params)
    score = json.loads(result[0].outputs[0].text)
    if score.get("score", 0) > 75:
        top_matches.append((score["score"], f"{score['score']}/100 - {job['title']} at {job['company']}\n{score['reason']}"))
if top_matches:
    # Sort so the highest-scoring matches are the ones sent
    top_matches.sort(key=lambda m: m[0], reverse=True)
    msg = "Today's top job matches:\n\n" + "\n\n".join(text for _, text in top_matches[:5])
    requests.post(WEBHOOK, json={"text": msg})

from vllm import LLM, SamplingParams
import json
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.5, max_tokens=512)
reviews = json.load(open("/data/new_reviews.json"))
business = "Sunrise Dental — family dental practice in Phoenix"
for review in reviews:
    result = llm.generate([f"""Write a response to this review for {business}.
Rating: {review['stars']} stars
Review: {review['text']}
Rules:
- 5 stars: thank them specifically for what they mentioned
- 3-4 stars: thank them, address their concern directly
- 1-2 stars: empathize, apologize, offer to make it right
- Always professional, never defensive
- Under 100 words
- Sign as "The Sunrise Dental Team" """], params)
    print(f"{'⭐' * review['stars']} Response:")
    print(result[0].outputs[0].text)
    print("---")

import json, requests, os
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
inventory = json.load(open("/data/inventory.json"))
for item in inventory:
    result = llm.generate([f"""Analyze inventory for: {item['name']}
Current stock: {item['stock']} units
Daily sales avg: {item['daily_sales']} units
Lead time: {item['lead_days']} days
Should we reorder? If yes, how many units?
Return JSON: {{"reorder": bool, "quantity": int, "urgency": "low"/"medium"/"high", "reason": str}}"""], params)
    analysis = json.loads(result[0].outputs[0].text)
    if analysis.get("reorder"):
        requests.post(WEBHOOK, json={"text": f"REORDER: {item['name']} - Qty: {analysis['quantity']}"})

import requests, json, os
from vllm import LLM, SamplingParams
from datetime import datetime
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=512)
my_listing = {"type": "1BR apartment", "location": "Austin, TX", "base_price": 120}
comps = json.load(open("/data/comparable_listings.json"))
events = json.load(open("/data/local_events.json"))
result = llm.generate([f"""Optimize tonight's Airbnb price.
My listing: {my_listing['type']} in {my_listing['location']}
Base price: ${my_listing['base_price']}
Day: {datetime.now().strftime('%A')}
Comparable listings tonight:
{json.dumps(comps[:10], indent=2)}
Local events this week:
{json.dumps(events[:5], indent=2)}
Return JSON: {{"recommended_price": int, "reasoning": str, "demand_level": "low"/"medium"/"high"}}"""], params)
print(json.loads(result[0].outputs[0].text))

import requests, json, os
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
competitors = ["com.competitor.app1", "com.competitor.app2"]
for app_id in competitors:
    # Placeholder endpoint: point this at your own review scraper
    reviews = requests.get(f"https://your-scraper.com/api/reviews/{app_id}?days=1", timeout=10).json()
    if not reviews:
        continue
    review_text = "\n".join([f"{'⭐' * r['rating']} {r['text'][:150]}" for r in reviews[:20]])
    result = llm.generate([f"""Analyze these app reviews from the last 24 hours.
App: {app_id}
Categorize into:
1. BUGS: what's broken
2. FEATURE REQUESTS: what they want
3. COMPLAINTS: what they hate
4. PRAISE: what they love
Reviews:
{review_text}"""], params)
    requests.post(WEBHOOK, json={"text": f"Review digest for {app_id}:\n{result[0].outputs[0].text[:500]}"})

import requests, json, os
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.5, max_tokens=512)
expired = json.load(open("/data/expired_listings.json"))
for listing in expired[:10]:
    result = llm.generate([f"""Write a personalized letter to a homeowner whose listing just expired.
Address: {listing['address']}
Days on market: {listing['days_on_market']}
List price: ${listing['price']}
Rules: empathetic not salesy, reference their specific property,
mention one possible reason it didn't sell, offer a free market analysis,
under 200 words."""], params)
    print(f"Letter for {listing['address']}:")
    print(result[0].outputs[0].text)

Agents & Scrapers
The Internet Works for You
CPU agents that watch the internet 24/7 for almost nothing. When they find something, they think about it, then alert you. Your private intelligence network.
import requests, json, os
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=256)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
listings = json.load(open("/data/scraped_listings.json"))
for item in listings:
    result = llm.generate([f"""Is this underpriced?
Item: {item['title']}
Price: ${item['price']}
Condition: {item['condition']}
Location: {item['location']}
Estimate retail/market value. Is this a deal worth driving for?
Return JSON: {{"retail_value": int, "deal_score": 1-10, "profit_estimate": int, "verdict": str}}"""], params)
    analysis = json.loads(result[0].outputs[0].text)
    if analysis.get("deal_score", 0) >= 7:
        requests.post(WEBHOOK, json={"text": f"DEAL: {item['title']} - ${item['price']}\nValue: ${analysis['retail_value']}\nProfit: ~${analysis['profit_estimate']}\n{item['url']}"})

import requests, json, os
from vllm import LLM, SamplingParams
from datetime import datetime, timedelta
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
my_capabilities = "IT consulting, web development, cloud migration, cybersecurity, small business"
# SAM.gov's search endpoint requires a posted-date window (postedFrom/postedTo, MM/dd/yyyy)
today = datetime.now()
resp = requests.get(
    "https://api.sam.gov/opportunities/v2/search",
    params={
        "api_key": os.environ["SAM_KEY"],
        "limit": 20,
        "postedFrom": (today - timedelta(days=7)).strftime("%m/%d/%Y"),
        "postedTo": today.strftime("%m/%d/%Y"),
    },
    timeout=30,
)
resp.raise_for_status()
contracts = resp.json()
for contract in contracts.get("opportunitiesData", []):
    # Note: the "description" field may hold a link to the full notice text rather than the text itself
    result = llm.generate([f"""Score this government contract for fit.
My capabilities: {my_capabilities}
Contract: {contract.get('title', '')}
Agency: {contract.get('fullParentPathName', '')}
Type: {contract.get('type', '')}
Description: {contract.get('description', '')[:1000]}
Return JSON: {{"fit_score": 0-100, "relevant_capabilities": list, "deadline": str, "estimated_value": str, "recommendation": str}}"""], params)
    score = json.loads(result[0].outputs[0].text)
    if score.get("fit_score", 0) > 60:
        requests.post(WEBHOOK, json={"text": f"CONTRACT: {contract.get('title', '')[:80]}\nFit: {score['fit_score']}/100\n{score['recommendation']}"})

import requests, json, os
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.2, max_tokens=512)
WEBHOOK = os.environ["TELEGRAM_WEBHOOK"]
student = {
"gpa": 3.5, "major": "Computer Science", "state": "Texas",
"ethnicity": "Hispanic", "activities": "robotics club, volunteer tutoring",
"year": "Junior", "financial_need": True
}
scholarships = json.load(open("/data/scholarships.json"))
matches = []
for s in scholarships:
    result = llm.generate([f"""Does this student qualify for this scholarship?
Student: {json.dumps(student)}
Scholarship: {s['name']}
Requirements: {s['requirements']}
Amount: {s['amount']}
Deadline: {s['deadline']}
Return JSON: {{"qualifies": bool, "match_score": 0-100, "missing": str}}"""], params)
    score = json.loads(result[0].outputs[0].text)
    if score.get("qualifies"):
        matches.append(f"{s['name']} - {s['amount']} (due {s['deadline']})")
if matches:
    requests.post(WEBHOOK, json={"text": f"Found {len(matches)} scholarships:\n" + "\n".join(matches[:10])})

from vllm import LLM, SamplingParams
import json  # needed for json.dumps in the prompt below
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.3, max_tokens=1024)
letter_type = "Security deposit demand (tenant)"
details = {
"tenant": "Jane Smith",
"landlord": "ABC Properties LLC",
"address": "123 Main St, Apt 4B",
"move_out_date": "March 1, 2026",
"deposit": "$2,400",
"issue": "Landlord has not returned deposit after 45 days, no itemized deductions provided"
}
result = llm.generate([f"""Write a formal {letter_type}.
Details: {json.dumps(details)}
Rules:
- Reference applicable state law (mention tenant should verify their state)
- Professional, firm, not threatening
- Include specific deadline for response
- Mention next steps if no response (small claims)
- Format as a proper business letter"""], params)
print(result[0].outputs[0].text)

from vllm import LLM, SamplingParams
from pathlib import Path
llm = LLM(model="Qwen/Qwen3-14B")
params = SamplingParams(temperature=0.2, max_tokens=2048)
tax_docs = Path("/data/tax_documents.txt").read_text()
result = llm.generate([f"""Analyze these tax documents and prepare a summary.
1. Categorize all income sources (W2, 1099, other)
2. Identify potential deductions (home office, vehicle, business expenses, education, charitable)
3. Flag anything unusual that needs an accountant
4. Estimate tax liability vs payments made
Documents:
{tax_docs[:8000]}
IMPORTANT: This is for preparation only, not legal tax advice."""], params)
with open("/outputs/tax_summary.txt", "w") as f:
    f.write(result[0].outputs[0].text)
print("Tax prep summary complete")

Automotive
Every car owner needs help. Evaluating deals, understanding codes, planning maintenance. Build the tools mechanics charge $100/hr for.
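The OBD-II diagnosis script in this section sends codes straight to the model. A cheap CPU-side check can catch typos first. It can also add structure that the standard already encodes. Under the SAE J2012 trouble-code layout, the leading letter gives the system. A second character of 0 marks a generic code, and 1 marks a manufacturer-specific code. This is a minimal sketch, and `describe_code` is a hypothetical helper. Other values of the second character are labeled as range-dependent rather than guessed.

```python
import re

SYSTEMS = {"P": "Powertrain", "B": "Body", "C": "Chassis", "U": "Network (communication)"}
# One system letter, a digit 0-3, then three hex characters (e.g. P0420, P0A80)
CODE_RE = re.compile(r"^[PBCU][0-3][0-9A-F]{3}$")

def describe_code(code: str) -> dict:
    """Validate a trouble code and decode the parts the code format itself defines."""
    code = code.strip().upper()
    if not CODE_RE.match(code):
        raise ValueError(f"not a valid OBD-II trouble code: {code!r}")
    origin = {"0": "generic (SAE)", "1": "manufacturer-specific"}.get(code[1], "depends on the code range")
    return {"code": code, "system": SYSTEMS[code[0]], "origin": origin}
```

Build `[describe_code(c) for c in codes]` and pass the result into the prompt. The model then sees the system and code origin next to each raw code, and a mistyped code fails before any GPU time is spent.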
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.3, max_tokens=1024)
listing = {
"year": 2020, "make": "Toyota", "model": "Camry SE",
"mileage": 45000, "price": 22500,
"condition": "Clean title, one owner, no accidents",
"location": "Dallas, TX"
}
result = llm.generate([f"""Evaluate this used car listing.
Car: {listing['year']} {listing['make']} {listing['model']}
Mileage: {listing['mileage']:,}
Asking price: ${listing['price']:,}
Condition: {listing['condition']}
Location: {listing['location']}
Analyze:
1. Fair market value range for this car
2. Is this price good, fair, or high?
3. What to offer
4. Red flags to check
5. Expected maintenance costs in next 12 months"""], params)
print(result[0].outputs[0].text)

from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.3, max_tokens=512)
codes = ["P0420", "P0171"]
car = "2018 Honda Civic 1.5T"
result = llm.generate([f"""Diagnose these OBD-II codes for a {car}.
Codes: {', '.join(codes)}
For each code explain:
1. What it means in plain English
2. Common causes for THIS specific car
3. Severity: safe to drive? how urgent?
4. Estimated repair cost range
5. Can I DIY this or need a mechanic?
Be specific to the {car}, not generic."""], params)
print(result[0].outputs[0].text)

Agriculture
There are roughly 500 million smallholder farms worldwide. Pests and diseases destroy an estimated 20-40% of global crop production, often because problems are identified too late. AI changes that equation.
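from vllm import LLM, SamplingParams
import base64
# A vision-language model is required; text-only Qwen3 checkpoints cannot read images
llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")
params = SamplingParams(temperature=0.3, max_tokens=512)
with open("/data/plant_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
prompt = """Identify any disease in this plant photo.
1. Plant species (if identifiable)
2. Disease name and stage
3. Cause (fungal, bacterial, viral, nutrient deficiency)
4. Treatment options (organic and chemical)
5. Prevention for next season
6. Urgency: how fast does this spread?
Be specific and actionable for a farmer."""
# llm.chat applies the model's chat template and attaches the photo as a base64 data URL
result = llm.chat([{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        {"type": "text", "text": prompt},
    ],
}], params)
print(result[0].outputs[0].text)

Fashion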
Style is personal. AI that understands your closet, your body, your taste — not generic "wear blue with khaki" advice.
from vllm import LLM, SamplingParams
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.7, max_tokens=1024)
closet = {
"tops": ["white oxford shirt", "navy crew neck sweater", "olive bomber jacket", "black turtleneck"],
"bottoms": ["dark wash jeans", "khaki chinos", "black dress pants", "grey joggers"],
"shoes": ["white sneakers", "brown chelsea boots", "black dress shoes"],
"occasion": "casual Friday at work",
"season": "fall"
}
result = llm.generate([f"""Generate 3 outfit combinations from this closet.
Tops: {', '.join(closet['tops'])}
Bottoms: {', '.join(closet['bottoms'])}
Shoes: {', '.join(closet['shoes'])}
Occasion: {closet['occasion']}
Season: {closet['season']}
For each outfit:
1. The combination
2. Why it works (color, proportion, vibe)
3. One accessory suggestion to elevate it"""], params)
print(result[0].outputs[0].text)

Your code is yours.
Your data is yours.
This is the only way
to keep it that way.
Every time you paste code into ChatGPT, send a prompt to an API, or use an AI browser extension — your data leaves your control. Your code, your customer data, your proprietary logic — it's on someone else's server, training someone else's model, building someone else's competitive moat. SeqPU is built so that never happens. Your servers talk to your rented servers. Everything stays inside your Cloudflare edge. We do not train on your code. All your code and data belongs to you. We are here to support creators — not to take from them.
Why This Is Truly Private
When you run on SeqPU, here's what actually happens: your browser connects to Cloudflare's edge network — encrypted, under your control. Our CloudSafe edge worker validates your identity and injects a cryptographic origin proof. Without that proof, our backend rejects the request outright. Your job is created, your secrets are encrypted, and the work is queued to an isolated GPU container with an authenticated service token. That container is your GPU, your filesystem, your environment. It runs your code. It writes results back to your own database record. Your frontend picks up the results in real time.
At no point does a third party see your data. There is no shared pipe. There is no middleman with a backdoor into your execution environment. It's your server talking to your rented server over infrastructure you control. That's it.
Compare that to any AI API: you send a prompt, it goes to their server, it's processed on their hardware, stored in their logs, potentially used for their training. You have no visibility and no control. With SeqPU, you run the model yourself on your rented GPU. The data never leaves infrastructure you control.
Cloudflare Zero Trust & CloudSafe
Every request to SeqPU passes through Cloudflare Zero Trust and our CloudSafe edge worker — the same architecture that protects banks, governments, and the largest enterprises on the internet. This isn't bolted-on security. It's the foundation everything is built on.
CloudSafe is the front door to everything. Every single request — browser sessions, SDK calls, headless API, Telegram bots, Discord bots, Slack integrations, WhatsApp integrations — all hit CloudSafe first. Nothing reaches your compute without passing through it.
- Two authentication paths — browser users get Firebase token validation. SDK and headless users get Cloudflare Access service tokens with JWT assertion. Both paths inject a cryptographic origin proof that the backend requires. No proof, no access.
- Every request is traced — origin proof, request type, real client IP, unique request ID. Full accountability at every hop.
- DDoS dies at the edge — Cloudflare's global network absorbs volumetric attacks. Your GPU never sees them.
- Malicious payloads die at CloudSafe — validated and filtered before they ever reach your execution environment.
The 10-Layer Security System
CloudSafe is just the perimeter. Behind it, SeqPU runs a 10-layer deep security architecture from edge to execution. We don't publish the specifics of each layer because that's how security works — the less an attacker knows about what's between them and your data, the better.
What we will tell you: every hop is authenticated. Every payload is validated. Every secret is encrypted. Every container is isolated. Every session is ephemeral. Ten layers, each independent, each capable of stopping an attack on its own. You'd have to break all ten to reach anything — and by the time you've hit the second, we already know you're there.
We Do Not Train On Your Code
This is not a qualified statement. This is not "we don't train on your code unless you opt in." This is not "we anonymize your data for research." We do not train on your code. Period.
Your code, your data, your prompts, your model outputs — all of it belongs to you. The architecture has no collection mechanism. Your code lives in the job payload, executes on your rented GPU, and results write back to your own database record. There is literally nowhere for training data to go. No pipeline, no ingestion, no interest in building one.
MCO — When You Do Need an External AI
Running models locally on SeqPU keeps everything private by default. But sometimes you need to call an external AI — OpenAI, Anthropic, Google, Mistral, Groq. The moment you do, your data leaves your environment. That's where MCO (Model Context Orchestration) comes in.
MCO is a product that lives at the Cloudflare edge — between you and any AI provider. It scans your requests before they leave and scans responses before they come back. It's the security layer that protects you when your data has to cross a boundary.
- Prompt injection defense — catches jailbreak attempts, instruction overrides, roleplay attacks, and chat markup injection before they reach the model.
- Secret scanning — detects API keys, tokens, private keys, and credentials from every major provider that might accidentally be in your prompts. Catches them before they leave your environment.
- PII detection — flags emails, phone numbers, Social Security numbers, and credit card numbers before they're sent to a third-party AI.
- SSRF & path traversal prevention — blocks attempts to reach internal infrastructure or escape the intended scope.
- Bring Your Own Key — use your own API keys for any supported provider. MCO scans input, proxies to the provider, scans the output, returns it clean. AI safety as a layer, not a replacement.
- Four strictness levels — low, medium, high, paranoid. You choose how aggressive the scanning is based on your risk tolerance.
- Atomic billing — MCO credits run on dedicated infrastructure with single-threaded access per user. No race conditions. No overbilling. $0.001 per 1K tokens scanned.
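MCO's detection rules are not published. The minimal regex-based sketch below illustrates the general technique behind the secret-scanning and PII bullets. Its patterns are simplified assumptions: US-format SSNs and phone numbers, AWS-style `AKIA` access key IDs, and PEM private-key headers. They will miss some cases and flag others. Candidate card numbers are filtered with the Luhn checksum to cut false positives.

```python
import re

# Simplified illustrative patterns, not MCO's actual rules
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
# 13-19 digits, optionally separated by single spaces or hyphens
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for everything that looks sensitive."""
    findings = [(name, m.group()) for name, rx in PATTERNS.items() for m in rx.finditer(text)]
    for m in CARD_RE.finditer(text):
        if luhn_ok(m.group()):
            findings.append(("credit_card", m.group().strip()))
    return findings
```

A scanner of this shape runs before a prompt leaves the network. Depending on the strictness level, any finding can block the request, redact the match, or only log it.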
Every other platform lets your data flow to AI providers unchecked. MCO is the checkpoint. The only product purpose-built to defend against LLM-vectored attacks on your data in transit.
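At the listed rate of $0.001 per 1K tokens scanned, MCO costs are easy to estimate. The small sketch below assumes that both the outgoing request and the returned response count toward scanned tokens, because MCO scans both directions. `mco_scan_cost` is a hypothetical helper.

```python
def mco_scan_cost(prompt_tokens: int, response_tokens: int, usd_per_1k: float = 0.001) -> float:
    """Scanning cost in USD, assuming request and response tokens are both billed as scanned."""
    return (prompt_tokens + response_tokens) / 1000 * usd_per_1k
```

Take a request with 2,000 prompt tokens and a 500-token response. Under that assumption it scans 2,500 tokens, or $0.0025, and a million scanned tokens costs $1.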
Server-to-Server — The Closed Loop
Here's what makes this fundamentally different from every other AI platform: it's your servers talking to your rented servers. That's the entire data flow. No third party sits in the middle.
Your browser connects to your Cloudflare edge. CloudSafe authenticates you and forwards to backend services. Your job is created, your secrets are encrypted, your code is queued to an isolated GPU container. The container runs your code, writes results to your own database record, and your frontend picks them up in real time. The model runs on your rented hardware. The data stays on your rented hardware.
This is the same privacy model as owning a physical server in a data center — except it scales from CPU to 384GB of GPU in seconds and costs nothing when it's not running. The privacy of on-premise. The flexibility of cloud. The cost of serverless.
Why We Built It This Way
Most platforms treat your data as the product. They offer free tiers and cheap access because your usage is the value. Your prompts train their models. Your workflows inform their roadmap. Your data feeds their competitive moat.
We built SeqPU the opposite way. You pay for compute by the second. That's our revenue. We have zero incentive to touch your data because your data has zero value to our business model. Our incentive is to make your compute faster, cheaper, and more private — because that's what keeps you paying for seconds.
We are here to support creators. To give builders a platform where they can innovate without looking over their shoulder. To make sure the people who create the value are the people who capture the value. Create. Innovate. Profit. That's the deal.
What do you have?
How do we unlock it?
How do we make it faster?
Our first question is never "what do you want to buy." It's what compute do you already own, how do we unlock its hidden potential, and how do we start organizing your data to make your systems faster and smarter from day one.
Your Data Never Leaves
Every AI API you use — ChatGPT, Claude, Gemini — your data flows through their servers. Every prompt. Every document. Every customer record. One prompt at a time, dripping through a leaky bucket.
Securing your data is only half the battle. It still needs to be used securely. API calls, productivity platforms, browser extensions — anything can be the leak. We turn your existing compute infrastructure into a private internet of compute where you can run models as powerful as Claude on your own hardware, so your data never leaves.
For regulated industries — healthcare (HIPAA), legal (privilege), financial (SOX/PCI), government (ITAR/CMMC) — this isn't a feature. It's a requirement.
CPU First — Play With What You Have
We start by assessing your existing on-premise compute infrastructure, cloud credits, and API credits. Most organizations have more compute under the hood than they realize.
- CPUs are cheap and already inside your business — we put them to work as the backbone of your private compute network, handling tasks and inference without touching a GPU.
- Most AI tasks don't need a GPU. Classification, extraction, routing, and simple Q&A run fine on a 7B model on a CPU.
- When the work demands it, we move to GPUs — large model inference, image generation, video processing, 3D rendering. The heavy creative and analytical work that no CPU can touch.
- When a specialized API is the right call, we act as the buffer. An orchestrator model sanitizes your data before it ever leaves your network. Then we work on bringing that capability securely in-house — built for you, running internally, tuned to your needs.
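The CPU-first approach described above amounts to a routing decision for each task. The minimal sketch below uses the task types from the list. The `Tier` names and the routing table are hypothetical, and so is the choice to send unknown tasks to the cheapest tier. A real router would also weigh model size, latency targets, and how sensitive the data is.

```python
from enum import Enum

class Tier(Enum):
    CPU = "cpu"                # small model (e.g. ~7B) on existing CPUs
    GPU = "gpu"                # large-model inference, image/video, 3D
    EXTERNAL_API = "external"  # specialized API, reached only through a sanitizing buffer

# Hypothetical routing table built from the task types listed above
ROUTES = {
    "classification": Tier.CPU,
    "extraction": Tier.CPU,
    "routing": Tier.CPU,
    "simple_qa": Tier.CPU,
    "large_model_inference": Tier.GPU,
    "image_generation": Tier.GPU,
    "video_processing": Tier.GPU,
    "3d_rendering": Tier.GPU,
}

def route(task: str, needs_specialized_api: bool = False) -> Tier:
    """Pick the cheapest tier that can handle the task."""
    if needs_specialized_api:
        return Tier.EXTERNAL_API
    # Unknown task types start on CPU and are escalated only when that proves insufficient
    return ROUTES.get(task, Tier.CPU)
```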
The Three Stages
| The 4-year capex math | Reality |
|---|---|
| Daily usage owned hardware needs to beat serverless pricing | 12 hrs/day × 4 yrs |
| Daily usage of most real workloads | 2–4 hrs/day |
| Your capex risk with SeqPU | $0 |
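The break-even arithmetic behind the table is simple: divide the hardware price by the hourly rental rate to get total hours, then spread those hours over the ownership period. Below is a minimal sketch. The $43,800 hardware cost in the check below is a hypothetical figure. It was chosen because, at the $2.50/hour A100 80GB rate quoted earlier, it reproduces the table's 12 hrs/day. The function counts only the purchase price, so power, cooling, hosting, and staff would push the real break-even higher.

```python
def break_even_hours_per_day(hardware_cost: float, hourly_rate: float, years: int = 4) -> float:
    """Average daily usage at which buying the hardware costs the same as renting it by the hour."""
    break_even_hours = hardware_cost / hourly_rate  # rented hours the purchase price would buy
    return break_even_hours / (365 * years)

def rented_cost(hours_per_day: float, hourly_rate: float, years: int = 4) -> float:
    """Total pay-per-second spend over the same period at a given daily usage."""
    return hours_per_day * 365 * years * hourly_rate
```

At 3 hrs/day, in the middle of the table's 2–4 hr range, `rented_cost(3, 2.50)` comes to $10,950 over four years. That is a quarter of the hypothetical purchase price.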
What We Build For You
We see AI as a dynamic new member of your team. There are core skills every system needs, but the rest is built around YOUR business.
Every business creates its own unique AI advantages. With access to your data — sales numbers, contracts, customer records, operational data, internal docs — you can generate insights, automations, and competitive advantages no one else has.
The biggest wins we find come from fixing communication breakdowns that cost you money. AI can monitor your business processes, catch missed steps, and keep everyone in the loop before small things become big problems.
Real Examples
- Pool idle hospital CPUs → scale AI inference for clinical teams. HIPAA compliant. Data never leaves the building.
- Existing business servers → internal AI chatbot for your team, zero new hardware. Answers questions about your own docs and processes.
- Legacy infrastructure → modern agentic workflows layered on top. Your old servers learn new tricks.
- Private data + local models → sovereign IP pipeline, fully on-prem. Your competitive intelligence stays yours.
Deployment Options
The Shift
In 1995, the problem wasn't "where do I get a server." Servers were everywhere. The real problem was how to make servers work together to build something none of them could build alone. That's what TCP/IP, HTTP, and the web browser solved: not more servers, not cheaper servers, but a protocol for collaboration.
Every GPU company today is still answering "where do I get a server." SeqPU is solving the problem the internet solved: "Given that GPUs exist everywhere, how do I make them think together?"
| Old World | New World |
|---|---|
| "I need a GPU" | "I need a problem solved" |
| Rent hardware | Orchestrate capability |
| One model, one GPU, one call | Many specialists, many chips, one pipeline |
| You manage the infrastructure | The network manages the infrastructure |
| Pay for hardware time | Pay for thinking |
Why Nobody Else Landed Here
- GPU Clouds (CoreWeave, Lambda, Hyperscalers) — bought 100,000 GPUs. Incentivized to sell you the biggest model on the biggest GPU for the longest time. Efficiency is their enemy.
- Marketplaces (Akash, io.net) — take a cut of every transaction. They don't care if you're wasting compute. They still get their cut.
- Token APIs (OpenAI, Anthropic, Together AI) — charge per token. Sending a 50K-token page straight to a 70B model instead of filtering it through a 7B first? 100× more revenue for them.
Pay for compute.
Not tokens.
You pay for the GPU time your code uses — by the second. When it's not running, you pay nothing. No seat fees. No monthly minimums. No idle cost. No token tax. You're paying for the machine, not the middleman.
How Billing Works
You click Run All. The GPU spins up. The meter starts. Your code runs. Your code finishes. The GPU spins down. The meter stops. You're billed for exactly the seconds it ran. Not the minute. Not the hour. The second.
Nothing running? $0. Nobody calling your API? $0. Tuesday at 3am and nobody's using your tool? $0. You only pay when compute is actively running.
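The meter described above can be modeled as billed seconds multiplied by a per-second rate, summed over runs. A minimal sketch; the source doesn't say how partial seconds are handled, so rounding up to the next whole second is an assumption here, as are the helper names `billed_seconds` and `run_cost`.

```python
import math
from datetime import datetime, timedelta

A100_80GB_PER_SECOND = 0.000694  # $/second, from the GPU pricing table

def billed_seconds(start: datetime, stop: datetime) -> int:
    """Whole seconds between GPU spin-up and spin-down.

    Assumption: a partial second is rounded up to the next whole second.
    """
    elapsed = (stop - start).total_seconds()
    return max(0, math.ceil(elapsed))

def run_cost(runs: list[tuple[datetime, datetime]], rate_per_second: float) -> float:
    """Total charge for (start, stop) runs; idle gaps between runs cost nothing."""
    return sum(billed_seconds(a, b) for a, b in runs) * rate_per_second

t0 = datetime(2025, 1, 1, 3, 0, 0)
runs = [
    (t0, t0 + timedelta(seconds=8)),                                   # 8 s summary
    (t0 + timedelta(hours=5), t0 + timedelta(hours=5, seconds=7.2)),   # 7.2 s, billed as 8
]
# The five idle hours between the two runs add nothing to the bill.
print(round(run_cost(runs, A100_80GB_PER_SECOND), 6))  # 0.011104
```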
GPU Pricing
| GPU | VRAM | Per Second | Per Hour |
|---|---|---|---|
| T4 | 16 GB | $0.000164 | $0.59 |
| L4 | 24 GB | $0.000222 | $0.80 |
| A10G | 24 GB | $0.000306 | $1.10 |
| L40S | 48 GB | $0.000542 | $1.95 |
| A100 40GB | 40 GB | $0.000583 | $2.10 |
| A100 80GB | 80 GB | $0.000694 | $2.50 |
| RTX PRO 6000 | — | $0.000842 | $3.03 |
| H100 | 80 GB | $0.001097 | $3.95 |
| H200 | 141 GB | $0.001261 | $4.54 |
| B200 | 192 GB | $0.001736 | $6.25 |
CPU Pricing
$0.0000131 per core per second — that's $0.047 per core per hour.
Minimum 0.125 cores per container. CPU handles everything that doesn't need a GPU: API calls, web scraping, data processing, orchestration, email, file manipulation. Most agent work runs on CPU. 1,000 messages/day on CPU costs about $1.40/day.
Memory Pricing
$0.00000222 per GiB per second
- Per hour: $0.008/GiB
- Per day: $0.19/GiB
Memory is your running container's RAM, separate from GPU VRAM, which is included in the GPU price. Most jobs use the default allocation; you only need to think about this when running something very memory-heavy.
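Put together, a CPU container's charge is runtime multiplied by cores and memory at the rates above. A sketch, assuming cores and RAM are billed for the same run time and that a request below the 0.125-core minimum is billed at the minimum; `container_cost` is a hypothetical helper, not a SeqPU API, and any GPU time (VRAM included) would be added on top.

```python
CPU_PER_CORE_SECOND = 0.0000131   # $ per core per second
MEM_PER_GIB_SECOND = 0.00000222   # $ per GiB per second
MIN_CORES = 0.125                 # minimum cores per container

def container_cost(seconds: float, cores: float, mem_gib: float = 0.0) -> float:
    """CPU + RAM charge for one container run (GPU time billed separately)."""
    billed_cores = max(cores, MIN_CORES)
    return seconds * (billed_cores * CPU_PER_CORE_SECOND + mem_gib * MEM_PER_GIB_SECOND)

print(round(container_cost(3600, cores=1), 3))   # 0.047 per core-hour
print(round(3600 * MEM_PER_GIB_SECOND, 3))       # 0.008 per GiB-hour
print(round(86400 * MEM_PER_GIB_SECOND, 2))      # 0.19 per GiB-day
print(round(container_cost(2, cores=2), 5))      # 5e-05, a 2-core, 2-second script
```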
What Things Actually Cost
| What you do | GPU | Time | Cost |
|---|---|---|---|
| Summarize a document | A100 80GB | 8 sec | $0.006 |
| Translate a paragraph | T4 | 3 sec | $0.0005 |
| Generate an image | L40S | 15 sec | $0.008 |
| Transcribe 1 hour of audio | T4 | 5 min | $0.05 |
| CPU script (API call + format) | CPU 2 cores | 2 sec | $0.00005 |
| Process 100 receipts (vision) | L40S | 10 min | $0.33 |
| Deep research query | H200 | 30 sec | $0.038 |
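Each row above is runtime multiplied by the per-second rate from the pricing tables, so you can estimate your own tasks the same way. A sketch that recomputes the rows; the CPU row counts 2 cores at the per-core rate, as the table states, and memory charges are left out.

```python
# Per-second rates from the GPU and CPU pricing tables above.
RATES = {
    "T4": 0.000164,
    "L40S": 0.000542,
    "A100 80GB": 0.000694,
    "H200": 0.001261,
    "CPU core": 0.0000131,
}

# (task, hardware, seconds, units); units = cores for CPU, 1 for a GPU
ROWS = [
    ("Summarize a document", "A100 80GB", 8, 1),
    ("Translate a paragraph", "T4", 3, 1),
    ("Generate an image", "L40S", 15, 1),
    ("Transcribe 1 hour of audio", "T4", 5 * 60, 1),
    ("CPU script (API call + format)", "CPU core", 2, 2),
    ("Process 100 receipts (vision)", "L40S", 10 * 60, 1),
    ("Deep research query", "H200", 30, 1),
]

def task_cost(hardware: str, seconds: float, units: int = 1) -> float:
    """Dollar cost of one run: rate x seconds x units."""
    return RATES[hardware] * seconds * units

for task, hw, secs, units in ROWS:
    print(f"{task:<32} {hw:<10} ${task_cost(hw, secs, units):.5f}")
```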
The Economics — API vs Your Own Model
API pricing carries a premium. You're paying for convenience, and giving up your data. Every prompt, every document, every customer record flows through someone else's servers.
| Approach | Cost per 1M tokens | Your data |
|---|---|---|
| GPT-4o | $2.50–10/M | Sent to OpenAI |
| Claude Sonnet | $3–15/M | Sent to Anthropic |
| Gemini Pro | $1.25–5/M | Sent to Google |
| Your 7B on T4 | ~$2/M | Stays on your server |
| Your 14B on A100 | ~$8/M | Stays on your server |
| Your 32B on A100 | ~$17/M | Stays on your server |
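For the self-hosted rows, cost per million tokens follows from the GPU's hourly price and the tokens per second the model generates. The source doesn't state throughput, so the figures below are back-solved from the table as an illustration, and "A100" is assumed to mean the A100 80GB. Real throughput depends on model, quantization, batch size, and serving stack; batching many requests raises aggregate throughput and lowers the per-token cost.

```python
def cost_per_million(hourly_rate: float, tokens_per_second: float) -> float:
    """Dollars per 1M generated tokens for a GPU billed by the hour."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate / tokens_per_hour * 1_000_000

def implied_throughput(hourly_rate: float, dollars_per_million: float) -> float:
    """Tokens/second a GPU must sustain to hit a given $/1M figure."""
    return hourly_rate * 1_000_000 / (dollars_per_million * 3600)

# Hourly rates from the GPU pricing table; $/1M figures from the table above.
for label, rate, per_m in [
    ("7B on T4", 0.59, 2.0),
    ("14B on A100 80GB", 2.50, 8.0),
    ("32B on A100 80GB", 2.50, 17.0),
]:
    print(f"{label}: ~{implied_throughput(rate, per_m):.0f} tokens/s")
```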
API calls are a premium, so they had better be worth it, because they also take your data. A specific AI built for YOUR task, on YOUR hardware, with YOUR data, will usually get better results than a general AI built for everyone. A 7B model tuned on your customer support data can outperform GPT-4 on your support tasks, and it costs a fraction as much per token (~$2/M on a T4 versus $2.50–10/M for GPT-4o).
The Markup — How You Earn
Set a 0–30% markup when you publish a tool. The caller pays compute plus your markup. You keep the difference on every single call, 24/7.
- You build a translation API on T4. Compute: $0.001/call. You set 20% markup. Caller pays $0.0012. You keep $0.0002.
- At 10,000 calls/day: $2/day = $60/month passive.
- You build a document summarizer on A100. Compute: $0.006/call. You set 25%. Caller pays $0.0075. You keep $0.0015.
- At 2,000 calls/day: $3/day = $90/month passive.
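Both examples follow one formula: the caller pays compute × (1 + markup), and you earn compute × markup per call. A sketch using the figures above with `Decimal` to avoid rounding drift on sub-cent amounts; the 30-day month matches the $60 and $90 totals, and the function names are illustrative.

```python
from decimal import Decimal

MAX_MARKUP = Decimal("0.30")  # markups range from 0% to 30%

def caller_price(compute: Decimal, markup: Decimal) -> Decimal:
    """Price per call: compute cost plus your markup."""
    if not Decimal("0") <= markup <= MAX_MARKUP:
        raise ValueError("markup must be between 0% and 30%")
    return compute * (1 + markup)

def monthly_earnings(compute: Decimal, markup: Decimal, calls_per_day: int, days: int = 30) -> Decimal:
    """What you keep over a month: (caller price - compute) per call."""
    per_call = caller_price(compute, markup) - compute
    return per_call * calls_per_day * days

print(caller_price(Decimal("0.001"), Decimal("0.20")))              # 0.00120
print(monthly_earnings(Decimal("0.001"), Decimal("0.20"), 10_000))  # 60.00000
print(monthly_earnings(Decimal("0.006"), Decimal("0.25"), 2_000))   # 90.00000
```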
The same markup game the API providers play — except now you're the provider.
Pricing
Pay-as-you-go billing, by the second. $0 when idle. Add credits via Stripe to start. Below: what $1.00 of credits (1,000,000 micro-dollars) buys you on each hardware tier.
- ~180 calls on A100 80GB at 8 seconds each
- ~2,000 calls on T4 at 3 seconds each
- ~33,000 CPU runs at 2 seconds each
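The GPU figures come from dividing the balance by the cost of one call. A sketch using integer micro-dollars (1,000,000 per $1.00, as stated above) so balances never pick up floating-point error; the per-second GPU rates convert exactly ($0.000694 is 694 micro-dollars). The CPU line is left out because its per-run cost depends on core count and memory, which the list doesn't specify. `calls_per_balance` is an illustrative name.

```python
MICRO_PER_DOLLAR = 1_000_000

# Per-second GPU rates in micro-dollars, from the GPU pricing table.
RATE_MICRO_PER_SECOND = {
    "T4": 164,
    "A100 80GB": 694,
}

def calls_per_balance(balance_micro: int, hardware: str, seconds_per_call: int) -> int:
    """Whole calls a credit balance covers at a fixed call length."""
    cost_per_call = RATE_MICRO_PER_SECOND[hardware] * seconds_per_call
    return balance_micro // cost_per_call

one_dollar = 1 * MICRO_PER_DOLLAR
print(calls_per_balance(one_dollar, "A100 80GB", 8))  # 180
print(calls_per_balance(one_dollar, "T4", 3))         # 2032
```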
Enough to build something, test it, and prove it works.
Storage
| What | How long | Cost |
|---|---|---|
| Code, secrets, GPU selection | Forever | Free |
| Model cache | Persists across all projects | Included |
| Uploaded files | Persists for the project | Storage rate/GiB |
| Output files | Until you clear them | Storage rate/GiB |
| Nothing running | — | $0 |