Case 03 of 4

內容自動化產線
The Content Automation Pipeline

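定位:本案是「元證據」——你現在看的這個作品集網站、它的相關內容、就是這套系統產出的延伸。
Positioning: this case is "meta-evidence". The portfolio site you are reading right now, and the content around it, are an extension of what this system produces.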

🇹🇼 中文

一句話

一個人、一台 Mac mini、把「每天產一支跨三平台社群影片」這件事、做成一條我幾乎不用插手的產線——到目前為止已自主產出並上架 32 支經過機器自我查證的影片。

問題

我的本業是工務工程師、真正能碰個人專案的、一天只有一到三小時。

社群經營的殘酷現實是:它要「持續」才有用、但「持續」恰恰最吃時間——選題、寫稿、配音、做圖、剪輯、三個平台各自上傳、還要回頭確認有沒有發成功、有沒有重複發。

任何一步要人盯、這件事就撐不過兩個禮拜。

所以我沒有把它當「內容工作」做、我把它當「一條要被自動化的產線」來設計。

怎麼用 AI 落地

我把整條鏈拆成可被機器接手的關卡、每一關都有明確的輸入、輸出與守門條件:

  1. 選題:四支柱主題輪替、自動排出下一支該做什麼。
  2. 腳本:依一套敘事骨架生稿、再由我做最後的語氣與事實把關(人不退場的那一關)。
  3. 配音:用我本人聲音的克隆模型轉語音、斷句先過我耳朵才往下走。
  4. 畫面:文字卡 + 逐段生成的背景圖、生圖強制帶負面條件、每張機器 + 人雙重檢查、杜絕亂生人物或文字。
  5. 雙重守門:發布前先過「內容守門員」(紅線檢查)與「規格守門員」(尺寸 / 格式 / 長度、各平台規則不同)。
  6. 發布 + 自我查證閉環:發到 Facebook / YouTube / Instagram、發完機器再回讀線上貼文、確認它真的存在、標題沒被改、才寫進已發清單;內建防重複發機制、避免同一支被發兩次。

關鍵設計原則:人只留在機器無法替代的那 10%——也就是最終的判斷與品味(語氣對不對、事實真不真、這題能不能碰)。其餘 90% 全部交給流程。

量化結果

用到的技術

語音克隆 TTS、文生圖(本機擴散模型)、無頭瀏覽器截圖 + 影片合成、各平台官方 API、API 回讀自證、防重入(idempotency)設計、排程。整條線單人設計、單人維運。

這個案例證明什麼

我不是工程師出身。但我能把一件「聽起來要一個小團隊每天做」的事、拆解成一條一個人就能養、而且養得起的自動化系統。AI 落地的本事、不在會不會寫程式、在於分得清哪一步該交給機器、哪一步必須留給人。

🇬🇧 English

One-liner

One person, one Mac mini: I turned "publish a short social video across three platforms every day" into a pipeline I barely have to touch. To date it has autonomously produced and shipped 32 videos, each one verified by the pipeline itself after posting.

The Problem

By day I work as a site engineer. That leaves me one to three hours a day for personal projects.

Social media has a brutal rule: it only works if it's consistent — and consistency is exactly what eats time. Pick a topic, write the script, voice it, make the visuals, edit, upload to three platforms separately, then go back and confirm each one actually posted and wasn't duplicated.

If any step needs a human watching it, the whole thing dies within two weeks.

So I didn't treat this as "content work." I designed it as a production line to be automated.

How I Put AI to Work

I broke the chain into machine-ownable stages, each with a clear input, output, and gate:

  1. Topic — four content pillars on rotation; the system queues what's next.
  2. Script — drafted against a narrative framework, then a final human pass for tone and factual accuracy (the one stage where a human always stays in the loop).
  3. Voice — text-to-speech using a clone of my own voice; phrasing is approved by ear before anything proceeds.
  4. Visuals — text cards over per-segment generated backgrounds; image generation runs with hard negative constraints, and every image gets a machine-plus-human check to keep stray people or text out.
  5. Dual gatekeeping — before publishing, a content gate (red-line checks) and a spec gate (size/format/length, different per platform).
  6. Publish + self-verification loop — posts to Facebook / YouTube / Instagram, then reads each live post back via API to confirm it truly exists and the caption wasn't altered before logging it; a built-in idempotency guard prevents the same video being posted twice.
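
As a rough illustration of the spec gate in stage 5, here is a minimal Python sketch. It is not the pipeline's actual code: the `Spec`, `Video`, and `spec_gate` names are hypothetical, and every limit in `SPECS` is a placeholder rather than a real platform rule. The point it shows is that one video is checked against a separate rule set per platform, and the gate returns reasons instead of a bare pass/fail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Per-platform limits. All numbers used below are illustrative placeholders."""
    max_seconds: int
    aspect: tuple[int, int]      # width:height ratio, e.g. (9, 16)
    containers: frozenset[str]   # accepted file containers


@dataclass(frozen=True)
class Video:
    width: int
    height: int
    seconds: float
    container: str


# Hypothetical rules; real values would come from each platform's current docs.
SPECS = {
    "facebook": Spec(90, (9, 16), frozenset({"mp4", "mov"})),
    "youtube": Spec(60, (9, 16), frozenset({"mp4", "mov"})),
    "instagram": Spec(90, (9, 16), frozenset({"mp4"})),
}


def spec_gate(video: Video, platform: str) -> list[str]:
    """Return the list of violations; an empty list means the gate passes."""
    spec = SPECS[platform]
    problems = []
    if video.seconds > spec.max_seconds:
        problems.append(f"too long: {video.seconds}s > {spec.max_seconds}s")
    w, h = spec.aspect
    # Cross-multiply so the ratio check needs no floating-point division.
    if video.width * h != video.height * w:
        problems.append(f"aspect ratio is not {w}:{h}")
    if video.container not in spec.containers:
        problems.append(f"container {video.container!r} not accepted")
    return problems


# One clip, checked against all three rule sets before anything is published.
clip = Video(width=1080, height=1920, seconds=75.0, container="mp4")
report = {platform: spec_gate(clip, platform) for platform in SPECS}
```

With these placeholder limits the clip passes for two platforms and is held back for the third, which is why the gate runs per platform rather than once per video.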

Core principle: the human stays only in the 10% a machine can't replace — final judgment and taste (is the tone right, is the fact true, is this topic safe to touch). The other 90% is owned by the process.

Measurable Results

Tech Used

Voice-clone TTS, text-to-image (local diffusion), headless-browser capture + video composition, official platform APIs, API read-back verification, idempotency design, scheduling. Designed and operated solo.
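
To make "API read-back verification" and "idempotency design" concrete, here is a minimal Python sketch under stated assumptions. The function names, the JSON ledger file, and the `upload` / `fetch_caption` callables are all hypothetical stand-ins; the real pipeline calls each platform's official API, which is not shown here. The demo at the bottom uses an in-memory fake platform.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def load_ledger(path: Path) -> dict:
    """Load the 'already published' list, or start empty if none exists yet."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def idempotency_key(platform: str, video_file: bytes) -> str:
    """The same platform and the same file bytes always yield the same key."""
    return f"{platform}:{hashlib.sha256(video_file).hexdigest()}"


def publish_once(
    platform: str,
    video_file: bytes,
    caption: str,
    upload: Callable[[bytes, str], str],            # returns the new post id
    fetch_caption: Callable[[str], Optional[str]],  # None if the post is missing
    ledger_path: Path,
) -> str:
    ledger = load_ledger(ledger_path)
    key = idempotency_key(platform, video_file)
    if key in ledger:
        return "skipped"  # already verified and logged, so never post it twice

    post_id = upload(video_file, caption)

    # Read the live post back instead of trusting the upload response.
    live_caption = fetch_caption(post_id)
    if live_caption is None:
        raise RuntimeError(f"{platform}: post {post_id} not found on read-back")
    if live_caption != caption:
        raise RuntimeError(f"{platform}: caption changed on post {post_id}")

    # Only a verified post is written to the ledger.
    ledger[key] = {"post_id": post_id, "caption": caption}
    ledger_path.write_text(
        json.dumps(ledger, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return "published"


# Demo against an in-memory stand-in for a platform API (assumption, not a real API).
posts: dict[str, str] = {}


def fake_upload(data: bytes, caption: str) -> str:
    post_id = f"post-{len(posts) + 1}"
    posts[post_id] = caption
    return post_id


with tempfile.TemporaryDirectory() as tmp:
    ledger_file = Path(tmp) / "published.json"
    first = publish_once("youtube", b"video-bytes", "Day 1", fake_upload, posts.get, ledger_file)
    second = publish_once("youtube", b"video-bytes", "Day 1", fake_upload, posts.get, ledger_file)
```

In the demo the second call returns without uploading, because the key is already in the ledger. One limitation of this simplified version: if an upload succeeds but the read-back fails, a retry would upload again, so a fuller design would likely record a "pending" entry before uploading and resolve it afterwards.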

What This Case Proves

I'm not a software engineer by training. But I can take something that "sounds like it needs a small team doing it daily" and break it into an automated system one person can run, and afford to run. The skill in applying AI isn't whether you can code; it's knowing exactly which step belongs to the machine and which must stay with a human.
