---
title: 'CI/CD Pipeline Improvement'
type: 'chore'
created: '2026-05-16'
status: 'in-progress'
context:
- '{project-root}/.gitea/workflows/deploy.yaml'
- '{project-root}/memento-note/package.json'
- '{project-root}/docker-compose.yml'
---
<frozen-after-approval reason="human-owned intent — do not modify unless human renegotiates">
## Intent
**Problem:** The CI/CD pipeline (`.gitea/workflows/deploy.yaml`) deploys directly on push to main with zero validation — no lint, no tests, no build check. A broken push causes immediate downtime on the production server (192.168.1.190). There is no rollback mechanism and no notification when deployments succeed or fail.
**Approach:** Add a CI validation pipeline (lint + typecheck + unit tests + build) that runs before the deploy pipeline. Add automatic rollback on deploy failure. Send Telegram notifications on deploy success/failure. Keep the push-to-main trigger.
## Boundaries & Constraints
**Always:**
- All CI steps must run in Gitea Actions (self-hosted runner, ubuntu-24.04)
- Deploy remains on push to main (same trigger)
- Never use destructive DB commands in CI
- Keep SSH-based deploy to 192.168.1.190
- Use existing npm scripts where available (`npm run build`, `npm run test:unit`)
**Ask First:**
- Adding new npm dependencies (e.g. ESLint packages)
- Changing the Docker build process
- Modifying the production server entrypoint
**Never:**
- No cloud CI providers (GitHub Actions, CircleCI, etc.) — self-hosted Gitea only
- No deployment to a different server
- No E2E (Playwright) tests in CI — too heavy for the runner, keep local only
- No modification to the Dockerfile or docker-compose.yml structure
## I/O & Edge-Case Matrix
| Scenario | Input / State | Expected Output / Behavior | Error Handling |
|----------|--------------|---------------------------|----------------|
| Push to main (all green) | Valid code, lint clean, tests pass, build OK | CI runs → deploy → health check → Telegram success notification | N/A |
| Push to main (lint fail) | Code with lint errors | CI fails at lint step, deploy does NOT run, Telegram failure notification | Pipeline stops, no deploy |
| Push to main (tests fail) | Lint passes but unit tests fail | CI fails at test step, deploy does NOT run, Telegram failure notification | Pipeline stops, no deploy |
| Push to main (build fail) | Lint+tests pass but `next build` fails | CI fails at build step, deploy does NOT run, Telegram failure notification | Pipeline stops, no deploy |
| Deploy succeeds but app unhealthy | App returns 5xx after 180s | Health check fails → rollback to previous container → Telegram failure notification | Rollback via `docker tag` + restore |
| Deploy succeeds, app healthy | HTTP < 500 within 180s | Telegram success notification with app version/timestamp | N/A |
| Manual workflow_dispatch | User clicks "Run" in Gitea | Same pipeline as push to main | Same error handling |
</frozen-after-approval>
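The health-check rule in the matrix above (an HTTP status below 500 within 180 s) could be sketched roughly as follows. This is only an illustration: `HEALTH_URL`, port 3000, the 5 s polling interval, and the function names `http_status` and `wait_healthy` are assumptions, not part of the existing pipeline.

```bash
#!/usr/bin/env bash
# Sketch of the post-deploy health check: poll until the app answers with an
# HTTP status below 500, or give up once the timeout has elapsed.
# HEALTH_URL, port 3000 and the polling interval are illustrative assumptions.

HEALTH_URL="${HEALTH_URL:-http://192.168.1.190:3000/}"

# http_status URL -> prints the HTTP status code (curl prints 000 if the
# request itself failed, e.g. connection refused).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# wait_healthy [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 as soon as the status is in 100..499, 1 once the timeout elapses.
wait_healthy() {
  local timeout="${1:-180}" interval="${2:-5}" elapsed=0 code=""
  while [ "$elapsed" -lt "$timeout" ]; do
    code="$(http_status "$HEALTH_URL")"
    if [ "$code" -ge 100 ] 2>/dev/null && [ "$code" -lt 500 ]; then
      echo "healthy (HTTP $code) after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "unhealthy after ${timeout}s (last status: ${code:-none})" >&2
  return 1
}
```

A deploy step might call `wait_healthy 180 5 || exit 1` so that a non-zero exit lets a later rollback step take over.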
## Code Map
- `.gitea/workflows/deploy.yaml` — Current deploy pipeline (SSH-based, single job)
- `.gitea/workflows/ci.yaml` — **NEW** CI validation pipeline (lint + test + build)
- `memento-note/package.json` — Needs `lint` script added
- `memento-note/eslint.config.mjs` — **NEW** ESLint flat config
- `memento-note/tsconfig.json` — Already has `strict: true`
## Tasks & Acceptance
**Execution:**
- [ ] `memento-note/eslint.config.mjs` — Create ESLint flat config with Next.js + TypeScript rules (no Prettier — keep it simple, lint-only)
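- [ ] `memento-note/package.json` — Add a `"lint": "eslint ."` script (with flat config, file extensions are selected by `files` globs in `eslint.config.mjs`; the `--ext .ts,.tsx` flag is not accepted by every flat-config ESLint release) and `eslint` + `@typescript-eslint/*` + `eslint-config-next` devDependencies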
- [ ] `.gitea/workflows/ci.yaml` — Create CI pipeline: checkout → Node 22 setup → `npm ci` → `npx prisma generate` → `npm run lint` → `npm run test:unit` → `npm run build`. Triggered on push to main and on pull_request. Uses Gitea cache for node_modules.
- [ ] `.gitea/workflows/deploy.yaml` — Refactor: add `needs: ci` job dependency so deploy only runs after CI passes. Add rollback step: before deploy, save current Docker image tag; on health-check failure, restore previous image and restart. Add Telegram notification step (success + failure) using `curl` to Telegram Bot API with `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` secrets.
- [ ] `.gitea/workflows/deploy.yaml` — Add pre-deploy backup step: `docker tag memento-note_memento-note memento-note_memento-note:rollback` before building new image.
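The CI step order listed in the tasks above can be rehearsed locally before it is wired into a workflow. The script below is a minimal sketch under that assumption; `APP_DIR`, `run_step`, and `run_ci` are hypothetical names, and in the real workflow each command would more likely be its own step.

```bash
#!/usr/bin/env bash
# Local sketch of the CI sequence; intended to be run from the repository root.
# APP_DIR is an assumption based on the paths listed in the Code Map.

APP_DIR="${APP_DIR:-memento-note}"

# run_step NAME COMMAND... -> runs the command and reports which step failed.
run_step() {
  local name="$1"; shift
  echo "==> $name"
  if ! "$@"; then
    echo "CI failed at step: $name" >&2
    return 1
  fi
}

# run_ci -> executes the steps in pipeline order and stops at the first failure.
# The subshell keeps the directory change from leaking into the caller.
run_ci() {
  (
    cd "$APP_DIR" || exit 1
    run_step "install" npm ci &&
    run_step "prisma generate" npx prisma generate &&
    run_step "lint" npm run lint &&
    run_step "unit tests" npm run test:unit &&
    run_step "build" npm run build
  )
}
```

Calling `run_ci` returns a non-zero status at the first failing step, which mirrors the "pipeline stops, no deploy" behavior required by the matrix.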
**Acceptance Criteria:**
- Given a push to main with lint errors, when CI runs, then the pipeline fails at lint and deploy does NOT execute
- Given a push to main with failing unit tests, when CI runs, then the pipeline fails at tests and deploy does NOT execute
- Given a push to main with valid code, when CI passes, then deploy runs and Telegram receives a success notification
- Given a deploy where the app fails health check, when rollback triggers, then the previous Docker image is restored and the app returns to its pre-deploy state
- Given a push to a non-main branch (or PR), when CI runs, then lint+test+build execute but deploy does NOT trigger
## Design Notes
**ESLint config strategy:** Use the flat config format (`eslint.config.mjs`) with Next.js core-web-vitals + TypeScript strict rules. No Prettier integration — the project doesn't use it and adding it now would create 500+ formatting noise commits. Focus on actual code quality: unused vars, type errors, React hooks rules, import ordering.
**Rollback strategy:** Before each deploy, tag the running Docker image as `:rollback`. On health-check failure, retag `:rollback` back to the active tag and restart. This is lightweight and doesn't require a separate registry.
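A minimal sketch of the tag-based rollback described above, to be run on the deploy host. The image name is taken from the backup task in this spec; the `:latest` active tag, the `docker compose up` restart command, and the function names are assumptions about how the stack is run.

```bash
#!/usr/bin/env bash
# Sketch of the tag-based rollback. IMAGE comes from the pre-deploy backup task;
# the active tag (:latest) and the restart command are assumptions.

IMAGE="${IMAGE:-memento-note_memento-note}"

# save_rollback -> tags the current image as :rollback before a new build.
save_rollback() {
  if docker image inspect "${IMAGE}:latest" >/dev/null 2>&1; then
    docker tag "${IMAGE}:latest" "${IMAGE}:rollback"
  else
    echo "no ${IMAGE}:latest image found; nothing to back up" >&2
  fi
}

# restore_rollback -> points :latest back at :rollback and restarts the stack
# without rebuilding, so the previous image is the one that starts.
restore_rollback() {
  docker image inspect "${IMAGE}:rollback" >/dev/null 2>&1 || {
    echo "no ${IMAGE}:rollback image to restore" >&2
    return 1
  }
  docker tag "${IMAGE}:rollback" "${IMAGE}:latest" &&
  docker compose up -d --no-build --force-recreate
}
```

In a deploy script this might be used as `save_rollback` before the build and `restore_rollback` when the health check fails. Because the `:rollback` tag is overwritten on every deploy, only one previous version is recoverable.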
**Telegram notification:** Use a simple `curl` POST to `https://api.telegram.org/bot{TOKEN}/sendMessage` with `chat_id` and a formatted message. The bot token and chat ID are stored as Gitea secrets (`TELEGRAM_BOT_TOKEN`, `TELEGRAM_CHAT_ID`). The user creates a bot via @BotFather and gets the chat ID by messaging the bot then querying `getUpdates`.
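The notification call described above might look like the sketch below. `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are the secrets named in this spec, exposed here as environment variables; the message wording and the function names `build_message` and `notify_telegram` are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sketch of the notification step. The two TELEGRAM_* variables are the secrets
# named in this spec; the message format is illustrative.

# build_message STATUS COMMIT -> prints the notification text with a UTC timestamp.
build_message() {
  local status="$1" commit="$2"
  printf 'Deploy %s\ncommit: %s\ntime: %s' \
    "$status" "$commit" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# notify_telegram STATUS COMMIT -> POSTs the message to the Bot API sendMessage
# method. Returns non-zero if the secrets are missing or the request fails.
notify_telegram() {
  if [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "Telegram secrets are not set; skipping notification" >&2
    return 1
  fi
  curl -sS --fail --max-time 15 \
    "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=$(build_message "$1" "$2")" \
    >/dev/null
}
```

For example, `notify_telegram "failed" "$(git rev-parse --short HEAD)"` would report a failed deploy for the current commit. Using `--data-urlencode` keeps newlines and special characters in the message intact.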
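**Two-workflow architecture:** `ci.yaml` runs on all branches and PRs. `deploy.yaml` runs only on main push and `workflow_dispatch`, with `needs: [ci]` to gate on CI passing. Because `needs` resolves job IDs within a single workflow file, `deploy.yaml` must itself define a job named `ci` (running the same steps as `ci.yaml`) for this gate to take effect. This means PRs get fast feedback (lint/test/build in ~2-3 min) while deploys get the full safety net.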
## Verification
**Commands:**
- `cd memento-note && npm run lint` — expected: 0 exit code (or only pre-existing warnings)
- `cd memento-note && npm run test:unit` — expected: all tests pass
- `cd memento-note && npm run build` — expected: build succeeds
**Manual checks:**
- Push a branch with a lint error → verify CI fails in Gitea UI
- Push to main with valid code → verify Telegram receives notification
- Verify rollback Docker image exists on server after deploy (`docker images | grep rollback`)