Momento/_bmad-output/implementation-artifacts/spec-ci-cd-pipeline-improvement.md
Antigravity 2de66a863d feat(ci): add rollback mechanism and Telegram notifications
CI/CD Pipeline Improvement - Add automated rollback on deployment
failure and Telegram notifications for CI/deploy status.

Changes:
- scripts/deploy-prod.sh: Add rollback_save_image(), rollback_restore_image(),
  and telegram_notify() functions
- scripts/deploy-prod.sh: Save current Docker image before building new one
- scripts/deploy-prod.sh: Rollback to previous image on health check failure
- .gitea/workflows/ci.yaml: Add Telegram notifications for CI failures
- memento-note/eslint.config.mjs: Disable experimental React Compiler rules

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 19:36:57 +00:00

---
title: 'CI/CD Pipeline Improvement'
type: 'chore'
created: '2026-05-16'
status: 'done'
baseline_commit: '5442af4c55b374ba205d7c62a2690774c66652fc'
context:
- '{project-root}/.gitea/workflows/deploy.yaml'
- '{project-root}/memento-note/package.json'
- '{project-root}/docker-compose.yml'
---
<frozen-after-approval reason="human-owned intent — do not modify unless human renegotiates">
## Intent
**Problem:** The CI/CD pipeline (`.gitea/workflows/deploy.yaml`) deploys directly on push to main with zero validation — no lint, no tests, no build check. A broken push causes immediate downtime on the production server (192.168.1.190). There is no rollback mechanism and no notification when deployments succeed or fail.
**Approach:** Add a CI validation pipeline (lint + typecheck + unit tests + build) that runs before the deploy pipeline. Add automatic rollback on deploy failure. Send Telegram notifications on deploy success/failure. Keep the push-to-main trigger.
## Boundaries & Constraints
**Always:**
- All CI steps must run in Gitea Actions (self-hosted runner, ubuntu-24.04)
- Deploy remains on push to main (same trigger)
- Never use destructive DB commands in CI
- Keep SSH-based deploy to 192.168.1.190
- Use existing npm scripts where available (`npm run build`, `npm run test:unit`)
**Ask First:**
- Adding new npm dependencies (e.g. ESLint packages)
- Changing the Docker build process
- Modifying the production server entrypoint
**Never:**
- No cloud CI providers (GitHub Actions, CircleCI, etc.) — self-hosted Gitea only
- No deployment to a different server
- No E2E (Playwright) tests in CI — too heavy for the runner, keep local only
- No modification to the Dockerfile or docker-compose.yml structure
## I/O & Edge-Case Matrix
| Scenario | Input / State | Expected Output / Behavior | Error Handling |
|----------|--------------|---------------------------|----------------|
| Push to main (all green) | Valid code, lint clean, tests pass, build OK | CI runs → deploy → health check → Telegram success notification | N/A |
| Push to main (lint fail) | Code with lint errors | CI fails at lint step, deploy does NOT run, Telegram failure notification | Pipeline stops, no deploy |
| Push to main (tests fail) | Lint passes but unit tests fail | CI fails at test step, deploy does NOT run, Telegram failure notification | Pipeline stops, no deploy |
| Push to main (build fail) | Lint+tests pass but `next build` fails | CI fails at build step, deploy does NOT run, Telegram failure notification | Pipeline stops, no deploy |
| Deploy succeeds but app unhealthy | App returns 5xx after 180s | Health check fails → rollback to previous container → Telegram failure notification | Rollback via `docker tag` + restore |
| Deploy succeeds, app healthy | HTTP < 500 within 180s | Telegram success notification with app version/timestamp | N/A |
| Manual workflow_dispatch | User clicks "Run" in Gitea | Same pipeline as push to main | Same error handling |
</frozen-after-approval>
## Code Map
- `.gitea/workflows/deploy.yaml` — Manual trigger deploy pipeline (workflow_dispatch only)
- `.gitea/workflows/ci.yaml` — CI validation pipeline (lint + test + build) + deploy job with CI gate
- `scripts/deploy-prod.sh` — Deploy script with rollback mechanism and Telegram notifications
- `memento-note/package.json` — Already has `lint` script and ESLint dependencies
- `memento-note/eslint.config.mjs` — ESLint flat config (updated to disable React Compiler rules)
- `memento-note/tsconfig.json` — Already has `strict: true`
## Tasks & Acceptance
**Execution:**
- [x] `memento-note/eslint.config.mjs` — Create ESLint flat config with Next.js + TypeScript rules (no Prettier — keep it simple, lint-only). **Already existed** - updated to disable experimental React Compiler rules for existing codebase compatibility.
- [x] `memento-note/package.json` — Add `"lint": "eslint . --ext .ts,.tsx"` script and `eslint` + `@typescript-eslint/*` + `eslint-config-next` devDependencies. **Already existed** - lint script and dependencies already present.
- [x] `.gitea/workflows/ci.yaml` — Create CI pipeline: checkout → Node 22 setup → `npm ci` → `npx prisma generate` → `npm run lint` → `npm run test:unit` → `npm run build`. Triggered on push to main and on pull_request. Uses Gitea cache for node_modules. **Already existed** - CI pipeline already implemented with all required steps. Added Telegram failure notifications for lint, test, and build steps.
- [x] `.gitea/workflows/deploy.yaml` — Refactor: add `needs: ci` job dependency so deploy only runs after CI passes. Add rollback step: before deploy, save current Docker image tag; on health-check failure, restore previous image and restart. Add Telegram notification step (success + failure) using `curl` to Telegram Bot API with `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` secrets. **Note**: The deploy job is already in `ci.yaml` with `needs: ci` dependency. Rollback and Telegram notifications added to `scripts/deploy-prod.sh` which is called by both workflows.
- [x] `.gitea/workflows/deploy.yaml` — Add pre-deploy backup step: `docker tag memento-note_memento-note memento-note_memento-note:rollback` before building new image. **Implemented** in `scripts/deploy-prod.sh` via `rollback_save_image()` function.
**Acceptance Criteria:**
- Given a push to main with lint errors, when CI runs, then the pipeline fails at lint and deploy does NOT execute
- Given a push to main with failing unit tests, when CI runs, then the pipeline fails at tests and deploy does NOT execute
- Given a push to main with valid code, when CI passes, then deploy runs and Telegram receives a success notification
- Given a deploy where the app fails health check, when rollback triggers, then the previous Docker image is restored and the app returns to its pre-deploy state
- Given a push to a non-main branch (or PR), when CI runs, then lint+test+build execute but deploy does NOT trigger
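
The health-check criterion above can be read as a polling loop. The following is a minimal sketch, not the code in `scripts/deploy-prod.sh`: the 180-second window and the "HTTP status below 500" rule come from the edge-case matrix, while `HEALTH_URL`, `HEALTH_INTERVAL`, the `PROBE` override, and all function names are hypothetical.

```shell
# Sketch only: URL, interval, PROBE hook and function names are assumptions.
set -eu

HEALTH_URL="${HEALTH_URL:-http://localhost:3000/}"   # hypothetical endpoint
HEALTH_TIMEOUT="${HEALTH_TIMEOUT:-180}"              # seconds, from the matrix
HEALTH_INTERVAL="${HEALTH_INTERVAL:-5}"              # seconds between probes

# Print the HTTP status code; curl prints 000 when no response was received.
probe_status() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1" || true
}

# Healthy means a real response (>= 100) with a status below 500.
is_healthy_status() {
  [ "$1" -ge 100 ] 2>/dev/null && [ "$1" -lt 500 ]
}

# Poll until healthy or until the timeout elapses; returns 1 on timeout.
wait_for_healthy() {
  elapsed=0
  while [ "$elapsed" -lt "$HEALTH_TIMEOUT" ]; do
    code="$(${PROBE:-probe_status} "$HEALTH_URL")"
    if is_healthy_status "$code"; then
      echo "healthy after ${elapsed}s (HTTP ${code})"
      return 0
    fi
    sleep "$HEALTH_INTERVAL"
    elapsed=$((elapsed + HEALTH_INTERVAL))
  done
  echo "unhealthy after ${HEALTH_TIMEOUT}s" >&2
  return 1
}
```

A deploy script could then branch on the result, for example `wait_for_healthy || rollback_restore_image`, so that the rollback path runs only when the window expires without a healthy response.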
## Design Notes
**ESLint config strategy:** The ESLint flat config (`eslint.config.mjs`) with Next.js core-web-vitals + TypeScript strict rules was already in place. It was updated to disable the experimental React Compiler rules (published under the `react-hooks/*` namespace), which are too strict for the existing codebase. There is no Prettier integration because the project doesn't use it. Lint now passes with only warnings (34 `react-hooks/exhaustive-deps`) and 4 fixable errors in `extension/i18n/generate-translations.cjs`.
**Rollback strategy:** Implemented in `scripts/deploy-prod.sh`. Before each deploy, tag the running Docker image as `memento-memento-note:rollback`. On health-check failure, retag `:rollback` back to `:latest` and restart the container. This is lightweight and doesn't require a separate registry.
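
The tag-and-restore flow described above might look roughly like the sketch below. The function names `rollback_save_image()` and `rollback_restore_image()` and the image name come from this spec. The function bodies, the `DOCKER` override (present only so the functions can be dry-run), and the `docker compose up -d --no-build` restart are assumptions and may differ from the real script.

```shell
# Sketch only: bodies, DOCKER override and the restart command are assumptions.
set -eu

IMAGE="${IMAGE:-memento-memento-note}"   # repository name as written in the note above
DOCKER="${DOCKER:-docker}"               # overridable for dry runs

# Tag whatever :latest currently points at as :rollback before the new build.
# On a first deploy there is nothing to save, which is not treated as an error.
rollback_save_image() {
  if "$DOCKER" image inspect "${IMAGE}:latest" >/dev/null 2>&1; then
    "$DOCKER" tag "${IMAGE}:latest" "${IMAGE}:rollback"
    echo "saved ${IMAGE}:latest as ${IMAGE}:rollback"
  else
    echo "no ${IMAGE}:latest image found, nothing to save"
  fi
}

# Point :latest back at the saved image and recreate the container from it.
rollback_restore_image() {
  if ! "$DOCKER" image inspect "${IMAGE}:rollback" >/dev/null 2>&1; then
    echo "no ${IMAGE}:rollback image, cannot roll back" >&2
    return 1
  fi
  "$DOCKER" tag "${IMAGE}:rollback" "${IMAGE}:latest"
  "$DOCKER" compose up -d --no-build
  echo "restored ${IMAGE}:rollback as ${IMAGE}:latest"
}
```

Because a tag is only a second name for the same image ID, saving costs no extra disk space until the new build replaces `:latest`. The trade-off is that only one previous version is kept.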
**Telegram notification:** Implemented in both `scripts/deploy-prod.sh` (for deploy success/failure/rollback) and `.gitea/workflows/ci.yaml` (for CI lint/test/build failures). Uses `curl` POST to Telegram Bot API with `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` secrets.
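
As an illustration of the notification path, here is a minimal sketch of a `telegram_notify()` helper that posts to the Bot API `sendMessage` method with `curl`. The function name and the two secrets come from this spec. The status keywords, emoji choices, the `telegram_format` helper, and the decision to never fail the deploy on a notification error are assumptions.

```shell
# Sketch only: status names, emoji and telegram_format are assumptions.
set -eu

# Prefix the message with an emoji chosen by status keyword.
telegram_format() {
  status="$1"
  text="$2"
  case "$status" in
    success)  icon="✅" ;;
    failure)  icon="❌" ;;
    rollback) icon="⏪" ;;
    *)        icon="🔔" ;;
  esac
  printf '%s %s' "$icon" "$text"
}

# Send a message through the Bot API. Skips when the secrets are missing and
# only logs on a send error, so a notification problem cannot break a deploy.
telegram_notify() {
  if [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "telegram: secrets not set, skipping notification" >&2
    return 0
  fi
  curl -fsS --max-time 10 -X POST \
    "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=$(telegram_format "$1" "$2")" \
    >/dev/null || echo "telegram: notification failed" >&2
}
```

A call such as `telegram_notify rollback "Health check failed, previous image restored"` would then cover the rollback case. `--data-urlencode` keeps commit messages with spaces or special characters from corrupting the request.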
**Two-workflow architecture:** `ci.yaml` runs on all branches and PRs. The deploy job within `ci.yaml` runs only on main push and is gated by `needs: [ci]`. The standalone `deploy.yaml` is for manual `workflow_dispatch` only. This means PRs get fast feedback (lint/test/build) while deploys get the full safety net.
## Verification
**Commands:**
- `cd memento-note && npm run lint` — expected: 0 exit code (or only pre-existing warnings)
- `cd memento-note && npm run test:unit` — expected: all tests pass
- `cd memento-note && npm run build` — expected: build succeeds
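
To run the three commands above in one go and stop at the first failure, a small wrapper along these lines may help. It is a convenience sketch only; `run_checks`, `APP_DIR`, and the `RUN` override are hypothetical names and not part of the repository.

```shell
# Sketch only: run_checks, APP_DIR and RUN are hypothetical.
set -eu

APP_DIR="${APP_DIR:-memento-note}"
RUN="${RUN:-npm}"   # overridable so the wrapper can be dry-run

# Run lint, unit tests and build in order; report the first script that fails.
run_checks() {
  for script in lint test:unit build; do
    echo "==> npm run ${script}"
    if ! (cd "$APP_DIR" && "$RUN" run "$script"); then
      echo "FAILED: ${script}" >&2
      return 1
    fi
  done
  echo "all checks passed"
}
```

Calling `run_checks` from the repository root mirrors the order of the CI steps, so a local failure should point at the same step that would fail in Gitea.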
**Manual checks:**
- Push a branch with a lint error → verify CI fails in Gitea UI
- Push to main with valid code → verify Telegram receives notification
- Verify rollback Docker image exists on server after deploy (`docker images | grep rollback`)
## Suggested Review Order
**Rollback & Notification Core**
- Entry point: Telegram notification function with status-based emoji routing
[`scripts/deploy-prod.sh:5`](../../scripts/deploy-prod.sh#L5)
- Save current Docker image as rollback target before new build
[`scripts/deploy-prod.sh:36`](../../scripts/deploy-prod.sh#L36)
- Restore previous image and notify on health check failure
[`scripts/deploy-prod.sh:177`](../../scripts/deploy-prod.sh#L177)
- Notify success after health check passes
[`scripts/deploy-prod.sh:170`](../../scripts/deploy-prod.sh#L170)
**CI Failure Notifications**
- Lint failure notification with commit details
[`.gitea/workflows/ci.yaml:63`](../../.gitea/workflows/ci.yaml#L63)
- Test failure notification with commit details
[`.gitea/workflows/ci.yaml:82`](../../.gitea/workflows/ci.yaml#L82)
- Build failure notification with commit details
[`.gitea/workflows/ci.yaml:101`](../../.gitea/workflows/ci.yaml#L101)
**ESLint Configuration**
- Disable experimental React Compiler rules for existing codebase compatibility
[`memento-note/eslint.config.mjs:12`](../../memento-note/eslint.config.mjs#L12)
- Restore critical react-hooks/rules-of-hooks rule
[`memento-note/eslint.config.mjs:60`](../../memento-note/eslint.config.mjs#L60)