
Beginner's Guide to Building a Professional CI/CD Pipeline from Scratch


Project: Week 1 → CI/CD Foundations (Node.js + GraphQL)

Repository: Push1697/devops-portfolio


In the world of DevOps, a pipeline isn't just a script that runs tests — it's the factory floor of your software delivery. A well-architected pipeline ensures that code flows from a developer's laptop to production reliably, securely, and rapidly.

This guide walks you through building a complete CI/CD pipeline for a basic Node.js application using GitHub Actions, Docker, and AWS from scratch. Every file, every keyword, every config is explained so you can build the same pipeline yourself without ever opening the GitHub repo.

Whether you're a beginner or sharpening existing skills, this is the kind of pipeline you'd find behind any serious production deployment.


Table of Contents

  1. Architecture Overview

  2. Project Setup — Build the Application

  3. Docker Hub Setup — Create Your Token

  4. GitHub Repository Secrets

  5. AWS Infrastructure Setup

  6. The Pipeline — Complete Walkthrough

  7. Branch Protection & Governance

  8. Troubleshooting — Every Error We Hit

  9. Summary


1. Architecture Overview

Before diving into code, let's understand the flow. We aren't just "deploying code"; we are orchestrating a software supply chain.

The Pipeline Stages

| Stage | What It Does | Key Tools |
| --- | --- | --- |
| Build | Clean install (`npm ci`), syntax check | Node.js 20, npm |
| Test | Run unit/integration tests | Jest / `npm test` |
| Security | Dependency audit + static analysis | `npm audit`, CodeQL |
| Docker | Build, scan, and push container image | Docker Buildx, Trivy |
| Deploy | Pull & run on EC2 via SSM (with rollback) | AWS OIDC, SSM |

Pipeline Visualization

GitHub Actions Pipeline — All 5 stages passed. Build (7s) → Test (13s) and Security (1m 10s) run in parallel → Docker (1m 11s) → Deploy (8s)

The complete pipeline DAG in GitHub Actions. Notice how Test and Security run in parallel after Build, and Docker + Deploy only trigger on the main branch. Total pipeline time: ~2m 49s.

Key design choice: Test and Security are independent of each other. By running them in parallel (both use needs: build), we cut pipeline time without sacrificing quality gates.
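The fan-out/fan-in shape boils down to a few `needs` declarations. Here is a minimal sketch of the job graph (job bodies elided — the full workflow comes in Section 6):

```yaml
jobs:
  build:                      # runs first, no dependencies
    runs-on: ubuntu-latest
  test:
    needs: build              # waits for build
  security:
    needs: build              # also waits only for build → runs in parallel with test
  docker:
    needs: [test, security]   # fan-in: starts only after BOTH succeed
  deploy:
    needs: docker
```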


2. Project Setup — Build the Application

Before any CI/CD work, you need a working application. Here's exactly what to build.

Step 1: Initialize the Node.js Project

mkdir week1-cicd && cd week1-cicd
npm init -y
npm install express express-graphql graphql

What each package does:

| Package | Purpose |
| --- | --- |
| `express` | Web framework — handles HTTP routing and middleware |
| `express-graphql` | Adds a `/graphql` endpoint with the GraphiQL IDE |
| `graphql` | Core library for defining schemas, types, and resolvers |

Your package.json should look like this:

{
  "name": "node-ci-demo",
  "version": "1.0.0",
  "description": "Sample Node.js app for CI/CD demo",
  "main": "server.js",
  "scripts": {
    "start": "node server.js"
  },
  "license": "MIT",
  "dependencies": {
    "express": "^5.2.1",
    "express-graphql": "^0.12.0",
    "graphql": "^15.10.1"
  }
}

💡 Important: After running npm install, a package-lock.json file is generated. You must commit this file — the pipeline uses npm ci which requires it.

Step 2: Create the Data File (MOCK_DATA.json)

[
  { "id": 1, "firstName": "Asha", "lastName": "Iyer", "email": "asha.iyer@example.com", "password": "pass1234" },
  { "id": 2, "firstName": "Noah", "lastName": "Cole", "email": "noah.cole@example.com", "password": "pass2345" },
  { "id": 3, "firstName": "Mina", "lastName": "Khan", "email": "mina.khan@example.com", "password": "pass3456" },
  { "id": 4, "firstName": "Luis Kumar", "lastName": "Santos", "email": "luis.santos@example.com", "password": "pass4567" }
]

Step 3: Create the Server (server.js)

The server code sets up:

| Endpoint | Type | What It Does |
| --- | --- | --- |
| `/` | HTML | Interactive user search UI |
| `/graphql` | GraphQL | Full GraphQL API with queries + mutations |
| `/rest/getAllUsers` | REST | Returns all users as JSON |
| `/api/users/search?q=` | REST | Search users by name or email |

Here's the core structure of server.js:

const express = require("express");
const graphql = require("graphql");
const { graphqlHTTP } = require("express-graphql");

const app = express();
const PORT = 5000;
const userData = require("./MOCK_DATA.json");

const {
  GraphQLObjectType,
  GraphQLSchema,
  GraphQLList,
  GraphQLInt,
  GraphQLString,
} = graphql;

// Define the User type (what fields a "User" has)
const UserType = new GraphQLObjectType({
  name: "User",
  fields: () => ({
    id: { type: GraphQLInt },
    firstName: { type: GraphQLString },
    lastName: { type: GraphQLString },
    email: { type: GraphQLString },
    password: { type: GraphQLString },
  }),
});

// Define queries (how to READ data)
const RootQuery = new GraphQLObjectType({
  name: "RootQueryType",
  fields: {
    getAllUsers: {
      type: new GraphQLList(UserType),
      args: { id: { type: GraphQLInt } },
      resolve() { return userData; },         // Returns all users
    },
    findUserById: {
      type: UserType,
      args: { id: { type: GraphQLInt } },
      resolve(parent, args) {
        return userData.find((item) => item.id === args.id);
      },
    },
  },
});

// Define mutations (how to WRITE data)
const Mutation = new GraphQLObjectType({
  name: "Mutation",
  fields: {
    createUser: {
      type: UserType,
      args: {
        firstName: { type: GraphQLString },
        lastName: { type: GraphQLString },
        email: { type: GraphQLString },
        password: { type: GraphQLString },
      },
      resolve(parent, args) {
        const newUser = {
          id: userData.length + 1,
          ...args,
        };
        userData.push(newUser);
        return newUser;
      },
    },
  },
});

// Build the schema and mount GraphQL + REST endpoints
const schema = new GraphQLSchema({ query: RootQuery, mutation: Mutation });

app.use("/graphql", graphqlHTTP({ schema, graphiql: true }));

app.get("/rest/getAllUsers", (req, res) => { res.send(userData); });

app.get("/api/users/search", (req, res) => {
  const query = (req.query.q || "").toLowerCase().trim();
  const results = query
    ? userData.filter((u) =>
        [u.firstName, u.lastName, u.email]
          .join(" ").toLowerCase().includes(query)
      )
    : userData;
  res.json(results);
});

// The "/" route serves an HTML page with a search UI (omitted for brevity)

app.listen(PORT, () => { console.log("Server running"); });

Test it locally:

npm start
# Open http://localhost:5000
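With the server up, you can exercise the GraphQL endpoint from the GraphiQL IDE at `/graphql`. For example, using the query and mutation fields defined in the schema above (the `createUser` argument values here are made up for illustration):

```graphql
# Read: fetch one user by id
query FindUser {
  findUserById(id: 2) {
    firstName
    email
  }
}

# Write: add a user via the createUser mutation
mutation AddUser {
  createUser(
    firstName: "Tara"
    lastName: "Rao"
    email: "tara.rao@example.com"
    password: "pass9999"
  ) {
    id
    firstName
  }
}
```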

Application running at 15.207.102.213:5000 — The User Search interface showing 8 users, with a search bar and total count

The deployed application running on EC2. Notice the URL — this is the public IP of our AWS instance on port 5000, exactly what the pipeline deploys to.

Step 4: Create the Dockerfile (Multi-Stage Distroless Build)

This is where most beginners just write FROM node and call it a day. We don't do that.

# Stage 1: Dependencies — use a full Node image to install packages
FROM node:20-alpine3.18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev --ignore-scripts \
    && npm cache clean --force \
    && rm -rf /root/.npm /tmp/*

# Stage 2: Runtime — copy only what's needed into a minimal image
FROM gcr.io/distroless/nodejs20-debian12:nonroot
WORKDIR /app
COPY --from=builder --chown=nonroot:nonroot /app/node_modules ./node_modules
COPY --chown=nonroot:nonroot server.js .
COPY --chown=nonroot:nonroot MOCK_DATA.json .
EXPOSE 5000
CMD ["server.js"]

Line-by-line explanation:

| Line | What It Does |
| --- | --- |
| `FROM node:20-alpine3.18 AS builder` | Stage 1 uses Alpine Linux (small) to install deps; named `builder` for reference |
| `COPY package*.json ./` | Copies both `package.json` and `package-lock.json` into the container |
| `npm ci --omit=dev` | Clean install, production dependencies only — skips devDependencies |
| `--ignore-scripts` | Skips lifecycle scripts (postinstall, etc.) — reduces attack surface |
| `npm cache clean --force` | Removes the npm cache — smaller image layer |
| `FROM gcr.io/distroless/nodejs20-debian12:nonroot` | Stage 2 uses Google's Distroless — no shell, no package manager, no OS utils |
| `COPY --from=builder` | Copies `node_modules` from Stage 1 into the final image |
| `--chown=nonroot:nonroot` | Files owned by a non-root user — the container never runs as root |
| `CMD ["server.js"]` | Distroless uses exec form (no shell), so we pass the filename directly |

💡 Result: Our final image is 51.6 MB instead of ~1 GB with a standard node:20 base. You can verify this in Docker Hub — smaller image = faster pulls = faster deployments.

Step 5: Create the .dockerignore

node_modules
npm-debug.log
.git
.env
tests/
*.test.js
coverage/
README.md

Why this matters: Without .dockerignore, Docker copies your entire node_modules (which gets rebuilt inside), .git history, and test files into the build context — making builds slower and images larger.

Step 6: Commit Everything

git add .
git commit -m "feat: add Node.js app with GraphQL + REST endpoints"
git push origin main

⚠️ Don't forget package-lock.json! The pipeline's npm ci command requires it. If missing, the build fails immediately.


3. Docker Hub Setup — Create Your Token

Before the pipeline can push images, you need a Docker Hub account and an access token.

Step 1: Navigate to Personal Access Tokens

Go to Docker Hub → Account SettingsSecurityPersonal access tokens.

Docker Hub — Personal Access Tokens page showing "Generate new token" button

The Docker Hub PAT settings page. If this is your first token, you'll see the "Generate new token" button.

Step 2: Create the Access Token

Click Generate new token and fill in:

| Field | Value | Why |
| --- | --- | --- |
| Access token description | `github_actions` | Identifies what this token is used for |
| Expiration date | 30 days (or more) | Set based on your security policy |
| Access permissions | Read & Write | Push requires Write; Read pulls images |

Docker Hub — Creating a PAT with description "github_actions", 30-day expiry, Read & Write permissions

Fill in exactly as shown. The "Read & Write" permission lets your pipeline push images to Docker Hub.

Step 3: Copy and Save the Token

After clicking Generate, Docker Hub shows your token once. Copy it immediately.

Example: dckr_pat_ABC123DEF456GHI789JKL0MN

⚠️ This token will never be shown again. If you lose it, you must generate a new one.

Why a Personal Access Token (PAT) over your password?

  • Can be revoked without changing your Docker Hub password

  • Can be scoped to specific permissions (Read, Write, Delete)

  • Can be audited — you see when it was last used

  • If compromised, your account password stays safe


4. GitHub Repository Secrets

Now we add all credentials to GitHub so the pipeline can use them securely.

How to Add a Secret

  1. Go to your GitHub repo → SettingsSecrets and variablesActions

  2. Click New repository secret

  3. Enter the Name and Value

  4. Click Add secret

Secrets You Need to Add

| Secret Name | Value | Purpose |
| --- | --- | --- |
| `DOCKERHUB_USERNAME` | Your Docker Hub username (e.g., `deviltalks`) | Docker login |
| `DOCKERHUB_TOKEN` | The PAT from Step 3 above | Docker login |
| `AWS_ROLE_ARN` | `arn:aws:iam::123456789012:role/github-actions-oidc-role` | OIDC authentication |
| `EC2_INSTANCE_ID` | `i-0xxxxxxxxxxxxxxxxx` | SSM deployment target |

💡 How GitHub Secrets work: Values are encrypted at rest using libsodium sealed boxes. They are never printed in logs — even if your pipeline does echo ${{ secrets.DOCKERHUB_TOKEN }}, it shows ***. They cannot be read by forks or PRs from forks.


5. AWS Infrastructure Setup

This is the part most tutorials skip. But in real life, infrastructure is where 90% of issues happen.

5.1 AWS OIDC Configuration (No Static Access Keys!)

We do not use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Those are long-lived credentials — if they leak, an attacker has access until you manually rotate them. That's unacceptable.

Instead, we use OpenID Connect (OIDC):

GitHub Actions ──► (JWT token) ──► AWS STS ──► Temporary credentials (15 min)

AWS says: *"I trust GitHub Actions from this specific repo to assume this specific role, and only for a short time."*

Step-by-step setup:

  1. Go to AWS IAM → Identity ProvidersAdd provider

  2. Select OpenID Connect

  3. Provider URL: https://token.actions.githubusercontent.com

  4. Audience: sts.amazonaws.com

  5. Click Add provider

Then create the IAM Role:

  1. Go to IAM → RolesCreate role

  2. Trusted entity type: Web identity

  3. Identity provider: token.actions.githubusercontent.com

  4. Audience: sts.amazonaws.com

  5. Click Next

  6. Attach permission: AmazonSSMManagedInstanceCore

  7. Role name: github-actions-oidc-role

  8. Click Create role

Update the Trust Policy (IAM → Roles → your role → Trust relationships → Edit):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::YOUR_ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:Push1697/devops-portfolio:*"
        }
      }
    }
  ]
}

⚠️ Critical: The sub condition locks this to your exact repository. Without it, any GitHub repo could assume your AWS role.

Copy the Role ARN (e.g., arn:aws:iam::450070307294:role/github-actions-oidc-role) → save it as the AWS_ROLE_ARN GitHub Secret.

5.2 EC2 Instance Preparation

Your EC2 deployment target needs four things:

| Requirement | How to Set It Up |
| --- | --- |
| Docker installed | `sudo yum install docker -y && sudo systemctl enable --now docker` |
| SSM Agent running | Pre-installed on Amazon Linux. Verify: `systemctl status amazon-ssm-agent` |
| IAM Instance Role | Attach a role with the `AmazonSSMManagedInstanceCore` policy (see below) |
| Security Group | Allow inbound TCP on port 5000 from `0.0.0.0/0` |

How to attach AmazonSSMManagedInstanceCore to your EC2:

  1. Go to AWS EC2 → Instances → Select your instance

  2. Look for IAM Role in the details panel

  3. If no role is attached: ActionsSecurityModify IAM role

  4. Select a role that has AmazonSSMManagedInstanceCore attached (or create one)

  5. Click Update IAM role

To verify the role's permissions:

  1. Go to IAM → Roles → Click the role name

  2. Under Permissions, confirm AmazonSSMManagedInstanceCore is listed

  3. If missing: Add permissionsAttach policies → Search for AmazonSSMManagedInstanceCore

💡 Why SSM instead of SSH? With SSM Run Command, you don't need to open port 22 to the internet. No SSH keys to manage, no bastion hosts. All commands are logged in CloudTrail. It's the modern, secure way to run commands on EC2.


6. The Pipeline — Complete ci-cd-pipeline.yml Walkthrough

This file lives at .github/workflows/ci-cd-pipeline.yml. Let's dissect every single line.

6.1 Name, Triggers & Permissions

name: week-1 CI-CD Pipeline

on:
  push:
    branches: ["main", "develop"]
  pull_request:
    branches: ["main"]
  workflow_dispatch:

permissions:
  contents: read
  id-token: write
  security-events: write

Line-by-line:

| Line | What It Does |
| --- | --- |
| `name: week-1 CI-CD Pipeline` | Display name shown in the GitHub Actions tab |
| `on.push.branches: ["main", "develop"]` | Runs on every push to the `main` or `develop` branches |
| `on.pull_request.branches: ["main"]` | Runs when a PR is opened/updated against `main` — validates before merge |
| `workflow_dispatch:` | Adds a "Run workflow" button in the GitHub UI for manual triggers |
| `permissions.contents: read` | Allows the workflow to read (checkout) your repository code |
| `permissions.id-token: write` | Critical for OIDC — lets GitHub generate a JWT for AWS auth |
| `permissions.security-events: write` | Required for CodeQL to upload scan results to GitHub's Security tab |

⚠️ #1 mistake beginners make: Forgetting id-token: write. Without it, the OIDC handshake with AWS fails silently and the deploy job errors with a cryptic permissions message.

6.2 Global Environment Variables

env:
  NODE_VERSION: "20"
  APP_DIR: "week1-cicd"
  IMAGE_NAME: ${{ secrets.DOCKERHUB_USERNAME }}/node-ci-demo
  CONTAINER_NAME: "node-ci-demo"
  APP_PORT: "5000"
  AWS_REGION: "ap-south-1"
  AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
  EC2_INSTANCE_ID: ${{ secrets.EC2_INSTANCE_ID }}

| Variable | Value | Why It's Here |
| --- | --- | --- |
| `NODE_VERSION` | `"20"` | Used by setup-node — change once, applies everywhere |
| `APP_DIR` | `"week1-cicd"` | Our app lives in a subdirectory, not the repo root |
| `IMAGE_NAME` | `<username>/node-ci-demo` | Full Docker image name — constructed from secrets |
| `CONTAINER_NAME` | `"node-ci-demo"` | Name of the Docker container on EC2 |
| `APP_PORT` | `"5000"` | Port the app listens on |
| `AWS_REGION` | `"ap-south-1"` | Mumbai region — change to your deploy region |
| `AWS_ROLE_ARN` | From secrets | The IAM role ARN for OIDC auth |
| `EC2_INSTANCE_ID` | From secrets | Target EC2 instance for deployment |

💡 DRY Principle: If you ever need to change the Node version, image name, or AWS region, you change it in one place — not scattered across 5 jobs. This keeps even a complex pipeline simple to maintain.

6.3 Concurrency Control

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

| Keyword | What It Does |
| --- | --- |
| `group:` | Creates a unique group per workflow + branch combination |
| `cancel-in-progress` | If you push 3 commits rapidly, only the latest pipeline runs — older ones are cancelled |

Why this matters: This saves GitHub Actions minutes (and money). Without it, 3 rapid pushes = 3 concurrent pipelines fighting over resources.

6.4 Job 1: Build

jobs:
  build:
    name: Build
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: npm
          cache-dependency-path: ${{ env.APP_DIR }}/package-lock.json

      - name: Install dependencies
        run: npm ci
        working-directory: ${{ env.APP_DIR }}

      - name: Validate server entry
        run: node -c server.js
        working-directory: ${{ env.APP_DIR }}

      - name: Lint (if configured)
        run: npm run lint --if-present
        working-directory: ${{ env.APP_DIR }}

Step-by-step explanation:

| Step | What It Does | Why |
| --- | --- | --- |
| `actions/checkout@v4` | Clones your repository onto the GitHub runner | Without this, the runner VM has no code |
| `actions/setup-node@v4` | Installs Node.js 20 on the runner | `cache: npm` reuses previously downloaded packages |
| `cache-dependency-path` | Points to `week1-cicd/package-lock.json` | Tells the cache which lockfile to hash for cache invalidation |
| `npm ci` | Clean install from the lockfile | Deterministic — fails if the lockfile is missing or out of sync |
| `node -c server.js` | Syntax check only — parses the file without executing it | Catches typos, missing brackets, or syntax errors instantly |
| `npm run lint --if-present` | Runs linting only if a `lint` script exists in `package.json` | `--if-present` prevents failure if no linter is configured |

What is runs-on: ubuntu-latest? This tells GitHub to run this job on a fresh Ubuntu virtual machine. Each job gets its own clean VM — nothing carries over between jobs unless you explicitly share artifacts.

What is working-directory? Our app lives inside week1-cicd/, not the repo root. This keyword tells each step to cd into that folder before running the command.

Why npm ci instead of npm install?

| Feature | `npm install` | `npm ci` |
| --- | --- | --- |
| Uses `package-lock.json` | Optional | Required |
| Modifies lockfile | Yes | Never |
| Deletes `node_modules` first | No | Yes |
| Deterministic | No | Yes |
| Speed | Slower | Faster |

6.5 Job 2: Test

  test:
    name: Test
    runs-on: ubuntu-latest
    needs: build

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: npm
          cache-dependency-path: ${{ env.APP_DIR }}/package-lock.json

      - name: Install dependencies
        run: npm ci
        working-directory: ${{ env.APP_DIR }}

      - name: Run tests (if configured)
        run: npm test --if-present
        working-directory: ${{ env.APP_DIR }}

Key keyword — needs: build:

This creates a dependency chain. The test job waits for build to succeed before starting. If the build fails, test never runs — saving compute time and money.

💡 Why does each job re-checkout and re-install? Each GitHub Actions job runs on a separate VM. The VM from the Build job is destroyed when it finishes. So Test needs its own checkout and install. This is by design — isolation prevents contamination between jobs.

npm test --if-present: Runs the test script if it exists in package.json, otherwise skips gracefully. This is useful in early-stage projects where tests haven't been written yet.

6.6 Job 3: Security (Defense in Depth)

  security:
    name: Security
    runs-on: ubuntu-latest
    needs: build

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: npm
          cache-dependency-path: ${{ env.APP_DIR }}/package-lock.json

      - name: Install dependencies
        run: npm ci
        working-directory: ${{ env.APP_DIR }}

      - name: Run npm audit
        run: npm audit --audit-level=moderate || true
        working-directory: ${{ env.APP_DIR }}

      - name: CodeQL init
        uses: github/codeql-action/init@v4
        with:
          languages: javascript

      - name: CodeQL analyze
        uses: github/codeql-action/analyze@v4

Notice: Both test and security have needs: build. They are independent of each other, so they run in parallel. This is visible in the pipeline visualization above.

We use three layers of security scanning (Defense in Depth):

Layer 1: npm audit        → Known vulnerabilities in npm packages
Layer 2: GitHub CodeQL     → Static analysis of YOUR code (injections, logic bugs)
Layer 3: Trivy            → OS-level CVEs in the Docker image (runs in Docker job)

Keyword breakdown:

| Keyword / Flag | What It Does |
| --- | --- |
| `npm audit --audit-level=moderate` | Only flags vulnerabilities rated moderate or higher |
| `\|\| true` | Don't fail the job on audit warnings — log them but continue |
| `codeql-action/init` | Downloads and initializes the CodeQL analysis engine for JavaScript |
| `codeql-action/analyze` | Runs the actual scan and uploads results to GitHub's Security tab |

💡 Why || true on npm audit? In a real production pipeline, you might want npm audit to fail the build. Here we use || true so advisory-level warnings don't block deploys during development. In stricter environments, remove the || true.

6.7 Job 4: Docker Build, Scan & Push

This is the longest job. Let's break it down step by step.

  docker:
    name: Docker
    runs-on: ubuntu-latest
    needs: [test, security]
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'

| Keyword | What It Does |
| --- | --- |
| `needs: [test, security]` | Waits for both Test and Security to pass before starting |
| `if: github.event_name == 'push'` | Only runs on direct pushes, not on pull requests |
| `github.ref == 'refs/heads/main'` | Only runs on the `main` branch — feature branches never push images |

Step 1 → Extract Docker Metadata (Smart Tagging):

      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}

This automatically generates Docker tags for your image:

| Tag Rule | Example Output | Purpose |
| --- | --- | --- |
| `type=ref,event=branch` | `main` | Tags with the branch name |
| `type=ref,event=pr` | `pr-42` | Tags for pull requests |
| `type=sha,prefix={{branch}}-` | `main-c260cb1` | Branch + short commit SHA |
| `type=raw,value=latest` | `latest` | Only on the default branch |

The SHA tag (main-c260cb1) is critical — it lets you trace any running container back to the exact Git commit that built it.
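To see how the rules combine, here's a hypothetical helper (illustration only, not the action's real code) that reproduces the tags generated for a push to `main`:

```javascript
// Hypothetical re-implementation of the tag rules above, for illustration.
function dockerTags({ image, branch, sha, defaultBranch = "main" }) {
  const shortSha = sha.slice(0, 7);
  const tags = [
    `${image}:${branch}`,              // type=ref,event=branch
    `${image}:${branch}-${shortSha}`,  // type=sha,prefix={{branch}}-
  ];
  if (branch === defaultBranch) {
    tags.push(`${image}:latest`);      // type=raw,value=latest
  }
  return tags;
}

// A push of commit c260cb1… to main yields three tags:
// …:main, …:main-c260cb1, and …:latest
console.log(dockerTags({
  image: "deviltalks/node-ci-demo",
  branch: "main",
  sha: "c260cb1f00d5",
}));
```

A push to `develop` with the same logic would produce only `…:develop` and `…:develop-<sha>` — no `latest`, which is exactly why `latest` always points at the default branch.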

Step 2 → Docker Hub Login:

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

This authenticates to Docker Hub using the secrets we configured earlier. The password field uses the PAT (Personal Access Token), not your actual Docker Hub password.

Step 3 → Set Up Docker Buildx:

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

What is Buildx? It's Docker's extended build tool. It enables multi-platform builds and, critically, caching support via GitHub Actions cache. Without Buildx, you can't use cache-from / cache-to.

Step 4 → Build and Push:

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: ${{ env.APP_DIR }}
          file: ${{ env.APP_DIR }}/Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

| Keyword | What It Does |
| --- | --- |
| `context: ${{ env.APP_DIR }}` | Build context is the `week1-cicd/` directory |
| `file: ${{ env.APP_DIR }}/Dockerfile` | Path to our multi-stage Dockerfile |
| `push: true` | Pushes the built image to Docker Hub |
| `tags: ${{ steps.meta.outputs.tags }}` | Uses the tags generated by the metadata step |
| `cache-from: type=gha` | Pulls cached layers from the GitHub Actions cache |
| `cache-to: type=gha,mode=max` | Saves all layers to the cache for future builds |

💡 Why caching matters: Without caching, Docker rebuilds every layer from scratch (~2-3 minutes). With type=gha caching, unchanged layers are reused — builds drop to ~30 seconds.

Docker Hub — Repository showing tags main-c260cb1 and latest, 51.6 MB, 121 pulls

Docker Hub showing our pushed image with 5 tags. The main-c260cb1 tag maps to a specific Git commit. Repository size: 51.6 MB (thanks to our distroless multi-stage build).

Step 5 → Trivy Container Scan:

      - name: Trivy scan (optional)
        uses: aquasecurity/trivy-action@master
        continue-on-error: true
        with:
          image-ref: ${{ env.IMAGE_NAME }}:latest
          format: sarif
          output: trivy-results.sarif

      - name: Upload Trivy results
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: trivy-results.sarif

| Keyword | What It Does |
| --- | --- |
| `continue-on-error: true` | Trivy findings don't block the pipeline — they're informational |
| `format: sarif` | SARIF is a standard format GitHub understands for its Security tab |
| `if: always()` | Upload results even if the Trivy scan finds vulnerabilities |

Build logs showing Docker buildx command with cache flags, metadata labels, and an error: "invalid tag node-ci-demo:main"

An actual failed Docker build from our pipeline. The error invalid tag "/node-ci-demo:main" happened because DOCKERHUB_USERNAME was empty — the secret wasn't set yet. After adding the secret, this was resolved.

6.8 Job 5: Deploy with Automated Rollback

  deploy:
    name: Deploy
    runs-on: ubuntu-latest
    needs: docker
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'

    steps:
      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ env.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Deploy via SSM Run Command
        run: |
          aws ssm send-command \
            --document-name "AWS-RunShellScript" \
            --targets "Key=instanceids,Values=${{ env.EC2_INSTANCE_ID }}" \
            --parameters 'commands=["set -e","CURRENT_IMAGE=$(docker inspect -f {{.Image}} ${{ env.CONTAINER_NAME }} 2>/dev/null || echo none)","docker pull ${{ env.IMAGE_NAME }}:latest","docker stop ${{ env.CONTAINER_NAME }} || true","docker rm ${{ env.CONTAINER_NAME }} || true","if docker run -d --name ${{ env.CONTAINER_NAME }} -p ${{ env.APP_PORT }}:5000 ${{ env.IMAGE_NAME }}:latest; then echo Deploy succeeded; else echo Deploy failed, rolling back; [ \"$CURRENT_IMAGE\" != none ] && docker run -d --name ${{ env.CONTAINER_NAME }} -p ${{ env.APP_PORT }}:5000 $CURRENT_IMAGE; exit 1; fi"]' \
            --comment "Deploy node-ci-demo" \
            --region ${{ env.AWS_REGION }}

Step 1 — OIDC Authentication:

The aws-actions/configure-aws-credentials action:

  1. Requests a JWT token from GitHub's OIDC provider

  2. Sends it to AWS STS (Security Token Service)

  3. AWS validates the token against the trust policy

  4. Returns temporary credentials (valid for ~15 minutes)

Step 2 — SSM Run Command (The Deployment Script):

Here's what the script does, broken into readable steps:

# 1. Strict mode — exit immediately if any command fails
set -e

# 2. Save the currently running image hash (for rollback)
CURRENT_IMAGE=$(docker inspect -f '{{.Image}}' node-ci-demo 2>/dev/null || echo none)

# 3. Pull the latest image from Docker Hub
docker pull deviltalks/node-ci-demo:latest

# 4. Stop and remove the old container (ignore errors if it doesn't exist)
docker stop node-ci-demo || true
docker rm node-ci-demo || true

# 5. Start the new container — if it fails, rollback!
if docker run -d --name node-ci-demo -p 5000:5000 deviltalks/node-ci-demo:latest; then
  echo "✅ Deploy succeeded"
else
  echo "❌ Deploy failed — rolling back!"
  # Restore the previous working image
  if [ "$CURRENT_IMAGE" != "none" ]; then
    docker run -d --name node-ci-demo -p 5000:5000 $CURRENT_IMAGE
  fi
  exit 1
fi

What makes this deployment script production-grade:

| Feature | How We Do It |
| --- | --- |
| Image hash capture | Saves `CURRENT_IMAGE` before doing anything destructive |
| Automatic rollback | If the new container fails to start, restores the previous image |
| Zero-downtime recovery | Rollback is automatic — no manual intervention needed |
| `set -e` | Any command failure stops the script — no partial deploys |
| `\|\| true` on stop/rm | Gracefully handles the first-ever deployment (no container to stop) |

SSM Run Command deployment logs showing the full AWS SSM send-command JSON response with CommandId, DocumentName, masked secrets, and deployment script commands

AWS Systems Manager executing our deployment script via SSM Run Command. You can see the full command JSON response including the CommandId, the deployment commands with masked secrets (***), the target EC2 instance, and the "Pending" status. Notice how IMAGE_NAME and AWS_ROLE_ARN are masked → GitHub never exposes secrets in logs.


7. Branch Protection & Governance

A pipeline is only as good as the rules protecting it. Without branch protection, anyone with repo access could push directly to main and bypass all checks.

Setting Up Branch Protection Rules

Go to GitHub → SettingsBranchesAdd rule → Branch name pattern: main

Enable these settings:

| Setting | What It Does |
| --- | --- |
| Require pull request before merging | Forces code review — prevents accidental pushes |
| Require status checks to pass | Blocks merge if Build / Test / Security / Docker fails |
| Require code reviews (1+ approver) | At least one peer must approve the PR |
| Dismiss stale pull request approvals | Re-review required if new commits are added after approval |
| Require branches to be up to date | Prevents merge conflicts in production |
| Restrict who can push | Only specific users/teams can bypass (rarely used) |

What happens when someone tries to push directly to main:

$ git push origin main
remote: error: GH006: Protected branch update failed
remote: error: Required status check "Build" is expected
remote: error: At least 1 approving review is required

💡 Without branch protection: Any developer with push access can push broken code directly to production. With it: Every change must pass CI checks AND be reviewed by a peer before merging.

Skipping CI (When You Need To)

Sometimes you push documentation changes or .md edits that don't need a full pipeline run. Add [skip ci] to your commit message:

git commit -m "docs: update README formatting [skip ci]"
git push origin main

GitHub Actions recognizes [skip ci], [ci skip], [no ci], or [skip actions] in the commit message and will not trigger any workflows for that push.

⚠️ Use sparingly. Only skip CI for documentation-only changes. Never skip CI for code changes — that defeats the entire purpose of the pipeline.
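An alternative to remembering `[skip ci]` on every docs commit is to filter at the workflow level. This is a sketch, assuming your workflow triggers on pushes to main: with `paths-ignore`, Actions skips the run automatically when only the listed files changed.

```yaml
on:
  push:
    branches: [main]
    paths-ignore:
      - "**.md"       # any Markdown file, anywhere in the repo
      - "docs/**"     # everything under docs/
```

The advantage over `[skip ci]` is that it can't be forgotten — and it can't be abused to sneak code changes past CI, since any non-ignored path still triggers the full pipeline.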

The Complete Governance Model

| Protection Layer | What It Prevents |
| --- | --- |
| Branch protection | Direct pushes to main → forces PR + review |
| PR status checks | Merging PRs with failing Build / Test / Security |
| OIDC (no static keys) | Leaked AWS credentials → tokens auto-expire in 15 min |
| GitHub Secrets | Credentials in code → encrypted, masked in logs, no fork access |
| npm audit + CodeQL | Vulnerable dependencies and code-level security flaws |
| Trivy | OS-level vulnerabilities in the Docker image |
| Automated rollback | Failed deployments staying live → auto-restores previous version |
| Audit trail | Untraceable changes → every commit, PR, and deploy is logged |

8. Troubleshooting — Every Error We Hit

These aren't hypothetical — these are errors we actually encountered while building this pipeline. Every fix is documented.

❌ Error 1: npm ci fails — "package-lock.json not found"

npm ERR! The `npm ci` command can only install with an existing package-lock.json

Root cause: You ran npm install locally but never committed package-lock.json.

Fix:

npm install                # generates package-lock.json
git add package-lock.json
git commit -m "chore: add lockfile for CI reproducibility"
git push origin main

❌ Error 2: Docker push — "denied: requested access to the resource is denied"

ERROR: denied: requested access to the resource is denied

Root cause: One (or more) of these:

  • DOCKERHUB_TOKEN is your password, not a PAT

  • PAT lacks Write permission

  • DOCKERHUB_USERNAME doesn't match the image name prefix (e.g., image is deviltalks/node-ci-demo but username is deviltalks2)

Fix:

  1. Go to Docker Hub → Account Settings → Security → Delete old token

  2. Generate a new PAT with Read & Write permissions

  3. Update the DOCKERHUB_TOKEN secret in GitHub

  4. Verify IMAGE_NAME starts with your exact Docker Hub username
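Before re-running the pipeline, you can sanity-check steps 2 and 4 locally. The sketch below uses hypothetical example values; the commented `docker login` line assumes you export your own username and PAT:

```shell
# Verify the PAT works before touching CI (run with your own values):
# echo "$DOCKERHUB_TOKEN" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin

# Verify the image name prefix matches the username exactly (case-sensitive):
DOCKERHUB_USERNAME="deviltalks"            # hypothetical example values
IMAGE_NAME="deviltalks/node-ci-demo"

case "$IMAGE_NAME" in
  "$DOCKERHUB_USERNAME"/*) PREFIX_OK=yes ;;   # e.g. deviltalks/node-ci-demo
  *)                       PREFIX_OK=no  ;;   # e.g. deviltalks2 vs deviltalks
esac
echo "prefix match: $PREFIX_OK"
```

If the prefix check fails, fixing either the `DOCKERHUB_USERNAME` secret or the image name resolves the push denial without touching the PAT at all.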


❌ Error 3: Docker build — "invalid tag: invalid reference format"

ERROR: failed to build: invalid tag "/node-ci-demo:main": invalid reference format

Root cause: The DOCKERHUB_USERNAME secret is empty or not set. The IMAGE_NAME env var becomes /node-ci-demo (leading slash) instead of deviltalks/node-ci-demo.

Fix:

  1. Go to GitHub → Settings → Secrets → Actions

  2. Verify DOCKERHUB_USERNAME exists and has a value

  3. Re-trigger the pipeline:

git commit --allow-empty -m "retry: fix docker username secret"
git push origin main

💡 This error is visible in the build logs screenshot above — line 222 shows the exact error.
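To see why the tag breaks, trace the interpolation: with an empty username, the slash is all that's left of the prefix. The sketch below simulates the failure and shows a guard you could add as an early workflow step (here `exit 1` is replaced by a variable so the snippet runs standalone):

```shell
# Simulate the unset secret — these are NOT the real values:
DOCKERHUB_USERNAME=""
IMAGE_NAME="${DOCKERHUB_USERNAME}/node-ci-demo"

# A fail-fast guard like this gives a readable error instead of
# "invalid reference format" three steps later:
if [ -z "$DOCKERHUB_USERNAME" ]; then
  echo "error: DOCKERHUB_USERNAME is empty -> tag would be '${IMAGE_NAME}:main'"
  GUARD_TRIPPED=yes          # in a real workflow step: exit 1
fi
```

Failing early on missing secrets is a cheap habit that turns cryptic downstream errors into one-line diagnoses.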


❌ Error 4: Deploy fails — SSM "Command failed"

Root cause: Usually one of:

  • SSM Agent is stopped on EC2

  • EC2 instance doesn't have the AmazonSSMManagedInstanceCore IAM policy

  • Port 5000 is blocked in Security Group

  • Instance is in a different region than AWS_REGION

Fix:

# On the EC2 instance — restart SSM Agent:
sudo systemctl restart amazon-ssm-agent
sudo systemctl status amazon-ssm-agent

# Verify Docker is running:
sudo systemctl status docker

In AWS Console:

  1. IAM: Verify the EC2 instance role has AmazonSSMManagedInstanceCore attached

  2. EC2 → Security Groups: Edit inbound rules → Add TCP 5000 from 0.0.0.0/0

  3. Verify region: Make sure AWS_REGION in your workflow matches the instance's actual region
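The checks above can also be run from your workstation. This is a sketch; the instance ID and region are placeholders for your own values, and the AWS/curl commands are commented out because they need credentials and a live instance:

```shell
AWS_REGION="us-east-1"                    # must match the workflow's AWS_REGION
INSTANCE_ID="i-0123456789abcdef0"         # placeholder — use your instance ID

# 1. Is the instance registered with SSM? (needs AWS credentials)
#    aws ssm describe-instance-information --region "$AWS_REGION" \
#      --query "InstanceInformationList[?InstanceId=='$INSTANCE_ID'].PingStatus"
#    A healthy agent reports "Online"; an empty list means it never registered
#    (missing AmazonSSMManagedInstanceCore policy, or wrong region).

# 2. Is port 5000 reachable from outside the instance? (run from your laptop)
#    curl -m 5 "http://<ec2-public-ip>:5000/" || echo "port blocked or app down"

echo "checking $INSTANCE_ID in $AWS_REGION"
```

An empty `describe-instance-information` result is the single most common cause: SSM can't send commands to an instance it has never heard from.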


❌ Error 5: OIDC fails — "Not authorized to perform sts:AssumeRoleWithWebIdentity"

Error: Not authorized to perform sts:AssumeRoleWithWebIdentity

Root cause: The sub claim in your IAM trust policy doesn't match your repo.

Fix:

  1. Verify the trust policy has the correct repo:
"token.actions.githubusercontent.com:sub": "repo:Push1697/devops-portfolio:*"

  2. Verify id-token: write is set under permissions: in your workflow

  3. Verify the OIDC Identity Provider exists in IAM with the correct audience (sts.amazonaws.com)
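For reference, a trust policy that satisfies these checks might look like the sketch below (`<ACCOUNT_ID>` is a placeholder for your AWS account ID):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:Push1697/devops-portfolio:*"
        }
      }
    }
  ]
}
```

Note the split: the `aud` claim uses `StringEquals` (exact match), while the `sub` claim uses `StringLike` so the `*` wildcard can match any branch or workflow in the repo.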


❌ Error 6: Docker login fails — "Username required"

Run docker/login-action@v3
Error: Username required

Root cause: Secret name in the workflow doesn't match what's stored in GitHub. This is case-sensitive!

Fix:

  1. Check your workflow file:
# ✅ Correct — matches the secret name exactly
password: ${{ secrets.DOCKERHUB_TOKEN }}

# ❌ Wrong — different secret name
password: ${{ secrets.DOCKER_SECRET_KEY }}

  2. In GitHub Secrets, verify the exact names: DOCKERHUB_USERNAME and DOCKERHUB_TOKEN

❌ Error 7: Trivy upload fails — "Path does not exist: trivy-results.sarif"

Error: Path does not exist: trivy-results.sarif

Root cause: The Trivy scan step was skipped or failed (often because the Docker build failed first), so no SARIF file was generated.

Fix: This is usually a downstream effect of another error. Fix the Docker build first, and the Trivy upload will work. The if: always() on the upload step ensures it runs even when Trivy fails, but it can't upload a file that doesn't exist.
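If you want the upload step to skip cleanly instead of erroring when the file is missing, one option is to gate it on the file's existence with the `hashFiles` expression. A sketch, assuming the upload step looks like a standard SARIF upload:

```yaml
- name: Upload Trivy scan results
  # Runs even after earlier failures, but only if the SARIF file exists:
  if: always() && hashFiles('trivy-results.sarif') != ''
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: trivy-results.sarif
```

`hashFiles` returns an empty string when no file matches, so the condition is false exactly when the scan never produced output — the real error then surfaces in the Docker or Trivy step where it belongs.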


9. Summary

This isn't a toy pipeline. It's a production-grade delivery system that handles:

| Pillar | How We Address It |
| --- | --- |
| Security | OIDC (no static keys), GitHub Secrets, npm audit, CodeQL, Trivy |
| Reliability | npm ci for deterministic builds, automated tests |
| Recoverability | Automated rollback on deploy failure |
| Traceability | Docker tags tied to Git commit SHAs |
| Efficiency | Parallel jobs, Docker layer caching (type=gha), concurrency |
| Governance | Branch protection, PR reviews, status checks, audit trail |

By following this guide step by step, you've built something real — not a tutorial demo, but the same patterns used in production systems at companies shipping code daily.


Built as part of the DevOps Portfolio — Week 1: CI/CD Foundations.