Developing E2E Smoke Tests for Production Monitoring
Smoke tests are a minimal set of e2e tests that verify the application's critical paths. Their goal is not exhaustive coverage but quick detection that something broke, whether after a deploy or because of an external incident. They run against production every 5–15 minutes.
Difference from Regular E2E Tests
Regular e2e tests run in CI before deploy and can take 10–30 minutes. Smoke tests for monitoring:
- Run continuously on production (synthetic monitoring)
- Execute in 1–3 minutes
- Cover only critical happy paths
- Trigger an alert immediately on failure (PagerDuty, Slack, email)
Tool Selection
Playwright: recommended for most projects. Its auto-waiting tends to make tests less flaky than typical Selenium suites, and it offers a modern API, built-in network mocking, and a trace viewer for debugging.
Cypress: a good fit for React/Vue applications, but harder to run in headless environments without additional configuration.
k6 browser: if you already use k6 for load testing, its browser module lets you add e2e checks within the same tool.
Smoke Test Examples in Playwright
// tests/smoke/critical-paths.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Smoke: Critical paths', () => {
  test('Homepage loads and key elements visible', async ({ page }) => {
    await page.goto('/');
    await expect(page.getByRole('navigation')).toBeVisible();
    await expect(page.getByRole('main')).toBeVisible();
    // Don't test all content, only structure
  });

  test('Login flow works end-to-end', async ({ page }) => {
    await page.goto('/login');
    await page.fill('[name="email"]', process.env.SMOKE_TEST_EMAIL!);
    await page.fill('[name="password"]', process.env.SMOKE_TEST_PASSWORD!);
    await page.click('[type="submit"]');
    await expect(page).toHaveURL('/dashboard', { timeout: 10000 });
    await expect(page.getByTestId('user-menu')).toBeVisible();
  });

  test('API health endpoint responds 200', async ({ request }) => {
    const response = await request.get('/api/health');
    expect(response.status()).toBe(200);
    const body = await response.json();
    expect(body.status).toBe('ok');
  });

  test('Checkout page loads with correct elements', async ({ page }) => {
    // Login via API for speed, not through UI
    const authResponse = await page.request.post('/api/auth/login', {
      data: { email: process.env.SMOKE_TEST_EMAIL, password: process.env.SMOKE_TEST_PASSWORD }
    });
    // Fail here with a clear message instead of later on a missing token
    expect(authResponse.ok()).toBeTruthy();
    const { token } = await authResponse.json();
    // Seed the token before any page script runs
    await page.addInitScript(t => {
      localStorage.setItem('auth_token', t);
    }, token);
    await page.goto('/checkout');
    await expect(page.getByTestId('checkout-form')).toBeVisible();
  });
});
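The tests above read credentials with a non-null assertion (process.env.SMOKE_TEST_EMAIL!), which silently passes undefined to page.fill when a secret is missing. A minimal sketch of a fail-fast helper follows; the name requireEnv and its signature are hypothetical, not part of Playwright.

```typescript
// Hypothetical helper (not a Playwright API): fail fast with a readable
// message when a required variable is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage inside a spec file (Node provides process.env):
//   const email = requireEnv(process.env, 'SMOKE_TEST_EMAIL');
//   await page.fill('[name="email"]', email);
```

With this in place, a misconfigured secret surfaces as a configuration error rather than as a confusing login failure that pages the on-call engineer.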
Production Configuration
// playwright.config.smoke.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests/smoke',
  timeout: 30000,
  retries: 2, // Retry failed tests to absorb transient network errors
  workers: 1, // Run sequentially to avoid overloading production
  use: {
    baseURL: process.env.SMOKE_BASE_URL || 'https://app.example.com',
    extraHTTPHeaders: {
      'X-Smoke-Test': 'true', // Mark traffic for analytics/logs
    },
  },
  reporter: [
    ['list'],
    ['json', { outputFile: 'smoke-results.json' }],
  ],
});
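The JSON reporter configured above writes smoke-results.json, which can feed per-test timings into a metrics system. The sketch below walks a parsed report and extracts the duration of each test's final attempt. It assumes a simplified report shape (nested suites containing specs, tests, and results with a duration in milliseconds); verify the field names against the report your Playwright version produces. The names collectTimings and TestTiming are hypothetical.

```typescript
// Assumed, simplified shape of the JSON report; check against real output.
interface ReportResult { status: string; duration: number; retry: number }
interface ReportTest { results: ReportResult[] }
interface ReportSpec { title: string; ok: boolean; tests: ReportTest[] }
interface ReportSuite { title: string; specs?: ReportSpec[]; suites?: ReportSuite[] }
interface Report { suites: ReportSuite[] }

interface TestTiming { title: string; ok: boolean; durationMs: number; attempts: number }

// Walk nested suites and record the last attempt of every test.
function collectTimings(report: Report): TestTiming[] {
  const timings: TestTiming[] = [];
  const visit = (suite: ReportSuite): void => {
    for (const spec of suite.specs ?? []) {
      for (const t of spec.tests) {
        const last = t.results[t.results.length - 1];
        if (!last) continue; // a test that never ran has no results
        timings.push({
          title: spec.title,
          ok: spec.ok,
          durationMs: last.duration,
          attempts: t.results.length,
        });
      }
    }
    for (const child of suite.suites ?? []) visit(child);
  };
  report.suites.forEach(visit);
  return timings;
}

// Usage (Node): collectTimings(JSON.parse(fs.readFileSync('smoke-results.json', 'utf8')))
```

Recording the attempt count alongside the duration makes it visible when a test only passed on a retry, which the retries: 2 setting would otherwise hide.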
Running on Schedule via GitHub Actions
# .github/workflows/smoke-monitor.yml
name: Smoke Tests Monitor
on:
  schedule:
    - cron: '*/15 * * * *' # Every 15 minutes (GitHub may delay scheduled runs under load)
  workflow_dispatch:
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npx playwright install chromium --with-deps
      - name: Run smoke tests
        run: npx playwright test --config=playwright.config.smoke.ts
        env:
          SMOKE_BASE_URL: https://app.example.com
          SMOKE_TEST_EMAIL: ${{ secrets.SMOKE_TEST_EMAIL }}
          SMOKE_TEST_PASSWORD: ${{ secrets.SMOKE_TEST_PASSWORD }}
      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {"text": "Smoke tests FAILED on production! <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View Run>"}
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK # Required by v1 when posting to an incoming webhook
Test Credentials
Create a separate smoke-test user in the system with:
- A specific flag in the DB (is_synthetic: true) so its activity can be filtered out of analytics
- Minimal permissions (read-only where possible)
- No ability to perform real payment operations
- Password rotation via CI secrets
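How the synthetic flag is enforced depends on the application. As one possible sketch, the server could refuse payment operations for flagged users and drop their events before analytics aggregation. The types and the functions assertCanCharge and excludeSynthetic are hypothetical; only the is_synthetic flag comes from the list above.

```typescript
// Hypothetical domain types; isSynthetic mirrors the is_synthetic DB flag.
interface User { id: string; isSynthetic: boolean }
interface AnalyticsEvent { userId: string; name: string }

// Guard to call before any real payment operation.
function assertCanCharge(user: User): void {
  if (user.isSynthetic) {
    throw new Error(`Synthetic user ${user.id} may not perform real payments`);
  }
}

// Drop events generated by synthetic users before aggregation.
function excludeSynthetic(
  events: AnalyticsEvent[],
  syntheticUserIds: Set<string>,
): AnalyticsEvent[] {
  return events.filter(e => !syntheticUserIds.has(e.userId));
}
```

Enforcing the restriction on the server, rather than relying on the tests never clicking "Pay", keeps a buggy or modified smoke test from creating real charges.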
Monitoring Metrics
Beyond pass/fail results, collect:
- Execution time of each test (a deviation of more than 50% from its usual value is a sign of performance degradation)
- Percentage of successful runs over the last 24 hours
- Average response time of the API health endpoint
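The first two metrics can be computed from a stored history of runs. The sketch below is one way to do it under stated assumptions: the source does not say what the 50% deviation is measured against, so the median of past durations is used as the baseline here, and the names RunRecord, successRate, and isDurationAnomalous are hypothetical.

```typescript
interface RunRecord { finishedAt: number; passed: boolean } // finishedAt: epoch ms

const DAY_MS = 24 * 60 * 60 * 1000;

// Share of passing runs inside the window ending at `now`; null if no runs.
function successRate(runs: RunRecord[], now: number, windowMs: number = DAY_MS): number | null {
  const recent = runs.filter(r => r.finishedAt > now - windowMs && r.finishedAt <= now);
  if (recent.length === 0) return null;
  return recent.filter(r => r.passed).length / recent.length;
}

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// True when the current duration differs from the baseline by more than
// `threshold` (0.5 = 50%). Baseline choice (median) is an assumption.
function isDurationAnomalous(currentMs: number, historyMs: number[], threshold: number = 0.5): boolean {
  if (historyMs.length === 0) return false; // nothing to compare against yet
  const baseline = median(historyMs);
  if (baseline === 0) return false;
  return Math.abs(currentMs - baseline) / baseline > threshold;
}
```

A median baseline is less sensitive to a single slow run than a mean, which matters when the history itself contains the occasional network hiccup.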
Timeline
- Writing 5–10 smoke tests for critical application paths: 2–3 days
- Setting up GitHub Actions with a schedule and Slack notifications: 0.5 days
- Creating isolated test credentials: 0.5 days







