## AI Auto-Generation of E2E Tests
E2E tests are the most expensive to maintain: they break on any UI change, run slowly, and are unstable (flaky). A main cause of instability is brittle locators like `div.container > ul > li:nth-child(3) > a`. An AI generator creates Playwright tests with semantic locators (aria-label, data-testid, role) that survive cosmetic layout changes.
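The brittleness the generator is told to avoid can also be detected mechanically. Below is a minimal sketch (the `is_brittle` helper and its pattern list are hypothetical, not any Playwright API) that flags position- and class-chain-dependent CSS selectors of the kind shown above:

```python
import re

# Heuristic patterns for brittle CSS selectors (hypothetical helper,
# illustrating what the generator is instructed to avoid).
BRITTLE_PATTERNS = [
    r":nth-child\(",        # position-dependent: breaks when siblings move
    r":nth-of-type\(",
    r"[.#][\w-]+\s*>",      # class/id chains like div.container > ul
    r"^\s*[.#][\w-]+$",     # bare .class or #id
]

def is_brittle(selector: str) -> bool:
    """True if the CSS selector relies on layout details that break on restyling."""
    return any(re.search(pattern, selector) for pattern in BRITTLE_PATTERNS)
```

A check like this can run in CI over hand-written tests to find candidates for migration to semantic locators.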
### Generate Playwright Tests from Scenario Description
````python
from langchain_openai import ChatOpenAI
import json


class E2ETestGenerator:
    PLAYWRIGHT_PROMPT = """Create a Playwright E2E test in TypeScript.

Scenario: {scenario}
App URL: {base_url}
Test data: {test_data}

Test requirements:
1. Use semantic locators: getByRole, getByLabel, getByText, getByTestId
2. DON'T use CSS selectors like .class or #id (except data-testid)
3. Add explicit waits: await expect(locator).toBeVisible()
4. For forms: fill via getByLabel(), not via selectors
5. Check after each significant action (not only at the end)
6. Use page.waitForResponse() for AJAX operations
7. Structure: test.describe > test.beforeEach > test

Example of a good locator:
✅ page.getByRole('button', {{ name: 'Create order' }})
✅ page.getByTestId('checkout-submit-btn')
❌ page.locator('button.btn-primary:nth-child(2)')

Return the TypeScript test code."""

    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4o", temperature=0.1)

    def generate_from_scenario(
        self,
        scenario: str,
        base_url: str,
        test_data: dict,
    ) -> str:
        result = self.llm.invoke(
            self.PLAYWRIGHT_PROMPT.format(
                scenario=scenario,
                base_url=base_url,
                test_data=json.dumps(test_data),
            )
        )
        return result.content

    def generate_from_recording(self, playwright_trace: str) -> str:
        """Improves an automatically recorded Playwright Codegen test."""
        prompt = f"""Improve this automatically recorded Playwright test.

Original test (from Codegen):
```typescript
{playwright_trace}
```

Fix:
- Replace CSS selectors with semantic locators
- Remove unnecessary clicks
- Add explicit waits
- Add assertions for data verification

Return the improved TypeScript code."""
        return self.llm.invoke(prompt).content
````
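Since the model can still slip a forbidden selector into its output, it is worth gating the generated code before it lands in the repo. Here is a minimal sketch (the `selector_violations` helper is an assumption, not part of the class above) that scans generated TypeScript for the `page.locator('.class')` / `page.locator('#id')` calls the prompt forbids:

```python
import re

# Hypothetical post-generation gate: flag generated lines that use
# class/id CSS selectors. data-testid attribute selectors start with
# '[' after the quote and therefore pass this check.
FORBIDDEN_LOCATOR = re.compile(r"page\.locator\(\s*['\"][.#]")

def selector_violations(generated_ts: str) -> list[str]:
    """Return offending lines; an empty list means the test passes the gate."""
    return [line.strip()
            for line in generated_ts.splitlines()
            if FORBIDDEN_LOCATOR.search(line)]
```

On a non-empty result, the caller can re-prompt the model with the offending lines appended to the original requirements.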
### Stability Optimization and Self-Healing
```python
class TestStabilityOptimizer:
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4o", temperature=0.1)

    async def analyze_flakiness(self, test_result: dict) -> str:
        """Analyzes why a test failed and suggests a fix."""
        prompt = f"""Analyze this flaky test failure.

Test: {test_result['test_name']}
Error: {test_result['error']}
Screenshot: {test_result['screenshot_path']}

Likely causes:
- Race condition: element not yet visible
- Network delay: wait for the network response
- Async update: element content changed
- Dynamic ID/class: selector needs updating

Suggest a fix (semantic locator approach)."""
        result = await self.llm.ainvoke(prompt)
        return result.content
```
Case study: a React SPA with 80 critical user journeys under E2E coverage. Initial state: 35% of tests were flaky, breaking two to three times a week on style changes. After regenerating the suite with semantic locators, flakiness dropped to zero post-stabilization, and maintenance per layout change went from ~3 hours of manual per-test fixes to effectively none (the tests adapt).
Timeframe: E2E test generation + stability improvements: 4–6 weeks; full framework with cross-browser testing: 8–10 weeks.