Engineering Council Test Reliability Report

Scope aligned with Slack channel #dezvoltare, covering 2026-03-21 07:00 to 2026-03-28 07:00. Metrics and timings are sourced from GitLab pipelines, jobs, and test-report artifacts for the daily 6 PM regression suite and the production smoke suite. Trend charts use daily buckets across this window.

Executive Snapshot

Daily Runs: 7
Daily Green: 3/7
Avg Daily Runtime: 19m 28s
Smoke Attempts: 13
Smoke Green: 11/13
Avg Smoke Runtime: 3m 14s
Median Smoke Time: 2m 56s
Current Green Streak: 0

Executive Analysis

Bottom line: the weakest link is smoke reliability, not test speed. The suite can still provide signal, but deploy confidence is being taxed by failed or noisy smoke attempts.

What Matters

  • Daily regression passed 3 of 7 runs (42.9%), with a current green streak of 0 and a best streak of 2 in this window. The latest daily run (150328) failed, so the system is ending the week under tension rather than in a clean state.
  • Smoke passed 11 of 13 attempts (84.6%) across 7 production pipelines. 2 pipelines recovered on rerun, which is useful for continuity but also a sign that first-pass deploy signal is noisier than it should be.
  • Failure concentration is not random: Library leads on both measures, with a strict failure ratio of 1.00% and a non-pass ratio of 1.00%.
  • Frontend is the weakest smoke surface in this window at 5/7 green (71.4%).
  • Daily-suite runtime averaged 19m 28s.
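The streak figures above follow directly from the ordered pass/fail sequence of the daily runs. A minimal sketch, using this window's seven statuses from the Recent Daily Suite Runs table:

```python
# Derive current and best green streaks from an ordered run-status list.
# Statuses below mirror this window's daily runs (03-21 through 03-27).
statuses = ["PASSED", "PASSED", "FAILED", "FAILED", "FAILED", "PASSED", "FAILED"]

def green_streaks(statuses):
    """Return (current_streak, best_streak) of consecutive PASSED runs."""
    best = run = 0
    for s in statuses:
        run = run + 1 if s == "PASSED" else 0  # reset on any non-pass
        best = max(best, run)
    current = 0
    for s in reversed(statuses):  # count trailing passes
        if s != "PASSED":
            break
        current += 1
    return current, best

print(green_streaks(statuses))  # (0, 2) matches the streak figures above
```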

Engineering Analysis

  • A release gate should fail loudly for product regressions and quietly for infrastructure noise. Rerun recoveries and incomplete smoke attempts suggest those two failure modes are still partially mixed together.
  • The failure profile is concentrated enough to act on. Library and Frontend are carrying the strongest signal, which means reliability work should be assigned by category ownership instead of treating the suite as one undifferentiated problem.
  • The broader daily suite is carrying more instability than smoke, which usually means product regressions are escaping into wider coverage areas even when the narrow deploy gate looks acceptable.

Recommended Actions

  • Assign one owner to Library for the next cycle and expect a short written burn-down: top failing tests, suspected root causes, flake versus regression breakdown, and what gets fixed or quarantined first.
  • Treat the daily regression suite like an operations queue until it is calm again: triage failures after each red run, close known-noise items fast, and avoid letting multiple unrelated red signals pile up between runs.
  • Put Frontend smoke under closer guardrails for the next release cycle. It is the best place to improve first-pass deploy confidence quickly.

Improvement Ideas

  • Introduce a small reliability budget for tests: every flaky or quarantined case needs an owner and an expiry, and the team should review that budget weekly the same way it reviews bugs or incidents.
  • Track first-fail to root-cause time as a core metric. Fast diagnosis is as important as raw pass rate because the practical value of a test gate depends on how quickly it helps the team recover.
  • Define a runtime budget per suite and require justification when test count or duration grows. Reliable feedback systems stay trusted when they remain both stable and proportionate.
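The runtime-budget idea above can be enforced mechanically. A hedged sketch, where the budgets and tolerance are illustrative assumptions rather than agreed team numbers:

```python
# Illustrative per-suite runtime budget check. The budget values and the
# 10% tolerance are assumptions for this sketch, not team commitments.
BUDGETS_SECONDS = {"daily": 20 * 60, "smoke": 4 * 60}  # hypothetical budgets

def over_budget(suite, runtime_seconds, tolerance=0.10):
    """Flag a run whose duration exceeds the suite budget by > tolerance."""
    budget = BUDGETS_SECONDS[suite]
    return runtime_seconds > budget * (1 + tolerance)

# This window's averages: daily 19m 28s, smoke 3m 14s
print(over_budget("daily", 19 * 60 + 28))  # False: within budget
print(over_budget("smoke", 3 * 60 + 14))   # False: within budget
```

A check like this could run as a post-pipeline job and open a tracking issue whenever a suite drifts over its budget.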

Council Narrative

  • The most recent daily suite run failed, so the current green streak is 0.
  • In the last 10 daily runs, total test volume stayed flat at 1,189.
  • Library has the highest strict failure ratio at 1.00% and also the broadest non-pass footprint at 1.00%, showing up in 3 failed runs.
  • The average daily-suite runtime, measured from GitLab start and finish timestamps, was 19m 28s.
  • Smoke runs stayed fast when healthy: average duration was 3m 14s and the median passing run was 2m 56s.
  • Across the latest daily smoke references, observed test volume stayed flat: Frontend at 110 and University at 10.
  • Smoke-suite breakdown in this window: Frontend 5/7, University 6/6.
  • 2 production pipelines clearly recovered on rerun after an initial smoke failure: 149634, 149671.
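The rerun-recovery bullet can be reproduced from attempt-level data: a pipeline "recovered" when a suite's attempt sequence within that pipeline contains a FAILED attempt followed by a final PASSED one. The attempt tuples below are illustrative, not the full job list from this window:

```python
# Hedged sketch of the rerun-recovery check. Attempt data here is
# illustrative; the real input would be GitLab job records per pipeline.
from collections import defaultdict

attempts = [  # (pipeline, suite, status), ordered by attempt time
    (149634, "Frontend", "FAILED"),
    (149634, "Frontend", "PASSED"),
    (149671, "Frontend", "FAILED"),
    (149671, "Frontend", "PASSED"),
    (149684, "Frontend", "PASSED"),
]

def recovered_pipelines(attempts):
    """Return pipelines where a suite failed first but ended green."""
    by_key = defaultdict(list)
    for pipeline, suite, status in attempts:
        by_key[(pipeline, suite)].append(status)
    recovered = set()
    for (pipeline, _suite), statuses in by_key.items():
        if "FAILED" in statuses and statuses[-1] == "PASSED":
            recovered.add(pipeline)
    return sorted(recovered)

print(recovered_pipelines(attempts))  # [149634, 149671]
```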

Category Failure Ratios

How computed

Failure Ratio = failed test executions divided by total test executions in the selected timeframe.

Non-pass Ratio = (failed + pending + skipped) test executions divided by total test executions in the selected timeframe.

Example: a 1.75% Billing failure ratio means 1.75% of all Billing test executions in this period ended in failed. A 1.35% Library non-pass ratio means 1.35% of all Library test executions in this period ended in failed, pending, or skipped.
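The two definitions above reduce to simple division over execution counts. A minimal sketch, using this window's Library counts (total 602, failed 6, pending 0, skipped 0) from the Category Aggregate Table:

```python
# Minimal sketch of the two ratio definitions above.
def failure_ratio(failed, total):
    """Failed executions divided by total executions."""
    return failed / total

def non_pass_ratio(failed, pending, skipped, total):
    """Failed + pending + skipped executions divided by total executions."""
    return (failed + pending + skipped) / total

# Library counts from this window: total 602, failed 6, pending 0, skipped 0
print(f"{failure_ratio(6, 602):.2%}")        # 1.00%
print(f"{non_pass_ratio(6, 0, 0, 602):.2%}") # 1.00% (no pending or skipped)
```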


[Chart] Daily Suite Status (daily buckets, 03-21 to 03-27)
[Chart] Daily Smoke Attempts (daily buckets, 03-21 to 03-27)
[Chart] Average Daily Suite Runtime (daily buckets, 03-21 to 03-27)
[Chart] Average Smoke Runtime (daily buckets, 03-21 to 03-27)
[Chart] Daily Suite Total Test Growth, recent 7 runs (flat at 1,189)
[Chart] Smoke Suite Total Test Growth, latest run per day (Frontend 110 on each of 03-23 to 03-27; University 10 on each)

Category Aggregate Table


Category | Total | Failed | Pending | Skipped | Failure Ratio | Non-pass Ratio | Runs With Failures
Billing  |   756 |      0 |       0 |       0 |         0.00% |          0.00% | 0
Web      |  5201 |      0 |       0 |       0 |         0.00% |          0.00% | 0
Frontend |  1764 |      4 |       0 |       0 |         0.23% |          0.23% | 3
Library  |   602 |      6 |       0 |       0 |         1.00% |          1.00% | 3

Recent Runs

Recent Daily Suite Runs

Date             | Pipeline | Suites                          | Status | Summary
2026-03-21 18:23 | 149394   | Billing, Web, Frontend, Library | PASSED | Total 1189, Passed 1189, Failed 0
2026-03-22 18:23 | 149456   | Billing, Web, Frontend, Library | PASSED | Total 1189, Passed 1189, Failed 0
2026-03-23 18:23 | 149694   | Billing, Web, Frontend, Library | FAILED | Total 1189, Passed 1185, Failed 4
2026-03-24 18:22 | 149866   | Billing, Web, Frontend, Library | FAILED | Total 1189, Passed 1186, Failed 3
2026-03-25 18:22 | 150059   | Billing, Web, Frontend, Library | FAILED | Total 1189, Passed 1187, Failed 2
2026-03-26 18:22 | 150180   | Billing, Web, Frontend, Library | PASSED | Total 1189, Passed 1189, Failed 0
2026-03-27 18:22 | 150328   | Billing, Web, Frontend, Library | FAILED | Total 1189, Passed 1188, Failed 1

Recent Smoke Attempts

Date             | Suite      | Pipeline | Job              | Status | Passed | Failed | Duration
2026-03-23 12:20 | Frontend   | 149520   | Frontend smoke   | PASSED |    110 |      0 | 3m 08s
2026-03-23 15:20 | University | 149634   | University smoke | PASSED |     10 |      0 | 2m 09s
2026-03-23 15:25 | Frontend   | 149634   | Frontend smoke   | FAILED |    103 |      7 | 5m 41s
2026-03-23 16:25 | University | 149671   | University smoke | PASSED |     10 |      0 | 2m 17s
2026-03-23 16:31 | Frontend   | 149671   | Frontend smoke   | FAILED |    103 |      7 | 5m 57s
2026-03-23 17:03 | University | 149684   | University smoke | PASSED |     10 |      0 | 2m 33s
2026-03-23 17:05 | Frontend   | 149684   | Frontend smoke   | PASSED |    110 |      0 | 3m 03s
2026-03-24 20:52 | University | 149788   | University smoke | PASSED |     10 |      0 | 2m 56s
2026-03-24 20:53 | Frontend   | 149788   | Frontend smoke   | PASSED |    110 |      0 | 3m 10s
2026-03-25 12:14 | University | 149902   | University smoke | PASSED |     10 |      0 | 2m 23s
2026-03-25 12:16 | Frontend   | 149902   | Frontend smoke   | PASSED |    110 |      0 | 3m 05s
2026-03-27 14:33 | University | 150306   | University smoke | PASSED |     10 |      0 | 2m 23s
2026-03-27 14:35 | Frontend   | 150306   | Frontend smoke   | PASSED |    110 |      0 | 3m 15s

Smoke Suite Breakdown

Frontend: 7 attempts across 7 pipelines, 71% green. Passed 5, Failed 2, Incomplete 0. Avg runtime 3m 54s; median passing runtime 3m 08s.
University: 6 attempts across 6 pipelines, 100% green. Passed 6, Failed 0, Incomplete 0. Avg runtime 2m 27s; median passing runtime 2m 23s.
Generated from GitLab project adservio/helm2. Times are shown in Europe/Bucharest. Daily-suite runtime is measured from GitLab pipeline and job timestamps. Category counts come from GitLab test-report JSON artifacts, with job-trace fallback when older artifacts have expired.