Sujal Bhakare
Systems Engineering and Research Portfolio
living archive | failures | 2026

Failure Archive

A structured archive of engineering failures, root causes, and corrective design principles from real hardware/software integration work.

Failure Records

Root Causes and Corrective Principles

Failure analysis

LM5148 buck converter failure

Symptom
The 24 V to 12 V buck output was stuck at about 0.04 V, and the switching traces burned.
Root cause
Not confirmed. Suspected causes are layout, the high-current path, the switch node, loop compensation, or startup behavior.
Fix direction
Apply strict high-current buck layout discipline, current-limited bring-up, thermal inspection, and staged validation.
Maturity signal
High-current buck converters require layout and validation discipline before they are treated as solved power blocks.
Failure analysis

VCC to PGND abnormal low resistance

Symptom
Measured around 33 ohms between VCC and PGND.
Root cause
Not confirmed. The low reading raised concern about controller IC damage or unexpected loading on the rail.
Fix direction
Verify controller rails before power-up and isolate IC power pins during debug.
Maturity signal
A rail resistance reading is only meaningful against an expected value, so establish that context before applying power.
Failure analysis

eFuse / hot-swap path issue

Symptom
VBAT_EFUSE measured around 1.1 V despite 24 V input, with MOSFET gate at 0 V.
Root cause
Likely gate drive, UVLO, sense, or controller startup issue.
Fix direction
Perform systematic pin-state validation before bypassing the hot-swap controller.
Maturity signal
Protection controllers must be debugged as active systems, not passive connectors.
Failure analysis

Rail short / false continuity confusion

Symptom
Multimeter beeped on rails with bulk capacitors present.
Root cause
Bulk capacitors briefly look like a low resistance while the meter's test current charges them, which triggers the continuity beep.
Fix direction
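Distinguish real near-0-ohm shorts from capacitor charging: a short holds a stable low reading, while a charging capacitor beeps briefly and the reading then climbs.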
Maturity signal
Measurement tools can mislead when interpreted without circuit context.
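The distinction in the fix direction above can be sketched as a small classifier over resistance readings taken a moment apart on the same rail. This is a minimal illustration, not a measurement procedure from the original work: the function name, the 1-ohm threshold, and the 10x rise ratio are assumed values chosen for the example.

```python
def classify_rail_reading(readings_ohms, short_threshold_ohms=1.0, rise_ratio=10.0):
    """Classify a time-ordered series of resistance readings on one rail.

    Thresholds are illustrative assumptions, not values from the archive.
    """
    if len(readings_ohms) < 2:
        raise ValueError("need at least two readings taken over time")

    first, last = readings_ohms[0], readings_ohms[-1]

    # Every reading stays near zero: consistent with a real short.
    if max(readings_ohms) <= short_threshold_ohms:
        return "probable short"

    # Reading climbs strongly over time: consistent with a capacitor charging.
    if last >= max(first, short_threshold_ohms) * rise_ratio:
        return "capacitor charging"

    # Stable but clearly nonzero: a load on the rail, not a dead short.
    return "stable nonzero load"


print(classify_rail_reading([0.2, 0.2, 0.3]))
print(classify_rail_reading([5.0, 200.0, 5000.0]))
print(classify_rail_reading([33.0, 33.0, 33.0]))
```

The third case matters as much as the first two: a steady reading in the tens of ohms is neither a short nor a charging capacitor, and needs its own investigation.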
Failure analysis

Soldering and reflow residue

Symptom
Flux residue and solder paste leftovers remained after reflow.
Root cause
Board was not clean enough before power-up inspection.
Fix direction
Clean boards before power-up and inspect for conductive debris or bridges.
Maturity signal
Physical board condition is part of electrical reliability.
Failure analysis

Kelvin sensing correction

Symptom
Shunt sensing layout needed correction.
Root cause
Current measurement was influenced by load-current copper paths.
Fix direction
Use true Kelvin routing for current sensing.
Maturity signal
Precision sensing depends on layout, not only schematic intent.
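A rough calculation shows why routing matters here. If the sense taps pick up some load-current copper in series with the shunt, the amplifier sees the shunt plus that copper, and the reading is high by the ratio of the two resistances. The values below are hypothetical and only illustrate the scale of the effect.

```python
def sense_error_percent(shunt_ohms, shared_copper_ohms):
    """Percent over-read when load-current copper sits between the sense taps."""
    if shunt_ohms <= 0:
        raise ValueError("shunt resistance must be positive")
    return 100.0 * shared_copper_ohms / shunt_ohms


def measured_current(true_current_a, shunt_ohms, shared_copper_ohms):
    """Current the sense amplifier would report, assuming it divides by the shunt value."""
    sensed_voltage = true_current_a * (shunt_ohms + shared_copper_ohms)
    return sensed_voltage / shunt_ohms


# Hypothetical example: 1 milliohm shunt, 0.2 milliohm of shared copper, 10 A load.
print(round(sense_error_percent(0.001, 0.0002), 3))
print(round(measured_current(10.0, 0.001, 0.0002), 3))
```

With true Kelvin routing the sense traces leave from the shunt pads themselves, so the shared copper term approaches zero.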
Failure analysis

W5500 SPI Ethernet failure under motor operation

Symptom
W5500 SPI Ethernet module failed after 2-3 minutes under motor operation.
Root cause
The module was powered from a shared 3.3 V buck tied to a motor rail. Motor spikes, EMI, and ground bounce disturbed the supply, the SPI bus, and Ethernet stability, and repeated stress may have damaged modules.
Fix direction
Power the module from a clean, isolated 3.3 V supply with local bulk and high-frequency decoupling and a reset supervisor. Keep SPI traces short over a solid ground return, and use shielding or twisted pairs where needed. Do not power sensitive Ethernet PHYs from noisy motor rails.
Maturity signal
Industrial protocols need clean electrical foundations.
Failure analysis

WebSocket timeout states

Symptom
Control path entered timeout states during unstable communication iterations.
Root cause
Command stream and firmware state handling needed tighter watchdog and recovery behavior.
Fix direction
Make timeout, brake, reconnect, and command rejection states explicit.
Maturity signal
Determinism requires observable state transitions.
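One way to make timeout, brake, reconnect, and command rejection explicit is a small state machine that logs every transition. The sketch below is an assumption about how that could look, not the project's firmware: the class name, the state names, and the 0.5 s timeout are all hypothetical, and time is passed in explicitly so the behavior is easy to test.

```python
from enum import Enum, auto


class LinkState(Enum):
    RECONNECTING = auto()  # no valid link yet; brake held
    CONNECTED = auto()     # commands accepted
    TIMED_OUT = auto()     # command stream went quiet; brake held


class CommandWatchdog:
    def __init__(self, timeout_s=0.5):
        self.timeout_s = timeout_s
        self.state = LinkState.RECONNECTING
        self.last_rx = None
        self.transitions = []  # (time, from_state, to_state) for observability

    def _enter(self, new_state, now):
        if new_state is not self.state:
            self.transitions.append((now, self.state.name, new_state.name))
            self.state = new_state

    def on_link_up(self, now):
        """Call when the connection is (re)established."""
        self.last_rx = now
        self._enter(LinkState.CONNECTED, now)

    def on_link_down(self, now):
        self._enter(LinkState.RECONNECTING, now)

    def tick(self, now):
        """Move to TIMED_OUT if no command arrived within the timeout."""
        if self.state is LinkState.CONNECTED and now - self.last_rx > self.timeout_s:
            self._enter(LinkState.TIMED_OUT, now)

    def submit(self, command, now):
        """Return True if the command is accepted, False if rejected."""
        self.tick(now)
        if self.state is not LinkState.CONNECTED:
            return False
        self.last_rx = now
        return True

    @property
    def brake_engaged(self):
        return self.state is not LinkState.CONNECTED


wd = CommandWatchdog(timeout_s=0.5)
wd.on_link_up(1.0)
print(wd.submit("forward", 1.2), wd.submit("forward", 2.0), wd.state.name)
```

In this sketch a late command does not silently revive the link. Recovery from TIMED_OUT requires an explicit `on_link_up`, so every change in behavior shows up in `transitions`.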
Failure analysis

ESP_STATE reported W5500 as 0.0.0.0

Symptom
Firmware state reported the Ethernet interface as 0.0.0.0.
Root cause
Ethernet initialization, link stability, or network assignment was not valid in that state.
Fix direction
Expose network state clearly and gate motor command paths on validated Ethernet readiness.
Maturity signal
Status reporting is part of safety and debug architecture.
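Gating on validated readiness can be as simple as refusing motor commands while the interface reports no link or an unspecified address. The sketch below uses Python's standard `ipaddress` module; the function names and the way link state and address are passed in are assumptions for illustration, not the firmware's actual interface.

```python
import ipaddress


def ethernet_ready(link_up, ip_text):
    """True only when the link is up and the address is a real, specified IP."""
    if not link_up:
        return False
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        # Unparseable address text is treated as not ready.
        return False
    # 0.0.0.0 is the unspecified address, meaning no valid assignment.
    return not ip.is_unspecified


def motor_command_allowed(link_up, ip_text):
    """Hypothetical gate: motor commands pass only on a validated interface."""
    return ethernet_ready(link_up, ip_text)


print(motor_command_allowed(True, "0.0.0.0"))
print(motor_command_allowed(True, "192.168.1.50"))
```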
Failure analysis

Display/control lockup

Symptom
OLED display and control logic locked up in unstable firmware iterations.
Root cause
Firmware scheduling, blocking behavior, or peripheral error handling was not stable enough.
Fix direction
Isolate display updates, protect control loops, and recover failed peripherals without blocking actuation safety.
Maturity signal
Debug UI must not destabilize the control path.
Failure analysis

Python controller command path did not initially move motors

Symptom
Controller commands reached software but motors did not move until command/register behavior was corrected.
Root cause
Command format or register behavior did not match motor control expectations.
Fix direction
Validate protocol writes against known motor registers and build command confirmation telemetry.
Maturity signal
Field robotics fails at interfaces, not isolated components.
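Command confirmation can be sketched as a write followed by a read-back, so a command that never reached the motor register is reported instead of assumed. The bus class below is a hypothetical in-memory stand-in, and the register address is made up; the real transport and register map are not described in this archive.

```python
class FakeMotorBus:
    """Hypothetical stand-in for the real transport; keeps registers in a dict."""

    def __init__(self):
        self.regs = {}

    def write_register(self, addr, value):
        self.regs[addr] = value

    def read_register(self, addr):
        return self.regs.get(addr)


def write_confirmed(bus, addr, value, retries=2):
    """Write a register and confirm by reading it back.

    Returns True once the read-back matches, False after all attempts fail.
    """
    for _ in range(retries + 1):
        bus.write_register(addr, value)
        if bus.read_register(addr) == value:
            return True
    return False


SPEED_REGISTER = 0x10  # made-up address for illustration only

bus = FakeMotorBus()
print(write_confirmed(bus, SPEED_REGISTER, 1200))
```

The boolean result is the kind of signal that can be surfaced as command confirmation telemetry, so a mismatch between software intent and motor state is visible at the interface where it occurs.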
Failure analysis

Missing Python evdev dependency

Symptom
Controller workflow failed due to a missing Python `evdev` dependency.
Root cause
Runtime environment did not match the control stack dependency assumptions.
Fix direction
Document setup, pin dependencies, and validate environment before field runs.
Maturity signal
Software environment reproducibility is part of system reliability.
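A preflight check run before a field session can catch a missing module before the controller workflow starts. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the function names are hypothetical, and the required list here contains only `evdev` because that is the one dependency named in this record.

```python
import importlib.util


def missing_modules(required):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def preflight(required):
    """Print a readable report and return True only if everything is present."""
    missing = missing_modules(required)
    for name in missing:
        print(f"missing dependency: {name}")
    return not missing


if __name__ == "__main__":
    ok = preflight(["evdev"])
    print("environment ready" if ok else "environment not ready")
```

A check like this complements pinned dependencies rather than replacing them: pinning fixes which versions are installed, and the preflight confirms the runtime actually has them.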
Failure analysis

Daemon restart and source update workflow problems

Symptom
System service updates and restarts created friction during iterative debugging.
Root cause
Deployment workflow was not structured enough for repeated field iteration.
Fix direction
Use clear restart procedures, versioned service files, and update checks.
Maturity signal
Operational workflow is part of engineering maturity.
System-Level Lessons

Corrective Principles

  • Power integrity is not optional.
  • Determinism requires isolation from non-real-time layers.
  • Industrial protocols need clean electrical foundations.
  • Bring-up must be staged.
  • Field robotics fails at interfaces, not isolated components.
  • Failure logs are portfolio evidence.
Archive Notes

Technical Writeup

Archive Positioning

This page is not a weakness list. It is evidence of serious engineering exposure: power electronics bring-up, rail debugging, Ethernet instability, firmware state failures, dependency issues, and deployment workflow problems.

The archive preserves corrective principles so later designs can be better constrained before they fail.
