Sujal Bhakare
Systems Engineering and Research Portfolio
Failure-driven design

Failure Analysis Index

Field issues, root-cause records, correction paths, and what each failure revealed about the system.

Failure analysis

W5500 link loss under motor load

Deterministic Rover Controls
Symptom
The SPI Ethernet path lost link after several minutes during motor operation.
Root cause
The W5500 shared a noisy 3.3 V rail with the motor environment. Motor current spikes, EMI, and ground bounce, combined with the sensitivity of the SPI and Ethernet paths, destabilized the link.
Fix direction
Use a clean isolated 3.3 V rail, local bulk and high-frequency decoupling, reset supervision, shorter SPI traces, solid ground return, and shielding where needed.
Maturity signal
Electrical integrity is part of the control architecture, not a separate afterthought.
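One way to size the "local bulk decoupling" in the fix is a first-order charge-balance estimate, C ≥ I·Δt/ΔV: the capacitor has to carry the load current through a rail disturbance without letting the rail droop more than allowed. The sketch below uses hypothetical numbers (150 mA load, 20 µs disturbance, 100 mV allowed droop); real values would come from the W5500 datasheet and scope captures of the actual motor transients.

```python
def min_bulk_capacitance(load_current_a: float, holdup_time_s: float, max_droop_v: float) -> float:
    """Charge-balance estimate C >= I * dt / dV.

    Ignores capacitor ESR/ESL and regulator loop response, so treat it as a lower bound.
    """
    if max_droop_v <= 0:
        raise ValueError("max_droop_v must be positive")
    return load_current_a * holdup_time_s / max_droop_v


# Hypothetical values: 150 mA load, 20 us rail disturbance, 100 mV allowed droop on 3.3 V.
c = min_bulk_capacitance(0.150, 20e-6, 0.100)
print(f"{c * 1e6:.1f} uF")  # 30.0 uF
```

The ESR step (I·ESR) adds droop on top of this estimate, which is why bulk capacitance is paired with low-ESR high-frequency decoupling placed close to the device.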
Failure analysis

Brownout risk

Wearable Continuous Audio Intelligence
Symptom
Always-on audio capture and BLE activity can stress the tight power margins of a small wearable.
Root cause
Transient current, regulator behavior, and battery state can interact during radio and write activity.
Fix direction
Validate regulator headroom, staged power modes, brownout logging, and safe recovery metadata.
Maturity signal
Documents constraint discovery, root-cause isolation, and design correction instead of hiding bring-up risk.
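"Regulator headroom" can be checked with simple arithmetic: during a radio or flash-write peak, the battery sags by I·R_internal, and the regulator still needs its dropout voltage above the output. The sketch assumes a linear regulator (LDO) and hypothetical values (3.3 V output, 150 mV dropout, 250 mA peak, 0.2 Ω cell resistance); the actual regulator type and figures for this wearable are not stated in the source.

```python
def regulator_headroom_v(v_batt_open: float, i_peak_a: float, r_internal_ohm: float,
                         v_dropout: float, v_out: float) -> float:
    """Margin left at the regulator input during a current peak (negative means dropout)."""
    v_batt_loaded = v_batt_open - i_peak_a * r_internal_ohm  # battery sag under the peak
    return v_batt_loaded - v_dropout - v_out


def min_battery_voltage(v_out: float, v_dropout: float, i_peak_a: float,
                        r_internal_ohm: float) -> float:
    """Lowest open-circuit battery voltage that keeps the regulator in regulation at the peak."""
    return v_out + v_dropout + i_peak_a * r_internal_ohm


# Hypothetical values: 3.3 V LDO, 150 mV dropout, 250 mA radio + flash peak, 0.2 ohm cell.
print(round(min_battery_voltage(3.3, 0.15, 0.25, 0.2), 3))
print(round(regulator_headroom_v(3.6, 0.25, 0.2, 0.15, 3.3), 3))
```

With these numbers the cell must stay above roughly 3.5 V open-circuit, which is a useful threshold for switching to a lower staged power mode before a brownout rather than after one.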
Failure analysis

Buffer overflow risk

Wearable Continuous Audio Intelligence
Symptom
Audio context can be overwritten before event persistence completes.
Root cause
Flash write latency or BLE scheduling can block timely buffer handling.
Fix direction
Use deterministic producer/consumer boundaries, backpressure, and event metadata checkpoints.
Maturity signal
Documents constraint discovery, root-cause isolation, and design correction instead of hiding bring-up risk.
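A minimal sketch of the producer/consumer boundary, assuming a ring buffer that normally overwrites its oldest audio frames but refuses to overwrite frames pinned by an event that has not yet been persisted. The class and method names are hypothetical; returning `False` from `push` is the backpressure signal, and the `(first, last)` checkpoint stands in for the event metadata.

```python
from collections import deque


class EventAwareRing:
    """Continuous-capture ring buffer that will not overwrite frames pinned by a pending event."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.frames = deque()   # (frame_index, payload) pairs, oldest first
        self.pinned_until = -1  # highest frame index awaiting persistence; -1 means none
        self.next_index = 0     # monotonically increasing frame counter
        self.dropped = 0        # new frames discarded to protect pinned event context

    def push(self, payload) -> bool:
        index = self.next_index
        self.next_index += 1  # advance even on a drop so the gap shows up in metadata
        if len(self.frames) == self.capacity:
            oldest_index, _ = self.frames[0]
            if oldest_index <= self.pinned_until:
                self.dropped += 1  # backpressure: event context wins over new audio
                return False
            self.frames.popleft()  # normal case: overwrite the oldest unpinned frame
        self.frames.append((index, payload))
        return True

    def pin_event(self) -> tuple:
        """Pin everything currently buffered and return a (first, last) checkpoint."""
        if not self.frames:
            raise ValueError("nothing buffered to pin")
        self.pinned_until = self.frames[-1][0]
        return self.frames[0][0], self.pinned_until

    def release(self) -> None:
        """Call once the flash write for the pinned event has completed."""
        self.pinned_until = -1
```

Dropping new frames while an event is pinned is one design choice; the `dropped` counter and the index gap make the loss observable instead of silent.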
Failure analysis

RF instability risk

Wearable Continuous Audio Intelligence
Symptom
BLE range or reliability may degrade when worn or enclosed.
Root cause
Antenna detuning caused by layout constraints, ground interaction, and enclosure effects.
Fix direction
Tune the pi matching network and validate RF performance under realistic mechanical conditions (worn and enclosed).
Maturity signal
Documents constraint discovery, root-cause isolation, and design correction instead of hiding bring-up risk.
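Validating a match "in realistic mechanical conditions" usually means measuring the antenna's input impedance (for example with a VNA) in free space and again when worn or enclosed, then comparing how much power is reflected. The sketch converts a measured complex impedance into reflection coefficient, return loss, and mismatch loss relative to a 50 Ω reference; the impedance values are hypothetical.

```python
import math


def reflection_coefficient(z_load: complex, z0: float = 50.0) -> complex:
    """Gamma = (Z_L - Z0) / (Z_L + Z0)."""
    return (z_load - z0) / (z_load + z0)


def return_loss_db(z_load: complex, z0: float = 50.0) -> float:
    """Return loss in dB (larger is better); infinite for a perfect match."""
    gamma = abs(reflection_coefficient(z_load, z0))
    return math.inf if gamma == 0 else -20 * math.log10(gamma)


def mismatch_loss_db(z_load: complex, z0: float = 50.0) -> float:
    """Power lost to reflection at the port: -10*log10(1 - |Gamma|^2)."""
    gamma = abs(reflection_coefficient(z_load, z0))
    return -10 * math.log10(1 - gamma ** 2)


# Hypothetical measurements of the same antenna on the bench and when worn.
for label, z in [("bench", 52 + 4j), ("worn", 28 - 22j)]:
    print(label, round(return_loss_db(z), 1), "dB RL,", round(mismatch_loss_db(z), 2), "dB ML")
```

A match that looks good on the bench but degrades when worn suggests retuning the pi network for the worn condition rather than for free space.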
Failure analysis

LM5148 buck converter failure

Failure Archive
Symptom
The 24 V to 12 V buck output stayed near 0.04 V, and the switching traces burned.
Root cause
Suspected layout, current path, switch-node, compensation, or startup issue.
Fix direction
Apply strict high-current buck layout discipline, current-limited bring-up, thermal inspection, and staged validation.
Maturity signal
High-current buck converters require layout and validation discipline before they are treated as solved power blocks.
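"Current-limited bring-up" and "staged validation" can be written down as an explicit plan: each stage has a bench-supply current limit and an expected output, and the limit is raised only after the previous stage passes. The stage names, limits, and tolerances below are hypothetical; the example measurement mirrors the 0.04 V symptom.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Stage:
    name: str
    current_limit_a: float  # bench-supply current limit used for this stage
    expected_vout: float    # expected buck output voltage
    tolerance: float        # allowed fractional deviation of Vout


def first_failing_stage(stages: List[Stage],
                        measurements: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Return the first stage that fails (or was never measured), else None.

    measurements maps stage name -> (input_current_a, vout_v).
    """
    for stage in stages:
        if stage.name not in measurements:
            return stage.name  # not measured means not validated: stop here
        i_in, vout = measurements[stage.name]
        if i_in >= stage.current_limit_a:
            return stage.name  # supply sitting at its limit suggests a short or fault
        if abs(vout - stage.expected_vout) > stage.tolerance * stage.expected_vout:
            return stage.name  # output out of regulation
    return None


# Hypothetical plan for the 24 V -> 12 V stage.
plan = [
    Stage("no-load", 0.05, 12.0, 0.05),
    Stage("light-load", 0.5, 12.0, 0.05),
    Stage("rated-load", 3.0, 12.0, 0.05),
]
print(first_failing_stage(plan, {"no-load": (0.05, 0.04)}))  # no-load
```

Stopping at the first failing stage keeps a layout or switch-node fault at a current level where it shows up as a measurement instead of burned traces.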
Failure analysis

VCC to PGND abnormal low resistance

Failure Archive
Symptom
Measured around 33 ohms between VCC and PGND.
Root cause
Possible controller IC damage or unexpected loading on the rail.
Fix direction
Verify controller rails before power-up and isolate IC power pins during debug.
Maturity signal
Rail resistance measurements need context before applying power.
Failure analysis

eFuse / hot-swap path issue

Failure Archive
Symptom
VBAT_EFUSE measured around 1.1 V despite a 24 V input, with the MOSFET gate at 0 V.
Root cause
Likely gate drive, UVLO, sense, or controller startup issue.
Fix direction
Perform systematic pin-state validation before bypassing the hot-swap controller.
Maturity signal
Protection controllers must be debugged as active systems, not passive connectors.
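"Systematic pin-state validation" can be made concrete by writing the expected voltage window for each controller pin down before measuring, ordered from upstream to downstream, so the first out-of-range pin points at the stage to investigate. The pin names and windows below are hypothetical placeholders; the real ones come from the hot-swap controller's datasheet and the board's divider values.

```python
from typing import Dict, List, Optional, Tuple


def check_pins(measured: Dict[str, float],
               expected: Dict[str, Tuple[float, float]]) -> List[Tuple[str, Optional[float], Tuple[float, float]]]:
    """Return (pin, value, window) for every pin that is missing or outside its window.

    The order of `expected` is preserved, so list pins upstream to downstream.
    """
    faults = []
    for pin, (lo, hi) in expected.items():
        value = measured.get(pin)
        if value is None or not lo <= value <= hi:
            faults.append((pin, value, (lo, hi)))
    return faults


# Hypothetical windows: input, enable/UVLO divider, charge-pumped gate, protected output.
expected = {
    "VIN": (22.0, 26.0),
    "UVLO_EN": (1.5, 5.0),
    "GATE": (30.0, 40.0),
    "VOUT": (22.0, 26.0),
}
measured = {"VIN": 24.0, "GATE": 0.0, "VOUT": 1.1}  # UVLO_EN not yet probed
for fault in check_pins(measured, expected):
    print(fault)
```

Here the first fault is the unprobed enable/UVLO pin: if that input holds the controller off, the 0 V gate and low output follow from it, which is why it should be checked before bypassing the controller.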
Failure analysis

Rail short / false continuity confusion

Failure Archive
Symptom
The multimeter beeped on rails with bulk capacitors present.
Root cause
The discharged bulk capacitors briefly read as a low resistance while the meter's test current charged them, triggering the continuity beep.
Fix direction
Distinguish real 0-ohm shorts from capacitor charging behavior.
Maturity signal
Measurement tools can mislead when interpreted without circuit context.
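A sketch of both halves of the distinction, assuming a meter whose continuity mode sources a roughly constant test current and beeps below a voltage threshold (test current and threshold vary by meter and are assumptions here). With constant current, the beep lasts about t = C·V_th/I; a real short stays near 0 Ω, while a capacitor's reading climbs over successive samples.

```python
from typing import List


def beep_duration_s(capacitance_f: float, test_current_a: float, beep_threshold_v: float) -> float:
    """Time for a constant test current to charge C past the beep threshold: t = C * V / I."""
    return capacitance_f * beep_threshold_v / test_current_a


def classify_readings(ohms: List[float], short_limit: float = 1.0, rise_factor: float = 2.0) -> str:
    """Classify successive resistance readings taken while holding the probes on a rail."""
    if all(r <= short_limit for r in ohms):
        return "short"  # stays low: treat as a real short
    if ohms[-1] >= rise_factor * max(ohms[0], 1e-3):
        return "charging capacitor"  # reading climbs as the capacitor charges
    return "inconclusive"


# Hypothetical: 1000 uF of bulk capacitance, 1 mA test current, 50 mV beep threshold.
print(beep_duration_s(1000e-6, 1e-3, 0.05))
print(classify_readings([0.5, 3.0, 40.0, 1e6]))
```

The longer the beep relative to this estimate, the more suspicious the rail; a reading that never climbs is the case to chase as a real short.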
Failure analysis

Soldering and reflow residue

Failure Archive
Symptom
Flux residue and solder paste leftovers remained after reflow.
Root cause
The board had not been cleaned thoroughly before power-up inspection.
Fix direction
Clean boards before power-up and inspect for conductive debris or bridges.
Maturity signal
Physical board condition is part of electrical reliability.
Failure analysis

Kelvin sensing correction

Failure Archive
Symptom
The shunt current-sensing layout needed correction.
Root cause
Current measurement was influenced by load-current copper paths.
Fix direction
Use true Kelvin routing for current sensing.
Maturity signal
Precision sensing depends on layout, not only schematic intent.
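The size of the error that Kelvin routing removes can be estimated with R = ρ·L/(W·t): any load-carrying copper between the sense taps adds its resistance to the shunt's. The sketch assumes copper resistivity of about 1.68×10⁻⁸ Ω·m, 1 oz (≈35 µm) copper, a hypothetical 5 mΩ shunt, and a hypothetical 2 mm × 2 mm shared trace segment.

```python
RHO_CU = 1.68e-8  # ohm*m, approximate copper resistivity near 20 C


def trace_resistance(length_m: float, width_m: float, thickness_m: float = 35e-6) -> float:
    """R = rho * L / (W * t); 35 um is roughly 1 oz copper."""
    return RHO_CU * length_m / (width_m * thickness_m)


def non_kelvin_error(shunt_ohm: float, shared_copper_ohm: float) -> float:
    """Fractional overestimate when the sense taps include load-carrying copper."""
    return shared_copper_ohm / shunt_ohm


r_shared = trace_resistance(0.002, 0.002)  # 2 mm of 2 mm-wide trace inside the sense path
print(f"{non_kelvin_error(5e-3, r_shared) * 100:.1f} % error")
```

Even 2 mm of shared copper gives roughly a 10 % overestimate on a 5 mΩ shunt, and copper's temperature coefficient (about 0.39 %/°C) makes that error drift with load, which is why the sense taps must connect directly at the shunt pads.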
Failure analysis

W5500 SPI Ethernet failure under motor operation

Failure Archive
Symptom
The W5500 SPI Ethernet module failed after 2-3 minutes of motor operation.
Root cause
The module was powered from a shared 3.3 V buck tied to a motor rail. Motor spikes, EMI, and ground bounce corrupted the supply, SPI, and Ethernet stability, and repeated stress may have damaged some modules.
Fix direction
Use a clean, isolated 3.3 V rail, local bulk and high-frequency decoupling, a reset supervisor, shorter SPI traces, a solid ground return, and shielding or twisted pairs where needed; avoid powering sensitive Ethernet PHYs from noisy motor rails.
Maturity signal
Industrial protocols need clean electrical foundations.
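The reset-supervision part of the fix can be complemented in firmware by a debounced link monitor that pulses the module's reset after sustained link loss, with exponential backoff so a bad rail does not cause a reset storm. This is a sketch with hypothetical names; `read_link_up` and `pulse_reset` are caller-supplied callables (on the W5500 the link state might come from a PHY status bit, but check the datasheet), and it is written in Python for readability rather than as the project's firmware.

```python
from typing import Callable


class LinkSupervisor:
    """Debounced link monitor that requests a hardware reset after sustained link loss."""

    def __init__(self, read_link_up: Callable[[], bool], pulse_reset: Callable[[], None],
                 down_checks_before_reset: int = 3, max_backoff_s: float = 30.0) -> None:
        self.read_link_up = read_link_up
        self.pulse_reset = pulse_reset
        self.down_checks = down_checks_before_reset
        self.max_backoff_s = max_backoff_s
        self.down_count = 0
        self.backoff_s = 1.0
        self.next_reset_allowed = 0.0
        self.resets = 0

    def poll(self, now: float) -> bool:
        """Call periodically with a monotonic time; returns True while the link is usable."""
        if self.read_link_up():
            self.down_count = 0
            self.backoff_s = 1.0  # healthy link: forget previous backoff
            return True
        self.down_count += 1
        if self.down_count >= self.down_checks and now >= self.next_reset_allowed:
            self.pulse_reset()
            self.resets += 1
            self.down_count = 0
            self.next_reset_allowed = now + self.backoff_s
            self.backoff_s = min(self.backoff_s * 2, self.max_backoff_s)
        return False
```

A reset only contains the symptom; a rising `resets` count is better read as evidence that the supply and grounding work is still needed.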
Failure analysis

WebSocket timeout states

Failure Archive
Symptom
Control path entered timeout states during unstable communication iterations.
Root cause
Command stream and firmware state handling needed tighter watchdog and recovery behavior.
Fix direction
Make timeout, brake, reconnect, and command rejection states explicit.
Maturity signal
Determinism requires observable state transitions.
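Making "timeout, brake, reconnect, and command rejection states explicit" could look like the small state machine below. The states, the 0.5 s timeout, and the method names are hypothetical; the point is that every transition is logged and that leaving the brake state requires an explicit reconnect, so a late packet cannot silently resume motion.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple


class LinkState(Enum):
    CONNECTED = auto()      # commands accepted
    TIMEOUT_BRAKE = auto()  # no traffic within the timeout: motors held in brake
    RECONNECTING = auto()   # socket lost: commands rejected until a new session


class ControlSession:
    def __init__(self, timeout_s: float = 0.5) -> None:
        self.timeout_s = timeout_s
        self.state = LinkState.RECONNECTING
        self.last_rx: Optional[float] = None
        self.transitions: List[Tuple[float, str, str]] = []  # observable state log

    def _enter(self, now: float, new: LinkState) -> None:
        if new is not self.state:
            self.transitions.append((now, self.state.name, new.name))
            self.state = new

    def on_connect(self, now: float) -> None:
        self.last_rx = now
        self._enter(now, LinkState.CONNECTED)

    def on_disconnect(self, now: float) -> None:
        self._enter(now, LinkState.RECONNECTING)

    def on_command(self, now: float) -> bool:
        """Accept a command only while CONNECTED; otherwise reject it explicitly."""
        if self.state is not LinkState.CONNECTED:
            return False
        self.last_rx = now
        return True

    def tick(self, now: float) -> None:
        """Call periodically from the control loop to enforce the timeout."""
        if self.state is LinkState.CONNECTED and now - self.last_rx > self.timeout_s:
            self._enter(now, LinkState.TIMEOUT_BRAKE)

    @property
    def brake(self) -> bool:
        return self.state is not LinkState.CONNECTED
```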
Failure analysis

ESP_STATE reported W5500 as 0.0.0.0

Failure Archive
Symptom
Firmware state reported the Ethernet interface address as 0.0.0.0.
Root cause
In that state, Ethernet initialization, link establishment, or network address assignment had not completed successfully.
Fix direction
Expose network state clearly and gate motor command paths on validated Ethernet readiness.
Maturity signal
Status reporting is part of safety and debug architecture.
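Gating motor commands on "validated Ethernet readiness" can be expressed as a readiness check that returns a reportable reason rather than a bare boolean, so the same value feeds both the safety gate and the status display. The reason strings and function names are hypothetical; `ipaddress.ip_address(...).is_unspecified` is the standard-library check that is true for 0.0.0.0.

```python
import ipaddress
from typing import Optional


def readiness(ip: str, link_up: bool) -> str:
    """Return READY or a specific reason the interface is not usable."""
    if not link_up:
        return "NO_LINK"
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return "BAD_ADDRESS"  # malformed string from the status path
    if addr.is_unspecified:
        return "NO_ADDRESS"   # 0.0.0.0: no address assigned yet
    return "READY"


def gate_motor_command(command, ip: str, link_up: bool) -> Optional[object]:
    """Forward the command only when the network path is validated; otherwise drop it."""
    return command if readiness(ip, link_up) == "READY" else None
```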
Failure analysis

Display/control lockup

Failure Archive
Symptom
OLED display and control logic locked up in unstable firmware iterations.
Root cause
Firmware scheduling, blocking behavior, or peripheral error handling was not stable enough.
Fix direction
Isolate display updates, protect control loops, and recover failed peripherals without blocking actuation safety.
Maturity signal
Debug UI must not destabilize the control path.
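One way to "isolate display updates" is a single-slot, latest-value mailbox: the control loop publishes state without ever waiting on the display, and a separate display task renders only the newest state and contains its own errors. This is a Python sketch with hypothetical names; on the firmware side the same split would map to separate tasks rather than Python threads.

```python
import threading
from typing import Callable, Optional, Tuple


class LatestValue:
    """Single-slot mailbox: the writer never waits on the display, the reader sees the newest state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # held only for a copy, never during rendering
        self._value = None
        self._version = 0

    def publish(self, value) -> None:
        with self._lock:
            self._value = value
            self._version += 1

    def read(self) -> Tuple[int, object]:
        with self._lock:
            return self._version, self._value


def render_once(mailbox: LatestValue, render: Callable[[object], None],
                last_seen: int) -> Tuple[int, Optional[Exception]]:
    """Render if there is newer state; return the error instead of raising it."""
    version, value = mailbox.read()
    if version == last_seen:
        return last_seen, None
    try:
        render(value)
    except Exception as exc:  # e.g. an I2C error from the OLED driver stays local
        return version, exc
    return version, None


def display_loop(mailbox: LatestValue, render: Callable[[object], None],
                 stop: threading.Event, period_s: float = 0.1) -> None:
    last = 0
    while not stop.is_set():
        last, err = render_once(mailbox, render, last)
        # A fuller version would count errors here and reinitialize the display.
        stop.wait(period_s)
```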
Failure analysis

Python controller command path did not initially move motors

Failure Archive
Symptom
Controller commands reached software but motors did not move until command/register behavior was corrected.
Root cause
Command format or register behavior did not match motor control expectations.
Fix direction
Validate protocol writes against known motor registers and build command confirmation telemetry.
Maturity signal
Field robotics tends to fail at interfaces rather than in isolated components.
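"Command confirmation telemetry" can start as a write-then-read-back check: a command counts as delivered only when the register reads back the value written. The `write`/`read` callables, register address, and retry count are hypothetical stand-ins for whatever bus the motor controller uses. Some motor registers are write-only or self-clearing, and for those the confirmation has to come from a status register instead.

```python
from typing import Callable


def write_confirmed(write: Callable[[int, int], None], read: Callable[[int], int],
                    register: int, value: int, retries: int = 2) -> bool:
    """Write a register and confirm by reading it back; retry a bounded number of times."""
    for _ in range(retries + 1):
        write(register, value)
        if read(register) == value:
            return True
    return False  # report upward instead of assuming the motor accepted the command
```

A `False` result belongs in telemetry next to the command that caused it, which turns "the motors did not move" into "register 0x10 never accepted the value".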
Failure analysis

Missing Python evdev dependency

Failure Archive
Symptom
Controller workflow failed due to a missing Python `evdev` dependency.
Root cause
Runtime environment did not match the control stack dependency assumptions.
Fix direction
Document setup, pin dependencies, and validate environment before field runs.
Maturity signal
Software environment reproducibility is part of system reliability.
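"Validate environment before field runs" can be a short preflight step using only the standard library: `importlib.util.find_spec` tells whether a module such as `evdev` is importable, and `importlib.metadata.version` reports the installed distribution version for comparison against pinned versions. The function names and the pin format are hypothetical.

```python
import importlib.metadata
import importlib.util
from typing import Dict, List, Optional, Tuple


def missing_modules(names: List[str]) -> List[str]:
    """Top-level modules that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def version_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map distribution -> (pinned, installed) for every pin that is not satisfied exactly."""
    mismatches = {}
    for dist, pinned in pins.items():
        try:
            installed = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[dist] = (pinned, installed)
    return mismatches
```

A field-run script could call `missing_modules(["evdev"])` first and refuse to start the controller if the list is non-empty.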
Failure analysis

Daemon restart and source update workflow problems

Failure Archive
Symptom
System service updates and restarts created friction during iterative debugging.
Root cause
Deployment workflow was not structured enough for repeated field iteration.
Fix direction
Use clear restart procedures, versioned service files, and update checks.
Maturity signal
Operational workflow is part of engineering maturity.
Failure analysis

Power noise corrupts control reliability

Deterministic Distributed Control Architecture
Symptom
Communication or actuator stability degrades during high motor load.
Root cause
Compute, control, and actuation domains are insufficiently isolated from load transients and EMI.
Fix direction
Redesign the power distribution board with stronger isolation, grounding, filtering, and domain-specific measurement points.
Maturity signal
Robotics reliability depends on electrical architecture as much as control logic.
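Domain-specific measurement points pay off when each capture is reduced to the same few numbers. A sketch, assuming voltage samples per domain are already available (for example exported from a scope or logged ADC readings) and a hypothetical ±5 % tolerance; the domain names are placeholders.

```python
from statistics import fmean
from typing import Dict, List


def rail_report(samples: List[float], nominal_v: float, tolerance_frac: float) -> Dict[str, object]:
    """Summarize one rail capture: mean, peak-to-peak ripple, worst droop, and pass/fail."""
    lo, hi = min(samples), max(samples)
    return {
        "mean_v": fmean(samples),
        "ripple_pp_v": hi - lo,
        "worst_droop_v": nominal_v - lo,
        "in_spec": lo >= nominal_v * (1 - tolerance_frac) and hi <= nominal_v * (1 + tolerance_frac),
    }


def out_of_spec_domains(captures: Dict[str, List[float]], nominals: Dict[str, float],
                        tolerance_frac: float = 0.05) -> List[str]:
    """Domains whose capture leaves the tolerance band at any sample."""
    return [domain for domain, samples in captures.items()
            if not rail_report(samples, nominals[domain], tolerance_frac)["in_spec"]]
```

Capturing every domain during the same high-load motor event shows which domain the disturbance actually reaches, which is the evidence needed to decide where isolation and filtering go.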
Failure analysis

Latency spikes reach the actuation boundary

Deterministic Distributed Control Architecture
Symptom
AI compute or planning delay produces uneven command timing.
Root cause
Non-deterministic compute is allowed to influence actuator command cadence.
Fix direction
Enforce middleware-side command validation, rate limiting, stale-command rejection, and watchdog behavior.
Maturity signal
Real-time behavior must be enforced at the boundary closest to actuation.
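A minimal sketch of such a boundary filter, assuming each command carries a sequence number and a sender timestamp on a clock shared with the middleware (both assumptions); the thresholds and names are hypothetical. It rejects stale, reordered, and bursty commands, clamps accepted values, and exposes a watchdog the actuator loop can poll.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    seq: int           # monotonically increasing sequence number from the planner
    issued_at: float   # sender timestamp; assumes a clock shared with the middleware
    velocity: float    # requested velocity in arbitrary units


class CommandGate:
    """Boundary filter between non-deterministic compute and the actuator loop."""

    def __init__(self, max_age_s: float = 0.1, min_interval_s: float = 0.02,
                 max_velocity: float = 1.0, watchdog_s: float = 0.2) -> None:
        self.max_age_s = max_age_s
        self.min_interval_s = min_interval_s
        self.max_velocity = max_velocity
        self.watchdog_s = watchdog_s
        self.last_seq = -1
        self.last_accept: Optional[float] = None
        self.output_velocity = 0.0

    def submit(self, cmd: Command, now: float) -> str:
        if now - cmd.issued_at > self.max_age_s:
            return "stale"         # too old to act on safely
        if cmd.seq <= self.last_seq:
            return "out_of_order"  # duplicate or reordered packet
        if self.last_accept is not None and now - self.last_accept < self.min_interval_s:
            return "rate_limited"  # burst arriving after a compute stall
        self.last_seq = cmd.seq
        self.last_accept = now
        # Clamp to the validated range before the value reaches the actuator.
        self.output_velocity = max(-self.max_velocity, min(self.max_velocity, cmd.velocity))
        return "ok"

    def should_brake(self, now: float) -> bool:
        """Watchdog: brake if no valid command has been accepted within watchdog_s."""
        return self.last_accept is None or now - self.last_accept > self.watchdog_s
```

Rejecting the newest command in a burst is the simplest policy; coalescing to the latest valid command is an alternative when responsiveness matters more than strict cadence.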