Rishabh Durugkar | Infrastructure · Networking

About

I build reliable infrastructure systems that scale under pressure. My focus is on network reliability, observability pipelines, and security-oriented systems thinking. I work at the intersection of AI-ready networking, automation, and operational resilience, designing systems that handle failures gracefully and use resources efficiently.

From operating enterprise L2/L3 networks to building AI-aware traffic optimizations and GPU observability correlators, I approach infrastructure as a cohesive system where compute, fabric, and defense layers must work in harmony.

Experience

Independent Infrastructure Consultant (NDA)

Sep 2022 – Present

Remote

Project-based infrastructure and automation work under NDA, covering AI-ready networking, observability, failure analysis, and cost optimization.

AI-aware traffic simulations for congestion/latency risk
GPU/infrastructure observability correlating compute/network/workload
Resilient edge inference under constrained networks
Traffic-aware security detection for AI workloads
Failure injection testing for faster RCA
Cost driver modeling and optimization strategies

Security Network Engineer

Wipro Ltd • Jan 2021 – Aug 2022

Hybrid

Operated enterprise L2/L3 networks (BGP, OSPF, SD-WAN, VPNs)
Supported hybrid cloud networking (AWS VPCs, IPsec, firewall policy enforcement)
Handled on-call escalations and incident response; drove RCA and permanent fixes
Automated config/validation workflows (Python, Terraform, PowerShell)
Worked within change management and compliance constraints

Projects

AI-Aware Network Traffic Optimization

Fabric

Problem

AI training workloads create unpredictable traffic patterns leading to congestion and latency spikes.

Build

Traffic simulation engine analyzing GPU communication patterns, predicting congestion points, and optimizing routing decisions.

Outcome

Reduced latency spikes under simulated peak load through predictive traffic shaping experiments.

Stack

Python • Network Simulation • Traffic Analysis • BGP
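
As a rough illustration of the congestion-prediction idea, the sketch below sums offered load per link and flags links whose predicted utilization crosses a threshold. It is a minimal sketch, not the project's engine: the topology, the function names (`link_utilization`, `congestion_points`), and the 0.8 threshold are all assumptions made for the example.

```python
from collections import defaultdict


def link_utilization(flows, capacity_gbps):
    """Sum offered load per link and divide by that link's capacity."""
    load = defaultdict(float)
    for path, gbps in flows:
        # A path is a list of node names; each consecutive pair is one link.
        for link in zip(path, path[1:]):
            load[link] += gbps
    return {link: load[link] / capacity_gbps[link] for link in load}


def congestion_points(flows, capacity_gbps, threshold=0.8):
    """Return (link, utilization) pairs above the threshold, worst first."""
    util = link_utilization(flows, capacity_gbps)
    hot = [(link, u) for link, u in util.items() if u > threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Hypothetical topology: two GPU-to-GPU flows share the same leaf/spine links.
flows = [
    (["gpu0", "leaf1", "spine1", "leaf2", "gpu2"], 60.0),
    (["gpu1", "leaf1", "spine1", "leaf2", "gpu3"], 50.0),
]
capacity_gbps = {
    ("gpu0", "leaf1"): 100.0,
    ("gpu1", "leaf1"): 100.0,
    ("leaf1", "spine1"): 100.0,
    ("spine1", "leaf2"): 100.0,
    ("leaf2", "gpu2"): 100.0,
    ("leaf2", "gpu3"): 100.0,
}
hot_links = congestion_points(flows, capacity_gbps)
```

Here the two shared fabric links are oversubscribed while the host-facing links are not, which is the kind of signal a routing or traffic-shaping decision could act on.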

GPU & Infrastructure Observability Correlator

Compute

Problem

Performance degradation in GPU clusters often stems from network or storage issues, not compute itself.

Build

Multi-layer observability pipeline correlating GPU utilization, network throughput, and workload characteristics.

Outcome

Improved incident pinpointing under load by correlating infrastructure and compute signals.

Stack

Prometheus • Grafana • NVIDIA SMI • Network Telemetry
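
One simple way to picture the correlation step is a Pearson coefficient between GPU utilization and network throughput samples taken over the same window. The sketch below is a hedged illustration only: the helper names, the sample values, and the thresholds (`min_corr`, `low_util`) are assumptions, and a real pipeline would pull these series from its metrics store.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def likely_network_bound(gpu_util, net_gbps, min_corr=0.8, low_util=60.0):
    """Heuristic: GPUs are underused on average and utilization tracks throughput."""
    mean_util = sum(gpu_util) / len(gpu_util)
    return mean_util < low_util and pearson(gpu_util, net_gbps) >= min_corr


# Hypothetical samples: GPU utilization (%) dips whenever throughput (Gbps) dips.
gpu_util = [90.0, 45.0, 40.0, 35.0, 30.0, 88.0]
net_gbps = [40.0, 14.0, 12.0, 10.0, 9.0, 39.0]
suspect_network = likely_network_bound(gpu_util, net_gbps)
```

A high correlation alone does not prove causation; it only narrows where to look first during an incident.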

Edge Inference Under Constrained Networks

Fabric

Problem

Edge inference nodes face unreliable connectivity and bandwidth constraints while requiring consistent SLAs.

Build

Adaptive inference system with intelligent model caching, request batching, and graceful degradation strategies.

Outcome

Kept the service responsive in high-loss simulations using resilient edge and routing policies.

Stack

ONNX Runtime • Redis • Network QoS • Failover Logic
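
The graceful-degradation and batching ideas can be sketched as a small policy function. This is an assumption-laden illustration: the mode names, the `LinkState` type, and the loss and bandwidth thresholds are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkState:
    loss_pct: float        # measured packet loss, in percent
    bandwidth_mbps: float  # measured usable bandwidth


def choose_mode(link, cached_models):
    """Pick a serving mode from link health; thresholds are illustrative."""
    if link.loss_pct < 2.0 and link.bandwidth_mbps >= 20.0:
        return "remote-full"      # healthy link: use the full upstream model
    if "small" in cached_models:
        return "local-small"      # degraded link: serve from a cached small model
    return "queue-and-retry"      # nothing cached: hold requests until recovery


def batch(requests, max_size):
    """Group pending requests so one round trip carries several of them."""
    return [requests[i:i + max_size] for i in range(0, len(requests), max_size)]


healthy = choose_mode(LinkState(0.5, 50.0), {"small"})
degraded = choose_mode(LinkState(15.0, 50.0), {"small"})
uncached = choose_mode(LinkState(15.0, 50.0), set())
batches = batch([1, 2, 3, 4, 5], max_size=2)
```

Keeping the policy as a pure function of measured link state makes it straightforward to replay against recorded loss traces.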

Traffic-Aware Security Detection for AI Workloads

Defense

Problem

Traditional security tools generate false positives on AI workload traffic patterns (burst transfers, large payloads).

Build

ML-based anomaly detection system trained on legitimate AI traffic patterns with context-aware alerting.

Outcome

Reduced noisy alerts in validation runs while still surfacing real threats in AI traffic patterns.

Stack

Zeek • Suricata • Machine Learning • Flow Analysis
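
To show why a baseline learned from legitimate AI traffic cuts false positives, the sketch below uses a plain z-score test in place of a trained ML model. That substitution is deliberate and is only meant to illustrate the idea; the transfer sizes, function names, and the 3.0 threshold are assumptions.

```python
from statistics import mean, stdev


def fit_baseline(transfer_sizes_mb):
    """Learn mean and sample standard deviation from known-good transfers."""
    return mean(transfer_sizes_mb), stdev(transfer_sizes_mb)


def is_anomalous(value_mb, baseline, z_threshold=3.0):
    """Flag a transfer whose size is far from the learned baseline."""
    mu, sigma = baseline
    if sigma == 0:
        return value_mb != mu
    return abs(value_mb - mu) / sigma > z_threshold


# Hypothetical legitimate burst transfers (MB) seen during training runs.
baseline = fit_baseline([900, 1100, 1000, 950, 1050])

normal_burst = is_anomalous(1080, baseline)   # large, but typical for this workload
odd_transfer = is_anomalous(5000, baseline)   # far outside the learned range
```

A generic fixed-size rule would flag every one of these bursts; scoring against the workload's own baseline leaves only the outlier.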

Failure Injection Lab for Faster RCA

Compute

Problem

Teams struggle to identify root causes quickly during outages because they lack knowledge of common failure patterns.

Build

Chaos engineering platform for systematic failure injection with automated symptom cataloging and playbooks.

Outcome

Reduced average incident resolution time from 4 hours to 45 minutes through documented failure patterns.

Stack

Chaos Mesh • Kubernetes • Terraform • Runbooks
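
A symptom catalog becomes useful during an outage when observed symptoms can be matched against it. The sketch below ranks cataloged failures by Jaccard overlap with what is currently observed. The catalog entries, symptom labels, and function names are hypothetical, chosen only to illustrate the lookup.

```python
def jaccard(a, b):
    """Overlap between two symptom sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_causes(observed, catalog):
    """Return (cause, score) pairs with non-zero overlap, best match first."""
    scored = [(cause, jaccard(observed, symptoms)) for cause, symptoms in catalog.items()]
    matches = [item for item in scored if item[1] > 0]
    return sorted(matches, key=lambda item: item[1], reverse=True)


# Hypothetical catalog built from earlier failure-injection runs.
catalog = {
    "bgp-session-flap": {"route-withdrawals", "packet-loss", "latency-spike"},
    "dns-outage": {"name-resolution-errors", "connection-timeouts"},
    "disk-pressure": {"pod-evictions", "slow-writes"},
}
observed = {"packet-loss", "latency-spike"}
ranking = rank_causes(observed, catalog)
```

The top-ranked entry would then point the responder to the matching playbook.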

Cost Driver Modeling for AI Infrastructure

Compute

Problem

Cloud AI infrastructure costs spiral without visibility into primary cost drivers and optimization opportunities.

Build

Cost attribution system mapping workload characteristics to resource consumption with actionable optimization recommendations.

Outcome

Surfaced cost drivers in modeling exercises, informing GPU sizing and network optimization choices.

Stack

FinOps • Cloud Billing APIs • Data Analysis • Optimization
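
Cost attribution can be illustrated with proportional allocation: a shared bill is split across workloads according to a usage measure. The sketch below assumes GPU-hours as that measure; the workload names, figures, and the `attribute_cost` helper are invented for the example.

```python
def attribute_cost(total_cost, usage):
    """Split a shared cost across workloads in proportion to their usage."""
    total_usage = sum(usage.values())
    if total_usage <= 0:
        raise ValueError("usage must sum to a positive value")
    return {name: total_cost * amount / total_usage for name, amount in usage.items()}


# Hypothetical monthly GPU-hours per workload and a hypothetical shared bill.
gpu_hours = {"training": 600.0, "inference": 300.0, "notebooks": 100.0}
shares = attribute_cost(10_000.0, gpu_hours)

# The largest share is the primary cost driver under this allocation rule.
top_driver = max(shares, key=shares.get)
```

Other allocation keys (network egress, storage, reserved capacity) would change the ranking, so the choice of usage measure is itself a modeling decision.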

Education

MSc / Graduate Diploma

Electronic & Computer Technology (IoT) • Dublin • 2022–2025

Capstone: Arcane Guard — AI-Driven Security for IoT Networks

ML/DL-based intrusion detection system with real-time pipeline processing, scalable architecture, and false positive reduction through multi-stage classification.

Bachelor of Science

Network & Technology

Certifications

CCNP Enterprise

In Progress

CCNP ENAUTO

In Progress

NVIDIA AI Infrastructure

In Progress

PNPT

DevSecOps

CCT

Skills

AI Infrastructure

GPU Cluster Orchestration
NVIDIA Architecture (H100, A100)
InfiniBand / RDMA Networking
AI Workload Optimization
Model Deployment Pipelines
Resource Scheduling

Networking

BGP, OSPF, SD-WAN
Enterprise L2/L3 Networks
VPN (IPsec, WireGuard)
AWS VPC, Hybrid Cloud
Network Observability
Traffic Engineering

Automation

Python, Bash, PowerShell
Terraform, Ansible
CI/CD Pipelines
Infrastructure as Code
Configuration Management
Workflow Orchestration

Cloud & Observability

AWS, Azure, GCP
Kubernetes, Docker
Prometheus, Grafana
Log Aggregation
Distributed Tracing
Incident Response

Contact

Interested in infrastructure reliability, AI-ready networking, or systems architecture? Let's connect.

GitHub • LinkedIn • Email • Resume (PDF)

Infrastructure · Networking · Security Engineer
