Rishabh Durugkar

Infrastructure · Networking · Security

Reliability, automation, and observability for AI-ready environments.

AI Infrastructure Focus

System Status
Experience: 4+ years
Certifications: 5 active
Projects: 6 deployed
Cloud: AWS · Azure
Primary language: Python
Focus

AI-ready infrastructure + network automation

Background

Wipro (Security Network Eng) + NDA consulting

Status

Open to full-time opportunities

About

I design, build, and operate infrastructure systems that handle real production load. My work focuses on reliability engineering, network automation, and security validation for hybrid cloud and AI-ready environments.

I believe in treating infrastructure as code, observability as a first-class concern, and automation as a reliability multiplier. Every system I build includes monitoring, alerting, and runbooks from day one.

I am currently exploring AI infrastructure challenges: GPU cluster optimization, high-speed networking for distributed training, and observability tooling for machine learning workloads.

Experience

Independent Infrastructure Consultant

NDA Projects

Dec 2025 - Present
  • Infrastructure architecture design and optimization for enterprise clients
  • Network automation and configuration management implementation
  • Observability stack deployment and custom dashboard development
  • Security validation and incident response automation

Security Network Engineer

Wipro Limited

Jun 2024 - Nov 2025
  • Managed enterprise network security infrastructure including firewalls and segmentation
  • Led incident response efforts for network security events
  • Implemented network automation pipelines for configuration management
  • Developed monitoring and alerting solutions for network infrastructure

Projects

Multi-Cloud Network Automation Pipeline

Problem

Manual network configuration changes across AWS, Azure, and on-prem infrastructure led to configuration drift and prolonged incident response times.

Build

Built an event-driven automation pipeline using Terraform, Ansible, and Python. Integrated with ServiceNow for change tracking and Slack for real-time notifications.

Outcome

Reduced configuration deployment time and improved change audit compliance.

Stack

Python · Terraform · Ansible · AWS · Azure
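
As a rough illustration of the configuration-drift problem this pipeline targets, the sketch below compares an intended configuration with the one observed on a device. The function name `find_drift` and the sample settings are hypothetical and are not taken from the project itself; a real pipeline would pull both sides from Terraform state, Ansible facts, or device APIs.

```python
def find_drift(intended, actual):
    """Return {setting: (intended_value, actual_value)} for every mismatch.

    Settings missing on one side are reported with None on that side.
    """
    drift = {}
    for key in sorted(intended.keys() | actual.keys()):
        want = intended.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    # Hypothetical interface settings, for illustration only.
    intended = {"mtu": 9000, "vlan": 10, "description": "uplink"}
    actual = {"mtu": 1500, "vlan": 10}
    for setting, (want, have) in find_drift(intended, actual).items():
        print(f"{setting}: intended={want!r} actual={have!r}")
```

An empty result means the device matches its source of truth; any non-empty result is a candidate for a change ticket or an automated remediation run.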

AI Infrastructure Observability Stack

Problem

GPU clusters for ML workloads lacked visibility into resource utilization, thermal performance, and network bottlenecks.

Build

Deployed Prometheus, Grafana, and custom DCGM exporters. Built correlation dashboards linking GPU utilization, network throughput, and storage I/O.

Outcome

Enabled proactive identification of bottlenecks and improved workload scheduling efficiency.

Stack

Prometheus · Grafana · Python · NVIDIA DCGM · eBPF
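
One way to back a correlation dashboard like the one described above is a Pearson coefficient between two metric series sampled at the same timestamps. This is a minimal sketch under that assumption: the helper name `pearson` and the sample values are hypothetical, and the actual dashboards may use different statistics or query-language functions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series.

    Returns 0.0 when either series is constant, since the
    coefficient is undefined in that case.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical samples: GPU utilization (%) and network throughput (Gbit/s).
    gpu_util = [35.0, 50.0, 72.0, 90.0, 88.0]
    net_gbps = [12.0, 18.0, 30.0, 41.0, 39.0]
    print(f"correlation: {pearson(gpu_util, net_gbps):.3f}")
```

A coefficient near 1 suggests the two metrics rise together, which is a hint (not proof) that the network is tracking GPU demand rather than limiting it.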

Zero-Trust Network Segmentation

Problem

Legacy flat network architecture posed security risks and made it difficult to contain lateral movement during incidents.

Build

Designed and implemented microsegmentation using Cisco ACI and Palo Alto firewalls. Created a policy-as-code framework for network access control.

Outcome

Enhanced security posture and improved incident containment capabilities.

Stack

Cisco ACI · Palo Alto · Python · Terraform
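
The policy-as-code idea can be sketched as a default-deny allow list that is evaluated in code and kept under version control. The zone names, ports, and the `Rule` and `is_allowed` names below are hypothetical; they do not reflect the Cisco ACI or Palo Alto object models used in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One permitted flow between two segments (hypothetical schema)."""
    src_zone: str
    dst_zone: str
    port: int


# Illustrative policy: anything not listed here is denied.
ALLOWED_FLOWS = frozenset({
    Rule("web", "app", 8443),
    Rule("app", "db", 5432),
})


def is_allowed(src_zone, dst_zone, port, rules=ALLOWED_FLOWS):
    """Default-deny check: a flow passes only if an exact rule exists."""
    return Rule(src_zone, dst_zone, port) in rules


if __name__ == "__main__":
    print(is_allowed("web", "app", 8443))  # permitted tier-to-tier flow
    print(is_allowed("web", "db", 5432))   # lateral path that skips the app tier
```

Because the policy is plain data, a proposed change can be reviewed as a diff and tested before it is pushed to any enforcement point.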

Automated Incident Response Platform

Problem

Manual incident triage and response led to inconsistent handling and delayed remediation.

Build

Built incident response automation in Python and integrated it with PagerDuty, Slack, and a SIEM. Implemented automated runbooks for common scenarios.

Outcome

Standardized incident response procedures and reduced time to initial response.

Stack

Python · PagerDuty · Splunk · Slack API
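
A minimal sketch of how automated runbooks might be wired up: each alert type maps to a handler, and anything unrecognized falls back to human escalation. The registry, the alert fields, and the step text are all hypothetical; no PagerDuty, Slack, or Splunk API calls are shown here.

```python
# Registry of alert type -> handler function (hypothetical design).
RUNBOOKS = {}


def runbook(alert_type):
    """Decorator that registers a handler for one alert type."""
    def register(func):
        RUNBOOKS[alert_type] = func
        return func
    return register


@runbook("link_down")
def handle_link_down(alert):
    # Steps are returned as text so the sketch has no side effects.
    return [
        f"collect interface counters on {alert['device']}",
        f"check neighbor state for {alert['interface']}",
        "open change ticket",
    ]


def dispatch(alert):
    """Run the matching runbook, or escalate when none is registered."""
    handler = RUNBOOKS.get(alert["type"])
    if handler is None:
        return ["escalate to on-call engineer"]
    return handler(alert)


if __name__ == "__main__":
    alert = {"type": "link_down", "device": "edge-1", "interface": "Eth1/1"}
    for step in dispatch(alert):
        print(step)
```

Keeping the fallback explicit means an unknown alert is never dropped silently, which is what makes the response consistent across responders.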

Infrastructure-as-Code Pipeline

Problem

Manual infrastructure provisioning was error-prone and difficult to audit.

Build

Implemented a full IaC pipeline using Terraform Cloud and GitHub Actions, with policy validation in OPA. Created reusable modules for common patterns.

Outcome

Improved infrastructure consistency and reduced provisioning time.

Stack

Terraform · GitHub Actions · OPA · AWS

Network Performance Analysis Tool

Problem

Troubleshooting network performance issues required manual packet analysis and correlation across multiple sources.

Build

Developed a custom Python tool for automated packet capture analysis, flow correlation, and anomaly detection using statistical methods.

Outcome

Accelerated network troubleshooting and improved root cause identification.

Stack

Python · Wireshark · eBPF · Pandas
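
The description above mentions statistical anomaly detection without naming a method. As one possible example, the sketch below flags samples whose z-score exceeds a threshold. The choice of z-score, the function name `zscore_anomalies`, the threshold, and the sample latencies are assumptions for illustration, not the tool's actual implementation.

```python
from statistics import mean, pstdev


def zscore_anomalies(samples, threshold=3.0):
    """Return (index, value) pairs more than `threshold` standard
    deviations from the mean of `samples`."""
    if not samples:
        return []
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        # All samples are identical, so nothing stands out.
        return []
    return [
        (i, x) for i, x in enumerate(samples)
        if abs(x - mu) / sigma > threshold
    ]


if __name__ == "__main__":
    # Hypothetical round-trip times in milliseconds with one spike.
    rtt_ms = [10.0] * 20 + [100.0]
    print(zscore_anomalies(rtt_ms))
```

A z-score is sensitive to the very outliers it is looking for, so on short or heavily skewed captures a median-based measure may be a better fit.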

Education

Bachelor of Engineering in Computer Science

University Name

2017 - 2021
  • Focus on Computer Networks and Distributed Systems
  • Relevant coursework in Network Security and Cloud Computing

Certifications

NVIDIA AI Infrastructure and Operations

NVIDIA · Jan 2026

GPU cluster management, optimization, and AI workload infrastructure

CCNP Enterprise

Cisco · Oct 2025

Advanced routing, switching, and troubleshooting

CCNP Enterprise: Core Networking (ENCOR)

Cisco · Oct 2025

Enterprise network architecture and core technologies

Automating Cisco Enterprise Solutions (ENAUTO)

Cisco · Oct 2025

Network automation, programmability, and orchestration

CCNA

Cisco · May 2022

Network fundamentals and Cisco technologies

Skills

Infrastructure & Cloud

AWS · Azure · Terraform · Kubernetes · Docker · Ansible

Networking

Cisco (Routing/Switching) · BGP · OSPF · VLANs · VPNs · SD-WAN

Security

Palo Alto · Network Segmentation · Zero Trust · SIEM · IDS/IPS

Observability

Prometheus · Grafana · ELK Stack · Datadog · OpenTelemetry

Automation & Scripting

Python · Bash · PowerShell · Git · CI/CD

AI Infrastructure

NVIDIA GPUs · CUDA · Ray · High-Speed Networking · Storage Optimization

Contact

Reach out directly via LinkedIn or GitHub.