PhishGuard
ML-Powered Real-Time Phishing URL Detection & Threat Analysis
01 / Project Overview
A full-stack cybersecurity application that analyzes URLs in real time for phishing indicators using a multi-layer detection pipeline. PhishGuard extracts 30+ features per URL (domain age, SSL validity, redirect chains, suspicious keywords, typosquatting similarity), feeds them through a trained gradient boosting classifier, and returns a risk score with a detailed explanation. A companion browser extension provides inline protection during web navigation.
02 / The Challenge & Problem
Real-World Problem Statement
Phishing attacks are the leading vector for credential theft and malware distribution, yet most users rely solely on browser safe-browsing lists, which lag by an average of 12 hours between a site's creation and its blacklisting. Novel phishing domains exploit this window to harvest thousands of credentials before detection.
03 / The Engineering Solution
Implementation & Architectural Approach
Built a zero-day phishing detection system that relies on structural URL analysis rather than blacklists, enabling real-time assessment of previously unseen URLs. The heuristic + ML ensemble achieves 96.2% detection accuracy with a false positive rate under 0.8%, making it viable for production use without disrupting legitimate browsing.
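Detection accuracy and false positive rate measure different things, so it helps to see how each is derived from a confusion matrix. The sketch below is illustrative only: the function name `detection_metrics` and the counts passed to it are made up for the example and are not the project's evaluation data.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion counts.

    tp: phishing URLs correctly flagged      fp: legitimate URLs wrongly flagged
    tn: legitimate URLs correctly passed     fn: phishing URLs missed
    Assumes each denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,           # share of all URLs classified correctly
        "false_positive_rate": fp / (fp + tn),   # share of legitimate URLs flagged
        "recall": tp / (tp + fn),                # share of phishing URLs caught
    }


# Illustrative counts only (not results from PhishGuard).
metrics = detection_metrics(tp=90, fp=1, tn=99, fn=10)
```

A low false positive rate matters independently of accuracy here, because every false positive interrupts a legitimate page visit.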
04 / Technical Architecture Flow
URL Parser & WHOIS Resolver
Decomposes URLs into 30+ features, including domain age, HTTPS presence, redirect depth, special character density, and Levenshtein distance to the top 500 domains.
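A few of these features can be computed from the URL string alone, with no network calls. The following is a minimal sketch, not the project's parser: `KNOWN_DOMAINS` is a three-entry stand-in for the top-500 list, `SUSPICIOUS_KEYWORDS` is a hypothetical keyword set, and the feature names are assumptions. Domain age and redirect depth are omitted because they need WHOIS and HTTP lookups.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the top-500 domain list.
KNOWN_DOMAINS = ["paypal.com", "google.com", "amazon.com"]

# Hypothetical keywords; the real list is not specified here.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def extract_features(url: str) -> dict:
    """Derive a handful of local, structural features from a URL string."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    special = sum(1 for c in url if not c.isalnum())
    return {
        "uses_https": parsed.scheme == "https",
        "special_char_density": special / len(url) if url else 0.0,
        "keyword_hits": sum(k in url.lower() for k in SUSPICIOUS_KEYWORDS),
        # A small non-zero distance suggests a lookalike of a known domain.
        "min_edit_distance": min(levenshtein(host, d) for d in KNOWN_DOMAINS),
    }


features = extract_features("http://paypa1.com/login")
```

In this example the host `paypa1.com` is one substitution away from `paypal.com`, which is the kind of typosquatting signal the edit-distance feature is meant to surface.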
Gradient Boosting Classifier
XGBoost-based model trained on 50,000 labeled URLs, outputting phishing probability scores with SHAP-based feature attribution for explainability.
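SHAP attributions are built on Shapley values, which split the gap between a prediction and a baseline prediction across the input features. For a handful of features they can be computed exactly by brute force, as sketched below. This is an illustration of the underlying idea only: it does not use the `shap` library or XGBoost, and `toy_score`, its weights, and the feature names are hypothetical stand-ins for the trained model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x: dict, baseline: dict) -> dict:
    """Exact Shapley values of predict at x, relative to baseline.

    Features outside a coalition take their baseline value.
    Cost grows as 2^n, so this is practical only for a few features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for feature in names:
        others = [name for name in names if name != feature]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without = {k: x[k] if k in present else baseline[k] for k in names}
                with_feature = dict(without)
                with_feature[feature] = x[feature]
                # Marginal contribution of adding this feature to the coalition.
                total += weight * (predict(with_feature) - predict(without))
        phi[feature] = total
    return phi


def toy_score(f: dict) -> float:
    """Hypothetical linear risk score standing in for the classifier."""
    return 0.05 + 0.4 * f["young_domain"] + 0.3 * f["typosquat"] + 0.1 * f["no_https"]


url_features = {"young_domain": 1.0, "typosquat": 1.0, "no_https": 0.0}
baseline = {"young_domain": 0.0, "typosquat": 0.0, "no_https": 0.0}
attributions = shapley_values(toy_score, url_features, baseline)
```

The attributions always sum to the difference between the score for the URL and the score for the baseline, which is what lets a report say how much each feature pushed the risk score up or down.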
Node.js REST API + Extension
Express API serves real-time predictions; companion browser extension intercepts navigation events and overlays risk indicators on flagged pages.
05 / Key Project Features
Zero-Day URL Analysis
Detects phishing on URLs not in any blacklist by analyzing structural patterns characteristic of malicious domains.
SHAP Explainability Reports
Provides per-URL explanations highlighting the specific features (e.g., domain age, typosquatting) driving the risk score.
Browser Extension Integration
Real-time visual overlays on the navigation bar warn users before the page finishes loading, preventing credential entry on phishing sites.
06 / Engineering Challenges & Mitigations
Legitimate URLs from new domain registrations were incorrectly flagged as phishing due to low domain age.
Added domain reputation signals from the VirusTotal API and built a whitelist of verified new registrars to reduce false positives on legitimate new sites.
WHOIS lookup latency caused prediction delays exceeding 3 seconds for some queries.
Implemented an LRU cache for recent WHOIS lookups and a timeout fallback that uses only local structural features when a WHOIS lookup does not return in time.
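The cache-plus-timeout pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `LRUCache`, `domain_age_days`, the cache capacity, and the one-second default timeout are all hypothetical, and `whois_lookup` is any caller-supplied function that returns a domain's age in days.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop least recently used


_executor = ThreadPoolExecutor(max_workers=4)
_cache = LRUCache(capacity=1024)  # capacity is an arbitrary example value


def domain_age_days(domain: str, whois_lookup, timeout_s: float = 1.0):
    """Return the domain age from cache or a fresh lookup.

    Returns None if the lookup exceeds timeout_s, signalling the caller
    to score the URL with local structural features only.
    """
    cached = _cache.get(domain)
    if cached is not None:
        return cached
    future = _executor.submit(whois_lookup, domain)
    try:
        age = future.result(timeout=timeout_s)
    except FutureTimeout:
        return None  # lookup may still finish in the background; result is discarded
    _cache.put(domain, age)
    return age
```

A caller would treat a `None` result as "domain age unavailable" and pass only the locally computed features to the classifier, so the prediction latency stays bounded by the timeout rather than by the WHOIS server.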
07 / Technical & Personal Learnings
Gained deep knowledge of phishing attack patterns, URL obfuscation techniques, and the adversarial machine learning arms race in cybersecurity.
Mastered SHAP model explainability, enabling non-technical stakeholders to understand and trust automated security decisions.