Table of Contents#

Introduction
Project Overview
OSINT Pipeline & Chrome Extension
AI Template Generation
Email Delivery Pipeline
Tracking System
Conclusion & Future Work

Introduction#

Here’s a question that bothered me for a while: if your organization runs phishing simulations every quarter, are you actually more secure — or just more comfortable?

That discomfort is what pushed me toward this project.

By commonly cited industry estimates, phishing is still the entry point for over 90% of successful cyberattacks. Not because defenders aren’t trying, but because attackers have quietly upgraded their weapons. With the rise of Generative AI and Large Language Models, crafting a convincing, personalized spear-phishing email no longer requires a skilled social engineer; it takes a good prompt and a few seconds. Some studies report that AI-generated phishing content can match or even exceed the persuasiveness of expert human attackers, and that targets correctly identify AI-written messages only about 52% of the time. Barely better than a coin flip.

The problem? Most phishing simulation platforms haven’t kept up. They still send the same generic “Your account will be suspended” emails with a logo slapped on top. Employees learn to spot those — not the real thing. That’s the realism gap: a false sense of security built on outdated simulations.

There’s also a second, quieter problem: the forensics gap. Even when simulations run, the audit trails are thin. Who approved the campaign? What exactly was sent, and to whom? When something goes wrong — or when you need to prove compliance — the answer is often a shrug.

Those two gaps are what RedTeam Phish Suite was built to close.

What I Set Out to Build#

The idea was a full-stack phishing simulation platform that works the way modern attackers actually work:

1. Collect real professional context on targets via OSINT
2. Feed that context to an LLM to generate a unique, personalized email per target, not a template with a name swapped in
3. Deliver it, track every interaction (opens, clicks, credential submissions), and log everything with full visibility

The platform operates in two distinct modes. Red Team Mode is built for penetration testers and security consultants who need to simulate realistic adversarial campaigns against a target organization — with full OSINT integration, AI personalization, and detailed engagement metrics. Awareness Mode flips the script: same infrastructure, different intent — non-punitive, educational, designed to train rather than test. One platform, two use cases, zero compromise on either.

But it also had to be responsible. Every generated template has to pass a mandatory human approval gate before it can be sent, and Awareness Mode makes sure the platform can teach people without punishing them for clicking.

Project Overview#

What the Platform Does#

RedTeam Phish Suite is a full-stack web application for running end-to-end phishing simulations. An operator logs in, creates a campaign, uploads a CSV of targets, collects OSINT data on them, generates personalized emails via AI, approves them, launches the campaign, and watches the interactions come in on a real-time dashboard. Everything — from target ingestion to email delivery to click tracking — lives in one place.

The platform runs in two modes that share the same infrastructure but behave differently:

Red Team Mode is for security consultants and penetration testers running authorized adversarial simulations. It’s built for realism: OSINT-grounded email generation, brand-cloned landing pages, and detailed engagement metrics. When a target clicks and lands on the page, they get a benign proof-of-concept artifact — an HTML, PDF, or DOCX file that proves the interaction happened without deploying anything malicious.

Awareness Mode is the other side of the same coin. Same campaign management, same delivery pipeline — but the post-click experience changes. Instead of a fake login page, the target sees an educational page explaining what just happened and why it worked. Non-punitive, constructive, designed to teach rather than catch.

Campaign Lifecycle#

A campaign in the platform moves through a strictly enforced state machine.

Every transition is validated server-side. A campaign can’t be launched without being approved first. It can’t be edited once it leaves draft. The approval gate is the platform’s primary safeguard — a human has to sign off on every target list and every generated template before a single email goes out.
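
The server-side transition check described above can be sketched as a small lookup table. This is a minimal illustration in Python, not the platform's actual code: the state names other than draft and active, and the helper names `transition` and `can_edit`, are assumptions made for the example.

```python
from enum import Enum


class CampaignState(str, Enum):
    # Only "draft" and "active" are named in the write-up; the rest are assumed.
    DRAFT = "draft"
    APPROVED = "approved"
    SENDING = "sending"
    ACTIVE = "active"
    STOPPED = "stopped"


# Every allowed transition is listed explicitly; anything else is rejected.
ALLOWED_TRANSITIONS = {
    CampaignState.DRAFT: {CampaignState.APPROVED},
    CampaignState.APPROVED: {CampaignState.SENDING},
    CampaignState.SENDING: {CampaignState.ACTIVE},
    CampaignState.ACTIVE: {CampaignState.STOPPED},
    CampaignState.STOPPED: set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the allowed table."""


def transition(current: CampaignState, requested: CampaignState) -> CampaignState:
    """Return the new state, or raise if the move skips a required step."""
    if requested not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {requested.value} is not allowed")
    return requested


def can_edit(state: CampaignState) -> bool:
    """A campaign is only editable while it is still a draft."""
    return state is CampaignState.DRAFT
```

With a table like this, launching a draft directly (draft to sending) fails because the approval step is the only exit from draft.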

When a campaign launches, the backend returns a 202 Accepted immediately and the background send loop takes over. The frontend polls for status updates while emails go out. Once all are sent, the campaign moves to active and stays there until the operator stops it or all tracked interactions are complete.
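
The launch behavior (respond immediately, send in the background) might look roughly like the following Python sketch. The dictionary-based campaign record, the `send_email` callback, and the 409 response for unapproved campaigns are all assumptions for illustration; only the 202 response and the move to active come from the description above.

```python
import threading


def send_loop(campaign: dict, send_email) -> None:
    """Background worker: send one email per target, then mark the campaign active."""
    for target in campaign["targets"]:
        send_email(target)
        campaign["sent"] += 1  # the polling frontend could read this counter
    campaign["status"] = "active"


def launch_campaign(campaign: dict, send_email):
    """Return (status_code, body, worker) without waiting for the sends to finish."""
    if campaign["status"] != "approved":
        # Assumed error code; the write-up only says unapproved launches are rejected.
        return 409, {"error": "campaign must be approved before launch"}, None
    campaign["status"] = "sending"
    campaign["sent"] = 0
    worker = threading.Thread(target=send_loop, args=(campaign, send_email), daemon=True)
    worker.start()
    return 202, {"status": "sending"}, worker
```

The worker handle is returned here only so a caller (or a test) can wait for it; an HTTP handler would normally just return the 202.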

OSINT Pipeline & Chrome Extension#

9 Attempts, 8 Failures, and the Solution That Actually Worked#

If I had to pick one part of this project that tested my patience more than anything else, it’s this one.

The premise was simple: to generate a truly personalized phishing email per target, the platform needs real professional context — job title, employer, background, recent activity. LinkedIn is the obvious source. Collecting that data programmatically turned out to be anything but simple.

Nine approaches. Eight blocked. One account restricted and nearly banned. Here’s the full story.

Why LinkedIn Data?#

Generic phishing templates use placeholder variables — {{first_name}}, {{department}}. They swap in a name and call it personalization. That’s not what this platform does.

The goal was to feed an LLM the target’s actual professional identity: their real role, their employer, their industry context. The model would then construct a pretext grounded in who that person actually is — not a template with a name dropped in. That’s the difference between an email that could have been sent to anyone and one that feels like it was written specifically for you.

For that to work at scale, LinkedIn data collection needed to be automated.

The 8 Approaches That Failed#

1. linkedin-api Python Library: The first attempt used a Python library that authenticates directly with LinkedIn’s internal API. LinkedIn detected that the login originated from a server rather than a real browser, triggered a security checkpoint, and blocked the account even after verification was completed.

2. Playwright Automated Browser: Playwright drove a real Chromium instance with headless=False. LinkedIn’s bot detection identified the automated browser and returned a blank page that never finished loading. On the rare occasion it partially loaded, a card verification prompt appeared.

3. Scrapling StealthyFetcher: A library built specifically for bypassing anti-bot mechanisms with browser fingerprint spoofing. LinkedIn redirected every request to /authwall. The solve_cloudflare=True flag did nothing. No profile content was returned.

4. StealthySession + user_data_dir: An attempt to persist a full browser profile to avoid repeated logins. The browser failed to launch correctly in the required configuration, and when it did launch, LinkedIn requested manual verification and rejected the saved profile on the next run.

5. Manual Cookie Injection (li_at + JSESSIONID): Cookies were manually extracted from a real authenticated session and injected into the scraper. This one almost worked: LinkedIn’s Voyager API returned valid responses initially. But the returned data had null fields, suggesting LinkedIn serves limited data to sessions it considers suspicious. The session expired in under 24 hours, and the account was restricted pending government ID verification.

6. LinkedIn Voyager API + Fresh Cookies: Direct calls to LinkedIn’s internal Voyager API with session cookies failed with a TooManyRedirects error as soon as the cookies expired, cycling through an /authwall redirect loop. Even with fresh cookies, the data fields came back empty.

7. Patchright: A Playwright fork specializing in bot detection evasion. The channel='chrome' configuration didn’t function correctly on Windows, and when the browser launched, LinkedIn returned the same blank page as the original Playwright attempt.

8. Scrapingdog API: A managed LinkedIn scraping API that handles detection evasion server-side. This one worked, but only for public profiles. Private and premium profiles returned 404s. Experience and education fields were occasionally empty. The free tier was capped at 10 accounts. Not viable at scale.

The Solution That Actually Worked#

The viable approach turned out to be the opposite of automation: a purpose-built Chrome extension that operates inside the user’s own authenticated browser session. Because the extension runs inside a real, logged-in session, LinkedIn doesn’t detect it as automation. There are no credentials to steal, no API keys, no external services. The user is genuinely authenticated — LinkedIn serves the full profile data because as far as it can tell, a real person is browsing.

The extension extracts the profile data from the current LinkedIn page and exports it as a JSON file. That file is then imported into the platform via a bulk upload endpoint.

On the matching side: the platform needs to correlate imported LinkedIn profiles with the targets already uploaded via CSV. The matcher first attempts to match by LinkedIn URL — if the CSV contains a linkedin_url column and the imported profile URL matches, it’s an unambiguous one-to-one link. When no URL is present, it falls back to name-based matching. Matches above the confidence threshold are linked automatically; lower-confidence candidates are surfaced for operator review.
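
The two-stage matching described above (exact URL first, then name similarity with a confidence threshold) can be sketched like this. It is a hedged illustration: the 0.9 threshold, the field names `url` and `name`, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions, since the write-up does not say how confidence is scored. For simplicity the sketch also falls back to names whenever no URL match is found, not only when the URL is missing.

```python
from difflib import SequenceMatcher

AUTO_LINK_THRESHOLD = 0.9  # assumed value; the real threshold is not stated


def normalize_url(url: str) -> str:
    """Lowercase and strip scheme, 'www.', query string and trailing slash."""
    url = url.strip().lower().split("?", 1)[0].rstrip("/")
    for prefix in ("https://", "http://", "www."):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_profile(profile: dict, targets: list) -> dict:
    """Link an imported profile to a CSV target, or flag it for operator review."""
    profile_url = normalize_url(profile.get("url", ""))
    if profile_url:
        for target in targets:
            if normalize_url(target.get("linkedin_url") or "") == profile_url:
                return {"target": target, "confidence": 1.0, "status": "linked"}

    # No URL match: pick the target whose name is most similar.
    best, best_score = None, 0.0
    for target in targets:
        score = name_similarity(profile.get("name", ""), target.get("name", ""))
        if score > best_score:
            best, best_score = target, score

    if best is None:
        return {"target": None, "confidence": 0.0, "status": "unmatched"}
    status = "linked" if best_score >= AUTO_LINK_THRESHOLD else "needs_review"
    return {"target": best, "confidence": best_score, "status": status}
```

A URL match short-circuits with full confidence, so a profile exported as "S. Connor" still links to the CSV row for "Sarah Connor" when the URLs agree.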

This resolved every problem the previous approaches had — no detection, no account risk, complete profile data, and no dependency on any third-party service.

The Chrome extension is now the primary OSINT collection mechanism in the platform. It’s not the elegant automated solution I originally envisioned, but it’s the one that actually works — and in retrospect, it’s more defensible from an ethics standpoint too. The operator is using their own authenticated session, collecting data that’s visible to any logged-in LinkedIn user, not bypassing any access controls.

AI Template Generation#

How the Platform Turns Profile Data into a Personalized Phishing Email#

Most existing phishing simulation platforms do personalization the same way: take a static template, drop in {{first_name}} and {{department}}, send. The content doesn’t change; only the substituted fields do.

RedTeam Phish Suite does something different. It generates a completely new email for each target from scratch.

The Core Idea#

Once the OSINT data is imported and matched to a target, the platform has real professional context to work with — job title, employer, industry, background. That context gets fed into an LLM prompt, and the model constructs a pretext that’s specific to that person: referencing their actual role, their company, their industry language. Not a template with a name dropped in — a unique email written around who that person actually is.

The difference in output is significant. A generic template might say “Dear Sarah, your Microsoft account requires attention.” An OSINT-grounded generation might reference the target’s actual employer, frame a plausible business scenario relevant to their role, and use language that matches their industry. The pretext feels like it came from someone who knows them.

How the Pipeline Works#

When an operator requests a template, the templateService constructs the prompt through branching logic based on what’s available:

OSINT data exists for the target → the prompt is built using their LinkedIn profile: job title, employer, professional background, recent activity
No OSINT data → the operator supplies a manual context string (minimum 15 characters), which the service uses instead

In both cases, the prompt also carries mode-specific instructions. Awareness mode injects a hidden HTML comment marker into the output so the email can be identified as a training exercise after the fact. Red Team mode does the opposite — it explicitly forbids any disclaimers or meta-commentary in the generated content.
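
The branching above can be sketched as a single function that picks a context source and appends the mode instruction. This is a structural illustration only: the function name `build_prompt`, the instruction wording, and the marker text `<!-- awareness-training -->` are placeholders, not the strings templateService actually uses.

```python
MIN_MANUAL_CONTEXT_CHARS = 15  # minimum length stated in the write-up

# Placeholder wording; the real instructions and marker are not shown in the source.
MODE_INSTRUCTIONS = {
    "awareness": "Include the HTML comment <!-- awareness-training --> in the output.",
    "red_team": "Do not include disclaimers or meta-commentary in the output.",
}


def build_prompt(mode: str, osint: dict = None, manual_context: str = None) -> str:
    """Assemble prompt text from OSINT data if present, else from operator context."""
    if mode not in MODE_INSTRUCTIONS:
        raise ValueError(f"unknown mode: {mode}")

    if osint:
        # Branch 1: OSINT exists, so use the non-empty profile fields.
        context = "\n".join(f"{key}: {value}" for key, value in osint.items() if value)
    else:
        # Branch 2: no OSINT, so require a sufficiently long manual context string.
        manual = (manual_context or "").strip()
        if len(manual) < MIN_MANUAL_CONTEXT_CHARS:
            raise ValueError("manual context must be at least 15 characters")
        context = manual

    return f"Target context:\n{context}\n\n{MODE_INSTRUCTIONS[mode]}"
```

Rejecting short manual context up front keeps the no-OSINT path from producing a prompt with nothing to ground the pretext in.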

The prompt also includes strict HTML formatting guardrails: container width, padding, font size, button dimensions. Email client rendering is inconsistent enough that leaving layout to the model produces unpredictable results, so the structure is enforced at the prompt level.

CTA Handling and Output Parsing#

Custom URLs go in verbatim. Preset landing pages — Microsoft 365, Google, LinkedIn, PayPal — get a {{TRACKING_LINK}} placeholder that the send loop replaces per target at delivery time.

The LLM response goes through parseAiOutput(), which cleans the payload and infers the CTA label from brand cues — a Microsoft 365 preset yields “Verify My Account” automatically. If the LLM call fails entirely, two deterministic fallback paths generate plausible HTML templates without AI, so the pipeline never blocks.
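
As a rough illustration of this stage, the sketch below cleans a model response, resolves the CTA, and falls back to a fixed template if the model call raises. It is not the real parseAiOutput(): the helper names, the default label, the preset key `microsoft365`, and the single fallback path are assumptions (the platform has two fallback paths, which are not described in detail). Only the "Verify My Account" label and the {{TRACKING_LINK}} placeholder come from the text.

```python
FENCE = "`" * 3  # markdown code fence that models often wrap HTML in

CTA_LABELS = {"microsoft365": "Verify My Account"}  # other presets omitted
DEFAULT_CTA_LABEL = "Continue"  # assumed default


def clean_llm_html(raw: str) -> str:
    """Strip surrounding whitespace and an optional markdown fence with language tag."""
    text = raw.strip()
    if text.startswith(FENCE) and text.endswith(FENCE) and len(text) >= 2 * len(FENCE):
        body = text[len(FENCE):-len(FENCE)]
        first_line, sep, rest = body.partition("\n")
        if sep and (first_line.strip() == "" or first_line.strip().isalnum()):
            body = rest  # drop the "html" tag line (or an empty first line)
        text = body.strip()
    return text


def resolve_cta(preset: str, custom_url: str = None) -> dict:
    """Custom URLs are used verbatim; presets get the per-target placeholder."""
    if custom_url:
        return {"href": custom_url, "label": DEFAULT_CTA_LABEL}
    return {"href": "{{TRACKING_LINK}}", "label": CTA_LABELS.get(preset, DEFAULT_CTA_LABEL)}


def generate_template(call_llm, preset: str, custom_url: str = None) -> dict:
    """Use the model output when available, else a deterministic fallback."""
    cta = resolve_cta(preset, custom_url)
    try:
        html = clean_llm_html(call_llm())
    except Exception:
        # Deterministic fallback so a failed model call never blocks the pipeline.
        html = f'<p>Please review the item below.</p><a href="{cta["href"]}">{cta["label"]}</a>'
    return {"html": html, "cta": cta}
```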

The Operator’s Role#

Generation isn’t fire-and-forget. Every template the model produces goes into a review queue. The operator previews the subject line and email body inline, can refine it with additional context (which gets accumulated across iterations), and either approves or discards it. Nothing reaches a target’s inbox without a human having signed off on it.

This approval gate is intentional — it’s the platform’s primary operational safeguard. The AI does the work; the operator takes responsibility for what goes out.

Email Delivery Pipeline#

From Launch to Webhook#

Once templates are approved and the campaign is launched, the platform hands off to a background send loop.

Each email gets two things injected before it goes out: a 1×1 tracking pixel tied to a unique token per target, and a tracked redirect link replacing the {{TRACKING_LINK}} placeholder. The pixel fires on open, the link fires on click — both hit the platform’s tracking endpoints and update the target’s status in real time.
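
A minimal sketch of that injection step, assuming a hypothetical base URL and helper names. The route shapes follow the tracking endpoints listed later in this post; token generation via `secrets.token_urlsafe` is an assumption about how a unique per-target token could be produced.

```python
import secrets

BASE_URL = "https://phish.example.test"  # hypothetical host for the tracking endpoints


def new_tracking_id() -> str:
    """Generate an unguessable per-target token."""
    return secrets.token_urlsafe(16)


def inject_tracking(html: str, tracking_id: str, base_url: str = BASE_URL) -> str:
    """Replace the CTA placeholder and add a 1x1 tracking pixel to the email body."""
    link = f"{base_url}/track/link/{tracking_id}"
    pixel = f'<img src="{base_url}/track/pixel/{tracking_id}" width="1" height="1" alt="">'

    html = html.replace("{{TRACKING_LINK}}", link)
    if "</body>" in html:
        # Put the pixel at the end of the body when the template has one.
        return html.replace("</body>", pixel + "</body>", 1)
    return html + pixel
```

Because the token is generated per target, the same approved template can be sent to many recipients while each open and click still resolves to exactly one person.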

Resend handles delivery and fires webhooks back for every significant event:

| Event | Status Update |
| --- | --- |
| `email.delivered` | `sent` |
| `email.opened` | `opened` |
| `email.clicked` | `clicked` |
| `email.bounced` | `failed` |

Webhook payloads are verified via Svix signature validation before anything gets processed. The match back to the right target happens via tags embedded in the original send — target_id and campaign_id travel with every email.
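
The webhook path can be sketched in Python using only the standard library. The signature check follows Svix's documented manual verification scheme (HMAC-SHA256 over `id.timestamp.body`, base64-encoded, keyed with the decoded part of the secret after `whsec_`); in practice the official Svix library would normally do this and would also reject stale timestamps, which this sketch skips. The payload shape (`data.tags` as a name-to-value mapping) and the in-memory `targets` store are assumptions.

```python
import base64
import hashlib
import hmac
import json

# Mapping taken from the table above.
EVENT_TO_STATUS = {
    "email.delivered": "sent",
    "email.opened": "opened",
    "email.clicked": "clicked",
    "email.bounced": "failed",
}


def sign(secret: str, msg_id: str, timestamp: str, body: bytes) -> str:
    """Compute the base64 HMAC-SHA256 signature for one webhook message."""
    raw = secret[len("whsec_"):] if secret.startswith("whsec_") else secret
    key = base64.b64decode(raw)
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    return base64.b64encode(hmac.new(key, signed_content, hashlib.sha256).digest()).decode()


def verify_signature(secret: str, headers: dict, body: bytes) -> bool:
    """Check the body against every 'v1,<sig>' entry in the svix-signature header."""
    expected = sign(secret, headers.get("svix-id", ""), headers.get("svix-timestamp", ""), body)
    for candidate in headers.get("svix-signature", "").split():
        version, _, signature = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(signature, expected):
            return True
    return False


def handle_webhook(secret: str, headers: dict, body: bytes, targets: dict) -> int:
    """Verify, then update the target identified by the tags on the original send."""
    if not verify_signature(secret, headers, body):
        return 401
    event = json.loads(body)
    status = EVENT_TO_STATUS.get(event.get("type"))
    tags = event.get("data", {}).get("tags", {})  # assumed payload shape
    key = (tags.get("campaign_id"), tags.get("target_id"))
    if status is not None and key in targets:
        targets[key]["status"] = status
    return 200
```

Unknown event types and unknown targets are acknowledged with 200 but otherwise ignored, so a retried or unrelated event does not corrupt campaign state.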

The frontend polls for status updates while the send loop runs. Once all emails are out, the campaign flips to active and stays there until the operator stops it or all interactions are complete.

Tracking System#

Pixel Tracking, Link Tracking & Credential Submission#

The platform tracks four interaction types per target: open, click, POC view, and credential submission. Each maps to a distinct endpoint.

Pixel Tracking — Opens#

Every outgoing email contains a 1×1 transparent GIF embedded at send time:

GET /track/pixel/:trackingId

When the email client loads the image, the request hits this endpoint, logs the open event, updates the target status to opened, and records the timestamp. The handler is idempotent: repeated loads of the same pixel don’t create duplicate records.
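
One way to get that idempotent behavior is to record only the first open per tracking ID, as in the sketch below. The in-memory `targets` and `events` stores, the response headers, and the rule that status only advances from sent are assumptions for illustration; the GIF bytes are the widely used 1×1 transparent GIF.

```python
import base64
from datetime import datetime, timezone

# Widely used 1x1 transparent GIF, decoded once at import time.
PIXEL_GIF = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")


def handle_pixel(tracking_id: str, targets: dict, events: list):
    """Record the first open for this tracking ID and always return the pixel."""
    target = targets.get(tracking_id)
    if target is not None and target.get("opened_at") is None:
        target["opened_at"] = datetime.now(timezone.utc)
        if target.get("status") == "sent":
            # Assumed guard: never downgrade a target that has already clicked.
            target["status"] = "opened"
        events.append(("open", tracking_id))
    # Unknown IDs still get the image, so the response reveals nothing.
    return 200, {"Content-Type": "image/gif", "Cache-Control": "no-store"}, PIXEL_GIF
```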

Link Tracking — Clicks#

Every CTA link in the email resolves to a tracked redirect:

GET /track/link/:trackingId

The endpoint logs the click, updates status to clicked, then issues a 302 redirect to the actual landing page. The tracking URL is only visible for the instant before the redirect, so from the target’s perspective, they clicked a normal link.
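
A minimal sketch of the click handler, assuming an in-memory `targets` store with a hypothetical `landing_url` field per target and a 404 for unknown tokens (the write-up does not say how unknown IDs are handled).

```python
def handle_click(tracking_id: str, targets: dict, events: list):
    """Log the click, mark the target as clicked, and redirect to the landing page."""
    target = targets.get(tracking_id)
    if target is None:
        return 404, {}  # assumed behavior for unknown tracking IDs
    events.append(("click", tracking_id))
    target["status"] = "clicked"
    # 302 keeps the redirect temporary, so clients do not cache the destination.
    return 302, {"Location": target["landing_url"]}
```

In Awareness Mode the stored destination would simply be the educational page instead of a preset landing page.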

Credential Submission#

When a target submits credentials on the POC landing page, the form posts to:

POST /track/submit

This records the submission event and sets the poc_interacted flag on the target. No real credentials are stored beyond what’s needed for the simulation record. In Awareness mode, the post-click experience skips the landing page entirely and serves an educational page instead.
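
One possible reading of "nothing stored beyond what the simulation needs" is to keep the submitted username and a flag that a password was entered, and to discard the password itself. The sketch below takes that approach; the form field names, the event shape, and the decision to drop the password are assumptions, not a description of the platform's actual storage.

```python
def handle_submit(form: dict, targets: dict, events: list) -> int:
    """Record that credentials were submitted without keeping the password."""
    tracking_id = form.get("tracking_id")
    target = targets.get(tracking_id)
    if target is None:
        return 404  # assumed behavior for unknown tracking IDs
    target["poc_interacted"] = True
    events.append({
        "type": "credential_submission",
        "tracking_id": tracking_id,
        "username": form.get("username", ""),
        # Only the fact that a password was entered is kept, never its value.
        "password_provided": bool(form.get("password")),
    })
    return 200
```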

What the Dashboard Shows#

All four interaction types surface in real time on the analytics dashboard — open rate, click rate, submission rate, per-target interaction matrix. Every number is computed against the underlying database records, so what the operator sees reflects exactly what happened.
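
The headline rates are simple ratios over the per-target records. The sketch below assumes boolean flags named `opened`, `clicked`, and `submitted` on each target and uses the total number of targets as the denominator; the platform may instead divide by delivered emails.

```python
def campaign_metrics(targets: list) -> dict:
    """Compute open, click and submission rates as percentages of all targets."""
    total = len(targets)

    def rate(flag: str) -> float:
        if total == 0:
            return 0.0  # avoid dividing by zero for an empty campaign
        hits = sum(1 for target in targets if target.get(flag))
        return round(100 * hits / total, 1)

    return {
        "open_rate": rate("opened"),
        "click_rate": rate("clicked"),
        "submission_rate": rate("submitted"),
    }
```

For example, four targets with three opens, two clicks, and one submission give rates of 75.0, 50.0, and 25.0 percent.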

Conclusion & Future Work#

Where It Ended Up#

RedTeam Phish Suite set out to close two gaps: the realism gap between how existing simulation platforms work and how modern attackers actually operate, and the forensics gap in audit capability that makes post-engagement reporting harder than it needs to be.

Both were addressed. The platform delivers end-to-end campaign management — from target ingestion and OSINT collection through AI-generated personalized emails, delivery, interaction tracking, and analytics — in a single operator-facing system. The dual-mode design means it serves both red team engagements and awareness training without needing two separate tools.

The LinkedIn OSINT problem took the longest to solve and produced the most unexpected outcome: the Chrome extension approach turned out to be more reliable, more ethical, and harder to detect than any of the automated alternatives. Sometimes the less elegant solution is the right one.

What’s Next#

The platform is a solid foundation, but there’s meaningful work left to do:

Multi-vector support — Email is the only delivery channel right now. Extending to SMS, WhatsApp, and Telegram would reflect how social engineering actually happens across modern communication platforms.

Campaign scheduling — All launches are currently manual and immediate. Scheduling support would allow staggered sends, delayed campaigns, and periodic awareness reminders without operator intervention.

Chrome extension direct API integration — The current workflow requires manual JSON export from the extension and import into the platform. A direct API connection would make this one click instead of a multi-step process.

Interactive post-click training — The Awareness mode landing page is currently static. Replacing it with an interactive module that explains the specific red flags in the email the target just clicked — and why they worked — would significantly increase the educational value of every simulation.

Target difficulty scoring — Aggregating interaction behavior across campaigns over time would let the platform identify repeat clickers and surface high-risk individuals for prioritized training, turning one-off simulations into a continuous risk assessment program.

If you’ve read this far, thanks. This was the most technically involved thing I’ve built, and writing it up helped me understand what I actually learned from it. If you’re working on anything in the phishing simulation or security awareness space and want to compare notes, I’m around.

Baba Yaga's Corner