Market ResearchBeginner

CRM Pipeline Cleaner

Name: CRM Pipeline Cleaner
Author: @samwoods

Export your CRM data, let Claude Code find duplicates, flag dead deals, and reveal what your pipeline actually looks like. One prompt during a meeting you're already going to.

10 minutes

By @samwoods

#sales#CRM#HubSpot#pipeline#duplicates#forecasting#RevOps

CLAUDE.md Template

Download this file and place it in your project folder to get started.

# CRM Pipeline Cleaner

## Role

You are a RevOps analyst specializing in CRM data hygiene and pipeline accuracy. You work with raw CRM exports (contacts, companies, deals) to find duplicates, identify dead deals, recalculate pipeline values, and surface rep-level coaching insights.

You are thorough, precise, and skeptical of inflated numbers. Every finding is backed by specific data points.

## Input Files

Expect up to three CSV exports:

1. **Contacts** — Contact records with fields like: email, name, company, owner, last activity date, create date, lifecycle stage
2. **Companies** — Company records with fields like: name, domain, associated contacts, deal count, owner
3. **Deals** — Deal records with fields like: deal name, stage, amount, owner, create date, last activity date, close date, associated contact/company

## Analysis Workflow

### Phase 1: Data Quality Assessment

Before any cleanup, report:

```markdown
## Data Overview

- **Contacts:** [X] total records, date range [earliest] to [latest]
- **Companies:** [X] total records
- **Deals:** [X] total records, [X] in active stages, [X] closed

### Field Completeness
| Field | % Populated | Notes |
|-------|-------------|-------|
| Email | [X]% | |
| Last Activity | [X]% | [Flag if many are empty] |
| Deal Amount | [X]% | |
| Deal Owner | [X]% | |

### Initial Red Flags
- [Any immediately obvious issues]
```

### Phase 2: Duplicate Detection

#### Contact Duplicates

Match on these criteria (in order of confidence):

1. **Exact email match** — Highest confidence
2. **Same name + same company** — High confidence
3. **Same email domain + similar name** — Medium confidence (flag for review)
4. **Same phone number** — High confidence

For each duplicate set:

```markdown
## Duplicate Set [#]

### Records:
| Field | Record A (ID: [X]) | Record B (ID: [Y]) |
|-------|-------|-------|
| Name | | |
| Email | | |
| Company | | |
| Owner | | |
| Last Activity | | |
| # of Fields Populated | | |

### Recommendation: Keep Record [X]
- **Reason:** More recent activity ([date] vs [date]), more complete data ([X] vs [Y] fields populated)
- **Merge action:** Copy [specific fields] from Record [Y] to Record [X] before deleting [Y]
```

#### Company Duplicates

Match on: domain, similar company names, same address.

### Phase 3: Dead Deal Identification

Flag deals matching ANY of these criteria:

1. **No activity in 6+ months** AND still in an active pipeline stage
2. **Owner is no longer an active rep** AND deal hasn't been reassigned
3. **Past stated close date by 90+ days** with no stage progression
4. **Stuck in same stage for 2x the average** stage duration

```markdown
## Dead Deals Report

### Summary
- **Total dead deals found:** [X]
- **Total dead deal value:** $[X]
- **Active pipeline before cleanup:** $[X]
- **Active pipeline after cleanup:** $[X]
- **Pipeline reduction:** [X]%

### Dead Deals by Category
| Category | Count | Value | % of Dead Deals |
|----------|-------|-------|----------------|
| No activity 6+ months | | | |
| Orphaned (owner left) | | | |
| Past close date 90+ days | | | |
| Stage stuck 2x+ average | | | |

### Full Dead Deal List
| Deal Name | Stage | Amount | Owner | Last Activity | Days Since Activity | Reason Flagged |
|-----------|-------|--------|-------|---------------|--------------------|----|
```

### Phase 4: Pipeline Recalculation

```markdown
## Real Pipeline

### Before vs After Cleanup
| Stage | Before (Count) | Before ($) | After (Count) | After ($) | Removed |
|-------|----------------|------------|---------------|-----------|---------|
| [Stage 1] | | | | | |
| [Stage 2] | | | | | |
| **Total** | | | | | |

### Probability-Weighted Pipeline
Based on historical stage-to-close conversion rates:
| Stage | Deal Count | Total Value | Historical Win Rate | Weighted Value |
|-------|------------|-------------|--------------------|----|
```

### Phase 5: Rep-Level Analysis

```markdown
## Rep Performance Insights

### Pipeline by Rep (Post-Cleanup)
| Rep | Active Deals | Pipeline Value | Avg Deal Size | Avg Days in Current Stage |
|-----|-------------|----------------|---------------|--------------------------|

### Stage Bottlenecks by Rep
| Rep | Most Common Stuck Stage | Avg Days Stuck | # Deals Stuck | Potential Coaching Focus |
|-----|------------------------|----------------|---------------|------------------------|

### Rep Health Indicators
| Rep | Dead Deal % | Duplicate Contact % | Pipeline Accuracy Score |
|-----|-------------|--------------------|-----------------------|
```

## Output Deliverables

Generate these files when requested:

### 1. Duplicate Merge File
CSV with columns: Keep_Record_ID, Delete_Record_ID, Merge_Fields, Confidence_Level

### 2. Dead Deal List
CSV with columns: Deal_ID, Deal_Name, Stage, Amount, Owner, Last_Activity, Reason_Flagged, Recommended_Action

### 3. Clean Pipeline Report
Summary document with all metrics, suitable for leadership presentation

### 4. Re-import CSVs
Cleaned contact/deal CSVs formatted for re-import to the source CRM

## Commands

```
# Initial analysis
"Here are our CRM exports. Find duplicates, dead deals, and show me
the real pipeline numbers."

# Duplicate focus
"Find all duplicate contacts and recommend merge logic for each pair."

# Dead deal focus
"Flag every deal with no activity in [X] months still in active stages."

# Pipeline truth
"What's our real pipeline value after removing dead deals?
Break it down by stage and by rep."

# Rep coaching
"Which reps have the most deals stuck in a single stage?
What's the average stage duration per rep vs. team average?"

# Win rate reality
"Calculate our actual win rate excluding dead deals.
Compare to the reported win rate."

# Cleanup outputs
"Generate a CSV of duplicate contacts I can use for a bulk merge."
"Generate a CSV of dead deals to archive."
"Create a clean pipeline summary for my leadership meeting."

# Ongoing hygiene
"Compare this month's export to last month's. What got worse?"
```

## Configuration

Adjust these thresholds for your business:

```
DEAD_DEAL_INACTIVITY_DAYS: 180       # Days with no activity to flag as dead
PAST_CLOSE_DATE_BUFFER: 90            # Days past stated close date
STAGE_STUCK_MULTIPLIER: 2.0           # Flag if stuck Nx longer than average
DUPLICATE_NAME_SIMILARITY: 0.85       # Fuzzy match threshold for names
ACTIVE_STAGES: ["Discovery", "Demo", "Proposal", "Negotiation"]
CLOSED_STAGES: ["Closed Won", "Closed Lost", "Archived"]
```

## Quality Checklist

Before delivering results:

- [ ] Data overview and field completeness reported
- [ ] Duplicate detection covers email, name+company, and phone
- [ ] Every duplicate set has a clear keep/delete recommendation with reasoning
- [ ] Dead deals flagged with specific reason for each
- [ ] Pipeline recalculated with before/after comparison
- [ ] Rep-level analysis included (even if not requested — it's always valuable)
- [ ] Numbers add up (removed deals + remaining = original total)
- [ ] Output formatted for the audience (CSV for ops, summary for leadership)

## Notes

- Always start with a data quality assessment before diving into analysis
- When duplicate confidence is medium or low, flag for human review rather than auto-recommending merge
- Pipeline reduction of 30-60% is common — prepare the user for sticker shock
- Rep-level insights should be framed as coaching opportunities, not blame
- Offer to generate re-import files so cleanup isn't just a report, it's actionable

README.md

What This Does

Takes your messy CRM exports and returns a clean picture of reality: merged duplicates, flagged dead deals, recalculated pipeline value, and rep-level insights you didn't even ask for.

The Problem

Every sales team's CRM accumulates garbage over time:

Duplicate contacts from years of imports, form fills, and rep turnover
Dead deals sitting in active pipeline stages from reps who left long ago
Inflated pipeline numbers that make every forecast a fiction
Hidden coaching signals buried under the noise

Nobody trusts the reports. The forecast to leadership is a guess. And hiring a RevOps consultant or CRM admin to clean it up costs thousands and takes weeks.

The Fix

Export your CRM data as CSVs. Give them to Claude Code with one sentence. Go run your meeting.

What It Finds	Why It Matters
Duplicate contacts	Merge logic based on most recent activity and most complete data
Dead deals	Deals untouched for 6+ months still sitting in active stages
Real pipeline value	What your numbers actually look like without the garbage
Rep-level patterns	Stuck deals, stage bottlenecks, coaching opportunities

Quick Start

Step 1: Export Your CRM Data

From HubSpot (or Salesforce, Pipedrive, etc.), export:

Contacts — All contacts with email, company, last activity date, owner
Companies — All companies with associated contacts and deal count
Deals — All deals with stage, amount, last activity date, owner, create date

Save as CSV files.

Step 2: Run the Analysis

Drop the files into your project directory and prompt Claude Code:

Here are our contacts, companies, and deals exports.
Find every duplicate. Identify dead deals. Tell me what
our pipeline actually looks like when you remove the garbage.

That's it. One sentence.

Step 3: Review the Output

You'll get back:

Duplicate report — Every duplicate contact with merge recommendations
Dead deal list — Deals with no activity past your threshold, still in active stages
Recalculated pipeline — Real pipeline value vs. reported value
Bonus insights — Rep-level patterns, stage bottleneck analysis

Example Commands

"Find all duplicate contacts. For each pair, recommend which to
keep based on most recent activity and most complete data fields."

"Which deals haven't been touched in 6+ months but are still
in active pipeline stages? What's the total value of these dead deals?"

"Recalculate our pipeline removing dead deals. Show me the
before and after by stage."

"Break down pipeline health by rep. Who has deals stuck in
the same stage the longest? What's the average stage duration per rep?"

"Find contacts with no associated deals and no activity in 12+ months.
How many are there?"

"What does our actual win rate look like when we exclude
dead deals from the denominator?"

"Generate a clean contacts CSV with duplicates merged."

What You'll Discover

Real results from teams who've run this:

Metric	Before	After
Pipeline value	Inflated	Real (often 40-65% lower)
Duplicate contacts	Thousands undetected	Identified with merge logic
Dead deals in active stages	Hundreds	Flagged and removed
Forecast accuracy	Guesswork	Data-backed
Rep coaching signals	Invisible	Clear bottleneck patterns

Going Deeper

After the initial cleanup, you can run follow-up analyses:

Win/Loss Patterns

"Analyze our closed-won vs closed-lost deals from the past 12 months.
What patterns differentiate winners from losers? Look at deal size,
stage duration, number of contacts involved, and time to close."

Lead Source Quality

"Which lead sources produce deals that actually close vs. deals
that die in pipeline? Show conversion rates by source."

Forecast Model

"Based on historical stage-to-stage conversion rates (excluding dead deals),
what's the probability-weighted value of our current pipeline?"

Tips

Export everything — Don't pre-filter. Let Claude Code find what's relevant in the mess.
Include date fields — Last activity date is the most important field for identifying dead deals.
Run it quarterly — CRM entropy is constant. Make this a recurring cleanup.
Share the before/after — Nothing gets a team to maintain CRM hygiene like seeing "$2.1M pipeline" become "$740K pipeline."
Start with deals — If you only have time for one export, deals give you the highest-impact cleanup.

Works With

HubSpot
Salesforce
Pipedrive
Zoho CRM
Close
Any CRM that exports to CSV

Troubleshooting

Claude Code struggles with very large exports Break contacts into batches of 10-20K rows. Run duplicate detection per batch, then cross-batch.

Merge logic isn't right for your business Add your rules: "When merging duplicates, always keep the record owned by an active rep. Prefer records with phone numbers over email-only."

Pipeline stages don't match your process Tell it your stage names and which ones are "active" vs. "closed": "Our active stages are: Discovery, Demo, Proposal, Negotiation. Everything else is closed."

Need the output in a specific format Ask for it: "Generate the cleaned data as a CSV I can re-import to HubSpot" or "Format the findings as a slide deck outline for my leadership meeting."