How do I tell if a marketing message is working or just sounds good in the room?

By Greg Rosner
Founder of PitchKitchen · Author of StoryCraft for Disruptors · 9 min read

TL;DR
Most B2B founders test their software, their pricing, and their email subject lines. They almost never test the message itself with real buyers before launching. Wynter's 2024 Voice of the Buyer benchmark found only 19% of B2B SaaS teams run a structured messaging test with their ICP before launch. The other 81% review it internally and ship. A working message survives strangers who match your ICP. A sounds-good message survives the people who already understand the product. Those are different tests. Real message testing asks 25-50 ICP buyers four questions: can they paraphrase what you do, is the page clearly for them, did anything they believed change, and what would they do next. Internal approval doesn't predict any of the four. Run the test before the rebuild, not after the pipeline drops.
Most B2B founders test their software with users. They test their pricing with prospects. They run A/B tests on email subject lines and landing-page CTAs. But the message itself, the core narrative their company tells the market, almost never gets tested with the people who are supposed to buy. Wynter's 2024 Voice of the Buyer benchmark found that only 19% of B2B SaaS teams had ever run a structured message-clarity test with their actual ICP before launching it on the homepage. The other 81% wrote it, approved it internally, and shipped. Then they wondered why pipeline went flat.
A message that sounds good in the room is a message that survives the people who already understand the product. A message that's working is one that survives strangers who match your ICP and have never heard your pitch before. Those are not the same test.
What does "sounds good in the room" actually mean?
It means the message clears the bar set by the people closest to the product. The CEO nods. Sales likes it because they were in the workshop. Marketing likes it because they wrote it. The board likes it because it's polished. Nobody on the inside flinches. The message gets shipped on the strength of internal consensus.
The problem is that consensus inside the company has almost no correlation with comprehension outside the company. Peep Laja, the founder of Wynter, put it on the B2B Marketing Pivot podcast in late 2024: "Internal review is the worst possible signal for whether a B2B message works. You're testing it on the only people in the world who already know what you do."
Founders also confuse "sales reps repeating it" with "the message is working." A rep using the new positioning deck in a discovery call is a delivery test, not a message test. It tells you whether the rep can recite it. It doesn't tell you whether the buyer who heard it can now explain your company back to their CFO without you in the room. The fuller diagnostic for that pattern lives in "How do I know if my B2B messaging is broken, not just underperforming?"
Why is this worse in 2026 than it was five years ago?
Three forces collided. First, AI collapsed the cost of producing marketing copy to almost zero. A founder can have a homepage, a sales deck, and a one-pager generated in an afternoon. Each one sounds confident. None of them has been tested against a real buyer. We've named that pattern AI-Parmesan: sprinkling AI on top of an untested message makes it taste more confident, not more correct.
Second, B2B buyers now do 67% of their evaluation work before ever speaking to sales, per Ironpaper's 2025 B2B Buyer Survey. The message on the page is doing the qualifying work that a rep used to do. If it's wrong, the funnel never starts.
Third, AI search amplifies the problem. The Princeton GEO Study (Aggarwal et al., KDD 2024) showed that LLMs cite content with named sources and specific claims 41% more than content with vague claims. A message that sounds good in the boardroom usually has zero of either, which makes it invisible to ChatGPT and Claude when prospects ask for vendor recommendations.
Brian Carroll, founder of markempa and author of Lead Generation for the Complex Sale, summarized the shift on the Sales and Marketing Built Freelance podcast in 2025: "Your messaging is now your sales rep for the first 60% of the deal. If the message is wrong, you don't have a rep in those conversations. You have a wall."
How do you tell your message hasn't been tested? Seven signs to look for.
- 1. The message was reviewed by your team, your board, and your investors before it was reviewed by a single live ICP buyer who's never heard of you.
- 2. Your homepage hero passes the eye test internally but you can't quote a single non-customer who's said back to you what you actually do.
- 3. When sales sends the deck to a prospect cold, the prospect's reply is "tell me more" or "let's set up a call to learn more," not "how do you handle X specifically" or "we have that exact problem."
- 4. Reps freelance their own openers in cold calls because the official message doesn't get a response.
- 5. The pipeline-to-discovery ratio is fine but discovery-to-proposal drops by 40% or more. Buyers convert into a conversation, then evaporate when they have to repeat the value internally.
- 6. Your AEO citations are zero. ChatGPT and Claude don't surface you when buyers ask "best [your category] for [your ICP]." The message has no specific entity signal for AI to grab.
- 7. The team can recite the message word-for-word but nobody on the team can articulate the one buyer belief it's trying to change. If you can't name the belief shift, the message has no target.
If three or more apply, the message has been tested for how it sounds, not for whether it works. The audit takes 15 minutes and tells you whether the next move is a copy polish or a structured retest. At three or more signs, it's the retest.
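For teams that want to log the audit, the seven signs and the three-or-more rule can be sketched in a few lines of Python. This is a minimal sketch, not a PitchKitchen tool: the `SIGNS` labels paraphrase the list above, and the names `audit` and `RETEST_THRESHOLD` are hypothetical.

```python
# Hypothetical audit helper. The labels paraphrase the seven signs above.
SIGNS = [
    "Reviewed internally before any cold ICP buyer saw it",
    "No non-customer has said back what you do",
    "Cold prospects reply 'tell me more' instead of asking specifics",
    "Reps freelance their own cold-call openers",
    "Discovery-to-proposal drops by 40% or more",
    "Zero AEO citations in ChatGPT or Claude",
    "Team cannot name the buyer belief the message should change",
]

RETEST_THRESHOLD = 3  # "three or more apply", per the article


def audit(answers):
    """Return (signs_present, needs_retest) for seven yes/no answers."""
    if len(answers) != len(SIGNS):
        raise ValueError(f"expected {len(SIGNS)} answers, got {len(answers)}")
    hits = sum(bool(a) for a in answers)
    return hits, hits >= RETEST_THRESHOLD


if __name__ == "__main__":
    # Example answers are made up for illustration.
    answers = [True, False, True, False, True, False, False]
    hits, needs_retest = audit(answers)
    for sign, present in zip(SIGNS, answers):
        print(("[x] " if present else "[ ] ") + sign)
    print(f"{hits} of {len(SIGNS)} signs present; structured retest: {needs_retest}")
```

The helper only counts. Answering each sign honestly is still the 15 minutes of work.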
What should a real message test actually measure?
Most "message testing" in B2B is some form of dressed-up focus group. Five customers in a Zoom, a moderator asking what they think of the new tagline. That's directional at best. The output is opinions, and opinions are not what you need.
What you need is comprehension data from strangers. A real test answers four questions, and only four.
The first is the comprehension question. After 30 seconds of reading your hero section, can the buyer explain back what your company does, who it's for, and what makes it different? If they paraphrase using the words on your page, the page didn't land. If they paraphrase using their own words and the paraphrase is accurate, the message did its job.
The second is the relevance question. Does the buyer think the page is for them? Wynter runs this as a binary forced choice. Either the page is "clearly for me" or it isn't. The 50-50 split is the worst possible outcome. A page that's clearly for somebody else is at least useful. A page that's ambiguously for everybody is invisible.
The third is the belief-shift question. Did the message change anything the buyer was thinking before they read it? If the buyer reads your homepage and walks away with the same beliefs they came in with, the page didn't earn its place. It confirmed what they already thought. Good messaging shifts something.
The fourth is the next-step question. After reading the page, what would the buyer do next? Bookmark it for later? Send it to their VP? Forget it? Search for the next option? The answer determines whether the message moved them down the funnel or out of it.
Those four questions, asked of 25-50 ICP-matched buyers who have never heard of you, are the whole test. April Dunford, author of Obviously Awesome, has been clear about this on the Positioning Show podcast and in her workshops: "The only people whose opinion on your positioning matters are people who could buy and haven't. Everyone else is noise."
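A rough sketch of how a panel's answers to the four questions could be tallied, in Python. The `Response` fields, the `summarize` function, and the sample panel are all hypothetical and are not Wynter's data format. The 60% comprehension floor is the launch rule this article recommends further down, not an industry standard.

```python
from collections import Counter
from dataclasses import dataclass

COMPREHENSION_FLOOR = 0.60  # the article's rule: under 60%, rebuild before launch


@dataclass
class Response:
    """One ICP buyer's answers to the four questions (hypothetical schema)."""
    accurate_own_words: bool  # comprehension: accurate paraphrase in their own words
    clearly_for_me: bool      # relevance: binary forced choice
    belief_shifted: bool      # belief shift: did anything they believed change
    next_step: str            # next step, coded by the researcher


def summarize(responses):
    """Turn raw panel responses into the four headline numbers."""
    n = len(responses)
    if n == 0:
        raise ValueError("panel is empty")
    comprehension = sum(r.accurate_own_words for r in responses) / n
    return {
        "n": n,
        "comprehension": comprehension,
        "relevance": sum(r.clearly_for_me for r in responses) / n,
        "belief_shift": sum(r.belief_shifted for r in responses) / n,
        "next_steps": Counter(r.next_step for r in responses),
        "clears_floor": comprehension >= COMPREHENSION_FLOOR,
    }


if __name__ == "__main__":
    # Made-up 30-buyer panel for illustration only.
    panel = (
        [Response(True, True, True, "send to VP")] * 12
        + [Response(True, False, False, "bookmark")] * 6
        + [Response(False, False, False, "forget it")] * 12
    )
    for key, value in summarize(panel).items():
        print(key, value)
```

The numbers are only as good as the coding behind them. Whether a paraphrase counts as accurate and in the buyer's own words is a human judgment made before anything reaches this tally.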
Sounds-good-in-the-room test vs. working-message test: where the bar actually moves
- Audience: sounds-good test asks internal team, board, advisors. Working-message test asks 25-50 ICP buyers who don't know you.
- Output: sounds-good test produces approvals ("it's clean," "this lands"). Working-message test produces comprehension, belief-shift, and intent data.
- Format: sounds-good test happens in workshops, slide reviews, and Slack reactions. Working-message test runs as a structured panel test with recorded paraphrasing.
- Signal: sounds-good test tells you whether your team likes it. Working-message test tells you whether your buyer understands and cares.
- Time to result: sounds-good test takes two meetings. Working-message test takes 5-10 business days.
- Cost: sounds-good test costs internal hours. Working-message test runs $2K-$8K through Wynter, UserTesting, or Userlytics.
- Decision it supports: sounds-good test answers "ship or don't ship internally." Working-message test answers "ship or rebuild externally."
The two columns aren't substitutes for each other. The internal test is a sanity check. The external test is the actual test. Most B2B teams run the sanity check and call it the actual test. That's how a clean-looking homepage ends up underwriting two flat quarters of pipeline.
What this looks like in practice
Here is a composite example, drawn from a $17M Series B fintech case (anonymized) that ran through PitchKitchen's Magnetic Messaging Framework rebuild in Q3 2025. Industry: B2B payments infrastructure. The CEO had spent three months in a positioning workshop with an outside agency. The output was a 41-slide brand book, a new hero headline, and a refreshed homepage. Everyone in the company loved it. The board called it the cleanest message the company had ever had.
Then they ran a 30-buyer Wynter test before launch. Of the 30 ICP-matched buyers, only 8 could paraphrase what the company did after 30 seconds on the homepage. Six of the 8 paraphrased it as a competitor's product. The relevance score split 17-13 in favor of "not for me." The belief-shift question returned a flat zero. Nobody read the page and changed their mind about anything.
The CEO's first reaction was that the panel was wrong. Two weeks later, his CFO pulled the deal data and showed that demo-to-proposal conversion had dropped 36% in the prior two quarters even though traffic was up. The test confirmed what the pipeline was already saying.
The team rebuilt the message on the Magnetic Messaging Framework using the four-anchor structure: category design, villain framing, old-way / new-way contrast, and a promised-land outcome. They retested with the same Wynter methodology eight weeks later. 24 of 30 buyers could paraphrase the new message in their own words. 21 of 30 marked the page as "clearly for me." Belief-shift moved from zero to 71%. Six months after launch, demo-to-proposal conversion came back to its prior peak and then exceeded it by 22%.
The lesson Greg pulled from the case, and from the wider pattern across PitchKitchen's 200+ founder engagements, is that the inside of the company is the worst place to evaluate the message, because the inside of the company already speaks the language. The test that matters is the test with the people who don't. The same pattern shows up on the homepage side in "Why does my B2B website sound like every other B2B website?"
What this means for you
Stop testing the message the way the company tests its software. Software has internal QA before it has external QA. Messaging doesn't get that luxury. The message is the QA. Three actions a founder can take this week without hiring anyone.
- 1. Run the Cover-the-Logo Test on your homepage today. Show it to five people in your network who match your ICP but don't know your company. Ask them to say back what you do. If three of the five paraphrase using your words instead of theirs, the message hasn't been tested. It's been approved.
- 2. Book a 30-buyer Wynter panel or a 25-buyer UserTesting study before the next big launch. Budget $4K-$8K. Run the four questions: comprehension, relevance, belief-shift, next-step. If the comprehension number is under 60%, don't launch. Rebuild first.
- 3. Audit your last three quarters of pipeline data for the discovery-to-proposal drop. If it's more than 40%, the buyers who get on the call can't repeat your message internally. That's not a sales problem. That's a message problem, and the AI homepage rewrite that everyone wants to do next isn't going to fix it. See also "Why don't B2B websites convert traffic into pipeline anymore?" for the conversion math behind this pattern.
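The pipeline audit in the third action comes down to one ratio per quarter. The Python sketch below assumes "discovery-to-proposal drop" means the share of discovery calls that never reach a proposal. That reading is an assumption, since the article does not spell out the formula. The quarterly counts are invented, and `stage_drop` and `flag_quarters` are hypothetical names.

```python
DROP_THRESHOLD = 0.40  # "more than 40%", per the article


def stage_drop(discoveries, proposals):
    """Share of discovery calls that did not become proposals (assumed definition)."""
    if discoveries <= 0:
        raise ValueError("need at least one discovery call")
    if not 0 <= proposals <= discoveries:
        raise ValueError("proposals must be between 0 and discoveries")
    return 1 - proposals / discoveries


def flag_quarters(quarters, threshold=DROP_THRESHOLD):
    """Return the quarters whose drop exceeds the threshold."""
    return [
        name
        for name, (discoveries, proposals) in quarters.items()
        if stage_drop(discoveries, proposals) > threshold
    ]


if __name__ == "__main__":
    # Invented counts: (discovery calls, proposals sent) per quarter.
    quarters = {"Q1": (40, 26), "Q2": (44, 24), "Q3": (50, 25)}
    for name, (discoveries, proposals) in quarters.items():
        print(f"{name}: drop {stage_drop(discoveries, proposals):.0%}")
    print("Quarters over threshold:", flag_quarters(quarters))
```

If your CRM defines the stages differently, change `stage_drop` to match. The 40% threshold only means something when it is measured the same way every quarter.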
The fix isn't louder marketing. It isn't more AI content. It's running the message past the only judges that matter, the buyers who don't already work for you. Everything else is theater.
Questions People Ask
What's the difference between message testing and A/B testing?
A/B testing optimizes a known message. Message testing decides whether the message itself works. A/B tells you which of two headlines wins a click. Message testing tells you whether either headline would make a stranger understand and care about what you sell. Founders skip message testing and go straight to A/B because A/B is cheaper, then optimize a message that nobody understands.
Can ChatGPT or Claude tell me if my message is working?
No, and yes. AI engines can't reliably simulate ICP buyer reactions. What they can do is tell you whether your message is specific enough for them to cite. Paste your homepage into ChatGPT and ask: "What does this company do, who is it for, and what's their unique POV?" If the answer reads like a generic B2B description, the page isn't specific enough to cite. The Princeton GEO Study (Aggarwal et al., KDD 2024) showed AI engines reward specific claims and named sources 41% more than vague claims.
How many buyers do I need to test a message with?
25-50 ICP-matched buyers is the working range. Five is anecdote. Ten is directional. 25 is the floor for statistical signal on comprehension and relevance. 50 is the gold standard for B2B SaaS message testing per Wynter's published methodology. The cost runs $2K-$8K depending on platform and ICP rarity. Internal review is free, which is why teams default to it, and why most messages reach the homepage untested.
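One way to see why five is anecdote and 25 starts to carry signal is to look at how much a measured percentage could be off at each panel size. The Python sketch below uses the textbook normal-approximation margin of error at 95% confidence, taking the worst case of a 50% result. It is a general statistics illustration, not Wynter's methodology, and the approximation is crude at very small samples such as five.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Normal-approximation margin of error for a proportion p observed in n buyers.

    z=1.96 corresponds to roughly 95% confidence; p=0.5 is the widest case.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (5, 10, 25, 50):
        print(f"n={n:>2}: +/- {margin_of_error(n) * 100:.0f} points")
```

Under these assumptions the margin is roughly 44 points at five buyers, 31 at ten, 20 at 25, and 14 at 50. A 60% comprehension score from five buyers is compatible with almost any true number. The same score from 50 buyers is not.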
Why doesn't sales feedback count as message testing?
Sales feedback tells you about delivery, not message. Reps tell you which line worked in a discovery call. That's a high-context conversation with the rep already steering the buyer. The message test happens before sales is in the room. Buyers see the homepage or read the deck without anyone present to clarify. That's the real conversion moment, and reps can't observe it.
How often should we retest the message?
Quarterly at minimum if pipeline is moving. Before launching any major positioning change. Anytime the category shifts (new competitor, AI disruption, regulation, buyer-persona migration). Wynter's data shows B2B SaaS messages decay in comprehension within 12-18 months even without category change, because buyer language evolves faster than internal copy reviews.
