Enterprise AI Analysis: Deconstructing Lexical Diversity in LLMs vs. Humans
An OwnYourAI.com breakdown of "Do LLMs produce texts with 'human-like' lexical diversity?" by Kendro, Maloney & Jarvis
Executive Summary: The Authenticity Gap in AI-Generated Content
As enterprises increasingly adopt Large Language Models (LLMs) for content creation, a critical question emerges: does this AI-generated text truly resonate like human writing? Groundbreaking research by Kelly Kendro, Jeffrey Maloney, and Scott Jarvis provides a definitive, data-driven answer. Their study meticulously compares texts from four different ChatGPT models against a diverse group of 240 human writers, analyzing six core dimensions of lexical diversity: the variety and richness of vocabulary.
The core finding is both simple and profound: LLMs do not write like humans. In fact, they exhibit significantly higher levels of lexical diversity, a phenomenon we term "hyper-diversity." This makes their text measurably distinct and, paradoxically, less authentic. The study reveals that newer, more advanced LLMs are even *less* human-like in this regard. Furthermore, machine learning models can distinguish between human and AI-authored text with over 97% accuracy based on these lexical patterns alone.
For businesses, this "authenticity gap" poses a significant risk. Content that feels unnatural or "robotic" can damage brand credibility, reduce customer engagement, and ultimately impact ROI. This analysis from OwnYourAI.com unpacks the study's findings, translates them into actionable enterprise strategies, and demonstrates how custom AI solutions are essential to bridge this gap, ensuring your AI speaks with a voice that is both powerful and genuinely human.
Decoding Lexical Diversity: The Six Dimensions of Written Authenticity
The study's power lies in its multi-faceted approach to measuring what makes writing feel diverse. It moves beyond simple word counts to capture the subtle statistical fingerprints of language. Understanding these six dimensions is key to diagnosing and tuning AI-generated content for enterprise use.
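As a concrete starting point, the most basic of these diversity measures is the type-token ratio (TTR): the number of unique words (types) divided by the total number of words (tokens). The sketch below is purely illustrative, not the study's code, and uses naive whitespace tokenization:

```python
def type_token_ratio(text: str) -> float:
    """Unique words (types) divided by total words (tokens).

    Illustrative only: real lexical-diversity research uses proper
    tokenization and lemmatization, not a simple lowercase split.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# 10 tokens, 7 unique types ("the" x3, "cat" x2)
print(type_token_ratio("the cat sat on the mat and the cat slept"))  # → 0.7
```

A higher ratio means more unique vocabulary per word of text; raw TTR is length-sensitive, which is why the study relies on more robust measures such as MATTR, discussed below.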
The Core Finding: A Quantifiable "Hyper-Diversity" in LLMs
The most striking result from the research is the clear, statistically significant gap between LLM and human writing across all six lexical dimensions. LLMs consistently produced text that was longer, used more unique words, and repeated words in patterns measurably different from the 240 human participants. This isn't a subtle difference; it's a fundamental architectural signature of current LLMs.
The chart below visualizes one of the most telling metrics: Dispersion. This measures how far apart repeated words are. Humans tend to reuse words in clusters, while LLMs spread them out more evenly. The study uses an inverted scale, so a lower score for LLMs indicates higher, less-human-like dispersion.
Metric Deep Dive: Dispersion (Lower is Less Human-like)
This "hyper-diverse" and overly-even distribution of words is a key reason why LLM text can feel unnaturally polished or lack a personal rhythm. It avoids repetition in a way that humans, even skilled writers, do not. For an enterprise, this means that out-of-the-box LLM content may lack the natural cadence that builds trust and rapport with an audience.
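One simple way to make the dispersion idea concrete is to measure the average distance between successive occurrences of the same word. This is a hypothetical simplification of the study's dispersion metric, intended only to show why clustered repetition scores differently from evenly spread repetition:

```python
from collections import defaultdict

def mean_repetition_gap(tokens):
    """Average distance (in tokens) between successive occurrences
    of the same word; higher = repeats are spread further apart.
    A toy stand-in for the study's dispersion measure."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    gaps = [b - a
            for pos in positions.values() if len(pos) > 1
            for a, b in zip(pos, pos[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0

clustered = "red red blue blue green green".split()  # human-like clustering
spread = "red blue green red blue green".split()     # even, LLM-like spread
print(mean_repetition_gap(clustered))  # → 1.0
print(mean_repetition_gap(spread))     # → 3.0
```

The same six words produce very different scores depending purely on where the repeats fall, which is exactly the signal that separates human clustering from LLM hyper-even spacing.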
Not All AIs Are Equal: Newer Models Are Drifting Further from the Human Baseline
Counter-intuitively, the study found that as LLMs become more powerful, their lexical patterns diverge even further from human norms. The newest models analyzed (GPT-4.5 and o4-mini) exhibited the highest levels of lexical diversity, making them the *least* human-like. This creates a critical "Model Selection Dilemma" for enterprises.
The chart below shows a clear trend across four ChatGPT versions for the MATTR score (variety-repetition). A higher score means more unique words are being introduced more rapidly, a trend moving away from the stable human baseline.
Trend Analysis: Variety-Repetition (MATTR) Across GPT Models
Enterprise Strategy: Simply adopting the latest, most powerful LLM is not a guaranteed path to success. The optimal model depends on the specific use case. For generating creative ideas, high diversity might be a benefit. For writing empathetic customer service responses or authentic marketing copy, an older, or more carefully tuned, model may produce superior, more human-like results. This is where expert guidance is crucial to align model choice with business goals.
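The MATTR metric referenced above can be sketched as a sliding-window type-token ratio: compute the TTR inside a fixed-size window at every position and average the results, which removes raw TTR's sensitivity to text length. This is an illustrative implementation; the study's window size and tokenization may differ:

```python
def mattr(tokens, window=50):
    """Moving-Average Type-Token Ratio: mean TTR over a sliding
    window. Window size 50 is a common default, not necessarily
    the study's setting."""
    if len(tokens) < window:
        window = len(tokens)  # degrade gracefully on short texts
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

tokens = "to be or not to be that is the question".split()
print(round(mattr(tokens, window=5), 3))  # → 0.933
```

Because each window is the same length, a text that keeps introducing new words scores close to 1.0 in every window, which is the "hyper-diverse" pattern the newest LLMs exhibit.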
The Power of Detection: Why Generic AI Poses a Business Risk
The research used a Support Vector Machine (SVM), a type of machine learning classifier, to see if it could distinguish between the texts. The results are a wake-up call for any business using generic AI tools.
The SVM model could:
- Identify LLM vs. Human text with 97.2% accuracy. This means generic AI content is easily detectable, not just by other algorithms (like spam filters or search engine quality raters) but potentially by discerning customers.
- Distinguish between different LLM models with 62.5% accuracy. Each model has its own detectable lexical fingerprint.
- Fail to distinguish between human writers. The model's accuracy in identifying a writer's L1/L2 status or education level was at or below chance. This reinforces the idea of a stable "human baseline" that AI currently fails to replicate.
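The classification setup can be pictured as follows: each text is reduced to a handful of lexical-diversity features, and an SVM learns a boundary between the human and LLM regions of that feature space. The sketch below uses scikit-learn with invented toy numbers; the feature values, labels, and three-feature layout are assumptions for illustration, not the study's data:

```python
# Hypothetical reconstruction of the classification idea: an SVM
# trained on per-text lexical-diversity features. All numbers are
# invented for illustration.
from sklearn.svm import SVC

# Each row: [TTR, MATTR, mean repetition gap] for one text (toy values)
X = [
    [0.55, 0.70, 4.0],  # human-authored texts (label 0)
    [0.58, 0.72, 4.5],
    [0.60, 0.71, 3.8],
    [0.80, 0.92, 9.0],  # LLM-authored texts (label 1): "hyper-diverse"
    [0.82, 0.94, 9.5],
    [0.85, 0.93, 8.8],
]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)
# Two unseen texts: one with human-like features, one hyper-diverse
print(clf.predict([[0.57, 0.71, 4.2], [0.83, 0.93, 9.2]]))  # → [0 1]
```

With a gap this wide between the two clusters, even a linear boundary separates them cleanly, which is the intuition behind the study's 97.2% detection accuracy.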
The Business Value Proposition: If your content is algorithmically identifiable as "AI-generated," you risk being penalized by platforms and perceived as inauthentic by your audience. The goal is not to trick users, but to communicate in a voice that is consistent with your brand. OwnYourAI.com specializes in developing custom AI solutions that are tuned to your specific human baseline, creating content that is not just effective, but authentic.
Book a Meeting to Develop Your Authentic AI Voice
Strategic Roadmap: An Enterprise Framework for Authentic AI Content
Moving from generic AI to a custom, authentic AI voice requires a strategic approach. Based on the insights from this research, we've developed a 5-step framework to guide enterprises.
Interactive ROI Calculator: The Cost of Inauthenticity
Generic AI might seem cheaper upfront, but what is the hidden cost of lower engagement and brand damage? Use this calculator to estimate the potential value of a custom-tuned AI solution that prioritizes authenticity and performance.
Bridge the Authenticity Gap with OwnYourAI.com
The research is clear: off-the-shelf LLMs produce text that is quantifiably different from human writing. To build a brand that connects, you need an AI that speaks your language. Our team of experts uses data-driven insights like those in this paper to build, tune, and deploy custom AI solutions that deliver real business value and true authenticity.
Schedule a Free Consultation to Customize Your AI