Friday, April 24, 2026

Generative AI at Work: What Actually Delivers in 2026

The Spreadsheet That Wrote Itself—And Why That's Only the Beginning

Earlier this year, a mid-sized logistics firm in Rotterdam watched its finance team cut monthly close from eleven days to four. The tool doing the heavy lifting wasn't some bespoke enterprise platform—it was Microsoft 365 Copilot, running on top of GPT-4o, pulling from SharePoint and reconciling ledger entries against live ERP data. That's not a marketing slide. That's a use case their CFO described in a public earnings call in Q2 2026. It caught our attention because it's the kind of specific, boring, operational win that tends to get lost beneath flashier AI demos.

The generative AI productivity space has matured considerably since the chaotic product launches of 2023 and 2024. We're past the phase where "AI assistant" meant a chat window bolted onto existing software. The tooling has gotten genuinely sophisticated—and genuinely complicated to evaluate. We spent several weeks talking to practitioners, reviewing benchmark data, and testing integrations across enterprise stacks to figure out what's actually working, what's oversold, and what the real cost looks like when you get past the free tier.

The Actual State of Enterprise AI Adoption in Late 2026

The numbers are striking, if you read them carefully. According to Gartner's Q3 2026 enterprise survey, 61% of organizations with more than 1,000 employees now have at least one generative AI tool deployed in a production workflow—up from 29% in the same survey two years prior. But here's the number that matters more: only 34% of those deployments had cleared a formal ROI review at the 12-month mark. Adoption is fast. Justification is harder.

OpenAI's enterprise tier for ChatGPT crossed $2.1 billion in annualized revenue as of September 2026, per figures reported by The Information. Microsoft, which embedded Copilot across its M365 suite, has sold Copilot licenses to over 85,000 organizations. But license sales don't tell you whether people are using the tools well. They often aren't.

"Most enterprises are still in what I'd call the 'tourist phase,'" said Dr. Priya Venkataraman, director of AI systems research at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). "They've deployed something, their employees have tried it a few times, and now they're waiting for someone to tell them what to do next. The organizations getting real value are the ones that redesigned the workflow first—and bolted the AI on second."

Which Tools Are Actually Winning—and at What Tasks

We compared the four most widely deployed generative AI productivity platforms across enterprise accounts this fall. The differences are significant, and they matter depending on your use case.

| Platform | Underlying Model | Best-fit Use Case | Context Window | Avg. Enterprise Seat Cost (Annual) |
|---|---|---|---|---|
| Microsoft 365 Copilot | GPT-4o (fine-tuned) | Document generation, email triage, Excel analysis | 128K tokens | $360/user |
| Google Workspace Duet AI | Gemini 1.5 Pro | Meeting summarization, Docs drafting, Sheets formulas | 1M tokens | $264/user |
| Anthropic Claude for Work | Claude 3.7 Sonnet | Long-document analysis, policy review, code review | 200K tokens | $300/user |
| Notion AI (Enterprise) | Mix (GPT-4o + proprietary) | Knowledge base management, project summaries | 32K tokens | $192/user |
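For budgeting purposes, the list prices above translate into very different annual outlays at scale. A rough sketch, using the table's per-seat figures and a hypothetical 500-seat deployment (actual enterprise contracts are negotiated, so treat these as order-of-magnitude numbers):

```python
# Annual-spend comparison at a hypothetical 500-seat deployment,
# using the per-user list prices from the table above.
SEAT_COST = {
    "Microsoft 365 Copilot": 360,
    "Google Workspace Duet AI": 264,
    "Anthropic Claude for Work": 300,
    "Notion AI (Enterprise)": 192,
}

def annual_spend(platform: str, seats: int) -> int:
    """Annual cost in dollars for a given platform and seat count."""
    return SEAT_COST[platform] * seats

for name in SEAT_COST:
    print(f"{name}: ${annual_spend(name, 500):,}")
# Microsoft 365 Copilot: $180,000
# Google Workspace Duet AI: $132,000
# Anthropic Claude for Work: $150,000
# Notion AI (Enterprise): $96,000
```

The spread between the cheapest and most expensive option at 500 seats is $84,000 a year, which is why seat cost alone rarely decides these evaluations: the fit between the tool and the workflow dominates.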

Context window size isn't just a spec-sheet number. For legal teams reviewing contracts or compliance officers auditing policy documents, the ability to pass an entire 300-page document into a single prompt—which Gemini 1.5 Pro genuinely supports—changes what's possible. Marcus Webb, VP of enterprise architecture at Deloitte's AI practice, told us his team has moved several legal review workflows entirely to Claude 3.7 Sonnet because of its handling of long-form reasoning chains. "It doesn't lose the thread," he said. "Earlier models would contradict themselves between page one and page forty of a brief. This one mostly doesn't."
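To see why that 300-page figure lands differently across models, a back-of-envelope check is enough. The sketch below uses the crude heuristic of roughly four characters per token for English text (an assumption; production code would use the provider's tokenizer for an exact count) and the context limits from the comparison table:

```python
# Back-of-envelope check: will a document fit in a model's context window?
# Assumes ~4 characters per token for English text -- a rough heuristic,
# not a real tokenizer.
CONTEXT_WINDOWS = {  # token limits from the comparison table
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
    "Claude 3.7 Sonnet": 200_000,
}

def estimated_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic

def fits_in_context(text: str, model: str, reserve: int = 4_096) -> bool:
    """Leave `reserve` tokens of headroom for the prompt and the reply."""
    return estimated_tokens(text) + reserve <= CONTEXT_WINDOWS[model]

# A 300-page contract at roughly 3,000 characters per page:
contract = "x" * (300 * 3_000)  # ~225K estimated tokens
print(fits_in_context(contract, "GPT-4o"))          # False
print(fits_in_context(contract, "Claude 3.7 Sonnet"))  # False
print(fits_in_context(contract, "Gemini 1.5 Pro"))  # True
```

On these assumptions, a full 300-page document overflows both the 128K and 200K windows and only fits Gemini 1.5 Pro in one shot; the smaller-window models force you into chunking and retrieval strategies, with all the thread-losing risks Webb describes.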

Where Developers Are Finding Real Gains (and Real Friction)

For engineering teams, the conversation has shifted from "should we use AI for code?" to "how do we keep it from making things worse?" GitHub Copilot, now on its fourth major iteration and integrated with both VS Code and the JetBrains IDEs, reports that developers using it merge pull requests roughly 26% faster on benchmarks involving boilerplate-heavy tasks. That number drops significantly for complex refactors or security-sensitive code paths—and that's where things get interesting.

Dr. James Okafor, senior security researcher at Carnegie Mellon's CyLab, has been tracking AI-generated code vulnerabilities since 2024. His team found that in a controlled study of 4,000 code completions generated by popular AI tools, roughly 18% introduced at least one weakness mappable to the CWE Top 25 list—Common Weakness Enumeration, the industry-standard catalog of dangerous software flaws. "The model doesn't know it's writing security-critical code unless you tell it explicitly," Okafor said. "And even then, it'll sometimes optimize for what looks correct rather than what is correct."
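A hypothetical example (not drawn from the study itself) makes the pattern concrete. SQL injection, catalogued as CWE-89, is a perennial Top 25 entry, and the vulnerable version below is exactly the kind of completion that "looks correct" and passes a happy-path test:

```python
import sqlite3

# Illustrative only: CWE-89 (SQL injection), one of the CWE Top 25
# weaknesses. The first function works on benign input and reads
# naturally; the second is the actual fix.

def find_user_vulnerable(conn, username):
    # String interpolation: a username like "x' OR '1'='1" rewrites the
    # query's logic and returns every row in the table.
    cur = conn.execute(f"SELECT id, name FROM users WHERE name = '{username}'")
    return cur.fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats the value as data, never as
    # query text, so the payload matches nothing.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "x' OR '1'='1"
print(len(find_user_vulnerable(conn, payload)))  # 2 -- injection dumps the table
print(len(find_user_safe(conn, payload)))        # 0 -- literal match only
```

Both functions return identical results for a well-behaved username, which is Okafor's point: nothing in a typical test suite distinguishes them until an attacker supplies the input the model never considered.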
