We spent nearly a year building a gen AI tool. These are the 5 (hard) lessons we learned

When generative artificial intelligence (gen AI) emerged prominently in late 2022, it quickly became the next big thing. McKinsey took notice as well, not only publishing extensively on the topic but also realizing that it could use gen AI to bring together the company’s vast but separate knowledge sources.



Say hello to “Lilli,” the gen AI tool we launched in August 2023. Lilli is named after Lillian Dombrowski, the first professional woman hired by McKinsey, in 1945. Our goal was to aggregate McKinsey’s 40-plus knowledge sources and capabilities into a single platform. This would allow teams around the world to readily access McKinsey’s knowledge, generate insights, and thus help clients.



That was the idea, and we got there. But it was far from easy. The effort was rife with technical hurdles. To give just one example, much of McKinsey’s knowledge is codified in PowerPoint presentations, which Lilli could not read well. At first, we could only parse roughly 15% of a PowerPoint document. We had to create our own tool to be able to read over 85% of any kind of document. With so many challenges and the need to work in a fundamentally new way, we described ourselves as riding the “struggle bus.”  



Technological change is inherently difficult. For any organization, incorporating gen AI will probably feature such difficulties. That said, it is possible to benefit from experience. Looking back on our 11-month effort, here are five principles we learned—sometimes the hard way. 



Define a shared aspiration



Before starting to build, it’s vital to define a clear vision of the organization’s goals. McKinsey sought to broaden insights and enhance productivity. We wanted our extensive resources, including documents, articles, and webinars, to be more accessible, and thus help our colleagues to generate unique insights. Many people were unaware of the available resources or didn’t know how to find them. Now, they simply ask Lilli, spending less time searching and more time solving client problems.  



Assemble a multidisciplinary team



Gen AI affects many aspects of an organization, so diverse expertise is vital. Initially, we focused on building a minimum viable product with a lean, technically oriented team. As Lilli evolved, the team expanded to include legal, adoption, communications, and subject matter specialists. Lilli had to be technically robust, responsible, secure, aligned with McKinsey’s values, and validated by our experts. We learned that defining workstreams alone wasn’t enough. It was also important to manage their interdependencies.



Put the user first



Gen AI may be trendy, but trendy is not the point. The point is to improve how we work. To do so, we continuously gathered input from across the firm. Right away, people were excited about Lilli. Almost as fast, we heard that it wasn’t being updated fast enough or was confusing to use. We anchored every decision in the user needs revealed by this feedback, focusing relentlessly on solving real human and business problems. And we built a beta program of users who test new features before they roll out. 



Teach, learn, repeat



Lilli’s launch was a major milestone, but only the beginning. For lasting success, it needed to adapt to evolving needs. We embedded continuous learning loops into Lilli’s development, using data, analytics, and feedback to enhance the platform. The rapid pace of gen AI requires comfort with uncertainty and some inefficiency, like rewriting code repeatedly. Adoption and education are crucial, so we organized road shows, learning programs, lunch-and-learns, and office hours to demonstrate how Lilli could help. IT teams were trained to teach others, ensuring effective adoption. Significant impact is only possible by changing how tasks get done or reducing the time and cost of activities. 



Measure and manage



We had to rebuild our existing knowledge base for gen AI. Implementing Retrieval-Augmented Generation Operations (RAGOps) was key. Answer-quality metrics, which measure relevance, comprehensiveness, and correctness, help RAGOps continuously improve Lilli’s capabilities. Dashboards enable developers and content experts to monitor and refine Lilli’s performance.
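
As a rough illustration of how answer-quality metrics like these might feed a dashboard, here is a minimal Python sketch. It is not Lilli’s actual implementation: the `AnswerRating` record, the 1-to-5 rating scale, the review threshold, and the sample ratings are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record: one reviewed answer, with each dimension
# rated on an assumed scale of 1 (poor) to 5 (excellent).
@dataclass
class AnswerRating:
    prompt_id: str
    relevance: int
    comprehensiveness: int
    correctness: int

DIMENSIONS = ("relevance", "comprehensiveness", "correctness")

def summarize(ratings):
    """Return the mean score per dimension across all rated answers."""
    return {d: mean(getattr(r, d) for r in ratings) for d in DIMENSIONS}

def flag_for_review(ratings, threshold=3):
    """Return prompt IDs where any dimension scores below the threshold."""
    return [
        r.prompt_id
        for r in ratings
        if min(getattr(r, d) for d in DIMENSIONS) < threshold
    ]

# Illustrative, made-up ratings.
ratings = [
    AnswerRating("p1", 5, 4, 5),
    AnswerRating("p2", 3, 2, 4),
    AnswerRating("p3", 4, 4, 3),
]

print(summarize(ratings))        # per-dimension averages for a dashboard tile
print(flag_for_review(ratings))  # answers a content expert might inspect
```

Averages of this kind show trends over time, while the flagged list points content experts to the specific answers that need attention.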



In less than a year, Lilli has become indispensable. Almost three-quarters of our colleagues are active users, and as of early May, it has processed more than three million prompts.  



It’s impossible to put a dollar figure on Lilli’s value, but we see a clear impact in two ways. First, Lilli is saving consultants up to 30% of their time by streamlining information gathering and synthesis: no more plowing through thousands of documents to find a killer slide. Second, the quality of their insights is substantially better.



Our advice to other organizations: the struggle bus is waiting. Don’t hesitate to get on board. 
