AI can now set (almost) perfect typography

Before we get into our weekly roundup of visual generative AI gluttony, I must confess that my favorite visual story this week has been the 4-year-long, seemingly impossible quest of an honest craftsman (Jony Ive) to create the perfect typeface, one that had already been 300 years in the making.



It felt like the perfect antidote to the news that an AI can, for the first time in history, do actual typesetting. Not at the caliber of Ive and his design crew, but impressive nonetheless. There’s also a new technology that makes instantly perfect AI-generated product shots, one that can freeze time and space better than the bullet time technology from The Matrix, and a micro-short film that reminds us that our imagination can shape the nonsensical into the stuff of dreams.



Deep Floyd: perfect typesetting and spatial design



Stability.ai is releasing things faster than we can recap them. Right after last week’s post, it announced Deep Floyd, a generative AI capable of setting near-perfect typography in any scene, out of any material, under any conditions imaginable. Most other generative AI platforms render text as a garbled mess, which makes this release big news. It’s the first time scientists have achieved not only legible text but beautifully made text as well (it’s not a Jony Ive typeface, no, but it’s solid).



[Images: Stability AI]



Even more surprising is the fact that the typography respects the spatial relationship with the object boundaries in the scene where the text appears. A piece of graffiti will look right on a wall, as if it were made by a human with taste and skill. A neon sign will look professionally made. And a traffic sign will look, well, like a traffic sign should look in real life.
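
If you want to poke at it yourself, the model was released publicly as DeepFloyd IF, with weights on Hugging Face and support in the diffusers library. Here’s a minimal sketch based on the public model card (the model IDs and arguments follow that release, and a hefty GPU is assumed):

    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import pt_to_pil

    # Stage 1: the base 64x64 generator (IF is a cascade of pixel-space diffusion models)
    stage_1 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
    stage_1.enable_model_cpu_offload()

    # Stage 2: upscales to 256x256 (a third stage upscales further to 1024x1024)
    stage_2 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-II-L-v1.0", text_encoder=None,
        variant="fp16", torch_dtype=torch.float16)
    stage_2.enable_model_cpu_offload()

    # The big T5 text encoder is what lets IF spell: put the words you want
    # rendered in quotes inside the prompt
    prompt = 'a neon sign on a brick wall that says "open late"'
    prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

    image = stage_1(prompt_embeds=prompt_embeds,
                    negative_prompt_embeds=negative_embeds, output_type="pt").images
    image = stage_2(image=image, prompt_embeds=prompt_embeds,
                    negative_prompt_embeds=negative_embeds, output_type="pt").images
    pt_to_pil(image)[0].save("sign.png")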



Perfusion: perfect product shots



Nvidia’s fascinating new technology, presented this week at SIGGRAPH 2023, will eventually impact all creative and marketing industries, just as Chicxulub did with the dinosaurs. Called Perfusion, this text-to-image technology grabs an image of a real object and seamlessly integrates it into a synthetic image. While current generative AI reconstruction technologies deform the original subject, often beyond recognition, the amazing thing about Perfusion is that it maintains the core identity of an object even while modifying attributes like its clothing.



[Image: Yoad Tewel, Rinon Gal, Gal Chechik, Yuval Atzmon]



You can see it clearly in the example above: a teddy bear changes its outfit but maintains its identity as a bear. The researchers say the model uses a novel mechanism called “key-locking” to accomplish this wizardry. Expect it to appear soon in your favorite image generator to create anything from the perfect product shot to the perfect selfie “taken” on a perfect beach that never existed, during a vacation you never took.
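
The authors’ code isn’t public as of this writing, so take this as a conceptual sketch of the key-locking idea rather than their implementation (every name and shape below is hypothetical and heavily simplified): the cross-attention key for the personalized token is pinned to that of a broader supercategory, so the concept attends where a generic “toy” would, while a small learned update to the value projection carries its identity.

    import torch
    from torch import nn

    torch.manual_seed(0)
    dim = 64                              # toy embedding size
    W_k = torch.randn(dim, dim)           # frozen pretrained key projection
    W_v = torch.randn(dim, dim)           # frozen pretrained value projection
    supercat_emb = torch.randn(dim)       # embedding of the supercategory word
    concept_emb = torch.randn(dim)        # embedding of the personalized token

    # Key-locking: the concept's key comes from the SUPERCATEGORY embedding
    # and is never trained, so attention keeps landing in sensible regions.
    locked_key = W_k @ supercat_emb

    # Identity lives in the values: a small trainable offset to W_v
    # (the paper formalizes this as a rank-one edit of the attention weights).
    delta_v = nn.Parameter(torch.zeros(dim, dim))

    def concept_key_value():
        value = (W_v + delta_v) @ concept_emb  # learned: what the concept looks like
        return locked_key, value               # locked: where it gets attended to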



AI that captures time and space: HOSNeRF



In the totally-blew-my-mind category this week, we have HOSNeRF. We have covered NeRF technology before: the seemingly magical AI networks that build fully navigable 3D scenes from just a few photos or a video. But this is next level, because it captures not only space but time. Think bullet time from The Matrix, but better, and available at any point in your video:







This new technology lets you pause a video at any frame and re-render every detail of the scene (moving humans, objects, and of course the backgrounds) from any point of view. For now it’s just a research paper, but given the pace of development, we’ll probably get this tomorrow.
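
To get a feel for why that’s even possible, here’s a conceptual sketch (mine, not the HOSNeRF authors’): a dynamic NeRF is a learned function over space and time, so bullet time amounts to evaluating that function at a frozen timestamp while the camera moves.

    import torch
    from torch import nn

    class DynamicRadianceField(nn.Module):
        """Toy dynamic NeRF: (3D point, view direction, time) -> (density, color)."""
        def __init__(self, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(3 + 3 + 1, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),            # density + RGB
            )

        def forward(self, xyz, view_dir, t):
            out = self.mlp(torch.cat([xyz, view_dir, t], dim=-1))
            density = torch.relu(out[..., :1])   # how much "stuff" is at this point
            rgb = torch.sigmoid(out[..., 1:])    # what color it shows the camera
            return density, rgb

    field = DynamicRadianceField()
    # Bullet time: freeze t and sweep the camera's ray samples through space.
    density, rgb = field(torch.rand(1024, 3), torch.rand(1024, 3),
                         torch.full((1024, 1), 0.5))  # t fixed at mid-video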



Today’s eye candy: Thank You For Not Answering



LA-based filmmaker Paul Trillo has made a trippy trip of a short film called Thank You For Not Answering, about a man who leaves a voicemail for someone from his past, recalling fragments of his fading memory and things that could have been. He tells me about its genesis over email: “I was looking to take advantage of the aesthetic limitations of the AI. The surreal and often uncanny nature of Gen-2 would actually be difficult to recreate with cameras or traditional animation.”









He says it was the strangest experience he has ever had, like “shaking a Magic 8 Ball until you get the answer you’re looking for.” He first generated images for a storyboard in Automatic1111, the popular open-source web interface for Stable Diffusion. “I would write a bit of the story and ideas for visuals, generate some images, and then feed that text and imagery into Gen-2,” he says. The process yielded 100 images and 400 videos, which he edited down to the 55 clips in the final cut. “While the AI makes a lot of chaotic choices, it’s ultimately up to you to decide what kind of stories you want to tell.”
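
The Gen-2 half of that workflow happens inside Runway’s app, but the storyboard half is a few lines of code with the diffusers library. A minimal sketch, assuming a vanilla Stable Diffusion 1.5 checkpoint rather than Trillo’s exact Automatic1111 setup (the story beats here are invented):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

    # Hypothetical story beats; each one becomes a storyboard frame
    beats = [
        "a man alone in a phone booth at night, neon haze, 16mm film still",
        "an empty diner at dawn, a half-remembered face in the window",
    ]
    for i, beat in enumerate(beats):
        image = pipe(beat, num_inference_steps=30).images[0]
        image.save(f"storyboard_{i:02d}.png")  # these frames would then seed Gen-2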