The Text to Design Figma plugin might look like magic. But the recent improvements to the plugin are no accident.
They are the result of many months of experimentation with AI prompting, UI/UX, and multimodal LLMs.
Here are some interesting discoveries:
Increasing Information Density Per Token
The plugin works by using an LLM to generate HTML and CSS, which then gets converted into the different Figma element types you see in the Figma Layers panel.
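To make that conversion step concrete, here is a minimal sketch (not the plugin's actual code) of how a parsed HTML element could be mapped onto Figma node types using the Figma Plugin API; the ParsedElement shape is a hypothetical intermediate format.

```typescript
// Hypothetical intermediate format produced by parsing the generated HTML+CSS.
interface ParsedElement {
  tag: string;                          // e.g. "div", "p", "span"
  text?: string;                        // text content, if any
  styles: { [prop: string]: string };   // resolved CSS declarations
  children: ParsedElement[];
}

// Recursively map an element onto a Figma scene node.
async function toFigmaNode(el: ParsedElement): Promise<SceneNode> {
  if (el.text !== undefined && el.children.length === 0) {
    // Leaf text content becomes a TextNode; the font must be loaded first.
    await figma.loadFontAsync({ family: "Inter", style: "Regular" });
    const text = figma.createText();
    text.characters = el.text;
    return text;
  }
  // Container elements become auto layout frames.
  const frame = figma.createFrame();
  frame.layoutMode =
    el.styles["flex-direction"] === "row" ? "HORIZONTAL" : "VERTICAL";
  for (const child of el.children) {
    frame.appendChild(await toFigmaNode(child));
  }
  return frame;
}
```

A real mapping would also translate the remaining CSS declarations (padding, fills, corner radius, and so on) onto the corresponding Figma node properties.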
The initial version of the plugin simply generated raw HTML and CSS, but to increase the speed and quality of the output I had to find a way to pack as much styling data into the generated code as possible, all while keeping the system prompts and output short and hallucination-free.
The solution was to prompt the LLM to work with higher-level styling abstractions: Tailwind utility classes for styling and Font Awesome classes for the vector icons. This produced higher-quality output, fewer hallucinations, and faster generation times, because the results packed more information into each token.
A lot more can still be done with even higher levels of abstraction in the future.
Below is the difference between the old and new approaches:
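(These snippets are illustrative, not actual plugin output; the exact markup varies from prompt to prompt.)

```typescript
// Old approach: raw inline CSS. Every styling decision costs many tokens,
// and the model has more room to drift into invalid or inconsistent values.
const oldOutput = `
<div style="display:flex; flex-direction:column; padding:16px;
            background-color:#ffffff; border-radius:8px;
            box-shadow:0 1px 3px rgba(0,0,0,0.1);">
  <p style="font-size:14px; color:#374151; margin:0;">Card body</p>
</div>`;

// New approach: Tailwind utility classes plus a Font Awesome icon class.
// The same styling intent fits into far fewer tokens, and the class names
// act as a constrained vocabulary that is harder to hallucinate.
const newOutput = `
<div class="flex flex-col p-4 bg-white rounded-lg shadow">
  <i class="fa fa-user text-gray-500"></i>
  <p class="text-sm text-gray-700">Card body</p>
</div>`;
```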
Inlining the Design Preview
The biggest UX improvement since the initial release was the inline design preview.
In the initial plugin release, there were three different chat modes that a user had to switch between: one for general conversation with no design generation, one for design generation with no conversation, and one just for image generation. This resulted in a poor UX.
To solve this UX nightmare, I merged the chat and design modes into one. This lets the user ask questions and make changes to their designs within a single conversation context.
Here is a high-level breakdown of how the LLM chat responses are rendered:
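The sketch below captures the core idea with hypothetical helper names; in particular, the `<design>` delimiter is an assumption about how the system prompt marks design snippets, not the plugin's exact format.

```typescript
// Split an assistant message into plain-text chat parts and design parts,
// assuming design snippets are wrapped in <design>...</design> tags.
interface ChatPart {
  kind: "text" | "design";
  content: string;
}

function splitResponse(message: string): ChatPart[] {
  const parts: ChatPart[] = [];
  const delimiter = /<design>([\s\S]*?)<\/design>/g;
  let last = 0;
  for (const match of message.matchAll(delimiter)) {
    const index = match.index ?? 0;
    if (index > last) {
      parts.push({ kind: "text", content: message.slice(last, index) });
    }
    parts.push({ kind: "design", content: match[1] });
    last = index + match[0].length;
  }
  if (last < message.length) {
    parts.push({ kind: "text", content: message.slice(last) });
  }
  return parts;
}
```

Text parts render as normal chat bubbles, while design parts render inside a sandboxed preview with a button that imports the HTML onto the Figma canvas.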
A future improvement would be to merge the image generation mode in as well, creating a single unified chat.
Multimodal Magic
The last challenge! How do I allow users to modify their existing Figma elements by chatting with an LLM?
The obvious first solution is to convert existing Figma elements to an HTML representation and pass the result to an LLM.
However, in November OpenAI launched a multimodal ChatGPT model, which surprisingly yielded better results than the HTML conversion at a lower token cost. With this came the added benefit of being able to convert ANY visual input (existing designs, website screenshots, doodles, wireframes, mobile app screens, etc.) to HTML and ultimately to Figma elements.
Win-Win.
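As a rough sketch of this flow (the model name and prompt wording are illustrative assumptions, not the plugin's exact setup), an image can be sent to a vision-capable OpenAI model and turned into Tailwind-styled HTML like this:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Send any image (as a data URL) to a vision-capable model and ask for
// Tailwind-styled HTML back.
async function imageToHtml(imageDataUrl: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o", // illustrative; any vision-capable model will do
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Convert this design into HTML styled with Tailwind CSS classes. Return only the HTML.",
          },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  });
  return response.choices[0].message.content ?? "";
}
```

Selected Figma nodes can be exported as PNG bytes with node.exportAsync() and encoded as a data URL, so the same call covers existing designs, website screenshots, wireframes, and doodles alike.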
A future improvement would be to use the existing design system in the user's Figma project (components and styles) so that imported designs reference those existing components rather than detached copies.
Conclusion
If you are building a SaaS product that uses LLMs, be prepared to stay laser-focused on a single problem for days or even weeks, testing novel solutions and researching the latest developments. Do that, and your results will look like magic!
Happy building,
Ollie 🍻