ComfyUI Meetup Presentation

ComfyUI x ZeroSpace

This was a talk I gave at the September 2025 ComfyUI meetup at ZeroSpace. It presents our latest findings on fusing 3D game engines with video generation to create controllable, consistent 3D animation. Xuelong is the Technical Director at ZeroSpace, and Derek Chan is a Technical Artist at ZeroSpace.

Role // Tech Artist, Gen-AI RnD

Tools // ComfyUI, Unreal Engine, Musubi-Tuner, Nano Banana Pro, Wan2.2, AI-Toolkit

Year // 2025

Project Breakdown

I know not everyone has time to sit through a 30-minute presentation, so here’s a quick TL;DR. This talk was given during the early peak of gen AI adoption in production, and a lot of new techniques have come out since then. But for me it was a chance to recap what I’d learned, and it reaffirmed something I keep coming back to: there’s still a ton of untapped potential in AI controllability when you bring traditional 3D engines into the picture. Instead of trying to break away from what already exists, we should be building on top of it.

The thesis was about maintaining control and flexibility within a gen AI workflow, especially on a creative team. So much of what’s on the market treats generation like a slot machine. You put something in, you get what you get. We’re moving toward more controllable toolsets, but there’s already a long history of great 3D tools, game engines, and technical art practices that can meet gen AI halfway. This talk was my proof of concept for that coalescence.

Reference Fine-Tuning

First up was reference fine-tuning with Nano Banana and ComfyUI. Our team had generated a set of reference images for a knight character, and the goal was building a LoRA that locked in our specific style and design language.

With most LoRA workflows, you’re at the mercy of your initial generations. If the references are 90% there but the helmet details are off, you’re stuck regenerating and hoping for the best. Nano Banana let me chain targeted edits, one design note at a time, using localized edits on the reference images. I could fix exactly what needed fixing without touching the parts that already worked. The result was a tighter LoRA that actually matched what the team intended.

UE Rendering Pipeline

Second was a control pass render system built in Unreal. Every DCC has somebody making a similar tool at this point, but my background is in Unreal and I already had experience automating Sequencer exports, so this felt like a natural extension.

The idea was to provide ground-truth data straight from the 3D engine, not data inferred by depth estimation or OpenPose. Camera-based passes, real geometry, real lighting. The difference matters. When you feed control data into a diffusion model, the quality of that input directly determines how much control you have over the output. Inferred depth from a 2D image is a guess. A Z-buffer from Unreal is the truth.
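To make that concrete, here is a minimal sketch of turning an exported linear depth pass into an 8-bit depth control image. It assumes depth values arrive in centimeters (Unreal's default unit), that the downstream model expects near surfaces to be bright, and that near/far clip distances are chosen per shot. The function name and default values are illustrative, not part of the tool described here.

```python
def depth_to_control(depth_cm, near_cm=10.0, far_cm=5000.0):
    """Map linear scene depth (cm) to 8-bit values: near = 255, far = 0.

    depth_cm is a list of rows, each row a list of depth values.
    near_cm and far_cm are hypothetical per-shot clip distances.
    """
    if far_cm <= near_cm:
        raise ValueError("far_cm must be greater than near_cm")
    span = far_cm - near_cm
    control = []
    for row in depth_cm:
        out_row = []
        for d in row:
            # Clamp so geometry outside the clip range cannot blow out the map.
            clamped = min(max(d, near_cm), far_cm)
            t = (clamped - near_cm) / span  # 0.0 at near, 1.0 at far
            out_row.append(round((1.0 - t) * 255))
        control.append(out_row)
    return control
```

Because the values come from the renderer rather than an estimator, the same clip range gives the same mapping on every frame, which is one reason engine depth tends to be temporally stable where estimated depth can flicker.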

I built a small tool for this, basically a V1 for a plugin I’d eventually love to publish. It handles sequencer automation and pass management so you can export a full set of control passes without manually setting up each render. It slots into an existing Unreal pipeline without asking anyone to learn new software.
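As a rough illustration of the pass management side, the sketch below builds one render job per control pass with a consistent output naming scheme. It is not the tool's actual code and does not call the Unreal API. The `RenderJob` type, the pass names, the folder layout, and the EXR extension are all assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class RenderJob:
    """One control pass to export for one sequence (hypothetical structure)."""
    sequence: str
    pass_name: str
    output_pattern: str  # contains a {frame} placeholder


def build_jobs(sequence, passes, out_root="renders"):
    """Create one job per pass, each with its own output folder."""
    jobs = []
    for pass_name in passes:
        filename = f"{sequence}_{pass_name}.{{frame:04d}}.exr"
        pattern = PurePosixPath(out_root) / sequence / pass_name / filename
        jobs.append(RenderJob(sequence, pass_name, str(pattern)))
    return jobs


# Example: three control passes for one shot.
jobs = build_jobs("shot010", ["depth", "normal", "beauty"])
first_frame_path = jobs[0].output_pattern.format(frame=1)
```

Keeping the manifest as plain data means whatever drives the actual render can loop over the jobs, and the ComfyUI side can find every pass by convention instead of by hand.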

VACE Is Awesome

The last section was about convergence. At the end of the day, we’re still trying to use these tools to make real content and ship real work.

I spent some time on Alibaba’s Wan VACE model, which I think struck gold with its design. The key insight is that VACE uses masking in a way no other model really had at that point, creating a dual-stream approach to video conditioning. You can take your ground truth Unreal passes alongside your refined LoRA inputs and produce stylized, controllable output.
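The dual-stream idea can be sketched in a few lines. As I understand the VACE design, each conditioning frame is split by a mask into an inactive stream (pixels to keep) and a reactive stream (pixels the model is free to regenerate). The example below works on a single grayscale frame stored as nested lists. The function name and the convention that a mask value of 1 means "regenerate" are assumptions made for illustration, and the real model applies this to video tensors before encoding.

```python
def split_streams(frame, mask):
    """Split one frame into (inactive, reactive) streams using a 0/1 mask.

    frame: rows of pixel values.
    mask:  rows of 0 or 1, same shape as frame; 1 marks pixels to regenerate.
    """
    if len(frame) != len(mask) or any(
        len(f_row) != len(m_row) for f_row, m_row in zip(frame, mask)
    ):
        raise ValueError("frame and mask must have the same shape")
    # Inactive stream keeps pixels where the mask is 0.
    inactive = [
        [p * (1 - m) for p, m in zip(f_row, m_row)]
        for f_row, m_row in zip(frame, mask)
    ]
    # Reactive stream keeps pixels where the mask is 1.
    reactive = [
        [p * m for p, m in zip(f_row, m_row)]
        for f_row, m_row in zip(frame, mask)
    ]
    return inactive, reactive


frame = [[10, 20], [30, 40]]
mask = [[0, 1], [1, 0]]
inactive, reactive = split_streams(frame, mask)
```

The two streams always add back up to the original frame, so nothing is lost in the split. The model is simply told which pixels are fixed and which are open to change.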

The whole point is that it still works within a team. It still respects the CG house principles we should all be following. The pipeline doesn’t get replaced. It just gets better tools.
