Ludicrous Speed!

One very useful feature I’ve been meaning to support on the AI Horde for a while is request batching. Request batching is the ability to generate multiple Stable Diffusion images in parallel using mechanisms internal to the ML libraries, instead of splitting them into multiple processes. Because common parts of the request are reused, the GPU can generate each extra image with only a ~20% slowdown instead of 100%, so long as you stay within your GPU’s capabilities.
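To put rough numbers on that claim, here is a quick back-of-the-envelope calculation. The 20% figure is the one quoted above; the real cost varies by model, resolution, and GPU:

```python
# Relative wall-clock time for generating n images, assuming each extra
# batched image adds ~20% of a single generation's cost (the figure above),
# versus 100% each when generating sequentially.
def relative_time(n_images: int, extra_cost: float = 0.2) -> float:
    return 1 + (n_images - 1) * extra_cost

print(relative_time(5))   # 1.8x the time of one image, vs 5.0x sequentially
print(relative_time(20))  # 4.8x the time of one image, vs 20.0x sequentially
```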

Soon after we finished adding LCM support, I turned my attention to making this a possibility, as between these two features we could massively increase the overall speed at which the AI Horde completes requests. The only problem was the overall complexity of handling this in the inference code.

Today I’m proud to announce that the AI Horde natively supports smartly batching multiple images in the same request when possible, which can result in massive improvements in overall speed! Read on for more details on how we achieved it.

By relying on ComfyUI, the most difficult part was already done, and earlier work by Tazlin and Jug had prepared the ground for our hordelib library to send such batched requests to the comfy engine. Still, I had a lot of work to do: not only teaching the AI Horde to accept and queue such loads properly, but also making the worker understand payloads for multiple images.

Fortunately, due to the new setup of the reGen worker, adjusting it to accept one job for multiple images and then submit multiple image results at the end was easier than I expected. Submitting multiple images was still the hardest part, and I had to basically refactor that whole area of the code.

The AI Horde queuing part was not as code-intensive. Making a worker pick up multiple requests when possible was not particularly hard, but not giving the worker more than it can “chew through” was. You see, your worker might be able to do 1 image at 2048×2048, and it might be able to do 20 images at 512×512. However, give it 20×2048×2048 and it will fall down and die! So this required a bit of fancy footwork. The way I solved it is that the worker declares how many batched images it can do along with its max resolution. The horde then assumes that the worker can safely achieve its max batching at 1/3rd of its max resolution. Past that point, as the requested resolution of a job increases, the horde will smartly reduce the number of batched images it gives that worker.

Practically this means that if I declare I can do 20 batched images and my max resolution for one image is 2048×2048, I will pick up my full 20 images at 512×512 but only 7 images at my full resolution.
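As a minimal sketch of how such a rule could work: the exact formula the horde uses isn’t spelled out here, so treat the scaling below as an illustrative assumption, one that happens to match the numbers above:

```python
def batch_limit(declared_max_batch: int, max_pixels: int, requested_pixels: int) -> int:
    """Hypothetical sketch of the batch-scaling rule described above.

    Assumes the full declared batch is safe up to 1/3 of the worker's
    max pixel count, then shrinks inversely with the requested pixels.
    """
    safe_pixels = max_pixels / 3
    if requested_pixels <= safe_pixels:
        return declared_max_batch
    # Shrink the batch proportionally as the request grows past the safe zone.
    return max(1, round(declared_max_batch * safe_pixels / requested_pixels))

# A worker declaring 20 batched images at a max resolution of 2048x2048:
print(batch_limit(20, 2048 * 2048, 512 * 512))    # -> 20 (full batch)
print(batch_limit(20, 2048 * 2048, 2048 * 2048))  # -> 7  (reduced batch)
```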

Therefore the AI Horde will continue smartly slicing a request for multiple images into a number of jobs, only now each job can contain multiple images instead of just one. Effectively this means that the horde can utilize the maximum processing power of each worker far more efficiently, and the overall performance improves!
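For illustration, here is a hypothetical version of that slicing step, assuming the only constraint is the per-worker batch limit (the function and its names are made up for the example):

```python
def slice_request(n_images: int, worker_batch_limit: int) -> list[int]:
    """Split an n-image request into jobs of at most worker_batch_limit images."""
    jobs = []
    while n_images > 0:
        take = min(worker_batch_limit, n_images)
        jobs.append(take)
        n_images -= take
    return jobs

# A 20-image request for a worker that can batch 7 images at this resolution
# becomes three jobs instead of twenty 1-image jobs.
print(slice_request(20, 7))  # -> [7, 7, 6]
```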

There were a few hiccups along the way as well. For one, I realized that the hordelib code did not handle batching for img2img requests at all, so I had to roll up my sleeves, jump into the way hordelib translates requests to comfyUI nodes, and figure it all out. It took me a while, but now that I understand this better, it will be easier for me to add even more fancy additions to our comfy workflows!

Another somewhat important problem is that the seed returned by batched requests in comfy is not accurate. The explanation is a bit too technical, but at the end of the day there is an extra variable when trying to replicate an image generated via a batch, on top of the generation seed. Currently the horde will return the relevant “batch_id” in the generation metadata, which I hope to use in the future to add a way to replicate images from batched requests as well.
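To give a sense of what that looks like, here is an illustrative shape for a generation result carrying that metadata. Only the “batch_id” entry is described above; the surrounding field names are assumptions, not the exact API schema:

```python
# Illustrative only: a client that wants to replicate a batched image later
# would need to keep both the seed and the batch_id from the metadata.
generation = {
    "seed": "1234567890",
    "gen_metadata": [
        {"type": "batch_id", "value": "3"},  # position within the batch
    ],
}

batch_ids = [m["value"] for m in generation["gen_metadata"] if m["type"] == "batch_id"]
```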

For now, if you need to ensure you can always replicate your images via a seed, the best way is to use the new disable_batching keyword on your request. Setting this to true will make your request always split into 1 image per job, which is the way the horde used to work until now. However, since disable_batching is significantly less optimal than batching, it is only available to trusted users (i.e. those who’ve been running workers for a while) and patreon supporters.
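A request with batching disabled might look like the sketch below. The disable_batching key is the one described above; the endpoint and the rest of the payload follow the usual AI Horde async generation API, trimmed down for brevity:

```python
import requests

# Minimal sketch of an async generation request with batching disabled.
payload = {
    "prompt": "a lighthouse at dusk",
    "params": {
        "n": 4,               # still asks for 4 images...
        "width": 512,
        "height": 512,
    },
    "disable_batching": True, # ...but each becomes its own 1-image job
}
response = requests.post(
    "https://aihorde.net/api/v2/generate/async",
    json=payload,
    headers={"apikey": "YOUR_API_KEY"},
)
print(response.json())  # returns the request id to poll for results
```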

Of course, you can continue manually splitting your requests into 1 image per request, but that already incurs increased kudos costs, and in the future it might be disincentivized further for the health of the AI Horde.

Between batching and LCM proliferation, we’re already starting to see significantly improved generation times on the AI Horde, to the point that with enough priority you can receive 20×1024×1024 images in less than a minute! A small problem is that currently one of our most popular frontends, Artbot, defaults to manually splitting each request into 1 image per request. Nevertheless, its developer Rockbandit is already hard at work making their requests batching-compatible, and once that happens, I expect the overall speed will massively improve!