Video Game Blurs (and how the best one works)


Blurs are the basic building block for many video game post-processing effects and essential for sleek and modern GUIs. Video game Depth of Field and Bloom or frosted panels in modern user interfaces – used subtly or obviously – they’re everywhere. Even your browser can do it, just tap this sentence!

Effect of “Bloom”, one of many use-cases for blur algorithms

Conceptually, “Make thing go blurry” is easy, boiling down to some form of “average colors in radius”. Doing so in realtime, however, took graphics programmers through decades upon decades of research and experimentation, across computer science and maths. In this article, we’ll follow in their footsteps.

A graphics programming time travel, if you will.


Using the GPU in the device you are reading this article on, and the WebGL capability of your browser, we’ll implement realtime blurring techniques and retrace the trade-offs graphics programmers had to make in order to marry two sometimes-opposing worlds: mathematical theory and technological reality.

This is my submission to this year’s Summer of Math Exposition


With many interactive visualizations to guide us, we’ll journey through a bunch of blurs, take a detour through frequency-space manipulation, and torture your graphics processor to measure performance, before finally arriving at an algorithm with years’ worth of cumulative graphics programmer sweat – The ✨ Dual Kawase Blur 🌟


Setup – No blur yet #

In the context of video game post-processing, a 3D scene is drawn, also called rendering, and saved to an intermediary image – a framebuffer. In turn, this framebuffer is processed to achieve various effects. Since this processing happens after a 3D scene is rendered, it’s called post-processing. All that, many times a second.

Depending on technique, framebuffers can hold non-image data, and post-processing effects like Color-correction or Tone-mapping don’t even require intermediate framebuffers: There’s more than one way (@35:20)


This is where we jump in: with a framebuffer in hand, after the 3D scene was drawn. We’ll use a scene from a mod called NEOTOKYO°. Each time we implement a blur, there will be a box – a canvas driven by WebGL 1.0, rendering at the native resolution of your device. Each box has controls and the relevant parts of its code below.

No coding or graphics programming knowledge is required to follow along. But also no curtains! You can always see how we talk to your GPU. Terms and meanings will be explained once they become relevant.



Blur Fragment Shader noBlurYet.fs



precision highp float;

// UV coordinate of the output pixel, interpolated across the full-screen quad
varying vec2 uv;

// Brightness multiplier for the "Lights" and "Bloom" modes
uniform float lightBrightness;

// The framebuffer holding the rendered scene
uniform sampler2D texture;

void main() {
	// No blur yet: read the texture at the matching position and pass it through
	gl_FragColor = texture2D(texture, uv) * lightBrightness;
}
WebGL Javascript simple.js
import * as util from '../utility.js'

export async function setupSimple() {
	
	const WebGLBox = document.getElementById('WebGLBox-Simple');
	const canvas = WebGLBox.querySelector('canvas');

	
	const radius = 0.12;

	
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	
	const ctx = {
		
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null },
		
		fb: { scene: null, final: null },
		
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, lightBrightness: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		}
	};

	
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const noBlurYetFrag = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/noBlurYet.fs");

	
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	
	ui.rendering.modes.forEach(radio => {
		
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	
	function reCompileBlurShader() {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, noBlurYetFrag, ["lightBrightness"]);
	}

	
	reCompileBlurShader()

	
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		
		gl.useProgram(ctx.shd.blur.handle);
		const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
		gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
		gl.viewport(0, 0, canvas.width, canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.lightBrightness, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		if (ctx.mode == "bloom") {
			
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		
		gl.finish();

		
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

We don’t have a blur implemented yet, so not much is happening. Above the box is an Animate button, which moves the scene around to tease out problems of upcoming algorithms. Movement happens before our blur is applied, akin to the player character moving. To see our blur in different use-cases, there are 3 modes:

Different blur algorithms behave differently based on use-case. Some are very performance-efficient, but break under movement. Some reveal their flaws with small, high-contrast regions like far-away lights.


  • In Scene mode the blur will be applied across the whole image
  • In Lights mode we see and blur just the Emission parts of the scene, sometimes called “Self-Illumination”
    • This also unlocks the lightBrightness slider, where you can boost the energy output of the lights
  • In Bloom mode, we use the original scene and add the blurred lights from the previous mode on top to create a moody scene. This implements the effect of Bloom, an important use-case for blurs in real-time 3D graphics
Adding the blurred emission pass as we do in this article, or thresholding the scene and blurring that, is not actually how modern video games do bloom. We’ll get into that a bit later.


Finally, you see the Resolution of the canvas and Frames per Second / time taken per frame, aka “frametime”. A very important piece of the puzzle is performance, which will become more and more important as the article continues – the mother of invention behind our story.

Frame-rate will be capped at your screen’s refresh rate, most likely 60 fps / 16.6 ms. We’ll get into proper benchmarking as our hero descends into blurry madness


Technical breakdown #

Understanding the GPU code is not necessary to follow this article, but if you do choose to peek behind the curtain, here is what you need to know


We’ll implement our blurs as a fragment shader written in GLSL. In a nutshell, a fragment shader is code that runs on the GPU for every output pixel, in parallel. Image inputs in shaders are called Textures. These textures have coordinates, often called UV coordinates – these are the numbers we care about.

Technically, fragment shaders run per fragment, which isn’t necessarily pixel-sized, and there are other ways to read framebuffers, but none of that matters in the context of this article.


Texture coordinates, also called “UV” Coordinates or “UVs” for short
Note the squished appearance of the image

UV coordinates specify the position we read in the image, with bottom left being 0,0 and the top right being 1,1. Neither UV coordinates, nor shaders themselves have any concept of image resolution, screen resolution or aspect ratio. If we want to address individual pixels, it’s on us to express that in terms of UV coordinates.

Although there are ways to find out, we don’t know in which order output pixels are processed, and although the graphics pipeline can tell us, the shader doesn’t even know which output pixel it is currently processing


The framebuffer is passed into the fragment shader via uniform sampler2D texture as a texture. Using the blur shader, we draw a “Full Screen Quad” – a rectangle covering the entire canvas – with matching varying vec2 uv UV coordinates, 0,0 in the bottom-left and 1,1 in the top-right, to read from the texture.

The texture’s aspect-ratio and resolution are the same as the output canvas’s aspect-ratio and resolution, thus there is a 1:1 pixel mapping between the texture we will process and our output canvas. The graphics pipeline steps and vertex shader responsible for this are not important for this article.

The blur fragment shader accesses the color of the texture with texture2D(texture, uv), at the matching output pixel’s position. In the following examples, we’ll read from neighboring pixels, for which we’ll need to calculate a UV coordinate offset, a decimal fraction corresponding to one pixel step, calculated with 1 / canvasResolution
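The one-pixel UV step can be sketched on the CPU. This is a minimal JavaScript illustration of that offset math; texelSize and offsetUV are hypothetical helper names for this sketch, not part of the article’s utility.js:

```javascript
// One pixel step, expressed as a fraction of the 0..1 UV range.
// This is the value the shaders receive as the frameSizeRCP uniform.
function texelSize(canvasWidth, canvasHeight) {
	return [1 / canvasWidth, 1 / canvasHeight];
}

// UV coordinate of a neighbor, pixelsX to the right and pixelsY up
function offsetUV(uv, pixelsX, pixelsY, rcp) {
	return [uv[0] + pixelsX * rcp[0], uv[1] + pixelsY * rcp[1]];
}

// On a 1920×1080 canvas, read 3 pixels right and 2 pixels up from the center
const rcp = texelSize(1920, 1080);
const sample = offsetUV([0.5, 0.5], 3, 2, rcp);
```

Note how neither function ever mentions a pixel index: everything stays in the 0..1 UV range, matching how the shader sees the texture.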

One way to think of fragment shader code is “What are the instructions to construct this output pixel?”
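That mental model can be sketched in plain JavaScript – a hypothetical runFragmentShader loop standing in for the GPU, which really runs the shade function for all pixels in parallel:

```javascript
// CPU mental model of a fragment shader: one pure function,
// invoked independently for every output pixel.
function runFragmentShader(width, height, shade) {
	const out = new Array(width * height);
	for (let y = 0; y < height; ++y)
		for (let x = 0; x < width; ++x)
			// UV in the 0..1 range, sampled at the pixel center
			out[y * width + x] = shade([(x + 0.5) / width, (y + 0.5) / height]);
	return out;
}

// A "shader" that outputs the horizontal UV as a gray value:
// a 4×1 canvas produces a left-to-right gradient
const gradient = runFragmentShader(4, 1, (uv) => uv[0]);
```

The shade function receives only its UV coordinate – no resolution, no pixel index, no neighbors – exactly the constraints the real fragment shader works under.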


Graphics programming is uniquely challenging in the beginning, because of how many rules and limitations the hardware, graphics APIs and the rendering pipeline impose. But it also unlocks incredible potential, as other limitations dissolve. Let’s find out how graphics programmers have leveraged that potential.

Box Blur #

From a programmer’s perspective, the most straightforward way is to average the neighbors of a pixel using a for-loop. What the fragment shader is expressing is: “look Y pixels up & down, X pixels left & right and average the colors”. The more we want to blur, the more we have to increase kernelSize, the bounds of our for-loop.


for (int y = -kernel_size; y <= kernel_size; ++y) {

	for (int x = -kernel_size; x <= kernel_size; ++x) {
		
		vec2 offset = vec2(x, y) * samplePosMult * frameSizeRCP;
		
		sum += texture2D(texture, uv + offset);
	}
}

The bigger the for-loop, the more texture reads we perform per output pixel. Each texture read is often called a “texture tap”, and the total number of those “taps” per frame will now also be displayed. New controls, a new samplePosMult, new terms – Play around with them, get a feel for them, with a constant eye on FPS.
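How that taps readout is derived can be sketched as follows; tapsPerFrame is a hypothetical helper mirroring the counter logic the demo runs in boxBlur.js:

```javascript
// A kernel of radius kernelSize reads a (2k+1) × (2k+1) square of
// samples for every output pixel, so taps grow quadratically with
// kernel size and linearly with canvas resolution.
function tapsPerFrame(width, height, kernelSize) {
	const side = 2 * kernelSize + 1;
	return width * height * side * side;
}

// A 1920×1080 canvas with kernelSize 4 → 9×9 = 81 taps per pixel,
// ~168 million texture reads for a single frame
const taps = tapsPerFrame(1920, 1080, 4);
```

This quadratic growth is exactly why the naive box blur falls apart at larger kernel sizes.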


Blur Fragment Shader boxBlur.fs

precision highp float;

// UV coordinate of the output pixel
varying vec2 uv;

// Reciprocal of the canvas resolution: one pixel step in UV space
uniform vec2 frameSizeRCP;
// Multiplier spreading the sample positions apart
uniform float samplePosMult;

// Brightness multiplier for the "Lights" and "Bloom" modes
uniform float bloomStrength;

// The framebuffer holding the rendered scene
uniform sampler2D texture;

// Kernel bounds, injected at compile time via #define
const int kernel_size = KERNEL_SIZE;

void main() {
	// Running sum of all sampled colors
	vec4 sum = vec4(0.0);
	// Side length of the kernel square
	const int size = 2 * kernel_size + 1;
	// Total number of samples, used for normalization
	const float totalSamples = float(size * size);

	// Read the square neighborhood around the output pixel
	for (int y = -kernel_size; y <= kernel_size; ++y) {
		for (int x = -kernel_size; x <= kernel_size; ++x) {
			// Pixel offset converted into a UV offset
			vec2 offset = vec2(x, y) * samplePosMult * frameSizeRCP;
			sum += texture2D(texture, uv + offset);
		}
	}

	// Average the samples, so the kernel weights sum to 1
	gl_FragColor = (sum / totalSamples) * bloomStrength;
}
WebGL Javascript boxBlur.js
import * as util from '../utility.js'

export async function setupBoxBlur() {
	
	const WebGLBox = document.getElementById('WebGLBox-BoxBlur');
	const WebGLBoxDetail = document.getElementById('WebGLBox-BoxBlurDetail');
	const canvas = WebGLBox.querySelector('canvas');

	
	const radius = 0.12;

	
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	
	const ctx = {
		
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null },
		
		fb: { scene: null, final: null },
		
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, bloomStrength: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			kernelSize: WebGLBox.querySelector('#sizeRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: WebGLBoxDetail.querySelector('#renderer'),
			iterTime: WebGLBoxDetail.querySelector('#iterTime'),
			tapsCount: WebGLBoxDetail.querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const boxBlurFrag = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/boxBlur.fs");

	
	ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.kernelSize.addEventListener('input', () => {
		reCompileBlurShader(ui.blur.kernelSize.value);
		ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
	});

	
	ui.rendering.modes.forEach(radio => {
		
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		
		const worker = new Worker("./js/benchmark/boxBlurBenchmark.js", { type: "module" });

		
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			blurShaderSrc: boxBlurFrag,
			kernelSize: ui.blur.kernelSize.value,
			samplePos: ui.blur.samplePos.value
		});

		
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	
	function reCompileBlurShader(blurSize) {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, boxBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength"], "#define KERNEL_SIZE " + blurSize + '\n');
	}

	
	reCompileBlurShader(ui.blur.kernelSize.value)

	
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		
		const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
		const tapsNewText = (canvas.width * canvas.height * KernelSizeSide * KernelSizeSide / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		
		gl.useProgram(ctx.shd.blur.handle);
		const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
		gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
		gl.viewport(0, 0, canvas.width, canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
		gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		if (ctx.mode == "bloom") {
			
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		
		gl.finish();

		
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

Visually, the result doesn’t look very pleasing. The stronger the blur, the more “boxy” features of the image become. This is due to us reading and averaging the texture in a square shape. Especially in bloom mode, with strong lightBrightness and big kernelSize, lights literally become squares.

Performance is also really bad. With bigger kernelSizes, our Texture Taps count skyrockets and performance drops. Mobile devices will slow to a crawl. Even the world’s fastest PC graphics cards will fall below screen refresh rate if you crank kernelSize and zoom the article on PC, thus raising canvas resolution.

We kinda failed on all fronts. It looks bad and runs badly.


Then there’s this samplePosMult. It seems to increase blur strength without increasing texture taps or lowering performance (or lowering performance just a little on certain devices). But if we crank it too much, we get artifacts in the form of repeating patterns. Let’s play with a schematic example:

  • The white center square represents the output pixel
  • Grey squares are the pixels we would read with the current kernelSize and samplePosMult untouched
  • The black dots are our actual texture reads per output pixel – our “sample” positions

One can say that an image is a “continuous 2D signal”. When we texture tap at a specific coordinate, we are sampling the “image signal” at that coordinate. As previously mentioned, we use UV coordinates and are not bound by concepts like “pixel position”. Where we place our samples is completely up to us.

A fundamental blur algorithm option is increasing the sample distance from the center, thus increasing the amount of image we cover with our samples – more bang for your sample buck. This works by multiplying the offset distance. That is what samplePosMult does, and it’s something you will have access to going forward.
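A minimal sketch of how the multiplier spreads the sample grid; sampleOffsets is a hypothetical helper mirroring the offset math in the shader’s for-loop:

```javascript
// For every kernel cell, scale the pixel offset by samplePosMult
// before converting to a UV offset. The tap count stays the same,
// but the sampled area grows with the multiplier.
function sampleOffsets(kernelSize, samplePosMult, rcpW, rcpH) {
	const offsets = [];
	for (let y = -kernelSize; y <= kernelSize; ++y)
		for (let x = -kernelSize; x <= kernelSize; ++x)
			offsets.push([x * samplePosMult * rcpW, y * samplePosMult * rcpH]);
	return offsets;
}

// kernelSize 1 is still just 9 taps, but a 2× multiplier doubles
// their spread on a 1920×1080 canvas
const spread = sampleOffsets(1, 2.0, 1 / 1920, 1 / 1080);
```

The gaps this opens up between samples are precisely where the repeating-pattern artifacts come from.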

Doing it too much brings ugly repeating patterns. This of course leaves some fundamental questions, like where these artifacts come from and what it even means to read between two pixels. On top of that, we still have to address the performance and boxiness of our blur! But first…

What even is a kernel? #

What we have created with our for-loop is a convolution. Very simplified, in the context of image processing, it constructs an output pixel by gathering and weighting the pixels covered by a square of numbers. That square is called a kernel and is the thing we visualized previously.

For blurs, the kernel weights must sum up to 1. If that were not the case, we would either brighten or darken the image. Ensuring this is the normalization step. In the box blur above, it happens by dividing the summed pixel color by totalSamples, the total amount of samples taken: a basic “calculate the average” expression.

The same can be expressed as the weights of a kernel: a number multiplied with each sample at that position. Since the box blur weighs all samples the same regardless of position, all weights are equal. This is visualized next. The bigger the kernel size, the smaller the weights.
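As a quick sketch in JavaScript (an illustration, not code from the article, but following the same kernelSize convention as the for-loop), building a normalized box kernel looks like this:

```javascript
// Build a normalized box-blur kernel. As in the for-loop, kernelSize
// counts samples per side away from the center, so the full kernel
// has (2 * kernelSize + 1)² entries.
function boxKernel(kernelSize) {
	const side = 2 * kernelSize + 1;
	// Normalization: equal weights that sum to exactly 1
	const weight = 1 / (side * side);
	return Array.from({ length: side * side }, () => weight);
}

const kernel = boxKernel(2); // a 5x5 kernel
console.log(kernel.length);  // 25
console.log(kernel.reduce((a, b) => a + b, 0)); // ~1
```

Dividing the color sum by totalSamples at the end of the shader loop and multiplying each sample by 1 / totalSamples up front are the same thing, just expressed differently.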

Kernels applied at the edges of our image will read from areas “outside” the image, with UV coordinates below (0,0) or above (1,1). Luckily, the GPU handles this for us and we are free to decide what happens to those outside samples, by setting the Texture Wrapping mode.

Texture Wrapping Modes and results on blurring
Texture Wrapping Modes and results on blurring (Note the color black bleeding-in)
Top: Framebuffer, zoomed out. Bottom: Framebuffer normal, with strong blur applied

Among others, we can define a solid color to be used, or “clamp” to the nearest edge’s color. If we choose a solid color, we get color bleeding at the edges. Thus for almost all post-processing use-cases, edge color clamping is used, as it prevents weird things happening at the edges. This article uses it too.

You may have noticed a black “blob” streaking along the bottom with stronger blur levels. Specifically here, it happens because the lines between the floor tiles align with the bottom edge, extending black color to infinity.

detective

Convolution as a mathematical concept is surprisingly deep, and 3blue1brown has an excellent video on it that even covers the image processing side. Theoretically, we won’t depart from convolutions: we can dissect our code and express it as weights and kernels. With the for-loop box blur, that was quite easy!

But what is a convolution?
YouTube Video by 3Blue1Brown

On a practical level though, understanding where the convolution is, how many there are and what kernels are at play will become more and more difficult once we leave the realm of classical blurs and consider the wider implications of reading between pixel bounds. But for now, we stay with the classics:

Gaussian Blur #

The most famous of blur algorithms is the Gaussian Blur. It uses the normal distribution, also known as the bell curve, to weight the samples inside the kernel, with a new variable sigma σ to control the flatness of the curve. Other than how the kernel weights are generated, the algorithm is identical to the box blur.

Gaussian blur weights formula for point (x,y) (Source)

To calculate the weight for point (x,y), the above formula is used. The Gaussian formula has a normalization multiplier 1/(2πσ²). In the code, there is no such thing though. The formula expresses the Gaussian curve as a continuous function going to infinity, but our code and its for-loop are different: discrete and finite.

float gaussianWeight(float x, float y, float sigma)
{
	// Unnormalized: the 1/(2*pi*sigma^2) term is omitted on purpose,
	// normalization happens by dividing by the weight sum instead
	return exp(-(x * x + y * y) / (2.0 * sigma * sigma));
}

For clarity, the kernel is generated in the fragment shader. Normally, that should be avoided: fragment shaders run per output pixel, but the kernel weights stay the same, making the recalculation inefficient.

teach

Just like with the box blur, weights are summed up and divided out at the end, instead of precalculating the weights with the 1/(2πσ²) term. sigma controls the sharpness of the curve and thus the blur strength, but wasn’t that the job of kernelSize? Play around with all the values below and get a feel for how they behave.
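What the shader’s double for-loop does can be sketched in JavaScript (an illustration, not code from the article): normalize by the discrete sum of the weights, making the continuous 1/(2πσ²) term unnecessary.

```javascript
// Discrete Gaussian kernel, normalized by the sum of its weights
// instead of the continuous 1/(2πσ²) term.
function gaussianKernel(kernelSize, sigma) {
	const weights = [];
	let weightSum = 0;
	for (let y = -kernelSize; y <= kernelSize; ++y) {
		for (let x = -kernelSize; x <= kernelSize; ++x) {
			const w = Math.exp(-(x * x + y * y) / (2 * sigma * sigma));
			weights.push(w);
			weightSum += w;
		}
	}
	// Dividing by the discrete sum guarantees the finite, cut-off
	// kernel still sums to exactly 1 - no brightening or darkening.
	return weights.map(w => w / weightSum);
}

const k = gaussianKernel(2, 1.0);
console.log(k.reduce((a, b) => a + b, 0)); // ~1, regardless of sigma
```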

❌ The browser killed this WebGL Context, please reload the page. If this happened as the result of a long benchmark, decrease the iteration count. On some platforms (iOS / iPad) you may have to restart the browser App completely, as the browser will temporarily refuse to allow this site to run WebGL again.

Blur Fragment Shader gaussianBlur.fs

precision highp float;

varying vec2 uv;

uniform vec2 frameSizeRCP; 
uniform float samplePosMult; 
uniform float sigma;

uniform float bloomStrength; 

uniform sampler2D texture;

const int kernel_size = KERNEL_SIZE;

float gaussianWeight(float x, float y, float sigma)
{
	// Unnormalized: the 1/(2*pi*sigma^2) term is omitted on purpose,
	// normalization happens by dividing by the weight sum instead
	return exp(-(x * x + y * y) / (2.0 * sigma * sigma));
}

void main() {
	// Running sum of weighted color samples
	vec4 sum = vec4(0.0);
	// Sum of all weights, divided out at the end to normalize
	float weightSum = 0.0;

	// Walk the (2 * kernel_size + 1)² square around the output pixel
	for (int y = -kernel_size; y <= kernel_size; ++y) {
		for (int x = -kernel_size; x <= kernel_size; ++x) {
			// Weight of this sample from the (unnormalized) bell curve
			float w = gaussianWeight(float(x), float(y), sigma);
			// Offset in UV space; frameSizeRCP converts pixels to UVs
			vec2 offset = vec2(x, y) * samplePosMult * frameSizeRCP;

			sum += texture2D(texture, uv + offset) * w;
			weightSum += w;
		}
	}

	// Normalize and apply brightness boost (1.0 outside bloom mode)
	gl_FragColor = (sum / weightSum) * bloomStrength;
}
WebGL Javascript gaussianBlur.js
import * as util from '../utility.js'

export async function setupGaussianBlur() {
	
	const WebGLBox = document.getElementById('WebGLBox-GaussianBlur');
	const WebGLBoxDetail = document.getElementById('WebGLBox-GaussianBlurDetail');
	const canvas = WebGLBox.querySelector('canvas');

	
	const radius = 0.12;

	
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	
	const ctx = {
		
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null },
		
		fb: { scene: null, final: null },
		
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, sigma: null, bloomStrength: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg', canvas.parentElement),
			contextLoss: canvas.parentElement.querySelector('div', canvas.parentElement),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			kernelSize: WebGLBox.querySelector('#sizeRange'),
			sigma: WebGLBox.querySelector('#sigmaRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: WebGLBoxDetail.querySelector('#renderer'),
			iterTime: WebGLBoxDetail.querySelector('#iterTime'),
			tapsCount: WebGLBoxDetail.querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const gaussianBlurFrag = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/gaussianBlur.fs");

	
	ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.sigma.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.kernelSize.addEventListener('input', () => {
		reCompileBlurShader(ui.blur.kernelSize.value);
		ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
	});

	
	ui.rendering.modes.forEach(radio => {
		
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		
		const worker = new Worker("./js/benchmark/gaussianBlurBenchmark.js", { type: "module" });

		
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			blurShaderSrc: gaussianBlurFrag,
			kernelSize: ui.blur.kernelSize.value,
			samplePos: ui.blur.samplePos.value,
			sigma: ui.blur.sigma.value
		});

		
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);


	
	function reCompileBlurShader(blurSize) {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, gaussianBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength", "sigma"], "#define KERNEL_SIZE " + blurSize + '\n');
	}

	
	reCompileBlurShader(ui.blur.kernelSize.value)

	
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);


		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		
		const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
		const tapsNewText = (canvas.width * canvas.height * KernelSizeSide * KernelSizeSide / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		
		gl.useProgram(ctx.shd.blur.handle);
		const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
		gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
		gl.viewport(0, 0, canvas.width, canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
		gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
		gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
		gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		if (ctx.mode == "bloom") {
			
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		
		gl.finish();

		
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

The blur looks way smoother than our previous box blur, with things generally taking on a “rounder” appearance due to the bell curve’s smooth signal response. That is, unless you move the sigma slider down. If you move sigma too low, you will get our previous box-blur-like artifacts again.

Let’s clear up what the values actually represent and how they interact. The following visualization shows the kernel with its weights expressed as height, in a dimetric perspective projection. There are two ways to express sigma, resulting in two different interaction modes when changing kernelSize.

sigma describes the flatness of our mathematical curve, a curve going to infinity. But our algorithm has a limited kernelSize. Where the kernel stops, pixel contributions stop too, leading to box-blur-like artifacts due to the cut-off. In the context of image processing, there are two ways to set up a Gaussian blur…

A small sigma, thus a flat bell curve, paired with a small kernel size effectively is a box blur, with the weights making the kernel box-shaped.

think

… way 1: Absolute Sigma, where sigma is an absolute value in pixels, independent of kernelSize, with kernelSize acting as a “window into the curve”. Or way 2: sigma is expressed relative to the current kernelSize. For practical reasons (finicky sliders), the relative-to-kernelSize mode is used everywhere in this article.

Either way, the infinite Gaussian curve will have a cut-off somewhere. sigma too small? We get box-blur-like artifacts. sigma too big? We waste blur efficiency, as the same perceived blur strength requires bigger kernels, thus bigger for-loops and lower performance. An artistic trade-off every piece of software has to make.

An optimal kernel would be one where the outer weights are almost zero. Thus, if we increased kernelSize in Absolute Sigma mode by one, it would make close to no visual difference.

teach
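A back-of-the-envelope check of that cut-off, sketched in JavaScript with a hypothetical kernelSize of 8 (our own illustration, not code from the article):

```javascript
// How much does the outermost sample still contribute, relative to the
// center sample? A quick look at the cut-off error for a few sigmas.
const gauss = (x, sigma) => Math.exp(-(x * x) / (2 * sigma * sigma));

const kernelSize = 8;
for (const sigma of [kernelSize, kernelSize / 3]) {
	const edge = gauss(kernelSize, sigma) / gauss(0, sigma);
	console.log(`sigma ${sigma}: edge weight ${(edge * 100).toFixed(1)}% of center`);
}
// sigma = kernelSize     -> ~60.7% (harsh cut-off, box-like artifacts)
// sigma = kernelSize / 3 -> ~1.1%  (outer weights near zero, clean cut-off)
```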

There are other ways of creating blur kernels, with other properties. One way is to follow Pascal’s triangle to get a set of predefined kernel sizes and weights. These are called Binomial Filters and lock us into specific “kernel presets”, but solve the infinity-vs-cut-off dilemma by moving the weights to zero within the sampling window.
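A sketch of that construction in JavaScript (the function is our own illustration, assuming the usual build-up of Pascal’s triangle):

```javascript
// Row n of Pascal's triangle, normalized into 1D blur weights.
// The weights naturally taper to (near) zero at the edges, sidestepping
// the Gaussian cut-off problem.
function binomialKernel(n) {
	let row = [1];
	for (let i = 0; i < n; ++i)
		row = row.map((v, j) => v + (row[j - 1] || 0)).concat(1);
	const sum = row.reduce((a, b) => a + b, 0); // always 2^n
	return row.map(v => v / sum);
}

console.log(binomialKernel(4)); // [1, 4, 6, 4, 1] / 16
```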

Binomial kernels are also Gaussian-like in their frequency response. We won’t expand on these further; just know that we can choose kernels by different mathematical criteria, chasing different signal response characteristics. But speaking of which, what even is “Gaussian-like”? Why do we care?

What is Gaussian-like? #

In post-processing blur algorithms, you generally find two categories: Bokeh Blurs and Gaussian-like Blurs. The Gaussian is chosen for its natural appearance, its ability to smooth colors without “standout features”. Gaussian Blurs are generally used as an ingredient in an overarching visual effect, be it frosted glass interfaces or Bloom.

Bokeh blur, gaussian blur comparison
Bokeh Blur and Gaussian Blur compared.

In contrast to that stands the “Bokeh Blur”, also known as “Lens Blur” or “Cinematic Blur”, used when emulating lenses or creating Depth of Field. This type of blur is the target visual effect itself. The challenges and approaches are very much related, but the algorithms used differ.

Algorithms get really creative in this space, all with different trade-offs and visuals. Some sample using a Poisson disk distribution and some show cool out-of-the-box thinking: Computerphile covered a complex-number-based approach to creating Bokeh Blurs, a fascinating number theory cross-over.

Video Game & Complex Bokeh Blurs
YouTube Video by Computerphile

This article though doesn’t care about these stylistic approaches. We are here to chase a basic building block of graphics programming and realtime visual effects: a “Gaussian-like” with good performance. Speaking of which!

Performance #

The main motivator of our journey here is the chase for realtime performance. Everything we do must happen within a few milliseconds. The expected performance of an algorithm and its practical cost once placed in the graphics pipeline are sometimes surprisingly different numbers though. Gotta measure!

This chapter is about a very technical motivation. If you don't care about how fast a GPU does what it does, feel free to skip this section.

happy

With performance being such a driving motivator, it would be a shame if we couldn’t measure it in this article. Each WebGL box has a benchmark function, which blurs random noise at a fixed resolution of 1600x1200, with the respective blur settings you chose and a fixed iteration count workload. A feature hidden so far.

Realtime graphics programming is sometimes more about measuring than programming.

laugh

Benchmarking is best done by measuring shader execution time. This can be done reliably in the browser, but only on some platforms; no way exists to do so across all of them. Luckily, there is the classic method of “stalling the graphics pipeline”: forcing a wait until all commands finish, a moment in time we can measure.

Across all platforms, a stall is guaranteed to occur on the command gl.readPixels(). Interestingly, the standards-conformant command for this, gl.finish(), is simply ignored by mobile Apple devices.

book

Below is a button that unlocks this benchmarking feature, unhiding a benchmark button and a Detailed Benchmark Results section under each blur. This allows you to start a benchmark with a preset workload on a separate Browser Worker. There is only one issue: browsers get very angry if you full-load the GPU this way.

If the graphics pipeline does work without reporting back to the browser (called “yielding”) for too long, browsers will simply kill all GPU access for the whole page until tab reload. If we yield back, the measured results become useless, and from inside WebGL we can’t stop the GPU once its commands are issued.

⚠️ Especially on mobile: please increase kernelSize and iterations slowly. The previous algorithms have bad kernelSize performance scaling on purpose; be especially careful with them.

Stay below 2 seconds of execution time, or the browser will lock GPU access for the page, disabling all blur examples until a browser restart is performed. On iOS Safari this requires a trip to the App Switcher; a page reload won't be enough.

iOS and iPadOS are especially strict and will keep GPU access disabled even on tab reload. You will have to go to the App Switcher (double tap the Home Button), swipe Safari up to close it and relaunch it from scratch.

miffed

What are we optimizing for? #

With the above Box Blur and Gaussian Blur, you will measure performance scaling very badly with kernelSize. Expressed in Big O notation, it has a performance scaling of O(pixelCount * kernelSize²): quadratic scaling of required texture taps in terms of kernelSize. We need to tackle this going forward.
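The scaling is easy to put into numbers. A sketch (our own, using the benchmark’s fixed 1600x1200 resolution):

```javascript
// Texture taps for one full-screen pass of a 2D kernel: every output
// pixel reads the whole (2 * kernelSize + 1)² square.
const taps = (width, height, kernelSize) =>
	width * height * (2 * kernelSize + 1) ** 2;

console.log(taps(1600, 1200, 4)); // 155,520,000 taps
console.log(taps(1600, 1200, 8)); // 554,880,000 taps
// Doubling kernelSize roughly quadruples the work - quadratic scaling.
```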

Especially dedicated laptop GPUs are slow to get out of their lower power states. Pressing the benchmark button multiple times in a row may result in the performance numbers getting better.

detective

Despite our Gaussian blur implementation calculating the kernel completely from scratch on every single pixel, the performance of the box blur and the Gaussian blur are very close to each other at higher iteration counts. In fact, by precalculating those kernels, we could make both perform identically.

But isn't gaussian blur a more complicated algorithm?

think

As opposed to chips from decades ago, modern graphics cards have very fast arithmetic, but comparatively slow memory access. With workloads like these, the slowest part becomes the memory access, in our case the texture taps. The more taps, the slower the algorithm.

Our blurs perform a dependent texture read, a graphics programming sin. This is when texture coordinates are determined during shader execution, which opts out of many automated shader optimizations.

teach

Especially on personal computers, you may also have noticed that increasing samplePosMult negatively impacts performance (up to a point), even though the required texture taps stay the same.

This is due to hardware texture caches accelerating texture reads that are spatially close together, something they cannot do effectively if the reads are too far apart. Platform-dependent tools like Nvidia Nsight can measure GPU cache utilization; the browser cannot.

These are key numbers graphics programmers chase when writing fragment shaders: Texture Taps and Cache Utilization. There is another one we will get into in a moment. Clearly, our blurs are slow. Time for a speed up!

Separable Gaussian Blur #

We have not yet left the classics of blur algorithms. One fundamental concept is still left on the table: “convolution separability”. Certain convolutions, like our Box Blur, our Gaussian Blur and the Binomial filtering mentioned in passing previously, can all be performed in two separate passes, by two separate 1D kernels.

Gaussian blur weights formula for point (x,y), separated

Not all convolutions are separable. In the context of graphics programming: if you can express the kernel weights as a formula with axes X and Y, and factor out X and Y into two separate formulas, then you have gained separability of the 2D kernel and can perform the convolution in two passes, massively saving on texture taps.
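For the Gaussian, the factoring works because the exponential of a sum is a product of exponentials. A quick sketch (our own illustration, hypothetical numbers):

```javascript
// The 2D Gaussian weight factors into two 1D Gaussian weights:
// exp(-(x² + y²)/2σ²) === exp(-x²/2σ²) * exp(-y²/2σ²)
const gauss1D = (x, sigma) => Math.exp(-(x * x) / (2 * sigma * sigma));
const gauss2D = (x, y, sigma) =>
	Math.exp(-(x * x + y * y) / (2 * sigma * sigma));

console.log(gauss2D(2, 3, 4));              // same value...
console.log(gauss1D(2, 4) * gauss1D(3, 4)); // ...as the product

// Taps per output pixel for kernelSize = 8:
const side = 2 * 8 + 1;
console.log(side * side, side + side); // 289 for 2D, 34 for two 1D passes
```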

Some big budget video games have used effects with kernels that are not separable, but ran them as two passes with 1D kernels anyway for the performance gain, with the resulting artifacts deemed not too bad.

detective

Computerphile covered the concept of separability in the context of 2D image processing really well, if you are interested in a more formal explanation.

Separable Filters and a Bauble
YouTube Video by Computerphile

Here is our Gaussian Blur, but expressed as a separable version. You can view just Pass 1 and Pass 2 in isolation, or see the final result. Same visual quality as our Gaussian Blur, same dials, but massively faster, with no more quadratic scaling of required texture taps.

Blur Fragment Shader gaussianBlurSeparable.fs

precision highp float;

varying vec2 uv;

uniform vec2 frameSizeRCP; 
uniform float samplePosMult; 
uniform float sigma;
uniform vec2 direction; 

uniform float bloomStrength; 

uniform sampler2D texture;

const int kernel_size = KERNEL_SIZE;

float gaussianWeight(float x, float sigma)
{
	// Unnormalized 1D gaussian; normalized via the weight sum later
	return exp(-(x * x) / (2.0 * sigma * sigma));
}

void main() {
	// Running sum of weighted color samples
	vec4 sum = vec4(0.0);
	// Sum of all weights, divided out at the end to normalize
	float weightSum = 0.0;

	// One 1D pass; direction is (1,0) for horizontal, (0,1) for vertical
	for (int i = -kernel_size; i <= kernel_size; ++i) {
		// Weight of this sample from the (unnormalized) 1D bell curve
		float w = gaussianWeight(float(i), sigma);
		// Offset along the pass direction, in UV space
		vec2 offset = vec2(i) * direction * samplePosMult * frameSizeRCP;

		sum += texture2D(texture, uv + offset) * w;
		weightSum += w;
	}

	// Normalize and apply brightness boost (1.0 outside bloom mode)
	gl_FragColor = (sum / weightSum) * bloomStrength;
}
WebGL Javascript gaussianSeparableBlur.js
import * as util from '../utility.js'

export async function setupGaussianSeparableBlur() {
	
	const WebGLBox = document.getElementById('WebGLBox-GaussianSeparableBlur');
	const canvas = WebGLBox.querySelector('canvas');

	
	const radius = 0.12;

	
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	
	const ctx = {
		
		mode: "scene",
		passMode: "pass1",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		
		tex: { sdr: null, selfIllum: null, frame: null, frameIntermediate: null, frameFinal: null },
		
		fb: { scene: null, intermediate: null, final: null },
		
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, sigma: null, bloomStrength: null, direction: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg', canvas.parentElement),
			contextLoss: canvas.parentElement.querySelector('div', canvas.parentElement),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			kernelSize: WebGLBox.querySelector('#sizeRange'),
			sigma: WebGLBox.querySelector('#sigmaRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[name="modeGaussSep"]'),
			passModes: WebGLBox.querySelectorAll('input[name="passMode"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#renderer'),
			passMode: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#passMode'),
			iterTime: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#iterTime'),
			tapsCount: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const gaussianBlurFrag = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/gaussianBlurSeparable.fs");

	
	ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.sigma.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.kernelSize.addEventListener('input', () => {
		reCompileBlurShader(ui.blur.kernelSize.value);
		ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
	});

	
	ui.rendering.modes.forEach(radio => {
		
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});
	
	
	ui.rendering.passModes.forEach(radio => {
		
		if (radio.value === "pass1")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.passMode = event.target.value;
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		
		const worker = new Worker("./js/benchmark/gaussianSeparableBlurBenchmark.js", { type: "module" });

		
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			blurShaderSrc: gaussianBlurFrag,
			kernelSize: ui.blur.kernelSize.value,
			samplePos: ui.blur.samplePos.value,
			sigma: ui.blur.sigma.value,
			passMode: ctx.passMode
		});

		
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;
			ui.benchmark.passMode.textContent = event.data.passMode;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);


	
	function reCompileBlurShader(blurSize) {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, gaussianBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength", "sigma", "direction"], "#define KERNEL_SIZE " + blurSize + '\n');
	}

	
	reCompileBlurShader(ui.blur.kernelSize.value)

	
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.intermediate);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.intermediate, ctx.tex.frameIntermediate] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate);
		gl.clearColor(0.0, 0.0, 0.0, 1.0);
		gl.clear(gl.COLOR_BUFFER_BIT);


		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		
		const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
		
		const samplesPerPixel = ctx.passMode == "combined" ? KernelSizeSide * 2 : KernelSizeSide;
		const tapsNewText = (canvas.width * canvas.height * samplesPerPixel / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		
		gl.useProgram(ctx.shd.blur.handle);
		
		if (ctx.passMode == "pass1") {
			
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.uniform2f(ctx.shd.blur.uniforms.direction, 1.0, 0.0); 
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
			gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
			gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		} else if (ctx.passMode == "pass2") {
			
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.uniform2f(ctx.shd.blur.uniforms.direction, 0.0, 1.0); 
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
			gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
			gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		} else {
			
			
			gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.uniform2f(ctx.shd.blur.uniforms.direction, 1.0, 0.0); 
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
			gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
			gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
			
			
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.uniform2f(ctx.shd.blur.uniforms.direction, 0.0, 1.0); 
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameIntermediate);
			gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
			gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		if (ctx.mode == "bloom") {
			
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		
		gl.finish();

		
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameIntermediate); ctx.tex.frameIntermediate = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.intermediate); ctx.fb.intermediate = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

If you benchmark it, you will see a massive performance uplift compared to our Gaussian Blur! But there is a trade-off that's not quite obvious. In order to have two passes, we have to write out an extra framebuffer. Remember the "modern chips are fast, but memory access in relation is not" thing?
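Where that uplift comes from can be put into napkin math (a hedged back-of-the-envelope cost model, not part of the demo's code): the 2D kernel's taps grow quadratically with kernel size, the separable version's only linearly.

```javascript
// Back-of-the-envelope tap counts: a full 2D kernel does side² taps per
// pixel in one pass, the separable version does side taps in each of
// its two passes. "halfSize" is the kernel half-width, so the kernel
// covers (2 * halfSize + 1) pixels per axis.
function tapsPerFrame(width, height, halfSize) {
  const side = 2 * halfSize + 1;
  return {
    full2D: width * height * side * side,
    separable: width * height * side * 2,
  };
}

// A 4K frame with a 33 × 33 kernel (halfSize 16)
const cost = tapsPerFrame(3840, 2160, 16);
console.log((cost.full2D / 1e9).toFixed(1));    // "9.0" billion taps
console.log((cost.separable / 1e9).toFixed(2)); // "0.55" billion taps
```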

In a modern video game on a high-res 4K screen, any multi-pass approach implies writing out roughly 8.3 million pixels to memory, just to read them back in. With smaller kernels on high-res displays, a separable kernel may not always be faster. But with bigger kernels, it almost always is. With a massive speed-up gained, how much faster can we go?

The magic of frequency space #

…how about blurs that happen so fast, they are considered free? For that, we'll take a bit of a detour into frequency space image manipulation.

Any 2D image can be converted to and edited in frequency space, which unlocks a whole new sort of image manipulation. To blur an image in this paradigm, we perform an image Fast Fourier Transform, mask out the high-frequency areas, and finally apply the inverse transform.

A Fourier Transform decomposes a signal into the sine frequencies it is made of. The output of an image Fast Fourier Transform is a pair of component images: "Magnitude" and "Phase". These can be combined back together with the inverse image FFT to produce the original image again…
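To make "Magnitude" and "Phase" concrete, here is a minimal 1D DFT sketch (illustrative only - the interactive demo below does not use this): every frequency bin is a complex number, and the two component images of an image FFT hold exactly these values per 2D frequency.

```javascript
// Minimal 1D DFT, purely to show where "Magnitude" and "Phase" come from.
function dft(signal) {
  const N = signal.length;
  return Array.from({ length: N }, (_, k) => {
    let re = 0, im = 0;
    for (let n = 0; n < N; n++) {
      const angle = (-2 * Math.PI * k * n) / N;
      re += signal[n] * Math.cos(angle);
      im += signal[n] * Math.sin(angle);
    }
    return {
      magnitude: Math.hypot(re, im), // brightness in the magnitude image
      phase: Math.atan2(im, re),     // stored in the phase image
    };
  });
}

// A pure cosine concentrates all its energy in one bin (plus its mirror)
const wave = Array.from({ length: 8 }, (_, n) => Math.cos((2 * Math.PI * n) / 8));
console.log(dft(wave)[1].magnitude.toFixed(2)); // "4.00"
```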

FFT Viz Input image
Input image for the following interactive FFT example
The green stripes are not an error, they are baked into the image on purpose.

…but before doing so, we can manipulate the frequency representation of the image in various ways. Less reading, more interaction! In the following interactive visualization you have the magnitude image on the left, brightness-boosted into a human-visible representation, and the reconstructed image on the right.

For now, play around with removing energy. You can paint on the magnitude image with your fingers or with the mouse. The output image will be reconstructed accordingly. Also, play around with the circular mask and the feathering sliders. Try to build intuition for what’s happening.

The magnitude image represents the frequency make-up of the image, with the lowest frequencies in the middle and the higher ones towards the edges. Horizontal frequencies (vertical features in the image) follow the X axis and vertical frequencies (horizontal features in the image) follow the Y axis, with the in-betweens along the diagonals.

Repeating patterns in the image light up as bright points in the magnitude representation. Or rather, their frequencies carry high energy: e.g. the green grid I added. Removing it in Photoshop wouldn't be easy, but in frequency space it is! Just paint over the three blueish diagonal streaks.

Removing repeating features by finger-painting black over frequencies still blows me away.

surprised

As you may have noticed, the magnitude representation holds mirrored information. This is because the FFT operates on complex numbers, while our image has only "real"-valued pixels, leaving redundant information. The underlying number theory was covered in great detail by 3Blue1Brown:

But what is the Fourier Transform? A visual introduction.
YouTube Video by 3Blue1Brown

The underlying code this time is not written by me, but comes from @turbomaze's repo JS-Fourier-Image-Analysis. There is no standard for how the magnitude information is supposed to be plotted and how the quadrants are laid out, so I changed @turbomaze's implementation to follow the convention used by ImageMagick.

We can blur the image by painting the frequency energy black in a radius around the center, eliminating the higher frequencies. If we do so with a pixel-perfect circle, we get ringing artifacts - the Gibbs phenomenon. By feathering the circle, we lessen the ringing and the blur cleans up.
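That feathered circle boils down to a per-frequency weight. A sketch of such a mask (a hypothetical helper mirroring the demo's radius and feathering sliders, not its actual code), where (cx, cy) is the center of the magnitude image, i.e. the lowest frequency:

```javascript
// Feathered circular low-pass mask: 1 inside the radius, 0 far outside,
// with a smoothstep falloff across the feather band to tame Gibbs ringing.
function lowPassMask(x, y, cx, cy, radius, feather) {
  const dist = Math.hypot(x - cx, y - cy);
  if (dist <= radius) return 1.0;           // low frequencies pass untouched
  if (dist >= radius + feather) return 0.0; // high frequencies are zeroed
  const t = (dist - radius) / feather;      // 0..1 across the feather band
  return 1.0 - t * t * (3.0 - 2.0 * t);     // smoothstep falloff against ringing
}

console.log(lowPassMask(64, 64, 64, 64, 20, 10)); // 1 - the DC term survives
console.log(lowPassMask(64, 24, 64, 64, 20, 10)); // 0 - fully cut
console.log(lowPassMask(64, 39, 64, 64, 20, 10)); // 0.5 - mid-feather
```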

Drawing a circle like this? That's essentially free on the GPU! We get the equivalent of super big kernels for free!

party

But not all that glitters is gold. First of all, performance. Yes, the "blur" in frequency space is essentially free, but the trip to frequency space is anything but. The main issue comes down to every input pixel influencing every output pixel, which the FFT only tames by doing logarithmically many full-image passes - a performance killer.
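Some hedged napkin math makes the point sharper: counting arithmetic alone, the forward FFT is even in the same ballpark as a separable blur - the killer is that each of its log₂(w·h) butterfly passes reads and writes the entire image.

```javascript
// Hypothetical op counts (arithmetic only, ignoring memory traffic):
function separableBlurOps(w, h, halfSize) {
  return w * h * (2 * halfSize + 1) * 2; // two passes, side taps each
}
function forwardFFTOps(w, h) {
  const n = w * h;
  return n * Math.log2(n); // row + column passes ≈ n·log2(n) butterflies
}

console.log((separableBlurOps(1024, 1024, 16) / 1e6).toFixed(1)); // "69.2"
console.log((forwardFFTOps(1024, 1024) / 1e6).toFixed(1));        // "21.0"
```

Comparable numbers on paper, yet every one of those 20 butterfly passes is a full framebuffer written out and read back in - the same memory trade-off as the separable blur, multiplied twenty-fold.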

And then there's still the inverse conversion!

facepalm

But our shaders work the other way around, expressing the "instructions to construct an output pixel". There are fragment-shader-based GPU implementations, but they rely on many passes for the calculation - a lot of memory access back and forth. Furthermore, non-power-of-two images require a slower algorithm.

This article is in the realm of fragment shaders and the graphics pipeline a GPU is part of, but there are also GPGPU and compute shader implementations with no fragment shader specific limitations. Unfortunately the situation remains: Conversion of high-res images to frequency space is too costly in the context of realtime graphics.

Deleting the frequencies of that grid is magical, but leaves artifacts. In reality it's worse, as my example is idealized. Click Upload Image, take a photo of a repeating pattern and see how cleanly you can get rid of it.

detective

Then there are the artifacts I have glossed over. The FFT treats the image as an infinitely repeating 2D signal. By blurring, we bleed color in from the neighboring copies. And that's not to mention the various ringing artifacts that happen. None of this is unsolvable! But there's a more fundamental issue…

What is a Low-Pass filter? #

It's a filter that removes high frequencies and leaves the low ones, easy!

happy

Try the FFT example again and decrease the frequencyCutRadius to blur. At some point the green lines disappear, right? It is a low-pass filter, one where high frequencies are literally annihilated. Small bright lights in the distance? Also annihilated…

If we were to use this to build an effect like bloom, it would remove the small lights that are meant to bloom! Our Gaussian blur on the other hand, also a low-pass filter, samples and weights every pixel. In a way, it "takes the high-frequency energy and spreads it into low-frequency energy".

So Low-Pass Filter ≠ Low-Pass Filter; what is meant depends on context, which is why this article hasn't used the term until now. Frequency-space energy attenuation is simply not the right tool for our goal of a "basic graphics programming building block" for visual effects.

This is a misunderstanding I held for years: why didn't video games use such a powerful tool?

speak

There are other frequency-space image representations, not just FFT Magnitude + Phase. Another famous one is the Discrete Cosine Transform. Again, Computerphile covered it in great detail in a video. As for realtime high-res images: no. DCT conversion is multiple orders of magnitude slower. Feel free to dive deeper into frequency space…

JPEG DCT, Discrete Cosine Transform (JPEG Pt2)
YouTube Video by Computerphile

…as for this article, it’s the end of our frequency space detour. We talked so much about what’s slow on the GPU. Let’s talk about something that’s not just fast, but free:

Bilinear Interpolation #

Reading from textures comes with a freebie: when reading between pixels, the closest four pixels are interpolated bilinearly to create the final read, unless you switch to Nearest Neighbor mode. Below you can drag the color sample with a finger or the mouse. Take note of how and when the color changes in the respective modes.
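What the texture unit does under the hood can be sketched in a few lines (a software model for one color channel, not how the demo reads textures):

```javascript
// Bilinear read: linearly mix the four closest texels, first across x,
// then across y. Texel values live in a flat array, coordinates are in
// texel space with (0, 0) at the first texel's center.
function bilinearSample(texels, width, u, v) {
  const x0 = Math.floor(u), y0 = Math.floor(v);
  const fx = u - x0, fy = v - y0; // fractional position inside the 2 × 2 square
  const t = (x, y) => texels[y * width + x];
  const top    = t(x0, y0)     * (1 - fx) + t(x0 + 1, y0)     * fx;
  const bottom = t(x0, y0 + 1) * (1 - fx) + t(x0 + 1, y0 + 1) * fx;
  return top * (1 - fy) + bottom * fy;
}

// 2 × 2 texture, read exactly between all four texels
const texels = [0, 100, 200, 300];
console.log(bilinearSample(texels, 2, 0.5, 0.5)); // 150 - the average of all four
```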

Since reading between pixels gives a linear mix of pixel neighbors, we can linearly interpolate part of our Gaussian kernel, sometimes called a Linear Gaussian. By tweaking the Gaussian weights and reducing the number of samples, we can capture a 7 × 7 Gaussian kernel's worth of information with only a 4 × 4 kernel, as shown in the linked article.
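The trick boils down to merging two adjacent taps into one, placed where the hardware's linear mix reproduces both weights. A sketch of the math, with made-up example weights:

```javascript
// Linear Gaussian tap merging: two adjacent discrete taps with weights
// w1 and w2 collapse into a single bilinear tap, positioned so the
// hardware's linear mix of the two texels reproduces both weights
// exactly. Offsets are in texels along the blur direction.
function combineTaps(offset1, w1, offset2, w2) {
  const weight = w1 + w2;
  const offset = (offset1 * w1 + offset2 * w2) / weight;
  return { offset, weight };
}

// Hypothetical 1D gaussian weights for the taps at offsets 1 and 2
const merged = combineTaps(1.0, 0.3, 2.0, 0.15);
console.log(merged.weight.toFixed(2)); // "0.45" - one texture read instead of two
console.log(merged.offset.toFixed(4)); // "1.3333"
```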

Though mathematically not the same, visually the result is very close. There are a lot of hand-crafted variations on this, different mixes of kernel sizes and interpolation amounts.

speak

Bilinear interpolation also allows us to resize an image by reading from it at a lower resolution. In a way, it's a free bilinear resize built into every graphics chip, with zero performance impact. But there is a limit - bilinear interpolation only ever considers a 2 × 2 sample square. Try resizing the kiwi below in the different modes.

To make this more obvious, the following canvas renders at 25% of native resolution.

teach

❌ The browser killed this WebGL Context, please reload the page. If this happened as the result of a long benchmark, decrease the iteration count. On some platforms (iOS / iPad) you may have to restart the browser App completely, as the browser will temporarily refuse to allow this site to run WebGL again.

WebGL Vertex Shader circleAnimationSize.vs

attribute vec2 vtx;
varying vec2 uv;


uniform vec2 offset;
uniform float kiwiSize;

void main()
{
	
	uv = vtx * vec2(0.5, -0.5) + 0.5;
	
	
	gl_Position = vec4(vtx * kiwiSize + offset, 0.0, 1.0);
}
WebGL Fragment Shader simpleTexture.fs
precision highp float;
varying vec2 uv;

uniform sampler2D texture;

void main() {
	gl_FragColor = texture2D(texture, uv);
}
WebGL Javascript bilinear.js
import * as util from './utility.js'

export async function setupBilinear() {
	
	const WebGLBox = document.getElementById('WebGLBox-Bilinear');
	const canvas = WebGLBox.querySelector('canvas');

	
	const radius = 0.12;
	
	
	const resDiv = 4; 
	let renderFramebuffer, renderTexture;
	let buffersInitialized = false;

	
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: true,
	});

	
	const ctx = {
		mode: "nearest",
		flags: { isRendering: false, initComplete: false },
		
		tex: { sdr: null },
		
		shd: {
			kiwi: { handle: null, uniforms: { offset: null, kiwiSize: null } },
			blit: { handle: null, uniforms: { texture: null } }
		}
	};

	
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
		},
		rendering: {
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			animate: WebGLBox.querySelector('#animateCheck'),
			kiwiSize: WebGLBox.querySelector('#kiwiSize'),
		}
	};

	
	ui.rendering.modes.forEach(radio => {
		
		if (radio.value === "nearest")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			if (!ui.rendering.animate.checked) redraw();
		});
	});


	
	const circleAnimationSize = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/circleAnimationSize.vs");
	const simpleTexture = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/simpleTexture.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");

	
	ui.rendering.kiwiSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	
	ctx.shd.kiwi = util.compileAndLinkShader(gl, circleAnimationSize, simpleTexture, ["offset", "kiwiSize"]);
	
	
	ctx.shd.blit = util.compileAndLinkShader(gl, simpleQuad, simpleTexture, ["texture"]);
	
	
	gl.useProgram(ctx.shd.kiwi.handle);

	
	util.bindUnitQuad(gl);

	
	function loadSVGAsImage(blob) {
		return new Promise((resolve) => {
			const img = new Image();
			const url = URL.createObjectURL(blob);
			
			img.onload = () => {
				URL.revokeObjectURL(url);
				resolve(img);
			};
			
			img.src = url;
		});
	}

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		buffersInitialized = true;
		ctx.flags.initComplete = false;

		
		gl.deleteFramebuffer(renderFramebuffer);
		renderFramebuffer = gl.createFramebuffer();
		gl.bindFramebuffer(gl.FRAMEBUFFER, renderFramebuffer);

		
		gl.deleteTexture(renderTexture);
		renderTexture = gl.createTexture();
		gl.bindTexture(gl.TEXTURE_2D, renderTexture);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
		gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, canvas.width / resDiv, canvas.height / resDiv, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
		gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, renderTexture, 0);

		
		let base = await fetch("img/kiwi4by3.svg");
		let baseBlob = await base.blob();
		let baseImage = await loadSVGAsImage(baseBlob);
		let baseBitmap = await createImageBitmap(baseImage, { resizeWidth: canvas.width / resDiv, resizeHeight: canvas.height / resDiv, colorSpaceConversion: 'none', resizeQuality: "high" });

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.NEAREST, baseBitmap, 4);

		baseBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	async function redraw() {
		if (!buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		
		gl.viewport(0, 0, canvas.width / resDiv, canvas.height / resDiv);
		if (!renderFramebuffer) return;
		gl.bindFramebuffer(gl.FRAMEBUFFER, renderFramebuffer);
		gl.clear(gl.COLOR_BUFFER_BIT);
		
		
		gl.useProgram(ctx.shd.kiwi.handle);
		
		
		gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, ctx.mode == "nearest" ? gl.NEAREST : gl.LINEAR);
		gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, ctx.mode == "nearest" ? gl.NEAREST : gl.LINEAR);

		
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		
		gl.uniform2fv(ctx.shd.kiwi.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.kiwi.uniforms.kiwiSize, ui.rendering.kiwiSize.value);

		
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		gl.viewport(0, 0, canvas.width, canvas.height);
		gl.bindFramebuffer(gl.FRAMEBUFFER, null);
		
		
		gl.useProgram(ctx.shd.blit.handle);
		
		
		if (!renderTexture) return;
		gl.bindTexture(gl.TEXTURE_2D, renderTexture);
		
		
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
	}

	let animationFrameId;

	
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;
			gl.viewport(0, 0, canvas.width, canvas.height);

			stopRendering();
			startRendering();
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		
		gl.finish();

		
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(renderTexture); renderTexture = null;
		gl.deleteFramebuffer(renderFramebuffer); renderFramebuffer = null;
		buffersInitialized = false;
		ctx.flags.initComplete = false;
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

Nearest Neighbor looks pixelated whenever the size is not at 100%, 100% being equivalent to a 1:1 pixel mapping. At 100% it moves "jittery", as it "snaps" to the nearest neighbor. Bilinear keeps things smooth, but going below 50%, especially below 25%, we get exactly the same kind of aliasing as we would from nearest neighbor!

You may have noticed similar aliasing when playing YouTube Videos at a very high manually selected video resolution, but in a small window. Same thing!

detective

With 2 × 2 samples, we start skipping over color information once the underlying pixels become smaller than half a pixel in size. Below 50% size, our bilinear interpolation starts to act like nearest neighbor interpolation. As a result, we can shrink an image in steps of 50% without skipping over information and creating aliasing. Let's use that!

Downsampling #

One fundamental thing you can do in post-processing is to shrink ("downsample") first, perform the processing at a lower resolution and upsample again - the idea being that you won't notice the lowered resolution. Below is the separable Gaussian blur again, with a variable downsample / upsample chain.

Each increase of downSample adds a 50% scale step. Let's visualize the framebuffers in play, as it gets quite complex. Here is an example with a square 1024 px² image, a downSample of 2 and our two-pass separable Gaussian blur.

Downsample and Blur Framebuffers
Framebuffers and their sizes, as used during the downsample + blur chain
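That chain can be sketched as a hypothetical helper (sizes in pixels, assuming power-of-two dimensions): each downSample step halves the resolution, the two blur passes run at the smallest size, then we walk the chain back up.

```javascript
// Framebuffer chain for a downsample + separable blur + upsample setup.
function framebufferChain(size, downSample) {
  const down = [];
  for (let i = 1; i <= downSample; i++) down.push(size >> i); // halve per step
  const blurAt = down.length ? down[down.length - 1] : size;  // H + V pass here
  const up = down.slice(0, -1).reverse().concat(size);        // back to native
  return { down, blurAt, up };
}

// The example from the figure: 1024 px² image, downSample of 2
console.log(framebufferChain(1024, 2));
// { down: [ 512, 256 ], blurAt: 256, up: [ 512, 1024 ] }
```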

One unused optimization: the blur could read straight from the 512 px² framebuffer and output to the 256 px² one directly, skipping one downsample step.

detective

Below you have the option to skip part of the downsample or part of the upsample chain, if downSample is set higher than 1. What may not be quite obvious is why we also upsample in steps. Play around with all the dials and modes to get a feel for what's happening.

Blur Fragment Shader gaussianSeparableBlur.fs

precision highp float;

varying vec2 uv;

uniform vec2 frameSizeRCP; 
uniform float samplePosMult; 
uniform float sigma;
uniform vec2 direction; 

uniform float bloomStrength; 

uniform sampler2D texture;

const int kernel_size = KERNEL_SIZE;

float gaussianWeight(float x, float sigma)
{
	
	return exp(-(x * x) / (2.0 * sigma * sigma));
}

void main() {
	
	vec4 sum = vec4(0.0);
	
	float weightSum = 0.0;
	
	
	const int size = 2 * kernel_size + 1;

	
	for (int i = -kernel_size; i <= kernel_size; ++i) {
		
		float w = gaussianWeight(float(i), sigma);
		
		
		vec2 offset = vec2(i) * direction * samplePosMult * frameSizeRCP;

		
		sum += texture2D(texture, uv + offset) * w;
		weightSum += w;
	}

	
	gl_FragColor = (sum / weightSum) * bloomStrength;
}
WebGL Javascript downsample.js
import * as util from '../utility.js'



export async function setupGaussianDownsampleBlur() {
	
	const WebGLBox = document.getElementById('WebGLBox-GaussianDownsampleBlur');
	const canvas = WebGLBox.querySelector('canvas');

	
	const radius = 0.12;

	
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	
	const ctx = {
		
		mode: "scene",
		skipMode: "normal",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null, down: [], intermediate: [], nativeIntermediate: null },
		
		fb: { scene: null, final: null, down: [], intermediate: [], nativeIntermediate: null },
		
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			passthrough: { handle: null },
			blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, sigma: null, bloomStrength: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			kernelSize: WebGLBox.querySelector('#sizeRange'),
			sigma: WebGLBox.querySelector('#sigmaRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
			downSample: WebGLBox.querySelector('#downSampleRange'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			skipModes: WebGLBox.querySelectorAll('input[name="skipMode"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: document.getElementById('WebGLBox-GaussianDownsampleBlurDetail').querySelector('#renderer'),
			skipMode: document.getElementById('WebGLBox-GaussianDownsampleBlurDetail').querySelector('#skipMode'),
			iterTime: document.getElementById('WebGLBox-GaussianDownsampleBlurDetail').querySelector('#iterTime'),
			tapsCount: document.getElementById('WebGLBox-GaussianDownsampleBlurDetail').querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const gaussianBlurFrag = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/gaussianBlurSeparable.fs");

	
	ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.sigma.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.downSample.addEventListener('input', () => { 
		updateSkipModeControls();
		if (!ui.rendering.animate.checked) redraw() 
	});

	
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.kernelSize.addEventListener('input', () => {
		reCompileBlurShader(ui.blur.kernelSize.value);
		ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
	});

	
	ui.rendering.modes.forEach(radio => {
		
		if (radio.name === "skipMode") return;
		
		
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	
	ui.rendering.skipModes.forEach(radio => {
		
		if (radio.value === "normal")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.skipMode = event.target.value;
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	
	function updateSkipModeControls() {
		const hasIntermediarySteps = ui.blur.downSample.value > 1;
		ui.rendering.skipModes.forEach(radio => {
			radio.disabled = !hasIntermediarySteps;
		});
		
		if (!hasIntermediarySteps && ctx.skipMode !== "normal") {
			ctx.skipMode = "normal";
		}
		
		ui.rendering.skipModes.forEach(radio => {
			radio.checked = (radio.value === ctx.skipMode);
		});
	}

	
	updateSkipModeControls();

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		
		const worker = new Worker("./js/benchmark/downsampleBenchmark.js", { type: "module" });

		
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			blurShaderSrc: gaussianBlurFrag,
			kernelSize: ui.blur.kernelSize.value,
			samplePos: ui.blur.samplePos.value,
			sigma: ui.blur.sigma.value,
			downSample: ui.blur.downSample.value,
			skipMode: ctx.skipMode
		});

		
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;
			ui.benchmark.skipMode.textContent = event.data.skipMode;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	
	ctx.shd.passthrough = util.compileAndLinkShader(gl, simpleQuad, simpleTexture);

	
	function reCompileBlurShader(blurSize) {
		ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, gaussianBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength", "sigma", "direction"], "#define KERNEL_SIZE " + blurSize + '\n');
	}

	
	reCompileBlurShader(ui.blur.kernelSize.value)

	
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		
		
		gl.deleteFramebuffer(ctx.fb.nativeIntermediate);
		gl.deleteTexture(ctx.tex.nativeIntermediate);
		[ctx.fb.nativeIntermediate, ctx.tex.nativeIntermediate] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		const maxDown = ui.blur.downSample.max;
		for (let i = 0; i < maxDown; ++i) {
			gl.deleteFramebuffer(ctx.fb.down[i]);
			gl.deleteTexture(ctx.tex.down[i]);
			gl.deleteFramebuffer(ctx.fb.intermediate[i]);
			gl.deleteTexture(ctx.tex.intermediate[i]);
		}
		ctx.fb.down = [];
		ctx.tex.down = [];
		ctx.fb.intermediate = [];
		ctx.tex.intermediate = [];

		let w = canvas.width, h = canvas.height;
		for (let i = 0; i < maxDown; ++i) {
			w = Math.max(1, w >> 1);
			h = Math.max(1, h >> 1);
			const [fb, tex] = util.setupFramebuffer(gl, w, h);
			ctx.fb.down.push(fb);
			ctx.tex.down.push(tex);
			const [intermediateFb, intermediateTex] = util.setupFramebuffer(gl, w, h);
			ctx.fb.intermediate.push(intermediateFb);
			ctx.tex.intermediate.push(intermediateTex);
		}

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	
	function performSeparableBlur(srcTexture, targetFB, width, height, intermediateFB, intermediateTex, bloomStrength) {
		gl.useProgram(ctx.shd.blur.handle);
		
		
		gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / width, 1.0 / height);
		gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
		gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
		gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, bloomStrength);
		
		
		gl.bindFramebuffer(gl.FRAMEBUFFER, intermediateFB);
		gl.viewport(0, 0, width, height);
		gl.uniform2f(ctx.shd.blur.uniforms.direction, 1.0, 0.0); 
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, srcTexture);
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		
		
		gl.bindFramebuffer(gl.FRAMEBUFFER, targetFB);
		gl.viewport(0, 0, width, height);
		gl.uniform2f(ctx.shd.blur.uniforms.direction, 0.0, 1.0); 
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, intermediateTex);
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		
		const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
		const effectiveRes = [Math.max(1, canvas.width >> +ui.blur.downSample.value), Math.max(1, canvas.height >> +ui.blur.downSample.value)];
		const tapsNewText = (effectiveRes[0] * effectiveRes[1] * KernelSizeSide * 2 / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		
		const levels = ui.blur.downSample.value;
		let srcTex = ctx.tex.frame;
		let w = canvas.width, h = canvas.height;

		if (levels > 0) {
			if (ctx.skipMode === "skipDown") {
				
				const lastDownsampleFB = ctx.fb.down[levels - 1];
				const lastIntermediateFB = ctx.fb.intermediate[levels - 1];
				const lastIntermediateTex = ctx.tex.intermediate[levels - 1];
				
				w = Math.max(1, canvas.width >> levels);
				h = Math.max(1, canvas.height >> levels);
				const bloomStrength = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
				
				performSeparableBlur(srcTex, lastDownsampleFB, w, h, lastIntermediateFB, lastIntermediateTex, bloomStrength);
				srcTex = ctx.tex.down[levels - 1];
			} else {
				
				gl.useProgram(ctx.shd.passthrough.handle);
				for (let i = 0; i < levels - 1; ++i) {
					const fb = ctx.fb.down[i];
					w = Math.max(1, w >> 1);
					h = Math.max(1, h >> 1);

					gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
					gl.viewport(0, 0, w, h);

					gl.activeTexture(gl.TEXTURE0);
					gl.bindTexture(gl.TEXTURE_2D, srcTex);
					gl.uniform1i(gl.getUniformLocation(ctx.shd.passthrough.handle, "texture"), 0);
					gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
					srcTex = ctx.tex.down[i];
				}

				
				const lastDownsampleFB = ctx.fb.down[levels - 1];
				const lastIntermediateFB = ctx.fb.intermediate[levels - 1];
				const lastIntermediateTex = ctx.tex.intermediate[levels - 1];
				w = Math.max(1, w >> 1);
				h = Math.max(1, h >> 1);
				const bloomStrength = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
				
				performSeparableBlur(srcTex, lastDownsampleFB, w, h, lastIntermediateFB, lastIntermediateTex, bloomStrength);
				srcTex = ctx.tex.down[levels - 1];
			}
		} else {
			
			const bloomStrength = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
			
			performSeparableBlur(srcTex, ctx.fb.final, canvas.width, canvas.height, ctx.fb.nativeIntermediate, ctx.tex.nativeIntermediate, bloomStrength);
			srcTex = ctx.tex.frameFinal;
		}

		
		if (levels > 0) {
			if (ctx.skipMode === "skipUp") {
				
				
			} else {
				
				gl.useProgram(ctx.shd.passthrough.handle);
				for (let i = levels - 2; i >= 0; i--) {
					const fb = ctx.fb.down[i];
					let upsampleW = Math.max(1, canvas.width >> (i + 1));
					let upsampleH = Math.max(1, canvas.height >> (i + 1));
					gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
					gl.viewport(0, 0, upsampleW, upsampleH);
					gl.activeTexture(gl.TEXTURE0);
					gl.bindTexture(gl.TEXTURE_2D, srcTex);
					gl.uniform1i(gl.getUniformLocation(ctx.shd.passthrough.handle, "texture"), 0);
					gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
					srcTex = ctx.tex.down[i];
				}
			}
		}

		
		
		if (!(ctx.mode == "bloom" && levels == 0)) {
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.useProgram(ctx.shd.passthrough.handle);
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, srcTex);
			gl.uniform1i(gl.getUniformLocation(ctx.shd.passthrough.handle, "texture"), 0);
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		if (ctx.mode == "bloom") {
			
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if ((width && canvas.width !== width) || (height && canvas.height !== height)) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		
		gl.finish();

		
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		gl.deleteFramebuffer(ctx.fb.nativeIntermediate); ctx.fb.nativeIntermediate = null;
		gl.deleteTexture(ctx.tex.nativeIntermediate); ctx.tex.nativeIntermediate = null;
		for (let i = 0; i < ui.blur.downSample.max; ++i) {
			gl.deleteTexture(ctx.tex.down[i]);
			gl.deleteFramebuffer(ctx.fb.down[i]);
			gl.deleteTexture(ctx.tex.intermediate[i]);
			gl.deleteFramebuffer(ctx.fb.intermediate[i]);
		}
		ctx.tex.down = [];
		ctx.fb.down = [];
		ctx.tex.intermediate = [];
		ctx.fb.intermediate = [];
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

With each downsample step, our kernel covers more and more area, thus increasing the blur radius. Performance improves massively once again, as each downsample step quarters the number of pixels we have to blur. We get bigger blurs with the same kernelSize, and with stronger blurs in Scene mode, the resolution drop is not visible.
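The quadratic savings can be sketched in a few lines of JavaScript. This is a rough cost model, not code from the demos above; `blurTaps` is a hypothetical helper, and the 1920×1080 frame with kernelSize 16 is just an assumed example:

```javascript
// Sketch: tap cost of the separable blur at different downsample levels.
// Each halving of width and height quarters the pixel count.
function blurTaps(width, height, kernelSize, downSample) {
	// Resolution after `downSample` halvings, clamped to at least 1 pixel
	const w = Math.max(1, width >> downSample);
	const h = Math.max(1, height >> downSample);
	// Separable blur: (2 * kernelSize + 1) taps per pixel, horizontal + vertical pass
	return w * h * (2 * kernelSize + 1) * 2;
}

for (let level = 0; level <= 3; level++) {
	console.log(`level ${level}: ${(blurTaps(1920, 1080, 16, level) / 1e6).toFixed(1)} million taps`);
}
```

At level 0 that is ~136.9 million taps; by level 3 it has dropped to ~2.1 million, while the blur radius in screen space has grown eightfold.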

With smaller blurs you will get “shimmering”, as aliasing artifacts appear even with our bilinear filtering in place. Small blurs and low resolutions don’t mix. This is especially painful in bloom mode with a strong lightBrightness, as lights will appear to “turn on and off” when they are not resolved correctly at lower resolutions.

There must be some kind of sweet spot: a resolution low enough to be cheap, paired with a blur strong enough to hide the low resolution.


Skipping downsample steps obviously brings horrible aliasing. As for upsampling, there is a deep misunderstanding I held for years, until I read the SIGGRAPH 2014 presentation Next Generation Post Processing in Call of Duty: Advanced Warfare by graphics magician Jorge Jimenez. One page stuck out to me:

Page 159 from presentation Next Generation Post Processing in Call of Duty: Advanced Warfare by Jorge Jimenez

With upsampling, even when going from low res to high res in one jump, we aren’t “skipping” any information, right? Nothing is missed. But if you look closely at the above demo, with larger downSample chains in Skip Upsample Steps mode, you will see a vague grid-like artifact appear, especially with strong blurs. This point is expanded upon in the addendum.

Nearest Neighbor Interpolation vs. Bilinear Interpolation
Visualization of the bilinear interpolation (Source)

How to keep things smooth when upsampling is the field of “reconstruction filters”. By skipping intermediate upsampling steps, we are performing a 2 × 2 sample bilinear reconstruction of very small pixels. As a result, we get the pyramid-shaped hot spots characteristic of bilinear filtering. How we upscale matters.
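To illustrate where those hot spots come from, here is a sketch of plain bilinear reconstruction on a single-channel image. `bilinearSample` is a hypothetical helper, not part of the demo code; it just shows the tent weighting that a 2 × 2 bilinear fetch performs:

```javascript
// Sketch: bilinearly sample a 1-channel image stored as a flat array.
// Sampling between texel centers blends the 4 nearest texels by distance --
// exactly the "tent" weighting that produces pyramid-shaped hot spots
// when tiny pixels are blown up in one jump.
function bilinearSample(img, width, x, y) {
	const x0 = Math.floor(x), y0 = Math.floor(y);
	const x1 = Math.min(x0 + 1, width - 1);
	const y1 = Math.min(y0 + 1, img.length / width - 1);
	const fx = x - x0, fy = y - y0;
	// Blend horizontally on both rows, then vertically between the rows
	const top = img[y0 * width + x0] * (1 - fx) + img[y0 * width + x1] * fx;
	const bottom = img[y1 * width + x0] * (1 - fx) + img[y1 * width + x1] * fx;
	return top * (1 - fy) + bottom * fy;
}

// 2x2 image with one bright texel; the point between all 4 texels
// receives exactly a quarter of its brightness
const img = [1.0, 0.0,
             0.0, 0.0];
console.log(bilinearSample(img, 2, 0.5, 0.5)); // 0.25
```

Between any two texel centers the weight falls off linearly, so a lone bright low-res pixel upscaled this way becomes a pyramid, not a smooth blob.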

Smooth Blur Animation #

One fundamental challenge with advanced blur algorithms is getting smooth blur sliders and smooth blur strength animations. E.g. with our separable gaussian blur, you could set kernelSize to the maximum required and adjust samplePosMultiplier smoothly between 0% and 100%.

With downsampling in the picture, this becomes more difficult, and solutions are very context-dependent, so we won’t dive into them. One approach you see from time to time is to simply give up on animating blur strength and blend between a blurred and an unblurred version of the scene, as shown below. Visually, not very pleasing.
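That crossfade approach boils down to a per-channel linear blend, the same thing GLSL’s mix() does. A minimal sketch, with hypothetical helpers `mix` and `crossfade` (not part of the demo code):

```javascript
// Sketch: crossfading between a sharp and a fully blurred frame,
// instead of animating the blur strength itself.
function mix(a, b, t) {
	// Linear interpolation, matching GLSL's mix()
	return a * (1 - t) + b * t;
}

function crossfade(sharpPixel, blurredPixel, blurAmount) {
	// blurAmount 0.0 = fully sharp, 1.0 = fully blurred
	return sharpPixel.map((c, i) => mix(c, blurredPixel[i], blurAmount));
}

console.log(crossfade([1.0, 0.5, 0.0], [0.25, 0.25, 0.25], 0.5)); // -> [0.625, 0.375, 0.125]
```

The downside is visible in the demo above: mid-fade you see a ghost of the sharp image superimposed on the blur, rather than an image that is actually half as blurry.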

blurSliderDemo1 blurSliderDemo2

Kawase Blur #

Now we leave the classical blur approaches behind. It’s the early 2000s, and graphics programmer Masaki Kawase, today a senior graphics programmer at the Tokyo-based company Silicon Studio, is programming the Xbox video game DOUBLE-S.T.E.A.L, a game with vibrant post-processing effects.

During the creation of those visual effects, Masaki Kawase used a new blurring technique, which he presented in the 2003 Game Developers Conference talk Frame Buffer Postprocessing Effects in DOUBLE-S.T.E.A.L (Wreckless). This technique later became known as the “Kawase Blur”. Let’s take a look at it:

kawase blur filter pattern kawase blur filter pattern
Sample placement in what later became known as the "Kawase Blur"
Excerpt from GDC presentation Frame Buffer Postprocessing Effects in DOUBLE-S.T.E.A.L (2003)

This technique no longer has a kernelSize parameter. It works in passes of 4 equally weighted samples, placed diagonally from the center output pixel, right where the corners of 4 texels touch. Thanks to bilinear filtering, each sample receives equal color contributions from its 4 neighboring texels.

This is new: there is no center pixel sample and, except for the required normalization, no explicit weights! The weighting happens as a result of bilinear filtering.
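To see how bilinear filtering does the weighting, here is a sketch that accumulates the effective per-texel weights of a single pass with pixelOffset 0. `kawasePassWeights` is a hypothetical helper for illustration; the tap layout follows the GDC slide above:

```javascript
// Sketch: effective per-texel weights of one Kawase pass, pixelOffset 0.
// Each of the 4 corner taps sits where 4 texel corners meet, so bilinear
// filtering blends those 4 texels with equal 1/4 weights; the 4 taps
// themselves are then averaged (1/4 each).
function kawasePassWeights() {
	const weights = {}; // keyed "x,y", texel offsets relative to the output pixel
	const corners = [[0.5, 0.5], [-0.5, 0.5], [-0.5, -0.5], [0.5, -0.5]];
	for (const [cx, cy] of corners) {
		// The 4 texels sharing the corner this tap lands on
		for (const tx of [Math.floor(cx), Math.ceil(cx)]) {
			for (const ty of [Math.floor(cy), Math.ceil(cy)]) {
				const key = `${tx},${ty}`;
				weights[key] = (weights[key] || 0) + (1 / 4) * (1 / 4);
			}
		}
	}
	return weights;
}

console.log(kawasePassWeights()["0,0"]); // 0.25
```

The result is a 3 × 3 tent kernel: 1/4 at the center, 1/8 on the edges, 1/16 in the corners — all from just 4 texture fetches.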


After a pass is complete, its output is used as the input to the next pass, where the 4 diagonal samples increase their distance by one pixel length. With each pass, this distance grows. Two framebuffers are required for this, swapping roles as input and output between passes. This setup is often called “ping-ponging”.
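The ping-ponging can be sketched as follows. The buffer names and the helper `pingPongPasses` are hypothetical, with no actual WebGL calls; it only shows how the read/write roles alternate while the sample distance grows:

```javascript
// Sketch: plan out the ping-pong schedule for N Kawase passes.
// Two buffers alternate between being the read texture and the
// render target; the pixelOffset grows by one each pass.
function pingPongPasses(iterations) {
	const passes = [];
	let read = "bufferA", write = "bufferB";
	for (let i = 0; i < iterations; i++) {
		passes.push({ read, write, pixelOffset: i });
		[read, write] = [write, read]; // swap roles for the next pass
	}
	return passes;
}

console.log(pingPongPasses(3));
// pass 0: read bufferA -> write bufferB, offset 0
// pass 1: read bufferB -> write bufferA, offset 1
// pass 2: read bufferA -> write bufferB, offset 2
```

In the real demo below, the last pass writes to the final framebuffer (or the screen) instead of a ping-pong buffer, as seen in the redraw() loop of kawase.js.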


WebGL Fragment Shader kawase.fs

precision highp float;

varying vec2 uv;

uniform vec2 frameSizeRCP; 
uniform float samplePosMult; 
uniform float pixelOffset; 
uniform float bloomStrength; 

uniform sampler2D texture;

void main() {
	/* Half-pixel offset places each tap where the corners of 4 texels meet */
	vec2 o = vec2(pixelOffset + 0.5) * samplePosMult * frameSizeRCP;

	/* 4 equally weighted diagonal taps; bilinear filtering averages
	   4 texels per tap for free */
	vec4 color = vec4(0.0);
	color += texture2D(texture, uv + vec2( o.x,  o.y)); 
	color += texture2D(texture, uv + vec2(-o.x,  o.y)); 
	color += texture2D(texture, uv + vec2(-o.x, -o.y)); 
	color += texture2D(texture, uv + vec2( o.x, -o.y)); 
	color /= 4.0;

	/* Normalize and apply bloom strength */
	gl_FragColor = color * bloomStrength;
}
WebGL Javascript kawase.js
import * as util from '../utility.js'

export async function setupKawaseBlur() {
	
	const WebGLBox = document.getElementById('WebGLBox-KawaseBlur');
	const canvas = WebGLBox.querySelector('canvas');

	
	const radius = 0.12;

	
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	
	const ctx = {
		
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		
		tex: { sdr: null, selfIllum: null, frame: null, frameIntermediate1: null, frameIntermediate2: null, frameFinal: null },
		
		fb: { scene: null, intermediate1: null, intermediate2: null, final: null },
		
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			kawase: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, bloomStrength: null, pixelOffset: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg'),
			contextLoss: canvas.parentElement.querySelector('div'),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			iterations: WebGLBox.querySelector('#iterationsRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[name="modeKawase"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: document.getElementById('WebGLBox-KawaseBlurDetail').querySelector('#renderer'),
			kawaseIterations: document.getElementById('WebGLBox-KawaseBlurDetail').querySelector('#kawaseIterations'),
			iterTime: document.getElementById('WebGLBox-KawaseBlurDetail').querySelector('#iterTime'),
			tapsCount: document.getElementById('WebGLBox-KawaseBlurDetail').querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const kawaseFrag = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/kawase.fs");

	
	ui.blur.iterations.addEventListener('input', () => { 
		
		const iterations = parseInt(ui.blur.iterations.value);
		ui.blur.samplePos.disabled = iterations === 0;
		ui.blur.samplePosReset.disabled = iterations === 0;
		if (!ui.rendering.animate.checked) redraw() 
	});
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	
	ui.rendering.modes.forEach(radio => {
		
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		
		const worker = new Worker("./js/benchmark/kawaseBenchmark.js", { type: "module" });

		
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			kawaseShaderSrc: kawaseFrag,
			kawaseIterations: ui.blur.iterations.value,
			samplePos: ui.blur.samplePos.value
		});

		
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;
			ui.benchmark.kawaseIterations.textContent = event.data.kawaseIterations;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	
	ctx.shd.kawase = util.compileAndLinkShader(gl, simpleQuad, kawaseFrag, ["frameSizeRCP", "samplePosMult", "pixelOffset", "bloomStrength"]);

	
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.intermediate1);
		gl.deleteFramebuffer(ctx.fb.intermediate2);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.intermediate1, ctx.tex.frameIntermediate1] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.intermediate2, ctx.tex.frameIntermediate2] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate1);
		gl.clearColor(0.0, 0.0, 0.0, 1.0);
		gl.clear(gl.COLOR_BUFFER_BIT);
		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate2);
		gl.clearColor(0.0, 0.0, 0.0, 1.0);
		gl.clear(gl.COLOR_BUFFER_BIT);

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		
		const iterations = parseInt(ui.blur.iterations.value);
		
		const samplesPerPixel = iterations === 0 ? 1 : iterations * 4;
		const tapsNewText = (canvas.width * canvas.height * samplesPerPixel / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		
		if (iterations === 0) {
			
			const finalFB = ctx.mode === "bloom" ? ctx.fb.final : null; 
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			
			
			gl.useProgram(ctx.shd.kawase.handle);
			gl.uniform2f(ctx.shd.kawase.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.kawase.uniforms.samplePosMult, 0.0); 
			gl.uniform1f(ctx.shd.kawase.uniforms.pixelOffset, 0.0); 
			gl.uniform1f(ctx.shd.kawase.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		} else {
			
			gl.useProgram(ctx.shd.kawase.handle);
			gl.uniform2f(ctx.shd.kawase.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
			gl.uniform1f(ctx.shd.kawase.uniforms.samplePosMult, ui.blur.samplePos.value);
			
			

			let currentInputTex = ctx.tex.frame;
			let currentInputFB = ctx.fb.scene;
			
			for (let i = 0; i < iterations; i++) {
				// Determine the render target of this pass
				let outputFB, outputTex;
				if (i === iterations - 1) {
					// Last pass renders to the final target
					outputFB = ctx.mode === "bloom" ? ctx.fb.final : null;
				} else {
					// Ping-pong between the two intermediate framebuffers
					if (i % 2 === 0) {
						outputFB = ctx.fb.intermediate1;
						outputTex = ctx.tex.frameIntermediate1;
					} else {
						outputFB = ctx.fb.intermediate2;
						outputTex = ctx.tex.frameIntermediate2;
					}
				}

				gl.bindFramebuffer(gl.FRAMEBUFFER, outputFB);
				gl.viewport(0, 0, canvas.width, canvas.height);

				// The previous pass' output is this pass' input
				gl.activeTexture(gl.TEXTURE0);
				gl.bindTexture(gl.TEXTURE_2D, currentInputTex);

				// Sample distance grows with each pass
				gl.uniform1f(ctx.shd.kawase.uniforms.pixelOffset, i);

				// Spread the brightness multiplication evenly across all passes
				const finalBrightness = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
				const distributedBrightness = Math.pow(finalBrightness, 1.0 / iterations);
				gl.uniform1f(ctx.shd.kawase.uniforms.bloomStrength, distributedBrightness);

				gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

				// The framebuffer we just rendered becomes the next pass' input
				if (i < iterations - 1) {
					currentInputTex = i % 2 === 0 ? ctx.tex.frameIntermediate1 : ctx.tex.frameIntermediate2;
				}
			}
		}

		if (ctx.mode == "bloom") {
			
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		
		gl.finish();

		
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameIntermediate1); ctx.tex.frameIntermediate1 = null;
		gl.deleteTexture(ctx.tex.frameIntermediate2); ctx.tex.frameIntermediate2 = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.intermediate1); ctx.fb.intermediate1 = null;
		gl.deleteFramebuffer(ctx.fb.intermediate2); ctx.fb.intermediate2 = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	
	const initialIterations = parseInt(ui.blur.iterations.value);
	ui.blur.samplePos.disabled = initialIterations === 0;
	ui.blur.samplePosReset.disabled = initialIterations === 0;

	
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

Just as the Central Limit Theorem makes repeated passes of a box blur approach a Gaussian blur, our Kawase blur produces smooth, Gaussian-like results, thanks to the iterative convolution at play. Technically, there are two convolutions happening at the same time: bilinear filtering and the diagonal samples at increasing distance.
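As a quick sanity check of that Central Limit Theorem intuition, here is a small standalone sketch, separate from the demo code: convolving a 3-tap box kernel with itself a few times already yields a bell-shaped weight curve.

```javascript
// Convolve two 1D kernels - each box-blur pass is one such convolution.
// By the Central Limit Theorem, repeated passes approach a Gaussian.
function convolve(a, b) {
	const out = new Array(a.length + b.length - 1).fill(0);
	for (let i = 0; i < a.length; i++)
		for (let j = 0; j < b.length; j++)
			out[i + j] += a[i] * b[j];
	return out;
}

const box = [1 / 3, 1 / 3, 1 / 3]; // one 3-tap box blur pass
let kernel = box;
for (let pass = 1; pass < 4; pass++)
	kernel = convolve(kernel, box); // 4 passes total

// The effective 9-tap kernel is already bell-shaped:
// 0.012, 0.049, 0.123, 0.198, 0.235, 0.198, 0.123, 0.049, 0.012
console.log(kernel.map(w => w.toFixed(3)).join(", "));
```

The weights are the coefficients of (1 + x + x²)⁴ divided by 3⁴ = 81, which is exactly why the peak sits at 19/81 in the middle.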

Two different origins: The Gaussian blur came from a mathematical concept entering graphics programming. The Kawase blur was born to get the most out of what hardware provides for free.

book

Due to its diagonal sampling pattern, it is not a separable convolution. And since no downsampling is used, we write every pixel out to memory on each pass. Even if we could separate it, the cost of writing twice as many passes to memory would outweigh the benefit of going from 4 samples per pass to 2.
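To see why the diagonal pattern resists separation, here is a standalone illustration. A 2D kernel splits into a horizontal and a vertical 1D pass exactly when it is an outer product of two vectors, which is the case exactly when all of its 2×2 minors vanish. The `kawase` matrix below is a simplified stand-in for one pass (corner taps plus a heavy center); the real per-pass kernel also folds in bilinear weights, but the diagonal structure is the same.

```javascript
// Simplified stand-in kernels: corner taps + center vs. a plain box
const kawase = [[1, 0, 1], [0, 4, 0], [1, 0, 1]];
const box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]];

// A kernel is separable iff it has rank 1, i.e. every 2x2 minor is zero
function isSeparable(k) {
	const R = k.length, C = k[0].length;
	for (let i = 0; i < R; i++) for (let m = i + 1; m < R; m++)
		for (let j = 0; j < C; j++) for (let n = j + 1; n < C; n++)
			if (k[i][j] * k[m][n] !== k[i][n] * k[m][j])
				return false; // nonzero minor -> rank > 1
	return true;
}

console.log(isSeparable(box));    // true - splits into two 1D passes
console.log(isSeparable(kawase)); // false - the diagonal pattern can't split
```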

With so few samples, you cannot increase samplePosMultiplier without instantly producing artifacts, as doing so breaks the sample pattern.

detective

Take note of the texture taps: they grow linearly with increasing blur radius. In DOUBLE-S.T.E.A.L, Masaki Kawase used this blur to create the bloom effect, calculated at a lower resolution. But there is one more evolution coming up: we have blur, we have downsampling. Two separate concepts. What if we “fused” them?
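How linear that growth is can be sketched with back-of-the-envelope numbers. This is a rough standalone model, not the demo's exact counter: it assumes 4 taps per Kawase pass with one pass per pixel of radius, and a naive full-kernel Gaussian for comparison.

```javascript
// Texture taps per pixel for a blur of radius r, rough model:
const naiveGaussianTaps = r => (2 * r + 1) ** 2;    // full 2D kernel: quadratic
const separableGaussianTaps = r => 2 * (2 * r + 1); // two 1D passes: linear
const kawaseTaps = r => 4 * r;                      // 4 diagonal taps per pass: linear

for (const r of [2, 8, 32])
	console.log(`radius ${r}: naive ${naiveGaussianTaps(r)}, separable ${separableGaussianTaps(r)}, Kawase ${kawaseTaps(r)}`);
```

At radius 32 the naive kernel needs 4225 taps per pixel, while the Kawase chain stays at 128, though each of its passes writes the full frame to memory.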

Dual Kawase Blur #

Marius Bjørge, principal graphics architect at ARM, took this thought to its logical conclusion while optimizing mobile rendering on ARM graphics chips. In a SIGGRAPH 2015 talk he presented an algorithm that would later become known as the ✨ Dual Kawase Blur 🌟, this article’s final destination.

Dual Kawase sampling patterns
Dual Kawase sampling patterns
Excerpt from Bandwidth-Efficient Rendering, talk by Marius Bjørge

This blur builds on Masaki Kawase’s idea of “placing diagonal samples at increasing distance”, but does so in conjunction with downsampling, which effectively performs that “increase in distance”. There is also a dedicated upsample filter. I’ll let Marius Bjørge explain this one, in an excerpt from the talk mentioned above:

Marius Bjørge: For lack of a better name, dual filter is something I came up with when playing with different downsampling and upsampling patterns. It's sort of a derivative of the Kawase filter, but instead of ping-ponging between two equally sized textures, this filter works by having one filter for downsampling and another filter for upsampling.

The downsample filter works by sampling four pixels covering the target pixel, and you also have four pixels on the corners of this pixel to smudge in some information from all the neighboring pixels. The upsample filter then works by reconstructing information from the downsample pass. This pattern was chosen to get a nice, smooth circular shape.

Let’s try it. This time there are two blur shaders, one for the downsample stage and one for the upsample stage. Again, there is no kernelSize. Instead there is downsampleLevels, which performs the blur in conjunction with the downsampling. Play around with all the sliders and get a feel for it.

The browser killed this WebGL Context, please reload the page. If this happened as the result of a long benchmark, decrease the iteration count. On some platforms (iOS / iPad) you may have to restart the browser App completely, as the browser will temporarily refuse to allow this site to run WebGL again.

WebGL Fragment Shader dual-kawase-down.fs

precision highp float;

varying vec2 uv;

uniform vec2 frameSizeRCP; // Reciprocal of the framebuffer size in pixels
uniform float offset; // Sample position multiplier, set by the slider
uniform float bloomStrength; // Brightness multiplier for the bloom mode

uniform sampler2D texture;

void main() {
	// Sample offset: half a pixel, scaled by the sample position multiplier
	vec2 halfpixel = frameSizeRCP * 0.5;
	vec2 o = halfpixel * offset;

	// Center tap, weighted 4x
	vec4 color = texture2D(texture, uv) * 4.0;

	// 4 corner taps, weighted 1x each
	color += texture2D(texture, uv + vec2(-o.x, -o.y));
	color += texture2D(texture, uv + vec2( o.x, -o.y));
	color += texture2D(texture, uv + vec2(-o.x,  o.y));
	color += texture2D(texture, uv + vec2( o.x,  o.y));

	// Normalize by the total weight (4 + 4 * 1 = 8) and apply brightness
	gl_FragColor = (color / 8.0) * bloomStrength;
}
WebGL Fragment Shader dual-kawase-up.fs

precision highp float;

varying vec2 uv;

uniform vec2 frameSizeRCP; // Reciprocal of the framebuffer size in pixels
uniform float offset; // Sample position multiplier, set by the slider
uniform float bloomStrength; // Brightness multiplier for the bloom mode

uniform sampler2D texture;

void main() {
	// Sample offset: half a pixel, scaled by the sample position multiplier
	vec2 halfpixel = frameSizeRCP * 0.5;
	vec2 o = halfpixel * offset;

	vec4 color = vec4(0.0);

	// 4 edge taps at twice the distance, weighted 1x each
	color += texture2D(texture, uv + vec2(-o.x * 2.0, 0.0));
	color += texture2D(texture, uv + vec2( o.x * 2.0, 0.0));
	color += texture2D(texture, uv + vec2(0.0, -o.y * 2.0));
	color += texture2D(texture, uv + vec2(0.0,  o.y * 2.0));

	// 4 diagonal taps, weighted 2x each
	color += texture2D(texture, uv + vec2(-o.x,  o.y)) * 2.0;
	color += texture2D(texture, uv + vec2( o.x,  o.y)) * 2.0;
	color += texture2D(texture, uv + vec2(-o.x, -o.y)) * 2.0;
	color += texture2D(texture, uv + vec2( o.x, -o.y)) * 2.0;

	// Normalize by the total weight (4 * 1 + 4 * 2 = 12) and apply brightness
	gl_FragColor = (color / 12.0) * bloomStrength;
}
WebGL Javascript dual-kawase.js
import * as util from '../utility.js'

export async function setupDualKawaseBlur() {
	
	const WebGLBox = document.getElementById('WebGLBox-DualKawaseBlur');
	const canvas = WebGLBox.querySelector('canvas');

	
	const radius = 0.12;

	
	const gl = canvas.getContext('webgl', {
		preserveDrawingBuffer: false,
		antialias: false,
		alpha: false,
	});

	
	const ctx = {
		
		mode: "scene",
		flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
		
		tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null, down: [] },
		
		fb: { scene: null, final: null, down: [] },
		
		shd: {
			scene: { handle: null, uniforms: { offset: null, radius: null } },
			passthrough: { handle: null },
			downsample: { handle: null, uniforms: { frameSizeRCP: null, offset: null, bloomStrength: null } },
			upsample: { handle: null, uniforms: { frameSizeRCP: null, offset: null, bloomStrength: null } },
			bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
		}
	};

	
	const ui = {
		display: {
			spinner: canvas.parentElement.querySelector('svg', canvas.parentElement),
			contextLoss: canvas.parentElement.querySelector('div', canvas.parentElement),
			fps: WebGLBox.querySelector('#fps'),
			ms: WebGLBox.querySelector('#ms'),
			width: WebGLBox.querySelector('#width'),
			height: WebGLBox.querySelector('#height'),
			tapsCount: WebGLBox.querySelector('#taps'),
		},
		blur: {
			downsample: WebGLBox.querySelector('#downsampleRange'),
			samplePos: WebGLBox.querySelector('#samplePosRange'),
			samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
		},
		rendering: {
			animate: WebGLBox.querySelector('#animateCheck'),
			modes: WebGLBox.querySelectorAll('input[type="radio"]'),
			lightBrightness: WebGLBox.querySelector('#lightBrightness'),
			lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
		},
		benchmark: {
			button: WebGLBox.querySelector('#benchmark'),
			label: WebGLBox.querySelector('#benchmarkLabel'),
			iterOut: WebGLBox.querySelector('#iterOut'),
			renderer: document.getElementById('WebGLBox-DualKawaseBlurDetail').querySelector('#renderer'),
			downsampleLevels: document.getElementById('WebGLBox-DualKawaseBlurDetail').querySelector('#downsampleLevels'),
			iterTime: document.getElementById('WebGLBox-DualKawaseBlurDetail').querySelector('#iterTime'),
			tapsCount: document.getElementById('WebGLBox-DualKawaseBlurDetail').querySelector('#tapsCountBench'),
			iterations: WebGLBox.querySelector('#iterations')
		}
	};

	
	const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
	const simpleTexture = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/simpleTexture.fs");
	const bloomVert = await util.fetchShader("shader/bloom.vs");
	const bloomFrag = await util.fetchShader("shader/bloom.fs");
	const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
	const dualKawaseDown = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/dual-kawase-down.fs");
	const dualKawaseUp = await util.fetchShader("https://blog.frost.kiwi/dual-kawase/shader/dual-kawase-up.fs");

	
	ui.blur.downsample.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
	ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

	
	ui.rendering.animate.addEventListener("change", () => {
		if (ui.rendering.animate.checked)
			startRendering();
		else {
			ui.display.fps.value = "-";
			ui.display.ms.value = "-";
			ctx.flags.isRendering = false;
			redraw()
		}
	});

	canvas.addEventListener("webglcontextlost", () => {
		ui.display.contextLoss.style.display = "block";
	});

	ui.blur.downsample.addEventListener('input', () => {
		ui.blur.samplePos.disabled = ui.blur.downsample.value == 0;
		ui.blur.samplePosReset.disabled = ui.blur.downsample.value == 0;
	});

	
	ui.rendering.modes.forEach(radio => {
		
		if (radio.value === "scene")
			radio.checked = true;
		radio.addEventListener('change', (event) => {
			ctx.mode = event.target.value;
			ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
			ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
			if (!ui.rendering.animate.checked) redraw();
		});
	});

	ui.benchmark.button.addEventListener("click", () => {
		ctx.flags.benchMode = true;
		stopRendering();
		ui.display.spinner.style.display = "block";
		ui.benchmark.button.disabled = true;

		
		const worker = new Worker("./js/benchmark/dualKawaseBenchmark.js", { type: "module" });

		
		worker.postMessage({
			iterations: ui.benchmark.iterOut.value,
			downShaderSrc: dualKawaseDown,
			upShaderSrc: dualKawaseUp,
			downsampleLevels: ui.blur.downsample.value,
			samplePos: ui.blur.samplePos.value
		});

		
		worker.addEventListener("message", (event) => {
			if (event.data.type !== "done") return;

			ui.benchmark.label.textContent = event.data.benchText;
			ui.benchmark.tapsCount.textContent = event.data.tapsCount;
			ui.benchmark.iterTime.textContent = event.data.iterationText;
			ui.benchmark.renderer.textContent = event.data.renderer;
			ui.benchmark.downsampleLevels.textContent = event.data.downsampleLevels;

			worker.terminate();
			ui.benchmark.button.disabled = false;
			ctx.flags.benchMode = false;
			if (ui.rendering.animate.checked)
				startRendering();
			else
				redraw();
		});
	});

	ui.benchmark.iterations.addEventListener("change", (event) => {
		ui.benchmark.iterOut.value = event.target.value;
		ui.benchmark.label.textContent = "Benchmark";
	});

	
	ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);

	
	ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

	
	ctx.shd.passthrough = util.compileAndLinkShader(gl, simpleQuad, simpleTexture);

	
	ctx.shd.downsample = util.compileAndLinkShader(gl, simpleQuad, dualKawaseDown, ["frameSizeRCP", "offset", "bloomStrength"]);
	ctx.shd.upsample = util.compileAndLinkShader(gl, simpleQuad, dualKawaseUp, ["frameSizeRCP", "offset", "bloomStrength"]);

	
	util.bindUnitQuad(gl);

	async function setupTextureBuffers() {
		ui.display.spinner.style.display = "block";
		ctx.flags.buffersInitialized = true;
		ctx.flags.initComplete = false;

		gl.deleteFramebuffer(ctx.fb.scene);
		gl.deleteFramebuffer(ctx.fb.final);
		[ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
		[ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

		const maxDown = parseInt(ui.blur.downsample.max);
		for (let i = 0; i < maxDown; ++i) {
			gl.deleteFramebuffer(ctx.fb.down[i]);
			gl.deleteTexture(ctx.tex.down[i]);
		}
		ctx.fb.down = [];
		ctx.tex.down = [];

		let w = canvas.width, h = canvas.height;
		for (let i = 0; i < maxDown; ++i) {
			w = Math.max(1, w >> 1);
			h = Math.max(1, h >> 1);
			const [fb, tex] = util.setupFramebuffer(gl, w, h);
			ctx.fb.down.push(fb);
			ctx.tex.down.push(tex);
		}

		let [base, selfIllum] = await Promise.all([
			fetch("/dual-kawase/img/SDR_No_Sprite.png"),
			fetch("/dual-kawase/img/Selfillumination.png")
		]);
		let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
		let [baseBitmap, selfIllumBitmap] = await Promise.all([
			createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
			createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
		]);

		ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
		ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

		baseBitmap.close();
		selfIllumBitmap.close();

		ctx.flags.initComplete = true;
		ui.display.spinner.style.display = "none";
	}

	let prevNow = performance.now();
	let lastStatsUpdate = prevNow;
	let fpsEMA = 60;
	let msEMA = 16;

	async function redraw() {
		if (!ctx.flags.buffersInitialized)
			await setupTextureBuffers();
		if (!ctx.flags.initComplete)
			return;

		
		const levels = parseInt(ui.blur.downsample.value);
		
		let totalTaps = 0;
		for (let i = 0; i < levels; i++) {
			const levelW = Math.max(1, canvas.width >> (i + 1));
			const levelH = Math.max(1, canvas.height >> (i + 1));
			totalTaps += levelW * levelH * 5; 
			if (i < levels - 1) totalTaps += levelW * levelH * 8; 
		}
		if (levels > 0) totalTaps += canvas.width * canvas.height * 8; 
		const tapsNewText = (totalTaps / 1000000).toFixed(1) + " Million";
		ui.display.tapsCount.value = tapsNewText;
		
		ui.display.width.value = canvas.width;
		ui.display.height.value = canvas.height;

		
		let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
		let speed = (performance.now() / 10000) % Math.PI * 2;
		const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
		gl.useProgram(ctx.shd.scene.handle);
		const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
		gl.activeTexture(gl.TEXTURE0);
		gl.bindTexture(gl.TEXTURE_2D, texture);
		gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
		gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

		
		gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
		gl.viewport(0, 0, canvas.width, canvas.height);

		
		gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

		const downsampleLevels = parseInt(ui.blur.downsample.value);
		let srcTex = ctx.tex.frame;

		if (downsampleLevels > 0) {
			
			const finalBrightness = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
			const totalPasses = 2 * downsampleLevels;
			const distributedBrightness = Math.pow(finalBrightness, 1.0 / totalPasses);
			
			
			gl.useProgram(ctx.shd.downsample.handle);
			gl.uniform1f(ctx.shd.downsample.uniforms.offset, ui.blur.samplePos.value);
			
			let w = canvas.width, h = canvas.height;
			for (let i = 0; i < downsampleLevels; ++i) {
				const fb = ctx.fb.down[i];
				w = Math.max(1, w >> 1);
				h = Math.max(1, h >> 1);

				gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
				gl.viewport(0, 0, w, h);

				const frameSizeRCP = [1.0 / w, 1.0 / h];
				gl.uniform2fv(ctx.shd.downsample.uniforms.frameSizeRCP, frameSizeRCP);
				gl.uniform1f(ctx.shd.downsample.uniforms.bloomStrength, distributedBrightness);

				gl.activeTexture(gl.TEXTURE0);
				gl.bindTexture(gl.TEXTURE_2D, srcTex);
				gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
				srcTex = ctx.tex.down[i];
			}

			
			gl.useProgram(ctx.shd.upsample.handle);
			gl.uniform1f(ctx.shd.upsample.uniforms.offset, ui.blur.samplePos.value);
			
			for (let i = downsampleLevels - 2; i >= 0; i--) {
				const fb = ctx.fb.down[i];
				w = Math.max(1, canvas.width >> (i + 1));
				h = Math.max(1, canvas.height >> (i + 1));

				gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
				gl.viewport(0, 0, w, h);

				const srcW = Math.max(1, canvas.width >> (i + 2));
				const srcH = Math.max(1, canvas.height >> (i + 2));
				const frameSizeRCP = [1.0 / srcW, 1.0 / srcH];
				gl.uniform2fv(ctx.shd.upsample.uniforms.frameSizeRCP, frameSizeRCP);
				gl.uniform1f(ctx.shd.upsample.uniforms.bloomStrength, distributedBrightness);

				gl.activeTexture(gl.TEXTURE0);
				gl.bindTexture(gl.TEXTURE_2D, srcTex);
				gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
				srcTex = ctx.tex.down[i];
			}

			
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);

			const srcW = Math.max(1, canvas.width >> 1);
			const srcH = Math.max(1, canvas.height >> 1);
			const frameSizeRCP = [1.0 / srcW, 1.0 / srcH];
			gl.uniform2fv(ctx.shd.upsample.uniforms.frameSizeRCP, frameSizeRCP);
			gl.uniform1f(ctx.shd.upsample.uniforms.bloomStrength, distributedBrightness);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, srcTex);
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
			
			if (ctx.mode != "bloom") {
				srcTex = ctx.tex.frameFinal;
			}
		} else {
			
			const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
			gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
			gl.viewport(0, 0, canvas.width, canvas.height);
			gl.useProgram(ctx.shd.passthrough.handle);
			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, srcTex);
			gl.uniform1i(gl.getUniformLocation(ctx.shd.passthrough.handle, "texture"), 0);
			const bloomStrength = ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value;
			gl.uniform1f(gl.getUniformLocation(ctx.shd.passthrough.handle, "bloomStrength"), bloomStrength);
			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
			
			if (ctx.mode != "bloom") {
				srcTex = ctx.tex.frameFinal;
			}
		}

		if (ctx.mode == "bloom") {
			
			gl.bindFramebuffer(gl.FRAMEBUFFER, null);
			gl.useProgram(ctx.shd.bloom.handle);

			gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
			gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

			gl.activeTexture(gl.TEXTURE0);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
			gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

			gl.activeTexture(gl.TEXTURE1);
			gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
			gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

			gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
		}

		
		gl.finish();

		const now = performance.now();
		let dt = now - prevNow;

		if (dt > 0) {
			const instFPS = 1000 / dt;
			const ALPHA = 0.05;
			fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
			msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
		}
		prevNow = now;

		if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
			ui.display.fps.value = fpsEMA.toFixed(0);
			ui.display.ms.value = msEMA.toFixed(2);
			lastStatsUpdate = now;
		}
	}

	let animationFrameId;

	
	function nativeResize() {
		const [width, height] = util.getNativeSize(canvas);

		if (width && canvas.width !== width || height && canvas.height !== height) {
			canvas.width = width;
			canvas.height = height;

			if (!ctx.flags.benchMode) {
				stopRendering();
				startRendering();
			}
			if (!ui.rendering.animate.checked)
				redraw();
		}
	}

	
	nativeResize();

	let resizePending = false;
	window.addEventListener('resize', () => {
		if (!resizePending) {
			resizePending = true;
			requestAnimationFrame(() => {
				resizePending = false;
				nativeResize();
			});
		}
	});

	function renderLoop() {
		if (ctx.flags.isRendering && ui.rendering.animate.checked) {
			redraw();
			animationFrameId = requestAnimationFrame(renderLoop);
		}
	}

	function startRendering() {
		
		ctx.flags.isRendering = true;
		renderLoop();
	}

	function stopRendering() {
		
		ctx.flags.isRendering = false;
		cancelAnimationFrame(animationFrameId);
		
		gl.finish();

		
		gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
		gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
		gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
		gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
		gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
		gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
		for (let i = 0; i < parseInt(ui.blur.downsample.max); ++i) {
			gl.deleteTexture(ctx.tex.down[i]);
			gl.deleteFramebuffer(ctx.fb.down[i]);
		}
		ctx.tex.down = [];
		ctx.fb.down = [];
		ctx.flags.buffersInitialized = false;
		ctx.flags.initComplete = false;
		ui.display.fps.value = "-";
		ui.display.ms.value = "-";
	}

	function handleIntersection(entries) {
		entries.forEach(entry => {
			if (entry.isIntersecting) {
				if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
			} else {
				stopRendering();
			}
		});
	}

	
	let observer = new IntersectionObserver(handleIntersection);
	observer.observe(canvas);
}

It’s also a Gaussian-like blur. Remember our first Gaussian blur? Its performance tanked as we increased the kernel radius, with texture taps growing quadratically. But now, with each downsample step, the required texture taps grow slower and slower. The stronger our blur, the fewer additional samples we require!
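The tap counter in the demo above does exactly this bookkeeping. Extracted into a standalone function and run on a hypothetical 1920×1080 frame, it shows how little each extra downsample level adds: every level is half the resolution, so it touches a quarter of the pixels.

```javascript
// Total texture taps for a Dual Kawase chain, mirroring the demo's counter
function dualKawaseTaps(width, height, levels) {
	let taps = 0;
	for (let i = 0; i < levels; i++) {
		const w = Math.max(1, width >> (i + 1));
		const h = Math.max(1, height >> (i + 1));
		taps += w * h * 5;                     // 5-tap downsample into level i
		if (i < levels - 1) taps += w * h * 8; // 8-tap upsample out of level i
	}
	if (levels > 0) taps += width * height * 8; // final full-resolution upsample
	return taps;
}

for (const levels of [1, 2, 3, 4])
	console.log(levels, (dualKawaseTaps(1920, 1080, levels) / 1e6).toFixed(1) + " Million");
```

The totals form a converging geometric series: going from 1 to 4 downsample levels barely increases the tap count, while the blur radius keeps doubling.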

This was of special interest to Marius Bjørge, as his goal was to reduce memory access, which is especially slow on mobile devices, and still produce a motion-stable, non-shimmering blur. Speaking of which, go into bloom mode, crank up lightBrightness and compare it to our downsample example.

Even though the resolution is reduced to the same downsample level, there is no shimmering! That’s the Dual Kawase Blur for you: a Gaussian-like post-processing blur with good performance, no heavy repeated memory writes and motion-stable output. This makes it ideal as a basic building block for visual effects like bloom.

What are the big boys doing? #

The Dual Kawase Blur has found its way into game engines and user interfaces alike. For instance, the Linux desktop environment KDE has used it for its frosted-backdrop effect since 2018, and it remains the algorithm of choice to this day. I used KDE’s implementation as a guide when creating my demo above.

KDE Plasma's Blur with noise at max strength
KDE Plasma's Blur with noise at max strength (Source)

Of course, graphics programming didn’t stop in 2015 and there have been new developments. The previously mentioned talk Next Generation Post Processing in Call of Duty: Advanced Warfare by Jorge Jimenez showcased an evolution of the “downsample while blurring” idea to better handle far-away and very bright lights at high blur strengths.

Uneven interpolation of bright, small light sources (Left), Page 156 from presentation
Next Generation Post Processing in Call of Duty: Advanced Warfare by Jorge Jimenez

In turn, this technique was picked up two years later by graphics programmer Mikkel Gjoel while working on the video game INSIDE by Studio Playdead. In the GDC 2016 talk Low Complexity, High Fidelity - INSIDE Rendering, he showcased a further optimization, reducing the number of texture reads required.

Blur algorithm used for Bloom in video game Inside
Excerpt from talk Low Complexity, High Fidelity - INSIDE Rendering by Mikkel Gjoel & Mikkel Svendsen

I showcased the bloom use-case a lot. The technique used in my demos is rather primitive, akin to the era of video game bloom disasters, when many games had radioactive levels of bloom to show off a then-novel technique. In this older style, an extra lights pass, or the scene after thresholding, was blurred and added on top.

Bloom in Video game
Bloom in the video game The Elder Scrolls IV: Oblivion, from the article Bloom Disasters

These days, 3D engines follow a physically based shading model, with HDR framebuffers capturing pixels in an energy-conserving manner. Specular reflections preserve the super-bright pixels of the lamp they originated from.

With such a wide range of energy values, light that should bloom no longer needs special selection. Instead of defining what to blur, everything is blurred, and the bright parts naturally start glowing without any predefined “parts to blur”.

Physically Based Blur
Multiple blurs stacked to create a natural light fall-off
Page 144 in Next Generation Post Processing in Call of Duty: Advanced Warfare by Jorge Jimenez

The result isn’t blurred just once; rather, multiple blur strengths are stacked on top of each other for a more natural light fall-off, as shown in the previously mentioned talk by Jorge Jimenez. This isn’t an article about bloom though, but about its underlying building block: the blur.
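The stacking idea itself is simple: widen the blur in stages and fade each stage out. Here is a minimal standalone sketch, with hypothetical weights and plain numbers standing in for framebuffer pixels, not the weighting from any particular engine.

```javascript
// Combine several blur strengths with geometrically falling weights,
// so wider blurs contribute a soft halo without washing out the image.
// blurLevels[i] stands in for a pixel of the i-th (increasingly blurry) pass.
function stackedBloom(blurLevels, falloff = 0.5) {
	let weight = 1.0, totalWeight = 0.0, sum = 0.0;
	for (const level of blurLevels) {
		sum += level * weight;
		totalWeight += weight;
		weight *= falloff; // each wider blur contributes less
	}
	return sum / totalWeight; // normalize so overall brightness is preserved
}

// A uniformly lit region stays at its original brightness:
console.log(stackedBloom([1, 1, 1])); // 1
```

Normalizing by the total weight is what keeps the stack energy-conserving, matching the physically based framing above.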

This was a journey through blurs and I hope you enjoyed the ride! If you are a new visitor from the Summer of Math Exposition and enjoyed this article, you’ll enjoy my other graphics programming deep-dives on this blog. Also during SoME 3, my submission was a WebApp + Video Adventure into Mirrorballs:

Mathematical Magic Mirrorball #SoME3
YouTube Video by FrostKiwi

Addendum #

Additional things that came to light as a result of discussions around this article.

Upsample skip steps technique #

In the downsampling chapter I mentioned that skipping upsample steps will result in “a vague grid like artifact appearing”. In an e-mail, Masaki Kawase expanded on this with a reference to his 2009 CEDEC talk Anti-Downsized Buffer Artifacts: there is an in-between path when the downscale-upsample chain is a bit longer.

2Times2009 2 2Times2009 1
Skipping Upsample steps and the resulting artifacts (Left) vs performing one intermediary upsample step with 4-Tap Blur, before going on to skipping the remaining intermediary up-sample steps with reduced artifacts (Right)
Page 99 - 100 from the 2009 CEDEC talk
Anti-Downsized Buffer Artifacts by Masaki Kawase

This involves performing a slight 4-texture-tap blur on the very first upsample, from the smallest to the 2nd smallest framebuffer size, and then skipping all the remaining upsample steps, a technique explained in the above-linked talk from page 72 onwards. A balance of a longer upsample chain vs. the appearance of artifacts.

Multiple blurs stacked #

I was surprised to learn that the “multiple blurs stacked to create a natural light fall-off” idea was also presented by Masaki Kawase, in the 2004 GDC talk Practical Implementation of High Dynamic Range Rendering. Those couple of years in particular were quite eventful for graphics programming!

Use of Multiple Gaussian Filters, Excerpt from the 2004 GDC Talk
Practical Implementation of High Dynamic Range Rendering (Video Direct Link) by Masaki Kawase

I didn’t even know about this connection. Truly a graphics programmer idol of mine.

love

Here are the 3 slides mentioned in the excerpt from the talk:

Page 31 - 33 from the 2004 GDC Talk
Practical Implementation of High Dynamic Range Rendering by Masaki Kawase
