Building a seismic solver that runs everywhere
The project (icui.github.io/seiswg) I’ll talk about here started as an attempt to answer a simple question: can we build a serious finite difference seismic solver that is portable across GPUs, and even runnable in a browser, without maintaining multiple codebases?
Seismic wave simulation sits at an uncomfortable intersection of scientific ambition and hardware reality. On one hand, finite difference solvers are conceptually simple and great for prototyping, benchmarking, and teaching. On the other, once you care about scale, performance, or inversion workflows, you quickly become dependent on GPU acceleration—and that usually means CUDA, and that usually means NVIDIA.
That tradeoff was fine when the audience was limited to a specific cluster. It became much harder to justify when people wanted to run the same models on Apple Silicon laptops, shared institutional resources, or even inside a browser for teaching and experimentation. That tension is what pushed me to rethink how we structure GPU-based seismic solvers in the first place.
From Python and CUDA to something more portable
The starting point was Seispie, a finite difference seismic solver written in Python with GPU acceleration via numba-cuda. It supports forward simulation, adjoint runs, and full waveform inversion. Functionally, it overlaps with packages like SPECFEM++, but with a more straightforward and less mathematically heavy implementation.
That simplicity made Seispie useful in several contexts:
- benchmarking new physical models (like Cosserat media),
- prototyping new ideas quickly,
- and teaching wave propagation without overwhelming students with framework complexity.
But there was a hard limitation: numba-cuda ties you to NVIDIA GPUs. If your hardware doesn’t support CUDA, the code simply doesn’t run.
I looked at other portability options that are common in HPC—Kokkos and SYCL were both on the table—but neither quite fit what I wanted. What surprised me was that the most compelling alternative didn’t come from the traditional HPC ecosystem at all: it came from the web.
Why WebGPU—and why Rust?
WebGPU is a relatively new standard that provides low-level access to native GPU APIs like Vulkan, Metal, and DirectX 12 through a unified interface. Despite the name, it’s not limited to graphics or browsers. It explicitly supports general-purpose GPU compute.
Three properties made it attractive for this project:
- it runs on most modern GPUs, including Apple Silicon and mobile hardware,
- it doesn’t require users to install vendor-specific SDKs,
- and it is designed to work both in browsers and in native applications.
WebGPU kernels are written in WGSL (WebGPU Shading Language), while host code can be written in several languages. Rust stood out—not only because I wanted to start a serious Rust project, but because Rust can target both native binaries and WebAssembly (WASM) with the same codebase.
That led to a design goal that shaped everything else: the same solver should run as a standalone executable on a cluster and as a static webpage in a browser.
How the architecture changed
Originally, the workflow looked like this: Python orchestrates everything, finite difference kernels are written in numba-cuda, and CUDA directly talks to the GPU.

In the new design, there are two closely related workflows.
For a standalone executable, Rust handles the main workflow, WGSL implements the finite difference kernels, and a native WebGPU implementation (wgpu) maps those kernels to Vulkan, Metal, or DX12 depending on the platform.

For a static webpage, the picture expands. Rust still owns the workflow logic, but it compiles to WebAssembly. JavaScript provides the Web APIs. The browser engine's own WebGPU implementation handles GPU execution, and browser storage mechanisms (like IndexedDB) stand in for a traditional filesystem.

One important discovery here was how much WebGPU implementations, such as wgpu and the browser engines, do for you. They automatically map WebGPU calls to the correct native backend on each platform. That abstraction is solid and largely invisible once things are set up.
What they do not handle are the browser-specific pieces: fetching files, persisting data, or coordinating with JavaScript. Those still require explicit glue code, and file I/O in particular opens up questions about newer browser features like OPFS that could merit a project of their own.
Writing GPU kernels in WGSL
A big unknown going in was whether WGSL would be usable for real scientific compute kernels. To test that, I asked Claude to translate existing finite difference code directly.
In numba-cuda, a backward finite difference operator and a stress divergence kernel are compact and expressive, with Python syntax and implicit device context.
from numba import cuda

# Backward differences (centred on k):
# ∂f/∂x ≈ [27(f_i - f_{i-1}) - (f_{i+1} - f_{i-2})] / (24 dx)
@cuda.jit(device=True)
def diff_x(v, i, k, dx, nx, nz):
    if i >= 2 and i < nx - 2:
        return 9 * (v[k] - v[k-nz]) / (8 * dx) - (v[k+nz] - v[k-2*nz]) / (24 * dx)
    else:
        return 0

# ∇·σ^SH → DSY (divergence of SH stress)
@cuda.jit
def div_sy(dsy, sxy, szy, dx, dz, nx, nz):
    k, i, j = idxij(nz)
    if k < dsy.size:
        dsy[k] = diff_x(sxy, i, k, dx, nx, nz) + diff_z(szy, j, k, dz, nx, nz)
In WGSL, the same math is there, but everything is more explicit: types, buffer access, indexing logic, and execution parameters all have to be spelled out. The result is undeniably more verbose, but also very transparent, and the core kernels end up not too different from their CUDA counterparts. Once written, the WGSL kernels compiled and ran consistently across platforms.
// Backward differences (centred on k):
// ∂f/∂x ≈ [27(f_i - f_{i-1}) - (f_{i+1} - f_{i-2})] / (24 dx)
fn diff_x(fid: u32, i: u32, k: u32) -> f32 {
    if i < 2u || i >= params.nx - 2u { return 0.0; }
    let nz = params.nz;
    let dx = params.dx;
    // Same weights as the numba version: 9/8 and 1/24 over dx.
    return ( 27.0 * (gf(fid, k) - gf(fid, k - nz))
           - (gf(fid, k + nz) - gf(fid, k - 2u * nz)) ) / (24.0 * dx);
}
/// ∇·σ^SH → DSY (divergence of SH stress)
@compute @workgroup_size(64)
fn div_sy(@builtin(global_invocation_id) gid: vec3<u32>) {
    let k = gid.x;
    if k >= params.npt { return; }
    let ij = ij_from(k);
    sf(F_DSY, k, diff_x(F_SXY, ij.x, k) + diff_z(F_SZY, ij.y, k));
}
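A quick way to check a stencil like this, independent of any GPU, is a CPU reference. The sketch below is plain Python and is not taken from Seispie or the WGSL port; the names diff_x_ref, ij_from_ref, and max_error are hypothetical. It assumes the flat index is k = i * nz + j (which is what the k ± nz offsets in diff_x suggest) and uses the numba kernel's weights, 9/8 and 1/24, written over a common denominator. For the test it also assumes the field is sampled at x_i = i * dx, so the backward difference approximates the derivative half a cell to the left of node i.

```python
import math

def ij_from_ref(k, nz):
    """Split a flat index into (i, j), assuming k = i * nz + j."""
    return divmod(k, nz)

def diff_x_ref(v, i, k, dx, nx, nz):
    """Fourth-order staggered backward difference along x (CPU reference)."""
    if 2 <= i < nx - 2:
        # 9/8 and 1/24 weights, expressed as 27/24 and 1/24.
        return (27.0 * (v[k] - v[k - nz]) - (v[k + nz] - v[k - 2 * nz])) / (24.0 * dx)
    return 0.0

def max_error(nx, nz=3, length=1.0):
    """Largest interior error when differentiating sin(x) on an nx-by-nz grid."""
    dx = length / nx
    # The field varies along x only; every j column holds the same values.
    v = [math.sin(i * dx) for i in range(nx) for _ in range(nz)]
    err = 0.0
    for k in range(nx * nz):
        i, _ = ij_from_ref(k, nz)
        if 2 <= i < nx - 2:
            # The stencil is centred half a cell to the left of node i.
            exact = math.cos((i - 0.5) * dx)
            err = max(err, abs(diff_x_ref(v, i, k, dx, nx, nz) - exact))
    return err

# Halving dx should shrink the error by roughly 2**4 for a fourth-order stencil.
coarse, fine = max_error(20), max_error(40)
ratio = coarse / fine
```

If the weights are right, ratio comes out close to 16. The same check can be pointed at values read back from the GPU, which makes it a cheap regression test when porting kernels between numba-cuda and WGSL.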
Calling those kernels from Rust via wgpu involves a clear but structured sequence: allocate GPU buffers, bind them to the shader, create a compute pipeline, dispatch workgroups, and submit commands to the queue. The ceremony is heavier than in CUDA, but it’s predictable—and crucially, it’s the same whether you’re running in a browser or on a cluster.
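One piece of arithmetic hides inside the "dispatch workgroups" step. With @workgroup_size(64), the host has to launch enough workgroups to cover all npt grid points, and the `if k >= params.npt { return; }` guard in div_sy discards the spare invocations in the last workgroup. The sketch below shows that calculation in Python for brevity (the real host code is Rust); the helper names are hypothetical, and the 65535 figure is WebGPU's default maxComputeWorkgroupsPerDimension limit, assumed here not to have been raised when requesting the device.

```python
WORKGROUP_SIZE = 64              # matches @workgroup_size(64) in the WGSL kernel
MAX_WORKGROUPS_PER_DIM = 65535   # WebGPU default limit (assumed unchanged)

def workgroup_count(npt, workgroup_size=WORKGROUP_SIZE):
    """Workgroups needed so every grid point gets one invocation (ceiling division)."""
    return (npt + workgroup_size - 1) // workgroup_size

def idle_invocations(npt, workgroup_size=WORKGROUP_SIZE):
    """Invocations in the last workgroup that hit the `k >= npt` guard and return."""
    return workgroup_count(npt, workgroup_size) * workgroup_size - npt

def fits_in_1d_dispatch(npt, workgroup_size=WORKGROUP_SIZE):
    """True if a single 1D dispatch can cover the grid under the default limit."""
    return workgroup_count(npt, workgroup_size) <= MAX_WORKGROUPS_PER_DIM

npt = 501 * 501                      # hypothetical grid size
groups = workgroup_count(npt)        # value passed as the x workgroup count
spare = idle_invocations(npt)        # threads that exit early via the guard
```

Under these assumptions a 1D dispatch tops out at 65535 * 64 = 4,194,240 points, so a 2048 by 2048 grid would just miss it and would need either a 2D dispatch or a larger workgroup size.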
What I learned
By the end of this experiment, a few things were clear to me.
WebGPU is real. It’s not just a graphics API in disguise; it’s a broadly supported, forward-looking GPU compute standard.
Rust pairs well with it. Having one host language and one kernel language span native executables and browser deployments is surprisingly powerful.
There are still rough edges. Web APIs and browser file systems don’t disappear just because your compute core is portable. And WGSL demands more boilerplate than CUDA or numba.
But overall, this felt like a glimpse of a different future for research software: one where portability isn’t an afterthought, and where the same scientific code can run on a phone, in a cluster job, and in a webpage without being rewritten three times.
Acknowledgements
The initial draft of this blog post was generated with Microsoft Copilot by Abhishek Biswas and Tai Sakuma.