Making 36 GB rasters feel instant: my path to Cloud Optimized GeoTIFFs

As a research software engineer (RSE) at Princeton, I’m often asked to help researchers move from “I have a huge GeoTIFF on disk” to “I can explore it interactively in a browser without waiting minutes for every pan and zoom.” The stakes are real: when a climate model output or satellite-derived surface is 30–40 GB, a lab’s ability to interrogate it quickly can make or break an analysis meeting or a paper deadline. That’s where Cloud Optimized GeoTIFFs (COGs) shine.

A regular GeoTIFF is already more than an image: it embeds spatial metadata such as the coordinate reference system (CRS), the affine transform that maps pixels to coordinates, the geographic bounds, and the ground resolution, so you can lay it precisely on a map. But a standard GeoTIFF tends to be monolithic: it is often written without internal tiles or overviews, so rendering a single view can mean reading a large share of the file. A COG reorganizes the same data into small tiles (commonly 256×256 pixels), precomputes lower-resolution overview layers in a power-of-two pyramid, and places the metadata at the front of the file. The result: clients can jump to exactly the tiles they need at exactly the right zoom level, using plain HTTP Range requests.
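To make the power-of-two pyramid concrete, here is a minimal sketch that lists the decimation factors for a raster of a given size. The stopping rule (halve until the overview fits inside one tile) is one common convention and an assumption here, as are the function name and the example dimensions; your tool may pick a different number of levels.

```python
import math


def overview_factors(width, height, blocksize=256):
    """Return power-of-two decimation factors (2, 4, 8, ...).

    Assumed stopping rule: keep halving until the overview fits
    inside a single blocksize x blocksize tile.
    """
    factors = []
    factor = 1
    w, h = width, height
    while max(w, h) > blocksize:
        factor *= 2
        # Overview dimensions are rounded up so no edge pixels are lost.
        w = math.ceil(width / factor)
        h = math.ceil(height / factor)
        factors.append(factor)
    return factors


# Hypothetical 100,000 x 80,000 pixel raster with 256-pixel tiles.
print(overview_factors(100_000, 80_000))
```

Each level holds roughly a quarter of the pixels of the one below it, which is why a full pyramid adds only about a third to the uncompressed size.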

From a 36 GB TIFF to a snappy COG

My starting point was a 36 GB GeoTIFF at full resolution. I tried two build paths.

  • With GDAL (the C++ geospatial workhorse), the core step was:
    gdal_translate -of COG -co OVERVIEW_RESAMPLING=BILINEAR \
      -co COMPRESS=DEFLATE -co BIGTIFF=YES \
      INPUT.tif OUTPUT.cog.tif

    On this dataset, that took about one hour. GDAL is full‑featured but typically comes via a larger conda install (~2 GB).

  • With Rasterio (a Pythonic wrapper around GDAL components) and rio‑cogeo, the equivalent was:
    rio cogeo create INPUT.tif OUTPUT.cog.tif \
      --overview-level 5 --overview-resampling bilinear \
      --cog-profile DEFLATE

    This finished in about 30 minutes, and the output shrank from 36 GB to 23 GB: DEFLATE compression saved more space than the added overviews cost. Rasterio itself is a much smaller install (~25 MB), which made it convenient for quick experimentation.

Those numbers aren’t theoretical; they are the wall-clock times I saw, and they highlight the choices you’ll face: compression and predictor, number of overview levels, resampling method (I used bilinear for continuous data), and projection strategy. Each knob nudges file size, visual crispness, and read performance.
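The predictor knob is the least intuitive of these, so here is a small standard-library illustration of the idea behind horizontal differencing (TIFF's integer predictor): store each sample as the difference from its left neighbour before handing the bytes to DEFLATE. This is a sketch of the concept, not GDAL's implementation, and the scanline values are made up; floating-point data uses a different predictor scheme.

```python
import struct
import zlib


def pack_uint16(values):
    """Serialize integers as little-endian unsigned 16-bit samples."""
    return struct.pack(f"<{len(values)}H", *values)


def horizontal_difference(row):
    """Keep the first sample, then store each sample minus its left
    neighbour, wrapped to 16 bits so the result stays unsigned."""
    return [row[0]] + [(b - a) & 0xFFFF for a, b in zip(row, row[1:])]


# Hypothetical smooth scanline: a steady ramp plus a small repeating wiggle.
row = [1000 + 5 * i + (i * i) % 7 for i in range(10_000)]

raw_size = len(zlib.compress(pack_uint16(row), 6))
predicted_size = len(zlib.compress(pack_uint16(horizontal_difference(row)), 6))

print("DEFLATE only:          ", raw_size, "bytes")
print("difference + DEFLATE:  ", predicted_size, "bytes")
```

On smooth, continuous surfaces the differences are small and repetitive, which is exactly what DEFLATE compresses well; on noisy data the gain can be much smaller.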

The projection fork in the road

Most web maps (Leaflet, MapLibre, and friends) speak XYZ “slippy map” tiles, which assume EPSG:3857 (Web Mercator); Mercantile is the Python library commonly used to compute those tile indices. You can meet them halfway in two ways:

  1. Pre‑project the raster to EPSG:3857 before (or while) building the COG. With GDAL:
    gdalwarp -t_srs EPSG:3857 -r bilinear INPUT.tif PROJECTED.tif
    # then create the COG as above; I also used -co PREDICTOR=3

    Reprojecting the 36 GB source took ~4–5 hours on our setup. The upside: the viewer can fetch tiles that are already in Web Mercator. The caveat: any resampling to 3857 changes the finest‑resolution cell values, which matters for some quantitative uses.

  2. Generate a “web-optimized” COG with rio-cogeo, which reprojects to Web Mercator and aligns the internal tiles and overviews with the standard web zoom levels:
    rio cogeo create INPUT.tif OUTPUT_webopt.cog.tif \
      --web-optimized --overview-resampling bilinear \
      --cog-profile DEFLATE

    On my dataset, this took ~5–6 hours, produced a ~23 GB COG with 10 overview layers, and delivered snappy map interactions. It still involves resampling, so the same scientific caution applies at the finest scale.

Either way, the projection decision is a trade‑off: precompute for speed and convenience, or keep the source native and project on demand when precision is paramount. I often prefer dynamic reprojection for analysis workflows and precomputed 3857 for public‑facing maps.
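Why does aligning to the web always mean resampling? Web Mercator zoom levels have fixed pixel sizes, and a raster's native resolution almost never lands on one of them. The sketch below computes the pixel size per zoom level and finds the first zoom at least as fine as a given native resolution. The 30 m figure and the function names are assumptions for illustration, and a real tool may choose its zoom level by a different rule.

```python
import math

# Equatorial circumference of the WGS 84 sphere used by EPSG:3857.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0


def zoom_resolution(zoom, tilesize=256):
    """Pixel size in metres (at the equator) for a Web Mercator zoom level."""
    return EARTH_CIRCUMFERENCE_M / (tilesize * 2 ** zoom)


def first_zoom_at_or_finer(native_resolution_m, tilesize=256, max_zoom=24):
    """Smallest zoom whose pixel size is at or below the native resolution."""
    for zoom in range(max_zoom + 1):
        if zoom_resolution(zoom, tilesize) <= native_resolution_m:
            return zoom
    return max_zoom


# Hypothetical raster with 30 m pixels.
zoom = first_zoom_at_or_finer(30.0)
print(zoom, round(zoom_resolution(zoom), 3))
```

For 30 m data the neighbouring zoom levels sit at roughly 38 m and 19 m per pixel, so every cell value at the finest level has to be interpolated onto the new grid.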

Serving and viewing the data

Once you have a COG, the rest is delightfully boring—assuming your server supports HTTP Range requests so clients can fetch only the relevant byte ranges. Here’s the shape of a byte‑range GET in Python:

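import requests

url = "https://example.org/data/output.cog.tif"  # placeholder URL
headers = {"Range": "bytes=0-65535"}  # first 64 KiB; both ends are inclusive
r = requests.get(url, headers=headers, timeout=30)
# r.status_code is 206 (Partial Content) when the server honors the range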

That single header is the secret sauce that makes remote, partial reads efficient. QGIS can add a raster layer directly from a URL when the server honors Range. In the browser I’ve used Leaflet two ways: with georaster reading the COG on the client side, and with a small rio-tiler service on the server that translates x/y/z tile requests into on-the-fly tile renders from the COG.
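Because byte ranges are inclusive at both ends, off-by-one mistakes are easy to make. Here is a small standard-library sketch of two helpers, one that builds the request header from an offset and a length and one that parses the Content-Range header of the response. The helper names are mine, not part of any library.

```python
import re


def range_header(offset, length):
    """Build a Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive, so the last byte is offset + length - 1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def parse_content_range(value):
    """Parse e.g. 'bytes 0-65535/1000000' into (first, last, total).

    total is None when the server reports an unknown size ('*').
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


print(range_header(0, 65536))
print(parse_content_range("bytes 0-65535/1000000"))
```

A quick server check follows from this: a 206 status with a Content-Range header means the range was honored, while a plain 200 means the server ignored the header and is sending the whole file.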

For programmatic access—or to build your own tiler—rio‑tiler is a gem:

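from rio_tiler.io import COGReader  # renamed to Reader in rio-tiler 4.x

# x, y, z are the Web Mercator XYZ tile indices supplied by the caller
with COGReader("https://example.org/data/output.cog.tif") as cog:
    img = cog.tile(x, y, z, tilesize=256, indexes=(1,))
    tile = img.data  # NumPy array, shape (1, 256, 256): (bands, rows, cols)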

On my 36 GB COG, this call returned a 256×256 tile in ~0.16 seconds, and it works with either a local path or a URL that supports Range reads. This is the point at which “big raster on disk” turns into “fast, precise tiles wherever your users are.” And yes, QGIS renders it nicely too.
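If you are wondering where the x, y, z values come from, they are the standard Web Mercator tile indices for a location and zoom level. In practice a map client or a library such as Mercantile computes them for you; the sketch below shows the underlying arithmetic with only the standard library. The function name and the coordinates (roughly Princeton) are illustrative assumptions.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y, z) XYZ tile that contains a lon/lat point.

    Tile (0, 0) is the top-left corner of the Web Mercator world.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y, zoom


# Approximate coordinates, for illustration only.
print(lonlat_to_tile(-74.66, 40.35, 10))
```

The returned tuple can be passed straight to a tile call like the one above, as long as the tile actually overlaps the raster's bounds.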

What I learned and what’s next

The biggest “aha” was that layout beats location: simply moving a giant TIFF to the cloud isn’t enough; reorganizing it as a COG—tiling, precomputing overviews, and front‑loading metadata—unlocks performance that end users can feel. I also came to appreciate the scientific implications of reprojection: Web Mercator is great for maps, less great for certain analyses, and the choice to precompute or not should be explicit in any workflow documentation.

If you’re at Princeton and sitting on a big raster, we can help you pick options that fit your use case. For tools, I used GDAL (https://pypi.org/project/GDAL), Rasterio/rio‑cogeo (https://pypi.org/project/rio-cogeo), rio‑tiler (https://pypi.org/project/rio-tiler), QGIS (https://qgis.org), and Leaflet (https://leafletjs.com). If your server already supports Range, you’re halfway there; if not, we can stand up a small tiler to bridge the gap. My next step is packaging these patterns as a reproducible recipe with defaults (compression, overviews, and a tiler skeleton) so teams can go from raw GeoTIFF to web‑friendly COG in one command.
