
Denoising Images with Natural Scene Statistics

A walkthrough of how to use natural image statistics to denoise images with Bayesian inference.

Authors Tejasvi Kothapalli1, Bruno Olshausen1
Affiliation 1 UC Berkeley Redwood Center for Theoretical Neuroscience
Published Jan 1, 2026
Tags Denoising, Bayesian Inference, Natural Scenes, Fourier Transform, Steerable Pyramids, Langevin Sampling, 1/f2, Wiener Filter
Abstract

Natural images can be denoised using what we know about natural scene statistics; namely, natural images have a $1/f^2$ power spectrum. Using this information as our prior, we can apply Bayesian inference to denoise an image that has been corrupted by Gaussian noise. Using the same framework, we can also fill in an image where 99% of the true pixel values are missing. Finally, we compare this simple Bayesian approach to modern AI methods for denoising and filling in.

Introduction

This blog introduces and demonstrates how to use natural image statistics to denoise an image. The first part explores the $1/f^2$ power spectrum of natural images. The second part discusses using Bayesian inference to denoise an image. The third part implements Simoncelli and Adelson's "Noise Removal via Bayesian Wavelet Coring" for denoising. The fourth part discusses denoising with gradient descent and Langevin sampling. The fifth part demonstrates filling in an image where a large percentage of pixels are missing. The sixth and final part compares these methods to standard deep learning techniques for denoising and filling in.

What do we know about natural images? ($1/f^2$ power spectrum)


Fourier's theorem was first proposed in 1807 but not published until 1822 due to the skepticism of many leading mathematicians who did not think it could be true. Today, it provides the theoretical foundation for the Fourier transform which is one of the most widely used tools in all of science and engineering, from radio astronomy and signal processing to neuroscience and modern machine learning. For example, it is used for:

  • analyzing the frequency content of natural signals such as sound waveforms.
  • detecting rhythmic activity in brain signals - e.g., EEG, LFP and spike train recordings.
  • characterizing the length scale of correlations over time or space, even when there is no periodic structure, since the power spectrum is the Fourier transform of the auto-correlation function.
  • identifying the input-output relationship of linear systems.

Here we will experiment with Fourier decompositions of simple 1D waveforms and 2D images, to gain some intuition about what the Fourier transform tells you about a signal, and how it can be used to manipulate images to demonstrate perceptual phenomena.

Fourier's theorem states that any continuous function f(x) can be described as a weighted sum of sine and cosine waveforms at different frequencies:

$$f(x) = \sum_{\omega=0}^{\infty} a_s(\omega)\sin(\omega x) + a_c(\omega)\cos(\omega x)$$

where $a_s(\omega)$ and $a_c(\omega)$ are the weights of the sine and cosine components, respectively, and $\omega$ is the frequency in $2\pi$ radians per unit of $x$ (space or time). Alternatively, if we define

$$A(\omega) \equiv \sqrt{a_c(\omega)^2 + a_s(\omega)^2}, \qquad \phi(\omega) \equiv \tan^{-1}\!\left(\frac{a_c(\omega)}{a_s(\omega)}\right)$$

we may rewrite the equation equivalently as

$$f(x) = \sum_{\omega=0}^{\infty} A(\omega)\sin(\omega x + \phi(\omega))$$

where now $f(x)$ is described simply as a sum of sinusoids, each with amplitude $A(\omega)$ and phase shift $\phi(\omega)$. We refer to $A(\omega)$ as the amplitude spectrum and $\phi(\omega)$ as the phase spectrum of $f(x)$.

To gain some intuition for the power and generality of the Fourier theorem, let us first decompose a simple step function this way. The function $f(x)$ is

$$f(x) = \begin{cases} 1 & \text{for } x \ge 0 \\ -1 & \text{for } x < 0 \end{cases}$$

Here, $f(x)$ is given as a discretely sampled function. In the image below you can see that even though $f(x)$ looks nothing like a sinusoid, we can nevertheless describe it perfectly by adding sinusoids of different frequencies together, as long as they are given the correct amplitudes and phases.

A step function decomposed into a sum of sinusoids of increasing frequency
A step function (black) reconstructed by summing sinusoids (colored) at increasing frequencies.

The direct method of computing the amplitude and phase spectrum of a function $f(x)$ is the Fourier transform! Looking back at the Fourier theorem, we can compute the Fourier coefficients $a_s(\omega)$ and $a_c(\omega)$ from the discretely sampled function $\hat{f}[n] = f(n\Delta x)$, $n = 0:N-1$, as follows:

$$a_s(\omega_k) = \frac{2}{N} \sum_{n=0}^{N-1} \hat{f}[n]\,\sin(\omega_k n)\,\Delta x \qquad a_c(\omega_k) = \frac{2}{N} \sum_{n=0}^{N-1} \hat{f}[n]\,\cos(\omega_k n)\,\Delta x$$

where $\omega_k = 2\pi k/N$, $k = 0:N/2$. Intuitively, we can think of these equations as measuring the similarity between $\hat{f}[n]$ and the sine function, and the similarity between $\hat{f}[n]$ and the cosine function, for each frequency $\omega_k$.

Formally, we can show this gives us exactly the Fourier coefficients $a_s(\omega)$ and $a_c(\omega)$ in the Fourier theorem by plugging the expression for $f(x)$ into these equations and using the orthogonality of sines and cosines of different frequencies, or of sine and cosine at the same frequency.

Once you have computed $a_s(\omega)$ and $a_c(\omega)$, the amplitude spectrum and phase spectrum can be computed via $A(\omega) = \sqrt{a_c(\omega)^2 + a_s(\omega)^2}$ and $\phi(\omega) = \tan^{-1}\!\left(\frac{a_c(\omega)}{a_s(\omega)}\right)$.
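To make these equations concrete, here is a minimal NumPy sketch (not the notebook code behind the figure above) that computes $a_s(\omega_k)$ and $a_c(\omega_k)$ for the step function, converts them to amplitude and phase, and reconstructs the signal. The number of samples, the sample spacing, and the step location are illustrative choices.

```python
import numpy as np

# Discretely sample the step function on N points (N, dx, and the step location
# are illustrative choices, not values taken from the demo above)
N, dx = 256, 1.0
n = np.arange(N)
f = np.where(n >= N // 2, 1.0, -1.0)                 # +1 on the right half, -1 on the left

# Fourier coefficients a_s(w_k) and a_c(w_k) for k = 0 .. N/2
k = np.arange(N // 2 + 1)
w = 2 * np.pi * k / N                                # frequencies w_k = 2*pi*k/N
a_s = (2 / N) * (f @ np.sin(np.outer(n, w))) * dx    # similarity to a sine at each frequency
a_c = (2 / N) * (f @ np.cos(np.outer(n, w))) * dx    # similarity to a cosine at each frequency

# Amplitude and phase spectra
A = np.sqrt(a_c**2 + a_s**2)
phi = np.arctan2(a_c, a_s)                           # arctan2 avoids dividing by a_s = 0

# Reconstruct f as a sum of phase-shifted sinusoids, A(w) * sin(w*n + phi(w))
recon = (A * np.sin(np.outer(n, w) + phi)).sum(axis=1)
print("max reconstruction error:", np.abs(recon - f).max())
```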

A 2D image may also be described in terms of a Fourier decomposition as follows:

$$I(x,y) = \sum_{\omega_x,\,\omega_y} A(\omega_x,\omega_y)\,\sin\!\left(\omega_x x + \omega_y y + \phi(\omega_x,\omega_y)\right)$$

where $\omega_x$ and $\omega_y$ correspond to frequencies in the $x$ and $y$ dimensions of the image, respectively. Each sine-wave component now corresponds to a 2D grating of a specific spatial frequency and orientation. You can use the demo below to build up an image from its Fourier components, one component at a time.

Note how the amplitudes fall off with frequency. At some point, the amplitudes are so small that the gratings are not even visible, yet they are clearly needed to reconstruct the image. This seems paradoxical: gratings that are invisible in isolation nevertheless create structure when combined. The squared amplitudes fall off as $1/f^2$ [1], shown by the dashed line. This phenomenon of $1/f$ power spectra is not unique to natural images; it occurs throughout nature, such as in music [2].

Figure 1. An interactive Fourier buildup: reconstruct an image by adding one Fourier component at a time. Top middle panel: The cumulative reconstruction of the image after adding each oriented frequency component. Top right panel: The component number $i$ indexes the frequency component being added and $a$ is its amplitude. As $i$ increases, $a$ decreases because components are added in descending order of amplitude. Each grating is weighted by its associated amplitude. Only components up to roughly $i = 150$ are visible; after that the grating is too hard to see because the amplitude is so low. Bottom right panel: The grating shown at full contrast to make it easier to see for visualization purposes. Bottom middle panel: The power spectrum of the image. The y-axis is the squared amplitude of each frequency component (hence the power spectrum) and the x-axis is the frequency value. Note the log-log plot. Source: Python, MATLAB.
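If you want to check this falloff yourself, the sketch below radially averages the 2D power spectrum of a grayscale image and fits the slope on a log-log plot. Since no image file ships with this post, it synthesizes a $1/f^2$ test image as a stand-in for a natural photograph; the binning and fitting choices here are one reasonable option, not necessarily the ones used for the figure above.

```python
import numpy as np

def radial_power_spectrum(img):
    """Radially averaged power spectrum of a 2D grayscale image."""
    img = img - img.mean()                              # remove DC so it doesn't dominate
    power = np.abs(np.fft.fftshift(np.fft.fft2(img)))**2

    # Integer distance of each frequency sample from the origin (cycles per image)
    ny, nx = img.shape
    gy = np.fft.fftshift(np.fft.fftfreq(ny)) * ny
    gx = np.fft.fftshift(np.fft.fftfreq(nx)) * nx
    r = np.hypot(*np.meshgrid(gy, gx, indexing="ij")).round().astype(int)

    # Average the power within each integer-radius bin
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)

# Synthesize a 1/f^2 test image: white noise shaped so its amplitude falls as 1/f
rng = np.random.default_rng(0)
n = 256
fy = np.fft.fftfreq(n)[:, None] * n
fx = np.fft.fftfreq(n)[None, :] * n
fmag = np.hypot(fy, fx)
fmag[0, 0] = 1.0                                        # avoid dividing by zero at DC
img = np.real(np.fft.ifft2(np.fft.fft2(rng.normal(size=(n, n))) / fmag))

spec = radial_power_spectrum(img)
freqs = np.arange(1, n // 2)                            # skip DC, stay below the Nyquist corner
slope = np.polyfit(np.log(freqs), np.log(spec[1:n // 2]), 1)[0]
print(f"fitted log-log slope: {slope:.2f}  (about -2 for a 1/f^2 power spectrum)")
```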

Bayesian denoising with a natural-image prior

If you want an incredible and easy-to-understand introduction to Bayesian inference, please read this write-up by Bruno Olshausen [3]. We will use Bayesian inference to denoise images. Let's begin with the following problem set-up (copied from the write-up [3]): "A classic example of where Bayesian inference is employed is in the problem of estimation, where we must guess the value of an underlying parameter from an observation that is corrupted by noise. Let's say we have some quantity in the world, $x$, and our observation of this quantity, $y$, is corrupted by additive Gaussian noise, $n$, with zero mean:"

$$y = x + n$$

Our job is to make the best guess as to the value of $x$ given the observed value $y$. Let's call our best guess $\hat{x}$. In this set-up, when $x \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $n \sim \mathcal{N}(0, \sigma_n^2)$ are both Gaussian, there is a standard closed-form solution for $\hat{x}$. This is also known as the Wiener filter:

$$\hat{x} = \frac{\sigma_x^2\, y + \sigma_n^2\, \mu_x}{\sigma_x^2 + \sigma_n^2}$$

Now assume $\mu_x = 0$ and $\sigma_x = 1.0$. Then $\hat{x}$ simplifies to the following form:

$$\hat{x} = a\,y, \qquad a = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_n^2}$$

Here $a$ is an attenuation factor between 0 and 1, and it varies with $\sigma_n$ as shown in the plot below:

Plot showing attenuation factor a as a function of noise standard deviation sigma_n
Figure 2. Attenuation factor $a$ as a function of $\sigma_n$.
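For concreteness, a few lines of NumPy reproduce the curve in Figure 2 under the assumption $\sigma_x = 1$; the particular noise levels printed here are arbitrary.

```python
import numpy as np

sigma_x = 1.0                                   # signal standard deviation, as assumed above
sigma_n = np.linspace(0.0, 3.0, 7)              # a few illustrative noise levels
a = sigma_x**2 / (sigma_x**2 + sigma_n**2)      # attenuation factor from the Wiener solution

for sn, ai in zip(sigma_n, a):
    print(f"sigma_n = {sn:4.1f}  ->  a = {ai:.3f}")
# As the noise grows, a -> 0 and the estimate x_hat = a * y shrinks toward the prior mean (0).
```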

Now let's apply the same framework to the problem of denoising an image. We are given a noisy observation $I(x)$ of an image $I_0(x)$ as follows:

$$I(x) = I_0(x) + n(x)$$

where $x$ denotes the pixel position within the 2D image. Our task is to estimate $I_0$ from $I$. We could do this pixel by pixel, but since the signal and noise variances are the same for every pixel, all that would do is attenuate the entire image, which isn't very interesting. However, if we transform to the frequency domain, we can exploit our prior knowledge about the statistics of natural images: namely, they have $1/f^2$ power spectra!

Letting $\tilde{I}(f)$ denote the Fourier transform of $I(x)$, we have

$$\tilde{I}(f) = \tilde{I}_0(f) + \tilde{n}(f)$$

Since the Fourier transform is a unitary transformation (it is like multiplying by an orthonormal matrix), the noise variance in the frequency domain is the same as it was in the space domain. But the signal variance is now different for each frequency, as it follows the $1/f^2$ power law:

$$\sigma^2(f) = \frac{k}{|f|^{2}}$$

We just need to set the constant of proportionality $k$ so that the total signal power in the frequency domain equals the signal power in the space domain, which can be accomplished via

$$k = \frac{\sum_x \sigma^2(x)}{\sum_f |f|^{-2}}$$

We can then denoise the image frequency by frequency as follows:

$$\hat{\tilde{I}}(f) = a(f)\,\tilde{I}(f)$$

where $a(f)$ is the attenuation factor computed from the signal and noise variance at each frequency. In this case,

$$a(f) = \frac{\sigma^2(f)}{\sigma^2(f) + \sigma_n^2}$$

Examine the code below, which implements the above equations, and be sure you understand it. Try running it with different amounts of noise to see how far you can go before the image is unrecoverable.
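The interactive code cell is not reproduced here, so below is a minimal NumPy sketch of the same equations: it builds the $1/f^2$ prior variance $\sigma^2(f)$, sets $k$ from an estimate of the spatial signal power, and applies the per-frequency attenuation $a(f)$. The signal-power estimate, the DC handling, and the function name are illustrative choices of mine, not necessarily those made in the original cell.

```python
import numpy as np

def wiener_denoise_1f2(noisy, sigma_n):
    """Frequency-domain Wiener denoising with a 1/f^2 natural-image prior.

    `noisy` is a 2D grayscale array corrupted by additive white Gaussian
    noise with standard deviation `sigma_n`.
    """
    ny, nx = noisy.shape

    # Frequency magnitude |f| at each FFT sample (in cycles per image)
    fy = np.fft.fftfreq(ny)[:, None] * ny
    fx = np.fft.fftfreq(nx)[None, :] * nx
    fmag2 = fy**2 + fx**2

    # Rough estimate of the total signal power (noisy power minus noise power)
    signal_power = max(noisy.var() - sigma_n**2, 1e-12) * (ny * nx)

    # Prior variance per frequency: sigma^2(f) = k / |f|^2, with k chosen so the
    # total frequency-domain signal power matches the spatial-domain signal power
    inv_f2 = np.zeros_like(fmag2)
    inv_f2[fmag2 > 0] = 1.0 / fmag2[fmag2 > 0]
    k = signal_power / inv_f2.sum()
    sig2 = k * inv_f2

    # Per-frequency attenuation a(f) = sigma^2(f) / (sigma^2(f) + sigma_n^2).
    # The "ortho" FFT is unitary, so the noise variance per coefficient is the
    # same sigma_n^2 as in the pixel domain, exactly as described above.
    a = sig2 / (sig2 + sigma_n**2)
    a[0, 0] = 1.0                              # pass the mean (DC term) through unchanged

    F = np.fft.fft2(noisy, norm="ortho")
    return np.real(np.fft.ifft2(a * F, norm="ortho"))

# Illustrative usage (any grayscale image `clean` scaled to [0, 1] will do):
# rng = np.random.default_rng(0)
# noisy = clean + rng.normal(scale=0.1, size=clean.shape)
# denoised = wiener_denoise_1f2(noisy, sigma_n=0.1)
```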

Figure 3. Wiener denoising in the frequency domain using a $1/f^2$ natural-image prior.

Wavelet Coring

In the previous section we used the Wiener filter to attenuate each frequency. In this section, instead of modifying frequency content, we will take a wavelet decomposition of the image and modify the wavelet coefficients. This is a method developed by Simoncelli and Adelson in 1996 [4]. It relies on the steerable pyramid [5], a method for decomposing an image into oriented frequency bands. If you want a great explanation of steerable pyramids, check out the actively maintained pyrtools documentation page about steerable pyramids [6]. Below is an example of the steerable pyramid decomposition. The left figure shows the filters and the right figure shows the activations on the Einstein image for each of the filters.

Steerable pyramid filters showing the frequency decomposition filters
Steerable pyramid activations on the Einstein image showing the response of each filter
Figure 4. Steerable pyramid decomposition. Left: The steerable pyramid filters used for decomposition. Right: The activations (filter responses) on the Einstein image for each of the filters.

You can see from the figure above that there are different levels. The top level (first row of the left figure) responds to high frequency information. The second row responds to medium frequency information while the bottom row responds to low frequency information.
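If you want to poke at this decomposition yourself, here is a small sketch that builds a steerable pyramid with the pyrtools package referenced above [6] and inspects its bands. The constructor arguments, attribute names, and placeholder image reflect recent pyrtools versions and are assumptions that may need adjusting.

```python
import numpy as np
import pyrtools as pt   # the pyrtools package from reference [6]

# Placeholder image; substitute any grayscale image (e.g. the Einstein image from Figure 4)
img = np.random.default_rng(0).normal(size=(256, 256))

# 3-level steerable pyramid with 4 orientation bands (order=3 gives 4 orientations),
# roughly matching the decomposition shown in Figure 4
pyr = pt.pyramids.SteerablePyramidFreq(img, height=3, order=3)

# pyr_coeffs maps (level, orientation) keys, plus the high- and low-pass residuals,
# to arrays of filter responses
for key, band in pyr.pyr_coeffs.items():
    print(key, band.shape)

# The transform is (nearly) invertible: reconstruct the image from all of the bands
recon = pyr.recon_pyr()
print("max reconstruction error:", np.abs(recon - img).max())
```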

I am going to follow the implementation details of this method based on Professor Rob Fergus' class assignment [7]. Now let's focus only on the first orientation (first column, all three rows). This first orientation detects vertical edges. In the figure below we see three columns. The first column shows the filters again for reference. The second column shows the filter outputs for the original uncorrupted image. The third column shows the filter outputs for the image corrupted with Gaussian noise of $\sigma = 0.075$. The important thing to notice is that most of the noise shows up in the high-frequency filtered output. In the two bottom rows, the filtered outputs of the original and noisy versions look roughly the same. However, in the top row, the high-frequency filtered output, the noisy version clearly looks a lot more corrupted than the original.

Comparison of filter outputs for original and noisy images showing noise primarily affects high frequency components
Figure 5. Filter output comparison. Left: Filter reference. Middle: Filter outputs for the original uncorrupted image. Right: Filter outputs for the image corrupted with Gaussian noise.

Now let's focus only on the highest-frequency filter, the first row of the figure above. We can plot the distribution of the filtered outputs for both the original and noisy versions of the image, as shown in the figure below. The top row is the same as the top row of Figure 5. The bottom row shows the distributions as histograms. Notably, the original filtered output's histogram is far more sharply peaked at the center than the noisy filtered output's histogram. The main intuition behind this method of denoising is to transform the noisy filtered output's histogram to match the original filtered output's histogram.

Histogram comparison of filter outputs showing original has sharper peak than noisy version
Figure 6. Distribution comparison of highest frequency filter outputs. Top row: Filter outputs (same as top row from Figure 5). Bottom row: Histogram distributions showing the original has a more sharply peaked distribution at the center compared to the noisy version.
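The full Bayesian coring function from [4] (and the version in [7]) derives the shrinkage from a fitted prior on the clean coefficients. As a simplified stand-in, the sketch below applies a soft-threshold "coring" nonlinearity to each oriented band, using a median-based noise estimate per band; the thresholding rule and the noise estimate are my own illustrative choices, not the paper's.

```python
import numpy as np
import pyrtools as pt

def core_band(band, threshold):
    """Soft coring: shrink small (noise-dominated) coefficients toward zero."""
    return np.sign(band) * np.maximum(np.abs(band) - threshold, 0.0)

def denoise_by_coring(noisy, height=3, order=3, n_sigmas=2.0):
    """Decompose, shrink each band's coefficients, and reconstruct.

    A soft-threshold stand-in for Bayesian wavelet coring, not a
    reimplementation of Simoncelli and Adelson's estimator.
    """
    pyr = pt.pyramids.SteerablePyramidFreq(noisy, height=height, order=order)
    for key, band in pyr.pyr_coeffs.items():
        if key == "residual_lowpass":
            continue                      # leave the low-frequency residual untouched
        # Robust per-band noise estimate from the median absolute deviation
        sigma_band = np.median(np.abs(band)) / 0.6745
        pyr.pyr_coeffs[key] = core_band(band, n_sigmas * sigma_band)
    return pyr.recon_pyr()

# Illustrative usage (noisy_image: 2D grayscale array corrupted by Gaussian noise):
# denoised = denoise_by_coring(noisy_image)
```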

Gradient descent + Langevin sampling

Coming next: interpreting denoising as sampling from the posterior with injected noise.

Figure 7. Gradient descent + Langevin sampling demo with a $1/f^2$ prior.

Filling in images with missing pixels

Coming next: the same Bayesian framework for inpainting when many pixels are missing.

Figure 8. Inpainting by sampling missing pixels under a $1/f^2$ prior.

Comparison to modern deep learning approaches

Coming next: qualitative + quantitative comparison against standard deep denoisers and inpainters.

Discussion

Takeaways, limitations, and next steps.

Acknowledgments

I recently served as the GSI for UC Berkeley's NEU 172L: Cognitive and Computational Lab. This is a brand new course, taught for the first time in Fall 2025. The goal of the course is to introduce students to the experimental and analytical techniques used by cognitive and computational neuroscientists. It was taught by Frederic E Theunissen, Michael Silver, and Bruno Olshausen. I had the pleasure of helping to develop some of the labs that were taught during the course. Part of Bruno's labs were built to teach students about the Fourier transform and Bayesian inference. The content and code in this write-up were initially developed for the course's labs. I had an incredible time building these labs and teaching them to the students, which is what motivated me to make this write-up. I also had the privilege of learning a lot of this content one-on-one from Bruno and I would like to sincerely thank him for his time and patience to teach me everything.

I would also like to thank my co-GSI Carolyn Irving for being an amazing teammate and for helping develop the labs that eventually became part of this write-up. Finally, I would like to thank the Berkeley Neuroscience Department for giving me the opportunity to GSI NEU 172L and for funding me for Fall 2025.

This webpage is inspired by the aesthetic of Transformer Circuits.

Citation Information

For attribution in academic contexts, please cite this work as

Kothapalli, T., & Olshausen, B., "Denoising Images with Natural Scene Statistics", 2026.

BibTeX citation

@article{kothapalli2026denoising,
  author={Kothapalli, Tejasvi and Olshausen, Bruno},
  title={Denoising Images with Natural Scene Statistics},
  year={2026},
  url={https://tejasvikothapalli.github.io/blogs/denoising/}
}

References

  1. Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394. [link]
  2. Voss, R. F., & Clarke, J. (1978). “1/f noise” in music: Music from 1/f noise. The Journal of the Acoustical Society of America, 63(1), 258–263. [link]
  3. Olshausen, B. A. (2004). Bayesian probability theory. The Redwood Center for Theoretical Neuroscience, Helen Wills Neuroscience Institute at the University of California at Berkeley, Berkeley, CA. [link]
  4. Simoncelli, E. P., & Adelson, E. H. (1996, September). Noise removal via Bayesian wavelet coring. In Proceedings of 3rd IEEE International Conference on Image Processing (Vol. 1, pp. 379–382). IEEE. [link]
  5. Simoncelli, E. P., & Freeman, W. T. (1995, October). The steerable pyramid: A flexible architecture for multi-scale derivative computation. In Proceedings, International Conference on Image Processing (Vol. 3, pp. 444–447). IEEE. [link]
  6. Simoncelli, E., Young, R., Broderick, W., Fiquet, P. É., Wang, Z., Kadkhodaie, Z., ... & Ward, B. (2024, July). Pyrtools: tools for multi-scale image processing. [link]
  7. Fergus, R. Computational Photography. New York University, Department of Computer Science. [link]