The blessing of dimensionality
for reals
Jonathan “No Trump” Falk writes:
Your post this week on Integration and Differentiation got me thinking, never a good sign. Adding to this, I only really learned in January what Attention means, the foundation of LLMs. And what it means is that in a vector space of 14,000 dimensions or so, you can express all manner of nuance… enough to start vitriolic arguments about the humanity of the output.
As I started thinking about this, and as your post crystallized it, I realized that I have spent 50 years fighting the Curse of Dimensionality. I know this curse in my marrow. Brilliant inferences await me, but the space in which these insights are found is simply too vast to explore. So we simplify, reducing the dimensionality to something that, while still vast, is confined to a hyperplane where we can, like Plato, see the projections of truth, not the truth itself.
But then what LLMs and their generation have taught me is that nuance, which is really just the inverse of inference (in that it’s the vast set of all things consistent with some inference), enjoys an amazing boon of dimensionality. There appears to be no thought that can’t be described by a vector of 14,000 or so dimensions. Tuning such a vector has the huge advantage that 14,000-dimensional space is so empty that tiny nuances can be readily distinguished: you can hide uniqueness in the vastness of that space, uniqueness you couldn’t recover by a raw search of that same space.
I’m sure this inversion of the Curse of dimensional search into the Boon of nuance in high dimensions is not original to me, but I think it’s worth noting, and your post was the impetus.
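One standard way to see the emptiness Jonathan describes is that independent random directions in high dimensions are nearly orthogonal: the cosine similarity between two random unit vectors concentrates near zero, with typical absolute size about sqrt(2/(πd)). Here’s a minimal Python sketch, assuming isotropic Gaussian random vectors as a stand-in (real LLM embeddings are learned, not random, so this only illustrates the geometry). The helper names `random_unit_vector` and `mean_abs_cosine` are hypothetical.

```python
import math
import random

def random_unit_vector(d, rng):
    """Draw a direction uniformly on the unit sphere in d dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_cosine(d, n_pairs=100, seed=1):
    """Average |cosine similarity| over independent pairs of random unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u = random_unit_vector(d, rng)
        v = random_unit_vector(d, rng)
        total += abs(sum(a * b for a, b in zip(u, v)))
    return total / n_pairs

for d in (3, 100, 14_000):
    # Simulated value next to the large-d approximation sqrt(2 / (pi * d)).
    print(d, round(mean_abs_cosine(d), 4), round(math.sqrt(2 / (math.pi * d)), 4))
```

In 3 dimensions the expected |cosine| is exactly 0.5; at d = 14,000 it is roughly 0.0067, which is one way to make precise the sense in which there is room for a huge number of distinguishable directions.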
I replied by pointing to one of our very earliest blog posts, The blessing of dimensionality, where I wrote:
The phrase “curse of dimensionality” has many meanings (with 18800 references, it loses to “bayesian statistics” in a googlefight, but by less than a factor of 3). In numerical analysis it refers to the difficulty of performing high-dimensional numerical integrals.
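To make the numerical-analysis sense concrete: a tensor-product rule with m points per axis needs m^d function evaluations. A minimal Python sketch, assuming the midpoint rule on the unit cube and a separable test integrand (a product of cosines, whose exact integral is sin(1)^d) chosen only because its answer is known. The names `midpoint_grid_integral` and `cos_product` are hypothetical.

```python
import itertools
import math

def midpoint_grid_integral(f, d, m):
    """Midpoint-rule estimate of the integral of f over the unit cube [0, 1]^d.

    Uses m points per axis, so f is evaluated m**d times.
    """
    h = 1.0 / m
    axis = [(i + 0.5) * h for i in range(m)]  # midpoints of the m cells per axis
    total = 0.0
    for point in itertools.product(axis, repeat=d):
        total += f(point)
    return total * h ** d  # each grid cell has volume h**d

def cos_product(x):
    """Separable test integrand; its exact integral over [0, 1]^d is sin(1)**d."""
    return math.prod(math.cos(xi) for xi in x)

m = 10
for d in (1, 2, 3, 5):
    estimate = midpoint_grid_integral(cos_product, d, m)
    print(f"d={d}: {m ** d} evaluations, estimate={estimate:.6f}, exact={math.sin(1) ** d:.6f}")
```

At the same resolution of 10 points per axis, a 14,000-dimensional integral would need 10^14,000 evaluations. Monte Carlo avoids the exponential count because its standard error scales like 1/sqrt(N) in the number of draws, though the integrand’s variance can still grow with dimension.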
But I am bothered when people apply the phrase “curse of dimensionality” to statistical inference.
In statistics, “curse of dimensionality” is often used to refer to the difficulty of fitting a model when many possible predictors are available. But this expression bothers me, because more predictors means more data, and it should not be a “curse” to have more data. Maybe in practice it’s a curse to have more data (just as, in practice, giving people too much good food can make them fat), but “curse” seems a little strong.
With multilevel modeling, there is no curse of dimensionality. When many measurements are taken on each observation, these measurements can themselves be grouped. Having more measurements in a group gives us more data to estimate group-level parameters (such as the standard deviation of the group effects and also coefficients for group-level predictors, if available).
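Here’s a minimal sketch of what grouping the measurements can buy, assuming the simplest normal-normal setup: K noisy coefficient estimates y_j sharing a known standard error, with the true coefficients theta_j drawn from N(mu, tau^2). For brevity it plugs in method-of-moments estimates of mu and tau (an empirical-Bayes shortcut, not a full Bayesian fit of the multilevel model); `partial_pool` is a hypothetical helper and the data are simulated for illustration only.

```python
import random
import statistics

def partial_pool(estimates, se):
    """Shrink noisy estimates from one group toward the group mean.

    Assumed model: y_j ~ N(theta_j, se**2), theta_j ~ N(mu, tau**2).
    mu and tau are estimated from the group by the method of moments
    (an empirical-Bayes plug-in, not a full Bayesian fit).
    """
    mu = statistics.fmean(estimates)
    # Var(y_j) = tau**2 + se**2, so subtract the known measurement noise.
    tau2 = max(0.0, statistics.variance(estimates) - se ** 2)
    w = tau2 / (tau2 + se ** 2)  # weight on each unit's own estimate
    return [mu + w * (y - mu) for y in estimates], mu, tau2 ** 0.5

# Simulated example (illustrative, not real data).
rng = random.Random(0)
true_tau, se, k = 0.5, 1.0, 100
theta = [rng.gauss(0.0, true_tau) for _ in range(k)]
y = [rng.gauss(t, se) for t in theta]

pooled, mu_hat, tau_hat = partial_pool(y, se)
raw_err = statistics.fmean((a - t) ** 2 for a, t in zip(y, theta))
pooled_err = statistics.fmean((a - t) ** 2 for a, t in zip(pooled, theta))
print(f"tau_hat={tau_hat:.3f}  raw MSE={raw_err:.3f}  pooled MSE={pooled_err:.3f}")
```

Each estimate is pulled toward the group mean by the factor w = tau^2/(tau^2 + se^2), so the group-level estimate of tau, which uses all K measurements, decides how much pooling happens. If the estimated tau is zero, the sketch falls back to complete pooling.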
In all the realistic “curse of dimensionality” problems I’ve seen, the dimensions (the predictors) have a structure. The data don’t sit in an abstract K-dimensional space; they are units with K measurements that have names, orderings, etc.
For example, Marina gave us an example in the seminar the other day where the predictors were the values of a spectrum at 100 different wavelengths. The 100 wavelengths are ordered. Certainly it is better to have 100 than 50, and it would be better to have 50 than 10. (This is not a criticism of Marina’s method, I’m just using it as a handy example.)
For an analogous problem: 20 years ago in Bayesian statistics, there was a lot of struggle to develop noninformative prior distributions for highly multivariate problems. Eventually this line of research dwindled because people realized that when many variables are floating around, they will be modeled hierarchically, so that the burden of noninformativity shifts to the far less numerous hyperparameters. And, in fact, when the number of variables in a group is larger, these hyperparameters are easier to estimate.
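A rough simulation of that last claim, under simple assumptions: the members of a group have true effects drawn from N(0, tau^2), each is measured with standard error 1, and tau is estimated by the method of moments as a stand-in for a full hierarchical Bayesian fit. `tau_hat` and `spread_of_tau_hat` are hypothetical names; the point is only that the sampling spread of the hyperparameter estimate shrinks as the group grows.

```python
import random
import statistics

def tau_hat(y, se):
    """Method-of-moments estimate of the group-level sd tau (truncated at zero)."""
    return max(0.0, statistics.variance(y) - se ** 2) ** 0.5

def spread_of_tau_hat(k, tau=0.5, se=1.0, reps=500, seed=0):
    """Sampling sd of tau_hat across simulated groups with k members each."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # Each member: true effect ~ N(0, tau**2), then measured with noise se.
        y = [rng.gauss(rng.gauss(0.0, tau), se) for _ in range(k)]
        draws.append(tau_hat(y, se))
    return statistics.stdev(draws)

for k in (5, 50, 500):
    print(k, round(spread_of_tau_hat(k), 3))
```

For large groups the spread falls roughly like 1/sqrt(K); for small groups the estimate often lands at zero, which is where the choice of prior on the hyperparameter matters most.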
I’m not saying the problem is trivial or even easy; there’s a lot of work to be done to spend this blessing wisely.
It’s been over 20 years, so the point is worth sharing again.
