> Considering that PCA also complicates things from a model interpretability point of view
This is a strange comment since my primary usages of PCA/SVD is as a first step in understanding latent factors which are driving the data. Latent factors typically involve all of the important things that anyone running a business or deciding policy care about: customer engagement, patient well being, employee hapiness, etc all represent latent factors.
If you have ever wanted to perform data analysis and gain some exciting insight into explaining user behavior, PCA/SVD will get you there pretty quickly. It is one of the most powerful tools in my arsenal when I'm working on a project that requires interoperability.
The "loadings" in PC and the V matrix in SVD both contain information about how the original feature space correlates with the new projection. This can easily show thing things like "User's who do X,Y and NOT Z are more likely to purchase".
Likewise in LSA (Latent Semantic Analysis/indexing) on a Term-Frequency matrix you will get a first pass at semantic embedding. You'll notice, for example, that "dog" and "cat" will project onto the new space in a common PC which can be used to interpret "pets".
> I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model
PCA/SVD are a linear transformation of the data and shouldn't give you any performance increase on a linear model. However they can be very helpful in transforming extremely high dimensional, sparse vectors into lower dimensional, dense representations. This can provide a lot of storage/performance benefits.
> NNs, etc. are all able to tease out pretty complicated relationships among features on their own.
PCA is literally identical to an autoencoder minimizing the MSE with no non-linear layers. It is a very good first step towards understanding what your NN will eventually do. After all, all NNs perform a non-linear matrix transformation so that your final vector space is ultimately linearly separable.
Sure, everyone wants to get to the latent factors that really drive the outcome of interest, but I've never seen a situation in which principal components _really_ represent latent factors unless you squint hard at them and want to believe. As for gaining insight and explaining user behavior, I'd much rather just fit a decent model and share some SHAP plots for understanding how your features relate to the target and to each other.
If you like PCA and find it works in your particular domains, all the more power to you. I just don't find it practically useful for fitting better models and am generally suspicious of the insights drawn from that and other unsupervised techniques, especially given how much of the meaning of the results gets imparted by the observer who often has a particular story they'd like to tell.
I've used PCA with good results in the past. My problem essentially simplified down to trying to find nearest neighbours in high dimensional spaces. Distance metrics in high dimensional spaces don't behave nicely. Using PCA to cut reduce the number of dimensions to something more manageable made the problem much more tractable.
This is a strange comment since my primary usages of PCA/SVD is as a first step in understanding latent factors which are driving the data. Latent factors typically involve all of the important things that anyone running a business or deciding policy care about: customer engagement, patient well being, employee hapiness, etc all represent latent factors.
If you have ever wanted to perform data analysis and gain some exciting insight into explaining user behavior, PCA/SVD will get you there pretty quickly. It is one of the most powerful tools in my arsenal when I'm working on a project that requires interoperability.
The "loadings" in PC and the V matrix in SVD both contain information about how the original feature space correlates with the new projection. This can easily show thing things like "User's who do X,Y and NOT Z are more likely to purchase".
Likewise in LSA (Latent Semantic Analysis/indexing) on a Term-Frequency matrix you will get a first pass at semantic embedding. You'll notice, for example, that "dog" and "cat" will project onto the new space in a common PC which can be used to interpret "pets".
> I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model
PCA/SVD are a linear transformation of the data and shouldn't give you any performance increase on a linear model. However they can be very helpful in transforming extremely high dimensional, sparse vectors into lower dimensional, dense representations. This can provide a lot of storage/performance benefits.
> NNs, etc. are all able to tease out pretty complicated relationships among features on their own.
PCA is literally identical to an autoencoder minimizing the MSE with no non-linear layers. It is a very good first step towards understanding what your NN will eventually do. After all, all NNs perform a non-linear matrix transformation so that your final vector space is ultimately linearly separable.