Our preference for training in our own datacenter has nothing to do with wokeness. Did you read the blog post? The reasons are clearly explained.
The climates in Wyoming and Montana are actually worse: San Diego's extremes are milder than those places'. Though moving out of CA is a good idea for power-cost reasons, also addressed in the blog.
Yes, it's easy to destroy the servers with a lot of dust and/or high humidity. But with filtering and ensuring humidity never exceeds 45% we've had pretty good results.
I remember visiting a small data center (about half the size of the Comma one) where shoe covers were required. Apparently they were worried about people’s shoes bringing in dust and other contamination.
It's not a static number, since the risk also depends on ambient air temperature via dew point - 45% RH at low temperatures can be far more dangerous than 65% RH at warm ambient.
Likewise, the impact on server longevity is not a hard boundary but an "exposure over time" gradient: if you exceed the "low risk" boundary (>-12°C/10°F dew point or >15°C/59°F dry bulb temperature), MTBF drops below design. This is defined in ASHRAE TC 9.9, which server equipment manufacturers conform and build to. This means that if you're running your servers above the high-risk curve for humidity and temperature, you're shortening their life considerably compared to the low-risk curve.
Generally, 15% RH is considered suboptimal and can be dangerous near freezing temperatures. In San Diego in January there were several 90%+ RH scenarios that would have been dangerous for servers even when mixed down with warm exhaust air. Furthermore, outdoor air at 76°F during that period leaves limited capacity to mix in warm exhaust air (which, btw, came from that same 99% RH intake air) without getting into higher-than-ideal intake temps.
Any dew point above 62.5°F is considered high risk for servers, as is any intake temp exceeding 32°C/90°F. You want to be at the midpoint between those and 16°C/65°F temps and a -12°C/10°F dew point to have no impact on server longevity or MTBF rates.
As a recent example:
KCASANDI6112 - January 2, 2026
                High     Low      Average
Temperature     73.4 °F  59.9 °F  63.5 °F
Dew Point       68.0 °F  60.0 °F  62.6 °F
Humidity        99 %     81 %     96 %
Precipitation   0.12 in  --       --
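To make the numbers above concrete, here is a small sketch that estimates dew point from dry-bulb temperature and RH using the Magnus approximation (an assumption on my part; the station reports it directly) and checks it against the 62.5°F high-risk line mentioned earlier:

```python
import math

def dew_point_c(temp_c: float, rh_pct: float) -> float:
    """Magnus approximation for dew point in degrees C."""
    a, b = 17.625, 243.04
    gamma = math.log(rh_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

def f_to_c(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0

def c_to_f(c: float) -> float:
    return c * 9.0 / 5.0 + 32.0

# San Diego, January example: 73.4 degF at 99% RH (the day's high)
dp_f = c_to_f(dew_point_c(f_to_c(73.4), 99.0))
print(round(dp_f, 1))        # ~73.1 degF, well above the 62.5 degF line
high_risk = dp_f > 62.5
print(high_risk)             # True
```

At near-saturation the dew point sits just below the dry-bulb temperature, so any air that day was deep into the high-risk region for intake air.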
Lastly, air contaminants - dust (which can be filtered out) and chemicals (which can't, without extensive scrubbing) - are probably the most detrimental to server equipment if not properly managed. Managing them requires very intentional and frequent filter changes (typically high-MERV pleated filters, changed on a time or pressure-drop signal) to prevent server degradation and equipment risk.
The last consideration is fire suppression - permitted datacenters usually have to comply with a separate fire code under which direct outdoor air exchange without active shutdown and dry suppression is not permitted. This is to prevent a scenario where your equipment catches fire and a constant supply of fresh, oxygen-rich outdoor air turns it into an inferno. Smoke detection systems also don't operate well with outdoor-mixed air or any level of airborne particulates.
So, for those reasons, among a few others, open-air datacenters are not recommended unless you're doing them at Google or Meta scale - and in those scenarios you typically have much more extensive systems and purpose-designed hardware in order to operate for the design life of the equipment without issues.
Yes, we still use Azure for user-facing services and the website. They don't need GPUs or other expensive resources, so it's not worth bringing those in-house.
We also rely on GitHub. It has historically been a good service, but it's getting worse.
When you say not everything is open source, I assume you mean the training code? I'm curious what you would want to learn from it. You wouldn't be able to actually train a model, since you wouldn't have access to the data.
The end-user can inspect, audit and understand the decisions their vehicle is making. All you have to do is see how the neural network behaves for different inputs. That's the correct approach, whether you have access to the training code or not.
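The "see how it behaves for different inputs" approach can be sketched as a finite-difference sensitivity probe. This is my illustration, not Comma's tooling, and the toy function below is a hypothetical stand-in for the real ONNX driving model:

```python
# Toy stand-in for a driving model: maps (lane_offset, speed) to a
# steering command. A real probe would run the ONNX model here instead.
def toy_model(lane_offset: float, speed: float) -> float:
    return -0.5 * lane_offset / max(speed, 1.0)

def sensitivity(f, x, y, eps=1e-4):
    """Finite-difference estimate of d(output)/d(each input) at (x, y)."""
    base = f(x, y)
    return ((f(x + eps, y) - base) / eps,
            (f(x, y + eps) - base) / eps)

# Probe the model around a specific operating point
d_offset, d_speed = sensitivity(toy_model, 0.2, 20.0)
print(round(d_offset, 4))  # steering response per unit of lane offset
```

Sweeping the operating point over a grid of inputs gives an empirical picture of the model's behavior without any access to training code.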
Comma don't even say _how_ the model works! What layers are there? What learning strategies are they using? What do they do? It's literally a black box! "All you have to do is see how it behaves for different inputs" is just black box reverse engineering! Machine Learning is NOT a magic black box.
Comma have constructed a "stack" of models, just as you would connect a series of functions to make a kernel in the mathematics sense, or a series of algorithms or instructions to make a program. And that stack is entirely closed.
https://medium.com/@chengyao.shen/decoding-comma-ai-openpilo... here is an example of reverse-engineering the driving model. If Comma released this exact sort of documentation, including what ML modeling strategies they were using, what each input and output parameter affected, and how the model was trained, I could maybe consider the system open.
The models are now saved in ONNX format, which is the most readable format available. You can view the model's architecture with a basic neural network viewer.
Again, I'm curious what you want to learn from the training code?
Chengyao's medium post is great, but it is only possible because the models, the code that runs them and the code that parses the outputs is fully open source.
My binary is saved in a PE format. Which is the most readable format available. You can view the architecture of the software by opening it in the basic Ghidra pseudocode decompiler. All Windows software is now "fully open source."
Chengyao's Medium post is advanced reverse-engineering work requiring a detailed knowledge of the appearance of specific ML algorithms saved in a binary format. And even with this knowledge, Chengyao was only able to _speculate_ about the behavior of the model and the desired response to certain inputs.
What would satisfy me from Comma, if they were aspiring to some kind of "open" label, would be a detailed document explaining each layer of the ML system and what its goals are - like Chengyao's Medium post, but without the need to reverse-engineer the system and attempt to infer its behavior!
Now, maybe Comma don't aspire to be truly open - and that's fine. In that case, Comma is a closed model with an open-source CAN interceptor on top: essentially crowd-sourcing the tedious and high-liability parts (vehicle integration, driving video) while owning the valuable parts (training data and model architecture). Very cool!
What format would you rather the model be saved in? ONNX is the most cross platform and standard as far as I know, and it's also what we use internally.
It's not like a PE format which is compiled from something else higher level.