The comment above is true of ROS, full stop, even in pure C++ with no Python involved, and especially if you have any real-time requirements. ROS uses sockets for interprocess communication, which adds a massive amount of overhead. For example, replacing ROS with straight function calls sped up a simulation of a system I was running from 10x real time to 1000x real time. The socket overhead is fine for certain applications, but it just didn't work for my particular use case.
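
To give a rough feel for where that kind of overhead comes from, here is a minimal sketch that runs the same toy update either as a direct function call or as a request/reply over a local socket pair. It is not ROS code and does not reproduce ROS's transport; `step`, `run_direct`, and `run_over_socket` are hypothetical names made up for illustration, and the actual ratio will depend heavily on the machine, the message size, and how much work each step does.

```python
import socket
import struct
import threading
import time


def step(x: float) -> float:
    """Hypothetical stand-in for one simulation step."""
    return x * 0.5 + 1.0


def run_direct(n: int) -> float:
    """Call step() n times as a plain function call."""
    x = 0.0
    for _ in range(n):
        x = step(x)
    return x


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return fewer."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf


def _serve(sock: socket.socket, n: int) -> None:
    """Answer n requests: unpack a double, apply step(), send the result back."""
    for _ in range(n):
        (x,) = struct.unpack("!d", _recv_exact(sock, 8))
        sock.sendall(struct.pack("!d", step(x)))


def run_over_socket(n: int) -> float:
    """Same computation, but every step is a round trip over a socket."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=_serve, args=(server, n))
    worker.start()
    x = 0.0
    try:
        for _ in range(n):
            client.sendall(struct.pack("!d", x))
            (x,) = struct.unpack("!d", _recv_exact(client, 8))
    finally:
        client.close()   # unblocks the worker if we bailed out early
        worker.join()
        server.close()
    return x


def time_it(fn, n: int) -> float:
    """Return wall-clock seconds taken by fn(n)."""
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 20_000
    direct = time_it(run_direct, n)
    via_socket = time_it(run_over_socket, n)
    print(f"direct: {direct:.4f}s  socket: {via_socket:.4f}s")
```

Both paths compute the same result; the only difference is the serialization and the kernel round trip on every step, which is the cost that goes away when the calls are made directly.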