For example, the Switchboard corpus (300h, 8 kHz, transcribed audio) is about 16 GB...
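
As a rough sanity check on that figure, here is a back-of-the-envelope sketch. The comment gives only the duration and sample rate, so the sample format is an assumption: uncompressed 16-bit mono PCM. The helper name `raw_audio_bytes` is made up for illustration, and the corpus as actually distributed may use a different encoding or channel count.

```python
def raw_audio_bytes(hours, sample_rate_hz, bytes_per_sample, channels=1):
    """Size of uncompressed PCM audio in bytes (headers and transcripts ignored)."""
    seconds = hours * 3600
    return seconds * sample_rate_hz * bytes_per_sample * channels


# 300 h at 8 kHz comes from the comment above.
# ASSUMPTION: 2 bytes per sample (16-bit), 1 channel.
size_bytes = raw_audio_bytes(hours=300, sample_rate_hz=8000, bytes_per_sample=2)
size_gib = size_bytes / 2**30

print(f"{size_bytes:,} bytes = {size_gib:.1f} GiB")
```

Under those assumptions the raw audio comes out at a little over 16 GiB, which is in line with the "about 16 GB" quoted above.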

bainsfather · on Oct 9, 2014

Maybe this? http://www.voxforge.org/home - "VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac)." (Caveat: I have not managed to record on it from any of my machines; apparently I don't have the right plugin.)

Maybe also: https://librivox.org - has audiobooks read by volunteers, plus the book text.