To be a pedant: it's not just recycling what's online. Large language models are trained by building a model of the probability of the next word given the previous words (tokens). That model is trained on a variety of sources, including the net, but it doesn't lift elements wholesale the way some image generation models do.
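To give a rough feel for the "probability of the next word" bit, here's a toy sketch in Python. The words and probabilities are made up for illustration; a real model conditions on thousands of previous tokens and learns these numbers from data rather than having them hard-coded.

```python
import random

# Toy next-word table (not a real LLM): probability of the next word
# conditioned only on the single previous word.
next_word_probs = {
    "I": {"like": 0.5, "have": 0.3, "can't": 0.2},
    "like": {"it": 0.6, "that": 0.4},
    "have": {"to": 0.7, "a": 0.3},
}

def sample_next(prev_word):
    """Pick the next word according to its conditional probability."""
    dist = next_word_probs.get(prev_word, {"...": 1.0})
    words = list(dist.keys())
    weights = list(dist.values())
    return random.choices(words, weights=weights)[0]

# Generate a few words starting from "I".
word = "I"
sentence = [word]
for _ in range(4):
    word = sample_next(word)
    sentence.append(word)
print(" ".join(sentence))
```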
The reason ChatGPT is different is that most LMs are tuned purely by predicting the next token of web-page text, but ChatGPT is additionally tuned with human feedback from people hired and screened to rate its outputs. So instead of optimising purely for guessing the next word, it's optimised for matching the user's intention.
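Hand-wavy sketch of that "tuned with human input" idea (the real process trains a reward model from rater preferences and then optimises the LM against it; the answers and scores below are invented purely for illustration):

```python
# Two candidate completions the base model might produce.
candidates = [
    "The capital of Australia is Sydney.",    # fluent, looks like web text, but wrong
    "The capital of Australia is Canberra.",  # what the user actually wanted
]

# Hired raters score each answer for accuracy/helpfulness (hypothetical scores).
human_scores = [0.2, 0.9]

# The training signal nudges the model toward the answer humans preferred,
# rather than just the one that best resembles typical web text.
preferred = candidates[human_scores.index(max(human_scores))]
print("Reinforce:", preferred)
```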
It's basically a fancy equivalent of that game where you just keep pressing the next suggested word on your phone keyboard, like: "I like SBT because I have to pay to be able and I can't change it." Just much better trained.