• iocase@lemmy.zip
      5 days ago

      LLMs are trained by taking a passage of text and hiding whatever comes next. The LLM has to guess what the next word (technically, the next token) is going to be, over and over.
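
A rough sketch of that setup, assuming whitespace-split words stand in for tokens (real models use subword tokens). `next_word_examples` and `context_size` are made-up names for illustration, not anyone's actual training code:

```python
def next_word_examples(text, context_size=3):
    """Turn a passage into (context, hidden next word) training pairs."""
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        # Everything the model is allowed to see: the last few words before position i.
        context = words[max(0, i - context_size):i]
        # The word that was hidden and must be guessed.
        target = words[i]
        examples.append((context, target))
    return examples


for context, target in next_word_examples("the cat sat on the mat"):
    print(" ".join(context), "->", target)
```

Every position in the text becomes its own little quiz question, which is why one passage yields a lot of training signal.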

      If you use the output of a fancy ass billion dollar model as your training data, you can duplicate the output style and “knowledge” of the parent model, as long as you show your own model enough responses. That’s basically what Alibaba did. They prompted the shit out of Claude and used the responses to train their own model, which lets you piggyback off of Claude’s hard work pirating the entire internet. Your cloned model can also be smaller and leaner, which makes it cheaper to operate.
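
Here is a toy sketch of that idea, under heavy assumptions: `toy_teacher` is a hypothetical stand-in for the expensive model (a lookup table, not a real API), and the "student" is just a table of word-pair counts rather than a neural network. The point is only the data flow: prompt the teacher, save the responses, train the student on nothing else.

```python
from collections import Counter, defaultdict


def toy_teacher(prompt):
    """Hypothetical stand-in for an expensive model behind an API."""
    canned = {
        "capital of France?": "the capital of France is Paris",
        "capital of Japan?": "the capital of Japan is Tokyo",
    }
    return canned.get(prompt, "I am not sure")


def collect_responses(teacher, prompts):
    """Query the teacher and keep (prompt, response) pairs as training data."""
    return [(prompt, teacher(prompt)) for prompt in prompts]


def train_student(pairs):
    """Fit a tiny student: count which word follows which in teacher output."""
    counts = defaultdict(Counter)
    for _, response in pairs:
        words = response.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def student_next_word(counts, word):
    """Return the student's most likely next word, or None if it never saw `word`."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]


pairs = collect_responses(toy_teacher, ["capital of France?", "capital of Japan?"])
student = train_student(pairs)
print(student_next_word(student, "the"))
```

The student never sees the teacher's original training data, only what the teacher said, and it only learns about things the prompts happened to cover.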

      I said this elsewhere, but it’s like taking a block of metal and showing it Porsche 911s until it turns into a Porsche 911 with 95% of the performance that costs ⅕ as much to maintain and fuel.

        • iocase@lemmy.zip
          5 days ago

          It’s approximate, but yeah, you can get roughly in that ballpark. The biggest benefit is making the model weights smaller and cheaper to run. If you distill down, you can fit 5X as many instances on the same server while getting basically the same output.
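
Back-of-the-envelope version of the 5X claim. All the numbers here are invented for illustration (a 560 GB server, 140 GB of weights for the big model, 28 GB for the distilled one), and the sketch only counts weights, ignoring the extra memory each instance needs while running:

```python
def instances_per_server(server_memory_gb, model_memory_gb):
    """How many copies of the model's weights fit in server memory."""
    return int(server_memory_gb // model_memory_gb)


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
server_gb = 560
teacher_gb = 140
student_gb = 28

teacher_count = instances_per_server(server_gb, teacher_gb)
student_count = instances_per_server(server_gb, student_gb)

print(teacher_count, student_count, student_count / teacher_count)
# prints: 4 20 5.0
```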

          The main caveat is that you need to absolutely hammer the main model with questions from all angles to get it to reveal as much of its internalized knowledge as possible. That’s why Anthropic is pissed about this: they’re barely making money off of these prompts, and the responses are being used to train a more efficient competitor. (BTW, this is how “mini” and similar models are trained. They’re distillates.)