• 1 Post
  • 6 Comments
Joined 2 months ago
Cake day: May 11th, 2026

  • Holy shit, dude, it’s a meme joking about how corrupt FIFA is and about a conspiracy theory that they pair up strong teams to eliminate them for whoever paid the most. Chill out.

    ⚽ Tell me where I hurt you by pointing to it on this soccer ball

    Edit: do you also think professional wrestling is real? Because believing “yeah, the obviously corrupt organization that has a documented history of abuse and bribery wouldn’t accept large bribes to fix the games they run in secret” is on the same level as believing the Undertaker really threw Mankind off Hell in a Cell, 16 feet through an announcers’ table.



  • A corruption event held every 4 years where the team that bribes the corruption organization the most gets the most favorable placement in the first pairing, and all other strong teams are paired together until they knock each other out.

    You also need to pay fealty to the footy corruption organization, like charging abhorrent prices for public transit to go see the games, charging insane prices for anything football-related, and even cracking down on businesses that run football-related specials or sales without also greasing the palms of the footy corruption organization.


  • It’s approximate but yeah you can get roughly in that ballpark. The biggest benefit is making the model weights smaller and cheaper to run. You can fit 5X as many instances on the same server if you distill down while having basically the same output.

    The main caveat is you need to absolutely hammer the main model with questions from all angles to get it to present as much of its internalized knowledge as possible. That’s why Anthropic is pissed about this: they barely make money off of these prompts, and the responses get used to train a more efficient competitor. (BTW this is how “mini” and similar models are trained. They’re distillates.)
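For a rough feel of where a figure like "5X as many instances" can come from, here is a back-of-the-envelope sketch. Every number in it is a made-up assumption (a 70B-parameter parent, a 14B-parameter distillate, 2 bytes per weight, a server with 8 x 80 GB of accelerator memory), and it only counts memory for the weights, not the KV cache or activations, so real capacity would be lower.

```python
# Hypothetical numbers for illustration only; none of these come from the thread.
BYTES_PER_PARAM = 2                    # assumes 16-bit weights
SERVER_MEMORY_BYTES = 8 * 80 * 10**9   # assumes 8 accelerators with 80 GB each

def instances_per_server(param_count: int) -> int:
    """How many copies of the weights fit in memory (ignores KV cache and activations)."""
    weight_bytes = param_count * BYTES_PER_PARAM
    return SERVER_MEMORY_BYTES // weight_bytes

parent_instances = instances_per_server(70 * 10**9)   # hypothetical big model
student_instances = instances_per_server(14 * 10**9)  # hypothetical distillate, 5x fewer weights

print(parent_instances, student_instances)
```

With these assumptions the parent fits 4 times and the distillate 22 times, so a 5x cut in weights lands in the same ballpark as 5x the instances.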


  • LLMs are trained by taking a passage of text and masking out the next words. The LLM has to guess what the next word is going to be.

    If you use the output of a fancy-ass billion dollar model as your training data, you can duplicate the output style and “knowledge” of the parent model if you show your model enough responses. That’s basically what Alibaba did. They prompted the shit out of Claude and used the responses to train their own model, which let them piggyback off of Claude’s hard work pirating the entire internet. Your cloned model can also be smaller and leaner, and so cheaper to operate.

    I said this elsewhere but it’s like taking a block of metal and showing it Porsche 911s until it turned into a Porsche 911 with 95% of the performance, one that also costs ⅕ as much to maintain and fuel.
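A toy sketch of the two ideas above, assuming whole words as tokens (real models use subword tokens) and a fake `toy_teacher` function standing in for the big model. Both function names and the canned answers are made up for illustration. It shows how one passage becomes many "guess the next word" examples, and how the parent model's responses become the passages the smaller model trains on.

```python
def next_word_pairs(passage: str) -> list[tuple[str, str]]:
    """Turn a passage into (text so far, next word) training examples."""
    words = passage.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

def toy_teacher(prompt: str) -> str:
    """Hypothetical stand-in for the big model: returns a canned response."""
    canned = {
        "What is distillation?": "Training a small model on a big model's outputs.",
        "Why distill?": "The small model is cheaper to run.",
    }
    return canned.get(prompt, "I am not sure.")

# Distillation-style data collection: prompt the teacher, keep its responses,
# then build next-word examples from prompt + response for the student.
prompts = ["What is distillation?", "Why distill?"]
student_examples = []
for prompt in prompts:
    response = toy_teacher(prompt)
    student_examples.extend(next_word_pairs(f"{prompt} {response}"))

print(next_word_pairs("the cat sat"))
print(len(student_examples))
```

The student never sees the original internet-scale data here, only what the teacher says, which is why the number and variety of prompts matters so much.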


  • Distillation allows you to make a smaller model that can produce nearly the same outputs as a larger model. Basically they’re pirating all of the hard work Anthropic did pirating the entire internet.

    Alibaba gets a model that produces basically the same output for a tiny fraction of the operating cost once it’s finished training. Distillation training also gets basically all of its data from the big model (afaik all of it is sourced from the parent model).

    It’s like if you took a lump of metal and showed it Porsche 911s until it turned into a 911-shaped chunk of metal that had 95% of the performance, but it only cost you $3000 for the ingot, and also cost ⅕ the amount in fuel and maintenance.