Anthropic says Alibaba must be punished for largest Claude cloning attack

sanitation@lemmy.today · 3 days ago


ContactClosure@lemmus.org · 20 hours ago

What’s a suitable punishment, Anthropic?

(chasing) What’s a suitable punishment, motherfucker?

Jiral@lemmy.world · 2 days ago

The thief cries “catch the thief!”

jaxxed@lemmy.world · 2 days ago

New Qwen release incoming!

Q: if you steal a stolen thing, is it stealing?

uuj8za@piefed.social · 3 days ago

vrighter@discuss.tchncs.de · 3 days ago

you can’t just call anything you don’t like “an attack”

weimaraner_of_doom@piefed.social · 2 days ago

How about “terrorism” or “national security threat”?

SkaveRat@discuss.tchncs.de · 2 days ago

I declare an attack!

[object Object]@lemmy.ca · 3 days ago

Okay, so Anthropic distills MY copyrighted data and it’s fine.

Alibaba distills Anthropic’s non-copyrightable output and that demands retaliation at the nation state level.

Fuck off. The rules are abundantly clear.

flango@lemmy.eco.br · 2 days ago

What’s the science behind cloning?

iocase@lemmy.zip · 2 days ago

LLMs are trained by taking a passage of text and hiding the next word. The LLM has to guess what that next word is going to be.
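Rough sketch of what those training examples look like, assuming plain whitespace splitting (real models use subword tokens, and `next_token_pairs` is just a name made up for this example):

```python
def next_token_pairs(text):
    """Turn a passage into (context, next_word) training examples."""
    words = text.split()  # assumption: whitespace "tokens", not real subword tokens
    # For each position, the model sees everything before it and must guess the word.
    return [(words[:i], words[i]) for i in range(1, len(words))]

pairs = next_token_pairs("the cat sat on the mat")
for context, target in pairs:
    print(" ".join(context), "->", target)
```

Every passage yields one example per word, which is why a big pile of text goes such a long way.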

If you use the output of a fancy ass billion dollar model as your training data, you can duplicate the output style and “knowledge” of the parent model if you show your model enough responses. That’s basically what Alibaba did. They prompted the shit out of Claude and used the responses to train their own model, which lets them piggyback off of Claude’s hard work pirating the entire internet. The cloned model can also be smaller and leaner, which makes it cheaper to operate.

I said this elsewhere but it’s like taking a block of metal and showing it Porsche 911s until it turned into a Porsche 911 with 95% of the performance that also costs ⅕ as much to maintain and fuel.
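Toy version of the prompt-and-train loop, to make it concrete. Everything here is a made-up stand-in: `teacher` is a canned lookup where a real setup would call the big model, and the "student" just counts next words instead of doing gradient training on a neural net.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the big model's answers.
CANNED = {
    "capital of France?": "the capital of France is Paris",
    "capital of Japan?": "the capital of Japan is Tokyo",
}

def teacher(prompt):
    """Pretend parent model: returns a response for a prompt."""
    return CANNED[prompt]

def distill(prompts):
    """Train the student on nothing but teacher responses (next-word counts)."""
    counts = defaultdict(Counter)
    for prompt in prompts:
        words = ["<s>"] + teacher(prompt).split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[(prompt, prev)][nxt] += 1
    return counts

def student(counts, prompt, max_words=20):
    """Greedy decode: always emit the most frequent next word seen in training."""
    out, prev = [], "<s>"
    for _ in range(max_words):
        options = counts.get((prompt, prev))
        if not options:
            break
        prev = options.most_common(1)[0][0]
        if prev == "</s>":
            break
        out.append(prev)
    return " ".join(out)

counts = distill(list(CANNED))
print(student(counts, "capital of Japan?"))
```

This toy student only memorizes what it was shown. The point is just the data flow: the student never sees the original training data, only what the teacher says.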

flango@lemmy.eco.br · 2 days ago

95% of performance is impressive for a clone

iocase@lemmy.zip · 2 days ago

It’s approximate but yeah you can get roughly in that ballpark. The biggest benefit is making the model weights smaller and cheaper to run. You can fit 5X as many instances on the same server if you distill down while having basically the same output.
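Back-of-envelope version of the instances-per-server point. All three numbers are invented for illustration (not real figures for Claude, Qwen, or any actual server), and it only counts memory for the weights:

```python
def instances_per_server(server_mem_gb, model_mem_gb):
    """How many copies of a model fit in memory, ignoring all other overhead."""
    return server_mem_gb // model_mem_gb

SERVER_MEM_GB = 640   # assumed accelerator memory on one server
BIG_MODEL_GB = 160    # assumed footprint of the parent model
DISTILLED_GB = 32     # assumed footprint of the distilled model

big = instances_per_server(SERVER_MEM_GB, BIG_MODEL_GB)
small = instances_per_server(SERVER_MEM_GB, DISTILLED_GB)
print(big, small, small / big)
```

With those made-up sizes, a model one fifth the size gives you five times the instances on the same hardware.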

The main caveat is you need to absolutely hammer the main model with questions from all angles to try and get it to present as much of its internalized knowledge as possible. Which is why Anthropic is pissed about this: they’re barely making money off of these prompts, and the responses are being used to train a more efficient competitor. (BTW this is how “mini” and similar models are trained. They’re distillates.)

duckCityComplex@lemmy.world · 2 days ago

The article is not clear on what a “distillation attack” is… what exactly is Alibaba supposed to be getting away with here? The article mentions using many different connections through obfuscation networks and proxies… so that would get them around rate limiting, and maybe enable them to submit many queries on free accounts… just spin up a new account whenever you hit the token limit of an unpaid account. So basically it’s a terms of service violation?

I don’t see why it’s necessarily a huge leg up for a competitor… they are just using the outputs of another model as training data. They still need to train their model, which is the expensive and energy intensive part.

It sounds to me like Anthropic just wants the US Government to help enforce its TOS internationally and force Alibaba to pay for those precious tokens? Because apart from that piece, the “attack” just seems like normal use of the service. If Anthropic’s service has an inherent vulnerability, that’s their problem.

Of course all the other comments about how they stole all their training data in the first place are spot on.

iocase@lemmy.zip · 2 days ago

Distillation allows you to make a smaller model that can produce the same outputs as a larger model. Basically they’re pirating all of the hard work Anthropic did pirating the entire internet.

Alibaba gets a model that produces basically the same output for a tiny fraction of the cost to operate once it’s finished training. Distillation training also gets basically all of its data from the big model (afaik all of it is sourced from the parent model).

It’s like if you took a lump of metal and showed it Porsche 911s until it turned into a 911 shaped chunk of metal that had 95% of the performance, but it only cost you $3000 for the ingot, and also cost ⅕ the amount in fuel and maintenance.

duckCityComplex@lemmy.world · 2 days ago

Ok, thanks for the detailed explanation. I guess if your goal is to make your model sound like another model, that makes perfect sense.

Zarxrax@lemmy.world · 3 days ago

Nooooo, you can’t train on OUR data! That’s illegal!!!1