MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers [R]

we_are_mammals@alien.top · 10 months ago

Can’t OpenAI simply check the output for sharing long substrings with the training data (perhaps probabilistically)?
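One way such a check might look, as a rough sketch only: hash every length-n window of the training text into a set, then test whether any window of the output lands in it. The function names, the window length, and the toy "training data" below are all made up for illustration; nothing here reflects how OpenAI actually does it. Hash collisions could produce false positives, which is the "probabilistic" part.

```python
def ngram_hashes(text, n):
    """Return the set of hashes of every length-n character window in text."""
    return {hash(text[i:i + n]) for i in range(len(text) - n + 1)}


def shares_long_substring(output, training_hashes, n):
    """True if some length-n window of output hashes into training_hashes.

    A hash collision could give a false positive, so a hit is only
    probable evidence of overlap; a true overlap is never missed.
    """
    return any(hash(output[i:i + n]) in training_hashes
               for i in range(len(output) - n + 1))


# Toy stand-in for the training data (hypothetical).
training_text = "the quick brown fox jumps over the lazy dog"
index = ngram_hashes(training_text, n=12)

print(shares_long_substring("he saw the quick brown fox run", index, n=12))
print(shares_long_substring("an entirely different sentence", index, n=12))
```

At real scale the set would not fit in memory, so something like a Bloom filter over the same window hashes would be the natural substitute.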

we_are_mammals@alien.top · 10 months ago

It’s overfitting.

Overfitting, by definition, happens when your generalization error goes up.

we_are_mammals@alien.top · 10 months ago

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers [R]

we_are_mammals@alien.top · 10 months ago

It’s a new conjecture, all right. But it’s clearly false.

Consider n=4. Then p=5, q=3, k=1. But 5+1 and 3+1 are not primes.

All you number theorists out there, I think your jobs are safe for the time being.
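The last step of that counterexample is easy to check mechanically. The conjecture itself isn't restated here, so this sketch only confirms the final claim for the values given above (p=5, q=3, k=1); the `is_prime` helper is just plain trial division.

```python
def is_prime(m):
    """Trial-division primality test, fine for small integers."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


# Values from the n=4 case above.
p, q, k = 5, 3, 1
print(is_prime(p + k), is_prime(q + k))  # prints: False False
```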

we_are_mammals@alien.top · 10 months ago

I myself have posted

But the point you were trying to prove was that the discussions were “constant”. How does picking your own threads spanning 2 months support it at all?

The OP didn’t say that the discussions were completely gone. Yes, there are some, but pretty thin and usually glib. I don’t count “Wow! This is exciting. I’ll have to take a look at this awesome new paper!” as discussion. A bot harvesting upvotes could post this.

we_are_mammals@alien.top · 10 months ago

here constantly.

Fortnightly. Finally got a chance to use this word :-) 4 links spanning 2 months.

But even in these picks, take a look at the first one, for example. 10 comments. Only one of them suggests that the commentator looked at the paper itself.

we_are_mammals@alien.top · 10 months ago

Is there an interest in resurrecting technical discussions of the latest research? [D]

we_are_mammals@alien.top · 10 months ago

According to the scaling laws, the loss/error is approximated as

w0 + w1 * pow(num_params, -w2) + w3 * pow(num_tokens, -w4)

Bill wrote before that he’d been meeting with the OpenAI team since 2016, so he’s probably pretty knowledgeable about these things. He might be referring to the fact that, after a while, you see sharply diminishing returns from increasing num_params alone. In the limit, the num_params term disappears, but the other two terms do not.
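To make the plateau concrete, here is a small sketch of that formula. The coefficient values are invented purely for illustration (they are not fitted values from any paper); only the functional form comes from the expression above.

```python
def approx_loss(num_params, num_tokens, w0, w1, w2, w3, w4):
    """Scaling-law approximation: constant + params term + tokens term."""
    return w0 + w1 * pow(num_params, -w2) + w3 * pow(num_tokens, -w4)


# Made-up coefficients, chosen only so the numbers are easy to read.
w = dict(w0=1.5, w1=400.0, w2=0.3, w3=400.0, w4=0.3)
num_tokens = 1e12

# What is left when the num_params term has vanished.
floor = w["w0"] + w["w3"] * pow(num_tokens, -w["w4"])

for num_params in (1e9, 1e10, 1e11, 1e12, 1e13):
    loss = approx_loss(num_params, num_tokens, **w)
    print(f"{num_params:.0e} params: loss {loss:.4f}, above floor by {loss - floor:.4f}")
```

With num_tokens held fixed, each 10x in num_params buys a smaller improvement, and the loss never drops below the floor set by the other two terms.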

we_are_mammals@alien.top · 10 months ago

Bill Gates told a German newspaper that GPT5 wouldn't be much better than GPT4: "there are reasons to believe that we have reached a plateau" [N]

we_are_mammals@alien.top · 10 months ago

a messed-up experiment or a poorly written/plainly incorrect paper that slips through the review system could be your end

Is that true? If your paper is totally wrong, publish a retraction, do not include the paper in your “list of publications”, and move on.

we_are_mammals@alien.top · 10 months ago

Technical discussion seems to be dead in r/MachineLearning, but I’ll ask anyway: Isn’t it strange that in Figure 3 of the first paper, layer 1 has a blurry diagonal, while the rest of them are sharp? I would have expected the opposite: the lowest layer to be very local, and higher layers to be more global.

we_are_mammals@alien.top · 10 months ago

the claimed 117.83x speedup, might be somewhat misleading

If you compare the best implementation of FFF on CUDA to the best implementation of FF on CUDA, then the speed-up they got is 3.15x:

See page 5, “Further comparisons”: “On GPU, the PyTorch BMM implementation of FFF delivers a 3.15x speedup over the fastest (Native fused) implementation of FF”

The 40x that u/lexected mentioned seems to apply only when comparing to an apparently much slower FF version.

It’s a pretty cool paper regardless, as far as I can tell from skimming it. But it could benefit from stating more clearly what has been achieved.

we_are_mammals@alien.top · 10 months ago

has 4095 neurons but selectively uses only 12 (0.03%) for inference

There’s an extra 0 in there: 12 out of 4095 is about 0.29%, not 0.03%.

we_are_mammals@alien.top · 10 months ago

So the implication here is that the CEO knew about the breakthrough, but hid it from the board?

MSFT did experience a 20% climb over the last month. Maybe it was due to this news leaking out?

we_are_mammals@alien.top · 10 months ago

I think DistilBERT needs to be in Table 2, since it’s their most direct competitor: it trades off accuracy for speed, and requires extra training effort, like their approach.

Still, if they are about 20x faster than DistilBERT using cuBLAS, that’s pretty amazing.

we_are_mammals@alien.top · 10 months ago

78x speedup over the optimized baseline feedforward implementation

So they are 78x faster than MKL using the same number of cores?

we_are_mammals@alien.top · 10 months ago

OpenAI: "We have reached an agreement in principle for Sam to return to OpenAI as CEO" [N]

we_are_mammals@alien.top · 10 months ago

Stability AI releases a video model [N]

we_are_mammals@alien.top · 11 months ago

The CEO said it had cost “much more than 100M” of compute to train.

we_are_mammals@alien.top · 11 months ago

Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence [N]