qinqiang201 2 hours ago [-]
Could it run on a MacBook, or only on a GPU device?
OutOfHere 3 hours ago [-]
Will this run on CPU? (as opposed to GPU)
boxed 2 hours ago [-]
Why would you want to? It's like using a hammer for screws.
g-mork 1 hour ago [-]
CPU compute is vastly less expensive and much easier to work with in general
boxed 55 minutes ago [-]
Less expensive how? The reason GPUs are used is because they are more efficient. You CAN run matmul on CPUs for sure, but it's going to be much slower and take a ton more electricity. So to claim it's "less expensive" is weird.
g-mork 4 minutes ago [-]
This is far too simplistic: you can't discuss perf per watt unless you're talking about a job running at a decent level of utilisation. Numbers like that only matter for planetary-scale services. Meanwhile, Intel boxes mastered the art of power-efficient idle modes decades ago, while almost any contemporary GPU still isn't remotely close, and you can pick up 32-core boxes for pennies on the dollar.
Even if utilisation weren't a metric, "efficient" can be interpreted in so many ways as to be pointless to try and apply in the general case. I consider any model I can foist into a Lambda function "efficient" because of a variety of secondary concerns you simply cannot meaningfully address with GPU hardware at present (elasticity for example). That it burns more energy per unit output is almost meaningless to consider for any kind of workload where Lambda would be applicable.
It's the same for any edge-deployed software, where "does it run on CPU?" translates to "does the general-purpose user have a snowball's chance in hell of running it?". Having to depend on 4GB of CUDA libraries to run a utility fundamentally changes the nature and applicability of any piece of software.
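As a minimal sketch of what "runs on CPU" means in practice, here is the usual device-fallback pattern, assuming a PyTorch-based model (the thread does not say which framework the project uses, and the `pick_device` helper is hypothetical):

```python
import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple's Metal backend (MPS), else fall back to CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(16, 4).to(device)       # stand-in for a real model
x = torch.randn(2, 16, device=device)
y = model(x)
print(y.shape)  # torch.Size([2, 4]) regardless of which device was picked
```

The same code path runs on a GPU box, a MacBook, or a CPU-only machine; the cost difference is only in speed, which is exactly the trade-off being argued about above.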
regularfry 1 hour ago [-]
To maximise the VRAM available for an LLM on the same machine. That's why I asked myself the same question, anyway.