
Llama.cpp runs 65B on Asahi Linux at 4tk/s

5 points by ingenieroariel about 2 years ago
$ pledge -p 'stdio rpath' -- /home/shared/src/llama.cpp/main -m /home/shared/models/ggml-model-65b-q4_0.bin -t 16 -p "A proper AI assistant deployment should be very strict in the kind of resources it allows"
main: seed = 1681759209
llama.cpp: loading model from /home/shared/models/ggml-model-65b-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 22016
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 146.86 KB
llama_model_load_internal: mem required = 41477.67 MB (+ 5120.00 MB per state)
llama_init_from_file: kv self size = 1280.00 MB

system_info: n_threads = 16 / 20 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0

A proper AI assistant deployment should be very strict in the kind of resources it allows its agents to access. It can't just send them everywhere willy-nilly, and in fact, the more selective it is, the better for both itself and the user. So what are some good rules for allowing or blocking agent access? One rule is that an agent should only be given access to resources necessary for its job description. For example, a chatbot should only have access to the data required to answer customer questions (assuming it's not an ELIZA-type AI which just repeats what you say back at you). Another rule is that agents shouldn't be

llama_print_timings:        load time =  2205.70 ms
llama_print_timings:      sample time =    61.93 ms /   128 runs   (    0.48 ms per run)
llama_print_timings: prompt eval time =  4190.82 ms /    18 tokens (  232.82 ms per token)
llama_print_timings:        eval time = 39469.33 ms /   127 runs   (  310.78 ms per run)
llama_print_timings:       total time = 44072.21 ms
pledge -p 'stdio rpath' -- /home/shared/src/llama.cpp/main -m -t 16 -p  11:27.47 user 0.658 system 1556% cpu (44.219 wasted time).
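A note on the numbers: the timings above work out to about 1000 / 310.78 ≈ 3.2 tokens/s for generation and 1000 / 232.82 ≈ 4.3 tokens/s for prompt evaluation, which is roughly the "4 tk/s" of the title, and the 1556% CPU figure is consistent with the 16 threads requested via -t 16.

The pledge -p 'stdio rpath' wrapper confines the whole process to basic I/O plus read-only filesystem access before it ever touches the model, so even a compromised runner cannot write files or open sockets. Below is a minimal sketch of the same idea applied from inside a program, assuming the OpenBSD-style pledge(2) API (the standalone pledge tool used above is presumably a Linux port of it); the model path is taken from the log and everything else is illustrative:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    /* Drop privileges first: from here on, only the "stdio" and
       "rpath" (read-only filesystem) promises are allowed; syscalls
       outside those classes are fatal. */
    if (pledge("stdio rpath", NULL) == -1) {
        perror("pledge");
        return EXIT_FAILURE;
    }

    /* Reading is still permitted under "rpath"... */
    FILE *f = fopen("/home/shared/models/ggml-model-65b-q4_0.bin", "rb");
    if (f == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    printf("opened model read-only inside the sandbox\n");
    fclose(f);

    /* ...but fopen(path, "wb") or socket(2) here would violate the
       pledge and the kernel would kill the process. */
    return 0;
}

On OpenBSD a violation terminates the process with SIGABRT; the external wrapper used in the post gives an unmodified binary like llama.cpp's main the same property without recompiling it.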

No comments yet.