$ pledge -p 'stdio rpath' -- /home/shared/src/llama.cpp/main -m /home/shared/models/ggml-model-65b-q4_0.bin -t 16 -p "A proper AI assistant deployment should be very strict in the kind of resources it allows"
main: seed = 1681759209
llama.cpp: loading model from /home/shared/models/ggml-model-65b-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 22016
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 146.86 KB
llama_model_load_internal: mem required = 41477.67 MB (+ 5120.00 MB per state)
llama_init_from_file: kv self size = 1280.00 MB

system_info: n_threads = 16 / 20 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0

A proper AI assistant deployment should be very strict in the kind of resources it allows its agents to access. It can't just send them everywhere willy-nilly, and in fact, the more selective it is, the better for both itself and the user. So what are some good rules for allowing or blocking agent access?
One rule is that an agent should only be given access to resources necessary for its job description. For example, a chatbot should only have access to the data required to answer customer questions (assuming it's not an ELIZA-type AI which just repeats what you say back at you).
Another rule is that agents shouldn't be
llama_print_timings: load time = 2205.70 ms
llama_print_timings: sample time = 61.93 ms / 128 runs ( 0.48 ms per run)
llama_print_timings: prompt eval time = 4190.82 ms / 18 tokens ( 232.82 ms per token)
llama_print_timings: eval time = 39469.33 ms / 127 runs ( 310.78 ms per run)
llama_print_timings: total time = 44072.21 ms
pledge -p 'stdio rpath' -- /home/shared/src/llama.cpp/main -m /home/shared/models/ggml-model-65b-q4_0.bin -t 16 -p "A proper AI assistant deployment should be very strict in the kind of resources it allows"  11:27.47 user 0.658 system 1556% cpu (44.219 wasted time).
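
For context, the pledge wrapper above grants the process only the 'stdio' and 'rpath' promise sets: basic I/O plus read-only filesystem access, which is enough to mmap the model but rules out sockets, file writes, and spawning new processes. The sketch below is a hypothetical, minimal C program that applies the same policy to itself; it assumes the OpenBSD-style pledge(2) prototype from <unistd.h> (also provided by the Linux polyfill the pledge command is built on) and reuses the model path from the run above purely as an illustration.

    /* sandbox_sketch.c - a minimal sketch, assuming an OpenBSD-style pledge(2) */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* "stdio rpath": basic I/O plus read-only opens (enough to read or
         * mmap a model file); no sockets, no writes, no process creation. */
        if (pledge("stdio rpath", NULL) == -1) {
            perror("pledge");
            return 1;
        }

        /* Read-only access is still permitted under the sandbox. */
        FILE *f = fopen("/home/shared/models/ggml-model-65b-q4_0.bin", "rb");
        if (f != NULL) {
            printf("model file is readable under the sandbox\n");
            fclose(f);
        }
        return 0;
    }

Once the promises are in place, a disallowed operation such as connect() or an open for writing terminates the process (or fails, depending on the implementation), which is the property the 'stdio rpath' invocation above relies on.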