Usage:
  ollama [flags]
  ollama [command]
Available Commands:
  serve/start  Start ollama
  create       Create a model from a Modelfile
  show         Show information for a model
  run          Run a model
                 ollama run llama3.2:3b
                 echo "what is two multiplied by three?" | ollama run llama3.2:3b
  stop         Stop a running model
  pull         Pull a model from a registry
                 ollama pull llama3.2:3b
  push         Push a model to a registry
  list         List models
  ps           List running models
  cp           Copy a model
  rm           Remove a model
  help         Help about any command
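A minimal sketch of the create command, assuming a Modelfile in the current directory and the hypothetical model name my-terse-model:

# Modelfile contents:
#   FROM llama3.2:3b
#   SYSTEM "Answer as briefly as possible."

ollama create my-terse-model -f Modelfile
ollama run my-terse-model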
Flags:
  -h, --help      help for ollama
  -v, --version   Show version information
Environment Variables:
  OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
  OLLAMA_HOST                IP address for the ollama server (default 127.0.0.1:11434)
  OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
  OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
  OLLAMA_MAX_QUEUE           Maximum number of queued requests
  OLLAMA_MODELS              The path to the models directory
  OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
  OLLAMA_NOPRUNE             Do not prune model blobs on startup
  OLLAMA_ORIGINS             A comma-separated list of allowed origins
  OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
  OLLAMA_TMPDIR              Location for temporary files
  OLLAMA_FLASH_ATTENTION     Enable flash attention
  OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
  OLLAMA_GPU_OVERHEAD        Reserve a portion of VRAM per GPU (bytes)
  OLLAMA_LOAD_TIMEOUT        How long to allow model loads to stall before giving up (default "5m")
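For example, to start the server with debug logging, listening on all interfaces (the address here is illustrative):

OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0:11434 ollama serve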
Available keyboard shortcuts:
  Ctrl + a    Move to the beginning of the line (Home)
  Ctrl + e    Move to the end of the line (End)
   Alt + b    Move back (left) one word
   Alt + f    Move forward (right) one word
  Ctrl + k    Delete the sentence after the cursor
  Ctrl + u    Delete the sentence before the cursor
  Ctrl + w    Delete the word before the cursor

  Ctrl + l    Clear the screen
  Ctrl + c    Stop the model from responding
  Ctrl + d    Exit ollama (/bye)
Output of ollama serve (first run):

Couldn't find '/home/iharh/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIH8NPUj/JP0gIzX0Wfy3tuezgOyd5lypq59wh9Kr9Vff

time=2025-07-05T08:30:15.765+03:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:45349"
llama_model_loader: loaded meta data with 30 key-value pairs and 255 tensors from ~/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff (version GGUF V3 (latest))
...
time=2025-07-05T08:30:17.518+03:00 level=INFO source=server.go:601 msg="llama runner started in 1.76 seconds"
[GIN] 2025/07/05 - 08:30:17 | 200 |  1.79838588s | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/07/05 - 08:31:07 | 200 | 965.070709ms | 127.0.0.1 | POST "/api/chat"
[GIN] 2025/07/05 - 08:31:16 | 200 |  15.374323ms | 127.0.0.1 | POST "/api/show"
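The POST "/api/generate" line above corresponds to a request like the following (prompt reused from the run example above, and assuming the default port; the response streams back as JSON lines):

curl 'http://localhost:11434/api/generate' -d '{"model": "llama3.2:3b", "prompt": "what is two multiplied by three?"}'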
Typical workflow (spelled out below):
  1. start
  2. pull
  3. run
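With the model used above (serve is backgrounded here just for illustration):

ollama serve &
ollama pull llama3.2:3b
ollama run llama3.2:3b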
/api/tags   List local models
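For example (same host and port as the ps call below):

curl 'http://localhost:11434/api/tags'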
curl 'http://localhost:11434/api/ps'