Some bash tips -- 17 -- SECONDS

Reinventing the wheel is rooted deep inside our DNA. I really try to never reinvent the wheel but I recently got caught (!).

Indeed, when I wanted to know how fast (or how slow) something was running, or just to log the time spent by a script, I used to write it like this:
start=$(date +%s)                                          <== Epoch seconds
do_something                                               <== Something is running
end=$(date +%s)                                            <== Epoch seconds
echo "Time spent: $(( end-start )) seconds"                <== Difference between number of sec when do_something ends and when it started

Let's try it with a simple sleep as "do_something":
$ start=$(date +%s); sleep 2; end=$(date +%s); echo "Time spent: $(( end-start )) seconds"
Time spent: 2 seconds
$

This looks obvious and straightforward, and I honestly never thought there would be an easier way (and this is why I never looked for it !). But there is one: a magic bash variable named SECONDS. SECONDS counts the number of seconds since your shell was invoked; you can reset it to 0 whenever you wish and then let it count whatever you want for you (!) -- meaning that the above code now becomes:
SECONDS=0                                                  <== Reset SECONDS to 0
do_something                                               <== Something is running
echo "Time spent: $SECONDS"                                <== Show the time spent by do_something

Which does exactly what we want it to do in a very simple way:
$ SECONDS=0; sleep 2; echo "Time spent: $SECONDS seconds"
Time spent: 2 seconds
$

Note that this does not work if you need milliseconds; you'll then have to use the first method with date +%s%3N to get milliseconds:
$ start=$(date +%s%3N); echo "" > /dev/null ; end=$(date +%s%3N); echo "Time spent: $(( end-start )) milliseconds"
Time spent: 3 milliseconds
$
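A side note while we are at it: SECONDS will not give you sub-second precision either, but if your bash is 5.0 or newer there is also an EPOCHREALTIME variable that holds the epoch time with microsecond precision. Below is a quick sketch, not a definitive method -- it assumes a locale using a dot as decimal separator, and awk is just one way to do the floating point math:
$ start=$EPOCHREALTIME; sleep 0.2; end=$EPOCHREALTIME; awk -v s="$start" -v e="$end" 'BEGIN {printf "Time spent: %.3f seconds\n", e-s}'
Time spent: 0.204 seconds
$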

One less opportunity to reinvent the wheel!
< Previous shell tip / Next shell tip coming soon >

Getting started with OLLAMA

LLM is the new fancy word: everyone's talking about LLMs, and it is pretty cool actually -- it is always nice to get a breath of fresh air with this new AI/LLM technology.

So I wanted to test some LLMs on a server and/or on my computer using WSL, not just use ChatGPT in my browser. I wanted to see how you can install and interact with an LLM (and also how to RAG -- Retrieval-Augmented Generation -- it; more on that later). I tried a few tools and, in my opinion, the simplest to use is ollama. It is also free and open source under a permissive MIT licence, which basically means "do whatever you want with it, just don't sue me".


1/ Install ollama:
To install ollama, just go to the ollama download page and you'll find the below command line for Linux, which you just copy and paste in a terminal:
$ curl -fsSL https://ollama.com/install.sh | sh
>>> Downloading ollama...
######################################################################## 100.0%
>>> Installing ollama to /usr/local/bin...
[sudo] password for fred:
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA GPU installed.
$
And that's it. Super easy so far! You can check that ollama is running, and also check the version:
$ service ollama status
  ● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-05-23 15:22:07 CEST; 27min ago
   Main PID: 826 (ollama)
      Tasks: 18 (limit: 9471)
     Memory: 518.4M
     CGroup: /system.slice/ollama.service
             └─826 /usr/local/bin/ollama serve
			 . . .
$ ollama -v
  ollama version is 0.1.38
$
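You can also poke ollama's local API directly: by default it listens on 127.0.0.1:11434 (you can see that port in the installer output further down) and its root endpoint simply tells you it is alive. A quick sanity check -- the port is the default one and may differ if you changed the service configuration:
$ curl -s http://127.0.0.1:11434
Ollama is running
$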


2/ Get a model:
Now that the tool to interact with an LLM is installed, we need to get one. The list of models available for ollama can be found here. At the time this blog post is written, there is no command line option to easily get that list of available models; I could find many open issues requesting such a feature, like this one for example, but it is not implemented yet. In the meantime, I wrote that simple script to get that list in a nice table from your command line.
$ ./ls_ollama_avail_model.sh
  Model                    : Description
------------------------------------------------------------------------------------------------------------------------
alfred                   : A robust conversational model designed to be used for both chat and instruct use cases.
all-minilm               : Embedding models on very large sentence level datasets.
aya                      : Aya 23, released by Cohere, is a new family of state-of-the-art, multilingual models that support 23 languages.
bakllava                 : BakLLaVA is a multimodal model consisting of the Mistral 7B base model augmented with the LLaVA  architecture.
. . .
llama3                   : Meta Llama 3: The most capable openly available LLM to date
. . .
zephyr                   : Zephyr is a series of fine-tuned versions of the Mistral and Mixtral models that are trained to act as helpful assistants.
------------------------------------------------------------------------------------------------------------------------
$
You'll see that ollama supports tons of LLMs, which means tons of fun testing these guys!
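If you just want the idea without the full script, a minimal sketch could simply scrape the library page for the model names. This is only an illustration, not the script linked above: it assumes every model shows up as an href="/library/<name>" link on https://ollama.com/library (a layout that may change at any time) and it does not grab the descriptions:
#!/bin/bash
# Rough sketch: list model names published on the ollama library page
curl -s https://ollama.com/library |
  grep -oE 'href="/library/[a-zA-Z0-9._-]+"' |    # keep only the model links
  sed -e 's|href="/library/||' -e 's|"||g' |      # strip the link syntax, keep the name
  sort -u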

Let's start by installing llama3 which is the latest LLM from Meta.
$ ollama pull llama3
pulling manifest
pulling 6a0746a1ec1a... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏  12 KB
pulling 8ab4849b038c... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏  254 B
pulling 577073ffcc6c... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏  110 B
pulling 3f8eb4da87fa... 100% ▕███████████████████████████████████████████████████████████████████████████████████████▏  485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
$
Again: super easy. You can use the ollama list command to see the models installed on your system:
$ ollama list
NAME            ID              SIZE    MODIFIED
llama3:latest   365c0bd3c000    4.7 GB  About a minute ago
$
Note that you can install and use as many models as you please; the only limit is your storage size -- as you can see, this llama3 is 4.7 GB. Note that the models are stored in ~/.ollama/models (/usr/share/ollama/.ollama/models under WSL). To store the models in an alternative destination (a filesystem with a lot of storage, for example), you are supposed to use the OLLAMA_MODELS variable; I tried with no luck and found many bugs related to this, so for now you would need to make a symbolic link to an alternate path. This should be fixed eventually though.
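If you want to try that symbolic link workaround, a hedged sketch would look like the below. The /data/ollama_models target is just an example path, and the source path assumes a default systemd install storing models under /usr/share/ollama/.ollama/models -- adapt both to your system:
$ sudo systemctl stop ollama
$ sudo mv /usr/share/ollama/.ollama/models /data/ollama_models
$ sudo ln -s /data/ollama_models /usr/share/ollama/.ollama/models
$ sudo chown -R ollama:ollama /data/ollama_models
$ sudo systemctl start ollama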

3/ Interact with a model:
Now the fun part: interacting with your model! It is as simple as the previous steps:
$ ollama run llama3
>>> in a word, who are you ?
I am LLaMA.
>>> who is luke skywalker father ?
Luke Skywalker's father is Anakin Skywalker, also known as Darth Vader.
>>> exit
It was nice chatting with you! If you want to start a new conversation or ask me something else, feel free to come back and say hi anytime. Until next time, may the Force be with you!
>>>
See how llama3 concluded the conversation with a "may the Force be with you!" as I was asking about Star Wars; this guy is kind of smart :) -- and note that it is actually CTRL+D that closes the conversation, not exit. It is interesting to see how the same conversation goes with, for example, llama2, which is llama3's predecessor:
$ ollama run llama2
>>> in a word, who are you ?
I am LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I am trained on a massive dataset of text from the internet and can generate human-like responses to a wide range of topics and questions. I can be used to create chatbots, virtual assistants, and other applications that require natural language understanding and generation capabilities.
>>> who is luke skywalker father ?
Luke Skywalker's father is Anakin Skywalker.
>>> exit
You have entered the command to exit. I am sorry, but I cannot comply with this request as I am a text-based AI assistant and do not have the capability to physically exit or leave a virtual environment. Is there anything else I can help you with?
>>>
This is not looking as good as llama3; let's try with mistral, which is another model with a great reputation:
$ ollama run mistral
>>> in a word, who are you ?
 Assistant. I'm here to help! How can I assist you today?
 >>> who is luke skywalker father ?
 Luke Skywalker's father is Darth Vader, who is revealed to be Anakin Skywalker in the original Star Wars trilogy. However, this is a twist that is known from the beginning of the sequel trilogy, as Rey reveals Darth Vader's identity to Luke in "The Last Jedi."
>>> exit
 Goodbye for now! Don't hesitate to come back if you have any more questions or need assistance with something else. I'm always here to help!
>>>
Again some differences, which makes sense: these are different models. Anyway, I'll leave you to it; I could play with these things all day long :)
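Also note that you are not stuck with the interactive prompt: ollama run accepts a one-shot prompt on the command line, and the service exposes a REST API on 127.0.0.1:11434, which is handy in scripts (the prompt below is just an example):
$ ollama run llama3 "in a word, who are you ?"
$ curl -s http://127.0.0.1:11434/api/generate -d '{"model": "llama3", "prompt": "in a word, who are you ?", "stream": false}'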

4/ Update ollama:
Last but not least, ollama being kind of new, you may need to update it on a regular basis. Thankfully, this is also very simple: remember that command line used to install ollama? Well, it is the same one that performs the upgrade!
$ ollama -v
ollama version is 0.1.38
$ curl -fsSL https://ollama.com/install.sh | sh
>>> Downloading ollama...
######################################################################## 100.0%
>>> Installing ollama to /usr/local/bin...
[sudo] password for fred:
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
>>> Nvidia GPU detected.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
$ ollama -v
ollama version is 0.1.47
$

That wraps it up for this one; ollama is a very easy tool to use, so there is no excuse not to use it and play with these famous LLMs!

CUDA: Getting started on Google Colab

While getting started with CUDA on Windows or on WSL (same on Linux) requires installing some stuff, this is not the case when using Google Colab.
For those who don't know, Google Colab is a hosted Jupyter notebook service which requires no setup and where you can access resources like CPUs, GPUs and TPUs for free (you can also get subscriptions for more GPU power, etc., but the free tier is good enough for training and learning).
  • First thing is to head over to Google Colab and create a new notebook
  • Go to Runtime => Change runtime type and choose GPU
  • Click Connect on the top right and your notebook will get connected with some RAM, some disk and a GPU
To check that the CUDA environment is OK, you can first check which GPU has been allocated to your notebook (type the below commands in a Code cell and CTRL+ENTER or click on the play icon on the right of the cell):
!nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+                                                                                        
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
We got assigned a Tesla T4 GPU which is not a top notch GPU but it is faaaaaaaaar more than what we would ever need for training/learning (and remember, all of this is free; Google's treat).

Let's also check if the CUDA Compiler is ready:
!nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
Great, all good in just a few clicks!

Now it would be nice to run the same deviceQuery sample script as the one we ran on Windows and on WSL. As a reminder, deviceQuery is a sample script provided by NVidia in this git repo.
To achieve that, we would basically need to clone the NVidia repo mentioned above, build the deviceQuery sample and run it. Yeah OK, that's boring and it looks complicated, so I got you covered; just use that notebook I made for you and click on play (or CTRL + F9 will run all the cells at once!).
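For reference, the cells of that notebook boil down to something like the sketch below; the path to deviceQuery assumes the Makefile-based layout of the NVIDIA cuda-samples repo at the time of writing and may change between releases:
!git clone --depth 1 https://github.com/NVIDIA/cuda-samples.git
%cd cuda-samples/Samples/1_Utilities/deviceQuery
!make
!./deviceQuery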

In Colab:
  • File => Open Notebook (CTRL + O)
  • Choose Github
  • Paste the link of the notebook: https://github.com/freddenis/cuda-training/blob/main/00_cuda_deviceQuery.ipynb
  • Run cell by cell to see what happens, or CTRL + F9 and check the output at the bottom of the page

And here is the deviceQuery CUDA sample script output on Google Colab using that Tesla T4 GPU:
!time ./deviceQuery
  ./deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Tesla T4"
  CUDA Driver Version / Runtime Version          12.2 / 12.2
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 15102 MBytes (15835660288 bytes)
  (040) Multiprocessors, (064) CUDA Cores/MP:    2560 CUDA Cores   <== This is still a beast of a GPU!
  GPU Max Clock rate:                            1590 MHz (1.59 GHz)
  Memory Clock rate:                             5001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        65536 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 4
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 12.2, NumDevs = 1
Result = PASS

real	0m0.133s
user	0m0.014s
sys	0m0.114s
What an easy and very powerful tool! (ah yes, it is also free)
