Transformers Math

hidden_size = intermediate_size * 0.25

(still working on this one, don't use) num_attention_heads = 64 / (intermediate_size / 1024)

intermediate_size = num_attention_heads * 128

People usually use powers of 2 like 128, 256, and 512 for intermediate_size. The higher the intermediate_size, the better the model captures information, but the more time training takes.
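Here's a minimal Python sketch of the sizing heuristics above; the function names and example values are mine, and the constants are just the rules of thumb from these notes, not official formulas from any library.

```python
# Rough config-sizing heuristics from the notes above (rules of thumb, not
# formulas from any library).

def hidden_size_from_intermediate(intermediate_size: int) -> int:
    # hidden_size = intermediate_size * 0.25 (the usual 4x MLP expansion, reversed)
    return int(intermediate_size * 0.25)

def intermediate_size_from_heads(num_attention_heads: int) -> int:
    # intermediate_size = num_attention_heads * 128
    return num_attention_heads * 128

def num_attention_heads_estimate(intermediate_size: int) -> float:
    # work-in-progress formula from the notes, flagged "don't use"
    return 64 / (intermediate_size / 1024)

print(hidden_size_from_intermediate(4096))   # 1024
print(intermediate_size_from_heads(32))      # 4096
```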

find the 4-bit size of an LLM (needs more testing, wouldn't recommend)

((8-bit size of LLM in GB) / 8) * 4 = (4-bit model size in GB)
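A quick Python sketch of that bits-ratio scaling, written generally so the source and target precisions can be anything; the function name and the fp16 example are my own assumptions, not from these notes.

```python
def quantized_size_gb(model_size_gb: float, source_bits: int, target_bits: int) -> float:
    # Same scaling as the formula above: size / source_bits * target_bits.
    # Ignores quantization overhead (scales, zero-points), so treat it as a rough lower bound.
    return model_size_gb / source_bits * target_bits

print(quantized_size_gb(13.0, source_bits=16, target_bits=4))  # 3.25 (GB, hypothetical fp16 model)
print(quantized_size_gb(7.0, source_bits=8, target_bits=4))    # 3.5  (GB, 8-bit -> 4-bit as in the note)
```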

find the RAM requirement for a ggml LLM

(size of LLM in GB) * 2.5 = (RAM needed to run the LLM in GB)

Usually you multiply by a number between 2 and 4; 2.5 is a good default for estimating the RAM a ggml LLM needs.
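The same rule of thumb as a small helper; the 2.5 default and the 2-4 range come straight from the note above, while the example model size is made up.

```python
def ggml_ram_gb(model_size_gb: float, factor: float = 2.5) -> float:
    # RAM needed ~= model size * factor, where factor is usually 2-4 (2.5 by default).
    return model_size_gb * factor

print(ggml_ram_gb(4.0))       # 10.0 GB with the default 2.5x factor
print(ggml_ram_gb(4.0, 4.0))  # 16.0 GB at the pessimistic end of the range
```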

find the VRAM requirement for fine-tuning a transformers model

(S * 4) * (E / 1024) = (GB of VRAM needed to fine-tune the model with a batch size of 1)

S=size of model in GB

E=max_position_embeddings
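And the fine-tuning estimate as code; the formula is the one above, while the function name and the example values (a 2.5 GB model with a 2048-token context) are hypothetical.

```python
def finetune_vram_gb(model_size_gb: float, max_position_embeddings: int) -> float:
    # VRAM ~= (S * 4) * (E / 1024), for fine-tuning with a batch size of 1.
    return (model_size_gb * 4) * (max_position_embeddings / 1024)

print(finetune_vram_gb(2.5, 2048))  # 20.0 GB
```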

math to find the parameter count of a model

(T * D) + (12 * N * D^2) = (approximate parameter count for the model)

T = vocab_size

D = hidden size (n_embd / d_model)

N = n_layer
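As a sanity check, here's the formula in Python applied to GPT-2 small (vocab_size 50257, hidden size 768, 12 layers), which lands on the well-known ~124M parameter count; the function name is mine.

```python
def param_count(vocab_size: int, d_model: int, n_layer: int) -> int:
    # params ~= vocab_size * d_model (embeddings) + 12 * n_layer * d_model^2 (transformer blocks)
    return vocab_size * d_model + 12 * n_layer * d_model ** 2

print(param_count(50257, 768, 12))  # 123532032 (~124M, matching GPT-2 small)
```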

The source for the parameter count math is kipp.ly/transformer-param-count; if that isn't up anymore, you can view it here.