Hyperparameters in LLMs

There are several hyperparameters that are generally used to tweak the output of large language models (LLMs). This post covers a few common hyperparameters that can be tuned via APIs.

Temperature

Temperature controls the sharpness of the model's output probability distribution over tokens. It is common to use a temperature between 0 and 1.

A higher temperature (t → 1):

  • Makes the distribution more uniform
  • Reduces the difference between high and low probabilities
  • Increases the probability of selecting less likely outcomes
  • Leads to more diverse selections
  • Can improve robustness by considering more candidate tokens

A lower temperature (t → 0):

  • Makes the distribution more peaked
  • Increases the difference between high and low probabilities
  • Decreases the probability of selecting less likely outcomes
  • Leads to a more concentrated selection
  • Can make the model more deterministic

Mathematically, the temperature t is used to scale the logits before applying the softmax function:

softmax(logits/t)

For example, suppose we have a distribution over 5 elements: [0.1, 0.2, 0.3, 0.2, 0.2]. With a high temperature, the distribution becomes more uniform, e.g. [0.18, 0.2, 0.22, 0.2, 0.2], and with a low temperature, the distribution becomes more peaked, e.g. [0.01, 0.1, 0.78, 0.1, 0.01].
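
As a minimal sketch of how this works (assuming NumPy and a hypothetical 5-element logit vector, not tied to any particular model):

    import numpy as np

    def softmax_with_temperature(logits, t):
        # Divide the logits by the temperature before the softmax.
        # Lower t sharpens the distribution; higher t flattens it.
        scaled = np.array(logits) / t
        exps = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
        return exps / exps.sum()

    logits = [1.0, 2.0, 3.0, 2.0, 2.0]            # hypothetical logits
    print(softmax_with_temperature(logits, 1.0))  # baseline distribution
    print(softmax_with_temperature(logits, 2.0))  # flatter, closer to uniform
    print(softmax_with_temperature(logits, 0.5))  # sharper, more peaked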

Top P sampling

Top P sampling (also called nucleus sampling) selects the smallest set of elements whose cumulative probability exceeds a threshold P (e.g., 0.8). This is useful if we want to:

  1. Avoid selecting low probability elements
  2. Capture the most significant portion of the distribution

When to use this method

  1. You need a more flexible and adaptive sampling strategy
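
A minimal sketch of Top P sampling, assuming NumPy and a hypothetical 5-token probability distribution:

    import numpy as np

    def top_p_sample(probs, p=0.8, rng=None):
        rng = rng or np.random.default_rng()
        probs = np.array(probs)
        # Sort tokens by probability (descending) and keep the smallest
        # set whose cumulative probability exceeds p.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, p) + 1  # include the token that crosses p
        kept = order[:cutoff]
        kept_probs = probs[kept] / probs[kept].sum()  # renormalize over the kept set
        return rng.choice(kept, p=kept_probs)

    probs = [0.05, 0.1, 0.5, 0.25, 0.1]  # hypothetical token distribution
    print(top_p_sample(probs, p=0.8))    # samples only from the "nucleus"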

Top K sampling

Top K sampling selects the K elements with the highest probabilities from the distribution. This method is useful when we want to:

  1. Focus on the most likely outcomes
  2. Improve computational efficiency
  3. Reduce the dimensionality of the output space

When to use this method

  1. You have a clear idea of the number of elements that you want to select
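
A minimal sketch of Top K sampling under the same assumptions (NumPy, hypothetical probabilities):

    import numpy as np

    def top_k_sample(probs, k=3, rng=None):
        rng = rng or np.random.default_rng()
        probs = np.array(probs)
        # Keep only the k highest-probability tokens and renormalize.
        top = np.argsort(probs)[::-1][:k]
        top_probs = probs[top] / probs[top].sum()
        return rng.choice(top, p=top_probs)

    probs = [0.05, 0.1, 0.5, 0.25, 0.1]  # hypothetical token distribution
    print(top_k_sample(probs, k=3))      # samples from the 3 most likely tokens only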

Other Hyperparameters

  1. Max length: Limits the total length of the generated text.
  2. Repetition penalty: Reduces the likelihood of repeating tokens that have already been generated (see the sketch below).
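
In practice, these hyperparameters are usually set as generation parameters rather than implemented by hand. As an illustrative sketch using the Hugging Face transformers library (with GPT-2 purely as an example model), all of the knobs above can appear together:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The weather today is", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,          # sample instead of greedy decoding
        temperature=0.7,         # sharpen the distribution slightly
        top_p=0.9,               # nucleus (Top P) threshold
        top_k=50,                # restrict to the 50 most likely tokens
        max_length=40,           # limit the total length of the generated text
        repetition_penalty=1.2,  # reduce the likelihood of repeating tokens
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))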

August 29, 2024