Hyperparameters in LLMs
There are several hyperparameters that are generally used to tweak the output of large language models (LLMs). This post covers a few common hyperparameters that can be adjusted via APIs.
Temperature
Temperature controls the sharpness of the output probability distribution. It is common to use a temperature between 0 and 1.
A higher temperature (t -> 1):
- Makes the distribution more uniform
- Reduces the difference between high and low probabilities
- Increases the probability of selecting less likely outcomes
- Leads to more diverse selections
- Can improve robustness by considering a wider range of candidates
A lower temperature (t -> 0):
- Makes the distribution more peaked
- Increases the difference between high and low probabilities
- Decreases the probability of selecting less likely outcomes
- Leads to a more concentrated selection
- Can make the model more deterministic
Mathematically, the temperature t is used to scale the logits before applying the softmax function:
softmax(logits / t)
For example, suppose we have a distribution over 5 elements: [0.1, 0.2, 0.3, 0.2, 0.2]. With a high temperature, the distribution becomes more uniform, e.g., [0.18, 0.2, 0.22, 0.2, 0.2], and with a low temperature, it becomes more peaked, e.g., [0.01, 0.1, 0.78, 0.1, 0.01].
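A minimal Python sketch of temperature scaling; the logits here are made up purely for illustration:

```python
import numpy as np

def softmax_with_temperature(logits, t):
    """Divide the logits by the temperature, then apply softmax."""
    scaled = np.asarray(logits, dtype=float) / t
    # Subtract the max before exponentiating for numerical stability.
    exp = np.exp(scaled - np.max(scaled))
    return exp / exp.sum()

logits = [1.0, 2.0, 3.0, 2.0, 2.0]  # illustrative logits, not from a real model

print(softmax_with_temperature(logits, t=0.5))  # low t: peaked, favors the largest logit
print(softmax_with_temperature(logits, t=2.0))  # high t: flatter, closer to uniform
```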
Top P sampling
Top P sampling (also known as nucleus sampling) selects the smallest set of elements whose cumulative probability exceeds a threshold P (e.g., 0.8). This is useful when we want to:
- Avoid selecting low probability elements
- Capture the most significant portion of the distribution
When to use this method
- You need a more flexible and adaptive sampling strategy
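A minimal NumPy sketch of Top P sampling; the probabilities and the threshold are assumptions chosen just for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_p_sample(probs, p=0.8):
    """Sample from the smallest set of tokens whose cumulative probability exceeds p."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]               # indices sorted by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # smallest prefix whose mass exceeds p
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()  # renormalize over the kept tokens
    return rng.choice(kept, p=kept_probs)

probs = [0.5, 0.25, 0.15, 0.07, 0.03]
print(top_p_sample(probs, p=0.8))  # only tokens 0, 1, 2 can be chosen (0.5 + 0.25 + 0.15 > 0.8)
```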
Top K sampling
This selects the top K elements with the highest probabilities from the distribution. This method is useful when we want to:
- Focus on the most likely outcomes
- Improve computational efficiency
- Reduce the dimensionality of the output space
When to use this method
- You have a clear idea of the number of elements that you want to select
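A matching sketch of Top K sampling over the same illustrative distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_sample(probs, k=3):
    """Sample from the k highest-probability tokens, renormalized."""
    probs = np.asarray(probs, dtype=float)
    kept = np.argsort(probs)[::-1][:k]            # indices of the k largest probabilities
    kept_probs = probs[kept] / probs[kept].sum()  # renormalize over the kept tokens
    return rng.choice(kept, p=kept_probs)

probs = [0.5, 0.25, 0.15, 0.07, 0.03]
print(top_k_sample(probs, k=3))  # only tokens 0, 1, 2 can be chosen
```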
Other Hyperparameters
- Max length: Limits the total length of the generated text, typically measured in tokens.
- Repetition penalty: Reduces the likelihood of generating tokens that have already appeared, which helps avoid loops and repeated phrases.
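As a rough end-to-end sketch, all of these knobs are exposed as arguments of generate in the Hugging Face transformers library; GPT-2 is used here only as a small stand-in model, and the specific values are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # enable sampling so temperature/top_k/top_p take effect
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    max_length=60,           # limit the total length of the generated text
    repetition_penalty=1.2,  # penalize tokens that have already appeared
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```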