Hyperparameters in LLMs

There are several hyperparameters that are commonly used to tweak the output of Large Language Models (LLMs). This post covers a few of the most common ones that can be adjusted via APIs.

Temperature

Temperature controls the sharpness of the probability distribution from which the next token is sampled. It is common to use temperatures between 0 and 1.

A higher temperature (t → 1):

  • Makes the distribution more uniform
  • Reduces the difference between high and low probabilities
  • Increases the probability of selecting less likely outcomes
  • Leads to more diverse selections
  • Can improve robustness by considering a wider range of elements

A lower temperature (t → 0):

  • Makes the distribution more peaked
  • Increases the difference between high and low probabilities
  • Decreases the probability of selecting less likely outcomes
  • Leads to more concentrated selections
  • Can make the model more deterministic

Mathematically, the temperature t is used to scale the logits before applying the softmax function:

softmax(logits/t)

For example, suppose we have a distribution over 5 elements: [0.1, 0.2, 0.3, 0.2, 0.2]. With a high temperature, the distribution becomes more uniform, e.g. [0.18, 0.2, 0.22, 0.2, 0.2], and with a low temperature it becomes more peaked, e.g. [0.01, 0.1, 0.78, 0.1, 0.01].
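
As a rough sketch of what this looks like in code, here is a toy NumPy implementation (not any library's actual API) applied to the distribution above:

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def apply_temperature(logits, t):
    # Dividing the logits by t sharpens the distribution as t -> 0
    # and flattens it toward uniform as t grows.
    return softmax(logits / t)

# Logits that recover the example distribution [0.1, 0.2, 0.3, 0.2, 0.2] at t = 1.
logits = np.log(np.array([0.1, 0.2, 0.3, 0.2, 0.2]))

print(apply_temperature(logits, 1.0))  # unchanged: [0.1, 0.2, 0.3, 0.2, 0.2]
print(apply_temperature(logits, 3.0))  # flatter, closer to uniform
print(apply_temperature(logits, 0.3))  # peaked: most of the mass moves to the 0.3 entry
```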

Top P Sampling

Top P sampling (also known as nucleus sampling) selects the smallest set of elements whose cumulative probability exceeds a threshold P (e.g., 0.8). This is useful if we want to:

  1. Avoid selecting low probability elements
  2. Capture the most significant portion of the distribution

When to use this method

  1. You need a more flexible, adaptive sampling strategy: the number of candidate tokens grows or shrinks with the shape of the distribution (see the sketch below)
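
A minimal NumPy sketch of the filtering step (the function name and probabilities are illustrative):

```python
import numpy as np

def top_p_filter(probs, p=0.8):
    # Sort tokens by probability (descending) and keep the smallest prefix
    # whose cumulative probability reaches the threshold p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # number of tokens to keep
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()  # renormalize over the kept tokens

# Exact binary fractions are used so the cumulative sums are exact.
probs = np.array([0.5, 0.25, 0.125, 0.0625, 0.0625])
print(top_p_filter(probs, p=0.8))
# Keeps 0.5, 0.25 and 0.125 (cumulative 0.875 >= 0.8) and renormalizes them.
```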

Top K Sampling

This selects the K elements with the highest probabilities from the distribution. This method is useful when we want to:

  1. Focus on the most likely outcomes
  2. Improve computational efficiency
  3. Reduce the dimensionality of the output space

When to use this method

  1. You have a clear idea of how many elements you want to consider (see the sketch below)
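
The same toy setup as above can illustrate Top K filtering (again a sketch, not a library API):

```python
import numpy as np

def top_k_filter(probs, k=2):
    # Keep only the k highest-probability tokens, zero out the rest,
    # and renormalize so the kept probabilities sum to 1.
    keep = np.argsort(probs)[::-1][:k]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.5, 0.25, 0.125, 0.0625, 0.0625])
print(top_k_filter(probs, k=2))
# Only the two most likely tokens remain: [0.667, 0.333, 0, 0, 0] after renormalizing.
```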

Other Hyperparameters

  1. Max length: limits the total number of tokens in the generated text
  2. Repetition penalty: reduces the likelihood of the model repeating tokens it has already generated
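
To see how these hyperparameters are exposed in practice, here is a sketch using Hugging Face's transformers library, whose generate() method accepts all of them (the model choice and parameter values are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # sharpen the next-token distribution slightly
    top_p=0.8,               # nucleus (Top P) sampling threshold
    top_k=50,                # consider at most the 50 most likely tokens
    max_new_tokens=40,       # cap the length of the generated continuation
    repetition_penalty=1.2,  # penalize tokens that have already appeared
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```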

Date
August 29, 2024