Tokens are chunks of text into which a given input or output is divided. ChatGPT uses natural language processing and machine learning techniques in which a token can be as short as a single character or as long as a whole word.
For example, "ChatGPT is Great!" can be tokenized into the following tokens:
- Chat
- G
- PT
- is
- Great
- !
Tokens do not always correspond to whole words or meaningful units. Spaces are typically attached to the beginning of the following token rather than forming tokens of their own, and punctuation marks can be separate tokens or part of adjacent words.
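As a rough sketch of how a subword tokenizer might produce the split above, here is a toy greedy longest-match tokenizer over a tiny hypothetical vocabulary. Real GPT tokenizers instead use byte-pair encoding (BPE) with a learned vocabulary of tens of thousands of entries, but the greedy matching below illustrates the basic idea of breaking text into known pieces:

```python
# Toy greedy longest-match tokenizer. The vocabulary below is
# hypothetical and chosen to reproduce the example split; real GPT
# tokenizers learn a much larger vocabulary via byte-pair encoding.
VOCAB = {"Chat", "G", "PT", " is", " Great", "!"}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary entry matches: emit the character on its own.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("ChatGPT is Great!"))
# → ['Chat', 'G', 'PT', ' is', ' Great', '!']
```

Note that the tokens " is" and " Great" carry their leading space, which is how GPT-style tokenizers typically handle whitespace.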