{"id":950,"date":"2024-09-15T13:14:56","date_gmt":"2024-09-15T11:14:56","guid":{"rendered":"https:\/\/kubicek.ai\/reflection-on-language-models-that-are-not-actually-language-models-at-all\/"},"modified":"2025-07-10T13:34:10","modified_gmt":"2025-07-10T11:34:10","slug":"reflection-on-language-models-that-are-not-actually-language-models-at-all","status":"publish","type":"post","link":"https:\/\/www.kubicek.ai\/en\/reflection-on-language-models-that-are-not-actually-language-models-at-all\/","title":{"rendered":"Reflection on language models that are not actually language models at all"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><em>Today&#8217;s reflection is based on a recent <\/em><a href=\"https:\/\/x.com\/karpathy\/status\/1835024197506187617\">post by Andrej Karpathy<\/a><em>, co-founder of OpenAI and former head of AI at Tesla.\nKarpathy puts forward a provocative idea: <\/em>large language models (<a href=\"https:\/\/www.kubicek.ai\/lexicon\/llm\/\" class=\"lex-link\">LLMs<\/a>) <em>may not have much to do with language per se.\nThis reflection develops his view of LLMs as universal tools and outlines the potential implications of this view for the future of AI.
<\/em><\/p>\n\n<h2 class=\"wp-block-heading\">Listen to this article as audio<\/h2>\n\n<iframe style=\"border-radius:12px\" src=\"https:\/\/open.spotify.com\/embed\/episode\/6ucMQbGDs4as4lj5Zg6Upo?utm_source=generator&#038;theme=0\" width=\"100%\" height=\"152\" frameborder=\"0\" allowfullscreen=\"\" allow=\"autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture\" loading=\"lazy\"><\/iframe>\n\n<div style=\"height:33px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n<p class=\"wp-block-paragraph\">It is remarkable, and perhaps a little puzzling, that large language models may not actually have much to do with language per se; the association is largely a product of historical development.\nAlthough they are called &#8220;language&#8221; models, their underlying principles and mechanisms are far from limited to natural language processing. <\/p>\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#abb7c23b\"><em>&#8220;What I&#8217;ve seen though is that the word &#8220;language&#8221; is misleading people to think LLMs are relegated to text applications.&#8221;<\/em> <a href=\"https:\/\/x.com\/karpathy\/status\/1835027990033682852\">Andrej Karpathy on X.com<\/a><\/p>\n<\/blockquote>\n\n<p class=\"wp-block-paragraph\">These models were originally developed to work with text, which is how they got their name.\nHowever, their capabilities extend beyond text and can be applied to different types of data.\nThey are a highly versatile technology for the statistical modeling of <a href=\"https:\/\/www.kubicek.ai\/lexicon\/token\/\" class=\"lex-link\">token<\/a> flows.
<strong>Tokens<\/strong> are basic units of information that can represent words, characters, or other discrete elements.  <\/p>\n\n<p class=\"wp-block-paragraph\">In natural language processing, tokens can be, for example, words, word fragments, or even single characters.\nYou can see what OpenAI treats as a <a href=\"https:\/\/www.kubicek.ai\/lexicon\/token\/\" title=\"Token\">token<\/a> in its language models on its <a href=\"https:\/\/platform.openai.com\/tokenizer\">website<\/a>.\nIn other domains, however, tokens can represent, for example, patches of an image, short segments of an audio recording, or individual actions in a robot&#8217;s action plan.  <\/p>\n\n<p class=\"wp-block-paragraph\">A more accurate name, according to Karpathy, would be <strong>&#8220;autoregressive transformers&#8221;<\/strong>.\nThis would describe their true nature and mechanism of operation much better.\nAutoregressive transformers are models that predict the next <a href=\"https:\/\/www.kubicek.ai\/lexicon\/token\/\" title=\"Token\">token<\/a> based on the previous tokens in the sequence, which is the basic principle of how LLMs work.
<\/p>\n\n<p class=\"wp-block-paragraph\"><strong>Autoregressive <\/strong>means that the model builds on previous tokens to generate each new <a href=\"https:\/\/www.kubicek.ai\/lexicon\/token\/\" title=\"Token\">token<\/a>.\nTake sentence generation as an example: when the model generates a new word, it takes into account all the words that came before, including those it has already generated itself.\nThis is similar to how we humans compose a sentence: we choose each new word based on the context of the previous words.\nThe approach is &#8220;autoregressive&#8221; because the model feeds its own output back in as input for the next step.   <\/p>\n\n<p class=\"wp-block-paragraph\">And <strong>&#8220;transformer&#8221; <\/strong>has nothing to do with movie robots from the planet Cybertron transforming into cars and back.\nNor does it mean the unsightly box full of wire windings that steps voltage up or down.\nIt&#8217;s simply the name of a specific neural network architecture that lets the model process long sequences of data efficiently.\nThe transformer is the basis of many modern LLMs and a major reason for their high performance.   <\/p>\n\n<p class=\"wp-block-paragraph\">An LLM does not inherently care whether tokens represent text fragments, image sections, audio sections, or action choices.\nThat is, these models can handle any data as long as <strong>that data is converted into a sequence of discrete tokens.<\/strong> A discrete <a href=\"https:\/\/www.kubicek.ai\/lexicon\/token\/\" title=\"Token\">token<\/a> is a basic unit of information drawn from a finite set of distinct symbols, i.e. &#8220;discrete&#8221; in the mathematical sense.\nThe key point is that once data is represented as a sequence of tokens, it can be processed with one unified approach.
<\/p>\n\n<p class=\"wp-block-paragraph\">If we can reduce a problem to modeling such flows, we can apply an LLM to it.\nThis means that LLMs can be used in a variety of domains, not just natural language processing. <\/p>\n\n<p class=\"wp-block-paragraph\">This versatility suggests that <strong>the potential of these models extends far beyond language processing and can affect a wide range of disciplines and applications.<\/strong> It can open up new possibilities in areas ranging from computer vision and audio processing to bioinformatics.\nIn chemistry, where molecules can be represented as sequences of atoms and bonds, LLMs could help predict chemical properties or synthesis pathways. <\/p>\n\n<p class=\"wp-block-paragraph\">In biology, for example in protein research and the generation of new amino acid sequences, the discrete tokens can be the individual amino acids.\nProteins are chains of amino acids, and each amino acid can be treated as a single &#8220;<a href=\"https:\/\/www.kubicek.ai\/lexicon\/token\/\" title=\"Token\">token<\/a>&#8221; within that sequence.\nThe complex structure and function of a protein are determined by its amino acid sequence.\nUsing LLMs, researchers can explore new protein structures and predict how they will function or what properties they will have.\nNewly designed proteins could then bind to specific molecules, which is essential in the development of new drugs or treatments.    <\/p>\n\n<p class=\"wp-block-paragraph\">As LLM technology develops, we may see many problems converge into this unified modeling framework.\nThis means that problems previously solved with specialized models and techniques may increasingly be handled by an LLM.
<\/p>\n\n<p class=\"wp-block-paragraph\">The basic task boils down to predicting the next <a href=\"https:\/\/www.kubicek.ai\/lexicon\/token\/\" title=\"Token\">token<\/a>, with the meaning and interpretation of the tokens varying by domain.\nThis unification could make complex problems easier to solve through a single approach.\nIt could simplify processes, reduce development costs and accelerate innovation across domains.  <\/p>\n\n<p class=\"wp-block-paragraph\">If this trend does take hold, it would suggest that current deep learning frameworks are perhaps too general for most practical applications.\nThese frameworks offer thousands of operations and layers that can be configured arbitrarily, providing tremendous flexibility.\nHowever, if the vast majority of problems could be solved using LLMs, this flexibility might be redundant. <strong>A Swiss Army knife is a great thing, but sometimes it&#8217;s better to just use a screwdriver. <\/strong>This could lead to the development of more specialized tools and frameworks optimized for LLM implementation and training, simplifying model development and deployment.\nSuch tools could be more efficient, more user-friendly and better adapted to the specific needs of LLM-based applications. <\/p>\n\n<h2 class=\"wp-block-heading\">One ring to rule them all?\nNot always! <\/h2>\n\n<p class=\"wp-block-paragraph\">To claim that this view fully reflects reality would be simplistic.\nIt is likely to be only partially true.\nFor example, real-time systems such as self-driving cars require immediate reactions to a changing environment.\nSuch systems use models that process sensor inputs in parallel and respond quickly, a workload that may not suit sequential models such as LLMs.
<\/p>\n\n<p class=\"wp-block-paragraph\">Another aspect is the <strong>structure of the data<\/strong>.\nSome data have complex relationships that are not linear. For example, <strong>graph neural networks<\/strong> are designed to work with data that can be represented as nodes and edges.\nThese structures cannot be easily converted to a sequence of tokens without losing important information.  <\/p>\n\n<p class=\"wp-block-paragraph\">Although LLMs are a powerful and versatile tool, there are areas that need specific architectures and approaches and cannot easily be reduced to next-<a href=\"https:\/\/www.kubicek.ai\/lexicon\/token\/\" title=\"Token\">token<\/a> prediction.\nSome tasks, such as modeling physical systems, simulating complex interactions, or solving problems that hinge on causal reasoning, require a deeper understanding of the structure of the data and the relationships within it.\nThese tasks go beyond the capabilities of current LLMs, which are optimized for sequential data processing.  <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>Sequential data processing<\/strong> means that the data is processed as a sequence, where the order of the elements (tokens) is important.\nFor text, this is obvious because the meaning of a sentence depends heavily on the order of the words.\nIn audio recordings, likewise, it is the sequence of sounds over time that preserves the meaning of speech or music. <strong>Alternatives to sequential processing<\/strong> include, for example, processing data in a matrix or graph structure where the relationships between elements are not linear.\nIn computer vision, for instance, convolutional neural networks process an image as a two-dimensional array of pixels rather than as a sequence.
<\/p>\n\n<p class=\"wp-block-paragraph\">Although it is theoretically possible to represent different types of data as tokens, in practice important information may be lost or the model may become harder to train.\nOptimizing and adapting LLMs for specific tasks can be complex and may not always yield the best results.\nIn some cases, specialized models may simply provide better performance and efficiency.  <\/p>\n\n<p class=\"wp-block-paragraph\">Large language models represent a huge step forward due to their ability to model a wide range of problems through a unified paradigm.\nThis versatility opens up new possibilities and can accelerate development in many areas.\nAt the same time, however, it is essential to <strong>maintain a critical perspective and be aware of their limitations<\/strong>.  <\/p>\n\n<p class=\"wp-block-paragraph\">The future of artificial intelligence will likely not be a one-size-fits-all approach, but a combination of different methods and tools that work together and complement each other.  <strong>Different problems may require different approaches.  <\/strong>It is therefore important to have tools and frameworks that can handle these different situations.<\/p>\n\n<p class=\"wp-block-paragraph\">Only in this way will we be able to solve both general and highly specific problems effectively, and fully exploit the potential that these technologies offer us!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today&#8217;s reflection is based on a recent post by Andrej Karpathy, co-founder of OpenAI and former head of AI at Tesla. Karpathy puts forward a provocative idea: large language models (LLMs) may not have much to do with language per se.
This reflection develops his discussion of LLMs as universal tools [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1372,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","_seopress_robots_follow":"","_seopress_robots_imageindex":"","_seopress_robots_snippet":"","_seopress_robots_primary_cat":"","_seopress_robots_breadcrumbs":"","_seopress_robots_freeze_modified_date":"","_seopress_robots_custom_modified_date":"","_seopress_robots_canonical":"","_seopress_social_fb_title":"","_seopress_social_fb_desc":"","_seopress_social_fb_img":"","_seopress_social_fb_img_attachment_id":0,"_seopress_social_fb_img_width":0,"_seopress_social_fb_img_height":0,"_seopress_social_twitter_title":"","_seopress_social_twitter_desc":"","_seopress_social_twitter_img":"","_seopress_social_twitter_img_attachment_id":0,"_seopress_social_twitter_img_width":0,"_seopress_social_twitter_img_height":0,"_seopress_redirections_value":"","_seopress_redirections_enabled":"","_seopress_redirections_enabled_regex":"","_seopress_redirections_logged_status":"","_seopress_redirections_param":"","_seopress_redirections_type":0,"_seopress_analysis_target_kw":"","_seopress_news_disabled":"","_seopress_video_disabled":"","_seopress_video":[],"_seopress_pro_schemas_manual":[],"_seopress_pro_rich_snippets_disable_all":"","_seopress_pro_rich_snippets_disable":[],"_seopress_pro_schemas":[],"footnotes":""},"categories":[10,1],"tags":[],"cat_tool":[],"class_list":["post-950","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","category-uncategorized-cs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/posts\/950","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ww
w.kubicek.ai\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/comments?post=950"}],"version-history":[{"count":38,"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/posts\/950\/revisions"}],"predecessor-version":[{"id":2884,"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/posts\/950\/revisions\/2884"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/media\/1372"}],"wp:attachment":[{"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/media?parent=950"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/categories?post=950"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/tags?post=950"},{"taxonomy":"cat_tool","embeddable":true,"href":"https:\/\/www.kubicek.ai\/en\/wp-json\/wp\/v2\/cat_tool?post=950"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}