What we want is a machine that can learn from experience… the possibility of letting the machine alter its own instructions provides the mechanism for this.

– Alan Turing, 1947

Text

Image

Audio

Video

3D Models

Numbers

Modalities

Technologies

GANs

VAEs

Autoregressive Models

Diffusion Models

Flow-based Models

Transformers

  • Text: Text Generation, Text-to-Speech
  • Image: Image Generation, Image-to-Image Translation
  • Audio: Music Generation, Voice Synthesis
  • Video: Video Generation, Deepfakes
  • 3D Models: 3D Reconstruction, 3D Object Generation

About Diffusion Models

Learning by destroying

Score-based generative models

See https://arxiv.org/abs/2112.07068

About Language Models

timeline
    title History of Language Models
    1940s : Turing Lecture
    1950s : Turing Test
        : Artificial Intelligence Coined
    1960s : Eliza (chatbot)
    2010s : IBM Watson
        : Transformers (architecture)
        : ULMFiT
        : GPT-1
        : BERT
        : GPT-2
    2020 : Meena (chatbot)
        : BlenderBot (chatbot)
        : GPT-3
        : GPT-3 writes a newspaper column
    2021 : The Pile v1 (dataset)
        : Wudao 1.0
        : GPT-J
        : LaMDA (chatbot)
        : Wudao 2.0
        : M6 1T
        : Jurassic-1
        : Megatron
        : M6 10T
        : BERT 200B
    2022 : Chinchilla
        : BLOOM
        : PaLM
        : LaMDA 2 (chatbot)
        : GPT-3.5
        : ChatGPT
    2023 : LLaMA
        : Alpaca
        : GPT-4
        : PaLM 2
        : Phi-1
        : Claude 2
        : Llama 2
        : Falcon
        : ERNIE 4.0
        : Grok-1
        : Gemini
    2024 : Sora
        : Qwen2
        : Yi-XLarge
        : Inflection-2.5
        : SenseNova 5.0
        : Gemini 1.5
        : Claude 3
        : GPT-4o
        : Llama 3.1
        : Phi-3.5
        : SmolLM
        : Grok-2
        : Mistral NeMo
        : LLaVA

Open source models are the future

Word Embeddings

“I love you very much”

In the eyes of a Language Model

        V1     V2     V3     V4   …   V512
I      -0.24   0.55   0.87   0.41  …   2.20
love    0.47  -0.54   0.12  -0.05  …  -0.73
you     0.11  -0.21   2.37   0.74  …   0.13
very   -0.05   0.21  -0.86  -0.68  …  -0.95
much    0.74   0.25   0.02  -1.38  …   0.46
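The numbers themselves are just coordinates; what matters is the geometry between them. A minimal sketch of how similarity between words is computed, using only the four visible dimensions of the toy vectors above (real models would use all 512):

```python
import math

# Toy 4-dimensional slices of the illustrative embedding table above.
emb = {
    "I":    [-0.24, 0.55, 0.87, 0.41],
    "love": [0.47, -0.54, 0.12, -0.05],
    "you":  [0.11, -0.21, 2.37, 0.74],
}

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine(emb["I"], emb["you"]))  # closer to 1.0 = more similar
```

In a trained model, words used in similar contexts end up with high cosine similarity, which is why these vectors are useful downstream.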

“If I should think of love I’d think of you, your arms uplifted, Tying your hair in plaits above, The lyre shape of your arms and shoulders, The soft curve of your winding head. No melody is sweeter, nor could Orpheus So have bewitched. I think of this, And all my universe becomes perfection. But were you in my arms, dear love, The happiness would take my breath away, No thought could match that ecstasy, No song encompass it, no other worlds. If I should think of love, I’d think of you.”

– Shakespeare

Why is this useful?

What are LLMs trying to do?

The 20-80 rule about training

Image by Holistic AI

Where do facts live?

Content generation

In March 2021, GPT-3 was typing 3.1 million words per minute, non-stop, 24×7. With the general availability of the model, I expect that number is a lot higher now… (Nov/2021).

  • Per day = 4,500,000,000 (4.5 billion)
  • Per hour = 187,500,000 (187.5 million)
  • Per minute = 3,125,000 (3.125 million)
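The per-hour and per-day figures follow directly from the per-minute rate:

```python
# Scaling the reported per-minute rate up to hourly and daily totals.
words_per_minute = 3_125_000
words_per_hour = words_per_minute * 60   # 187,500,000
words_per_day = words_per_hour * 24      # 4,500,000,000

print(f"{words_per_hour:,} per hour, {words_per_day:,} per day")
```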

Hardware

The supercomputer developed for OpenAI (May 2020) is a single system with more than 285,000 CPU cores, 10,000 GPUs, and 400 gigabits per second of network connectivity for each GPU server.

– https://blogs.microsoft.com/ai/openai-azure-supercomputer/

Training time

Training GPT-3 with 175 billion parameters would require approximately 288 years with a single NVIDIA V100 GPU.

– https://arxiv.org/pdf/2104.04473.pdf
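A back-of-envelope sanity check of that figure, using the common ~6 × parameters × tokens estimate for training FLOPs (an assumption here, not taken from the paper) and GPT-3's roughly 300 billion training tokens:

```python
# Rough check of the "288 years on one V100" claim.
params = 175e9   # GPT-3 parameters
tokens = 300e9   # approximate GPT-3 training tokens
total_flops = 6 * params * tokens        # ~3.15e23 FLOPs

seconds = 288 * 365 * 24 * 3600          # 288 years in seconds
sustained = total_flops / seconds        # FLOP/s one GPU must sustain

print(f"~{sustained / 1e12:.0f} TFLOP/s sustained")
```

That works out to roughly 35 TFLOP/s sustained, a plausible fraction of a V100's ~125 TFLOP/s mixed-precision peak, so the 288-year figure is internally consistent.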

Understanding

Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties.

– https://arxiv.org/pdf/2108.07258

AI Content is eating the web

145M words per minute

Understanding Prompts and Templates for Large Language Models

Template

What it is

  • A template is a pre-defined structure for a prompt.
  • It can include placeholders for specific information.
  • Templates allow for generating multiple prompts with a similar structure.
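A minimal illustration of the placeholder idea, with plain Python string formatting standing in for a real template engine (the actual templates shown on the next slides use Go template syntax):

```python
# One structure, many prompts: fill placeholders to render concrete prompts.
TEMPLATE = (
    "<|system|>\n{system}<|end|>\n"
    "<|user|>\n{prompt}<|end|>\n"
    "<|assistant|>\n"
)

def render(system: str, prompt: str) -> str:
    """Fill the template's placeholders to produce a concrete prompt."""
    return TEMPLATE.format(system=system, prompt=prompt)

print(render("You are a helpful assistant.", "Why is the sky blue?"))
```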

What it looks like

{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>

Llama 3.1 Template

{{- if or .System .Tools }}<|start_header_id|>system<|end_header_id|>
{{- if .System }}

{{ .System }}
{{- end }}
{{- if .Tools }}

Cutting Knowledge Date: December 2023

When you receive a tool call response, use the output to format an answer to the original user question.

You are a helpful assistant with tool calling capabilities.
{{- end }}<|eot_id|>
{{- end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{ range $.Tools }}
{{- . }}
{{ end }}
Question: {{ .Content }}<|eot_id|>
{{- else }}

{{ .Content }}<|eot_id|>
{{- end }}{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}
{{ range .ToolCalls }}
{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}

{{ .Content }}
{{- end }}{{ if not $last }}<|eot_id|>{{ end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}
{{- end }}

Beyond prompts and templates

Request

{
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue? Use one sentence in your answer."
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0,
    "num_ctx": 1024
  }
}

Output

{
  "message": {
    "role": "assistant",
    "content": "The sky appears to be a brilliant shade of blue due to the scattering of shorter wavelengths by atmospheric molecules, which scatters sunlight more than longer wavelengths."
  }
}
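The request/response pair above maps onto a plain HTTP POST. A sketch using only the standard library, assuming an Ollama-style server running locally at `http://localhost:11434/api/chat` and a pulled `llama3.1` model:

```python
import json
from urllib import request

# The same request shown above, as a Python dict.
payload = {
    "model": "llama3.1",  # assumes this model is available locally
    "messages": [
        {"role": "user",
         "content": "Why is the sky blue? Use one sentence in your answer."}
    ],
    "options": {"seed": 101, "temperature": 0, "num_ctx": 1024},
    "stream": False,
}

def ask(url="http://localhost:11434/api/chat"):
    """POST the chat request and return the assistant's message content."""
    req = request.Request(url, data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

With `seed` fixed and `temperature` 0, repeated calls return the same answer, which is what makes the temperature comparison on the next slide reproducible.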

Temperature matters a lot

  • Question: “Why is the sky blue? Use one sentence in your answer.”
  • Temperature 0: “The sky appears to be a brilliant shade of blue due to the scattering of shorter wavelengths by atmospheric molecules, which scatters sunlight more than longer wavelengths.”
  • Temperature 100: “The sky’s bluish appearance is attributed to a fascinating combination of light refraction, scattering, and human perception. At around 500 nanometers in wavelength (i.e., a midpoint between blue and red on the color wheel), these wavelengths are responsible for absorbing other frequencies of light while bouncing off molecules in the air and water vapor present within our atmosphere. As a result, these shorter wavelengths primarily pass through us while the longer waves – which comprise both our surroundings’ surfaces as well as their reflected sunlight – get scattered outward in every direction by particles that exist within them such as dust particles, tiny air bubbles or pollen grains found throughout our world’s ecosystem…”

But there are many others

{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "num_keep": 5,
    "seed": 42,
    "num_predict": 100,
    "top_k": 20, "top_p": 0.9, "min_p": 0.0,
    "tfs_z": 0.5,
    "typical_p": 0.7,
    "repeat_last_n": 33,
    "temperature": 0.8,
    "repeat_penalty": 1.2, "presence_penalty": 1.5, "frequency_penalty": 1.0,
    "mirostat": 1, "mirostat_tau": 0.8, "mirostat_eta": 0.6,
    "penalize_newline": true,
    "stop": ["\n", "user:"],
    "numa": false,
    "num_ctx": 1024,
    "num_batch": 2,
    "num_gpu": 1,
    "main_gpu": 0,
    "low_vram": false,
    "f16_kv": true,
    "vocab_only": false,
    "use_mmap": true,
    "use_mlock": false,
    "num_thread": 8
  }
}

Applications

Conclusions

Thank you!

Michael Green, CEO Desupervised mike@desupervised.io +4531766142

Backup