Running your own Large Language Models

A Small Journey in AI

Michael Green

2024-05-28

About Language Models

timeline
    title History of Language Models
    1940's : Turing Lecture
    1950's : Turing Test
         : Artificial Intelligence Coined
    1960's : Eliza (chatbot) 
    2010's : IBM Watson
         : Transformers (architecture)
         : ULMFit
         : GPT-1
         : BERT
         : GPT-2
    2020 : Meena (chatbot)
        : BlenderBot (chatbot)
        : GPT-3
        : GPT-3 writes a newspaper column
    2021 : The Pile v1 (dataset)
        : Wudao 1.0
        : GPT-J
        : LaMDA (chatbot)
        : Wudao 2.0
        : M6 1T
        : Jurassic-1
        : Megatron
        : M6 10T
        : BERT 200B
    2022 : Chinchilla
        : BLOOM
        : PaLM
        : LaMDA 2 (chatbot)
        : GPT-3.5
        : ChatGPT
    2023 : LLaMA
        : Alpaca
        : GPT-4
        : PaLM 2
        : Phi-1
        : Claude 2
        : LLama 2
        : Falcon
        : ERNIE 4.0
        : Grok-1
        : Gemini
    2024 : Sora
        : Gemini 1.5
        : Claude 3
        : LLama 3
        : Phi-3

Word Embeddings

“I love you very much”

In the eyes of a Language Model

	V1	V2	V3	V4	…	V512
I	0.60	0.00	1.39	-0.60	…	-0.64
love	0.00	-2.12	-1.37	-1.10	…	-1.20
you	-1.55	-0.24	1.90	0.48	…	-0.76
very	0.27	-0.81	-2.46	1.36	…	0.62
much	2.33	0.59	-1.26	-0.76	…	0.28

“If I should think of love I’d think of you, your arms uplifted, Tying your hair in plaits above, The lyre shape of your arms and shoulders, The soft curve of your winding head. No melody is sweeter, nor could Orpheus So have bewitched. I think of this, And all my universe becomes perfection. But were you in my arms, dear love, The happiness would take my breath away, No thought could match that ecstasy, No song encompass it, no other worlds. If I should think of love, I’d think of you.”

– Shakespeare

“If I should think of love I’d think of you, your arms uplifted, Tying your hair in plaits above, The lyre shape of your arms and shoulders, The soft curve of your winding head. No melody is sweeter, nor could Orpheus So have bewitched. I think of this, And all my universe becomes perfection. But were you in my arms, dear love, The happiness would take my breath away, No thought could match that ecstasy, No song encompass it, no other worlds. If I should think of love, I’d think of you.”

– Shakespeare

Why is this useful?

Tidbits about LLMs

Content generation

In March 2021, GPT-3 was typing 3.1 million words per minute, non-stop, 24×7. With the general availability of the model, I expect that number is a lot higher now… (Nov/2021).

Per day = 4,500,000,000 (4.5 billion)
Per hour = 187,500,000 (187.5 million)
Per minute = 3,125,000 (3.125 million)

Hardware

The supercomputer developed for OpenAI (May 2020) is a single system with more than 285,000 CPU cores, 10,000 GPUs ¹ and 400 gigabits per second of network connectivity for each GPU server.

– https://blogs.microsoft.com/ai/openai-azure-supercomputer/

Training time

Training GPT-3 with 175 billion parameters would require approximately 288 years with a single V100 NVIDIA GPU.

– https://arxiv.org/pdf/2104.04473.pdf

Understanding

Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties.

– https://arxiv.org/pdf/2108.07258

The 20-80 rule about training

Image by Holistic AI

Enough talk!

Let’s make our own!

The What

We’re going to deploy a quantized model of Phi-3 which can run on consumer hardware

The Why

Because we can

Running your own language model enables you to be productive, safe, compliant, anonymous and autonomous. Also without an internet connection..

Applications

Coding copilot
Text summarization
Marketing materials
Copywriting
Information
Relaxing
Translation
Automation

The process

Decide on a model
Choose a server
Choose a UI

Understanding Prompts and Templates for Large Language Models

What are prompts?

System Prompt

A system prompt sets the overall goal or task for the LLM.
It provides context about the interaction between the user and the LLM.
Examples of system prompts include
- “What would you like me to do today?”
- “Write a story based on the following keywords.”

Prompt

A prompt is a specific instruction given to the LLM to complete a task.
It can be a question, a statement, or a creative writing prompt.
The quality of the prompt heavily influences the quality of the LLM’s output.

Template

What it is

A template is a pre-defined structure for a prompt.
It can include placeholders for specific information.
Templates allow for generating multiple prompts with a similar structure.

What it looks like

{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>

Demo time!

Michael Green, CEO Desupervised mike@desupervised.io +4531766142