A Journey in Neural Networks
2024-09-01
What we want is a machine that can learn from experience… the possibility of letting the machine alter its own instructions provides the mechanism for this.
– Alan Turing, 1947
See https://arxiv.org/abs/2112.07068
“I love you very much”
Word | V1 | V2 | V3 | V4 | … | V512 |
---|---|---|---|---|---|---|
I | -0.24 | 0.55 | 0.87 | 0.41 | … | 2.20 |
love | 0.47 | -0.54 | 0.12 | -0.05 | … | -0.73 |
you | 0.11 | -0.21 | 2.37 | 0.74 | … | 0.13 |
very | -0.05 | 0.21 | -0.86 | -0.68 | … | -0.95 |
much | 0.74 | 0.25 | 0.02 | -1.38 | … | 0.46 |
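The table above can be read as a lookup from each word to a vector. Here is a minimal sketch of that lookup, assuming whitespace tokenization and using only the four leading dimensions shown on the slide (the real vectors have 512). The names `EMBEDDINGS` and `embed` are hypothetical and not taken from any particular library.

```python
# Toy embedding table built from the slide: only V1..V4 are kept,
# the remaining dimensions (up to V512) are elided in the source.
EMBEDDINGS = {
    "I":    [-0.24, 0.55, 0.87, 0.41],
    "love": [0.47, -0.54, 0.12, -0.05],
    "you":  [0.11, -0.21, 2.37, 0.74],
    "very": [-0.05, 0.21, -0.86, -0.68],
    "much": [0.74, 0.25, 0.02, -1.38],
}


def embed(sentence):
    """Split on whitespace (an assumption) and look up one vector per word."""
    return [EMBEDDINGS[word] for word in sentence.split()]


matrix = embed("I love you very much")
# One row per word, one column per embedding dimension.
print(len(matrix), "words x", len(matrix[0]), "dimensions")
```

Real models use subword tokenizers and learned tables, so this only illustrates the shape of the data: a sentence becomes a matrix with one row per token.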
“If I should think of love I’d think of you, your arms uplifted, Tying your hair in plaits above, The lyre shape of your arms and shoulders, The soft curve of your winding head. No melody is sweeter, nor could Orpheus So have bewitched. I think of this, And all my universe becomes perfection. But were you in my arms, dear love, The happiness would take my breath away, No thought could match that ecstasy, No song encompass it, no other worlds. If I should think of love, I’d think of you.”
– Shakespeare
Content generation
In March 2021, GPT-3 was typing 3.1 million words per minute, non-stop, 24×7. With the general availability of the model, I expect that number is a lot higher now… (Nov/2021).
Hardware
The supercomputer developed for OpenAI (May 2020) is a single system with more than 285,000 CPU cores, 10,000 GPUs and 400 gigabits per second of network connectivity for each GPU server.
– https://blogs.microsoft.com/ai/openai-azure-supercomputer/
Training time
Training GPT-3 with 175 billion parameters would require approximately 288 years with a single V100 NVIDIA GPU.
– https://arxiv.org/pdf/2104.04473.pdf
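As a rough sanity check on the 288-year figure, the sketch below works backwards to the sustained throughput it implies. It assumes the common rule of thumb of about 6 floating-point operations per parameter per training token, and roughly 300 billion training tokens for GPT-3. Neither number appears on this slide, so treat the result as an order-of-magnitude estimate.

```python
PARAMS = 175e9               # parameters, from the quote above
TOKENS = 300e9               # assumption: approximate GPT-3 training tokens
FLOPS_PER_PARAM_TOKEN = 6    # assumption: forward + backward rule of thumb
YEARS = 288                  # single-V100 training time, from the quote above

total_flops = FLOPS_PER_PARAM_TOKEN * PARAMS * TOKENS
seconds = YEARS * 365.25 * 24 * 3600
implied_tflops = total_flops / seconds / 1e12

print(f"{total_flops:.2e} FLOPs over {YEARS} years "
      f"-> {implied_tflops:.1f} TFLOP/s sustained")
```

Under these assumptions the quoted figure corresponds to a few tens of TFLOP/s sustained on one GPU, which suggests why training at this scale is spread across thousands of GPUs.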
Understanding
Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties.
– https://arxiv.org/pdf/2108.07258
145M words per minute
What it is
{{- if or .System .Tools }}<|start_header_id|>system<|end_header_id|>
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}
Cutting Knowledge Date: December 2023
When you receive a tool call response, use the output to format an answer to the orginal user question.
You are a helpful assistant with tool calling capabilities.
{{- end }}<|eot_id|>
{{- end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}
Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
{{ range $.Tools }}
{{- . }}
{{ end }}
Question: {{ .Content }}<|eot_id|>
{{- else }}
{{ .Content }}<|eot_id|>
{{- end }}{{ if $last }}<|start_header_id|>assistant<|end_header_id|>
{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}
{{ range .ToolCalls }}
{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}
{{ .Content }}
{{- end }}{{ if not $last }}<|eot_id|>{{ end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>
{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>
{{ end }}
{{- end }}
{{- end }}
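To make the template above easier to follow, here is a minimal Python sketch of its simplest path: an optional system message plus user and assistant turns, with no tools. The function name `render` is hypothetical, and the blank line after each header is an assumption, since whitespace in the template as shown here may not have survived extraction.

```python
def render(messages, system=None):
    """Approximate the no-tools path of the template above."""
    parts = []
    if system:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        )
    for i, msg in enumerate(messages):
        last = i == len(messages) - 1
        if msg["role"] == "user":
            parts.append(
                f"<|start_header_id|>user<|end_header_id|>\n\n{msg['content']}<|eot_id|>"
            )
            if last:
                # Open an assistant turn so the model continues from here.
                parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
        elif msg["role"] == "assistant":
            parts.append(
                f"<|start_header_id|>assistant<|end_header_id|>\n\n{msg['content']}"
            )
            if not last:
                # A trailing assistant message is left open, as in the template.
                parts.append("<|eot_id|>")
    return "".join(parts)


prompt = render([{"role": "user", "content": "Why is the sky blue?"}])
print(prompt)
```

The point of the sketch is that a chat is flattened into one string of special tokens and text before the model sees it. The tool-calling branches of the template are omitted.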
{
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue? Use one sentence in your answer."
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0,
    "num_ctx": 1024
  }
}
{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "num_keep": 5,
    "seed": 42,
    "num_predict": 100,
    "top_k": 20, "top_p": 0.9, "min_p": 0.0,
    "tfs_z": 0.5,
    "typical_p": 0.7,
    "repeat_last_n": 33,
    "temperature": 0.8,
    "repeat_penalty": 1.2, "presence_penalty": 1.5, "frequency_penalty": 1.0,
    "mirostat": 1, "mirostat_tau": 0.8, "mirostat_eta": 0.6,
    "penalize_newline": true,
    "stop": ["\n", "user:"],
    "numa": false,
    "num_ctx": 1024,
    "num_batch": 2,
    "num_gpu": 1,
    "main_gpu": 0,
    "low_vram": false,
    "f16_kv": true,
    "vocab_only": false,
    "use_mmap": true,
    "use_mlock": false,
    "num_thread": 8
  }
}
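A request body like the one above could be sent to a locally running Ollama server roughly as follows, using only the Python standard library. This is a sketch under assumptions: the URL `http://localhost:11434/api/generate` is taken to be the default local endpoint, the helper names `build_request` and `generate` are hypothetical, and only a few of the options are passed.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server; adjust for your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.1", **options):
    """Build (but do not send) a non-streaming generate request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": options,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **options):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **options)) as resp:
        return json.loads(resp.read())["response"]


req = build_request("Why is the sky blue?", seed=42, temperature=0.8, num_predict=100)
print(req.full_url, req.get_method())
```

Calling `generate("Why is the sky blue?", seed=42, temperature=0)` would then return the answer as a string, provided the server is running and the model has been pulled. Fixing the seed and setting the temperature to 0 is the usual way to make outputs as repeatable as possible.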
Michael Green, CEO Desupervised mike@desupervised.io +4531766142