dpsc: Notes on the Deep Learning Short Courses

These are notes on Andrew Ng's Deep Learning short courses (Short Courses | Learn Generative AI from DeepLearning.AI), along with the courses on Coursera. The material is practical and reasonably up to date.

The courses typically build on products from various companies, such as Hugging Face's Gradio, diffusers, and transformers, or W&B's wandb (I use both of these regularly), plus tools from Google, Microsoft, and LangChain, all of them quite practical. If you care about generation, the Diffusion Model course is a must; if you care about LLMs and chatbots, LangChain is worth picking up; and if you train and deploy your own models, wandb comes in handy as well.

Here I focus on three areas: generative AI, LLMs, and the tooling that supports model training and deployment.

Reinforcement Learning from Human Feedback


Evaluating and Debugging Generative AI Models Using Weights and Biases

A site that model trainers are likely to use. Documentation: Python Library | Weights & Biases Documentation (wandb.ai).

wandb.init(
    project="gpt5",
    config=config,
)
wandb.log(metrics)

wandb.init first looks up the project (creating it if it doesn't exist) and starts a run with the given config; wandb.log then records the results. wandb also saves the runtime system environment, the GitHub repo, and even the repository files.
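The config passed in is typically a plain dict or a simple namespace of hyperparameters that the training code reads back as attributes (config.batch_size and so on). A minimal sketch with made-up values, only to show the shape of the object the code below expects:

from types import SimpleNamespace

# hypothetical hyperparameter values, not the course's actual settings
config = SimpleNamespace(
    epochs=2,
    batch_size=128,
    lr=1e-3,
    dropout=0.1,
    slice_size=10_000,
    valid_pct=0.2,
)
# wandb.init also accepts a plain dict, e.g. wandb.init(project="gpt5", config=vars(config))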

def train_model(config):
    "Train a model with a given config"

    wandb.init(
        project="gpt5",
        config=config,
    )

    # Get the data
    train_dl, valid_dl = get_dataloaders(DATA_DIR,
                                         config.batch_size,
                                         config.slice_size,
                                         config.valid_pct)
    n_steps_per_epoch = math.ceil(len(train_dl.dataset) / config.batch_size)

    # A simple MLP model
    model = get_model(config.dropout)

    # Make the loss and optimizer
    loss_func = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=config.lr)

    example_ct = 0

    for epoch in tqdm(range(config.epochs), total=config.epochs):
        model.train()

        for step, (images, labels) in enumerate(train_dl):
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            outputs = model(images)
            train_loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()

            example_ct += len(images)
            metrics = {
                "train/train_loss": train_loss,
                "train/epoch": epoch + 1,
                "train/example_ct": example_ct
            }
            wandb.log(metrics)

        # Compute validation metrics, log images on last epoch
        val_loss, accuracy = validate_model(model, valid_dl, loss_func)
        # Compute train and validation metrics
        val_metrics = {
            "val/val_loss": val_loss,
            "val/val_accuracy": accuracy
        }
        wandb.log(val_metrics)

    wandb.finish()

You can also modify and update the config of an existing run after the fact. Below is a run I did anonymously with wandb:

import wandb
api = wandb.Api()

run = api.run("anony-mouse-988582345570149472/gpt5/<run_id>")
run.config["key"] = updated_value
run.update()

Exporting the metrics of a single run to a CSV file:

import wandb
api = wandb.Api()

# run is specified by <entity>/<project>/<run_id>
run = api.run("anony-mouse-946987442323310233/dlai_sprite_diffusion/<run_id>")

# save the metrics for the run to a csv file
metrics_dataframe = run.history()
metrics_dataframe.to_csv("metrics.csv")

Reading a run's metrics:

import wandb
api = wandb.Api()

run = api.run("anony-mouse-946987442323310233/dlai_sprite_diffusion/<run_id>")
if run.state == "finished":
    for i, row in run.history().iterrows():
        print(row["_timestamp"], row["accuracy"])

When you pull data from the history, it is downsampled to 500 points by default. Use run.scan_history() to fetch every logged data point. Below is an example that downloads all the loss values recorded in the history.

import wandb
api = wandb.Api()

run = api.run("anony-mouse-946987442323310233/dlai_sprite_diffusion/<run_id>")
history = run.scan_history()
losses = [row["loss"] for row in history]

wandb Table

table = wandb.Table(columns=["input_noise", "ddpm", "ddim", "class"])

for noise, ddpm_s, ddim_s, c in zip(noises,
                                    ddpm_samples,
                                    ddim_samples,
                                    to_classes(ctx_vector)):

    # add data row by row to the Table
    table.add_data(wandb.Image(noise),
                   wandb.Image(ddpm_s),
                   wandb.Image(ddim_s),
                   c)

with wandb.init(project="dlai_sprite_diffusion",
                job_type="samplers_battle",
                config=config):

    wandb.log({"samplers_table": table})

First create the wandb.Table, then add data to it row by row, and finally push it up with wandb.log.

table = wandb.Table(columns=["prompt", "generation"])

for prompt in prompts:
    input_ids = tokenizer.encode(prefix + prompt, return_tensors="pt")
    output = model.generate(input_ids, do_sample=True, max_new_tokens=50, top_p=0.3)
    output_text = tokenizer.decode(output[0], skip_special_tokens=True)
    table.add_data(prefix + prompt, output_text)

wandb.log({'tiny_generations': table})

Hyperparameter search with W&B Sweeps

Searching a high-dimensional hyperparameter space for the best-performing model quickly gets painful. Hyperparameter sweeps provide an organized, efficient way to pit a batch of models against each other and pick the most accurate one, by automatically searching over combinations of hyperparameter values (learning rate, batch size, number of hidden layers, optimizer type, and so on). A sweep combines a strategy for trying hyperparameter values with the code that evaluates them.

  1. Define the sweep: create a dictionary or a YAML file that specifies the parameters to search over, the search strategy, the metric to optimize, and so on.

Start by choosing a search strategy: grid, random, or Bayesian.

  • Grid search – Iterate over every combination of hyperparameter values. Very effective, but can be computationally costly.
  • Random search – Select each new combination at random according to provided distributions. Surprisingly effective!
  • Bayesian search – Create a probabilistic model of the metric score as a function of the hyperparameters, and choose parameters with a high probability of improving the metric. Works well for small numbers of continuous parameters but scales poorly.
sweep_config = {
    'method': 'random'
}
metric = {
    'name': 'loss',
    'goal': 'minimize'
}

sweep_config['metric'] = metric

Then come the hyperparameters of the network itself:

parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
    },
    'fc_layer_size': {
        'values': [128, 256, 512]
    },
    'dropout': {
        'values': [0.3, 0.4, 0.5]
    },
}

sweep_config['parameters'] = parameters_dict

Often there are hyperparameters we don't want to vary in this sweep but still want to set in the sweep_config:

parameters_dict.update({
    'epochs': {
        'value': 1}
})

With the random search strategy you can specify the distribution each parameter is drawn from (a normal distribution, for instance); the default is uniform.

parameters_dict.update({
    'learning_rate': {
        # a flat distribution between 0 and 0.1
        'distribution': 'uniform',
        'min': 0,
        'max': 0.1
    },
    'batch_size': {
        # integers between 32 and 256
        # with evenly-distributed logarithms
        'distribution': 'q_log_uniform_values',
        'q': 8,
        'min': 32,
        'max': 256,
    }
})

So the sweep config consists of method, metric, and parameters. There are a few other fields I won't cover here.
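Putting the pieces above together, the finished sweep_config looks roughly like this (values copied from the snippets above):

sweep_config = {
    'method': 'random',
    'metric': {'name': 'loss', 'goal': 'minimize'},
    'parameters': {
        'optimizer': {'values': ['adam', 'sgd']},
        'fc_layer_size': {'values': [128, 256, 512]},
        'dropout': {'values': [0.3, 0.4, 0.5]},
        'epochs': {'value': 1},
        'learning_rate': {'distribution': 'uniform', 'min': 0, 'max': 0.1},
        'batch_size': {'distribution': 'q_log_uniform_values', 'q': 8, 'min': 32, 'max': 256},
    },
}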

  2. Initialize the sweep: a single line of code that takes the sweep config dictionary: sweep_id = wandb.sweep(sweep_config)
sweep_id = wandb.sweep(sweep_config, project="pytorch-sweeps-demo")

This creates the sweep.

  3. Run the sweep agent: also a single line, calling wandb.agent() with the sweep_id to run and a function that defines the model architecture and trains it: wandb.agent(sweep_id, function=train)

Then training proceeds as usual.

import torch
import torch.optim as optim
import torch.nn.functional as F
import torch.nn as nn
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})
wandb.agent(sweep_id, train, count=5)

This launches an agent that runs train() five times, each time with hyperparameter values handed out by the Sweep Controller (randomly generated, per the 'random' method above).
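The train() function above leans on helpers defined elsewhere in the notebook (build_dataset, build_network, build_optimizer, train_epoch). A minimal sketch of what they could look like for an MNIST-style MLP, under my own assumptions rather than the course's exact code, reusing the imports and device from the block above:

from torch.utils.data import DataLoader

def build_dataset(batch_size):
    # normalized MNIST wrapped in a shuffled DataLoader
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.1307,), (0.3081,))])
    dataset = datasets.MNIST(".", train=True, download=True, transform=transform)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)

def build_network(fc_layer_size, dropout):
    # a small MLP: 784 -> fc_layer_size -> 10
    network = nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, fc_layer_size), nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(fc_layer_size, 10),
    )
    return network.to(device)

def build_optimizer(network, optimizer, learning_rate):
    # map the sweep's string choice onto a torch optimizer
    if optimizer == "sgd":
        return optim.SGD(network.parameters(), lr=learning_rate, momentum=0.9)
    return optim.Adam(network.parameters(), lr=learning_rate)

def train_epoch(network, loader, optimizer):
    # one pass over the data, returning the average loss for wandb.log
    network.train()
    total_loss = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = F.cross_entropy(network(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)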


The course also covers Tracer and a few other features that I have no use for at the moment… mostly I just log results such as loss and accuracy.

How Diffusion Models Work

Diffusion models have been very hot lately and are now the main models behind prompt-driven image generation.


The noise-prediction network being trained is a U-Net:


class ContextUnet(nn.Module):
    def __init__(self, in_channels, n_feat=256, n_cfeat=10, height=28):  # cfeat - context features
        super(ContextUnet, self).__init__()

        # number of input channels, number of intermediate feature maps and number of classes
        self.in_channels = in_channels
        self.n_feat = n_feat
        self.n_cfeat = n_cfeat
        self.h = height  # assume h == w. must be divisible by 4, so 28,24,20,16...

        # Initialize the initial convolutional layer
        self.init_conv = ResidualConvBlock(in_channels, n_feat, is_res=True)

        # Initialize the down-sampling path of the U-Net with two levels
        self.down1 = UnetDown(n_feat, n_feat)      # down1 #[10, 256, 8, 8]
        self.down2 = UnetDown(n_feat, 2 * n_feat)  # down2 #[10, 256, 4, 4]

        # original: self.to_vec = nn.Sequential(nn.AvgPool2d(7), nn.GELU())
        self.to_vec = nn.Sequential(nn.AvgPool2d((4)), nn.GELU())

        # Embed the timestep and context labels with a one-layer fully connected neural network
        self.timeembed1 = EmbedFC(1, 2*n_feat)
        self.timeembed2 = EmbedFC(1, 1*n_feat)
        self.contextembed1 = EmbedFC(n_cfeat, 2*n_feat)
        self.contextembed2 = EmbedFC(n_cfeat, 1*n_feat)

        # Initialize the up-sampling path of the U-Net with three levels
        self.up0 = nn.Sequential(
            nn.ConvTranspose2d(2 * n_feat, 2 * n_feat, self.h//4, self.h//4),  # up-sample
            nn.GroupNorm(8, 2 * n_feat),  # normalize
            nn.ReLU(),
        )
        self.up1 = UnetUp(4 * n_feat, n_feat)
        self.up2 = UnetUp(2 * n_feat, n_feat)

        # Initialize the final convolutional layers to map to the same number of channels as the input image
        self.out = nn.Sequential(
            nn.Conv2d(2 * n_feat, n_feat, 3, 1, 1),  # reduce number of feature maps  #in_channels, out_channels, kernel_size, stride=1, padding=0
            nn.GroupNorm(8, n_feat),  # normalize
            nn.ReLU(),
            nn.Conv2d(n_feat, self.in_channels, 3, 1, 1),  # map to same number of channels as input
        )

    def forward(self, x, t, c=None):
        """
        x : (batch, n_feat, h, w) : input image
        t : (batch, n_cfeat)      : time step
        c : (batch, n_classes)    : context label
        """
        # x is the input image, c is the context label, t is the timestep, context_mask says which samples to block the context on

        # pass the input image through the initial convolutional layer
        x = self.init_conv(x)
        # pass the result through the down-sampling path
        down1 = self.down1(x)      #[10, 256, 8, 8]
        down2 = self.down2(down1)  #[10, 256, 4, 4]

        # convert the feature maps to a vector and apply an activation
        hiddenvec = self.to_vec(down2)

        # mask out context if context_mask == 1
        if c is None:
            c = torch.zeros(x.shape[0], self.n_cfeat).to(x)

        # embed context and timestep
        cemb1 = self.contextembed1(c).view(-1, self.n_feat * 2, 1, 1)  # (batch, 2*n_feat, 1,1)
        temb1 = self.timeembed1(t).view(-1, self.n_feat * 2, 1, 1)
        cemb2 = self.contextembed2(c).view(-1, self.n_feat, 1, 1)
        temb2 = self.timeembed2(t).view(-1, self.n_feat, 1, 1)
        #print(f"uunet forward: cemb1 {cemb1.shape}. temb1 {temb1.shape}, cemb2 {cemb2.shape}. temb2 {temb2.shape}")


        up1 = self.up0(hiddenvec)
        up2 = self.up1(cemb1*up1 + temb1, down2)  # add and multiply embeddings
        up3 = self.up2(cemb2*up2 + temb2, down1)
        out = self.out(torch.cat((up3, x), 1))
        return out

During training, a random timestep is drawn and the corresponding amount of noise is added to the image; during sampling, we step back through the timesteps and denoise one step at a time.
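A rough sketch of that training step. The names (ab_t for the cumulative noise schedule, nn_model, timesteps, device) follow the sampling code below; the formula is the standard DDPM forward process, written from memory rather than copied from the course notebook, with F being torch.nn.functional and x a batch of training images:

# forward-noising used at training time: sample x_t directly from x_0
def perturb_input(x, t, noise):
    return ab_t.sqrt()[t, None, None, None] * x + (1 - ab_t[t, None, None, None]).sqrt() * noise

# one training step: random timestep, add noise, train the U-Net to predict that noise
t = torch.randint(1, timesteps + 1, (x.shape[0],)).to(device)
noise = torch.randn_like(x)
x_pert = perturb_input(x, t, noise)
loss = F.mse_loss(nn_model(x_pert, t / timesteps), noise)
loss.backward()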

Control and speed up

Adding a context vector lets us control the output, and swapping DDPM for DDIM speeds up sampling.


# define sampling function for DDIM
# removes the noise using ddim
def denoise_ddim(x, t, t_prev, pred_noise):
    ab = ab_t[t]
    ab_prev = ab_t[t_prev]

    x0_pred = ab_prev.sqrt() / ab.sqrt() * (x - (1 - ab).sqrt() * pred_noise)
    dir_xt = (1 - ab_prev).sqrt() * pred_noise

    return x0_pred + dir_xt
# fast sampling algorithm with context
@torch.no_grad()
def sample_ddim_context(n_sample, context, n=20):
    # x_T ~ N(0, 1), sample initial noise
    samples = torch.randn(n_sample, 3, height, height).to(device)

    # array to keep track of generated steps for plotting
    intermediate = []
    step_size = timesteps // n
    for i in range(timesteps, 0, -step_size):
        print(f'sampling timestep {i:3d}', end='\r')

        # reshape time tensor
        t = torch.tensor([i / timesteps])[:, None, None, None].to(device)

        eps = nn_model(samples, t, c=context)  # predict noise e_(x_t,t)
        samples = denoise_ddim(samples, i, i - step_size, eps)
        intermediate.append(samples.detach().cpu().numpy())

    intermediate = np.stack(intermediate)
    return samples, intermediate
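A hypothetical call, assuming ctx_vector is the batch of one-hot context labels used in the wandb table example earlier:

ddim_samples, intermediates = sample_ddim_context(n_sample=ctx_vector.shape[0],
                                                  context=ctx_vector, n=25)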

ChatGPT Prompt Engineering for Developers

Using ChatGPT as an assistant already feels like an everyday tool of modern life.

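The course notebooks wrap the OpenAI API in a small helper and then concentrate on prompt patterns: clear delimiters around the text to process, asking for structured output, giving the model time to think. Roughly, using the older 0.x-style openai package that the course notebooks were built on at the time:

import openai

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # low temperature for reproducible prompt experiments
    )
    return response.choices[0].message["content"]

# example of the delimiter pattern
text = "..."
prompt = f"Summarize the text delimited by triple backticks into a single sentence. ```{text}```"
print(get_completion(prompt))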

Finetuning Large Language Models

Finetuning of large language models; understanding how it works is enough for me.


Open Source Models with Hugging Face

A very comprehensive course on using Hugging Face models, useful when you need to stand up some inference endpoints.
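The course leans heavily on the transformers pipeline API. A minimal sketch (the task and model here are just placeholders, not necessarily what the course uses):

from transformers import pipeline

# hypothetical example: a tiny text-generation pipeline
generator = pipeline("text-generation", model="gpt2")
print(generator("Deep learning short courses are", max_new_tokens=20)[0]["generated_text"])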

Getting started with Mistral

If you want to play with open-source models, give Mistral a try. There are also courses covering Llama 2 and 3.

Prompt Engineering for Vision Models

Thanks for reading.