dpsc: Notes on the Deep Learning Short Courses

These are notes on Andrew Ng's Deep Learning short courses (Short Courses | Learn Generative AI from DeepLearning.AI), along with the courses on Coursera. The material is practical and reasonably up to date.

The courses typically build on products from various companies, such as Hugging Face's Gradio, diffusers, and transformers, or W&B's wandb (I use both of these regularly), plus tools from Google, Microsoft, and LangChain, all of them quite practical. If you care about generation, the Diffusion Model course is a must; if you care about LLMs and chatbots, LangChain is worth picking up; and if you train and deploy your own models, wandb comes in handy as well.

Here I focus on three areas: generative AI, LLMs, and the tooling that supports model training and deployment.

Reinforcement Learning from Human Feedback


Evaluating and Debugging Generative AI Models Using Weights and Biases

A site that model trainers are likely to use. Documentation: Python Library | Weights & Biases Documentation (wandb.ai).

wandb.init(
    project="gpt5",
    config=config,
)
wandb.log(metrics)

wandb.init first looks up the project (creating it if it doesn't exist) and starts a run with the given config; wandb.log then records the results. wandb also saves the runtime system environment, the GitHub repo, and even the repository files.
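The config passed in is typically a plain dict or a simple namespace of hyperparameters that the training code reads back as attributes (config.batch_size and so on). A minimal sketch with made-up values, only to show the shape of the object the code below expects:

from types import SimpleNamespace

# hypothetical hyperparameter values, not the course's actual settings
config = SimpleNamespace(
    epochs=2,
    batch_size=128,
    lr=1e-3,
    dropout=0.1,
    slice_size=10_000,
    valid_pct=0.2,
)
# wandb.init also accepts a plain dict, e.g. wandb.init(project="gpt5", config=vars(config))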

def train_model(config):
    "Train a model with a given config"

    wandb.init(
        project="gpt5",
        config=config,
    )

    # Get the data
    train_dl, valid_dl = get_dataloaders(DATA_DIR,
                                         config.batch_size,
                                         config.slice_size,
                                         config.valid_pct)
    n_steps_per_epoch = math.ceil(len(train_dl.dataset) / config.batch_size)

    # A simple MLP model
    model = get_model(config.dropout)

    # Make the loss and optimizer
    loss_func = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=config.lr)

    example_ct = 0

    for epoch in tqdm(range(config.epochs), total=config.epochs):
        model.train()

        for step, (images, labels) in enumerate(train_dl):
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            outputs = model(images)
            train_loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()

            example_ct += len(images)
            metrics = {
                "train/train_loss": train_loss,
                "train/epoch": epoch + 1,
                "train/example_ct": example_ct
            }
            wandb.log(metrics)

        # Compute validation metrics, log images on last epoch
        val_loss, accuracy = validate_model(model, valid_dl, loss_func)
        # Compute train and validation metrics
        val_metrics = {
            "val/val_loss": val_loss,
            "val/val_accuracy": accuracy
        }
        wandb.log(val_metrics)

    wandb.finish()

You can also modify and update the config of an existing run after the fact. Below is a run I did anonymously with wandb:

import wandb
api = wandb.Api()

run = api.run("anony-mouse-988582345570149472/gpt5/<run_id>")
run.config["key"] = updated_value
run.update()

Exporting the metrics of a single run to a CSV file:

import wandb
api = wandb.Api()

# run is specified by <entity>/<project>/<run_id>
run = api.run("anony-mouse-946987442323310233/dlai_sprite_diffusion/<run_id>")

# save the metrics for the run to a csv file
metrics_dataframe = run.history()
metrics_dataframe.to_csv("metrics.csv")

Reading a run's metrics:

import wandb
api = wandb.Api()

run = api.run("anony-mouse-946987442323310233/dlai_sprite_diffusion/<run_id>")
if run.state == "finished":
    for i, row in run.history().iterrows():
        print(row["_timestamp"], row["accuracy"])

When you pull data from the history, it is downsampled to 500 points by default. Use run.scan_history() to fetch every logged data point. Below is an example that downloads all the loss values recorded in the history.

import wandb
api = wandb.Api()

run = api.run("anony-mouse-946987442323310233/dlai_sprite_diffusion/<run_id>")
history = run.scan_history()
losses = [row["loss"] for row in history]

wandb Table

table = wandb.Table(columns=["input_noise", "ddpm", "ddim", "class"])

for noise, ddpm_s, ddim_s, c in zip(noises,
                                    ddpm_samples,
                                    ddim_samples,
                                    to_classes(ctx_vector)):

    # add data row by row to the Table
    table.add_data(wandb.Image(noise),
                   wandb.Image(ddpm_s),
                   wandb.Image(ddim_s),
                   c)

with wandb.init(project="dlai_sprite_diffusion",
                job_type="samplers_battle",
                config=config):

    wandb.log({"samplers_table": table})

First create the wandb.Table, then add data to it row by row, and finally push it up with wandb.log.

table = wandb.Table(columns=["prompt", "generation"])

for prompt in prompts:
    input_ids = tokenizer.encode(prefix + prompt, return_tensors="pt")
    output = model.generate(input_ids, do_sample=True, max_new_tokens=50, top_p=0.3)
    output_text = tokenizer.decode(output[0], skip_special_tokens=True)
    table.add_data(prefix + prompt, output_text)

wandb.log({'tiny_generations': table})

Hyperparameter search with W&B Sweeps

Searching a high-dimensional hyperparameter space for the best-performing model quickly gets painful. Hyperparameter sweeps provide an organized, efficient way to pit a batch of models against each other and pick the most accurate one, by automatically searching over combinations of hyperparameter values (learning rate, batch size, number of hidden layers, optimizer type, and so on). A sweep combines a strategy for trying hyperparameter values with the code that evaluates them.

  1. Define the sweep: create a dictionary or a YAML file that specifies the parameters to search over, the search strategy, the metric to optimize, and so on.

Start by choosing a search strategy: grid, random, or Bayesian.

  • Grid search – Iterate over every combination of hyperparameter values. Very effective, but can be computationally costly.
  • Random search – Select each new combination at random according to provided distributions. Surprisingly effective!
  • Bayesian search – Create a probabilistic model of the metric score as a function of the hyperparameters, and choose parameters with a high probability of improving the metric. Works well for small numbers of continuous parameters but scales poorly.
sweep_config = {
    'method': 'random'
}
metric = {
    'name': 'loss',
    'goal': 'minimize'
}

sweep_config['metric'] = metric

Then come the hyperparameters of the network itself:

parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
    },
    'fc_layer_size': {
        'values': [128, 256, 512]
    },
    'dropout': {
        'values': [0.3, 0.4, 0.5]
    },
}

sweep_config['parameters'] = parameters_dict

Often there are hyperparameters we don't want to vary in this sweep but still want to set in the sweep_config:

parameters_dict.update({
    'epochs': {
        'value': 1}
})

With the random search strategy you can specify the distribution each parameter is drawn from (a normal distribution, for instance); the default is uniform.

parameters_dict.update({
    'learning_rate': {
        # a flat distribution between 0 and 0.1
        'distribution': 'uniform',
        'min': 0,
        'max': 0.1
    },
    'batch_size': {
        # integers between 32 and 256
        # with evenly-distributed logarithms
        'distribution': 'q_log_uniform_values',
        'q': 8,
        'min': 32,
        'max': 256,
    }
})

So the sweep config consists of method, metric, and parameters. There are a few other fields I won't cover here.
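Putting the pieces above together, the finished sweep_config looks roughly like this (values copied from the snippets above):

sweep_config = {
    'method': 'random',
    'metric': {'name': 'loss', 'goal': 'minimize'},
    'parameters': {
        'optimizer': {'values': ['adam', 'sgd']},
        'fc_layer_size': {'values': [128, 256, 512]},
        'dropout': {'values': [0.3, 0.4, 0.5]},
        'epochs': {'value': 1},
        'learning_rate': {'distribution': 'uniform', 'min': 0, 'max': 0.1},
        'batch_size': {'distribution': 'q_log_uniform_values', 'q': 8, 'min': 32, 'max': 256},
    },
}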

  2. Initialize the sweep: a single line of code that takes the sweep config dictionary: sweep_id = wandb.sweep(sweep_config)
sweep_id = wandb.sweep(sweep_config, project="pytorch-sweeps-demo")

This creates the sweep.

  3. Run the sweep agent: also a single line, calling wandb.agent() with the sweep_id to run and a function that defines the model architecture and trains it: wandb.agent(sweep_id, function=train)

Then training proceeds as usual.

import torch
import torch.optim as optim
import torch.nn.functional as F
import torch.nn as nn
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})
wandb.agent(sweep_id, train, count=5)

This launches an agent that runs train() five times, each time with hyperparameter values handed out by the Sweep Controller (randomly generated, per the 'random' method above).
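The train() function above leans on helpers defined elsewhere in the notebook (build_dataset, build_network, build_optimizer, train_epoch). A minimal sketch of what they could look like for an MNIST-style MLP, under my own assumptions rather than the course's exact code, reusing the imports and device from the block above:

from torch.utils.data import DataLoader

def build_dataset(batch_size):
    # normalized MNIST wrapped in a shuffled DataLoader
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.1307,), (0.3081,))])
    dataset = datasets.MNIST(".", train=True, download=True, transform=transform)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)

def build_network(fc_layer_size, dropout):
    # a small MLP: 784 -> fc_layer_size -> 10
    network = nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, fc_layer_size), nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(fc_layer_size, 10),
    )
    return network.to(device)

def build_optimizer(network, optimizer, learning_rate):
    # map the sweep's string choice onto a torch optimizer
    if optimizer == "sgd":
        return optim.SGD(network.parameters(), lr=learning_rate, momentum=0.9)
    return optim.Adam(network.parameters(), lr=learning_rate)

def train_epoch(network, loader, optimizer):
    # one pass over the data, returning the average loss for wandb.log
    network.train()
    total_loss = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = F.cross_entropy(network(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)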


The course also covers Tracer and a few other features that I have no use for at the moment… mostly I just log results such as loss and accuracy.

How Diffusion Models Work

Diffusion models have been very hot lately and are now the main models behind prompt-driven image generation.


The noise-prediction network being trained is a U-Net:


class ContextUnet(nn.Module):
    def __init__(self, in_channels, n_feat=256, n_cfeat=10, height=28):  # cfeat - context features
        super(ContextUnet, self).__init__()

        # number of input channels, number of intermediate feature maps and number of classes
        self.in_channels = in_channels
        self.n_feat = n_feat
        self.n_cfeat = n_cfeat
        self.h = height  # assume h == w. must be divisible by 4, so 28,24,20,16...

        # Initialize the initial convolutional layer
        self.init_conv = ResidualConvBlock(in_channels, n_feat, is_res=True)

        # Initialize the down-sampling path of the U-Net with two levels
        self.down1 = UnetDown(n_feat, n_feat)      # down1 #[10, 256, 8, 8]
        self.down2 = UnetDown(n_feat, 2 * n_feat)  # down2 #[10, 256, 4, 4]

        # original: self.to_vec = nn.Sequential(nn.AvgPool2d(7), nn.GELU())
        self.to_vec = nn.Sequential(nn.AvgPool2d((4)), nn.GELU())

        # Embed the timestep and context labels with a one-layer fully connected neural network
        self.timeembed1 = EmbedFC(1, 2*n_feat)
        self.timeembed2 = EmbedFC(1, 1*n_feat)
        self.contextembed1 = EmbedFC(n_cfeat, 2*n_feat)
        self.contextembed2 = EmbedFC(n_cfeat, 1*n_feat)

        # Initialize the up-sampling path of the U-Net with three levels
        self.up0 = nn.Sequential(
            nn.ConvTranspose2d(2 * n_feat, 2 * n_feat, self.h//4, self.h//4),  # up-sample
            nn.GroupNorm(8, 2 * n_feat),  # normalize
            nn.ReLU(),
        )
        self.up1 = UnetUp(4 * n_feat, n_feat)
        self.up2 = UnetUp(2 * n_feat, n_feat)

        # Initialize the final convolutional layers to map to the same number of channels as the input image
        self.out = nn.Sequential(
            nn.Conv2d(2 * n_feat, n_feat, 3, 1, 1),  # reduce number of feature maps  #in_channels, out_channels, kernel_size, stride=1, padding=0
            nn.GroupNorm(8, n_feat),  # normalize
            nn.ReLU(),
            nn.Conv2d(n_feat, self.in_channels, 3, 1, 1),  # map to same number of channels as input
        )

    def forward(self, x, t, c=None):
        """
        x : (batch, n_feat, h, w) : input image
        t : (batch, n_cfeat)      : time step
        c : (batch, n_classes)    : context label
        """
        # x is the input image, c is the context label, t is the timestep, context_mask says which samples to block the context on

        # pass the input image through the initial convolutional layer
        x = self.init_conv(x)
        # pass the result through the down-sampling path
        down1 = self.down1(x)      #[10, 256, 8, 8]
        down2 = self.down2(down1)  #[10, 256, 4, 4]

        # convert the feature maps to a vector and apply an activation
        hiddenvec = self.to_vec(down2)

        # mask out context if context_mask == 1
        if c is None:
            c = torch.zeros(x.shape[0], self.n_cfeat).to(x)

        # embed context and timestep
        cemb1 = self.contextembed1(c).view(-1, self.n_feat * 2, 1, 1)  # (batch, 2*n_feat, 1,1)
        temb1 = self.timeembed1(t).view(-1, self.n_feat * 2, 1, 1)
        cemb2 = self.contextembed2(c).view(-1, self.n_feat, 1, 1)
        temb2 = self.timeembed2(t).view(-1, self.n_feat, 1, 1)
        #print(f"uunet forward: cemb1 {cemb1.shape}. temb1 {temb1.shape}, cemb2 {cemb2.shape}. temb2 {temb2.shape}")


        up1 = self.up0(hiddenvec)
        up2 = self.up1(cemb1*up1 + temb1, down2)  # add and multiply embeddings
        up3 = self.up2(cemb2*up2 + temb2, down1)
        out = self.out(torch.cat((up3, x), 1))
        return out

During training, a random timestep is drawn and the corresponding amount of noise is added to the image; during sampling, we step back through the timesteps and denoise one step at a time.
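A rough sketch of that training step. The names (ab_t for the cumulative noise schedule, nn_model, timesteps, device) follow the sampling code below; the formula is the standard DDPM forward process, written from memory rather than copied from the course notebook, with F being torch.nn.functional and x a batch of training images:

# forward-noising used at training time: sample x_t directly from x_0
def perturb_input(x, t, noise):
    return ab_t.sqrt()[t, None, None, None] * x + (1 - ab_t[t, None, None, None]).sqrt() * noise

# one training step: random timestep, add noise, train the U-Net to predict that noise
t = torch.randint(1, timesteps + 1, (x.shape[0],)).to(device)
noise = torch.randn_like(x)
x_pert = perturb_input(x, t, noise)
loss = F.mse_loss(nn_model(x_pert, t / timesteps), noise)
loss.backward()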

Control and speed up

Adding a context vector lets us control the output, and swapping DDPM for DDIM speeds up sampling.


# define sampling function for DDIM
# removes the noise using ddim
def denoise_ddim(x, t, t_prev, pred_noise):
    ab = ab_t[t]
    ab_prev = ab_t[t_prev]

    x0_pred = ab_prev.sqrt() / ab.sqrt() * (x - (1 - ab).sqrt() * pred_noise)
    dir_xt = (1 - ab_prev).sqrt() * pred_noise

    return x0_pred + dir_xt
# fast sampling algorithm with context
@torch.no_grad()
def sample_ddim_context(n_sample, context, n=20):
    # x_T ~ N(0, 1), sample initial noise
    samples = torch.randn(n_sample, 3, height, height).to(device)

    # array to keep track of generated steps for plotting
    intermediate = []
    step_size = timesteps // n
    for i in range(timesteps, 0, -step_size):
        print(f'sampling timestep {i:3d}', end='\r')

        # reshape time tensor
        t = torch.tensor([i / timesteps])[:, None, None, None].to(device)

        eps = nn_model(samples, t, c=context)  # predict noise e_(x_t,t)
        samples = denoise_ddim(samples, i, i - step_size, eps)
        intermediate.append(samples.detach().cpu().numpy())

    intermediate = np.stack(intermediate)
    return samples, intermediate
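A hypothetical call, assuming ctx_vector is the batch of one-hot context labels used in the wandb table example earlier:

ddim_samples, intermediates = sample_ddim_context(n_sample=ctx_vector.shape[0],
                                                  context=ctx_vector, n=25)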

ChatGPT Prompt Engineering for Developers

Using ChatGPT as an assistant already feels like an everyday tool of modern life.

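The course notebooks wrap the OpenAI API in a small helper and then concentrate on prompt patterns: clear delimiters around the text to process, asking for structured output, giving the model time to think. Roughly, using the older 0.x-style openai package that the course notebooks were built on at the time:

import openai

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # low temperature for reproducible prompt experiments
    )
    return response.choices[0].message["content"]

# example of the delimiter pattern
text = "..."
prompt = f"Summarize the text delimited by triple backticks into a single sentence. ```{text}```"
print(get_completion(prompt))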

Finetuning Large Language Models

Finetuning of large language models; understanding how it works is enough for me.


Open Source Models with Hugging Face

A very comprehensive course on using Hugging Face models, useful when you need to stand up some inference endpoints.
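The course leans heavily on the transformers pipeline API. A minimal sketch (the task and model here are just placeholders, not necessarily what the course uses):

from transformers import pipeline

# hypothetical example: a tiny text-generation pipeline
generator = pipeline("text-generation", model="gpt2")
print(generator("Deep learning short courses are", max_new_tokens=20)[0]["generated_text"])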

Getting started with Mistral

If you want to play with open-source models, give Mistral a try. There are also courses covering Llama 2 and 3.

Prompt Engineering for Vision Models

Thanks for reading.