
Step by Step: A Tutorial on Semantic Segmentation of Urban Street View Images

Hello everyone and happy New Year! I'm the blogger Zhang Miaomiao. Needless to say, street view imagery (SVI) has been one of the hottest topics in urban research over the past few years. It has quite a few advantages: it covers areas home to more than half of the world's population, it offers a valuable source of large-scale urban data, and it provides close-range images of cities that other commonly used sources such as aerial or satellite imagery cannot. Its rise has been driven largely by the explosion of SVI data (the growing coverage of services such as Google and Baidu Street View), by advances in machine learning and computer vision that make it possible to extract all kinds of information automatically, and by ever-increasing computing power for processing huge volumes of images. Searching Web of Science with "street view image" as the keyword turns up roughly 800 related studies per year. Processing methods for street view imagery have evolved from hand-crafted feature extraction such as SIFT (Scale-Invariant Feature Transform) to a full embrace of deep learning, in which semantic segmentation and object detection are the two most widely used techniques. Later in this series we will also cover the latest progress of GANs, diffusion models, and multimodal pre-trained models applied to SVI.

Semantic segmentation is a deep learning technique that assigns a label or class to every pixel of an image. It is used to identify the sets of pixels that make up distinct categories; for example, a self-driving car needs to recognize vehicles, pedestrians, traffic signals, sidewalks, and other road features. Semantic segmentation is used in many applications, such as autonomous driving, medical imaging, and industrial inspection. It is especially important in cities: the vision-only models of autonomous-driving companies like Tesla depend on huge amounts of annotated training data to make cars smarter. Unfortunately, commercial companies rarely open up their data; fortunately, academia has released urban semantic segmentation datasets such as Cityscapes, which make it fairly easy for us to call a pretrained model for our own task.
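To make this concrete, here is a tiny self-contained sketch (a toy example I made up, not tied to any particular model) of what a segmentation result is: an array with one class index per pixel, from which per-class pixel shares can be counted.

import numpy as np

# A toy 4 x 6 "label map": each entry is the class index predicted for that pixel
# (e.g. 0 = road, 8 = vegetation, 10 = sky in the Cityscapes convention used later).
label_map = np.array([[10, 10, 10, 10, 10, 10],
                      [ 8,  8, 10, 10,  8,  8],
                      [ 0,  0,  0,  0,  0,  0],
                      [ 0,  0,  0,  0,  0,  0]])

# Share of pixels per class: exactly the kind of statistic
# (e.g. the Green View Index) computed later in this tutorial.
for cls in np.unique(label_map):
    print(cls, (label_map == cls).mean())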

Figure: illustration of semantic segmentation

We will work step by step with the repository shared in this post. First, clone it locally.

The repository explains how to train a segmentation model yourself; we don't need that here and will only call the author's pretrained model. Create a new .py file in the repository root and copy the long block of code below into it (adapted from the author's test file). It loads the pretrained model and puts torch into evaluation mode (model.eval()):

import os
import logging
import argparse

import cv2
import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torch.nn.functional as F
import torch.nn.parallel
import torch.utils.data

from util import config
from util.util import colorize
cv2.ocl.setUseOpenCL(False)
def get_parser():
    parser = argparse.ArgumentParser(description='PyTorch Semantic Segmentation')
    parser.add_argument('--config', type=str, default='config/cityscapes/cityscapes_pspnet101.yaml', help='config file')
    parser.add_argument('--image', type=str, default='sv.png', help='input image')
    parser.add_argument('opts', help='see config/cityscapes/cityscapes_pspnet101.yaml for all options', default=None, nargs=argparse.REMAINDER)
    args = parser.parse_args()
    assert args.config is not None
    cfg = config.load_cfg_from_cfg_file(args.config)
    cfg.image = args.image
    if args.opts is not None:
        cfg = config.merge_cfg_from_list(cfg, args.opts)
    return cfg
def get_logger():
    logger_name = "main-logger"
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    fmt = "[%(asctime)s %(levelname)s %(filename)s line %(lineno)d %(process)d] %(message)s"
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    return logger
def check(args):
    assert args.classes > 1
    assert args.zoom_factor in [1, 2, 4, 8]
    assert args.split in ['train', 'val', 'test']
    if args.arch == 'psp':
        assert (args.train_h - 1) % 8 == 0 and (args.train_w - 1) % 8 == 0
    elif args.arch == 'psa':
        if args.compact:
            args.mask_h = (args.train_h - 1) // (8 * args.shrink_factor) + 1
            args.mask_w = (args.train_w - 1) // (8 * args.shrink_factor) + 1
        else:
            assert (args.mask_h is None and args.mask_w is None) or (args.mask_h is not None and args.mask_w is not None)
            if args.mask_h is None and args.mask_w is None:
                args.mask_h = 2 * ((args.train_h - 1) // (8 * args.shrink_factor) + 1) - 1
                args.mask_w = 2 * ((args.train_w - 1) // (8 * args.shrink_factor) + 1) - 1
            else:
                assert (args.mask_h % 2 == 1) and (args.mask_h >= 3) and (
                        args.mask_h <= 2 * ((args.train_h - 1) // (8 * args.shrink_factor) + 1) - 1)
                assert (args.mask_w % 2 == 1) and (args.mask_w >= 3) and (
                        args.mask_w <= 2 * ((args.train_h - 1) // (8 * args.shrink_factor) + 1) - 1)
    else:
        raise Exception('architecture {} not supported yet'.format(args.arch))
def net_process(model, image, mean, std=None, flip=True):
    input = torch.from_numpy(image.transpose((2, 0, 1))).float()
    if std is None:
        for t, m in zip(input, mean):
            t.sub_(m)
    else:
        for t, m, s in zip(input, mean, std):
            t.sub_(m).div_(s)
    input = input.unsqueeze(0).cuda()
    if flip:
        input = torch.cat([input, input.flip(3)], 0)
    with torch.no_grad():
        output = model(input)
    _, _, h_i, w_i = input.shape
    _, _, h_o, w_o = output.shape
    if (h_o != h_i) or (w_o != w_i):
        output = F.interpolate(output, (h_i, w_i), mode='bilinear', align_corners=True)
    output = F.softmax(output, dim=1)
    if flip:
        output = (output[0] + output[1].flip(2)) / 2
    else:
        output = output[0]
    output = output.data.cpu().numpy()
    output = output.transpose(1, 2, 0)
    return output
def scale_process(model, image, classes, crop_h, crop_w, h, w, mean, std=None, stride_rate=2/3):
    ori_h, ori_w, _ = image.shape
    pad_h = max(crop_h - ori_h, 0)
    pad_w = max(crop_w - ori_w, 0)
    pad_h_half = int(pad_h / 2)
    pad_w_half = int(pad_w / 2)
    if pad_h > 0 or pad_w > 0:
        image = cv2.copyMakeBorder(image, pad_h_half, pad_h - pad_h_half, pad_w_half, pad_w - pad_w_half,
                                   cv2.BORDER_CONSTANT, value=mean)
    new_h, new_w, _ = image.shape
    stride_h = int(np.ceil(crop_h * stride_rate))
    stride_w = int(np.ceil(crop_w * stride_rate))
    grid_h = int(np.ceil(float(new_h - crop_h) / stride_h) + 1)
    grid_w = int(np.ceil(float(new_w - crop_w) / stride_w) + 1)
    prediction_crop = np.zeros((new_h, new_w, classes), dtype=float)
    count_crop = np.zeros((new_h, new_w), dtype=float)
    for index_h in range(0, grid_h):
        for index_w in range(0, grid_w):
            s_h = index_h * stride_h
            e_h = min(s_h + crop_h, new_h)
            s_h = e_h - crop_h
            s_w = index_w * stride_w
            e_w = min(s_w + crop_w, new_w)
            s_w = e_w - crop_w
            image_crop = image[s_h:e_h, s_w:e_w].copy()
            count_crop[s_h:e_h, s_w:e_w] += 1
            prediction_crop[s_h:e_h, s_w:e_w, :] += net_process(model, image_crop, mean, std)
    prediction_crop /= np.expand_dims(count_crop, 2)
    prediction_crop = prediction_crop[pad_h_half:pad_h_half + ori_h, pad_w_half:pad_w_half + ori_w]
    prediction = cv2.resize(prediction_crop, (w, h), interpolation=cv2.INTER_LINEAR)
    return prediction
if __name__ == '__main__':
    global args, logger
    args = get_parser()
    check(args)
    logger = get_logger()
    os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(str(x) for x in args.test_gpu)
    logger.info(args)
    logger.info("=> creating model ...")
    logger.info("Classes: {}".format(args.classes))

    value_scale = 255
    mean = [0.485, 0.456, 0.406]
    mean = [item * value_scale for item in mean]
    std = [0.229, 0.224, 0.225]
    std = [item * value_scale for item in std]
    colors = np.loadtxt(args.colors_path).astype('uint8')

    print(args.arch)
    if args.arch == 'psp':
        from model.pspnet import PSPNet
        model = PSPNet(layers=args.layers, classes=args.classes, zoom_factor=args.zoom_factor, pretrained=False)
    elif args.arch == 'psa':
        from model.psanet import PSANet
        model = PSANet(layers=args.layers, classes=args.classes, zoom_factor=args.zoom_factor, compact=args.compact,
                       shrink_factor=args.shrink_factor, mask_h=args.mask_h, mask_w=args.mask_w,
                       normalization_factor=args.normalization_factor, psa_softmax=args.psa_softmax, pretrained=False)
    logger.info(model)
    model = torch.nn.DataParallel(model).cuda()
    cudnn.benchmark = True
    if os.path.isfile(args.model_path):
        logger.info("=> loading checkpoint '{}'".format(args.model_path))
        checkpoint = torch.load(args.model_path)
        model.load_state_dict(checkpoint['state_dict'], strict=False)
        logger.info("=> loaded checkpoint '{}'".format(args.model_path))
    else:
        raise RuntimeError("=> no checkpoint found at '{}'".format(args.model_path))

    print(next(model.parameters()).device)  # prints e.g. cuda:0
    classes = args.classes
    base_size = args.base_size
    crop_h = args.test_h
    crop_w = args.test_w
    scales = args.scales
    model = model.eval()

    # Replace this with your own file; to batch-process, loop over a directory of street view images and save each result
    image_path = 'sv.png'
    image = cv2.imread(image_path, cv2.IMREAD_COLOR)  # BGR 3-channel ndarray with shape H * W * 3
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert the cv2-read image from BGR order to RGB order
    h, w, _ = image.shape
    prediction = np.zeros((h, w, classes), dtype=float)
    for scale in scales:
        long_size = round(scale * base_size)
        new_h = long_size
        new_w = long_size
        if h > w:
            new_w = round(long_size / float(h) * w)
        else:
            new_h = round(long_size / float(w) * h)
        image_scale = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
        prediction += scale_process(model, image_scale, classes, crop_h, crop_w, h, w, mean, std)
    prediction /= len(scales)
    prediction = np.argmax(prediction, axis=2)
    gray = np.uint8(prediction)
    color = colorize(gray, colors)
    print('gray', gray)
    print('color', color)
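The comment near the bottom of the script mentions looping over a whole directory of street view images. Here is a minimal sketch of what that could look like, assuming you refactor the per-image part above (from cv2.imread down to colorize) into a hypothetical helper called segment_image(path) that returns (gray, color); the folder names are placeholders of my own:

import glob
import os

import cv2

sv_dir = 'streetview_images'   # hypothetical input folder of street view images
out_dir = 'seg_results'        # hypothetical output folder
os.makedirs(out_dir, exist_ok=True)

for image_path in sorted(glob.glob(os.path.join(sv_dir, '*.png'))):
    gray, color = segment_image(image_path)   # hypothetical wrapper around the per-image code above
    name = os.path.splitext(os.path.basename(image_path))[0]
    cv2.imwrite(os.path.join(out_dir, name + '_gray.png'), gray)   # raw index map
    color.save(os.path.join(out_dir, name + '_color.png'))         # colorized result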

The author parses arguments with argparse, so we only need to change the default values. We use the cityscapes_pspnet101.yaml config, meaning a semantic segmentation model trained on the Cityscapes dataset with the PSPNet architecture, where 101 refers to the ResNet-101 backbone. If you need a different network, modify this accordingly. The parameters in the yaml file are as follows:

DATA:
  data_root: dataset/cityscapes
  train_list: dataset/cityscapes/list/fine_train.txt
  val_list: dataset/cityscapes/list/fine_val.txt
  classes: 19

TRAIN:
  arch: psp
  layers: 101
  sync_bn: True  # adopt syncbn or not
  train_h: 713
  train_w: 713
  scale_min: 0.5  # minimum random scale
  scale_max: 2.0  # maximum random scale
  rotate_min: -10  # minimum random rotate
  rotate_max: 10  # maximum random rotate
  zoom_factor: 8  # zoom factor for final prediction during training, be in [1, 2, 4, 8]
  ignore_label: 255
  aux_weight: 0.4
  train_gpu: [0, 1, 2, 3, 4, 5, 6, 7]
  workers: 16  # data loader workers
  batch_size: 16  # batch size for training
  batch_size_val: 8  # batch size for validation during training, memory and speed tradeoff
  base_lr: 0.01
  epochs: 200
  start_epoch: 0
  power: 0.9
  momentum: 0.9
  weight_decay: 0.0001
  manual_seed:
  print_freq: 10
  save_freq: 1
  save_path: exp/cityscapes/pspnet101/model
  weight:  # path to initial weight (default: none)
  resume:  # path to latest checkpoint (default: none)
  evaluate: False  # evaluate on validation set, extra gpu memory needed and small batch_size_val is recommend

Distributed:
  dist_url: tcp://127.0.0.1:6789
  dist_backend: 'nccl'
  multiprocessing_distributed: True
  world_size: 1
  rank: 0

TEST:
  test_list: dataset/cityscapes/list/fine_val.txt
  split: val  # split in [train, val and test]
  base_size: 2048  # based size for scaling
  test_h: 713
  test_w: 713
  scales: [1.0]  # evaluation scales, ms as [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
  has_prediction: False  # has prediction already or not
  index_start: 0  # evaluation start index in list
  index_step: 0  # evaluation step index in list, 0 means to end
  test_gpu: [0]
  model_path: exp/cityscapes/pspnet101/model/train_epoch_200.pth  # evaluation model path
  save_folder: exp/cityscapes/pspnet101/result/epoch_200/val/ss  # results save folder
  colors_path: data/cityscapes/cityscapes_colors.txt  # path of dataset colors
  names_path: data/cityscapes/cityscapes_names.txt  # path of dataset category names

In other words, there are 19 classes, and the pretrained model is expected at exp/cityscapes/pspnet101/model/train_epoch_200.pth. The author hosts the model file on Google Drive; given network restrictions, many readers may not be able to reach it. No worries, we will upload a copy for everyone shortly.

Put an sv.png file in the repository root and the model file in the exp/cityscapes/pspnet101/model directory, then run the code. The model output looks like this, where 0-18 are the class indices:
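As a small optional check before running, you can verify that both files are where the script expects them (paths taken from the defaults and the yaml above):

import os

assert os.path.isfile('sv.png'), 'put your street view image at ./sv.png'
assert os.path.isfile('exp/cityscapes/pspnet101/model/train_epoch_200.pth'), \
    'download train_epoch_200.pth into exp/cityscapes/pspnet101/model/'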

Index lookup table:

0   road
1   sidewalk
2   building
3   wall
4   fence
5   pole
6   traffic light
7   traffic sign
8   vegetation
9   terrain
10  sky
11  person
12  rider
13  car
14  truck
15  bus
16  train
17  motorcycle
18  bicycle

vegetation = (prediction == 8)
print('Green View Index:', str(len(prediction[vegetation]) / (prediction.shape[0] * prediction.shape[1])))

Picking index = 8 gives us the Green View Index, i.e. the share of the image occupied by vegetation; the computed result looks like this:
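Beyond the single vegetation class, it can be handy to compute the pixel share of every class at once. A minimal sketch, assuming prediction is the H x W index map produced by the script above:

import numpy as np

cityscapes_names = ['road', 'sidewalk', 'building', 'wall', 'fence', 'pole',
                    'traffic light', 'traffic sign', 'vegetation', 'terrain', 'sky',
                    'person', 'rider', 'car', 'truck', 'bus', 'train',
                    'motorcycle', 'bicycle']

# Count how many pixels fall into each of the 19 classes, then normalize to shares.
counts = np.bincount(prediction.flatten(), minlength=len(cityscapes_names))
shares = counts / prediction.size
for name, share in zip(cityscapes_names, shares):
    print('{:<14s}{:.4f}'.format(name, share))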

Now let's map the segmentation result back onto the original image:

image_name = '喵喵测试'
gray_path = image_name + '_gray.png'
color_path = image_name + '_color.png'
cv2.imwrite(gray_path, gray)   # save the raw index map
color.save(color_path)         # save the colorized result
logger.info("=> Prediction saved in {}".format(color_path))

Haha, this is the recognition result; you can probably guess where it is. The dataset was trained on urban scenes, so real street view images will give you better accuracy. Let's add some transparency and take a look:

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
plt.imshow(image)
plt.imshow(color, alpha=0.5)
plt.axis('off')
plt.savefig('cd2_deeplab_ade.jpg', dpi=150, bbox_inches='tight')

Hmm, even uglier.

If you need a different model, such as psp or psa, or a different set of classes (the ADE dataset has many more), you can simply modify the parameters yourself; it's very easy. Click "Read the original" to view the code.

Happy Chinese New Year, everyone! If this helped, please cite our street view-related papers:

@article{zhang2022migratable,
  title={Migratable urban street scene sensing method based on vision language pre-trained model},
  author={Zhang, Yan and Zhang, Fan and Chen, Nengcheng},
  journal={International Journal of Applied Earth Observation and Geoinformation},
  volume={113},
  pages={102989},
  year={2022},
  publisher={Elsevier}
}

@article{zhang2021multi,
  title={Multi-source sensor based urban habitat and resident health sensing: A case study of Wuhan, China},
  author={Zhang, Yan and Chen, Nengcheng and Du, Wenying and Li, Yingbing and Zheng, Xiang},
  journal={Building and Environment},
  volume={198},
  pages={107883},
  year={2021},
  publisher={Elsevier}
}
