Hello everyone and happy New Year, I'm the blogger Zhang Miaomiao. Needless to say, street view imagery (SVI) has been one of the hottest topics in urban research over the past few years. Street view has considerable advantages: it covers more than half of the world's population, provides a valuable large-scale source of urban data, and offers close-range images of cities that other common data sources such as aerial or satellite imagery cannot provide. Its rise has largely been driven by the explosion of SVI data (the coverage and growth of services such as Google and Baidu Street View), by advances in machine learning and computer vision that enable automatic extraction of all kinds of information, and by ever-growing computing power for processing massive volumes of images. Searching Web of Science with "street view image" as the keyword turns up roughly 800 related studies per year. Processing of street view imagery has progressed from hand-crafted feature extraction methods such as SIFT (Scale-Invariant Feature Transform) to a full embrace of deep learning, with semantic segmentation and object detection being the two most widely used techniques. Later posts in this series will also cover the latest progress of GANs, diffusion models, and multimodal pre-trained models applied to SVI.
Semantic segmentation is a deep learning technique that associates a label or category with every pixel of an image. It is used to identify sets of pixels that form distinguishable classes: a self-driving car, for example, needs to recognize vehicles, pedestrians, traffic signals, sidewalks, and other road features. Semantic segmentation is applied in many settings such as autonomous driving, medical imaging, and industrial inspection. It is especially important in cities: the pure-vision models of autonomous driving companies like Tesla rely on enormous amounts of annotated training data to make their cars smarter. Unfortunately, commercial companies rarely open up their data, but fortunately academia has released urban semantic segmentation datasets such as Cityscapes and ADE20K, which make it fairly easy for us to call a pretrained model for our own task.
Figure: an illustration of semantic segmentation
We will follow the repository shared by its author step by step. First, clone it:
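The post does not spell out the repository address here. Judging from the file layout used below (config/cityscapes/cityscapes_pspnet101.yaml, util/config, model/pspnet), it is presumably hszhao's semseg repo, so a clone command would look roughly like this (URL is my assumption, please check against the original link in the post):

git clone https://github.com/hszhao/semseg.git
cd semseg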
The repo explains how to train a segmentation model yourself; we do not need that here and will simply call the model the author has already trained. Create a new .py file in the repo root and paste in the long block below (adapted from the author's test file), which loads the pretrained model and puts torch into evaluation mode (model.eval()):
import os
import logging
import argparse
import cv2
import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torch.nn.functional as F
import torch.nn.parallel
import torch.utils.data
from util import config
from util.util import colorize
cv2.ocl.setUseOpenCL(False)
def get_parser():
    parser = argparse.ArgumentParser(description='PyTorch Semantic Segmentation')
    parser.add_argument('--config', type=str, default='config/cityscapes/cityscapes_pspnet101.yaml', help='config file')
    parser.add_argument('--image', type=str, default='sv.png', help='input image')
    parser.add_argument('opts', help='see config/cityscapes/cityscapes_pspnet101.yaml for all options', default=None, nargs=argparse.REMAINDER)
    args = parser.parse_args()
    assert args.config  # is not None
    cfg = config.load_cfg_from_cfg_file(args.config)
    cfg.image = args.image
    if args.opts is not None:
        cfg = config.merge_cfg_from_list(cfg, args.opts)
    return cfg
def get_logger():
    logger_name = "main-logger"
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    fmt = "[%(asctime)s %(levelname)s %(filename)s line %(lineno)d %(process)d] %(message)s"
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    return logger
def check(args):
    assert args.classes > 1
    assert args.zoom_factor in [1, 2, 4, 8]
    assert args.split in ['train', 'val', 'test']
    if args.arch == 'psp':
        assert (args.train_h - 1) % 8 == 0 and (args.train_w - 1) % 8 == 0
    elif args.arch == 'psa':
        if args.compact:
            args.mask_h = (args.train_h - 1) // (8 * args.shrink_factor) + 1
            args.mask_w = (args.train_w - 1) // (8 * args.shrink_factor) + 1
        else:
            assert (args.mask_h is None and args.mask_w is None) or (args.mask_h is not None and args.mask_w is not None)
            if args.mask_h is None and args.mask_w is None:
                args.mask_h = 2 * ((args.train_h - 1) // (8 * args.shrink_factor) + 1) - 1
                args.mask_w = 2 * ((args.train_w - 1) // (8 * args.shrink_factor) + 1) - 1
            else:
                assert (args.mask_h % 2 == 1) and (args.mask_h >= 3) and (
                    args.mask_h <= 2 * ((args.train_h - 1) // (8 * args.shrink_factor) + 1) - 1)
                assert (args.mask_w % 2 == 1) and (args.mask_w >= 3) and (
                    args.mask_w <= 2 * ((args.train_w - 1) // (8 * args.shrink_factor) + 1) - 1)
    else:
        raise Exception('architecture {} not supported yet'.format(args.arch))
def net_process(model, image, mean, std=None, flip=True):
    input = torch.from_numpy(image.transpose((2, 0, 1))).float()
    if std is None:
        for t, m in zip(input, mean):
            t.sub_(m)
    else:
        for t, m, s in zip(input, mean, std):
            t.sub_(m).div_(s)
    input = input.unsqueeze(0).cuda()
    if flip:
        input = torch.cat([input, input.flip(3)], 0)
    with torch.no_grad():
        output = model(input)
    _, _, h_i, w_i = input.shape
    _, _, h_o, w_o = output.shape
    if (h_o != h_i) or (w_o != w_i):
        output = F.interpolate(output, (h_i, w_i), mode='bilinear', align_corners=True)
    output = F.softmax(output, dim=1)
    if flip:
        output = (output[0] + output[1].flip(2)) / 2
    else:
        output = output[0]
    output = output.data.cpu().numpy()
    output = output.transpose(1, 2, 0)
    return output
def scale_process(model, image, classes, crop_h, crop_w, h, w, mean, std=None, stride_rate=2/3):
    ori_h, ori_w, _ = image.shape
    pad_h = max(crop_h - ori_h, 0)
    pad_w = max(crop_w - ori_w, 0)
    pad_h_half = int(pad_h / 2)
    pad_w_half = int(pad_w / 2)
    if pad_h > 0 or pad_w > 0:
        image = cv2.copyMakeBorder(image, pad_h_half, pad_h - pad_h_half, pad_w_half, pad_w - pad_w_half, cv2.BORDER_CONSTANT, value=mean)
    new_h, new_w, _ = image.shape
    stride_h = int(np.ceil(crop_h*stride_rate))
    stride_w = int(np.ceil(crop_w*stride_rate))
    grid_h = int(np.ceil(float(new_h-crop_h)/stride_h) + 1)
    grid_w = int(np.ceil(float(new_w-crop_w)/stride_w) + 1)
    prediction_crop = np.zeros((new_h, new_w, classes), dtype=float)
    count_crop = np.zeros((new_h, new_w), dtype=float)
    for index_h in range(0, grid_h):
        for index_w in range(0, grid_w):
            s_h = index_h * stride_h
            e_h = min(s_h + crop_h, new_h)
            s_h = e_h - crop_h
            s_w = index_w * stride_w
            e_w = min(s_w + crop_w, new_w)
            s_w = e_w - crop_w
            image_crop = image[s_h:e_h, s_w:e_w].copy()
            count_crop[s_h:e_h, s_w:e_w] += 1
            prediction_crop[s_h:e_h, s_w:e_w, :] += net_process(model, image_crop, mean, std)
    prediction_crop /= np.expand_dims(count_crop, 2)
    prediction_crop = prediction_crop[pad_h_half:pad_h_half+ori_h, pad_w_half:pad_w_half+ori_w]
    prediction = cv2.resize(prediction_crop, (w, h), interpolation=cv2.INTER_LINEAR)
    return prediction
if __name__ == '__main__':
    global args, logger
    args = get_parser()
    check(args)
    logger = get_logger()
    os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(str(x) for x in args.test_gpu)
    logger.info(args)
    logger.info("=> creating model ...")
    logger.info("Classes: {}".format(args.classes))
    value_scale = 255
    mean = [0.485, 0.456, 0.406]
    mean = [item * value_scale for item in mean]
    std = [0.229, 0.224, 0.225]
    std = [item * value_scale for item in std]
    colors = np.loadtxt(args.colors_path).astype('uint8')
    print(args.arch)
    if args.arch == 'psp':
        from model.pspnet import PSPNet
        model = PSPNet(layers=args.layers, classes=args.classes, zoom_factor=args.zoom_factor, pretrained=False)
    elif args.arch == 'psa':
        from model.psanet import PSANet
        model = PSANet(layers=args.layers, classes=args.classes, zoom_factor=args.zoom_factor, compact=args.compact,
                       shrink_factor=args.shrink_factor, mask_h=args.mask_h, mask_w=args.mask_w,
                       normalization_factor=args.normalization_factor, psa_softmax=args.psa_softmax, pretrained=False)
    logger.info(model)
    model = torch.nn.DataParallel(model).cuda()
    cudnn.benchmark = True
    if os.path.isfile(args.model_path):
        logger.info("=> loading checkpoint '{}'".format(args.model_path))
        checkpoint = torch.load(args.model_path)
        model.load_state_dict(checkpoint['state_dict'], strict=False)
        logger.info("=> loaded checkpoint '{}'".format(args.model_path))
    else:
        raise RuntimeError("=> no checkpoint found at '{}'".format(args.model_path))
    print(next(model.parameters()).device)  # prints: cuda:0
    classes = args.classes
    base_size = args.base_size
    crop_h = args.test_h
    crop_w = args.test_w
    scales = args.scales
    model = model.eval()
    # Change the path to your own file here; for a directory of street view images,
    # loop over it and save each result (a batch-processing sketch follows this script)
    image_path = 'sv.png'
    image = cv2.imread(image_path, cv2.IMREAD_COLOR)  # BGR 3-channel ndarray with shape H * W * 3
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert cv2-read image from BGR order to RGB order
    h, w, _ = image.shape
    prediction = np.zeros((h, w, classes), dtype=float)
    for scale in scales:
        long_size = round(scale * base_size)
        new_h = long_size
        new_w = long_size
        if h > w:
            new_w = round(long_size / float(h) * w)
        else:
            new_h = round(long_size / float(w) * h)
        image_scale = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
        prediction += scale_process(model, image_scale, classes, crop_h, crop_w, h, w, mean, std)
    prediction /= len(scales)
    prediction = np.argmax(prediction, axis=2)
    gray = np.uint8(prediction)
    color = colorize(gray, colors)
    print('gray', gray)
    print('color', color)
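The script above handles a single sv.png. As the comment in the main block notes, you can wrap the multi-scale inference in a loop over a street view folder. Below is a minimal sketch (my own addition, not the author's code) that assumes it is appended to the end of the same script, so model, classes, scales, base_size, crop_h, crop_w, mean, and std are already defined; the folder name sv_images/ and the output file green_view.csv are placeholders you should replace:

import csv
import glob

def segment_one(image_path):
    # Multi-scale inference for one image, reusing scale_process() defined above.
    image = cv2.cvtColor(cv2.imread(image_path, cv2.IMREAD_COLOR), cv2.COLOR_BGR2RGB)
    h, w, _ = image.shape
    prediction = np.zeros((h, w, classes), dtype=float)
    for scale in scales:
        long_size = round(scale * base_size)
        new_h, new_w = long_size, long_size
        if h > w:
            new_w = round(long_size / float(h) * w)
        else:
            new_h = round(long_size / float(w) * h)
        image_scale = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
        prediction += scale_process(model, image_scale, classes, crop_h, crop_w, h, w, mean, std)
    return np.argmax(prediction / len(scales), axis=2)

with open('green_view.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['image', 'green_view_ratio'])
    for path in glob.glob('sv_images/*.png'):
        pred = segment_one(path)
        ratio = float((pred == 8).mean())  # class index 8 = vegetation in Cityscapes
        writer.writerow([os.path.basename(path), ratio])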
The author handles everything through argparse and get_parser(), so we only need to change the default values: using the cityscapes_pspnet101.yaml config means using a semantic segmentation model trained on the Cityscapes data with a PSPNet network (if you want another network you have to change this yourself; 101 refers to the ResNet-101 backbone). The .yaml parameters are as follows:
DATA:
  data_root: dataset/cityscapes
  train_list: dataset/cityscapes/list/fine_train.txt
  val_list: dataset/cityscapes/list/fine_val.txt
  classes: 19

TRAIN:
  arch: psp
  layers: 101
  sync_bn: True  # adopt syncbn or not
  train_h: 713
  train_w: 713
  scale_min: 0.5  # minimum random scale
  scale_max: 2.0  # maximum random scale
  rotate_min: -10  # minimum random rotate
  rotate_max: 10  # maximum random rotate
  zoom_factor: 8  # zoom factor for final prediction during training, be in [1, 2, 4, 8]
  ignore_label: 255
  aux_weight: 0.4
  train_gpu: [0, 1, 2, 3, 4, 5, 6, 7]
  workers: 16  # data loader workers
  batch_size: 16  # batch size for training
  batch_size_val: 8  # batch size for validation during training, memory and speed tradeoff
  base_lr: 0.01
  epochs: 200
  start_epoch: 0
  power: 0.9
  momentum: 0.9
  weight_decay: 0.0001
  manual_seed:
  print_freq: 10
  save_freq: 1
  save_path: exp/cityscapes/pspnet101/model
  weight:  # path to initial weight (default: none)
  resume:  # path to latest checkpoint (default: none)
  evaluate: False  # evaluate on validation set, extra gpu memory needed and small batch_size_val is recommend

Distributed:
  dist_url: tcp://127.0.0.1:6789
  dist_backend: 'nccl'
  multiprocessing_distributed: True
  world_size: 1
  rank: 0

TEST:
  test_list: dataset/cityscapes/list/fine_val.txt
  split: val  # split in [train, val and test]
  base_size: 2048  # based size for scaling
  test_h: 713
  test_w: 713
  scales: [1.0]  # evaluation scales, ms as [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
  has_prediction: False  # has prediction already or not
  index_start: 0  # evaluation start index in list
  index_step: 0  # evaluation step index in list, 0 means to end
  test_gpu: [0]
  model_path: exp/cityscapes/pspnet101/model/train_epoch_200.pth  # evaluation model path
  save_folder: exp/cityscapes/pspnet101/result/epoch_200/val/ss  # results save folder
  colors_path: data/cityscapes/cityscapes_colors.txt  # path of dataset colors
  names_path: data/cityscapes/cityscapes_names.txt  # path of dataset category names
In other words there are 19 classes, and the pretrained model is expected at exp/cityscapes/pspnet101/model/train_epoch_200.pth (see model_path in the yaml above). The author hosts the model file on Google Drive; since network restrictions may keep many readers from reaching it, don't worry, we will upload a copy for everyone shortly.
Put an sv.png in the repo root, put the model file in the exp/cityscapes/pspnet101/model directory, and then run this code. The model output looks like this; 0-18 are the indices of the classes:
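If you want to see which classes actually show up in your picture, a quick check (my own addition) is to count the predicted indices in the gray map:

idx, counts = np.unique(gray, return_counts=True)
print(dict(zip(idx.tolist(), counts.tolist())))  # pixel count per class index, e.g. keys 0, 2, 8, ...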
Index lookup table (Cityscapes class indices 0-18):
0: road
1: sidewalk
2: building
3: wall
4: fence
5: pole
6: traffic light
7: traffic sign
8: vegetation
9: terrain
10: sky
11: person
12: rider
13: car
14: truck
15: bus
16: train
17: motorcycle
18: bicycle
vegetation = (prediction == 8)
print('Green view ratio:', str(len(prediction[vegetation]) / (prediction.shape[0] * prediction.shape[1])))
Selecting index = 8 gives the green view index, i.e. the share of the picture occupied by vegetation; the computed result looks like this:
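The same idea generalizes to every class. Here is a rough sketch (my own addition, not from the original post) that prints the pixel share of each Cityscapes class at once, which also gives you the sky share (index 10) for free; it assumes the names file from the config (cityscapes_names.txt) holds one class name per line:

total = prediction.shape[0] * prediction.shape[1]
ratios = np.bincount(prediction.ravel(), minlength=classes) / total  # pixel share per class index
names = open(args.names_path).read().splitlines()  # assumption: one class name per line
for i, (name, r) in enumerate(zip(names, ratios)):
    print('{:2d} {:15s} {:.4f}'.format(i, name, r))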
Next, let's map the segmentation result back onto the original image:
image_name = 'miaomiao_test'
gray_path = image_name + '_gray.png'
color_path = image_name + '_color.png'
cv2.imwrite(gray_path, gray)  # save the raw index map
color.save(color_path)        # colorize() returns a PIL image, so use .save()
logger.info("=> Prediction saved in {}".format(color_path))
Haha, here is the recognition result; you can probably guess where it is. The dataset was trained on urban scenes, so you will get better accuracy with actual street view images. Let's set some transparency and overlay the result:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
plt.imshow(image)
plt.imshow(color, alpha=0.5)
plt.axis('off')
plt.savefig('cd2_deeplab_ade.jpg', dpi=150, bbox_inches='tight')
Hmm, that made it even uglier.
If you need a different model (psp or psa) or a different class scheme (ADE20K has more classes), just change the parameters yourself; it is very simple. Click "Read the original article" to view the code.
Happy New Year everyone! If this helped, please cite our street-view-related papers:
@article{zhang2022migratable,
title={Migratable urban street scene sensing method based on vision language pre-trained model},
author={Zhang, Yan and Zhang, Fan and Chen, Nengcheng},
journal={International Journal of Applied Earth Observation and Geoinformation},
volume={113},
pages={102989},
year={2022},
publisher={Elsevier}
}
@article{zhang2021multi,
title={Multi-source sensor based urban habitat and resident health sensing: A case study of Wuhan, China},
author={Zhang, Yan and Chen, Nengcheng and Du, Wenying and Li, Yingbing and Zheng, Xiang},
journal={Building and Environment},
volume={198},
pages={107883},
year={2021},
publisher={Elsevier}
}