新闻中心

EEPW首页 > 智能计算 > 进阶指南 > OpenVINO+OpenCV 文本检测与识别

OpenVINO+OpenCV 文本检测与识别

作者:时间:2020-11-09来源:OpenCV学堂收藏

模型介绍
  模型
  支持场景文字检测是基于MobileNet的PixelLink模型,该模型有两个输出,分别是分割输出与bounding Boxes输出,结构如下:

本文引用地址:http://www.eepw.com.cn/article/202011/420108.htm

  下面是基于VGG16作为backbone实现的PixelLink的模型结构:

  输入格式:1x3x768x1280 BGR彩色图像
  输出格式:
  name: "model/link_logits_/add", [ 1x16x192x32 0] – pixelLink的输出 name: "model/segm_logits/add", [ 1x2x192x32 0] – 像素分类text/ notext

  这里CNN使用类似 VGG16结构提前特征,序列预测使用 双向LSTM网络。
  输入格式: 1x1x32x120 输出格式: 30, 1, 37 输出解释是基于 CTC贪心解析方式。
代码演示
  01
基于PixelLink完成,其中加载模型与获取输入与输出层名称的

代码实现如下:

log.info( "Creating Inference Engine")
ie = IECore
dete_net = ie.read_network(model=dete_text_xml, weights=dete_text_bin)
reco_net = ie.read_network(model=reco_text_xml, weights=reco_text_bin)
  
# 文本检测网络, 输入与输出格式
log.info( "加载文本检测网络,解析输入与输出格式...")
input_it = iter(dete_net.input_info)
input_det_blob = next(input_it)
print(input_det_blob)
output_it = iter(dete_net.outputs)
out_det_blob1 = next(output_it)
out_det_blob2 = next(output_it)

# Read andpre-process input images
print(dete_net.input_info[input_det_blob].input_data.shape)
dn, dc, dh, dw = dete_net.input_info[input_det_blob].input_data.shape

# Loading model to the plugin
det_exec_net = ie.load_network(network=dete_net, device_name= "CPU")
print( "out_det_blob1: ", out_det_blob1, "out_det_blob2: ", out_det_blob2)

执行推理与解析输出的

代码如下:

image = cv.imread( "D:/images/openvino_ocr.jpg")
# image = cv.imread("D:/facedb/tiaoma/1.png")
h, w, c = image.shape
cv.imshow( "input", image)
img_blob = cv.resize(image, (dw, dh))
img_blob = img_blob.transpose( 2, 0, 1)
# Start sync inference
log.info( "Starting inference in synchronous mode")
inf_start1 = time.time
res = det_exec_net.infer(inputs={input_det_blob: [img_blob]})
inf_end1 = time.time - inf_start1
print( "inference time(ms) : %.3f"% (inf_end1 * 1000))
link_logits_ = res[out_det_blob1][ 0]
segm_logits = res[out_det_blob2][ 0]
link_logits_ = link_logits_.transpose( 1, 2, 0)
segm_logits = segm_logits.transpose( 1, 2, 0)
pixel_mask = np.zeros(( 192, 320), dtype=np.uint8)
print(link_logits_.shape, segm_logits.shape)
# 192, 320
forrow inrange( 192):
forcol inrange( 320):
pv1 = segm_logits[row, col, 0]
pv2 = segm_logits[row, col, 1]
ifpv2 > 1.0:
pixel_mask[row, col] = 255

mask = cv.resize(pixel_mask, (w, h))
cv.imshow( "mask", mask) 运行结果如下:

02运行结果:

ie = IECore
reco_net = ie.read_network(model=reco_text_xml, weights=reco_text_bin)

# 文本识别网络
log.info( "加载文本识别网络,解析输入与输出格式...")
input_rec_it = iter(reco_net.input_info)
input_rec_blob = next(input_rec_it)
print(input_rec_blob)
output_rec_it = iter(reco_net.outputs)
out_rec_blob = next(output_rec_it)

# Read and pre-process input images
print(reco_net.input_info[input_rec_blob].input_data.shape)
rn, rc, rh, rw = reco_net.input_info[input_rec_blob].input_data.shape

# Loading model to the plugin
rec_exec_net = ie.load_network(network=reco_net, device_name= "CPU")
print( "out_rec_blob1: ", out_rec_blob)

# 文字识别
image = cv.imread( "D:/images/zsxq/ocr3.png")
gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY)
ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
se = cv.getStructuringElement(cv.MORPH_RECT, ( 5, 1))
binary = cv.dilate(binary, se)
cv.imshow( "binary", binary)
cv.waitKey( 0)
contours, hireachy = cv.findContours(binary, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
for cnt inrange(len(contours)):
x, y, iw, ih = cv.boundingRect(contours[cnt])
roi = gray[y:y + ih, x:x + iw]
rec_roi = cv.resize(roi, (rw, rh))
rec_roi_blob = np.expand_dims(rec_roi, 0)

# Start sync inference
log.info( "Starting inference in synchronous mode")
inf_start1 = time.time
res = rec_exec_net.infer(inputs={input_rec_blob: [rec_roi_blob]})
inf_end1 = time.time - inf_start1
print( "inference time(ms) : %.3f"% (inf_end1 * 1000))
res = res[out_rec_blob]
txt = greedy_prase_text(res)
cv.putText(image, txt, (x, y), cv.FONT_HERSHEY_PLAIN, 1.0, ( 0, 0, 255), 1, 8)
cv.imshow( "recognition text demo", image)
cv.waitKey( 0)
cv.destroyAllWindows

运行结果如下:

  重新整理了一下,解析部分的代码函数。不用看公式,看完你会晕倒而且写不出代码!

实现如下:
  defctc_soft_max(data): sum = 0; max_val = max(data) index = np.argmax(data) fori inrange(len(data)): sum += np.exp(data[i]- max_val) prob = 1.0/ sum returnindex, prob
  defgreedy_prase_text(res): # CTC greedy decode from hereprint(res.shape)# 解析输出textocrstr = ""prev_pad = False; fori inrange(res.shape[ 0]): ctc = res[i] # 1x13ctc = np.squeeze(ctc, 0) index, prob = ctc_soft_max(ctc)ifdigit_nums[index] == '#': prev_pad = Trueelse: iflen(ocrstr) == 0orprev_pad or(len(ocrstr) > 0anddigit_nums[index] != ocrstr[ -1]): prev_pad = Falseocrstr += digit_nums[index]print(ocrstr)returnocrstr



评论


相关推荐

技术专区

关闭