[TensorFlow Deep Learning] Handwriting Recognition and Prediction in Practice (with source code and dataset, extra detailed). This write-up is shared in the hope that it helps; it is for reference only, and you are welcome to read on.
For the source code and dataset, please like, follow, and bookmark, then leave a comment or send a private message~~~
I. Dataset Overview
The application below performs automatic recognition of English handwriting based on the IAM dataset. The IAM database contains handwritten English text and can be used to train and test handwritten-text recognition systems and to perform writer identification and verification. The database was first released at ICDAR 1999; a handwritten sentence recognition system based on hidden Markov models was developed from it and presented at ICPR 2000. IAM contains unconstrained handwritten text, scanned at 300 dpi and saved as PNG images with 256 gray levels. The latest version of the IAM handwriting database is 3.0, and its main statistics are as follows:
Handwriting samples contributed by roughly 700 writers
More than 1,500 pages of scanned text
Roughly 6,000 independently labeled sentences
More than 10,000 independently labeled text lines
More than 100,000 independently labeled words
A few of the many handwritten sample images are shown below.
II. Implementation Steps
Step 1: Data cleaning. Remove the comment lines and erroneous entries from the index file, count the correctly transcribed handwriting samples, and finally shuffle the cleaned data into random order, as sketched below.
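A minimal sketch of this cleaning step, assuming the IAM index file data/words.txt, whose second field flags each entry's segmentation as "ok" or "err" (the full listing in Section V follows the same logic):

import numpy as np

corpus = []
for line in open("data/words.txt", "r"):
    # Skip comment lines and keep only entries whose segmentation
    # flag (the second field) is "ok", i.e. correct transcriptions.
    if not line.startswith("#") and line.split(" ")[1] == "ok":
        corpus.append(line)
np.random.shuffle(corpus)   # randomize the sample order in place
print(len(corpus))          # number of valid handwriting samples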
Step 2: Sample classification. Next, split the samples 8:1:1 into three datasets: a training set, a validation set, and a test set. Training on the training set produces the model, while the test set is mainly used to evaluate the model's effectiveness. A sketch of the split follows.
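A minimal sketch of the 8:1:1 split, assuming corpus is the shuffled list from the cleaning step above:

train_flag = int(0.8 * len(corpus))    # first 80%  -> training
test_flag = int(0.9 * len(corpus))     # next 10%   -> validation
train_data = corpus[:train_flag]
validation_data = corpus[train_flag:test_flag]
test_data = corpus[test_flag:]         # final 10%  -> testing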
Step 3: Character-to-number mapping. Use the StringLookup layer from TensorFlow's Keras package to map characters to integers. Its main parameters are described below; a usage sketch follows the list.
max_tokens: the maximum size of the vocabulary
num_oov_indices: the number of out-of-vocabulary indices
mask_token: the token that represents masked inputs
oov_token: only used when invert is True; the value returned for out-of-vocabulary indices, defaulting to "[UNK]"
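A minimal sketch of the mapping in both directions, using a hypothetical three-character vocabulary for illustration rather than the one extracted from the dataset:

from tensorflow.keras.layers.experimental.preprocessing import StringLookup
import tensorflow as tf

vocab = ["a", "b", "c"]   # hypothetical tiny vocabulary
char_to_num = StringLookup(vocabulary=vocab, num_oov_indices=1, oov_token="[UNK]")
num_to_char = StringLookup(vocabulary=char_to_num.get_vocabulary(), invert=True)

ids = char_to_num(tf.strings.unicode_split("cab", input_encoding="UTF-8"))
print(ids.numpy())                                        # integer ids for "c", "a", "b"
print(tf.strings.reduce_join(num_to_char(ids)).numpy())   # b'cab' round-trips back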
Step 4: Convolution. Apply a two-dimensional convolution with the Conv2D layer. Its main parameters are described below; a usage sketch follows the list.
filters: an integer, the dimensionality of the output space
kernel_size: an integer or a tuple of two integers specifying the height and width of the convolution window
strides: an integer or a tuple of two integers specifying the stride of the convolution along the height and width
padding: how the output image is padded ("valid" or "same")
activation: the activation function
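A minimal sketch wiring these parameters together, mirroring the first convolution block of the model in Section V (the shapes are illustrative only):

from tensorflow import keras

x = keras.Input(shape=(128, 32, 1))           # width x height x channels
y = keras.layers.Conv2D(filters=32,           # dimensionality of the output space
                        kernel_size=(3, 3),   # convolution window height and width
                        strides=(1, 1),       # stride along height and width
                        padding="same",       # pad so the spatial size is kept
                        activation="relu")(x)
y = keras.layers.AveragePooling2D((2, 2))(y)  # halve each spatial dimension
print(keras.Model(x, y).output_shape)         # (None, 64, 16, 32)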
III. Results
The ground-truth text read from some of the handwriting samples is shown below.
After training finishes, the trained model is used to predict the handwriting in the imported test data; some of the results are shown below.
IV. Summary of Results
Looking at the predictions, with average pooling and early stopping on extreme loss values during training, most English characters are recognized correctly. Training accuracy improves steadily, the loss stays within a reasonable range, and there is no scenario where prediction accuracy fails to improve for many consecutive epochs, so the model is fairly stable.
V. Code
Part of the code follows; for the complete code, please like, follow, and bookmark, then leave a comment or send a private message~~~
from tensorflow.keras.layers.experimental.preprocessing import StringLookup
from tensorflow import keras
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import os

plt.rcParams['font.family'] = ['Microsoft YaHei']
np.random.seed(0)
tf.random.set_seed(0)

# ## Clean and shuffle the data
corpus_read = open("data/words.txt", "r").readlines()
corpus = []
length_corpus = 0
for word in corpus_read:
    # Keep only entries whose segmentation flag (field 2) is "ok".
    if not word.startswith("#") and word.split(" ")[1] == "ok":
        corpus.append(word)
np.random.shuffle(corpus)
length_corpus = len(corpus)
print(length_corpus)
print(corpus[400:405])

# Split the data 80:10:10 into training : validation : test
train_flag = int(0.8 * len(corpus))
test_flag = int(0.9 * len(corpus))
train_data = corpus[:train_flag]
validation_data = corpus[train_flag:test_flag]
test_data = corpus[test_flag:]
train_data_len = len(train_data)
validation_data_len = len(validation_data)
test_data_len = len(test_data)
print("Training set size:", train_data_len)
print("Validation set size:", validation_data_len)
print("Test set size:", test_data_len)

image_direct = os.path.join("data", "images")

def retrieve_image_info(data):
    image_location = []
    sample = []
    for (i, corpus_row) in enumerate(data):
        corpus_strip = corpus_row.strip()
        corpus_strip = corpus_strip.split(" ")
        image_name = corpus_strip[0]
        leve1 = image_name.split("-")[0]
        leve2 = image_name.split("-")[1]
        image_location_detail = os.path.join(
            image_direct, leve1, leve1 + "-" + leve2, image_name + ".png"
        )
        # Skip corrupted (zero-byte) image files.
        if os.path.getsize(image_location_detail) > 0:
            image_location.append(image_location_detail)
            sample.append(corpus_row.split("\n")[0])
    print("Image path:", image_location[0], "Ground-truth text:", sample[0])
    return image_location, sample

train_image, train_tag = retrieve_image_info(train_data)
validation_image, validation_tag = retrieve_image_info(validation_data)
test_image, test_tag = retrieve_image_info(test_data)

# Find the maximum label length in the training data and build the vocabulary
train_tag_extract = []
vocab = set()
max_len = 0
for tag in train_tag:
    tag = tag.split(" ")[-1].strip()
    for i in tag:
        vocab.add(i)
    max_len = max(max_len, len(tag))
    train_tag_extract.append(tag)
print("Maximum length: ", max_len)
print("Vocabulary size: ", len(vocab))
print("Vocabulary: ", vocab)
print(train_tag_extract[40:45])

print(train_tag[50:54])
print(validation_tag[10:14])
print(test_tag[80:84])

def extract_tag_info(tags):
    extract_tag = []
    for tag in tags:
        tag = tag.split(" ")[-1].strip()
        extract_tag.append(tag)
    return extract_tag

train_tag_tune = extract_tag_info(train_tag)
validation_tag_tune = extract_tag_info(validation_tag)
test_tag_tune = extract_tag_info(test_tag)
print(train_tag_tune[50:54])
print(validation_tag_tune[10:14])
print(test_tag_tune[80:84])

AUTOTUNE = tf.data.AUTOTUNE

# Map characters to integers
string_to_no = StringLookup(vocabulary=list(vocab), invert=False)
# Map integers back to characters
no_map_string = StringLookup(
    vocabulary=string_to_no.get_vocabulary(), invert=True
)

def distortion_free_resize(image, img_size):
    w, h = img_size
    image = tf.image.resize(image, size=(h, w), preserve_aspect_ratio=True,
                            antialias=False, name=None)

    # Compute how much padding is needed on each side
    pad_height = h - tf.shape(image)[0]
    pad_width = w - tf.shape(image)[1]

    if pad_height % 2 != 0:
        height = pad_height // 2
        pad_height_top = height + 1
        pad_height_bottom = height
    else:
        pad_height_top = pad_height_bottom = pad_height // 2

    if pad_width % 2 != 0:
        width = pad_width // 2
        pad_width_left = width + 1
        pad_width_right = width
    else:
        pad_width_left = pad_width_right = pad_width // 2

    image = tf.pad(
        image,
        paddings=[
            [pad_height_top, pad_height_bottom],
            [pad_width_left, pad_width_right],
            [0, 0],
        ],
    )

    image = tf.transpose(image, perm=[1, 0, 2])
    image = tf.image.flip_left_right(image)
    return image

batch_size = 64
padding_token = 99
image_width = 128
image_height = 32

def preprocess_image(image_path, img_size=(image_width, image_height)):
    image = tf.io.read_file(image_path)
    image = tf.image.decode_png(image, 1)
    image = distortion_free_resize(image, img_size)
    image = tf.cast(image, tf.float32) / 255.0
    return image

def vectorize_tag(tag):
    tag = string_to_no(tf.strings.unicode_split(tag, input_encoding="UTF-8"))
    length = tf.shape(tag)[0]
    pad_amount = max_len - length
    tag = tf.pad(tag, paddings=[[0, pad_amount]], constant_values=padding_token)
    return tag

def process_images_tags(image_path, tag):
    image = preprocess_image(image_path)
    tag = vectorize_tag(tag)
    return {"image": image, "tag": tag}

def prepare_dataset(image_paths, tags):
    dataset = tf.data.Dataset.from_tensor_slices((image_paths, tags)).map(
        process_images_tags, num_parallel_calls=AUTOTUNE
    )
    return dataset.batch(batch_size).cache().prefetch(AUTOTUNE)

train_final = prepare_dataset(train_image, train_tag_extract)
validation_final = prepare_dataset(validation_image, validation_tag_tune)
test_final = prepare_dataset(test_image, test_tag_tune)
print(train_final.take(1))
print(train_final)

# Visualize a batch of training samples with their ground-truth text
for data in train_final.take(1):
    images, tags = data["image"], data["tag"]
    _, ax = plt.subplots(4, 4, figsize=(15, 8))
    for i in range(16):
        img = images[i]
        img = tf.image.flip_left_right(img)
        img = tf.transpose(img, perm=[1, 0, 2])
        img = (img * 255.0).numpy().clip(0, 255).astype(np.uint8)
        img = img[:, :, 0]
        tag = tags[i]
        indices = tf.gather(tag, tf.where(tf.math.not_equal(tag, padding_token)))
        tag = tf.strings.reduce_join(no_map_string(indices))
        tag = tag.numpy().decode("utf-8")
        ax[i // 4, i % 4].imshow(img)
        ax[i // 4, i % 4].set_title("Ground truth: %s" % tag)
        ax[i // 4, i % 4].axis("on")
plt.show()

class CTCLoss(keras.layers.Layer):
    def call(self, y_true, y_pred):
        batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
        input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
        tag_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
        input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        tag_length = tag_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        loss = keras.backend.ctc_batch_cost(y_true, y_pred, input_length, tag_length)
        self.add_loss(loss)
        return loss

def generate_model():
    # Inputs to the model
    input_img = keras.Input(shape=(image_width, image_height, 1), name="image")
    tags = keras.layers.Input(name="tag", shape=(None,))

    # First conv block.
    t = keras.layers.Conv2D(
        filters=32,
        kernel_size=(3, 3),
        activation="relu",
        kernel_initializer="he_normal",
        padding="same",
        name="ConvolutionLayer1")(input_img)
    t = keras.layers.AveragePooling2D((2, 2), name="AveragePooling_one")(t)

    # Second conv block.
    t = keras.layers.Conv2D(
        filters=64,
        kernel_size=(3, 3),
        activation="relu",
        kernel_initializer="he_normal",
        padding="same",
        name="ConvolutionLayer2")(t)
    t = keras.layers.AveragePooling2D((2, 2), name="AveragePooling_two")(t)

    # Reshape the pooled feature map into a sequence for the RNN layers.
    re_shape = ((image_width // 4), (image_height // 4) * 64)
    t = keras.layers.Reshape(target_shape=re_shape, name="reshape")(t)
    t = keras.layers.Dense(64, activation="relu", name="denseone", use_bias=False,
                           kernel_initializer='glorot_uniform',
                           bias_initializer='zeros')(t)
    t = keras.layers.Dropout(0.4)(t)

    # RNNs.
    t = keras.layers.Bidirectional(
        keras.layers.LSTM(128, return_sequences=True, dropout=0.4)
    )(t)
    t = keras.layers.Bidirectional(
        keras.layers.LSTM(64, return_sequences=True, dropout=0.4)
    )(t)

    t = keras.layers.Dense(
        len(string_to_no.get_vocabulary()) + 2, activation="softmax", name="densetwo"
    )(t)

    # Add CTC layer for calculating CTC loss at each step.
    output = CTCLoss(name="ctc_loss")(tags, t)

    # Define the model.
    model = keras.models.Model(
        inputs=[input_img, tags], outputs=output, name="handwriting"
    )
    # Compile the model and return.
    model.compile(optimizer=keras.optimizers.Adam())
    return model

# Get the model.
model = generate_model()
model.summary()

validation_images = []
validation_tags = []
for batch in validation_final:
    validation_images.append(batch["image"])
    validation_tags.append(batch["tag"])

model = generate_model()
# The prediction model shares layers with the training model but stops at
# the softmax output, skipping the CTC loss layer.
prediction_model = keras.models.Model(
    model.get_layer(name="image").input, model.get_layer(name="densetwo").output
)

epochs = 60
early_stopping_patience = 10
# Add early stopping
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=early_stopping_patience, restore_best_weights=True
)

# Train the model.
history = model.fit(
    train_final,
    validation_data=validation_final,
    epochs=epochs,
    callbacks=[early_stopping],
)

# ## Inference

# A utility function to decode the output of the network.
def handwriting_prediction(pred):
    input_len = np.ones(pred.shape[0]) * pred.shape[1]
    # Greedy CTC decoding, keeping at most max_len tokens per sample.
    results = keras.backend.ctc_decode(pred, input_length=input_len,
                                       greedy=True)[0][0][:, :max_len]
    output_text = []
    for j in results:
        j = tf.gather(j, tf.where(tf.math.not_equal(j, -1)))
        j = tf.strings.reduce_join(no_map_string(j)).numpy().decode("utf-8")
        output_text.append(j)
    return output_text

# Let's check results on some test samples.
for test in test_final.take(1):
    test_images = test["image"]
    _, ax = plt.subplots(4, 4, figsize=(15, 8))
    predit = prediction_model.predict(test_images)
    predit_text = handwriting_prediction(predit)
    for k in range(16):
        img = test_images[k]
        img = tf.image.flip_left_right(img)
        img = tf.transpose(img, perm=[1, 0, 2])
        img = (img * 255.0).numpy().clip(0, 255).astype(np.uint8)
        img = img[:, :, 0]
        title = f"Prediction: {predit_text[k]}"
        # Show each test image with its prediction, mirroring the
        # ground-truth visualization above.
        ax[k // 4, k % 4].imshow(img)
        ax[k // 4, k % 4].set_title(title)
        ax[k // 4, k % 4].axis("on")
plt.show()

Writing this was not easy; if you found it helpful, please like, follow, and bookmark~~~