COMBINE_LANG_MODEL(1)
NAME
combine_lang_model - generate starter traineddata
SYNOPSIS
combine_lang_model --input_unicharset filename --script_dir dirname --output_dir rootdir --lang lang [--lang_is_rtl] [--pass_through_recoder] [--words file --puncs file --numbers file]
DESCRIPTION
combine_lang_model(1) generates a starter traineddata file that can be used to train an LSTM-based neural network model. It takes as input a unicharset and an optional set of wordlists. It eliminates the need to run set_unicharset_properties(1), wordlist2dawg(1), a separate step to generate the recoder (unicode compressor), for which no standalone binary exists, and finally combine_tessdata(1).
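A minimal invocation using only the flags that the SYNOPSIS shows without brackets might look like the sketch below. The language code and all paths are hypothetical placeholders, and the guard around the call is an addition for safety, not something the tool itself requires.

```shell
#!/bin/sh
# Hypothetical inputs; substitute your own language code and paths.
LANG_CODE=eng
UNICHARSET=./eng.unicharset   # unicharset to start from (assumed name)
SCRIPT_DIR=./langdata         # directory passed to --script_dir (assumed)
OUTPUT_DIR=./starter          # root directory for the generated files (assumed)

# Run only when the binary is installed and the unicharset exists,
# so the script fails softly on machines without Tesseract training tools.
if command -v combine_lang_model >/dev/null 2>&1 && [ -f "$UNICHARSET" ]; then
  combine_lang_model \
    --input_unicharset "$UNICHARSET" \
    --script_dir "$SCRIPT_DIR" \
    --output_dir "$OUTPUT_DIR" \
    --lang "$LANG_CODE"
  STATUS=$?
else
  echo "skipped: combine_lang_model or $UNICHARSET not found" >&2
  STATUS=127
fi
```

STATUS holds the exit code of the tool, or 127 when the run was skipped.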
OPTIONS
--lang lang
--script_dir PATH
--input_unicharset FILE
--lang_is_rtl BOOL
--pass_through_recoder BOOL
--version_str STRING
--words FILE
--numbers FILE
--puncs FILE
--output_dir PATH
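The optional flags can be layered on top of the required ones. The sketch below is a dry run that only prints the command it would execute; it assumes a right-to-left language and hypothetical wordlist file names, and it adds --words, --puncs and --numbers together, as the SYNOPSIS groups them, only when all three files are present.

```shell
#!/bin/sh
# All names below are hypothetical placeholders.
LANG_CODE=ara
WORDS=./ara.wordlist
PUNCS=./ara.punc
NUMBERS=./ara.numbers

# Start with the flags the SYNOPSIS lists without brackets.
ARGS="--input_unicharset ./ara.unicharset --script_dir ./langdata --output_dir ./starter --lang $LANG_CODE"

# Arabic is written right-to-left, so add the optional flag.
ARGS="$ARGS --lang_is_rtl"

# Add the three wordlist flags as a group, and only if every file exists.
if [ -f "$WORDS" ] && [ -f "$PUNCS" ] && [ -f "$NUMBERS" ]; then
  ARGS="$ARGS --words $WORDS --puncs $PUNCS --numbers $NUMBERS"
fi

# Dry run: print the command instead of executing it.
echo "combine_lang_model $ARGS"
```

Printing first makes it easy to check the flag set before committing to a real run.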
HISTORY
combine_lang_model(1) was first made available for tesseract 4.00.00alpha.
RESOURCES
Main web site: https://github.com/tesseract-ocr
Information on training tesseract LSTM: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
SEE ALSO
COPYING
Copyright (C) 2012 Google, Inc. Licensed under the Apache License, Version 2.0
AUTHOR
The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).
11/17/2021