select_characters (Operator)
Name
select_characters: Selects characters from a given region.
Signature
select_characters(Region : RegionCharacters : DotPrint, StrokeWidth, CharWidth, CharHeight, Punctuation, DiacriticMarks, PartitionMethod, PartitionLines, FragmentDistance, ConnectFragments, ClutterSizeMax, StopAfter : )
Herror T_select_characters(const Hobject Region, Hobject* RegionCharacters, const Htuple DotPrint, const Htuple StrokeWidth, const Htuple CharWidth, const Htuple CharHeight, const Htuple Punctuation, const Htuple DiacriticMarks, const Htuple PartitionMethod, const Htuple PartitionLines, const Htuple FragmentDistance, const Htuple ConnectFragments, const Htuple ClutterSizeMax, const Htuple StopAfter)
void SelectCharacters(const HObject& Region, HObject* RegionCharacters, const HTuple& DotPrint, const HTuple& StrokeWidth, const HTuple& CharWidth, const HTuple& CharHeight, const HTuple& Punctuation, const HTuple& DiacriticMarks, const HTuple& PartitionMethod, const HTuple& PartitionLines, const HTuple& FragmentDistance, const HTuple& ConnectFragments, const HTuple& ClutterSizeMax, const HTuple& StopAfter)
HRegion HRegion::SelectCharacters(const HString& DotPrint, const HString& StrokeWidth, const HTuple& CharWidth, const HTuple& CharHeight, const HString& Punctuation, const HString& DiacriticMarks, const HString& PartitionMethod, const HString& PartitionLines, const HString& FragmentDistance, const HString& ConnectFragments, Hlong ClutterSizeMax, const HString& StopAfter) const
HRegion HRegion::SelectCharacters(const char* DotPrint, const char* StrokeWidth, const HTuple& CharWidth, const HTuple& CharHeight, const char* Punctuation, const char* DiacriticMarks, const char* PartitionMethod, const char* PartitionLines, const char* FragmentDistance, const char* ConnectFragments, Hlong ClutterSizeMax, const char* StopAfter) const
HRegion HRegion::SelectCharacters(const wchar_t* DotPrint, const wchar_t* StrokeWidth, const HTuple& CharWidth, const HTuple& CharHeight, const wchar_t* Punctuation, const wchar_t* DiacriticMarks, const wchar_t* PartitionMethod, const wchar_t* PartitionLines, const wchar_t* FragmentDistance, const wchar_t* ConnectFragments, Hlong ClutterSizeMax, const wchar_t* StopAfter) const
(Windows only)
static void HOperatorSet.SelectCharacters(HObject region, out HObject regionCharacters, HTuple dotPrint, HTuple strokeWidth, HTuple charWidth, HTuple charHeight, HTuple punctuation, HTuple diacriticMarks, HTuple partitionMethod, HTuple partitionLines, HTuple fragmentDistance, HTuple connectFragments, HTuple clutterSizeMax, HTuple stopAfter)
HRegion HRegion.SelectCharacters(string dotPrint, string strokeWidth, HTuple charWidth, HTuple charHeight, string punctuation, string diacriticMarks, string partitionMethod, string partitionLines, string fragmentDistance, string connectFragments, int clutterSizeMax, string stopAfter)
def select_characters(region: HObject, dot_print: str, stroke_width: str, char_width: Sequence[int], char_height: Sequence[int], punctuation: str, diacritic_marks: str, partition_method: str, partition_lines: str, fragment_distance: str, connect_fragments: str, clutter_size_max: int, stop_after: str) -> HObject
Description
select_characters selects from a given Region the areas that might be
characters and returns them in RegionCharacters. The selection uses features
such as StrokeWidth, DotPrint, and the size of the characters, among others.
The given Region should be a single united region; otherwise every region is
processed separately. Therefore, do not call connection before calling
select_characters, because fragments or dots would then not be connected to a
character. If you have more than one region with text, you can of course
process them without merging them.
The Region for select_characters typically comes from segment_characters, but
the result of any other segmentation operator can be used as well.
The selection process is divided into four steps. All steps are influenced by
the parameters StrokeWidth, CharHeight, and CharWidth. If you lose small
objects like dots, adapt the minimum CharWidth and the minimum CharHeight.
In addition, some parameters affect the result of one particular step; these
are described below. With the parameter StopAfter you can terminate the
operator after a specified step.
In the first step, 'step1_select_candidates', CharWidth and CharHeight are
used to select the candidates. The result of this step is also affected by
ClutterSizeMax.
In the next step, 'step2_partition_characters', the parameters
PartitionMethod and PartitionLines influence the result.
Step three, 'step3_connect_fragments', uses the parameters ConnectFragments
and DotPrint. If dot-printed characters have to be detected and some dots are
not connected to the character, there are two ways to overcome this problem:
you can increase the FragmentDistance and/or decrease the StrokeWidth.
In the last step, 'step4_select_characters', the result is affected by the
parameters DiacriticMarks and Punctuation.
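As a reading aid, the relationship between steps and parameters described
above can be written down as a small lookup table. The following Python sketch
is illustrative only: GLOBAL_PARAMS, STEP_PARAMS, and first_step_affected_by
are hypothetical helper names, not part of HALCON, and listing
fragment_distance under step three is an interpretation of the text.

```python
# Parameters that, according to the description, influence all four steps.
GLOBAL_PARAMS = ("stroke_width", "char_height", "char_width")

# Parameters named for each step (keys are the StopAfter values).
STEP_PARAMS = {
    "step1_select_candidates": ("char_width", "char_height", "clutter_size_max"),
    "step2_partition_characters": ("partition_method", "partition_lines"),
    "step3_connect_fragments": ("connect_fragments", "dot_print", "fragment_distance"),
    "step4_select_characters": ("diacritic_marks", "punctuation"),
}


def first_step_affected_by(param):
    """Return the earliest step whose result the given parameter can change."""
    for step, names in STEP_PARAMS.items():
        if param in names or param in GLOBAL_PARAMS:
            return step
    raise KeyError(f"unknown parameter: {param}")
```

When tuning a single parameter, the returned step name is a reasonable value
to pass as StopAfter in order to inspect the first intermediate result that
the parameter can change.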
DotPrint:
Should be set to 'true' if dot prints should be read, else to 'false'.
StrokeWidth:
Specifies the stroke width of the text. It is used to calculate internally
used mask sizes to determine the characters. These mask sizes are also
influenced by the parameters DotPrint, the average CharWidth, and the
average CharHeight.
CharWidth:
This can be a tuple with up to three values. The first value is the
average width of a character. The second is the minimum width of a
character and the third is the maximum width of a character.
If the minimum is not set or is equal to -1, the operator automatically sets
this value depending on the average CharWidth. The same is the case if the
maximum is not set. Some examples:
- [10] sets the average character width to 10; the minimum and maximum are
  calculated by the operator.
- [10,-1,20] sets the average character width to 10; the minimum is
  calculated by the operator, and the maximum is set to 20.
- [10,5,20] sets the average character width to 10, the minimum to 5,
  and the maximum to 20.
CharHeight:
This can be a tuple with up to three values. The first value is the
average height of a character. The second is the minimum height of a
character and the third is the maximum height of a character.
If the minimum is not set or is equal to -1, the operator automatically sets
this value depending on the average CharHeight. The same is the case if the
maximum is not set. Some examples:
- [10] sets the average character height to 10; the minimum and maximum are
  calculated by the operator.
- [10,-1,20] sets the average character height to 10; the minimum is
  calculated by the operator, and the maximum is set to 20.
- [10,5,20] sets the average character height to 10, the minimum to 5,
  and the maximum to 20.
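The tuple convention shared by CharWidth and CharHeight can be made explicit
with a small helper. This is a minimal Python sketch, not HALCON code:
SizeSpec and parse_size are hypothetical names, None stands for "left to the
operator", and treating -1 as "automatic" for the maximum as well is an
assumption (the text only states it for the minimum). How the operator
derives the automatic values is not documented here, so the sketch does not
attempt to compute them.

```python
from typing import NamedTuple, Optional, Sequence


class SizeSpec(NamedTuple):
    average: int
    minimum: Optional[int]  # None means "let the operator choose"
    maximum: Optional[int]  # None means "let the operator choose"


def parse_size(values: Sequence[int]) -> SizeSpec:
    """Split a CharWidth/CharHeight tuple into average, minimum, maximum."""
    if not 1 <= len(values) <= 3:
        raise ValueError("expected one to three values")
    # Pad missing entries with -1, the documented "automatic" marker.
    average, minimum, maximum = list(values) + [-1] * (3 - len(values))
    if average < 1:
        raise ValueError("the average must be at least 1")
    return SizeSpec(
        average,
        None if minimum == -1 else minimum,
        # Assumption: -1 is also accepted as "automatic" for the maximum.
        None if maximum == -1 else maximum,
    )
```

For instance, parse_size([10, -1, 20]) yields an average of 10, an automatic
minimum, and a maximum of 20, matching the second example above.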
Punctuation:
Set this parameter to 'true' if the operator also has to detect
punctuation marks (e.g., .,:'`"), otherwise they will be suppressed.
DiacriticMarks:
Set this parameter to 'true' if the text in your application
contains diacritic marks (e.g., â,é,ö), or to 'false' to suppress them.
PartitionMethod:
If neighboring characters are printed close to each other, they may be
partly merged. This parameter specifies the method used to partition
such characters. The possible values are:
- 'none': no partitioning is performed.
- 'fixed_width': the partitioning assumes a constant character width. If the
  width of the extracted region is well above the average CharWidth, the
  region is split into parts that have the given average CharWidth. The
  partitioning starts at the left border of the region.
- 'variable_width': the characters are partitioned at the position where
  they have the thinnest connection. This method can be selected for
  characters that are printed with a variable-width font or if many
  consecutive characters are extracted as one symbol.
It may be helpful to call text_line_slant and/or text_line_orientation
before calling select_characters.
PartitionLines:
If some text lines or some characters of different text lines are connected,
set this parameter to 'true'.
FragmentDistance:
This parameter influences the connection of character fragments. If too
much is connected, set the parameter to 'narrow' or 'medium'. If more
fragments should be connected, set the parameter to 'medium' or 'wide'.
The connection is also influenced by the maximum of CharWidth and
CharHeight. See also ConnectFragments.
ConnectFragments:
Set this parameter to 'true' if the extracted symbols are
fragmented, i.e., if a symbol is not extracted as one region but broken
up into several parts. See also FragmentDistance and the StopAfter
step 'step3_connect_fragments'.
ClutterSizeMax:
If the extracted characters contain clutter, i.e., small regions near the
actual symbols, increase this value. If parts of the symbols are missing,
decrease this value.
StopAfter:
Use this parameter if the operator does not produce the desired results.
Depending on its value, the operator stops after the execution of the
selected step and returns the corresponding intermediate result.
To run all steps, set StopAfter to 'completion'.
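One way to use StopAfter for troubleshooting is to run the operator once per
step and compare the intermediate regions. The following Python sketch only
shows the control flow: run_stepwise is a hypothetical helper, and select_fn
stands for any callable with the parameter order of the Python signature
listed above (region, eleven control parameters, then stop_after). Nothing
here calls HALCON directly.

```python
# StopAfter values in execution order, as listed in this reference.
STEPS = (
    "step1_select_candidates",
    "step2_partition_characters",
    "step3_connect_fragments",
    "step4_select_characters",
    "completion",
)


def run_stepwise(select_fn, region, params):
    """Call select_fn once per StopAfter value and collect the results.

    params holds the eleven control parameters that precede stop_after,
    in signature order (dot_print ... clutter_size_max).
    """
    if len(params) != 11:
        raise ValueError("expected the 11 control parameters before stop_after")
    results = {}
    for step in STEPS:
        # Same input and parameters every time; only stop_after changes.
        results[step] = select_fn(region, *params, step)
    return results
```

Displaying the entries of the returned dictionary in order shows at which
step characters are lost or wrongly merged, which in turn indicates the
parameters to adjust.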
Execution Information
- Multithreading type: reentrant (runs in parallel with non-exclusive operators).
- Multithreading scope: global (may be called from any thread).
- Automatically parallelized on tuple level.
Parameters
Region (input_object) region(-array) → object / HRegion / HObject / Hobject
Region of text lines in which to select the characters.
RegionCharacters (output_object) region(-array) → object / HRegion / HObject / Hobject *
Selected characters.
DotPrint (input_control) string → HTuple / str / Htuple (string) (HString) (char*)
Should dot-print characters be detected?
Default: 'false'
List of values: 'false', 'true'
StrokeWidth (input_control) string → HTuple / str / Htuple (string) (HString) (char*)
Stroke width of a character.
Default: 'medium'
List of values: 'bold', 'light', 'medium', 'ultra_light'
CharWidth (input_control) integer-array → HTuple / Sequence[int] / Htuple (integer) (int / long) (Hlong)
Width of a character.
Default: 25
Value range: 1 ≤ CharWidth
CharHeight (input_control) integer-array → HTuple / Sequence[int] / Htuple (integer) (int / long) (Hlong)
Height of a character.
Default: 25
Value range: 1 ≤ CharHeight
Punctuation (input_control) string → HTuple / str / Htuple (string) (HString) (char*)
Should punctuation marks be detected?
Default: 'false'
List of values: 'false', 'true'
DiacriticMarks (input_control) string → HTuple / str / Htuple (string) (HString) (char*)
Does the text contain diacritic marks?
Default: 'false'
List of values: 'false', 'true'
PartitionMethod (input_control) string → HTuple / str / Htuple (string) (HString) (char*)
Method to partition neighboring characters.
Default: 'none'
List of values: 'fixed_width', 'none', 'variable_width'
PartitionLines (input_control) string → HTuple / str / Htuple (string) (HString) (char*)
Should lines be partitioned?
Default: 'false'
List of values: 'false', 'true'
FragmentDistance (input_control) string → HTuple / str / Htuple (string) (HString) (char*)
Distance of fragments.
Default: 'medium'
List of values: 'medium', 'narrow', 'wide'
ConnectFragments (input_control) string → HTuple / str / Htuple (string) (HString) (char*)
Connect fragments?
Default: 'false'
List of values: 'false', 'true'
ClutterSizeMax (input_control) integer → HTuple / int / Htuple (integer) (int / long) (Hlong)
Maximum size of clutter.
Default: 0
Value range: 0 ≤ ClutterSizeMax
StopAfter (input_control) string → HTuple / str / Htuple (string) (HString) (char*)
Stop execution after this step.
Default: 'completion'
List of values: 'completion', 'step1_select_candidates', 'step2_partition_characters', 'step3_connect_fragments', 'step4_select_characters'
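Because most control parameters are strings with a fixed list of values, a
misspelled value is an easy mistake to make. The following Python sketch
collects the lists of values given above into a table and reports entries
that fall outside them. ALLOWED_VALUES and invalid_string_params are
hypothetical names for illustration; the check is independent of HALCON and
does not replace the operator's own parameter validation.

```python
# Lists of values copied from the parameter descriptions above.
_BOOL = ("false", "true")
ALLOWED_VALUES = {
    "dot_print": _BOOL,
    "stroke_width": ("bold", "light", "medium", "ultra_light"),
    "punctuation": _BOOL,
    "diacritic_marks": _BOOL,
    "partition_method": ("fixed_width", "none", "variable_width"),
    "partition_lines": _BOOL,
    "fragment_distance": ("medium", "narrow", "wide"),
    "connect_fragments": _BOOL,
    "stop_after": (
        "completion",
        "step1_select_candidates",
        "step2_partition_characters",
        "step3_connect_fragments",
        "step4_select_characters",
    ),
}


def invalid_string_params(params):
    """Return {name: value} for each string parameter outside its list of values."""
    return {
        name: value
        for name, value in params.items()
        if name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]
    }
```

Parameters that are not string-valued (for example clutter_size_max) are
ignored by the check, so a complete keyword dictionary can be passed as is.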
Example (HDevelop)
for Index := 1 to 5 by 1
  read_image (Image, 'dot_print_rotated/dot_print_rotated_'+Index$'02d')
  text_line_orientation (Image, Image, 50, rad(-30), rad(30), \
                         OrientationAngle)
  rotate_image (Image, ImageRotate, deg(-OrientationAngle), 'constant')
  segment_characters (ImageRotate, ImageRotate, ImageForeground, \
                      RegionForeground, 'local_auto_shape', 'false', \
                      'false', 'medium', 25, 25, 0, 10, UsedThreshold)
  select_characters (RegionForeground, RegionCharacters, 'true', \
                     'ultra_light', [60,1,100], [60,1,100], 'false', \
                     'false', 'none', 'true', 'wide', 'true', 0, 'completion')
endfor
Result
If the input parameters are set correctly, the operator select_characters
returns the value 2 (H_MSG_TRUE). Otherwise an exception will be raised.
Possible Predecessors
segment_characters, text_line_slant
Possible Successors
do_ocr_single_class_mlp, do_ocr_multi_class_mlp
Alternatives
connection
Module
Foundation