merge_spans
merge_spans(text, spans, skip_spaces=True)
Combine TextSpan elements that overlap, or that are adjacent and separated only by whitespace.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text string from which the spans were extracted | required |
| spans | List[TextSpan] | A list of TextSpan objects to check for overlaps and combine | required |
| skip_spaces | bool | If True, also combine spans that do not overlap but have only whitespace between them | True |
Returns
| Type | Description |
|---|---|
| List[TextSpan] | List of merged TextSpan objects, each with the start/end positions of the combined span |
Examples
>>> text = "Hello world! This is a test find the world in here."
>>> loc = txt_locate_all(text, "world")
>>> combined = merge_spans(text, loc, skip_spaces=True)
>>> loc = txt_locate_all(text, "world") + txt_locate_all(text, " wor")
>>> loc
[TextSpan(text='world', start=6, end=11), TextSpan(text='world', start=37, end=42), TextSpan(text=' wor', start=5, end=9), TextSpan(text=' wor', start=36, end=40)]
>>> combined = merge_spans(text, loc, skip_spaces=True)
>>> combined
[TextSpan(text=' world', start=5, end=11), TextSpan(text=' world', start=36, end=42)]
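
To make the merging rule concrete, here is a minimal sketch of one way such a function could work. It is not the library's implementation: `Span` is a hypothetical stand-in for `TextSpan`, `merge_spans_sketch` is an illustrative name, and treating spans that merely touch (one ends exactly where the next starts) as mergeable is an assumption.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for the library's TextSpan."""
    text: str
    start: int
    end: int


def merge_spans_sketch(text: str, spans: List[Span], skip_spaces: bool = True) -> List[Span]:
    """Merge overlapping spans and, optionally, spans separated only by whitespace."""
    merged: List[Span] = []
    # Sort by position so each span only has to be compared with the last merged one.
    for span in sorted(spans, key=lambda s: (s.start, s.end)):
        if merged:
            last = merged[-1]
            # Characters strictly between the two spans (empty if they overlap or touch).
            gap = text[last.end:span.start]
            overlaps = span.start <= last.end  # assumption: touching spans count as overlapping
            only_spaces = skip_spaces and gap.isspace()
            if overlaps or only_spaces:
                new_end = max(last.end, span.end)
                # Re-slice the source text so the merged span's text matches its positions.
                merged[-1] = Span(text[last.start:new_end], last.start, new_end)
                continue
        merged.append(span)
    return merged


text = "Hello world! This is a test find the world in here."
spans = [Span("world", 6, 11), Span("world", 37, 42), Span(" wor", 5, 9), Span(" wor", 36, 40)]
print(merge_spans_sketch(text, spans))
# [Span(text=' world', start=5, end=11), Span(text=' world', start=36, end=42)]
```

With the spans from the example above, the sketch yields the same start/end positions as the documented output. Sorting first keeps the merge to a single pass, at the cost of not preserving the input order of the spans.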