merge_spans

merge_spans(text, spans, skip_spaces=True)

Merge TextSpan objects that overlap or, when skip_spaces is True, that are separated only by whitespace.

Parameters

Name         Type            Description                                                             Default
text         str             The text from which the spans were extracted                            required
spans        List[TextSpan]  The TextSpan objects to check for overlaps and merge                    required
skip_spaces  bool            If True, also merge non-overlapping spans separated only by whitespace  True

Returns

Type            Description
List[TextSpan]  The merged TextSpan objects with their start/end positions

Examples

>>> text = "Hello world! This is a test find the world in here."
>>> loc = txt_locate_all(text, "world")
>>> combined = merge_spans(text, loc, skip_spaces=True)
>>> loc = txt_locate_all(text, "world") + txt_locate_all(text, " wor")
>>> loc
[TextSpan(text='world', start=6, end=11), TextSpan(text='world', start=37, end=42), TextSpan(text=' wor', start=5, end=9), TextSpan(text=' wor', start=36, end=40)]
>>> combined = merge_spans(text, loc, skip_spaces=True)
>>> combined
[TextSpan(text=' world', start=5, end=11), TextSpan(text=' world', start=36, end=42)]
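
The merging behaviour described above can be approximated with a short sort-and-sweep routine. This is only a sketch under stated assumptions, not the library's implementation: `Span` is a hypothetical stand-in for TextSpan, `merge_spans_sketch` is an illustrative name, offsets are assumed to be half-open (`end` exclusive, as in the example output), and spans that touch with no gap are assumed to merge as well.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for TextSpan: text plus [start, end) offsets."""
    text: str
    start: int
    end: int


def merge_spans_sketch(text: str, spans: List[Span], skip_spaces: bool = True) -> List[Span]:
    """Merge overlapping or touching spans; optionally bridge whitespace-only gaps."""
    merged: List[List[int]] = []  # running list of [start, end] pairs
    for span in sorted(spans, key=lambda s: (s.start, s.end)):
        if merged:
            last = merged[-1]
            # Text between the previous merged span and this one;
            # empty when the two overlap or touch.
            gap = text[last[1]:span.start]
            if span.start <= last[1] or (skip_spaces and gap.isspace()):
                last[1] = max(last[1], span.end)  # extend the previous span
                continue
        merged.append([span.start, span.end])
    # Rebuild the span text from the original string so it covers the merged range.
    return [Span(text[s:e], s, e) for s, e in merged]


text = "Hello world! This is a test find the world in here."
spans = [Span("world", 6, 11), Span("world", 37, 42),
         Span(" wor", 5, 9), Span(" wor", 36, 40)]
print(merge_spans_sketch(text, spans))

# A whitespace-only gap is bridged only when skip_spaces is True.
gapped = [Span("foo", 0, 3), Span("bar", 6, 9)]
print(merge_spans_sketch("foo   bar", gapped, skip_spaces=True))
print(merge_spans_sketch("foo   bar", gapped, skip_spaces=False))
```

Sorting by start position first means each span only has to be compared with the most recently merged one, so the sweep is a single pass after the sort.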