smithwaterman.Smith_Waterman
smithwaterman.Smith_Waterman(a, b, method='default')
Perform multiple pairwise alignments with the Smith-Waterman algorithm to look up one or more short texts in a longer text, for example to search for several names in a text.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| a | str | the text to search in | required |
| b | str or List[str] | the text, or list of texts, to look up in a | required |
| method | str | either 'default' or 'biopython' | 'default' |
Attributes
| Name | Type | Description |
|---|---|---|
| a | str | the text to search in |
| b | str or List[str] | the text, or list of texts, to look up in a |
| n | int | the length of b |
| match | object | the result of the match between a and b, as produced by optimal_alignment or recursive_alignment |
| method | str | either 'default' or 'biopython' |
Examples
>>> from textalignment import Smith_Waterman
>>> import pandas as pd
>>> text = "I am looking for John McNrow, where can I find John McAndRow?"
>>> ##
>>> ## Optimal alignment
>>> ##
>>> sw = Smith_Waterman(a = text, b = ["John McEnroe", "McEnroe John"])
>>> m = sw.optimal_alignment(type = 'characters')
>>> m = sw.optimal_alignment(type = 'characters', method = 'default')
>>> m = sw.optimal_alignment(type = 'characters', method = 'biopython')
>>> similarities = sw.as_data_frame()
>>> sw = Smith_Waterman(a = text, b = "John McEnroe")
>>> m = sw.optimal_alignment(type = 'characters')
>>> m = sw.optimal_alignment(type = 'characters', method = 'default')
>>> m = sw.optimal_alignment(type = 'characters', method = 'biopython')
>>> similarities = sw.as_data_frame()
>>>
>>>
>>> from textalignment import Smith_Waterman
>>> import pandas as pd
>>> text = "I am looking for John McNrow, where can I find John McAndRow?"
>>> ##
>>> ## Recursive alignment
>>> ##
>>> sw = Smith_Waterman(a = text, b = "John McEnroe")
>>> m = sw.recursive_alignment(type = 'characters', which = 'both', threshold = 0.5)
>>> m = sw.recursive_alignment(type = 'characters', which = 'both', threshold = 0.5, method = 'default')
>>> m = sw.recursive_alignment(type = 'characters', which = 'both', threshold = 0.5, method = 'biopython')
>>> similarities = sw.as_data_frame(m, threshold = 0.7)
>>> sw = Smith_Waterman(a = text, b = ["John McEnroe", "McEnroe John", None])
>>> m = sw.recursive_alignment(type = 'characters', which = 'both', threshold = 0.5)
>>> m = sw.recursive_alignment(type = 'characters', which = 'both', threshold = 0.5, method = 'default')
>>> m = sw.recursive_alignment(type = 'characters', which = 'both', threshold = 0.5, method = 'biopython')
>>> similarities = sw.as_data_frame(m, threshold = 0.7)
>>> similarities = similarities[['a', 'a_from', 'a_to', 'b_similarity', 'b_aligned', 'a_aligned']]
>>> substring(text, start = list(similarities['a_from']), stop = list(similarities['a_to']))
['John McNro', 'John McAndRo']
Methods
| Name | Description |
|---|---|
| optimal_alignment | Use smith_waterman to find the optimal alignment between a and b |
| recursive_alignment | Use smith_waterman_recursive to find the optimal alignment between a and b in a recursive fashion |
| as_data_frame | Returns the matched data as a pandas data frame with a similarity (b_similarity) above a certain threshold |
optimal_alignment
smithwaterman.Smith_Waterman.optimal_alignment(
type='characters',
match=2,
mismatch=-1,
gap=-1,
lower=True,
tokenizer=None,
collapse=None,
edit_mark='#',
method=None,
**kwargs,
)
Use smith_waterman to find the optimal alignment between a and b
See Also
smith_waterman : smith_waterman
Returns
| Name | Type | Description |
|---|---|---|
| | dict or List[dict] | a dictionary of elements as returned by smith_waterman, or a list of these dictionaries if b is a list |
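The match, mismatch and gap arguments in the signature above look like the usual Smith-Waterman scoring parameters. The following is a minimal sketch of how such scores drive a character-level local alignment. It is not the library's implementation: the function name `local_align`, its return value, and the 0-based, end-exclusive positions are assumptions made for illustration only.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1, lower=True):
    """Hypothetical helper: return (score, a_from, a_to) of the best local
    alignment of b inside a, with 0-based, end-exclusive positions in a."""
    if lower:
        a, b = a.lower(), b.lower()
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of an alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    a_to = i
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, a_to


score, start, stop = local_align("xxabcxx", "abc")
print(score, "xxabcxx"[start:stop])
```

With the default scores, three matching characters give a score of 6. Raising the mismatch or gap penalty makes the alignment less tolerant of spelling variants.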
recursive_alignment
smithwaterman.Smith_Waterman.recursive_alignment(
threshold=0.5,
which='both',
type='characters',
match=2,
mismatch=-1,
gap=-1,
lower=True,
tokenizer=None,
collapse=None,
edit_mark='#',
method=None,
**kwargs,
)
Use smith_waterman_recursive to find the optimal alignment between a and b in a recursive fashion
See Also
smith_waterman_recursive : smith_waterman_recursive
Returns
| Name | Type | Description |
|---|---|---|
| | dict or List[dict] | a list of dictionary elements as returned by smith_waterman_recursive, or a list of these lists if b is a list |
as_data_frame
smithwaterman.Smith_Waterman.as_data_frame(data=None, threshold=0)
Returns the matched data as a pandas data frame with a similarity (b_similarity) above a certain threshold
Returns
| Name | Type | Description |
|---|---|---|
| | pd.DataFrame | a pandas data frame with the results of the alignment(s), containing columns a, b, sw, similarity, matches, mismatches, a_n, a_aligned, a_similarity, a_gaps, a_from, a_to, a_fromto, b_n, b_aligned, b_similarity, b_gaps, b_from, b_to, b_fromto |
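The threshold filter described above can be pictured with plain Python records. This is only a sketch: the helper name `keep_above`, the strict "greater than" comparison, and the similarity values are assumptions made up for illustration. Only the column name b_similarity comes from the table above.

```python
# Made-up records for illustration; real rows carry all the columns listed above.
records = [
    {"b": "John McEnroe", "b_similarity": 0.83},
    {"b": "McEnroe John", "b_similarity": 0.42},
]


def keep_above(rows, threshold=0.0):
    """Hypothetical helper: keep rows whose b_similarity exceeds threshold."""
    return [row for row in rows if row["b_similarity"] > threshold]


kept = keep_above(records, threshold=0.7)
print([row["b"] for row in kept])
```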