Message328593
| Author | Springem Springsbee |
| Recipients | Springem Springsbee, terry.reedy |
| Date | 2018-10-26.19:00:41 |
| SpamBayes Score | -1.0 |
| Marked as misclassified | Yes |
| Message-id | <1540580441.56.0.788709270274.issue35079@psf.upfronthosting.co.za> |
| In-reply-to | |
| Content |
Hello, I'm using difflib's SequenceMatcher to locate common substrings. It seems the matcher is missing a common substring. I'm guessing this is a rather low-level issue in difflib. The autojunk parameter has no effect for obvious reasons. Alternate pairwise comparisons between the following 3 strings omit the 2-character match 'AC':
GATTACA
TAGACCA
ATACA
The following GitHub gist captures the issue, which I'll repeat here for redundancy: https://gist.github.com/MatthewRalston/b0ab6ac1dbe322cb12063310ccdbb786
>>> import difflib
>>> string1 = "TAGACCA"
>>> string2 = "ATACA"
>>> s = difflib.SequenceMatcher(None, string1, string2)
>>> blox = s.get_matching_blocks()
>>> print(blox)
[Match(a=0, b=1, size=2), Match(a=5, b=3, size=2), Match(a=7, b=5, size=0)] # Missing Match(a=3, b=2, size=2)
>>> print([string1[x.a:x.a+x.size] for x in blox if x.size > 1])
['TA', 'CA'] # Missing the substring 'AC' |
|
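For comparison, the common substrings the report expects can be listed by brute force. The sketch below is not part of difflib and does not come from the original report; `common_substrings` and its `min_size` parameter are hypothetical names introduced here. It assumes that "common substring" means a maximal run of equal characters starting at any pair of positions in the two strings.

```python
def common_substrings(a, b, min_size=2):
    """Return sorted (a_index, b_index, size) triples, one per maximal common run.

    Hypothetical helper for illustration; it is not a difflib API.
    """
    found = []
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                # The run is maximal when it cannot be extended to the right.
                at_end = i == len(a) or j == len(b) or a[i] != b[j]
                if at_end and cur[j] >= min_size:
                    found.append((i - cur[j], j - cur[j], cur[j]))
        prev = cur
    return sorted(found)


string1 = "TAGACCA"
string2 = "ATACA"
runs = common_substrings(string1, string2)
print(runs)                                   # [(0, 1, 2), (3, 2, 2), (5, 3, 2)]
print([string1[i:i + n] for i, _, n in runs])  # ['TA', 'AC', 'CA']
```

The difflib documentation describes the triples returned by get_matching_blocks() as monotonically increasing in both indices, so that list represents a single in-order alignment rather than every common substring. In this example 'AC' starts at b=2, a position the 'TA' block (b=1 to b=2) already covers, which may be why it does not appear in the output above.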
History

| Date | User | Action | Args |
|---|---|---|---|
| 2018-10-26 19:00:41 | Springem Springsbee | set | recipients: + Springem Springsbee, terry.reedy |
| 2018-10-26 19:00:41 | Springem Springsbee | set | messageid: <1540580441.56.0.788709270274.issue35079@psf.upfronthosting.co.za> |
| 2018-10-26 19:00:41 | Springem Springsbee | link | issue35079 messages |
| 2018-10-26 19:00:41 | Springem Springsbee | create | |