Problem: String Similarity (Z Function)

*Difficulty: Hard

Problem:

For two strings A and B, we define the similarity of the strings to be the length of the longest prefix common to both strings. For example, the similarity of strings “abc” and “abd” is 2, while the similarity of strings “aaa” and “aaab” is 3.

Calculate the sum of similarities of a string S with each of it’s suffixes.

Input:
The first line contains the number of test cases T. Each of the next T lines contains a string each.

Output:
Output T lines containing the answer for the corresponding test case.

Constraints:
1 <= T <= 10
The length of each string is at most 100000 and contains only lower case characters.

Sample Input:

2
ababaa  
aa

Sample Output:

11  
3

Explanation:
For the first case, the suffixes of the string are “ababaa”, “babaa”, “abaa”, “baa”, “aa” and “a”. The similarities of these strings with the string “ababaa” are 6,0,3,0,1, & 1 respectively. Thus, the answer is 6 + 0 + 3 + 0 + 1 + 1 = 11.

For the second case, the answer is 2 + 1 = 3.
Note: I copy entire problem from hackkerrank just for education purpose, please don’t give me a lawsuit for the license or copyright 🙂

How to solve:
Well, to be honest,  I can only think of two way to solve. Unfortunately, they’re both O(n^2) complexity. Here is the easiest one, which I can instantly think of
Screen Shot 2016-07-15 at 11.10.22 AM
It works, but it’s too slow for all kind of input (I will explain later, because my second method work well on some kind of input, but not all) . The idea is dividing the string from input into substrings, check for the prefix similarity between the substring and the original string.

My second method:

Screen Shot 2016-07-15 at 11.17.07 AM

This work well on some input like this:

ababaa

Start, we got i = 1, l = 0, r = 0, n = 0 (n means next, store the position of the next point, the next point is the first character (count from 1) of the substring that is the same with the character at 0)
– The substring starts from i = 1 is ‘babaa’, prefix similarity = 0 , just because the first letter of the substring is different than the letter at position 0
– The substring starts from i = 2 is ‘abaa’, prefix similarity = 3 (‘aba’), l = 2, r = 4, n = 4. Here we can choose either to step forward from r or n, because they’re both equal 4. So i = 4
– The substring starts from i = 4 is ‘aa’, prefix similarity = 1 (‘a’), l = 4,r=4,n=0. Here because l == r, i will be equal with r + 1. Then reset l,r to zero
– The substring starts from i = 5 is ‘a’, prefix similarity = 1 (‘a’), l = 4, r = 4, n = 0. Here is reach the end of the string, we check l,r. If l is not zero, if l equals r, then plus 1, otherwise, plus ((r-l)+1)
Result = length(s) + 0 + 3 + 1 + 1 = 11
But this method will run with O(n^2) complexity on some input like this:

aaaaaaa

abababababa

 

And here is the ultimate method, aka Z FUNCTION .

I don’t know you can understand it or not, but I don’t have time to understand it, so I skip to the most important part, the implementation of z function .
Screen Shot 2016-07-15 at 11.33.20 AM

And here is the implementation in Python 2 : (stringSimilarity)
Screen Shot 2016-07-15 at 11.34.59 AM

Note the last line of code , we must add length of the input string into the result, because the string itself is also its prefix similarity string, and the z function run from 1.

Unfortunately, I cant update .txt file, which is big testcase file. So you have to spend your hackos to download it.
After all, this is nothing than just implementation of an algorithm. If you can think of the z function without reading the article or searching for google, then you’re genius. :>

 

 

Advertisements