Saturday, 27 June 2015

Z Algorithm

The concept of Z Algorithm is quite simple.
First of all, let us see what is a Z function or a Z array. At any index i, Z[i] stores longest match between
String[i,n] and String[0,n].
For example,
String                Z array
a                        1
aa                       21
aba                     301
ababa                 50301
tests                    50010

So, I hope after reading these examples i hope, its quite clear what a Z array is.
Let us redefine it, Z array stores longest match between all suffixes of a string and string itself. Or we can say if for any i, Z[i]>0 then that suffix is also a prefix of that string with length Z[i].

It has various applications in substring matching and finding count of a substring in a string.
Now let us see how to compute this array.

O(n^2) Algorithm : The Brute Force One.

Here, we one by one see each and every suffix and match it with the string.
for(i=0 to n-1)                                                             // assuming 0 indexing, n=string length
        {
            String suffix=str.substring(i,n);                      // suffix starting from i
            int m=suffix.length();                                    // m=suffix length
            int match=0;                                                 // for storing current longest match.
            for(int j=0;j<m;j++)
            {
                if(str.charAt(j)==suffix.charAt(j)) match++;           // if match found increment match
                else break;                                                                 // else break
            }
            Z[i]=match;                                                  // store the match in Z[i].
        }

In this way we can get the Z function in O(n^2) time, but as we can see all later suffixes are also suffixes of previous ones.We can use this fact to come up with a better approach.

O(n) Algorithm : The Smart One.

As, I have said earlier we can use a fact that we can use the knowledge of previous computed longest match of a previous suffix.To do this we will use two variables L and R (interval) .These two variables for each index will store length of longest substring string[L..R] that is also a prefix of the string.Now how are these two values useful, as we can see in case of a match we can Z[i]=R-L+1; and if no match we need to make R such that Z[i] is 0 and L will be start of that match.

For example if string is ababa
Index              L:R
 0                   0:4
1                   1:0
2                   2:4
3                   3:2
4                   4:4
Now let us see how the algorithm works,suppose we have calculated the values of L and R for an index i, then for an index i+1, we have two simple conditions,if(i>R) then we need to compute new values of L and R,else we can either use a precomputed Z value or compute a new interval.

Now let us see how the algorithm works (source : codeforces blog)

        int L=0,R=0;                                                   // interval L and R initialize
        Z[0]=n;                                                          // this will always be n
        for(i=1 to n)                                                   // start with 1 now.
        {
            if(i>R)                                                       // our initial case
            {
                L=R=i;                                                 // set L and R initially before matching
                while(R<n&&str.charAt(R)==str.charAt(R-i)) R++;          //new R
                Z[i]=R-L;                                            // store Z
                R--;                                                      // bring R back, since last R did not matched.
            }
            else
            {
                int k=i-L;                                           //length of current block of L...R
                if(Z[k]<R-i+1) Z[i]=Z[k];              //since L...R is equal to 0...R-L+1 (match thats why L,R)
                else
                {
                    L=i;                                             // in this case a greater match is possible
                    while(R<n&&str.charAt(R)==str.charAt(R-i)) R++;       //compute a new R
                    Z[i]=R-L;R--;                             // store the new match
                }
            }
        }

In this way, we can compute the Z array.
Problems : CHSTR

No comments:

Post a Comment