Rolling Hash

Use Cases for Rolling Hash

Use Case	Description	Example
Substring fingerprint in O(1)	After O(n) preprocessing, hash any substring `[left, right]` in constant time.	Compare `s[2:5]` and `s[10:13]` without rescanning characters.
Pattern search (Rabin–Karp)	Slide a fixed-length window; compare needle hash to each window hash, then verify on match.	Find first index of `needle` in `haystack`. 28. Find the Index of the First Occurrence in a String
Repeated substring detection	Track window hashes in a set; duplicates imply a repeated substring of that length.	187. Repeated DNA Sequences — length-10 windows on a genome string.
Binary search on answer + hash check	For “longest substring with property X”, binary search length `L` and use rolling hash to test in O(n).	1044. Longest Duplicate Substring
Palindrome / prefix checks (with reverse hash)	Maintain forward and backward hashes to test palindromes or prefix equality quickly.	214. Shortest Palindrome

Formula: hash(s[left:right+1]) = (prefix[right+1] - prefix[left] * power[length]) % mod

Code Template

Constants and prefix setup

Pick a base (often 26 or 131) and a large mod (e.g. 10**9 + 7) to reduce collisions.
Precompute power[i] = base^i mod mod and prefix[i] = hash of s[0:i).

Build prefix hashes

power[i+1] = power[i] * base % mod.
prefix[i+1] = prefix[i] * base + ord(s[i]) (or map a→0 … z→25 for lowercase letters).

Substring hash query

Inclusive range [left, right]: subtract the prefix before left, scaled by the window length.

Rabin–Karp search helper

Build RollingHash on text and pattern; compare pattern hash to each length-m window.
On hash match, verify with a direct string compare (hash collisions are possible).

class RollingHash:
    def __init__(self, s, base=131, mod=10**9 + 7):
        self.base = base
        self.mod = mod
        n = len(s)
        self.power = [1] * (n + 1)
        self.prefix = [0] * (n + 1)

Common mistakes

Wrong substring length in get_hash

# Wrong: use right instead of window length
return (self.prefix[right + 1] - self.prefix[left] * self.power[right]) % self.mod

# Correct:
length = right - left + 1
return (self.prefix[right + 1] - self.prefix[left] * self.power[length]) % self.mod

Forgetting collision check

# Wrong: return on hash match only
if hash_text.get_hash(start, start + m - 1) == target:
    return start

# Correct: verify the actual substring
if hash_text.get_hash(start, start + m - 1) == target:
    if text[start:start + m] == pattern:
        return start

Mismatched base/mod between two strings

# Wrong: different mod when comparing haystack and needle hashes
hash_text = RollingHash(haystack, mod=10**9 + 7)
hash_pattern = RollingHash(needle, mod=10**9 + 9)

# Correct: same base and mod
hash_text = RollingHash(haystack)
hash_pattern = RollingHash(needle)

Issues and consideration

Collisions: Two different strings can share the same hash mod M. Always confirm with a direct compare in contests and interviews when the answer must be exact.
Double hashing: Use two (base, mod) pairs and compare both hashes for stricter checks (common on 1044).
Negative modulo: Python % handles (a - b) % mod; in other languages normalize negative results after subtraction.
Character encoding: ord(s[i]) works for general ASCII; for a–z only, ord(s[i]) - ord('a') keeps numbers smaller.
Complexity: Preprocess O(n); each get_hash O(1); Rabin–Karp scan O(n) windows with O(1) hash each (plus occasional verify).

Same RollingHash class and get_hash; only the outer loop changes.

28. Find the Index of the First Occurrence in a String

class RollingHash:
    def __init__(self, s, base=131, mod=10**9 + 7):
        self.base = base
        self.mod = mod
        n = len(s)
        self.power = [1] * (n + 1)
        self.prefix = [0] * (n + 1)

        for i in range(n):
            self.power[i + 1] = self.power[i] * self.base % self.mod
            self.prefix[i + 1] = (self.prefix[i] * self.base + ord(s[i])) % self.mod

    def get_hash(self, left, right):
        length = right - left + 1
        return (self.prefix[right + 1] - self.prefix[left] * self.power[length]) % self.mod

def rabin_karp(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1

    hash_text = RollingHash(text)
    hash_pattern = RollingHash(pattern)
    target = hash_pattern.get_hash(0, m - 1)

    for start in range(n - m + 1):
        if hash_text.get_hash(start, start + m - 1) == target:
            if text[start:start + m] == pattern:
                return start
    return -1

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        return rabin_karp(haystack, needle)

187. Repeated DNA Sequences

class RollingHash:
    def __init__(self, s, base=131, mod=10**9 + 7):
        self.base = base
        self.mod = mod
        n = len(s)
        self.power = [1] * (n + 1)
        self.prefix = [0] * (n + 1)

        for i in range(n):
            self.power[i + 1] = self.power[i] * self.base % self.mod
            self.prefix[i + 1] = (self.prefix[i] * self.base + ord(s[i])) % self.mod

    def get_hash(self, left, right):
        length = right - left + 1
        return (self.prefix[right + 1] - self.prefix[left] * self.power[length]) % self.mod

class Solution:
    def findRepeatedDnaSequences(self, s: str):
        n = len(s)
        window = 10
        if n < window:
            return []

        rh = RollingHash(s)
        seen = {}
        result = []

        for start in range(n - window + 1):
            h = rh.get_hash(start, start + window - 1)
            if h in seen:
                sub = s[start:start + window]
                if sub not in result:
                    result.append(sub)
            else:
                seen[h] = start

        return result

Rolling Hash

Use Cases for Rolling Hash

Code Template

Constants and prefix setup

Build prefix hashes

Substring hash query

Rabin–Karp search helper

Common mistakes

Issues and consideration

Related Problem