Code Templates
Rolling Hash
Use Cases for Rolling Hash
| Use Case | Description | Example |
|---|---|---|
| Substring fingerprint in O(1) | After O(n) preprocessing, hash any substring [left, right] in constant time. | Compare s[2:5] and s[10:13] without rescanning characters. |
| Pattern search (Rabin–Karp) | Slide a fixed-length window; compare needle hash to each window hash, then verify on match. | Find first index of needle in haystack. 28. Find the Index of the First Occurrence in a String |
| Repeated substring detection | Track window hashes in a set; duplicates imply a repeated substring of that length. | 187. Repeated DNA Sequences — length-10 windows on a genome string. |
| Binary search on answer + hash check | For “longest substring with property X”, binary search length L and use rolling hash to test in O(n). | 1044. Longest Duplicate Substring |
| Palindrome / prefix checks (with reverse hash) | Maintain forward and backward hashes to test palindromes or prefix equality quickly. | 214. Shortest Palindrome |
Formula: hash(s[left:right+1]) = (prefix[right+1] - prefix[left] * power[length]) % mod
Code Template
Constants and prefix setup
- Pick a base (often 26 or 131) and a large mod (e.g.
10**9 + 7) to reduce collisions. - Precompute
power[i] = base^i mod modandprefix[i]= hash ofs[0:i).
Build prefix hashes
power[i+1] = power[i] * base % mod.prefix[i+1] = prefix[i] * base + ord(s[i])(or mapa→0 …z→25 for lowercase letters).
Substring hash query
- Inclusive range
[left, right]: subtract the prefix beforeleft, scaled by the window length.
Rabin–Karp search helper
- Build
RollingHashon text and pattern; compare pattern hash to each length-mwindow. - On hash match, verify with a direct string compare (hash collisions are possible).
class RollingHash:def __init__(self, s, base=131, mod=10**9 + 7):self.base = baseself.mod = modn = len(s)self.power = [1] * (n + 1)self.prefix = [0] * (n + 1)
Common mistakes
- Wrong substring length in
get_hash
# Wrong: use right instead of window lengthreturn (self.prefix[right + 1] - self.prefix[left] * self.power[right]) % self.mod# Correct:length = right - left + 1return (self.prefix[right + 1] - self.prefix[left] * self.power[length]) % self.mod
- Forgetting collision check
# Wrong: return on hash match onlyif hash_text.get_hash(start, start + m - 1) == target:return start# Correct: verify the actual substringif hash_text.get_hash(start, start + m - 1) == target:if text[start:start + m] == pattern:return start
- Mismatched base/mod between two strings
# Wrong: different mod when comparing haystack and needle hasheshash_text = RollingHash(haystack, mod=10**9 + 7)hash_pattern = RollingHash(needle, mod=10**9 + 9)# Correct: same base and modhash_text = RollingHash(haystack)hash_pattern = RollingHash(needle)
Issues and consideration
- Collisions: Two different strings can share the same hash mod
M. Always confirm with a direct compare in contests and interviews when the answer must be exact. - Double hashing: Use two
(base, mod)pairs and compare both hashes for stricter checks (common on 1044). - Negative modulo: Python
%handles(a - b) % mod; in other languages normalize negative results after subtraction. - Character encoding:
ord(s[i])works for general ASCII; fora–zonly,ord(s[i]) - ord('a')keeps numbers smaller. - Complexity: Preprocess O(n); each
get_hashO(1); Rabin–Karp scan O(n) windows with O(1) hash each (plus occasional verify).
Related Problem
Same RollingHash class and get_hash; only the outer loop changes.
28. Find the Index of the First Occurrence in a String
class RollingHash:def __init__(self, s, base=131, mod=10**9 + 7):self.base = baseself.mod = modn = len(s)self.power = [1] * (n + 1)self.prefix = [0] * (n + 1)for i in range(n):self.power[i + 1] = self.power[i] * self.base % self.modself.prefix[i + 1] = (self.prefix[i] * self.base + ord(s[i])) % self.moddef get_hash(self, left, right):length = right - left + 1return (self.prefix[right + 1] - self.prefix[left] * self.power[length]) % self.moddef rabin_karp(text, pattern):n, m = len(text), len(pattern)if m == 0:return 0if m > n:return -1hash_text = RollingHash(text)hash_pattern = RollingHash(pattern)target = hash_pattern.get_hash(0, m - 1)for start in range(n - m + 1):if hash_text.get_hash(start, start + m - 1) == target:if text[start:start + m] == pattern:return startreturn -1class Solution:def strStr(self, haystack: str, needle: str) -> int:return rabin_karp(haystack, needle)
class RollingHash:def __init__(self, s, base=131, mod=10**9 + 7):self.base = baseself.mod = modn = len(s)self.power = [1] * (n + 1)self.prefix = [0] * (n + 1)for i in range(n):self.power[i + 1] = self.power[i] * self.base % self.modself.prefix[i + 1] = (self.prefix[i] * self.base + ord(s[i])) % self.moddef get_hash(self, left, right):length = right - left + 1return (self.prefix[right + 1] - self.prefix[left] * self.power[length]) % self.modclass Solution:def findRepeatedDnaSequences(self, s: str):n = len(s)window = 10if n < window:return []rh = RollingHash(s)seen = {}result = []for start in range(n - window + 1):h = rh.get_hash(start, start + window - 1)if h in seen:sub = s[start:start + window]if sub not in result:result.append(sub)else:seen[h] = startreturn result