Post

[Cracking Coding Problems] 1. Arrays and Strings

1. Is Unique

1) Brute Force

  • Compare all pairs

  • when to use? : Never, only education

1
2
3
4
5
6
7
8
9
public static boolean isUniqueBruteForce(String s) {
    if (s == null) return true;
    for (int i = 0; i < s.length(); i++) {
        for (int j = i + 1; j < s.length(); j++) {
            if (s.charAt(i) == s.charAt(j)) return false;
        }
    }
    return true;
}
  • Time : O(N^2)

  • Space : O(1)

2) Sorting + Adjacent Comparison

  • If the string is sorted, any duplicatie char will appear next to each other

  • when to use? : If extera memory is not allowed

1
2
3
4
5
6
7
8
9
10
11
12
import java.util.Arrays;

public static boolean isUniqueSort(String s) {
    if (s == null) return true;
    char[] arr = s.toCharArray();
    Arrays.sort(arr);

    for (int i = 1; i < arr.length; i++) {
        if (arr[i] == arr[i - 1]) return false;
    }
    return true;
}
  • Time : O(N logN)

    Arrays.sort(arr) → O(N logN)

    Arrays.sort internally uses Dual-Pivot Quicksort (QuickSort average time = O(N logN))

  • Space

    Depends on sorting implementation → This refers to which sorting algorithm is actually used internally

    • Auxiliary space : O(1)

      Because QuickSort only in-place swaps are needed

    • With recursion stack included

      O(log N) on average

      O(N) in the worst case

  • QuickSort, click here

3) HashSet + Unicode Code Points

  • Store each char seen so far in HashSet. And If it already exists, then it is duplicate

  • In java, char is 16 bits and does not always represent a full Unicode char (e.g emojis)

    • U+0000 ~ U+FFFF → Basic Multilingual Plane (BMP) : represented with a single char

      U+10000 ~ → represented as a surrogate pair, which uses two char values

    • Therefore, iterating with for(char c: s.toCharArray()) may separate pieces

      For such surrogate pairs, use codePointAt to combine the two char values

  • when to use? : For real-world strings where unicode support matter

1
2
3
4
5
6
7
8
9
10
11
12
13
import java.util.*;

public static boolean isUniqueSet(String s) {
    if (s == null) return true;
    Set<Integer> seen = new HashSet<>();

    for (int i = 0; i < s.length(); ) {
        int cp = s.codePointAt(i);
        if (!seen.add(cp)) return false;
        i += Character.charCount(cp);
    }
    return true;
}
  • Time : O(N) average

  • Space : O(min(N, Σ)), Σ = max size of character set

    • char “aaaa” → only one distinct char → space : O(1)

    • char “abcdef” → (N=6, all distinct char) → space : O(6)

    • char “Σ size” → (N=Σ) → space : O(Σ)

4) Boolean Array (ASCII only)

  • Simplest and fastest

  • when to Use? : Every Input is ASCII

1
2
3
4
5
6
7
8
9
10
11
12
13
public static boolean isUniqueAscii(String s) {
    if (s == null) return true;
    if (s.length() > 128) return false;

    boolean[] seen = new boolean[128];
    for (int i = 0; i < s.length(); i++) {
        int c = s.charAt(i);
        if (c >= 128) return false;
        if (seen[c]) return false;
        seen[c] = true;
    }
    return true;
}
  • Time : O(N)

  • Space : O(1) (constant)

5) Bitmask (Restricted Domain: a-z)

  • Input only constrains lowercase English

  • when to Use? : the domain is strictly limited (e.g only lowercase letter)

    • domain : range of values that the input can take
1
2
3
4
5
6
7
8
9
10
11
12
13
public static boolean isUniqueLowercase(String s) {
    if (s == null) return true;
    int mask = 0;

    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c < 'a' || c > 'z') return false;
        int bit = 1 << (c - 'a');
        if ((mask & bit) != 0) return false;
        mask |= bit;
    }
    return true;
}
  • Explain code

    1
    
      int bit = 1 << (c - 'a');
    

    bit : 000..00, Insert 1 at the position

    1
    2
    
      if ((mask & bit) != 0) return false;
      mask |= bit;
    

    If there is already 1 at the bit position → duplicate → return false

    Otherwise, use bitwise OR () to insert 1
  • Example

    1
    2
    3
    
      1.	'c': bit = 1 << 2 = 0100, mask=0 → mask=0100
      2.	'a': bit = 1 << 0 = 0001, mask=0100 → mask=0101
      3.	'b': bit = 1 << 1 = 0010, mask=0101 → mask=0111
    
  • Time : O(N)

  • Space : O(1)

This post is licensed under CC BY 4.0 by the author.