What is the most efficient way to search for unknown patterns in a string?

Norma King based on your specs "y" should not be returned. "are more than 1 character long" would have to be "at least 1 character long"

First of -> java uses the heap for strings so if this is heavily used I would switch to Rust, C, C++ or any language where you can use the stack for strings.

Assumptions:

Input: "the boy fell by the bell"

Output: 'ell', 'the b'

we parse from left to right so we have a best case time-complexity O(n) * O(n-1) if nothing matches.

first pass you create a registry where you add all strings of length 2 so we have the base of our next assumption

we have the top down or bottom up approach both have their advantages I will go top down from the small to big. to optimize performance for matching patterns we could add the special case with 1 look ahead -> "the " if there is a space after our lookup match we can assume this a unique word. but I leave the special cases for now :)

G(s:2) ∈ G(s:3) ∈ G(s:4) ..... ∈ G(all)

The idea is to first define the base but already filter out groups that are not relevant. if they only exist once, which adds on some extra time complexity, but it reduces the base and it keeps the complexity of the implementation lower as well.

the follow up time complexities are going down by throwing away all "1 hit" results at the end

type token {
    match: string,
    pos: posRange
}

type posRange {
    start: int,
    end: int
}

function hitByRange(input: string, hits: token[], range: int) : token[] {
    for (int i = 0;i < len(input); i++) {
        // don't match any string that starts with an empty space or ends with one
        if (input[i] == ' ' || input[i+range] == ' ') {
            continue;
        }
        matchingStr := sub_string(input, i, i+range);
        for (int x = i+1; x < len(input); x++) {
            compStr := sub_string(input, x, x+range);
            if (matchingStr == compStr) {
                hits.push({ matchingStr, {x, x+range}});
            }
        } 
    } 

    return filterHits(hits);
}

function filterHits(hits: token[]) : string[]token[] {
    filteredHits := string[]tokens[]
    for token in hits {
        let tokenSet := findToken(hits, token);
        if (len(tokenSet) == 1) {
            continue;
        }
        filteredHits.push({tokenSet[0].match, tokenSet});
    }
    return filteredHits;   
}

this is the basic matching implementation it's not java specific ... i know but it's the idea I wanted to transport not the idiomatic approach!

this is a pseudo implementation and only the first iteration. maybe someone else has a special java lib for you ;D .... I hope this helps at least a little....

Thread

What is the most efficient way to search for unknown patterns in a string?

Responses(1)

Search Hashnode

What is the most efficient way to search for unknown patterns in a string?

Responses(1)