Hi there,
I hope you can help me with this.
I need a Java algorithm that tells me what percentage of String A is matched by String B.
Example:
String A => Java, C++, C#, Assembly
String B => java, c++
The result should be => 50% relevant
This is the most simplified example.

I've searched all over Google, but everything I found is 0% relevant to what I want.

Thanks in advance :)

The example you show is for String B to be 1/2 of String A, case ignored. Perhaps that is too simple an example.
The indexOf method would give you that (both Strings changed to the same case).
Do you have any more complicated examples?

I suspect you are trying to measure the relevancy between two strings based on how many common substrings they have. If so, you should build a method that extracts the unique parts of each string and saves them in arrays (my first thought), then checks which array is longer. Take each element of the longer array and check whether it is also in the shorter one. If so, increase an int counter. Divide the counter by the length of the longer array and multiply by 100, and you get how much of the longer array is contained in the shorter one, as a percentage. Hope this helps.
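A minimal sketch of the steps described above, assuming the "unique parts" are comma-separated tokens and using sets instead of arrays to drop duplicates. The class and method names (CommonTokenRatio, uniqueTokens, percentOfLonger) are made up for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CommonTokenRatio {

    // Splits on commas, trims, lower-cases, and drops duplicates and empty tokens.
    static Set<String> uniqueTokens(String csv) {
        Set<String> tokens = new HashSet<>();
        for (String part : csv.split(",")) {
            String token = part.trim().toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Percentage of the larger token set that also appears in the smaller one.
    static double percentOfLonger(String a, String b) {
        Set<String> first = uniqueTokens(a);
        Set<String> second = uniqueTokens(b);
        Set<String> longer = first.size() >= second.size() ? first : second;
        Set<String> shorter = (longer == first) ? second : first;
        if (longer.isEmpty()) {
            return 0.0; // nothing to compare
        }
        int counter = 0;
        for (String token : longer) {
            if (shorter.contains(token)) {
                counter++;
            }
        }
        return 100.0 * counter / longer.size();
    }

    public static void main(String[] args) {
        // The example from the first post; prints 50.0%
        System.out.println(percentOfLonger("Java ,C++ ,C# ,Assembly", "java ,c++") + "%");
    }
}
```

Dividing by the longer set makes the score symmetric. If the score should instead mean "how many requirements are met", the denominator would be the requirement count.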

I'm making a program that takes requirements from a customer (String A) and compares them to records from a DB (technical skills, String B).
It then sorts the records (String B) according to how relevant each one is to String A.
Example: Requirements => Java, C++, C#, Assembly

applicant 2 => java, c, C++ ..... 75%
applicant 1 => java, c ..... 50%

The problem I'm facing is that "Java" and "java" should be treated as the same, and how to work out the percentage of matching.

I hope that's clear :)

Change both strings to the same case for ease of comparing and searching.

Your app looks more like token matching. Extract all the tokens from both strings, normalize them to the same case, sort and then count the number of matches.
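Here is one way the extract, normalize, sort, and count steps could look. It is only a sketch: it assumes comma-separated tokens with no duplicates and no empty entries, and the names SortedTokenMatch, sortedTokens, and countMatches are invented for the example.

```java
import java.util.Arrays;
import java.util.Locale;

class SortedTokenMatch {

    // Extracts the tokens, normalizes case and whitespace, and sorts them.
    static String[] sortedTokens(String csv) {
        String[] tokens = csv.split(",");
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].trim().toLowerCase(Locale.ROOT);
        }
        Arrays.sort(tokens);
        return tokens;
    }

    // Walks both sorted arrays in step and counts the tokens they share.
    static int countMatches(String a, String b) {
        String[] x = sortedTokens(a);
        String[] y = sortedTokens(b);
        int i = 0;
        int j = 0;
        int matches = 0;
        while (i < x.length && j < y.length) {
            int cmp = x[i].compareTo(y[j]);
            if (cmp == 0) {
                matches++;
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // x[i] is smaller, so it cannot appear later in y
            } else {
                j++; // y[j] is smaller, so it cannot appear later in x
            }
        }
        return matches;
    }
}
```

Because both arrays are sorted, one pass finds every shared token, so the order in which the skills were typed does not matter.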

You may be able to do it in a more clever way?

Add up the total (integer values) of each and compare them?

edit:

It was just an idea. At the very least it would show whether one is greater than, less than, or equal to the other (100% match), similar to C's strcmp(). They could also make for some good methods in your class: isGreater(), isLesser(), isEqual(), etc.

Thanks, everybody
that's very inspiring :)
I'm working on it

@\007: If you are referring to the numerical value of each character, that won't be of any help. The added char values of "aaabb" and "zzyz" are the same. I wouldn't call them equal though.

Or am I misunderstanding what you mean by the integer values of each?
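To make the collision concrete, here is a small sketch that adds up the char values of the two strings mentioned above. The class and method names (CharSumCollision, charSum) are just for illustration.

```java
class CharSumCollision {

    // Adds up the numeric (UTF-16) value of every character in the string.
    static int charSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(charSum("aaabb"));        // 487
        System.out.println(charSum("zzyz"));         // 487
        System.out.println("aaabb".equals("zzyz"));  // false
    }
}
```

Equal sums therefore cannot confirm a match. At best, different sums can rule one out.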

That's true, but I would start by comparing the lengths. If the lengths are different, then they are not a direct match.

I was initially thinking of how I would try to quickly identify a 100% match.

* Are the strings the same length?
* If yes, does sum(string1) == sum(string2)
* If yes, strings are a match.

I am unsure if that would work in every case, but the time should be fairly low and would give a better average case than immediately calling a hefty method that created arrays, sorted/analyzed etc.

It was just a quick thought. A better approach may be ripping off C's strcmp() and modifying it to return 100 on a direct match or a percent based on how close the match was.

* Are the strings the same length?
* If yes, does sum(string1) == sum(string2)
* If yes, strings are a match.

Unfortunately that won't work either. "arxvt" and "zzyfb" will still be equivalent under that approach.

I agree with Norm's assessment above. Split the string, normalize the case, and compare the occurrences since you would probably want a 100% on these two cases as well: "java, c++" vs "C++,java".
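Putting the split, normalize, and compare idea together with the original goal of sorting records by relevance, a rough sketch could look like this. The names ApplicantRanker, tokens, coverage, and rank are hypothetical, and the score here is the share of requirements met using exact token matches. With exact matching, the two sample applicants from earlier in the thread score 50% and 25% rather than 75% and 50%, presumably because those figures counted "c" as a partial match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ApplicantRanker {

    // Splits on commas, trims, and lower-cases so "Java" and " java" are the same token.
    static Set<String> tokens(String csv) {
        Set<String> result = new HashSet<>();
        for (String part : csv.split(",")) {
            String token = part.trim().toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Percentage of the requirements that appear among the applicant's skills.
    static double coverage(String requirements, String skills) {
        Set<String> required = tokens(requirements);
        if (required.isEmpty()) {
            return 0.0;
        }
        Set<String> owned = tokens(skills);
        int met = 0;
        for (String requirement : required) {
            if (owned.contains(requirement)) {
                met++;
            }
        }
        return 100.0 * met / required.size();
    }

    // Returns a copy of the applicants sorted from most to least relevant.
    static List<String> rank(String requirements, List<String> applicants) {
        List<String> sorted = new ArrayList<>(applicants);
        sorted.sort(Comparator
                .comparingDouble((String skills) -> coverage(requirements, skills))
                .reversed());
        return sorted;
    }
}
```

Since the comparison uses sets, coverage("java, c++", "C++,java") comes out as 100% regardless of order or case.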

I made this, but I'm still working on it.
It's the general idea.
Tell me if there's a better way.

package string_matcher;

import java.util.ArrayList;
import java.util.StringTokenizer;

public class Main {

    public static void main(String[] args) {
        String req = "java,C++,Assembly";
        String qualifications = "Java,C++,C,php,assembly,mysql,html,power designer,photoshop";
        // Diamond operator instead of raw types, so the lists stay type-checked.
        ArrayList<String> re = new ArrayList<>();
        ArrayList<String> skills = new ArrayList<>();
        StringTokenizer str = new StringTokenizer(req, ",");
        while (str.hasMoreTokens()) {
            re.add(str.nextToken());
        }
        int total = re.size();
        StringTokenizer sk = new StringTokenizer(qualifications, ",");
        while (sk.hasMoreTokens()) {
            skills.add(sk.nextToken());
        }
        int counter = 0;
        float perc = 0;
        String skill = "";
        String require = "";
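        for (int i = 0; i < re.size(); i++) {
            // Normalize case and trim stray spaces so "Java" and " java" compare equal.
            require = re.get(i).trim().toLowerCase();
            for (int j = 0; j < skills.size(); j++) {
                skill = skills.get(j).trim().toLowerCase();
                // equals, not contains: a requirement of "c" must not match "c++".
                if (skill.equals(require)) {
                    counter++;
                    break; // count each requirement at most once
                }
            }
        }
        System.out.println("REquirements " + re);
        System.out.println("tech  " + skills);
        // Multiply by 100f first; plain counter / total is integer division and truncates.
        perc = 100f * counter / total;
        System.out.print("\n" + perc + "%");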
    }
}

This is exactly what I thought the first time I posted. However, another way, with less code, would be the following:

Instead of breaking your qualifications string into tokens, you can just check each token extracted from req against the entire string held by the qualifications variable. Each time you find a match, increase counter. This way, counter will always be less than or equal to the number of tokens in req. Divide the counter by the number of tokens in req, and you get your percentage. I figured this out after I checked your code.
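A rough sketch of that whole-string check, with an invented class and method name (WholeStringCheck, countBySubstring). It assumes comma-separated requirements and uses String.contains, which is a substring test.

```java
import java.util.Locale;

class WholeStringCheck {

    // Counts the requirement tokens that occur anywhere in the qualifications string.
    static int countBySubstring(String req, String qualifications) {
        String haystack = qualifications.toLowerCase(Locale.ROOT);
        int counter = 0;
        for (String part : req.toLowerCase(Locale.ROOT).split(",")) {
            String token = part.trim();
            if (!token.isEmpty() && haystack.contains(token)) {
                counter++;
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        // All three requirements are found: prints 3
        System.out.println(countBySubstring("java,C++,Assembly",
                "Java,C++,C,php,assembly,mysql,html,power designer,photoshop"));
        // Substring pitfall: "c" is found inside "c++" and "java" inside "javascript": prints 2
        System.out.println(countBySubstring("C,Java", "C++,JavaScript"));
    }
}
```

The second call shows the trade-off: the code is shorter, but short skill names can match inside longer ones, so exact token comparison is safer when names overlap.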

I ran your code and it worked fine for me. I liked it.

You can squash it down a little bit if you want

        String req = "java,C++,Assembly,Lisp";
        String qualifications = "Java,C++,C,php,assembly,mysql,html,power designer,photoshop";
        List<String> re = Arrays.asList(req.toLowerCase().split(","));
        List<String> skills = Arrays.asList(qualifications.toLowerCase().split(","));
        int counter = 0;
        float perc = 0;
        for (String require : re) {
            if (skills.contains(require)) {
                counter++;
            }
        }
        System.out.println("REquirements " + re);
        System.out.println("tech  " + skills);
        perc = counter / (float) re.size() * 100;
        System.out.println(perc + "%");

but your approach is essentially fine as it is.


Maybe I'm missing something here, but isn't this essentially just a retainAll problem?

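		String reqs = "java, C++, Assembly, Lisp";
		String qual = "Java, C++, C, php, assembly, mysql, html, photoshop";

		// Lower-case both strings first: retainAll compares with equals(), which is case-sensitive.
		ArrayList<String> reqsList = new ArrayList<String>(Arrays.asList(reqs.toLowerCase().split(", ")));
		ArrayList<String> qualList = new ArrayList<String>(Arrays.asList(qual.toLowerCase().split(", ")));

		// Keep only the qualifications that are also requirements.
		qualList.retainAll(reqsList);

		// Multiply by 100.0 first so the division is done in floating point, not truncated as int.
		System.out.println(100.0 * qualList.size() / reqsList.size() + "%");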