Hiya,

I need to learn the SQL query that counts all the points from more than one column and orders the matching rows based on most points in descending order.

EXAMPLE 1:
I do a keyword search for "mobile phone tutorial apps". Note 4 words.
Sql should find all the rows that contain any of these 4 keywords.
That means, since I searched for 4 keywords, sql should even pull-up those rows that just have even 1 or 2 or 3 of these 4 keywords in them.

Say, my mysql table looks like this:


id |  kw1   | kw1_point |   kw2   |  kw2_point  |   kw3   | kw3_point |    kw4     | kw4_point 
----------------------------------------------------------------------------------------------
0  |  mobile |    3    |  phone  |     3       |  apps    |     2    |    tutorial  |    2
----------------------------------------------------------------------------------------------
1  |  mobile |    1    |  phone  |     1       |  apps    |     3    |    tutorial  |    3
----------------------------------------------------------------------------------------------
2  | tutorial |   3   |   apps   |     3       |  mobile  |     2    |   phone      |    3
----------------------------------------------------------------------------------------------
3  |  mobile |    5    |  tutorial     |      5       |  apps     |      5     |    usa        |    5

NOTE: All 4 keywords exists on the first 3 matching rows. However, only 3 words exist in the final matching row.
And the final matching row's keywords are not in the order of my keyword search.
here in Q1, this should not matter. Sql should ignore in which order the keywords are in each column when comparing the order of my searched keywords. All it should do, is:

A). Find matching rows, regardless of how many of my searched keywords exist on each row;
B). Count the totals of each points, (count more than one column in this case), in each row; And
C) List the rows in the point's descending order.

As you can see, from my example, the following are the keyword points of each row after calculating ALL their keyword points (all point columns):

id 0 = 10 points
id 1 = 8 points
id 2 = 11 points
id 3 = 20

So, in this case, the SQL query should present rows in this descending order:

id 3 = 20
id 2 = 11 points
id 0 = 10 points
id 1 = 8 points

id |  kw1   | kw1_point |   kw2   |  kw2_point  |   kw3   | kw3_point |    kw4     | kw4_point 
----------------------------------------------------------------------------------------------
3  |  mobile |    5    |  tutorial |   5       |  apps    |    5     |    usa    |    5
----------------------------------------------------------------------------------------------
2  | tutorial |   3   |   apps    |    3       |  mobile  |    2     |   phone   |    3
----------------------------------------------------------------------------------------------
0  |  mobile |    3    |  phone  |     3       |  apps    |     2    |    tutorial  |    2
----------------------------------------------------------------------------------------------
1  |  mobile |    1    |  phone  |     1       |  apps    |     3    |    tutorial  |    3
----------------------------------------------------------------------------------------------

Had there been only a single "keyword point" (kw1_point) to calculate, then I would have written the SQL like this using prepared statements:

$sql = "SELECT * from keywords WHERE kw1 = ? OR kw2 = ? OR kw3 = ? OR kw4 = ? order by kw1_point desc LIMIT 100";

The confusion arises when there is more than one column's values to count.
Now the next question ...

Q2.
I need another SQL query that will behave similar to Q1 but rows with less number of keyword matches should be given less priority.
So, in this case, from my example, SQL should present in this order:

id 2 = 11 points
id 0 = 10 points
id 1 = 8 points
**id 3 = 20**
**id 4 = 50**
id |  kw1   | kw1_point |   kw2   |  kw2_point  |   kw3   | kw3_point |    kw4     | kw4_point 
----------------------------------------------------------------------------------------------
2  | tutorial |   3   |   apps    |    3       |  mobile  |    2     |   phone   |    3
----------------------------------------------------------------------------------------------
0  |  mobile |    3    |  phone  |     3       |  apps    |     2    |    tutorial  |    2
----------------------------------------------------------------------------------------------
1  |  mobile |    1    |  phone  |     1       |  apps    |     3    |    tutorial  |    3
----------------------------------------------------------------------------------------------
3  |  mobile |    5    |  tutorial |   5       |  apps    |    5     |    usa    |    5
----------------------------------------------------------------------------------------------
4  |  cell   |    50   |  phone   |      50    |  uk     |    50     |    prices  |   50
----------------------------------------------------------------------------------------------

NOTE: id3 had the highest point (excluding id4) but it was listed below the other 3 rows (id0, id1 & id2) due to the fact that it only had 3 matching keywords while the other 3 above it had 4 matching keywords.
So the less the matching keywords, the bad the ranking.

NOTE: id4 had more points that id3, but it was ranked below id3 due to the fact that it had less matching keywords than id 3. Only one matching keyword "phone".

Folks,

I was suggested this SQL but it only presents me with 1 matching row (id:0) even though there exist other matching rows in mysql tbl.

SELECT * from keywords WHERE kw_1 = ? OR = ? OR kw_2 = ? OR = ? OR kw_3 = ? OR = ? OR kw_4 = ? ORDER BY (kw_1_point+kw_2_point+kw_3_point+kw_4_point) DESC

Check above for my mysql tbl for id:0.

Hello Programmers,

Got any clues what is wrong ?

@gce517

Have you ever built your owm searchengine or web crawler or even a membership php script (reg, login, logout, search, etc.) ?

Anyway, as you have guessed, this thread is related to building the web crawler.
And it is related to this thread building the searchengine:
https://www.daniweb.com/programming/web-development/threads/539998/how-to-order-by-adding-total-of-all-columns-in-sql

Also related to this one, building the Index:
https://www.daniweb.com/programming/web-development/threads/540013/how-to-find-does-not-contain-or-does-contain

Big corporates usually take their time copying your's.
Adding features other searchengines don't have. That way, people use mine especially for those particular features absent elewhere.
Very interested to build my own searchengine & crawler asap. Tonnes interested to test it and use it and see how many people like it to start using it. It will be my own toy. I hope you enjoy playing with it too.
I reckon the fun part for me will be testing the crawler and seeing how well it ranks relevant links.

@dani

Are you in USA ?
Don't mind. You got Aussie look. If you not Aussie, do you know what an Aussie is ? Lol!

CORRECTION:

Folks,

I was suggested this SQL but it only presents me with 1 matching row (id:0) even though there exist other matching rows in mysql tbl.

SELECT * from keywords WHERE kw_1 = ? OR kw_2 = ? OR kw_3 = ? OR kw_4 = ? ORDER BY (kw_1_point+kw_2_point+kw_3_point+kw_4_point) DESC

Check above for my mysql tbl for id:0.

Have you tried ChatGPT?

commented: This is the new "Did you Google it?" +1 +17

Your query is always counting all points in a record, but shouldn't it only count those with a matching keyword?

$sql = "SELECT * from keywords WHERE kw1 = ? OR kw2 = ? OR kw3 = ? OR kw4 = ? order by kw1_point desc LIMIT 100";

You're using OR between the clauses.

Also, that schema design is... yikes. Why exactly do you need each keyword to have a different score for every document? If I were you I'd just do two tables: one containing keywords and their scores, second containing the actual document-to-keyword relations. Makes the logistics much easier.

Or, since you want to do text queries, why not put the keywords into a single db field and run the fulltext match (or whatever is your database system's equivalent) against that?

@pritaeas
Sorry for the late reply. I did not logon yesterday.
You are 100% RIGHT in spotting my mistake.
You have 100% CORRECTLY understood what I intend to do. ONLY COUNT the POINTS of MATCHING KEYWORDS on EACH ROW.
And then RANK the records in DESC ORDER.
So, how to DO IT ? The SQL, I mean ?

@Fifth Horseman

Sorry mate!
I made a mistake on my original post. Talking about the first table.
My table actually was supposed to look something like this:

id | url | kw1 | kw1_point | kw2 | kw2_point | kw3 | kw3_point | kw4 | kw4_point

0 | borobhaisab.com | mobile | 3 | phone | 3 | apps | 2 | tutorial | 2

1 | fifthhorse.com| mobile | 1 | phone | 1 | apps | 3 | tutorial | 3

2 | pritaeas.com | tutorial | 3 | apps | 3 | mobile | 2 | phone | 3

3 | daniweb.com | mobile | 5 | tutorial | 5 | apps | 5 | usa | 5

SQL should ONLY COUNT the POINTS (point cols) of the MATCHING KEYWORDS ONLY on EACH ROW. And not the points of any non-matching keyword.
And then RANK the records in DESC ORDER.
So, how to DO IT ? The SQL, I mean ?

@pritaeas

I am an intermediate level student and I have a list of SQL query basics which I must learn. And so, I just structured the following table which you see below, to get my point across to you fine folks what I intend to learn from senior programmers like yourselves. And so thanks for baring with me and try aiding my learning as much as possible.

I made one serious mistake on my ORIGINAL POST which you pointed-out to me.
I was intending to ONLY COUNT the POINTS of MATCHING KEYWORDS on EACH ROW/RECORD.
And NOT COUNT the POINTS of the non-MATCHING KEYWORDS on the MATCHING RECORDS before assigning the point score to the record.
And then RANK the MATCHING records in DESC ORDER.
So, how to DO IT ? The SQL, I mean ?

Maybe you are confused and so let us draw-up a new table example here for your convenience and mine.
Say, I did an EXACT MATCH (no wildcard) keywords/keyphrase search for:

php book usa

Or,

php+book+usa

Now, it should not matter in what arrangements the searched keywords are in OR how many MATCHING keywords exist in a record.
Aslong as any keyword exists in a record, that record should be presented to the searcher.
Say my table records look like this:

Table: Links
id | url | kw1 | kw1_point | kw2 | kw2_point | kw3 | kw3_point | kw4 | kw4_point
0 | pritaeas.com | usa | 3 | phone | 3 | apps | 2 | tutorial | 2
1 | daniweb.com| uk | 1 | php | 4 | apps | 3 | price | 3
2 | borobhaisab.com | tutorial | 3 | book | 3 | php | 2 | usa | 3
3 | php.com | PHP's | 5 | books | 5 | united states america | 5 | prices | 5
4 | php.com | python | 5 | book | 5 | apps | 5 | usa | 5

I have highlighted the matching keywords.

NOTE 1:
Notice that the row, id:3, actually got the matching keywords. But since the keywords are not EXACT, then SQL should NOT draw-up this record.

PHP's is not an exact match of php.
Books is not an exact match for book.
united states america is not an exact match for usa.

Hence, record: id:3, should be IGNORED.
Also notice record id:4. It got php.com. But that is not an exact match for php. Hence, that Mysql cell should be ignored too.

NOTE 2:
Record id: 2 got 3 matches, while record id: 4 got 2.
However, record id:3 scores a total of 10 points while record id:2 scores 8.
Hence, in this case, record 1d:3 should get better ranking when presented to the keyword searcher.

Because ranking should not be based on how many keywords matched in a record but total points of all matching keywords in a record.

So, this is how the SQL should rank things before presenting the records to the keyword searcher:

Result ...
4 | php.com | python | 5 | book | 5 | apps | 5 | usa | 5 -------------------- 10 point
2 | borobhaisab.com | tutorial | 3 | book | 3 | php | 2 | usa | 3 --------------------- 8 point
1 | daniweb.com| uk | 1 | php | 4 | apps | 3 | price | 3 ---------------------- 4 point
0 | pritaeas.com | usa | 3 | phone | 3 | apps | 2 | tutorial | 2 ---------------------- 3 point

The total point section on the right on the above presented result, is for your convenience ONLY to understand things how I want done. Obviously, I do not expect SQL to echo points in that manner. I can write php to do that, if I really want to.

You may have a question, why should SQL present record id: 0 here, since the record is totally irrelevant.

ANSWER: It does not matter if the record is irrelevant or not to the keyword search in such examples. There is an EXACT matching keyword here "usa". And so, this record counts in this EXACT MATCH query.

So, how to write the SQL ?

Thanks for your time & effort.

commented: kudos for great effort in your explanation +17

@Fifth Horseman

I have another SQL query similar to the SQL query I asked for on my previous post to kicken.
I do not think it will be fair on pritaeas to work out for me 2 sophisticated SQL queries and so I hope you won't mind me expecting you to show me how this 2nd SQL should be ?

You see, here, I am searching for an SQL that will search for records with EXACT MATCHING keyword per column but RANK the records first based on number of matching keywords found per record and then based on score per record.

Say, I did an EXACT MATCH (no wildcard) keywords/keyphrase search for:

php book usa

Or,

php+book+usa

Now, it should not matter in what arrangements the searched keywords are in BUT ...

  1. how many MATCHING keywords exist in a record. (1st priority).
  2. total score of record of matching keywords' points.

Aslong as any keyword exists in a record, even one matching keyword, that record should be presented to the searcher.
Say my table records look like this:

Table: Links
id | url | kw1 | kw1_point | kw2 | kw2_point | kw3 | kw3_point | kw4 | kw4_point
0 | chickensphpDOTcom | usa | 3 | phone | 3 | apps | 2 | tutorial | 2
1 | mypointDOTcom| uk | 1 | php | 4 | apps | 3 | price | 3
2 | borobhaisabDOTcom | tutorial | 3 | book | 3 | php | 2 | usa | 3
3 | phpDOTcom | PHP’s | 5 | books | 5 | united states america | 5 | prices | 5
4 | pythonDOTcom | python | 5 | book | 5 | apps | 5 | usa | 5

I have highlighted the matching keywords.
NOTE 1:
Notice that the row, id:3, actually got the matching keywords. But since the keywords are not EXACT, then SQL should NOT draw-up this record.

PHP’s is not an exact match of php.
Books is not an exact match for book.
united states america is not an exact match for usa.

Hence, record: id:3, should be IGNORED.
Also notice record id:4. It got php.com. But that is not an exact match for php. Hence, that Mysql cell should be ignored too.

NOTE 2:
Record id: 2 got 3 matches, while record id: 4 got 2.
However, record id:3 scores a total of 10 points while record id:2 scores 8.
And so, even though record id:4 got more points, you can clearly see that, it has less matching keywords than record id:2.
Hence, in this case, record 1d:2 should get better ranking when presented to the keyword searcher.

Because here, in this 2nd SQL I seek, ranking should BE based on how many keywords matched in a record AND after that, ranking should be based on total points of all matching keywords in a record.

So, this is how the SQL should rank things before presenting the records to the keyword searcher:

Result ...
2 | borobhaisabDOTcom | tutorial | 3 | book | 3 | php | 2 | usa | 3 -------8 point
4 | pythonDOTcom | python | 5 | book | 5 | apps | 5 | usa | 5 -------------------- 10 point
1 | mypointDOTcom| uk | 1 | php | 4 | apps | 3 | price | 3 ---------------------- 4 point
0 | chickensphpDOTcom | usa | 3 | phone | 3 | apps | 2 | tutorial | 2 ---------------------- 3 point

The total point section on the right on the above presented result, is for your convenience ONLY to understand things how I want done. Obviously, I do not expect SQL to echo points in that manner. I can write php to do that, if I really want to.

You may have a question, why should SQL present record id: 0 here, since the record is totally irrelevant.

ANSWER: It does not matter if the record is irrelevant or not to the keyword search in such examples. There is an EXACT matching keyword here “usa”. And so, this record counts in this EXACT MATCH query.

So, how to write the SQL ?

Thanks for your time & effort.

Folks,

Reading my two previous posts may confuse you.
You may think I am asking the same question or asking for the same SQL here.
But I am not. If you notice the result … of both posts and glance at the two final tables on both posts then things will become clear.
And so, I asked pritaeas one question and asked fifth horseman a different question. They are similar but not same.
You are welcome to chime in and answer my above two posts if you wish.

Folks,

I know my 2 table structures mentioned on my above posts are UNORTHODOX.
Now, even though you deem my table arrangement is UNORTHODOX , I still wish to learn how to achieve my 2 purposes with SQL that I asked pritaeas and Fifth Horseman above. Just for my learning purpose, to satisfy my CURIOSITY, do you mind handing me the SQLs ? I want to experiment and fiddle. That is all.

Then, after that, I can move onto experimenting & fiddling your WAY of doing things, if you structure the table differently to mine.

This is how I gain experience. Do things UNORTHODOX first and then ORTHODOX.

If you are an adv programmer, then I guess it is very easy for you to draw-up the SQL I asked pritaeas and I know it would be peasy for you to draw-up the SQL I asked Fifth Horseman.

So, instead of replying I should structure my table differently, how-about drawing the 2 SQLs up for me that I am searching for ? And, let me play tonight with these 2 SQLs.
Then tomorrow, I can move-on to your ORTHODOX way of doing things.

@dani

Have you ever tried the 2 SQLs I am searching for ?
And, how would you structure your table if mine is UNORTHODOX ?

I think you should look to the IF function:

SELECT 
    IF(kw_1 = ?, kw_1_point, 0) + IF(kw_2 = ?, kw_2_point, 0) + IF(kw_3 = ?, kw_3_point, 0) + IF(kw_4 = ?, kw_4_point, 0) AS relevance 
FROM keywords
ORDER BY relevance DESC

This is an untested query.

As a sidenote/learning opportunity, check out an old tagging example I wrote.

@pritaeas

Thanks!
I will test it out!

Cheers!

Since kw(n) and kw(n)_point are the same data-types, try union_all on each (n) pair... i.e.

select kw1, kw1_point
union all
kw2, kw2_point
union all
kw3, kw3_point
union all
kw4, kw4_point...

Apply order by , group by, or whatever your dbms has available that aggregates the result you're looking for.

Your syntax may change depending on platform/dbms

Revied: added id column (my bad).

Since kw(n) and kw(n)_point are the same data-types, try union_all on each (n) pair... i.e.

select
id, kw1, kw1_point
union all
id, kw2, kw2_point
union all
id, kw3, kw3_point
union all
id, kw4, kw4_point...

Apply order by , group by, or whatever your dbms has available that aggregates the result you're looking for.

Your syntax may change depending on platform/dbms

I just stumbled upon this thread and I got a little lost trying to read through the 20 posts to figure out what question is being asked and what you're exactly trying to accomplish. Maybe if you could explain to me in just a sentence or two, I can take a quick stab at it?

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.