Hello ALL

The project we're working on is a document retrieval system

I have 3 tables as follows. Table 1 contains words from a example doc. Table 2 contains words from documents in a database. Table3 contain only one column ‘DocID’.

I would like to search the database to find those docs similar to the example doc by calculating similarity score between the example doc and each doc in the database. Specifically, the simi score is calculated by adding up the sum of word frequencies of all matched word. Additionally, I am only interested in the those documents whose DocID appear in table3.

Table 1
Word    Freq
Book      2
Desk      3
Pen        3
Board      3


Table2
DocID   Word    Freq
1         Book     3
1         English    2
1          Math     1
2         Desk       2
2         Machine    5
2         Power      2
3         Desk       3
3         Teacher    3
3         Class       2
4         Building    1
4         Tower     2


Table3
DocID
1
2

I am not sure the following statement is correct or not

SELECT
      DocID, table1.Freq+ table2.Freq
FROM
      table1, table2, table3
WHERE
     table1.Word = table2.Word AND  table2.DocID = table3.DocID;

you'll need to use

Select table3.DocID, sum(table1.Freq + table2.Freq) as Freq
-- your from and where
group by table3.DocID
order by SUM(table1.Freq + table2.Freq) desc

i added the order by for clarity

dickersonka thanks for your solution.

I wonder, in the 'order by' clause, if I can use

order by Freq desc

to replace

order by SUM(table1.Freq + table2.Freq) desc

yes you can, sorry i added in the alias last and forgot to update order by

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.