Hello ALL
The project we're working on is a document retrieval system
I have 3 tables as follows. Table 1 contains words from a example doc. Table 2 contains words from documents in a database. Table3 contain only one column ‘DocID’.
I would like to search the database to find those docs similar to the example doc by calculating similarity score between the example doc and each doc in the database. Specifically, the simi score is calculated by adding up the sum of word frequencies of all matched word. Additionally, I am only interested in the those documents whose DocID appear in table3.
Table 1
Word Freq
Book 2
Desk 3
Pen 3
Board 3
Table2
DocID Word Freq
1 Book 3
1 English 2
1 Math 1
2 Desk 2
2 Machine 5
2 Power 2
3 Desk 3
3 Teacher 3
3 Class 2
4 Building 1
4 Tower 2
Table3
DocID
1
2
I am not sure the following statement is correct or not
SELECT
DocID, table1.Freq+ table2.Freq
FROM
table1, table2, table3
WHERE
table1.Word = table2.Word AND table2.DocID = table3.DocID;