Hi, everyone.
I am stuck when trying to solve such a problem, I need your help.
We have a website for online e-book reading that holds around 60,000 books and currently has 5 million registered users, we are using MySQL for storage of all the data.
Now there's requirement from the operating team of the company that they want a small amount of data for each book(a top 20-item list), it sounds easy but not quite so, this small amount of data can't be acquired with simple queries on the database, or it can be but the efficiency will by no means be acceptable. Let me describe the problem in detail: A book in the website can be added to a virtual folder called "book list" owned by a user(a user may have 0-x book lists), a book list can hold as many books as the user adds to it. and we have a web page containing some information(name, author of the book, etc.) for each book.
Let's say we have a book called "A", and this book was added to 3 book lists owned by 3 users, they are BookList1, BookList2, BookList3. it is possible that these 3 book lists hold other books. they may be represented like this: BookList1(A, B, C) , BookList2(A, B, D), BookList3(A, B, D, H, G), and for describing the problem thoroughly, I will add a BookList4(H, G, K) here , each letter represents a book. Now for book A(added to 3 book lists besides BookList4), I would like to make a list that holds books "B, D, C, H, G", let me tell you where this list come from, they are books collected and sorted by their appearing frequency in book lists in descending order from the 3 book lists that hold book A(BookList4 is not counted here cause it does not have book A in it). so, for each book, I would like to make such a top-20(may be less than 20) list.
My question is: how to do this efficiently, I think SQL can not be an option here.
P.S.: a book may be added to 80,000 book lists. and a book list can hold up to 2,000 books.
Thanks for your time for reading such a long post. your ideas will be greatly appreciated.
Thanks in advance.
Kevin Tse