hGriff0n/BookExplorer

BookExplorer

Goal

The end goal of this project is to produce an engine capable of recommending books that "expand" (and "complement") the user's existing library. The primary goal is expansion: if the user has no books on a topic A, then the engine should recommend books on A. A second goal is to mimic the process of innovative discovery, where recommendations are tailored to sit "near" the user's existing library. A tertiary goal is to find synergistic recommendations: if the user has books on history and books on mathematics, then a book on the history of mathematics should be recommended. A quaternary goal is to incorporate broad public opinion into the ranking, allowing "discussion books" to enter the user's recommendations even if they lie outside the user's normal consideration.

Ideally, this engine should operate as a broadening force on the user's library, making it easier to discover and explore "under-explored" topics outside the user's normal areas of interest. The engine should also focus on providing positive pressure; above all, the tool should enrich the life and reading experience of the user.

Approach

The process for translating user data will be split into three broad stages: input frontend, ML engine, and output frontend. The input frontend is responsible for collecting data about the user's current library and transforming it into data that the ML engine can operate on, in a manner that makes the most efficient use of the API that stores the user's library information. The ML engine is then responsible for reducing and mapping the provided library information onto the "global knowledge graph" and selecting a list of interesting "topics" to consider. These topics are the "recommendations" that the engine considers would best serve the user according to the goals outlined above. The output frontend is then responsible for communicating these results to the end user in whatever way it sees fit. Any recommendations of specific books would be handled in this layer, as the nature of the ML engine (a knowledge graph) makes it ill-suited to extracting the necessary information.
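The three stages can be sketched as plain functions chained together. This is only an illustration of the split described above, not the project's actual interface: the `Book` type, the stage signatures, `run_pipeline`, and the toy stand-ins are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Book:
    """Hypothetical minimal record produced by the input frontend."""
    title: str
    description: str


# Hypothetical stage signatures; the README names the stages, not their APIs.
InputFrontend = Callable[[str], List[Book]]        # user id -> library
MLEngine = Callable[[List[Book]], List[str]]       # library -> recommended topics
OutputFrontend = Callable[[List[str]], List[str]]  # topics -> messages for the user


def run_pipeline(user_id: str,
                 load_library: InputFrontend,
                 recommend_topics: MLEngine,
                 present: OutputFrontend) -> List[str]:
    """Chain the three stages: input frontend, ML engine, output frontend."""
    library = load_library(user_id)
    topics = recommend_topics(library)
    return present(topics)


# Toy stand-ins so the sketch runs end to end (all data is invented).
ALL_TOPICS = ["history", "mathematics", "biology"]


def toy_input_frontend(user_id: str) -> List[Book]:
    return [Book("A Brief History of Rome", "history of the roman empire")]


def toy_engine(library: List[Book]) -> List[str]:
    # "Expansion" goal only: recommend topics the library does not cover.
    covered = {t for t in ALL_TOPICS for b in library if t in b.description}
    return [t for t in ALL_TOPICS if t not in covered]


def toy_output_frontend(topics: List[str]) -> List[str]:
    # Specific book suggestions would be resolved here, not in the engine.
    return [f"Consider exploring: {t}" for t in topics]


result = run_pipeline("demo-user", toy_input_frontend, toy_engine, toy_output_frontend)
print(result)
```

Keeping the engine's output at the level of topics, as in `toy_engine`, leaves the choice of concrete titles to the output frontend, which matches the division of responsibility described above.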

Constructing the Knowledge Base

The core premise of this project is that it should be possible, in some abstract sense, to calculate how well a user's existing library maps onto the "global knowledge base", the set of all possible book topics. The algorithms for constructing and using this global graph are therefore the key problem for our use case. Broadly, this breaks down into two discrete steps: building a knowledge graph, and producing an ML model to map book descriptions onto that graph. On further consideration, however, these two steps are two sides of the same coin, because there is no globally accepted knowledge graph, especially of the kind we require. We are instead forced to construct the graph ourselves from some other source of information, which adds the step of producing an ML model to map our "source" information into the knowledge graph. It stands to reason that any model we create for this initial mapping may also work for the secondary mapping, that of books to the graph, especially if our initial source is in some textual format.

The current approach is to use Wikipedia to construct this knowledge graph, largely because of its textual nature, but also because several existing projects have attempted to construct the sort of "knowledge graph" that we require (although none of those projects will work for us). The first stage will be to construct a basic "inter-relatedness" graph from the Wikipedia pages, using some metric for "inter-relatedness". From this initial graph, it should be possible to run various clustering algorithms on the source data, producing our first approximation of the "topics" needed for the final product. After sorting the source data into clusters, we can train a classification model on the source data, using cluster membership as the expected output for the training data. Once the model has been trained, it should be possible to apply it to book description excerpts to produce a reasonable approximation of a book's "place" in the global "knowledge space", i.e. determining which "cluster" the book would belong to if it were a Wikipedia article. The recommendation algorithm could then select articles from clusters that have not been "visited", in combination with other metrics and responses.
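The four steps above (relatedness graph, clustering, classifier, unvisited clusters) can be sketched with the standard library alone. Everything in the sketch is an assumption made for illustration: the README does not fix a relatedness metric, a clustering algorithm, or a model, so Jaccard overlap of link sets, connected components, and a word-overlap classifier stand in for them, and the four "articles" are invented.

```python
from collections import Counter
from itertools import combinations

# Toy "Wikipedia": title -> (outgoing links, text). All data is invented.
ARTICLES = {
    "Roman Empire": ({"Julius Caesar", "Rome"}, "empire rome emperor history war"),
    "Julius Caesar": ({"Roman Empire", "Rome"}, "caesar rome general history war"),
    "Calculus": ({"Derivative", "Isaac Newton"}, "calculus derivative integral function"),
    "Derivative": ({"Calculus", "Isaac Newton"}, "derivative function slope calculus"),
}


def jaccard(a, b):
    """Assumed relatedness metric: overlap of two link neighbourhoods."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness_edges(articles, threshold):
    """Step 1: keep an edge between articles whose relatedness passes the threshold."""
    hoods = {t: links | {t} for t, (links, _) in articles.items()}
    return [(a, b) for a, b in combinations(sorted(hoods), 2)
            if jaccard(hoods[a], hoods[b]) >= threshold]


def cluster(nodes, edges):
    """Step 2: connected components via union-find; returns title -> cluster id."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    ids = {root: i for i, root in enumerate(sorted({find(n) for n in nodes}))}
    return {n: ids[find(n)] for n in nodes}


def train(articles, labels):
    """Step 3: one bag-of-words profile per cluster, with cluster ids as labels."""
    profiles = {}
    for title, (_, text) in articles.items():
        profiles.setdefault(labels[title], Counter()).update(text.split())
    return profiles


def classify(profiles, text):
    """Assign a text to the cluster whose profile shares the most word mass with it."""
    words = text.lower().split()
    return max(sorted(profiles), key=lambda c: sum(profiles[c][w] for w in words))


def unvisited_clusters(profiles, descriptions):
    """Step 4: clusters that no book description in the library maps onto."""
    visited = {classify(profiles, d) for d in descriptions}
    return sorted(set(profiles) - visited)


labels = cluster(sorted(ARTICLES), relatedness_edges(ARTICLES, threshold=0.5))
profiles = train(ARTICLES, labels)
library = ["a history of rome and its emperor"]
recommended = sorted(t for t in ARTICLES
                     if labels[t] in unvisited_clusters(profiles, library))
print(recommended)
```

In this toy run the single library description lands in the Roman cluster, so the articles in the untouched mathematics cluster are what the sketch surfaces. A real implementation would replace each stand-in with a proper metric, clustering algorithm, and trained model.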

Additional refinement is possible by changing the cluster density of our initial modelling attempt, which may allow us to capture the idea of "sub-topics" within the model and the engine. At a broad level, consider the topics of history and American history: any article clustered in the latter is also a member of the former. This may be possible to represent by producing clusterings at densities x and y, with x < y. The smaller number of clusters available at density x will naturally produce a graph tending towards generality over specificity, neatly encompassing the idea of sub-topics (although it does not ensure that the result matches our conception completely). Another possible improvement is to run a separate "topic extraction" model on the clustered data to produce an "explainable" account of why the various articles are clustered together as a single "topic", essentially applying labels such as "History" and "American History" to the clusters (as opposed to 1, 2, ...). This explanation procedure will probably work by collating the clustered documents together and then running the merged document through the extraction algorithm.
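A minimal sketch of both ideas follows, under loud assumptions. "Density" is read here as the number of clusters, controlled by a similarity threshold. The similarity scores and article texts are invented, and a most-frequent-word count stands in for a real topic extraction model. With threshold-based connected components, every fine cluster is contained in exactly one coarse cluster, which gives the sub-topic nesting for free; other clustering algorithms may not guarantee this.

```python
from collections import Counter
from itertools import combinations

# Invented article texts and pairwise similarities, for illustration only.
TEXTS = {
    "Ancient Rome": "roman history roman republic",
    "Roman Senate": "roman senate history",
    "American Revolution": "american history american colonies",
    "US Civil War": "american war history",
}
SIMILARITY = {
    frozenset({"Ancient Rome", "Roman Senate"}): 0.9,
    frozenset({"American Revolution", "US Civil War"}): 0.8,
}
DEFAULT_SIMILARITY = 0.4  # assumed score for every other pair


def similarity(a, b):
    return SIMILARITY.get(frozenset({a, b}), DEFAULT_SIMILARITY)


def clusters_at(threshold):
    """Connected components of the graph whose edges have similarity >= threshold."""
    groups = [{t} for t in sorted(TEXTS)]
    for a, b in combinations(sorted(TEXTS), 2):
        if similarity(a, b) >= threshold:
            ga = next(g for g in groups if a in g)
            gb = next(g for g in groups if b in g)
            if ga is not gb:
                groups.remove(gb)
                ga |= gb  # merge the two components in place
    return groups


def label(group, n=1):
    """Collate a cluster's documents and name it by its n most frequent words."""
    counts = Counter(w for title in group for w in TEXTS[title].split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    return " ".join(ranked[:n])


coarse = clusters_at(0.3)  # density x: fewer, more general clusters
fine = clusters_at(0.7)    # density y: more, more specific clusters

coarse_labels = [label(g) for g in coarse]
fine_labels = sorted(label(g) for g in fine)
nested = all(any(f <= c for c in coarse) for f in fine)

print(coarse_labels, fine_labels, nested)
```

The labels produced this way are only as good as the word counts behind them; the nesting check is the part that carries over to a real clustering, where it would need to be verified rather than assumed.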

About

Recommend books based on under-utilized topical knowledge
