Crowdsourcing Science with Bayes Expert

jasmine339
Nov 29, 2022

Based on the scientific method and decades of experience, the scientific community has an established and well-respected methodology for discovering and testing medical advances before they are made available to the public. That process can take a long time, which makes it vulnerable to disruption from the tech community. However, because the tech community’s methodologies differ so much from the established ones, it has had difficulty disrupting the healthcare space. Many of the best AI predictors are opaque, and they do not focus on determining causation in the way that science’s Randomized Controlled Trials and epidemiological statistics do.

Bayes Expert, our software for generating rule-based Bayesian network models, takes a different approach. It works at the level of Randomized Controlled Trials and the scientific theories that underpin them, using a methodology that rigorously connects theories so that they can be crowdsourced and so that studies and data can lend support to one another.

We crafted our Longevity Bayesian network model with Bayes Expert. It is the first AI model to be integrated into the Rejuve.AI model framework, and is currently operating in our Longevity app beta. The Longevity Bayesian network model connects certain variables, such as chronic illnesses and aging signs, to other variables to help us better understand them. But, before we go any further, we should define “Bayesian network”.

What is a Bayesian Network?

A Bayesian network is a probabilistic model. It measures the risk of something happening, for example, the risk of a particular “hallmark of aging” causing premature aging.

Bayesian networks are among the most popular artificial intelligence tools in medicine because they can model causation and are transparent. They are a kind of probabilistic graphical model, so you can draw them as a graph of nodes and directed links with no cycles. Each node represents a state (for example, gender) or the occurrence of an event (for example, a hallmark of aging causing premature aging, or receiving a diagnosis of diabetes or hypertension). Each link represents the chance that the state holds or the event occurs given the states and events it depends on: for example, the chance of developing diabetes given a history of hypertension, or the chance of premature telomere shortening given a history of diabetes.
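To make the idea concrete, here is a minimal sketch of a two-node network in which hypertension links into diabetes. The probabilities are hypothetical numbers chosen for illustration; they are not taken from the Longevity model or from any study.

```python
# Two-node network: Hypertension -> Diabetes.
# All numbers are hypothetical and for illustration only.
p_hypertension = 0.30                          # prior P(H)
p_diabetes_given = {True: 0.20, False: 0.08}   # P(D | H) and P(D | not H)


def marginal_diabetes():
    """P(D), from the law of total probability."""
    return (p_diabetes_given[True] * p_hypertension
            + p_diabetes_given[False] * (1 - p_hypertension))


def hypertension_given_diabetes():
    """P(H | D), from Bayes' rule: reasoning backwards along the link."""
    return p_diabetes_given[True] * p_hypertension / marginal_diabetes()


if __name__ == "__main__":
    print("P(diabetes) =", round(marginal_diabetes(), 4))
    print("P(hypertension | diabetes) =", round(hypertension_given_diabetes(), 4))
```

The link stores only the conditional probabilities; both the overall diabetes risk and the reverse inference follow from ordinary probability rules, which is what makes the model transparent.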

Introducing Bayes Expert

Bayes Expert is Rejuve.AI’s premier software for building Bayesian network models.

The Bayesian network models created by Bayes Expert, including our Longevity Bayesian network model, run as SingularityNET services. Bayes Expert will let us crowdsource science from the scientific community. It gives scientists a powerful way to build models from their experimental data and the theories of causation behind that data, without having to be data scientists. A form on our website will allow scientists to add their data and theories to the network. To start the process, we have hand-coded a “seed” network, a first network that other scientists will improve upon.

All the individual probabilities in our Longevity Bayesian network come from the medical literature, and Bayesian networks have been used to express medical theories and statistics in the past. However, constructing one has often required a team of data scientists and medical scientists working together. This is where Bayes Expert comes in: it opens up the modeling technique so that any scientist can create a model from their data and theories, which can then be combined with other crowdsourced models. This has not been possible until now because current open-source data science software offers scientists no easy way to create such models.

What Makes Our Longevity Bayesian Network Unique?

Our Longevity Bayesian network is just a first seed network that scientists will add to. The first iteration of our seed network contains hand-coded rules for 315 nodes. Of those, 162 are the “Leaves” of the Bayesian network tree that derive their probabilities from over 85,000 NHANES data contributors who took blood tests. We created the remaining 153 nodes from the hand-coded rules: 63 were created from dependency rules, and 90 are logic rules using “and”, “or”, and “average” to map the data across studies.

Figure 1. All the variables that can directly change the “Hallmark 2 Telomere Attrition” variable in the seed Bayesian network. Here the leaf nodes are “Vitamin D”, “Healthy Diet”, “Micronucleus Frequency”, “Smoking”, “Gender”, “Soy Isoflavones Supplementation” and “Daily Time Sitting”. Their relative risk scores and confidence intervals from the literature tell how the variables are connected.

The relations in the dependency rules come from 62 meta-analyses from reputable medical journals, based on 1,500 gold-standard Randomized Controlled Trials on 12.5 million subjects. Their priors come from the NHANES data. We are working on a second validated iteration of our seed network that will contain over 200 meta-analyses. Each kind of Python rule will be enterable through the Bayes Expert GUI on our website, with one rule for each variable to be explained.

We have two types of rules, one for entering data and the other for entering theories. As an example of the data type of rule, take a real one from our seed network. One variable of the network represents the event that Hallmark 2, Telomere Attrition, will cause premature aging. The rule we encoded creates one node for that event, along with the links into it that represent the effects of smoking and gender on the Hallmark 2 (Telomere Attrition) node.

In the rule, we entered the relative risks and their 95% confidence intervals from two meta-analyses and systematic reviews (the scientifically based summaries of clinical trials, the primary method of medical science).
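As a rough illustration of what a relative risk with a 95% confidence interval carries, the sketch below uses the standard convention that such intervals are symmetric on the log scale, so the interval width gives a standard error for the log relative risk. The numbers are hypothetical, and this is not a description of how Bayes Expert stores its rules.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_rr_standard_error(lower, upper, z=Z_95):
    """Standard error of log(RR) recovered from the reported interval bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def interval_from_log_se(relative_risk, se, z=Z_95):
    """Rebuild the confidence interval from RR and the log-scale standard error."""
    centre = math.log(relative_risk)
    return math.exp(centre - z * se), math.exp(centre + z * se)


if __name__ == "__main__":
    # Hypothetical study: RR = 1.5 with 95% CI (1.2, 1.875).
    se = log_rr_standard_error(1.2, 1.875)
    print("log-scale standard error:", se)
    print("rebuilt interval:", interval_from_log_se(1.5, se))
```

A narrow interval means a small standard error, so a tightly estimated study leaves less room for the combined model to deviate from it.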

In other rules, we have entered those same statistics from other meta-analyses and systematic reviews on how the second hallmark affects frailty, cancer, obstructive sleep apnea, and Alzheimer’s disease.

We can calculate the risk of premature aging due to telomere attrition from the nodes that contribute to it, using the laws of probability. Vitamin D levels, a healthy diet, daily sitting time, soy isoflavone supplementation, and micronucleus frequency are all potential contributors to premature aging. All these risks are taken from meta-analyses and systematic reviews.
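For a single exposure, the calculation can be sketched as follows: given how common the exposure and the outcome are in the population, plus a relative risk from the literature, the law of total probability fixes the two conditional risks. The function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
def conditional_risks(p_exposed, p_outcome, relative_risk):
    """Return (P(outcome | exposed), P(outcome | unexposed)).

    Derived from two facts:
      P(outcome) = P(o | e) * P(e) + P(o | not e) * (1 - P(e))
      relative_risk = P(o | e) / P(o | not e)
    """
    risk_unexposed = p_outcome / (1 + p_exposed * (relative_risk - 1))
    risk_exposed = relative_risk * risk_unexposed
    if not 0 <= risk_exposed <= 1:
        raise ValueError("inputs imply a probability outside [0, 1]")
    return risk_exposed, risk_unexposed


if __name__ == "__main__":
    # Hypothetical inputs: 20% exposed, 10% overall outcome rate, RR = 2.0.
    exposed, unexposed = conditional_risks(0.20, 0.10, 2.0)
    print("risk if exposed:", exposed)
    print("risk if unexposed:", unexposed)
```

The range check is where the population data starts to matter: some combinations of prevalence and relative risk simply cannot hold in a real population.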

How do we know how to combine all those risks? We apply the rules of probability to data from the populations that the risks apply to. Those populations constrain what the combinations can be.

We use the laws of probability within an optimization technique called “quadratic programming” to tell us how well the calculated risks apply to the population. Because we only have a few app users so far, we start with the NHANES data set, a US government dataset designed to be representative of the US population.

The validation score that the algorithm supplies tells us which studies fit in, given their confidence intervals, and which ones don’t, guiding the inclusion of studies. The same validation score will be applied to data crowdsourced from scientists, culling what does not fit in. Studies that fit together “vote for” each other, like the “likes” we see in other crowdsourcing media.
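The article does not spell out the quadratic program itself, so the following is only a toy sketch of the general idea. It assumes two binary exposures A and B, hypothetical population shares for their four combinations, and hypothetical relative risks. It looks for outcome risks per combination that lie between 0 and 1 and match the studies as closely as possible in the least-squares sense (a small box-constrained quadratic program, solved here by projected gradient descent). The leftover squared error stands in for a validation score.

```python
# Cells are (A, B) = (0,0), (0,1), (1,0), (1,1).
# Hypothetical population shares of each cell and overall outcome rate.
WEIGHTS = [0.56, 0.14, 0.24, 0.06]
P_OUTCOME = 0.10


def rr_row(exposed_cells, rr):
    """Coefficients of the linear claim P(o | exposed) - rr * P(o | unexposed) = 0."""
    p_exposed = sum(WEIGHTS[i] for i in exposed_cells)
    return [WEIGHTS[i] / p_exposed if i in exposed_cells
            else -rr * WEIGHTS[i] / (1 - p_exposed)
            for i in range(len(WEIGHTS))]


def fit(rows, targets, iterations=20000):
    """Minimise 0.5 * sum((row . x - target)^2) subject to 0 <= x <= 1."""
    # 1 / (squared Frobenius norm) is a conservative, always-stable step size.
    step = 1.0 / sum(c * c for row in rows for c in row)
    x = [P_OUTCOME] * len(WEIGHTS)

    def residuals(point):
        return [sum(c * v for c, v in zip(row, point)) - t
                for row, t in zip(rows, targets)]

    for _ in range(iterations):
        res = residuals(x)
        grad = [sum(r * row[j] for r, row in zip(res, rows))
                for j in range(len(x))]
        # Gradient step followed by projection back into valid probabilities.
        x = [min(1.0, max(0.0, v - step * g)) for v, g in zip(x, grad)]
    score = sum(r * r for r in residuals(x))  # 0 means the studies fit together
    return x, score


# Two hypothetical studies: RR of A is 2.0, RR of B is 1.5.
consistent_rows = [WEIGHTS, rr_row({2, 3}, 2.0), rr_row({1, 3}, 1.5)]
consistent_targets = [P_OUTCOME, 0.0, 0.0]

# A third hypothetical study claims the doubly exposed cell has 10x the
# risk of the unexposed cell, which no valid set of probabilities can match.
conflict_rows = consistent_rows + [[-10.0, 0.0, 0.0, 1.0]]
conflict_targets = consistent_targets + [0.0]

if __name__ == "__main__":
    print("score without the conflicting study:", fit(consistent_rows, consistent_targets)[1])
    print("score with the conflicting study:", fit(conflict_rows, conflict_targets)[1])
```

In this toy, the first two studies can be satisfied exactly, while the third can only be matched by pushing a probability below zero, so the constrained fit leaves a nonzero error. That is the sense in which the population constrains the combinations and an ill-fitting study stands out.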

Our other type of rule is a logical rule, using “and”, “or”, “average”, and “if-then-else”, with which a scientist can express the causal relationships behind the data that prompted their clinical studies. These causal theories are critical to discovering new treatments and will be crowdsourced and culled through the validation scores of the dependency rules they connect to.
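The article does not say how Bayes Expert evaluates these logic rules. One common probabilistic reading, assuming the inputs are independent, is sketched below; the function names are hypothetical and the numbers are illustrative.

```python
import math


def and_rule(probabilities):
    """P(all inputs true), assuming independent inputs."""
    return math.prod(probabilities)


def or_rule(probabilities):
    """P(at least one input true), assuming independent inputs."""
    return 1 - math.prod(1 - p for p in probabilities)


def average_rule(probabilities):
    """Plain mean, e.g. to pool the same measure reported by several studies."""
    return sum(probabilities) / len(probabilities)


def if_then_else_rule(p_condition, p_then, p_else):
    """Mixture: use p_then when the condition holds, p_else otherwise."""
    return p_condition * p_then + (1 - p_condition) * p_else


if __name__ == "__main__":
    inputs = [0.5, 0.4]  # hypothetical input probabilities
    print("and:", and_rule(inputs))
    print("or:", or_rule(inputs))
    print("average:", average_rule(inputs))
    print("if-then-else:", if_then_else_rule(0.3, 0.8, 0.1))
```

If the inputs are not independent, the "and" and "or" readings above no longer hold exactly, which is one reason such nodes still need to be checked against data.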

Our seed network includes all ten hallmarks, each in its own node, but hallmarks other than the second have few clinical studies. In these cases, we have used logic-based nodes to express single studies that show a risk relationship exists.

Yet these logic-based nodes are also subject to the validation score, which tells us whether they fit in, because they relate to other nodes that have more mature relative risk studies. The risk combinations found by quadratic programming will themselves be validated against data in the second iteration of our seed network, which will be in the released Longevity app.

Some of the hallmark relationships in our current network, for example, include:

How blood tests, age, and diet affect genomic instability and how genomic instability affects frailty and cancer.
How liver disorders, sleep habits, folate labs, and frailty affect epigenetic alterations and how epigenetic alterations affect cancer.
How cardiovascular disease affects the loss of proteostasis and how the loss of proteostasis affects Alzheimer’s and Parkinson’s.
How frailty affects deregulated nutrient sensing.
How inflammation affects mitochondrial dysfunction and how mitochondrial dysfunction affects cellular senescence.
How cancer inflammation affects cellular senescence and how cellular senescence affects mitochondrial dysfunction.
How frailty and inflammation affect stem cell exhaustion.
And finally, how cancer and inflammation affect altered intercellular communication.

Bayes Expert on SingularityNET Marketplace

Bayes Expert is currently running in the backend of the SingularityNET marketplace on testnet and will be published to mainnet when the app is live and the front end of the service is published. This means that Rejuve’s native AI service will have immediate real-life usage, both running in the Longevity app and being available on the platform for scientists to construct their own statistical models.

We are working on creating an intuitive, comprehensive demo that shows how to use this service, which will play a key role in driving utilization of the platform after the port to Cardano.