Scientists have long known that human genes spring into action through instructions encoded in the precise order of our DNA, directed by the four different types of individual links, or “bases,” coded A, C, G and T.
Almost 25% of our genes are widely known to be transcribed by sequences that resemble TATAAA, called the “TATA box.” How the other three-quarters are turned on, or promoted, has remained a mystery, because the vast number of possible DNA base sequences has kept the activation details hidden.
Now researchers at the University of California San Diego have, with the aid of artificial intelligence, discovered a DNA activation code that is used at least as often as the TATA box in humans. Their discovery, which they called the downstream core promoter region (DPR), could eventually be used to control gene activation in biotechnology and biomedical applications. The details are described in the September 9 issue of the journal Nature.
“The identification of the DPR reveals a key step in activating roughly a quarter to a third of our genes,” said James T. Kadonaga, a distinguished professor in the Division of Biological Sciences at UC San Diego and the senior author of the study. “The DPR has been an enigma; it has been controversial whether or not it even exists in humans. Luckily, by using machine learning, we have been able to solve the puzzle.”
In 1996, Kadonaga and his colleagues, working in fruit flies, discovered a new gene activation sequence, called the DPE (which corresponds to a part of the DPR), that allows genes to be turned on in the absence of the TATA box. Then, in 1997, they found a single DPE-like sequence in humans. Since that time, however, it has been difficult to decipher the details and prevalence of the human DPE. Most strikingly, only two or three active DPE-like sequences had been found in the tens of thousands of human genes. To crack this case after more than 20 years, Kadonaga collaborated with lead author and post-doctoral scholar Long Vo ngoc, Cassidy Yunjing Huang, Jack Cassidy, a retired computer scientist who helped the team harness the powerful resources of artificial intelligence, and Claudia Medrano.
In what Kadonaga describes as “fairly significant computation” brought to bear on a biological problem, the researchers made a pool of 500,000 random versions of DNA sequences and measured the DPR activity of each one. From there, 200,000 versions were used to build a machine learning model that could reliably predict DPR activity in human DNA.
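The data preparation behind such an approach can be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: the function names, the one-hot encoding, the random split, and the 19-base region length are all assumptions made for the example, and the pool is scaled down from 500,000 to 5,000 sequences so that it runs quickly.

```python
import random

BASES = "ACGT"
REGION_LENGTH = 19  # assumption: the randomized region spans 19 bases


def random_variant(rng, length=REGION_LENGTH):
    """Return one random DNA sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def one_hot(sequence):
    """Encode a DNA string as a flat list of 0/1 values, four per base."""
    encoded = []
    for base in sequence:
        encoded.extend(1 if base == b else 0 for b in BASES)
    return encoded


def make_pool(n_variants, seed=0):
    """Build a pool of random sequence variants (hypothetical helper)."""
    rng = random.Random(seed)
    return [random_variant(rng) for _ in range(n_variants)]


def split_pool(pool, n_train, seed=0):
    """Shuffle a copy of the pool and split it into training and held-out sets."""
    shuffled = pool[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Scaled-down stand-ins for the 500,000-variant pool and 200,000 training variants.
pool = make_pool(5000)
train, held_out = split_pool(pool, 2000)

# Numeric features a model could be trained on, once each training
# sequence is paired with its measured activity (not simulated here).
features = [one_hot(seq) for seq in train]
```

In a real experiment, each sequence would be paired with a measured activity value, and the held-out sequences would be used to check how well the trained model predicts activity it has not seen.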
The results were “absurdly good,” as Kadonaga describes them. So good, in fact, that the team developed a similar machine learning model as a new way to identify TATA box sequences. They tested the new models on thousands of test cases in which the TATA box and DPR results were already known and found that, according to Kadonaga, the predictive capacity was “incredible.”
These findings clearly demonstrated the presence of the DPR motif in human genes. Moreover, they indicate that the DPR occurs about as frequently as the TATA box. In addition, an intriguing duality between the DPR and TATA was found: DPR sequences are missing in genes that are activated by TATA box sequences, and vice versa.
Kadonaga says that locating the six bases in the TATA box sequence was straightforward. At 19 bases, cracking the code for the DPR was much more difficult.
“It was not possible to locate the DPR because it has no clearly visible sequence pattern,” Kadonaga said. “There is hidden information encrypted in the DNA sequence that makes it an active DPR element. The machine learning model can decode the message, but we humans cannot.”
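The contrast can be illustrated with a short sketch. A six-base motif such as TATAAA can be located with a simple window scan, whereas a motif with no visible pattern offers no fixed string to scan for. The function below is a hypothetical helper written for this example, not the method used in the study, and treating "resembles TATAAA" as "differs in at most one base" is an assumption.

```python
def find_motif(sequence, motif="TATAAA", max_mismatches=1):
    """Return (position, window, mismatches) for every window of the
    sequence that differs from the motif in at most max_mismatches bases."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        # Count the positions where the window and the motif disagree.
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Made-up example sequence containing one exact TATAAA at position 3.
example = "GGCTATAAAAGGC"
exact_hits = find_motif(example, max_mismatches=0)
```

Raising `max_mismatches` loosens the definition of a TATA-like sequence. No comparably simple rule is available for the DPR, which is why the researchers turned to a learned model.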
Going forward, further use of artificial intelligence to study DNA sequence patterns could expand researchers' capacity to understand and control the activation of genes in human cells. This knowledge is likely to be useful in biotechnology and the biomedical sciences, Kadonaga said.
“In the same way that machine learning allowed us to identify the DPR, it is likely that similar artificial intelligence methods will be useful for studying other significant DNA sequence motifs,” said Kadonaga. “A lot of things that are unexplained could now be explainable.”
This research was supported by the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health.
Story source: Materials provided by the University of California San Diego. Originally written by Mario Aguilera. Note: Content may be edited for style and length.