Friday, September 16, 2016

Spit Takes (And It Gives a Little, Too)




One of the breakthroughs of modern science is the mapping of the human genome. We now know a great deal about our genes: which genes cause which traits, which ones make us more susceptible to certain diseases, and even what parts of the world our recent ancestors came from.

23andMe is the preeminent personal genetics service. Since its founding in 2006, it has become the go-to company for anyone who wants their DNA analyzed. 23andMe is perhaps best known for the health information it provides: if a user has a genetic variant that scientific studies have associated with, say, Tay-Sachs disease, 23andMe will notify them. The same goes for other diseases and conditions like sickle cell anemia, lupus, and lactose intolerance. This can be useful for people who want to stay healthy and get a jump on preventing conditions they may be genetically predisposed to. However, the FDA has limited much of the health information 23andMe provides, because Uncle Sam won't let anyone have any fun.

Another interesting feature of 23andMe is its Ancestry Composition report, which estimates how much of your DNA comes from particular population groups around the world. The company does this by comparing your DNA to that of people in 31 "reference populations," that is, people whose families have supposedly lived in the same region for the past half millennium or so. It's only an estimate, but it's widely considered accurate. Here are my results, and I think they're pretty accurate considering what I know about my own family history:

(Once I found out that 2/3 of my genome comes from African populations and 1/3 from European ones, the Oreo jokes lobbed at me in middle school took on a whole new meaning.)

I was interested in how they do this, so I did some digging and found that (of course) a lot of it involves computer science.

So, how does it work? Well, first, 23andMe sends the customer a kit; the customer provides a saliva sample and mails it to one of the 23andMe labs in California or North Carolina. At the lab, scientists extract DNA from the cells in the spit and determine which bases (A, C, G, or T) the customer has at hundreds of thousands of specific positions across their 23 pairs of chromosomes. This raw data can be downloaded and viewed in a terminal, but as a layman I find it useless until it's interpreted.
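To make the "raw data" step a little more concrete, here is a minimal Python sketch of reading such a file. It assumes the download is plain text with `#` comment lines followed by tab-separated rsid, chromosome, position, and genotype columns; treat that layout, the function name `parse_raw_genotypes`, and the made-up sample rows as assumptions for illustration, not as 23andMe's documented format.

```python
from typing import Dict, Tuple

def parse_raw_genotypes(text: str) -> Dict[str, Tuple[str, int, str]]:
    """Parse raw genotype data, ASSUMING '#' comment lines followed by
    tab-separated rows of: rsid, chromosome, position, genotype."""
    calls: Dict[str, Tuple[str, int, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        rsid, chromosome, position, genotype = line.split("\t")
        calls[rsid] = (chromosome, int(position), genotype)
    return calls

# Hypothetical sample rows (not real data).
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0001\t1\t100\tAG\n"
    "rs0002\tX\t250\tCC\n"
)
calls = parse_raw_genotypes(SAMPLE)
print(calls["rs0001"])  # ('1', 100, 'AG')
```

Even parsed, each row only says which two letters sit at one position, which is why the interpretation steps below matter.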

23andMe interprets genetic data for its Ancestry Composition using "Finch," its modified version of the (apparently) famed computer program BEAGLE. First, 23andMe compares the customer's DNA with that of "a set of 10,418 people with known ancestry, from within 23andMe and from public sources." This is part of a step called "phasing," which sorts the customer's genetic variants into the two copies of each chromosome, one inherited from each parent.
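Phasing is easier to picture with a toy example. The sketch below is not how BEAGLE or Finch works (they use statistical models over large reference panels); it simply tries every pair of haplotypes from a tiny, made-up panel and keeps the pair whose allele counts best match an unphased genotype. Genotypes are coded as the number of alternate alleles at each site (0, 1, or 2), and the function name `phase_against_panel` is hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List, Optional, Tuple

Haplotype = List[int]  # one entry per site: 0 = reference allele, 1 = alternate

def phase_against_panel(
    genotype: List[int], panel: List[Haplotype]
) -> Optional[Tuple[Haplotype, Haplotype]]:
    """Toy brute-force phasing: return the pair of panel haplotypes whose
    per-site sum mismatches the unphased genotype at the fewest sites."""
    best: Optional[Tuple[Haplotype, Haplotype]] = None
    best_mismatches = len(genotype) + 1
    for h1, h2 in combinations_with_replacement(panel, 2):
        mismatches = sum(1 for g, a, b in zip(genotype, h1, h2) if a + b != g)
        if mismatches < best_mismatches:
            best, best_mismatches = (h1, h2), mismatches
    return best

# Made-up reference panel and genotype (heterozygous at every site).
PANEL = [[0, 0, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0]]
phased = phase_against_panel([1, 1, 1, 1], PANEL)
print(phased)  # ([0, 0, 1, 1], [1, 1, 0, 0])
```

Here every site is heterozygous, so the genotype alone can't tell you which alternate alleles travel together on the same chromosome copy; the panel resolves that.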

After phasing, 23andMe uses something called a support vector machine (SVM for short) to determine which population group each chunk of your DNA most likely came from. For example, the SVM assigned most of my DNA to populations from Sub-Saharan Africa, and from those assignments it estimates how much of my genome that is.
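For the curious, here is roughly what "an SVM assigns each chunk to a population" can look like. This is a minimal sketch under heavy assumptions: each chunk of DNA is reduced to a short 0/1 feature vector, the three populations (`pop_A`, `pop_B`, `pop_C`) and their reference chunks are invented, and the linear SVMs are trained with Pegasos-style subgradient descent on the hinge loss in a one-vs-rest setup. None of this is 23andMe's actual feature encoding or training code.

```python
import random
from typing import Dict, List

def train_linear_svm(xs: List[List[float]], ys: List[int], lam: float = 0.01,
                     epochs: int = 500, seed: int = 0) -> List[float]:
    """Pegasos-style subgradient descent for a linear SVM; ys are +1/-1.
    A constant 1.0 feature is appended so the last weight acts as a bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def train_one_vs_rest(windows: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Train one binary SVM per population: its chunks vs. all other chunks."""
    models: Dict[str, List[float]] = {}
    for pop in windows:
        xs, ys = [], []
        for other, rows in windows.items():
            for row in rows:
                xs.append(row)
                ys.append(1 if other == pop else -1)
        models[pop] = train_linear_svm(xs, ys)
    return models

def classify_window(models: Dict[str, List[float]], window: List[float]) -> str:
    """Pick the population whose classifier gives the highest score."""
    x = list(window) + [1.0]
    return max(models, key=lambda pop: sum(wi * xi for wi, xi in zip(models[pop], x)))

# Invented reference chunks: each population "owns" two of six features.
REFERENCE = {
    "pop_A": [[1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]],
    "pop_B": [[0, 0, 1, 1, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]],
    "pop_C": [[0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]],
}
models = train_one_vs_rest(REFERENCE)
print(classify_window(models, [1, 1, 0, 0, 0, 0]))
```

One-vs-rest keeps each classifier a simple yes/no question ("is this chunk `pop_A` or not?"), and the population with the highest score wins.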

Next, the 23andMe scientists move on to "smoothing," "calibration," and "aggregation," a holy trinity of fancy words that essentially means cleaning up the chunk-by-chunk guesses, estimating how much to trust each one, and adding them up into the final percentages. And then, voilà! The user receives an email saying their reports are ready and can view them online.
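Smoothing and aggregation can be illustrated with a toy version too. Here smoothing is a simple majority filter over neighboring chunks, a stand-in chosen for illustration (23andMe's real smoothing step uses a more sophisticated statistical model), and aggregation just counts what fraction of equal-sized chunks was assigned to each population. Calibration is left out, and the chunk labels and function names are made up.

```python
from collections import Counter
from typing import Dict, List

def smooth_labels(labels: List[str], radius: int = 1) -> List[str]:
    """Majority filter (a stand-in, not 23andMe's method): relabel each chunk
    with the most common label among itself and `radius` neighbors per side."""
    smoothed = []
    for i in range(len(labels)):
        counts = Counter(labels[max(0, i - radius): i + radius + 1])
        top = max(counts.values())
        # Keep the chunk's own label on ties, so only isolated outliers change.
        smoothed.append(labels[i] if counts[labels[i]] == top
                        else counts.most_common(1)[0][0])
    return smoothed

def aggregate(labels: List[str]) -> Dict[str, float]:
    """Percent of chunks per population, assuming equal-sized chunks."""
    counts = Counter(labels)
    return {pop: 100.0 * n / len(labels) for pop, n in counts.items()}

chunks = ["A", "A", "B", "A", "A", "A", "C", "C"]  # hypothetical SVM output
smoothed = smooth_labels(chunks)
print(smoothed)             # ['A', 'A', 'A', 'A', 'A', 'A', 'C', 'C']
print(aggregate(smoothed))  # {'A': 75.0, 'C': 25.0}
```

The lone "B" chunk surrounded by "A" chunks is treated as noise and relabeled before the percentages are computed.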

23andMe could definitely use some improvement, such as larger African and Asian reference populations. Still, I marvel that something like this is available to us. It's practically magic.

Sources:
