Omics made easier

Illustration of DNA — Genomics, the study of the genome, or all the DNA in a cell, tissue or organism, has transformed research. We discuss other omics, including proteomics and metabolomics, and how researchers use them to advance knowledge. Stock image courtesy of Getty Images

Omes and omics are proliferating these days: Genomics. Proteomics. Metabolomics. But what are they? What’s the point of ome-ing everything?

An “ome” is a totality. So a genome is all the DNA, including the genes, in a cell or organism. All of our proteins make up our proteome. Omic technologies allow researchers to see a more-complete picture of the ome they’re interested in.

How and why do researchers use omes and omics to advance knowledge? We talked with experts at Fred Hutchinson Cancer Research Center about some of the most often-studied omes. Read on, or jump to a specific omic of interest, to learn more.

Common omics:

Genomics and epigenomics: The cell’s hardware and software
Transcriptomics: Sending a message
Proteomics and Proteogenomics: Doing the work
Metabolomics: Fuel for living
Meta-omics of the microbiome: We are not alone

Genomics and epigenomics: The cell’s hardware and software

DNA is often described as life’s blueprint. Our genes, made up of specific sequences of DNA, encode the proteins that perform most cellular processes. A genome is all the DNA on all the chromosomes. And, for the most part, it’s the same among all the cells in an organism.

But our genome is more than the sum of our genes. In addition to our thousands of genes, which encode proteins, there’s a lot of extra DNA that doesn’t. But it’s not junk — within this non-gene DNA are sequences that perform important functions, including regulating which genes are turned on or off and ensuring that replicated chromosomes properly separate when a cell divides.

Changes in DNA sequences, whether in genes or outside of them, can have important biological consequences — or not. Homing in on a specific gene can be a powerful path toward understanding the biology behind a disease. Sometimes scientists know that changes in DNA likely underlie a health condition but don’t know where the changes lie: In a gene or outside it? Which gene? Genomic technologies allow researchers to step back and study the big picture.

One type of study that employs such technologies to take a wide-angle look at our DNA is a genome-wide association study, or GWAS. This type of study aims to link a trait — increased cancer risk, say — with specific changes to DNA by screening the genomes of thousands of individuals, with and without the trait in question.
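As a rough illustration of the idea behind a GWAS, the sketch below tests one DNA variant at a time for association with a trait by comparing allele counts in people with and without the trait. The variant names, the counts and the function name `allele_chi_square` are hypothetical, and the plain 2x2 chi-square statistic is a simplifying assumption. Real studies test millions of variants and must account for multiple testing, ancestry and other confounders.

```python
def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic for a 2x2 table of allele counts."""
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the variant were unrelated to the trait.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical allele counts: (altered in cases, reference in cases,
#                              altered in controls, reference in controls)
variants = {
    "variant_A": (300, 700, 200, 800),
    "variant_B": (250, 750, 245, 755),
}

scores = {name: allele_chi_square(*counts) for name, counts in variants.items()}

# Larger statistics mean the variant is more unevenly split between groups.
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 2))
```

In this toy data, the first variant is far more common among people with the trait, so it earns a much larger statistic than the second, which is split almost evenly.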

But hold up, you say. If the genome is the blueprint, why do we have such a variety of cells in our bodies?

“We're just a little over 20,000 genes,” said Dr. Steven Henikoff, a Hutch molecular biologist and Howard Hughes Medical Institute investigator who studies the structure, function and evolution of our chromosomes. “How they’re regulated to make us what we are, is kind of amazing — and amazingly complicated.”

It’s the coordination of genetic regulation within and between cells that produces our complexity, he said. If every gene in the genome were turned on at once, it would cause cellular pandemonium. Depending on which genes you turn on in a cell, you get different cell behavior, shape and function.

That’s where the epigenome comes in. “Epi” means “on top of,” and the epigenome comprises the changes to the genome that don’t affect its DNA sequence, but do affect how its genes are turned on and off.

“They’re analogous to hardware and software,” Henikoff said. “The genome is the hardware, and the epigenome is the software.”

A cell’s epigenetic software program enables it to perform different functions than other cells.

Originally, the term “epigenome” referred specifically to molecular modifications made to DNA itself, said Henikoff. More recently, the term has expanded to include the DNA packaging proteins and molecular tools called transcription factors that “read” genes as the first step toward making proteins. (Scientists also term the combination of DNA and its associated proteins “chromatin.”)

Alterations in a cell’s hardware or software, or both, can have dramatic biological impacts, as Henikoff’s own work has shown.

One of the proteins of the epigenome is the histone, which forms wagon wheel-shaped complexes around which DNA wraps. Histones, which come in a variety of forms, help cells organize their DNA and control access to the genetic information it contains.

About 20 years ago, Henikoff and his team discovered that cells wrap their DNA around a specific histone form when turning a gene on. Then about a decade ago, researchers studying a type of pediatric brainstem tumor discovered that it’s caused by a change to a single DNA letter of the gene encoding this histone. Right now, there are no treatments for this tumor, but Henikoff would like to help change that.

But first, scientists must understand how just a single mutation in one type of histone can cause cancer. Henikoff and his team demonstrated that adding mutant histones throughout the genome, even at low levels, alters how many genes are turned on and predisposes normal cells to transform into cancer.

“For me, it’s been thrilling to be able to get this kind of understanding. Someday it might make a difference in treating cancer,” Henikoff said.

Genomes have been laboriously sequenced since the 1970s, but most of today’s genomic and epigenomic studies rely on newer technologies that first emerged in the mid-1990s and gained steam in the mid-2000s. Microarrays allowed researchers to determine the sequence of DNA at multiple areas of the genome simultaneously. Another method, often termed “next-generation” or “next-gen” sequencing, enables researchers to quickly build a more-complete map of DNA by sequencing a lot of DNA at once (often called “high-throughput”). The Human Genome Project, the first attempt at mapping the human genome, relied mainly on automated versions of the older sequencing method; next-gen sequencing has since made DNA mapping dramatically faster and cheaper.

The techniques that resolve the epigenome aim to pinpoint the location and identity of proteins complexed with DNA, as well as distinguish “open” areas, where genes are more likely to be turned on, from “closed” areas where they’re turned off. One widely used approach is termed chromatin immunoprecipitation, or ChIP. (“Immunoprecipitation” refers to the method of using immune proteins called antibodies to bind specific epigenetic proteins and pull them out of the sample, which also brings along the DNA they’re attached to.)

But ChIP doesn’t build super-detailed epigenome maps. In addition to working to understand how epigenetic factors regulate genes, Henikoff has made a career of developing new techniques to map the epigenome. Most recently, he and his team developed methods called, for short, CUT&RUN and CUT&Tag, which allow scientists to make more-refined epigenomic maps at lower cost.

One expansion of genomics technology is the ability to survey the genomes of individual cells, which gives a clearer picture of the variation within, say, a tumor, and how this may contribute to progression or treatment response. Epigenomic techniques, like Henikoff’s, are also being adapted for such single-cell experiments.

Transcriptomics: Sending a message

Though we only have two copies of each gene, cells can’t function properly without hundreds or thousands of copies of each protein. To make this mathematical leap, life has evolved a process called transcription, in which the information encoded in DNA is copied many times over into mobile messages. This enables the molecular factories that build proteins to work off many sets of instructions — known as transcripts — simultaneously.

Transcripts, made of what’s called messenger RNA, tell researchers which genes are switched on. This, in turn, gives them a clue as to which proteins cells or tissues are trying to use. The transcriptome is all the transcripts within a cell, tissue or organism.

A view of the transcriptome, rather than a handful of transcripts, can have advantages, explained Dr. Robert Bradley, a Fred Hutch computational biologist who studies RNA processing in cancer and related diseases and holds the McIlwain Family Endowed Chair in Data Science.

“Frequently we don't have just one transcript that accurately reflects the biological state that we're interested in,” he said. “If we're interested in, for example, detecting whether or not a cancer is responding to therapy, we might need to look at a whole host of transcripts in order to get an accurate picture of what the tumor is doing.”

That host of transcripts characteristic of a cell type or cell state is often called a “signature,” and it reflects the fact that biological processes arise from collaboration among many different proteins. Identifying which transcripts are present or which have become more or less abundant can help scientists better understand a biological state. A transcriptomic view can also help when researchers know that a specific biological process is changing, but don’t know the key genetic players.

“For example, say we think that there's something going on with the metabolism of this tumor, but we don't know specifically which gene is involved,” Bradley said.

In this case, transcriptomics would help guide scientists toward key genes by highlighting any transcripts of metabolism-related genes that are more or less common in tumors compared to normal tissue.
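A minimal sketch of that comparison follows, assuming transcript counts have already been measured for a few genes in tumor and normal samples. The gene names, the counts, the pseudocount and the function name `log2_fold_change` are all hypothetical. Real analyses also normalize for sequencing depth and use statistical models rather than a bare ratio of averages.

```python
import math


def log2_fold_change(tumor_counts, normal_counts, pseudocount=1.0):
    """Log2 ratio of average tumor counts to average normal counts.

    The pseudocount avoids dividing by zero for genes that are switched off.
    """
    tumor_mean = sum(tumor_counts) / len(tumor_counts)
    normal_mean = sum(normal_counts) / len(normal_counts)
    return math.log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))


# Hypothetical transcript counts for three placeholder genes.
expression = {
    "GENE_A": {"tumor": [799, 799, 799], "normal": [99, 99, 99]},
    "GENE_B": {"tumor": [49, 49, 49], "normal": [199, 199, 199]},
    "GENE_C": {"tumor": [99, 99, 99], "normal": [99, 99, 99]},
}

changes = {
    gene: log2_fold_change(counts["tumor"], counts["normal"])
    for gene, counts in expression.items()
}

# Rank genes by how strongly they changed, in either direction.
ranked = sorted(changes, key=lambda gene: abs(changes[gene]), reverse=True)
for gene in ranked:
    print(gene, round(changes[gene], 2))
```

Positive values mark transcripts that are more common in the tumor samples, negative values mark those that are less common, and values near zero mark genes that look unchanged.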

While transcriptomics can give information that’s related to epigenomics — i.e., which genes are on or off — it also offers a more nuanced understanding. Sometimes a disease state is created by switching genes on or off, but often genes are "tuned" instead: they get transcribed more or less, rather than switched on or off. Transcriptomics sheds light on this kind of genetic tuning.

RNA is also where our cells can add a little complexity. Before the human genome was mapped, researchers had expected humans would have more genes than we do, given our complexity. But transcripts often undergo processing, also called splicing, to produce different forms of a protein which can also work differently. Sometimes, mis-splicing or a lack of RNA splicing contributes to the development or progression of a disease. Bradley studies this phenomenon in cancer and related diseases. In a recent project, Bradley and his collaborators used transcriptomics to link dysfunctional RNA splicing in a gene called BRD9 to cancer.

Transcriptomics first started building momentum when microchip arrays, which allowed scientists to look at multiple transcripts at once, were developed in the late 1990s. Today, the technique that researchers primarily use to study the transcriptome, called RNA sequencing or RNA-seq, relies on the same next-gen sequencing technology that scientists use to sequence DNA. The most recent advance in transcriptomics, like genomics, is the ability to sequence RNAs from individual cells.

There are pros and cons to sequencing RNAs either in aggregate, or cell-by-cell, Bradley said. Bulk RNA experiments don’t shed light on variation among cells within a tissue, but do provide high-quality data on virtually every kind of molecule of RNA present in a sample.

Single-cell transcriptomics, one kind of single-cell genomics, gives a clearer picture of the variation within a tissue sample, and allows researchers to make powerful statements about the cells within a sample, Bradley noted.

But this strategy provides less information about the transcripts themselves. RNA-seq technologies don’t sequence the entirety of each RNA transcript; instead, they read just enough to detect a transcript’s presence. Single-cell RNA-seq relies on even shorter reads than bulk RNA-seq, so researchers get less information for fewer of the genes turned on in each cell. This means that certain nuances — such as the often-rare splicing errors Bradley studies — are lost.

Researchers are working to overcome this by improving methods that enable them to sequence complete transcripts.

Another challenge for researchers incorporating transcriptomics into their work is the sheer amount of data that this omic, like other omics, produces. A tissue may have tens or even hundreds of thousands of different RNA transcripts, and each of them may have hundreds to thousands of copies. This is multiplied by the number of samples; in single-cell experiments, this can reach the millions.

Computational biologists are trying to go from the initial “unspeakably large amount of data to something that humans can understand and make into an actionable fact, something that we can write in one sentence,” Bradley said.

Say an RNA-seq experiment spits out 20 gigabytes of data. Sophisticated algorithms turn that data into a more-manageable matrix that lists how many copies of each gene were transcribed in each sample. But this matrix may still have 25,000 rows (one for each gene turned on) and as many columns as samples in the experiment.

“That matrix is of a size that you can actually open up in a spreadsheet editor, but it's still too big for humans to really do something very meaningful with,” Bradley said. “The big bottleneck right now is the next step, going from that spreadsheet to a prediction that five or ten genes are really important for mediating some phenotype [trait] of interest.”
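The step from sequencing reads to a genes-by-samples matrix can be sketched as follows. This is a toy illustration, not the algorithms Bradley describes: it assumes each read has already been assigned to a gene, and the sample names, gene names and the function `build_count_matrix` are hypothetical.

```python
from collections import Counter

# Hypothetical input: for each sample, one gene name per sequenced read,
# assuming an earlier step has already assigned every read to a gene.
reads_by_sample = {
    "sample_1": ["GENE_A", "GENE_A", "GENE_B"],
    "sample_2": ["GENE_B", "GENE_C", "GENE_C", "GENE_C"],
}


def build_count_matrix(reads_by_sample):
    """Return (genes, samples, matrix) with one row per gene, one column per sample."""
    counts = {sample: Counter(reads) for sample, reads in reads_by_sample.items()}
    genes = sorted({gene for sample_counts in counts.values() for gene in sample_counts})
    samples = sorted(counts)
    # A Counter returns 0 for genes that never appeared in a sample.
    matrix = [[counts[sample][gene] for sample in samples] for gene in genes]
    return genes, samples, matrix


genes, samples, matrix = build_count_matrix(reads_by_sample)

print("gene", *samples, sep="\t")
for gene, row in zip(genes, matrix):
    print(gene, *row, sep="\t")
```

A real experiment would produce the same kind of table, only with tens of thousands of rows instead of three.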

One principle that Bradley applies when trying to spot meaningful patterns in dizzying datasets is whether the same pattern pops up consistently in different types of data and from different sources.

“If we find the same phenomenon, we're going to believe it a lot more,” he said. “So we devote a lot of our effort to finding ways to integrate across different data sources.”

He applied this principle to studying mis-splicing of BRD9 transcripts, comparing hundreds of patient samples from many different types of tumors, searching for changes that arose consistently, suggesting they were worth investigating further.

Proteomics and Proteogenomics: Doing the work

While transcriptomics sheds light on which genes are turned on and how high, much of the cellular action occurs once the information in those transcripts is used to create proteins. Once made, proteins can also undergo modifications that alter their activity. The amount of a protein can change independently from that of its transcripts.

“Just sequencing the DNA or measuring the RNA of a cancer biopsy, for example, doesn't accurately reflect what's happening at the protein level. This is important because the proteins are carrying out the work of the cells and causing the cells’ behavior,” said Dr. Amanda Paulovich, a Hutch cancer geneticist and oncologist who develops proteomics technologies and holds the Aven Foundation Endowed Chair.

Because of this, many treatments, including cancer drugs, target proteins. But this has created a disconnect between the goal of personalized oncology — to tailor treatments to individual tumors — and how well it works in practice.

Though the proteome includes all the proteins in a tissue sample, personalized oncology studies have largely ignored it in favor of focusing on the genome. Many personalized oncology approaches aim to tailor patients’ therapies based on changes in their tumor’s DNA, largely because it’s cheaper and easier to precisely sequence DNA than it is to detect and quantify specific proteins. But that technology gap is beginning to narrow, said Paulovich.

Dr. Amanda Paulovich develops proteomics technologies to improve care for cancer patients. Robert Hood / Fred Hutch News Service

To study the proteome, researchers use mass spectrometry. Mass spectrometers enable scientists to calculate the molecular weight of a sample’s protein components, which they can use to determine their identities.

Like transcriptomics, current proteomics technologies don’t capture everything. Most biospecimens are so complex, containing hundreds of thousands of different forms of proteins, that even a lightning-fast mass spectrometer can’t detect all the protein ions flying past. Separating the samples by certain protein characteristics can help simplify what the mass spectrometer sees, and help improve data resolution, but some proteins still fly under the instrument’s radar.

Still, as with transcriptomics, a broad view can help researchers identify unexpected protein signatures underlying a biological state or response.

Paulovich is also working to integrate information drawn from the genome, transcriptome and proteome, an approach dubbed proteogenomics. Paulovich is a member of the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium, which has helped advance proteogenomic studies and technologies.

“We try to combine the information because each of those data sets has complementary information,” she said. Proteogenomics experiments can also help researchers prioritize which tantalizing findings to follow up on: A change seen in all three types of data is more likely to be shaping tumor biology and less likely to be a red herring.
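The prioritization logic can be sketched in a few lines, assuming each type of data has already been reduced to a list of genes that look altered in a tumor. The gene names, the hit lists and the function name `rank_by_support` are hypothetical, and counting how many data types agree is a deliberate simplification of real proteogenomic analysis.

```python
from collections import Counter

# Hypothetical lists of genes flagged as altered in each type of data.
hits_by_layer = {
    "genome": {"GENE_A", "GENE_B", "GENE_D"},
    "transcriptome": {"GENE_A", "GENE_C", "GENE_D"},
    "proteome": {"GENE_A", "GENE_D", "GENE_E"},
}


def rank_by_support(hits_by_layer):
    """Sort genes by how many data types flagged them, most supported first."""
    support = Counter(gene for hits in hits_by_layer.values() for gene in hits)
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_by_support(hits_by_layer)

# Genes flagged in every data type are the strongest follow-up candidates.
seen_in_all_layers = [gene for gene, n in ranked if n == len(hits_by_layer)]
print(seen_in_all_layers)
```

Genes that appear in only one list are not discarded here, just ranked lower, which mirrors the idea of prioritizing rather than filtering.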

“Looking at dynamic markers like proteins can help also when we're trying to develop and understand new drugs in cancer,” Paulovich said.

Changes in protein levels or modifications to proteins help scientists understand how a person’s cells respond to a drug, whether the drug hit its target, and whether it has worrisome off-target effects. This kind of information can be invaluable as oncologists try to determine the best dose and dosing schedule for new drugs.

When Paulovich entered the proteomics field 17 years ago, the mass spectrometry technology couldn’t identify and quantify proteins well enough to inform clinical decisions. The strategy she’s shepherded toward the clinic, called multiple reaction monitoring mass spectrometry, or MRM mass spec, makes measuring proteins more efficient and precise.

“We use software interfaces to tell the mass spectrometer to ignore everything in this sample, except for the proteins we’re programming on this list. We want to measure them and devote all the analytical capacity of the instrument to those proteins of high interest, instead of trying to see everything,” she said. “With these targeted mass spec assays, we can get really high specificity, sensitivity and reproducibility for measuring the proteins that we target — and we can readily measure many proteins at once, which is a distinct advantage over conventional approaches.”

Paulovich and her team applied this approach to study the protein Her2, which is more abundant in some breast cancers. Her2-targeting drugs can work well for patients with these types of tumors. But this treatment strategy is complicated by the fact that some patients, whose tumors would have been classified as Her2-negative using an older method of measuring the protein, respond well to a new class of Her2-targeting drugs, called antibody-drug conjugates, which includes drugs like ado-trastuzumab emtansine (Kadcyla or T-DM1) and fam-trastuzumab deruxtecan (Enhertu).

“These new drugs may actually require more sensitive and precise measurements down in the lower Her2 expression range” to classify patients who may benefit, Paulovich said.

The assay she and her team developed to measure Her2 accurately, along with a few other key proteins, is the first CLIA-certified assay to come out of her CLIA lab. CLIA, which stands for Clinical Laboratory Improvement Amendments, sets standards clinical assays must clear before they reach patients. As a CLIA-certified assay, her test can be used in patient care.

Metabolomics: Fuel for living

A change in metabolism accompanies certain disease states, including cancer.

“A metabolite is a small molecule that is either a source or an intermediate in the process of converting nutrients into products,” said Dr. Lucas Sullivan, a Hutch molecular biologist who studies cell metabolism in normal and cancer cells. “That could be in the form of generating energy, or it could be in the form of building biomass like proteins in muscles, or DNA or RNA.”

The metabolome is the entire collection of metabolites in a cell or organism. In some ways, said Sullivan, it’s simpler than the genome. Compared to our tens of thousands of genes, a given tissue sample might have only a few hundred or a few thousand metabolites.

But in other ways, studying the metabolome can be much more complicated.

“The problem with metabolites is that they are used in every cell — it’s rare that a metabolite is specific to one cell,” Sullivan said. “And, a metabolite being elevated or suppressed could mean either that it is very important or very unimportant.”

It’s counterintuitive, but here’s an example: If a metabolite is high, is that because the cell’s making a lot of it for an important cellular process? Or is the metabolite building up because the cell has no use for it?

Dr. Lucas Sullivan studies cellular metabolism and how it changes in cancer. Robert Hood / Fred Hutch News Service

Sometimes, a small change in a metabolite’s level can have a meaningful impact on a patient’s health, while large changes do not. For example, Sullivan noted, a person with diabetes may have a blood glucose level only 50% higher than normal, despite being very sick. Other metabolites, which have nothing to do with diabetes, may vary tenfold. In this case, scientists studying diabetes would be led astray by chasing the biggest change in metabolite levels.

This can make interpreting metabolomic datasets difficult. Sullivan recommends them not as a first step to guide discovery, but as a resource to return to when confirming or validating insights gained from asking more targeted questions.

Sullivan has used metabolomics this way in his own work. As a postdoc, he was studying mitochondria, cellular compartments best known for making ATP, the molecule that powers most of our cellular processes. Unexpectedly, he’d found that growing cells can get all the ATP they need from another source — so what did they need mitochondria for?

It turned out that growing cells need mitochondria to balance their electrons. Our cells can get rid of unneeded electrons by sticking them on oxygen molecules — a reaction that occurs within mitochondria. Once Sullivan realized electron balance could be playing a role in cell growth, he didn’t need to do a new experiment to look for metabolic changes. Instead, he looked in his metabolomics dataset and found a small change in a particular protein building block, or amino acid, that confirmed his hypothesis.

Though appending the “ome” to the end of metabolite implies that scientists can study all of the metabolites in a single tissue, there are technical challenges that make this more of an ideal than a reality, Sullivan noted.

Like proteins, metabolites are measured using mass spectrometry. However, the ions and masses don’t always tell an easily read story. Two metabolites may have the same mass but very different structures and functions. The amino acids leucine and isoleucine make up one such pair. They’re made up of the same mix of atoms, just put together in two different arrangements. Both are independently needed to make the proteins we need to survive, but unless researchers take steps to separate them beforehand, a mass spectrometer can’t tell them apart.
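The leucine and isoleucine problem can be checked with simple arithmetic. The sketch below sums rounded monoisotopic atomic masses for each molecule’s chemical formula, which is C6H13NO2 for both amino acids. The function name `formula_mass` is hypothetical, and treating a molecule as a neutral formula is a simplification, since instruments actually measure charged ions.

```python
import re

# Rounded monoisotopic masses (in daltons) of the most common isotopes.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_mass(formula):
    """Sum atomic masses for a simple formula such as 'C6H13NO2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number after an element symbol means one atom.
        mass += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return mass


# Same atoms, different arrangement: both share one chemical formula.
metabolites = {"leucine": "C6H13NO2", "isoleucine": "C6H13NO2"}
masses = {name: formula_mass(formula) for name, formula in metabolites.items()}

for name, mass in masses.items():
    print(name, round(mass, 4))
```

Because the two formulas are identical, the computed masses are identical too, which is why mass alone cannot separate the pair.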

Mass spectrometers are also very sensitive, which means they can also detect contaminants from the room or from plastic used in the experiment. These ions can be difficult or impossible to distinguish from ions derived from important metabolites. Scientists have ways of untangling some of this, but more work needs to be done.

“There’s likely this dark metabolome of tissue-specific or low-abundance molecules that we don't yet know what they are,” Sullivan said. For the future, “one goal and one challenge are to identify which metabolites come from cell metabolism, which ones are new, and which ones are underappreciated.”

The single-cell technologies sweeping transcriptomics and genomics have not yet arrived for the metabolome. Though efforts are underway to measure the metabolome of smaller amounts of material, single cells hold too few metabolites to accurately detect on most current mass spectrometry setups.

Biochemistry textbooks lay out highly detailed enzymatic pathways that suggest the metabolome is mostly mapped out — but that couldn’t be farther from the truth, Sullivan said.

“There’s a lot left to be discovered in how tissues accomplish their metabolic goals by differentially using their metabolic networks,” he said.

Meta-omics of the microbiome: We are not alone

“Just as an individual such as you or I might have led a completely different life if we had had the same genome but had been born, you know, in Bangladesh or another part of the world with a different community, different people around us, we are starting to appreciate that identical bacteria may lead very different lives when embedded in different microbial communities,” said Dr. Neelendu Dey, a Fred Hutch physician-scientist who studies how the microbes in our gut influence our health.

The communities that shape us include the various microbes that live in and on us — our microbiomes. More studies are linking this ome to our health, from obesity to cancer risk, and researchers are applying various omics technologies to better understand this extended self.

Dr. Neelendu Dey studies how interactions between our microbes and diet can influence our health. Robert Hood / Fred Hutch News Service

When an omics term starts with the prefix “meta-,” it means the method analyzes a community, such as the microbes in our microbiome. Metagenomics takes a broad look at the collection of genomes present in a community of microbes, while meta-proteomics examines the community’s proteins, and so forth.

These other omics “start to give us a sense of what the bacteria are doing and what's in the environment, what they’re responding to,” Dey said.

Just as our microbes influence our health — by helping to digest food, release vitamins and nutrients, and interact with our immune systems — we influence them, through our diet and lifestyle.

The sheer number of known and unknown bacterial species presents its own challenges for researchers incorporating meta-omics into their studies. Scientists doing metagenomics work, for example, classify the species within a given sample by comparing their data to reference datasets of microbial genomes, which are still incomplete. Every year scientists expand these databases to cover more species, many newly discovered in human microbiome samples.
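To show why incomplete reference datasets matter, here is a toy sketch of classifying sequencing reads by comparing them with reference genomes. The species names, the sequences, the thresholds and the function names `kmers` and `classify` are all hypothetical, and matching short subsequences is only one simplified way to make the comparison. It is not a description of any specific metagenomics tool.

```python
def kmers(sequence, k=4):
    """Return the set of all length-k subsequences in a DNA sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


# Hypothetical, very short stand-ins for reference genomes.
reference_genomes = {
    "species_X": "ATGCGTACGTTAGC",
    "species_Y": "TTGACCGGTAAACC",
}


def classify(read, references, k=4, min_shared=2):
    """Assign a read to the reference sharing the most k-mers with it."""
    read_kmers = kmers(read, k)
    best_name, best_shared = None, 0
    for name, genome in references.items():
        shared = len(read_kmers & kmers(genome, k))
        if shared > best_shared:
            best_name, best_shared = name, shared
    # A read from a species missing from the references matches nothing.
    return best_name if best_shared >= min_shared else "unclassified"


for read in ["GCGTACGT", "CCGGTAAA", "GGGGGGGG"]:
    print(read, classify(read, reference_genomes))
```

The last read matches neither reference, so it stays unclassified until a matching genome is added to the reference set.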

Dey studies how diet, the microbiome and gut motility interact to promote colorectal cancer. As a postdoctoral fellow, he examined how interactions between microbes and specific diet ingredients affect gut physiology. His initial metagenomic analysis helped him identify changes in gut physiology with different sets of microbes, but the picture remained incomplete.

“It seemed like we could link diet and microbes together, but not necessarily to what was happening on the physiology side of things,” Dey said.

The answer came when he incorporated metabolomics and was able to identify a class of bacterial metabolites that consistently produced the same effect on gut physiology across different microbiome samples.

The sheer variety of microbial species adds another layer of complexity to already-complex omics analysis. Systems biology, the discipline of modeling complex systems rather than individual components, tackles the problem of putting the pieces together.

Ultimately, scientists hope to guide patient care by integrating a holistic view of our biological systems with a holistic view of our microbiome. But that level of personalized medicine is still a future ideal, Dey noted.

“We don't quite have that ability to, for example, see somebody in the clinic, profile their microbiome and say, ‘All right, so you need to do this and this and this over the next seven days,’” he said. “But I think the future will be integrating all these different omics in an interpretable way.”