Tanya Woyke and Chris Rinke
Content provided by Gregory German and KALX 90.7FM - UC Berkeley. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Gregory German and KALX 90.7FM - UC Berkeley or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://it.player.fm/legal.
Transcript
Speaker 1: Spectrum's next.
Speaker 3: [inaudible].
Speaker 1: Welcome to Spectrum, the science and technology show on KALX [00:00:30] Berkeley, a biweekly 30-minute program bringing you interviews featuring Bay Area scientists and technologists, as well as a calendar of local events and news.
Speaker 4: Good afternoon. I'm Rick Karnofsky. Brad Swift and I are the hosts of today's show. Today we're talking with Drs. Tanya Woyke and Chris Rinke of the Department of Energy Joint Genome Institute in Walnut Creek. They recently published an article entitled "Insights into the phylogeny and coding potential [00:01:00] of microbial dark matter," in which they characterized the relationships between 201 different genomes and identified some unique genomic features. Tanya and Chris, welcome to Spectrum.
Speaker 5: Thanks for having us. Thank you.
Speaker 4: So Tanya, what is microbial dark matter?
Speaker 5: We like to take life as we know it and put it in an evolutionary tree, a tree of life. What this helps us do is figure out the evolutionary histories of organisms and the relationships between [00:01:30] related groups of organisms. So what does this mean? Let's say we take microbial diversity as we know it on this planet and we place it in this tree of life. What you will find is that there are some major branches in this tree, about 30 of them, and we call these major branches phyla. They are made up of organisms that you can cultivate. So we can grow them on plates in the laboratory, we can grow them in Erlenmeyer flasks in liquid media. We can study their physiology. We can figure out what substrates they metabolize, [00:02:00] we can figure out how they behave under different conditions.
Speaker 5: Many of them we can even genetically modify. So we really know a lot about these organisms, and we can really figure out how they function and what the genetic underpinnings are that make them function the way they do in the laboratory and also in the environment where they come from. So now, coming back to this tree of life: if you keep looking at it, you will find at least another 30 of these major branches that we refer to as [00:02:30] candidate phyla, and these branches have no cultivated representatives. All the organisms that make up these branches we have not yet been able to cultivate in the laboratory. We call these candidate phyla, or microbial dark matter. The term dark matter, or biological dark matter, was coined by the Steve Quake laboratory at Stanford University when they published the first genomes of a candidate phylum, TM7. We know that dark matter is in most if not all [00:03:00] ecosystems. So we find it in most ecosystems, but to get at their complete genetic makeup, that's the key challenge.
Speaker 4: Yeah. And if you want to push it to the extreme, there are studies out there estimating the number of bacterial species there are and how many we can cultivate. The estimate from those studies is that we can cultivate about 1 or 2% of all the microbial species out there. So basically 99% is still out there and we haven't even looked at it. So this major uncultured microbial majority is [00:03:30] still waiting out there to be explored. So that sort of carries on the analogy to cosmological dark matter, in which there's much more of it than what we actually see and understand. Right.
Speaker 5: So how common and how prevalent are these dark matter organisms? Yeah, that's a really good question. In some environments they are what we would consider the rare biosphere. So they are actually at fairly low abundance, but our methods are sensitive enough to still pick them up. [00:04:00] In other environments, we had some sediment samples where some of these candidate phyla are actually what we would consider quite abundant. It's a few percent, let's say 2% for a given candidate phylum, but to us even 2% is quite abundant. Again, you have to consider the whole community, and if one member is at 2%, that's a pretty dominant community member. So it varies from environment to environment.
Speaker 4: And Chris, where were samples collected from? So altogether we sampled nine sampling sites all over the globe [00:04:30] and we tried to be as inclusive as possible. We had marine samples, freshwater samples, sediment samples, some samples from habitats with very high temperatures, and also a sample from a bioreactor. There were a few samples among them for which we had really great hopes, and among them were samples from the hot vents at the bottom of the Pacific Ocean. The samples we got were from the East Pacific Rise sampling site, and that's about 2,500 meters below the surface. And [00:05:00] to sample there, you really need a submersible, a small submarine that you can launch from a research vessel. In our case, those samples were taken by Alvin from the Woods Hole Oceanographic Institution. There you have a lot of volcanic activity, and the seawater seeps into the Earth's crust, goes pretty deep, and gets heated up.
Speaker 4: And when it comes back out at a hydrothermal vent, it is at up to [inaudible] hundred 50 to 400 degrees Celsius, and it is enriched in chemicals such as sulfur or iron. [00:05:30] It mixes immediately with the surrounding seawater, which is only about two degrees Celsius. So it's a very challenging environment, because you have this gradient from two degrees to like 400 degrees within a few centimeters, and you have those chemicals that the microorganisms could use, plus there is no sunlight. So we thought that's a very interesting habitat to look for microbial dark matter. There were several other samples of interest to us. One of them is from the Homestake [00:06:00] Mine in South Dakota. That's an old gold mine that has not been used since 2002, but there are still scientific experiments going on there. It's a very deep mine, about 8,000 feet deep, and we could sample from about 300 feet.
Speaker 4: And we were surprised by the archaeal diversity we found in those samples. There were a few archaea that were not close to any other known archaea out there. For some of them, we even had to propose new archaeal phyla. Stepping back a bit, Chris, [00:06:30] can you tell us more about archaea and perhaps the three domains of life? The three domains were really established by Carl Woese with his landmark paper in 1977, and what he proposed was a new group, the archaea. So then he had altogether three domains: the bacteria, the archaea, and the eukaryotes. The eukaryotes are different. One big difference is that they have a nucleus, right? They have the DNA in the nucleus, and the group also includes all the higher taxa. But then you also have the archaea and the bacteria. [00:07:00] Those are two groups that contain only single-celled organisms, but they are very distantly related to each other in the cell envelope, and also the cell duplication machinery of the archaea is closer to the eukaryotes than it is to the bacteria.
Speaker 5: Yeah, and it's interesting. I mean, archaea, I guess we haven't sequenced them that much yet, but archaea are very important too, though people are not aware of them. They know about bacteria, but not archaea, maybe because there aren't any archaeal pathogens, [00:07:30] and we like to think about bacteria with regard to human health, which is very important. That's why most of what we sequence are actually pathogens, human pathogens. So we have sequenced I don't know how many strains of Yersinia pestis and other pathogenic bacteria, but archaea are equally important, at least in the environment. But because we rarely find them associated with humans, we don't really think about archaea much. People aren't really aware of archaea.
Speaker 4: Talk about their importance.
Speaker 5: The importance [00:08:00] in the environment. So archaea are, for example, found in extreme environments. We find them in hydrothermal environments. We find them in hot springs. They have biotechnological importance: a lot of quite useful enzymes that are being used in biotechnology are derived from archaea, in part because we find them in these extreme, hot environments and they have the machinery to deal with this temperature. So they have enzymes that function [00:08:30] properly at high temperature and under extreme conditions; the really extreme ones we call extremophiles. And that makes them very attractive biotechnologically, because some of the enzymes that we would like to use should be more tolerant or should have these features that are sort of more extreme, so we can exploit them for biotechnological applications. [inaudible]
Speaker 6: [inaudible] [00:09:00] You are listening to Spectrum on KALX Berkeley. I'm Rick Karnofsky, and I'm talking with Tanya Woyke and Chris Rinke about using single-cell genomics to expand our knowledge of the tree of life.
Speaker 5: [00:09:30] So again, we called up a range of different collaborators, and they were all willing to go back to these interesting sites, even to the hydrothermal vent, and get us fresh samples. No one turned us down. We screened the samples again to make sure they were really of the nature that we would like to have, and the ones that were suitable we then fed into our single-cell workflow. Can you talk briefly about that screening? There were two screens involved. One screen was narrowing down the samples themselves. We received a lot more samples, I would say at least [00:10:00] three times as many as we ended up using. We pre-screened these on a sort of barcode sequencing level, and so we down-selected them to about a third. Then within this third we sorted about 9,000 single cells, and of these 9,000 single cells only a subset went through successful single-cell whole genome amplification. Out of that set we were then able to identify another subset. And [00:10:30] in the end we selected 200 for sequencing. 201.
Speaker 4: And how does single-cell sequencing work?
Speaker 5: So to give you a high-level overview, you take a single cell directly from the environment and you isolate it, and there are different methodologies to do that. Then you break it open, you expose the genetic material within the cell, the genome, and then you amplify the genome. Some single cells will only have one copy of that genome. We have a methodology, a whole genome amplification process called multiple displacement amplification, [00:11:00] or MDA, that allows us to make millions and billions of copies from one copy of the genome. One copy of the genome corresponds to a few femtograms of DNA. We can't do much with it. So we have to multiply it; we have to make these millions and billions of copies of the genome to have sufficient DNA for next-generation sequencing.
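As a rough back-of-the-envelope check on the "few femtograms" figure mentioned above, the sketch below estimates the DNA mass of a single genome copy and how many copies would be needed to reach a given amount of DNA. The 3 Mbp genome size, the approximate average of 650 g/mol per base pair, and the 1 microgram target are illustrative assumptions and do not come from the interview.

```python
# Back-of-the-envelope estimate of the DNA mass in one genome copy.
# All input numbers are illustrative assumptions, not values from the interview.

AVOGADRO = 6.02214076e23   # molecules per mole
BP_MOLAR_MASS = 650.0      # g/mol per base pair of double-stranded DNA (approximate average)


def genome_mass_fg(genome_size_bp: float) -> float:
    """Return the mass of one double-stranded genome copy in femtograms."""
    grams = genome_size_bp * BP_MOLAR_MASS / AVOGADRO
    return grams * 1e15  # 1 g = 1e15 fg


def copies_needed(target_grams: float, genome_size_bp: float) -> float:
    """Return how many genome copies add up to target_grams of DNA."""
    grams_per_copy = genome_mass_fg(genome_size_bp) * 1e-15
    return target_grams / grams_per_copy


if __name__ == "__main__":
    size_bp = 3e6        # hypothetical 3 Mbp bacterial genome
    target = 1e-6        # hypothetical target of 1 microgram of DNA
    print(f"one copy: {genome_mass_fg(size_bp):.2f} fg")
    print(f"copies for 1 microgram: {copies_needed(target, size_bp):.2e}")
```

Under these assumptions one genome copy comes out at roughly 3 femtograms, and reaching a microgram takes a few hundred million copies, which is in line with the "millions and billions of copies" described in the interview.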
Speaker 4: Are there other extreme environments that you guys didn't take advantage of in this study that might be promising? Definitely. So we [00:11:30] already created a list of environments that would be interesting to us, based on the results from the last study and the experience we have with environmental conditions and the microbes we got out of them. So we're definitely planning to have a follow-up study where we explore all those habitats that we couldn't include in this study.
Speaker 5: So some examples are the Red Sea and some fjords in Norway, and there are various others that we're after.
Speaker 4: The Black Sea is a very interesting environment too. It's completely anoxic, with high levels of sulfide, [00:12:00] and it's really huge. So that's a very interesting place to sample too. And how historically have we come to this tree? In the old days, I mean the pre-sequencing era, the main criterion that scientists used to categorize organisms was the phenotype. That's the morphology, the biochemical properties, the development. And that was used to put organisms into categories. Then with the dawn of the sequencing era, and that was [00:12:30] mainly pushed by the development of Sanger sequencing in the 70s, we finally had another criterion we could use, and that was the DNA sequence of organisms. That was used to classify and categorize organisms. Does phenotyping still play a role in modern phylogeny? It still does play a role in modern phylogeny, especially for eukaryotes,
Speaker 4: where you have a very significant phenotype. So what you do there is compare phenotypic information with the [00:13:00] genomic information and, on top of that, even information from paleontology, and you try to combine all the information you have to infer the evolutionary relationships among those organisms. In modern times, the phylogeny of bacteria and archaea is mainly based on molecular data. Part of our results were used to infer phylogenetic relationships, to study the evolutionary history of those microbes. What we have for the first time is that we now have genomes [00:13:30] for a lot of those branches of the tree where before we only had some barcodes, so we knew they were there, but we had no information about the genomic content. And so now, for the first time, we can actually look at the evolutionary history of those microbes. There were two main findings in our paper.
Speaker 4: One was that for a few groups, the taxonomic placement in the tree of life was kind of debated in the past, and we could help to clarify that. For example, one group is the Cloacimonetes. [00:14:00] It was previously published that they could be part of the phylum Spirochaetes, but we could clearly show with our analysis that they are their own major branch in the tree of life, their own phylum. And a second result that I think is very important... And that's because they didn't share a lot of genes with other Spirochaetes, is that right? That's right. So if you place them in a tree of life, you can see that they don't cluster close to the Spirochaetes. They come out on the other side, by themselves, not much resembling the Spirochaetes. And the second result was [00:14:30] that we found several of those main branches of the tree of life, those phyla, that clustered together consistently in our analysis.
Speaker 4: And so we could group them together and assign superphyla to them. One example is the candidate phyla OP11, OD1, and GN02, which always clustered together. So we proposed a superphylum name, Patescibacteria, and "patesci" means bare or simple. We chose that because they have reduced and streamlined genomes. That's another common feature. [00:15:00] And yeah, I have to say that looking into evolutionary relationships is a moving target because, as Tanya mentioned, especially for microbes, bacteria and archaea, there are still so many candidate phyla out there for which we have no genomic information. So we definitely need way more sequences to get a better idea of the evolutionary relationships of all the bacteria and archaea out there.
Speaker 6: [00:15:30] Spectrum is a public affairs show about science on KALX Berkeley. Our guests today are Tanya Woyke and Chris Rinke. They use single-cell genomics to find the relationships between hundreds of dark matter microbes.
Speaker 4: And can you speak to the current throughput? I would have thought that gathering up organisms in such extreme environments was really the time-limiting factor, [00:16:00] but I suppose if you have this archive, other steps might end up taking a while. I would say the most time-consuming step is really to sort those single cells, then to lyse the single cells and amplify the genome, and then of course to screen them for genomes of interest, for microbial dark matter genomes. [inaudible] That was a big part of the study. So actually getting the genomic information out of the single cells, if that can be even more streamlined and pushed to a higher-throughput level, I think [00:16:30] that will speed up the recovery of novel microbial dark matter genomes quite a bit.
Speaker 5: Well, we have a pretty sophisticated pipeline now at the JGI where we can do this at a fairly high throughput, but as Chris said, it still takes time and every sample is different. Every sample behaves differently depending on what its properties are. It may have to be treated in a certain way to make it most successful for this application. Another step in the whole process that takes a long time is the quality control [00:17:00] of the data. The data is not as pretty as sequencing data from an isolate genome, where you get a perfect genome back and the coverage is fairly even all around the genome. Single-cell data is messy. The amplification process introduces these artifacts and issues. It can introduce some errors because you're making copies of a genome.
Speaker 5: So errors can happen. You can also introduce what we call chimeric rearrangements. That means that pieces of DNA [00:17:30] go together that shouldn't go together. Again, that happens during the amplification process. It's just the nature of the process. And on top of that, parts of the genome amplify nicely and other parts not so nicely, so what we call sequence coverage is overall very uneven. So the data is difficult to deal with. We have specific assembly pipelines, and we do a sort of digital normalization of the data before we even deal with it, so it's not as nice. And then on top of that you can have contamination. The whole process is very [00:18:00] prone to contamination. Imagine you only have one genome copy from a single cell, five femtograms, one circle of DNA, and then any little piece of DNA that you have in that prep, which sometimes, as we know, comes with the reagents.
Speaker 5: Because reagents are not designed to deal with such low template amounts, these pieces will co-amplify; they will compete with or out-compete your template. So what you end up with in your sequence is your target and other stuff that was in the reagents or, again, in your prep. We have a very rigorous [00:18:30] process of cleaning everything. We UV-treat a lot of things, we sterilize, so we get rid of any DNA in order to have a good-quality genome in the end. That said, we have developed tools and pipelines at our institute now that specifically help us detect contamination (sometimes it's not easy to detect) and then remove it. We want to make sure that the single-cell genomes that we release as single-cell genomes A, B, C are really A, B, C and not A plus X and [00:19:00] B plus K because accidentally something came along and contaminated the prep. Especially with candidate phyla, it's fairly difficult to detect contamination, because what would help us would be reference genomes, but we're actually generating these reference genomes, so we don't have a good reference to say, yes, this is actually our target organism and the rest is probably contamination. So it's very tricky.
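To make the idea of digital normalization mentioned above more concrete, here is a minimal sketch of one common variant: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a cutoff, which flattens over-amplified regions while preserving poorly covered ones. This is an illustration under stated assumptions, not the JGI pipeline, which the interview does not describe in detail; the function names, the tiny `k`, and the cutoff are all hypothetical choices for the example.

```python
from collections import Counter
from statistics import median


def kmers(read: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k in the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalize(reads: list[str], k: int = 4, cutoff: int = 2) -> list[str]:
    """Keep a read only while its region still looks under-covered.

    Coverage is approximated by the median count of the read's k-mers
    among the reads kept so far (hypothetical toy parameters).
    """
    counts: Counter[str] = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k, nothing to estimate from
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    # Five copies of an over-amplified region and one read from a rare region.
    reads = ["ACGTACGT"] * 5 + ["TTGGCCAA"]
    print(digital_normalize(reads))
```

In this toy input the over-represented read is thinned out while the single read from the rare region is retained, which is the effect one wants before assembling unevenly amplified single-cell data. Real tools use much longer k-mers and memory-efficient counting structures.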
Speaker 4: Are there other examples of [00:19:30] single-cell sequencing being used on this many organisms?
Speaker 5: On this many organisms? No, not that I'm aware of. I know there's an effort underway in the HMP, the Human Microbiome Project, where they also identified what they nicely call the most wanted list. So they have target organisms that are quite abundant in different microbiomes associated with the human body, and they've been able to cultivate a lot of them very successfully, to bring a lot of them into culture. [00:20:00] It may be easier for the HMP because we can mimic the conditions within the body a little bit better and in a more controlled way. We know our body temperature and we know sort of what the milieu is in the different parts of our body. So it's a little bit easier to bring these organisms into culture than going to the hydrothermal vent and trying to recreate those conditions, which are extremely difficult to recreate. That said, there are some that they are now targeting with single-cell sequencing. So that's another large effort [00:20:30] that I know of that's specifically using single-cell genomics to get at some of these reference genomes.
Speaker 4: Can you get more out of this than sort of phylogenetic links? We found a few unique genomic features, and one I want to mention is that we found a recoded stop codon in two of those bacteria from the hot vents I mentioned earlier. To give you a little bit of background: we know the genetic information of each cell is encoded in its DNA, but in order to [00:21:00] make use of this genetic information, it has to be translated into proteins. Those proteins could be enzymes that are employed in the metabolism to keep the cell going. And this translation is pretty universal between the three domains of life. The way it works, you have three bases in your DNA, and three bases are called a codon. Each codon is translated into one amino acid.
Speaker 4: So this way you build a chain of amino acids, and then this chain is folded, [00:21:30] and then you have your ready-made protein. This codon triplet, these three bases, also works for start and stop. So there are certain codons that tell the cell, okay, that's where you start a protein, and other codons that tell the cell, that's where you end the protein and you're done with it. There are some slight variations, but in general this universal code is preserved between all three domains of life. And what we found was very interesting in two of those bacteria from the hot vents, those two Gracilibacteria: we found a [00:22:00] recoding. So one of the codons did not code for a stop anymore, but instead codes for an amino acid, in that case glycine. And that has never been seen before. Were you surprised by these results?
Speaker 5: To us, they were surprising because they were unique and they were different. On the other hand, I have to say I'm not that surprised, because, like Chris said, we haven't looked that hard yet, and considering that we can only cultivate a few percent of all the microbial diversity that exists on this planet, as far as [00:22:30] we know it, it's not that surprising that you find these novel functions and these unique features and novel genetic codes, because it's really a highly under-explored area.
Speaker 4: It is very rewarding. But if you look to the future, how much is still out there to sequence? Of course we're interested in that. So we looked at all the phylogenetic diversity that's known, that's out there, based on the biomarkers that Tanya mentioned earlier, and we just compared it to the genomes that we have sequenced so far. And we really wanted [00:23:00] to know: if you want to cover, let's say, about 50% of all the phylogenetic diversity that's out there, how many genomes do we still have to sequence? And the estimate was that we need to sequence at least 16,004 more genomes.
Speaker 5: And this is a moving target. This is diversity as we know it today, and every day we sample more environments, we sequence them deeper, and every day our diversity estimates increase. So what we've done with these 201 is the tip of the iceberg, but it's a start.
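The codon recoding discussed above can be illustrated with a short sketch that translates the same DNA sequence under the standard genetic code and under a recoded table. The interview does not say which stop codon was reassigned; the sketch assumes it is TGA (UGA), read as glycine, as in the NCBI translation table for Gracilibacteria and candidate division SR1. The example sequence and the function name `translate` are made up for illustration.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; '*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: the recoded stop codon is TGA, which now encodes glycine (G).
RECODED_CODE = dict(STANDARD_CODE, TGA="G")


def translate(dna: str, code: dict[str, str]) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCTTGAAAATAA"  # hypothetical sequence: ATG GCT TGA AAA TAA
    print(translate(gene, STANDARD_CODE))  # translation ends at TGA
    print(translate(gene, RECODED_CODE))   # TGA is read through as glycine
```

Read with the standard table, the hypothetical gene is cut short at TGA; with the recoded table, translation continues to the next stop codon. This is why annotating such genomes with the wrong table would yield many artificially truncated proteins.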
Speaker 4: [00:23:30] Well Tanya and Chris, thanks for joining us. Thanks for having us. Thanks for having us. Yeah.
Speaker 6: [inaudible] Spectrum shows are archived on iTunes U. We've created a simple link for you. The link is tinyurl.com/kalx
Speaker 7: spectrum
Speaker 8: A regular feature of Spectrum is a calendar [00:24:00] of some of the science and technology related events happening in the Bay Area over the next two weeks. Here's Brad Swift and Renee Rao. Here Today, Majority Tomorrow: Expanding Technological Inclusion. Technological inclusion is not an issue for some of us. It is an issue for all of us. Mitchell Kapor, co-chair of [inaudible] center for social impact and a partner at Kapor Capital, will moderate a panel discussion among the following [00:24:30] presenters: Jennifer Argüello, executive director of Latino2; Kimberly Bryant, founder of Black Girls Code; Kanyi Maqubela, a venture capitalist with the Collaborative Fund; and Vivek Wadhwa, academic researcher, writer, and entrepreneur. Here Today, Majority Tomorrow is free and open to everyone on a first come, first seated basis. This is happening on the UC Berkeley campus in Sutardja Dai Hall [inaudible] [00:25:00] Auditorium, Monday, October 7th at 4:00 PM.
Speaker 7: The second installment of the six-part public lecture series Not on the Test: The Pleasure and Uses of Mathematics will be held this October 9th. Dr. Keith Devlin will deliver a lecture on underlying mathematics in video games. Dr. Devlin will show how casual video games that provide representations of mathematics enable children and adults to learn basic mathematics by playing, in the same way people [00:25:30] learn music by learning to play the piano. Professor Devlin is a mathematician at Stanford, a co-founder and president of InnerTube Games, and the Math Guy of NPR. The lecture will be held on October 9th at 7:00 PM in the Berkeley City College Auditorium, located at 2050 Center Street in Berkeley. The event is free and open to the public.
Speaker 8: The Leonardo Art Science Evening Rendezvous, or LASER, is a lecture series with rotating Bay Area venues. On October 9th there will be a LASER [00:26:00] at UC Berkeley. Presenters include Zann Gill, a former NASA scientist; Jennifer Parker of UC Santa Cruz; Cheryl Leonard, a composer; and Wayne Vitale, founding member of Gamelan Sekar Jaya [inaudible]. This is Wednesday, October 9th from 6:30 PM to 9:00 PM on the UC Berkeley campus in Barrows Hall, room 100.
Speaker 7: How can we prevent information technology [00:26:30] from destroying the middle class? Jaron Lanier is a computer scientist, composer, visual artist, and author. On October 14th Lanier will present his ideas on the impact of information technology. His two most recent books are titled You Are Not a Gadget and Who Owns the Future? The seminar will be held in Sutardja Dai Hall, Banatao Auditorium, on the UC Berkeley campus, Monday, October 14th from 11:00 AM to noon. [00:27:00] And now with some science news headlines, here's Renee. The Intergovernmental Panel on Climate Change released part of its Assessment Report 5 last Friday. The more than 200 lead authors on the report included Lawrence Berkeley National Lab's Michael Wehner and William Collins, who had chapters on long-term climate change projections and climate models. The report reinforces previous conclusions that over the next century the continents will warm, [00:27:30] with more hot extremes and fewer cold extremes. Precipitation patterns around the world will also continue changing. Wehner and Collins noted that climate models have improved significantly since the last report in 2007, as both data collection and mechanistic knowledge have grown. Using these models, scientists made several projections of different scenarios for the best, worst, and middling cases of continued greenhouse emissions.
Speaker 7: [00:28:00] Two recent accomplishments by commercial space programs are notable. Orbital Sciences launched their Cygnus spacecraft on September 18th atop the company's Antares rocket from Wallops Island, Virginia. On September 28th Cygnus docked with the International Space Station for the first time. A SpaceX rocket carrying a Canadian satellite has launched from the California coast in a demonstration flight of a new Falcon rocket. The next-generation rocket boasts [00:28:30] upgraded engines designed to improve performance and carry heavier payloads. The rocket is carrying a satellite dubbed CASSIOPE, a project of the Canadian Space Agency and other partners. Once in orbit it will track space weather.
Speaker 7: The music [00:29:00] heard during the show was written and produced by Alex Simon. Yeah.
Speaker 3: Thank you for listening to Spectrum. If you have comments about the show, please send them to us via email. Our email address is [inaudible] dot [inaudible] dot com.
Speaker 9: [inaudible].