1 00:00:07,280 --> 00:00:12,279 Hi, my name is Jennifer Doudna from UC Berkeley, and I'm here today to tell you about how we 2 00:00:12,279 --> 00:00:15,279 uncovered a new genome engineering technology. 3 00:00:15,279 --> 00:00:20,280 This story starts with a bacterial immune system. 4 00:00:20,280 --> 00:00:25,280 That means understanding how bacteria fight off a viral infection. 5 00:00:25,280 --> 00:00:31,280 It turns out that a lot of bacteria have in their chromosome, which is what 6 00:00:31,280 --> 00:00:39,780 we're looking at here, a sequence of repeats, shown in these black diamonds, that are interspaced 7 00:00:39,780 --> 00:00:44,219 with sequences that are derived from viruses. 8 00:00:44,219 --> 00:00:50,100 And these had been noticed by microbiologists who were sequencing bacterial genomes, but 9 00:00:50,100 --> 00:00:57,500 nobody knew what the function of these sequences might be until it was noticed that they tend 10 00:00:57,500 --> 00:01:07,799 to also occur with a series of genes that often encode proteins that have homology to enzymes 11 00:01:07,799 --> 00:01:11,459 that do interesting things like DNA repair. 12 00:01:11,459 --> 00:01:17,799 So it was a hypothesis that this system, which came to be called CRISPR, which is an acronym 13 00:01:17,799 --> 00:01:24,480 for this type of repetitive locus, that these CRISPR systems could actually be an acquired 14 00:01:24,480 --> 00:01:31,019 immune system in bacteria that might allow sequences to be integrated from viruses, and 15 00:01:31,019 --> 00:01:37,319 then somehow used later to protect the cell from an infection with that same virus. 16 00:01:37,319 --> 00:01:43,219 So, this was an interesting hypothesis, and we got involved in studying this in the mid-2000s, 17 00:01:43,260 --> 00:01:49,379 right after the publication of three papers that pointed out the incorporation of viral 18 00:01:49,379 --> 00:01:52,260 sequences into these genomic loci.
19 00:01:52,260 --> 00:01:58,359 And so what emerged over the next several years was that, in fact, these CRISPR systems 20 00:01:58,359 --> 00:02:01,659 really are acquired immune systems in bacteria. 21 00:02:01,819 --> 00:02:09,860 So, until this point, no one knew that bacteria could actually have a way to adapt to viruses 22 00:02:09,860 --> 00:02:11,099 that get into the cell. 23 00:02:11,180 --> 00:02:12,500 But this is a way that they do it. 24 00:02:12,919 --> 00:02:18,319 And it involves detecting foreign DNA that gets injected, like shown in this example, 25 00:02:18,319 --> 00:02:20,479 from a virus that gets into the cell, 26 00:02:21,180 --> 00:02:24,139 the CRISPR system allows 27 00:02:24,139 --> 00:02:27,419 integration of short pieces 28 00:02:27,419 --> 00:02:29,659 of those viral DNA molecules 29 00:02:29,659 --> 00:02:31,379 into the CRISPR locus. 30 00:02:31,379 --> 00:02:34,120 And then in the second step 31 00:02:34,120 --> 00:02:38,800 that's shown here as CRISPR RNA biogenesis, 32 00:02:39,300 --> 00:02:40,819 these CRISPR sequences 33 00:02:40,819 --> 00:02:43,819 are actually transcribed in the cell 34 00:02:43,819 --> 00:02:45,580 into pieces of RNA 35 00:02:45,580 --> 00:02:52,580 that are subsequently used together with proteins encoded by the Cas genes, 36 00:02:52,580 --> 00:02:58,580 these CRISPR-associated genes, to form interfering or interference complexes 37 00:02:58,580 --> 00:03:03,580 that can use the information in the form of these RNA molecules 38 00:03:03,580 --> 00:03:07,580 to base pair with matching sequences in viral DNA. 39 00:03:07,580 --> 00:03:13,580 So, a very nifty way that bacteria have come up with to take their invaders 40 00:03:13,580 --> 00:03:17,580 and turn the sequence information against them. 
41 00:03:17,580 --> 00:03:23,580 So, in my own laboratory, we have been very interested for a long time 42 00:03:23,580 --> 00:03:32,580 in understanding how RNA molecules are used to help cells to figure out how to regulate 43 00:03:32,580 --> 00:03:35,580 the expression of proteins from the genome. 44 00:03:35,580 --> 00:03:39,580 And so this seemed like also a very interesting example of this. 45 00:03:39,580 --> 00:03:45,580 And so we started studying the basic molecular mechanisms by which this pathway operates. 46 00:03:45,580 --> 00:03:53,580 And in 2011, I went to a scientific conference and I met a colleague of mine, Emmanuelle Charpentier, 47 00:03:53,580 --> 00:03:57,580 who is shown in this picture on the far left. 48 00:03:57,580 --> 00:04:03,580 And Emmanuelle's lab works on microbiology problems, and they're particularly interested 49 00:04:03,580 --> 00:04:06,580 in bacteria that are human pathogens. 50 00:04:06,580 --> 00:04:12,159 She was studying an organism called Streptococcus pyogenes, which is a bacterium that can cause 51 00:04:12,159 --> 00:04:15,099 very severe infections in humans. 52 00:04:15,099 --> 00:04:19,939 And what was curious in this bug was that it has a CRISPR system, and in that organism 53 00:04:19,939 --> 00:04:25,720 there was a single gene encoding a protein known as Cas9 that had been shown genetically 54 00:04:25,720 --> 00:04:31,639 to be required for function of the CRISPR system in Streptococcus pyogenes. 55 00:04:31,639 --> 00:04:35,560 But nobody knew at the time what the function of that protein was. 56 00:04:35,560 --> 00:04:42,000 And so we got together and recruited people from our respective research labs to start 57 00:04:42,000 --> 00:04:47,800 testing the function of Cas9, and so the key people in the project are shown here in the 58 00:04:47,800 --> 00:04:48,800 photograph. 
59 00:04:48,800 --> 00:04:54,379 In the center is Martin Jinek, who was a postdoctoral associate in my own lab, and next to him in 60 00:04:54,379 --> 00:04:59,139 the blue shirt is Krzysztof Chylinski, who was a student in Emmanuelle's lab. 61 00:04:59,139 --> 00:05:03,680 And so these two guys, together with Ines Fonfara, who's on the far right, a postdoc 62 00:05:03,680 --> 00:05:11,560 with Emmanuelle, began doing experiments across the Atlantic and sharing their data. 63 00:05:11,560 --> 00:05:17,459 And what they figured out was that Cas9 is actually a fascinating protein that has the 64 00:05:17,459 --> 00:05:24,240 ability to interact with DNA and generate a double-stranded break in DNA at sequences 65 00:05:24,240 --> 00:05:29,959 that match the sequence in a guide RNA, and in this slide what you're seeing is 66 00:05:29,959 --> 00:05:35,959 the guide RNA and the sequence of the guide in orange that base pairs with one strand 67 00:05:35,959 --> 00:05:43,980 of the double helical DNA, and, very importantly, this RNA interacts with a second RNA molecule 68 00:05:43,980 --> 00:05:48,980 called tracrRNA that forms a structure that recruits the Cas9 protein. 69 00:05:48,980 --> 00:05:55,980 So, those two RNAs and the single protein in nature are what are required for this protein 70 00:05:55,980 --> 00:06:02,379 to recognize what would normally be viral DNAs in the cell, and the protein is able to cut 71 00:06:02,379 --> 00:06:07,019 these up, literally, by breaking up the double helical DNA. 72 00:06:07,019 --> 00:06:13,000 And so, when we figured this out, we thought, wouldn't it be amazing if we could actually 73 00:06:13,000 --> 00:06:18,540 generate a simpler system than nature has done by linking together these two RNA molecules 74 00:06:18,540 --> 00:06:23,240 to generate a system that would be a single protein and a single guiding RNA.
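The recognition step described here, a guide sequence base pairing with one strand of the double helix, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used in the work: the helper names `reverse_complement` and `find_target_sites` and the example sequences are made up, the guide is written as its DNA equivalent (T instead of U), and only exact matches are considered. Real Cas9 targeting also depends on details the talk does not go into, such as a short sequence motif next to the target.

```python
# Illustrative sketch only: sequences and function names are hypothetical.
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def find_target_sites(dna: str, guide: str) -> List[Tuple[int, str]]:
    """Find every position where the guide matches either DNA strand.

    `dna` is the top strand, read left to right. Each hit is reported as
    (start index on the top strand, strand), where strand is "+" when the
    top strand reads the same as the guide and "-" when the bottom strand does.
    """
    hits = []
    for strand, query in (("+", guide), ("-", reverse_complement(guide))):
        start = dna.find(query)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(query, start + 1)
    return sorted(hits)


example_dna = "TTGACCATGGCTAGCTAGGATCCAAGT"
example_guide = "GGCTAGCTAG"
print(find_target_sites(example_dna, example_guide))  # prints [(8, '+')]
```

Changing `example_guide` is all it takes to point the search at a different site, which mirrors the idea of reprogramming the protein by swapping the guide rather than re-engineering the protein.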
75 00:06:23,240 --> 00:06:31,240 And so the idea was to basically take these two RNAs that you see on the far side of the slide 76 00:06:31,240 --> 00:06:37,240 and then basically link them together to create what we call a single guide RNA. 77 00:06:37,240 --> 00:06:44,240 And so Martin Jinek in the lab made that construct and we did an experiment, 78 00:06:44,240 --> 00:06:50,240 a very simple experiment, to test whether we truly had a programmable DNA-cleaving enzyme. 79 00:06:50,240 --> 00:06:56,579 And the idea was to generate short, single guide RNAs that recognize different sites 80 00:06:56,579 --> 00:07:00,920 in a DNA molecule, this circular DNA molecule that you see here. 81 00:07:01,420 --> 00:07:08,000 And the guide RNAs were designed to recognize the sequences shown by the red bars in the 82 00:07:08,000 --> 00:07:08,339 slide. 83 00:07:08,980 --> 00:07:15,639 And the experiment was then to take that plasmid, that circular DNA molecule, and incubate it 84 00:07:15,639 --> 00:07:19,259 with two different restriction or cutting enzymes, 85 00:07:19,259 --> 00:07:27,259 one called SalI, which cuts the DNA sort of upstream at the far end of the DNA in this picture, 86 00:07:27,259 --> 00:07:35,259 in the gray box, and the second site being directed by the RNA-guided Cas9 at these different sites, 87 00:07:35,259 --> 00:07:37,259 shown in red. 88 00:07:37,259 --> 00:07:43,259 And a very simple experiment, we did this incubation reaction with plasmid DNA, 89 00:07:43,259 --> 00:07:45,259 and this is the result. 90 00:07:45,259 --> 00:07:50,879 This is... what you're looking at is an agarose gel that allows us to separate the cleaved 91 00:07:50,879 --> 00:07:55,939 molecules of DNA, and what you can see is that in each of these reaction lanes we get 92 00:07:55,939 --> 00:08:02,639 a different-sized DNA molecule released from this doubly digested plasmid that...
in which 93 00:08:02,639 --> 00:08:08,660 the size of the DNA corresponds to cleavage at the different sites directed by these guide 94 00:08:08,660 --> 00:08:11,319 RNA sequences indicated in red. 95 00:08:11,319 --> 00:08:14,139 So this was a really exciting moment, actually. 96 00:08:14,139 --> 00:08:19,139 So, it was a very simple experiment that was kind of an a-ha moment when we said we really 97 00:08:19,139 --> 00:08:26,160 have a programmable DNA-cutting enzyme and we can program it with a short piece of RNA 98 00:08:26,160 --> 00:08:30,160 to cleave essentially any double-stranded DNA sequence. 99 00:08:30,160 --> 00:08:36,159 So, the reason we were so excited about an enzyme that could be programmed to generate 100 00:08:36,159 --> 00:08:43,159 double-stranded DNA breaks at any sequence is because there was a long-standing set of 101 00:08:43,159 --> 00:08:49,799 experiments in the scientific community that showed that cells have ways of repairing double-stranded 102 00:08:49,799 --> 00:08:56,940 DNA breaks that lead to changes in the genomic information in the DNA. 103 00:08:56,940 --> 00:09:01,620 And these...
so this is a slide that just shows that after a double-stranded break is 104 00:09:01,620 --> 00:09:08,120 generated by any kind of enzyme that might do this, including the Cas9 system, those 105 00:09:08,120 --> 00:09:15,120 double-stranded breaks in a cell are detected and repaired by two types of pathways, 106 00:09:15,120 --> 00:09:24,139 one on the left-hand side that involves non-homologous end-joining, in which the ends of the DNA are 107 00:09:24,139 --> 00:09:30,139 chemically ligated back together, usually with the introduction of a small insertion 108 00:09:30,139 --> 00:09:35,139 or deletion at the site of the break, and then on the right-hand side is another way, 109 00:09:35,139 --> 00:09:44,139 homology-directed repair, in which a donor DNA molecule that has sequences that match 110 00:09:44,139 --> 00:09:51,159 those flanking the site of the double-stranded break can be integrated into the genome at 111 00:09:51,159 --> 00:09:56,159 the site of the break to introduce new genetic information to the genome.
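The two repair outcomes just described can be pictured with a toy model on strings. This is a sketch under loud assumptions, not a simulation of real repair biology: the function names, the example sequences, and the arm length are all hypothetical, end joining is reduced to a small fixed deletion (real outcomes vary and can also be insertions), and homology-directed repair is reduced to checking that the donor's two arms match the sequence on either side of the break before inserting what lies between them.

```python
# Illustrative sketch only: sequences, arm length, and function names are hypothetical.


def repair_by_end_joining(genome: str, cut: int, deleted: int = 2) -> str:
    """Toy non-homologous end joining: rejoin the two ends after losing
    `deleted` bases immediately to the right of the cut (a small deletion)."""
    return genome[:cut] + genome[cut + deleted:]


def repair_by_homology(genome: str, cut: int, donor: str, arm: int) -> str:
    """Toy homology-directed repair.

    `donor` is laid out as left arm + new sequence + right arm. Each arm is
    `arm` bases long and must match the genome on its side of the cut.
    """
    if arm <= 0 or cut - arm < 0 or cut + arm > len(genome) or len(donor) < 2 * arm:
        raise ValueError("arm length does not fit the genome or the donor")
    left, payload, right = donor[:arm], donor[arm:-arm], donor[-arm:]
    if genome[cut - arm:cut] != left or genome[cut:cut + arm] != right:
        raise ValueError("donor arms do not match the sequence flanking the break")
    return genome[:cut] + payload + genome[cut:]


genome = "AAACCCGGGTTT"
cut_site = 6  # the break falls between CCC and GGG

print(repair_by_end_joining(genome, cut_site))                # prints AAACCCGTTT
print(repair_by_homology(genome, cut_site, "CCCTAGGGG", 3))   # prints AAACCCTAGGGGTTT
```

The first call loses two bases at the break, while the second carries the new `TAG` segment into the genome because the donor's flanking arms match the sequence around the cut.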
112 00:09:56,159 --> 00:10:03,159 And so this had given many scientists the idea that if there were a tool or a technology 113 00:10:03,159 --> 00:10:06,500 that allowed scientists or researchers 114 00:10:06,500 --> 00:10:08,799 to introduce double-stranded breaks 115 00:10:08,799 --> 00:10:12,299 at targeted sites in the DNA of a cell, 116 00:10:12,299 --> 00:10:15,399 then together with all of the genome sequencing data 117 00:10:15,399 --> 00:10:16,539 that are now available, 118 00:10:16,539 --> 00:10:19,879 where we know the whole genetic sequence in a cell, 119 00:10:19,879 --> 00:10:22,480 and if you knew where a mutation occurred 120 00:10:22,480 --> 00:10:24,440 that causes a disease, for example, 121 00:10:24,440 --> 00:10:28,259 you could actually use a technology like this 122 00:10:28,259 --> 00:10:32,259 to introduce DNA that would fix a mutation 123 00:10:32,259 --> 00:10:36,259 or generate a mutation that you might like to study in a research setting. 124 00:10:36,259 --> 00:10:43,259 So, the power of this technology is really the idea that we can now generate these types 125 00:10:43,279 --> 00:10:49,279 of double-stranded breaks at sites that we choose, as scientists, by programming Cas9, 126 00:10:49,279 --> 00:10:55,279 and then allow the cell to make repairs that introduce genomic changes at the sites of 127 00:10:55,279 --> 00:10:56,279 these breaks. 128 00:10:56,279 --> 00:11:00,279 But, the challenge was how to generate the breaks in the first place. 129 00:11:00,279 --> 00:11:07,279 And so a number of different strategies had been produced for doing this in different labs. 
130 00:11:07,279 --> 00:11:12,299 Most of them, and I'm going to show two specific examples here, 131 00:11:12,299 --> 00:11:16,299 one called zinc finger nucleases and the other TAL effector domains, 132 00:11:16,299 --> 00:11:21,299 these are both programmable ways to generate double-stranded breaks in DNA 133 00:11:21,299 --> 00:11:26,299 that rely on protein-based recognition of DNA sequences. 134 00:11:26,299 --> 00:11:31,779 So, these are proteins that are modular and can be generated in different combinations 135 00:11:31,779 --> 00:11:38,679 of modules to recognize different DNA sequences, requiring... so, it works as a technology, 136 00:11:38,679 --> 00:11:42,860 but it requires a lot of protein engineering to do so. 137 00:11:42,860 --> 00:11:49,960 And what's really exciting about this CRISPR Cas9 enzyme is that it's an RNA-programmed 138 00:11:49,960 --> 00:11:50,960 protein. 139 00:11:50,960 --> 00:11:57,960 So, a single protein can be used for any site of DNA where we would like to generate a break 140 00:11:57,960 --> 00:12:01,980 by simply changing the sequence of the guide RNA associated with Cas9. 141 00:12:01,980 --> 00:12:08,980 So, instead of relying on protein-based recognition of DNA, we're relying on RNA-based recognition 142 00:12:08,980 --> 00:12:10,980 of DNA, as shown at the bottom. 143 00:12:10,980 --> 00:12:17,980 And so what this means is that it's just a system that is simple enough to use that anybody 144 00:12:17,980 --> 00:12:20,980 with basic molecular biology training 145 00:12:20,980 --> 00:12:22,980 can take advantage of this system 146 00:12:22,980 --> 00:12:24,980 to do genome engineering. 
147 00:12:24,980 --> 00:12:26,980 And so, this is a tool, then, 148 00:12:26,980 --> 00:12:30,000 that really, I think, fills out 149 00:12:30,000 --> 00:12:33,000 an essential, previously missing component 150 00:12:33,000 --> 00:12:36,000 of what we could call biology's IT toolbox, 151 00:12:36,000 --> 00:12:39,000 that includes not only the ability to sequence DNA 152 00:12:39,000 --> 00:12:41,000 and look at its structure, 153 00:12:41,000 --> 00:12:44,000 we know about the double helix since the 1950s, 154 00:12:44,000 --> 00:12:46,000 and then, in the last few decades, 155 00:12:46,000 --> 00:12:49,000 it's also possible to use enzymes like restriction enzymes 156 00:12:49,000 --> 00:12:51,000 and the polymerase chain reaction 157 00:12:51,000 --> 00:12:55,000 to isolate and amplify particular segments of DNA. 158 00:12:55,000 --> 00:12:57,019 And now, with Cas9, 159 00:12:57,019 --> 00:13:01,019 we have a technology that enables facile genome engineering 160 00:13:01,019 --> 00:13:04,019 that is, you know, available to labs around the world 161 00:13:04,019 --> 00:13:07,019 for experiments that they might want to do. 162 00:13:07,019 --> 00:13:10,019 And so this is a summary of this... 163 00:13:10,019 --> 00:13:12,019 of the technology. 164 00:13:12,019 --> 00:13:14,019 It's a two-component system. 
165 00:13:14,019 --> 00:13:16,259 RNA-DNA base pairing for recognition, 166 00:13:16,259 --> 00:13:18,240 and very importantly, 167 00:13:18,419 --> 00:13:20,639 because of the way that this system works, 168 00:13:20,720 --> 00:13:22,299 it's actually quite straightforward 169 00:13:22,299 --> 00:13:25,500 to do something called multiplexing, 170 00:13:25,539 --> 00:13:27,639 which means we can program Cas9 171 00:13:27,639 --> 00:13:29,639 with multiple different guide RNAs 172 00:13:29,639 --> 00:13:30,460 in the same cell 173 00:13:30,460 --> 00:13:32,279 to generate multiple breaks 174 00:13:32,279 --> 00:13:34,320 and do things like cut out 175 00:13:34,320 --> 00:13:35,919 large segments of a chromosome 176 00:13:35,919 --> 00:13:37,559 and simply delete them 177 00:13:37,559 --> 00:13:39,940 in one experiment. 178 00:13:40,679 --> 00:13:43,259 And so this has led to a real explosion 179 00:13:43,259 --> 00:13:46,440 in the field of biology and genetics, 180 00:13:46,440 --> 00:13:49,000 with many labs around the world 181 00:13:49,000 --> 00:13:50,679 adopting this technology 182 00:13:50,679 --> 00:13:52,379 for all sorts of very interesting 183 00:13:52,379 --> 00:13:54,139 and creative kinds of applications. 184 00:13:54,299 --> 00:13:56,220 And this is a slide that's actually 185 00:13:56,220 --> 00:13:57,120 almost out of date now, 186 00:13:57,200 --> 00:13:58,399 but just to give you a sense 187 00:13:58,399 --> 00:14:01,039 of the way that the field 188 00:14:01,039 --> 00:14:02,059 has really taken off. 189 00:14:02,139 --> 00:14:04,299 So, we published our original work 190 00:14:04,299 --> 00:14:06,399 on Cas9 in 2012, 191 00:14:07,299 --> 00:14:08,299 and up until that point 192 00:14:08,299 --> 00:14:09,519 there was very little research 193 00:14:09,519 --> 00:14:12,059 going on on CRISPR biology anywhere. 
194 00:14:12,059 --> 00:14:17,559 It's a very small field, and then you can see that starting in 2013 and extending until 195 00:14:17,559 --> 00:14:23,559 now, there's just been this incredible explosion in publications from labs that are using this 196 00:14:23,559 --> 00:14:25,320 as a genome engineering technology. 197 00:14:25,320 --> 00:14:31,100 So it's been really very exciting for me as a basic scientist to see what started as a 198 00:14:31,100 --> 00:14:36,500 fundamental research project turn into a technology that turns out to be very enabling for all 199 00:14:36,500 --> 00:14:38,899 sorts of exciting experiments. 200 00:14:38,899 --> 00:14:45,200 And I just wanted to close by sharing with you a few things that are going on using this 201 00:14:45,200 --> 00:14:46,200 technology. 202 00:14:46,200 --> 00:14:51,539 So, of course, on the left-hand side, lots of basic biology that can be done now with 203 00:14:51,539 --> 00:14:57,000 the engineering of model organisms and different kinds of cell lines that are cultured in the 204 00:14:57,000 --> 00:15:03,460 laboratory to study the behavior of cells, but also in biotechnology, being able to do... 205 00:15:03,460 --> 00:15:08,700 to make targeted changes in plants and various kinds of fungi that could be very useful for 206 00:15:08,700 --> 00:15:13,899 different sorts of industrial applications, and then of course in biomedicine with lots 207 00:15:13,899 --> 00:15:21,399 of interest in the potential to use this technology as a tool for, you know, really coming up 208 00:15:21,399 --> 00:15:26,620 with novel therapies for human disease I think is something that's very exciting and is really 209 00:15:26,620 --> 00:15:29,179 something that's on the horizon already. 
210 00:15:29,179 --> 00:15:34,639 And then this slide just really indicates where I think we're going to see this going 211 00:15:34,639 --> 00:15:40,779 in the future with a lot of interesting and creative kinds of directions that are coming 212 00:15:40,779 --> 00:15:46,000 along in different labs, both in academic research laboratories, but also increasingly 213 00:15:46,000 --> 00:15:53,019 in commercial labs that are going to enable the use of this technology for all sorts of 214 00:15:53,019 --> 00:15:57,379 applications, many of which we couldn't have imagined even two years ago. 215 00:15:57,379 --> 00:15:59,120 So very exciting. 216 00:15:59,120 --> 00:16:06,059 And I want to just acknowledge a great team of people that have been involved in working 217 00:16:06,059 --> 00:16:10,879 on the project with me, and we've had the, you know, terrific financial support from 218 00:16:10,879 --> 00:16:14,960 various groups as well, and it's been a pleasure to share this with you. 219 00:16:14,960 --> 00:16:15,460 Thank you.