Here are some patterns to avoid when designing multi-gene constructs. If you want to get straight to it: repeating is prohibited, but rhyming is allowed. That is, avoid exactly duplicating hundreds of contiguous base pairs, in any orientation. Slight differences between the sequences avoid a lot of pitfalls.
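One way to check a design against this rule is to scan it for exact repeats before ordering anything. The sketch below is only an illustration, not a tool we actually used: the function name `find_exact_repeats` and the default window size `k` are made up for this example, and it treats the sequence as linear, so a repeat spanning the origin of a circular plasmid would be missed.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_repeats(seq: str, k: int = 100):
    """List pairs of k-base windows that match exactly.

    Returns sorted (pos_a, pos_b, orientation) tuples with pos_a < pos_b,
    where orientation is "direct" or "inverted". Positions are 0-based.
    k is a hypothetical cutoff; pick it to match your own risk tolerance.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    hits = []
    for kmer, positions in index.items():
        # Same window content at two positions: a direct repeat.
        for n, a in enumerate(positions):
            for b in positions[n + 1:]:
                hits.append((a, b, "direct"))
        # Reverse complement present elsewhere: an inverted repeat.
        rc = reverse_complement(kmer)
        if rc in index:
            for a in positions:
                for b in index[rc]:
                    if a < b:  # report each pair once
                        hits.append((a, b, "inverted"))
    return sorted(hits)
```

A long repeat shows up as a run of consecutive hits (one per window), so in practice you would merge neighboring hits into a single interval before reading the output.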
Putting duplicated sequences back to back is generally a bad idea. In this example we put two genes next to each other facing opposite directions. The two 35S promoters have a space of 20bp between them, but otherwise form an inverted repeat.

When we tried to build this plasmid, the restriction digests repeatedly came back with incorrectly sized bands. At that time, restriction digests were our main QC mechanism for large Level 2 plasmids. This was before Plasmidsaurus offered fast, cheap long nanopore reads. We could Sanger in from both ends, but that only gets us 2 x 800 bp. Beyond that we would either (A) accept it as-is without further QC, (B) make bespoke primers to Sanger various points throughout the construct, or (C) send it off for Illumina sequencing at the MGH DNA Core with a 2-week turnaround time.
In this case I suspected something funky was happening because the restriction digests were always wrong in the same way or very similar ways every time. So we sent it to the MGH DNA Core for Illumina sequencing to hopefully get some insight into why this plasmid wouldn’t assemble correctly.
We mapped the reads to the reference SnapGene file for this plasmid using Geneious Prime. The blue area is the read coverage across the plasmid. There is an obvious drop in the area with the two back-to-back 35S promoters. One explanation could be that Geneious is failing to map reads to that area. Keep in mind these are 150-base reads and the promoter is around 450 bp, so many reads cannot map uniquely to these regions. We did some further bioinformatic analysis to confirm that no reads mapped to the 35S promoter. We also Sangered across this area with a primer landing in the GFP to confirm that this section is indeed gone.
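If you export per-base read depth from your mapper, a drop like this can be flagged automatically. This is a minimal sketch under stated assumptions: the function name `low_coverage_regions`, the `fraction` and `min_len` defaults, and the idea of comparing against the median depth are all choices made for this example, not the analysis we ran. It also cannot tell a true deletion from reads that failed to map uniquely, so a hit still needs confirmation (for example by Sanger across the region).

```python
from statistics import median


def low_coverage_regions(depth, fraction=0.1, min_len=50):
    """Return half-open (start, end) intervals of unusually low depth.

    depth    : list of per-base read depths along the reference
    fraction : a base counts as "low" if depth < fraction * median depth
    min_len  : ignore dips shorter than this many bases
    All three thresholds are illustrative assumptions.
    """
    threshold = median(depth) * fraction
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d < threshold:
            if start is None:
                start = i  # a low-coverage run begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the reference.
    if start is not None and len(depth) - start >= min_len:
        regions.append((start, len(depth)))
    return regions
```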

OK, so no back-to-back repeats. This also ties in to my lesson on gene orientation. If we had just designed this plasmid to have both genes facing the same direction this would not have been a problem.
But it gets more complicated. One of our favorite ways to knock down the expression of endogenous soy genes is with hairpin RNAs. We take a portion of the target gene, say around 150bp, combine it with the reverse complement of that sequence and separate the two with the potato IV2 intron, which is around 200bp. In the example shown below we are targeting the Glycinin 1 (GY1) gene.
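The layout described above can be sketched in a few lines. Treat this as an illustration only: `build_hairpin` is a made-up helper, the arm order (target fragment first, reverse complement second) is an assumption, and the spacer argument stands in for the real IV2 intron sequence, which is not reproduced here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_hairpin(target_fragment: str, spacer: str) -> str:
    """Assemble arm + spacer + reverse-complement arm.

    target_fragment : portion of the target gene (around 150 bp in the text)
    spacer          : sequence separating the two arms (the intron in the text)
    The arm order is an assumption made for this sketch.
    """
    return target_fragment + spacer + reverse_complement(target_fragment)


# Toy example with placeholder sequences, not real GY1 or IV2 sequence.
print(build_hairpin("ATGC", "NNNN"))
```

Note that the two arms form an inverted repeat by construction, which is what makes the lack of deletions in these designs interesting.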

In our hands, this strategy has never led to deletions. We now have a lot of data on this thanks to Plasmidsaurus. I don’t think I have ever seen a deletion in the hairpins encoded by any of our plasmids [1]. Maybe it is the size that matters? The 35S promoters in the first example were 450 bp each, whereas these arms are 150 bp. My counterpoint is that we have made some pretty large hairpins where each side is up to 750 bp (with the center intron remaining at 200 bp) and never had a problem cloning those designs either.
Maybe it is the gap in the middle that saves this design from deletion? One time I tried to make the hairpin with a tetraloop [2] instead of the intron, and the cloning consistently failed. I didn’t investigate further, but maybe there is something to having a gap of a certain size between the two homologous regions?
Regardless, just try to avoid inverted repeats if you can.
In this plasmid I took my own advice and arranged the genes all in the same direction. But I made a new mistake.
You have to make small changes to duplicate sequences to prevent deletion.
This paper has some observations on how many differences are needed to avoid recombination [3].
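When you edit one copy of a duplicated part, it helps to check how much perfect identity is left. The sketch below compares two equal-length copies position by position and reports the number of differences and the longest uninterrupted identical stretch. The function name `compare_copies` is hypothetical, it does not handle insertions or deletions (you would need a real alignment for that), and it does not encode any threshold for what counts as safe.

```python
def compare_copies(copy_a: str, copy_b: str):
    """Compare two equal-length sequences base by base.

    Returns (mismatches, longest_identical_run).
    Assumes the copies are already aligned with no indels.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("sequences must be the same length")

    mismatches = 0
    longest = 0
    current = 0
    for a, b in zip(copy_a.upper(), copy_b.upper()):
        if a == b:
            current += 1
            longest = max(longest, current)
        else:
            mismatches += 1
            current = 0  # a difference breaks the identical stretch
    return mismatches, longest
```

Spreading the changes evenly along the part shortens the longest identical stretch more than clustering them in one place would.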
[1] PCRing over these regions is definitely an issue. But cloning these sequences in E. coli and sequencing with Plasmidsaurus has not been a problem.
[2] Santini GP, Pakleza C, Cognet JA. DNA tri- and tetra-loops and RNA tetra-loops hairpins fold as elastic biopolymer chains in agreement with PDB coordinates. Nucleic Acids Res. 2003 Feb 1;31(3):1086-96. doi: 10.1093/nar/gkg196. PMID: 12560507; PMCID: PMC149216.
[3] Opperman R, Emmanuel E, Levy AA. The effect of sequence divergence on recombination between direct repeats in Arabidopsis. Genetics. 2004 Dec;168(4):2207-15. doi: 10.1534/genetics.104.032896. PMID: 15611187; PMCID: PMC1448723.