T Cell Receptor Sequencing Analysis Report

 

1. Project Summary

 

2. Workflow Overview

 

2.1 Experiment Workflow

The diversity of B and T lymphocyte receptors is central to the adaptive immune system [1]. To characterize the T cell receptor (TCR) repertoire by high-throughput sequencing, total RNA was extracted from isolated T lymphocytes and reverse transcribed to cDNA. The complete 5’ end of the TCR transcripts, which contains the variable region, was amplified using oligonucleotide primers against the TCR constant regions (Figure 1.1). The resulting fragments were used for next-generation sequencing library preparation and subjected to high-throughput sequencing.



Figure 1.1 T cell receptor repertoire sequencing workflow

 

2.2 Bioinformatics Analysis Workflow

The raw sequencing data in FASTQ format were subjected to quality filtering [2]. Sequences that passed quality filtering were mapped against the IMGT database to find the best-matching germline V(D)J genes [3]. The CDR sequences were then further characterized and analyzed.



Figure 2.1 Bioinformatics analysis workflow

 

3. Data Quality Analysis

For TCR analysis, only reads that cover the CDR3 region were analyzed. Raw FASTQ files were first subjected to quality assessment (Figure 3.1) [4]. Bases with poor quality scores (Q<20) were removed using Trimmomatic (v0.30) [2]. The trimmed data were then assessed in the same way (Figure 3.2) [4]. Data processing statistics are summarized in Table 3.1 (raw reads) and Table 3.2 (clean reads).
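As a rough illustration of what the Q20 and Q30 thresholds mean, the sketch below decodes a FASTQ quality string and reports the share of bases at or above a threshold. It assumes Phred+33 encoding; the function names and the example quality string are hypothetical, and this is not the code used to produce Tables 3.1 and 3.2.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is an assumption)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Convert a Phred score Q into a base-call error probability, p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_least(scores, threshold):
    """Fraction of bases whose Phred score is at or above the threshold."""
    return sum(q >= threshold for q in scores) / len(scores)


# Hypothetical quality string: 'I' decodes to Q40, '5' to Q20, '#' to Q2.
scores = phred_scores("IIII55##")
print(scores)
print(fraction_at_least(scores, 20), fraction_at_least(scores, 30))
print(error_probability(20), error_probability(30))
```

A score of Q20 corresponds to an expected error rate of 1 in 100 base calls, and Q30 to 1 in 1,000.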



  • GQ01-E120-TRA_R1_raw_per_base_quality.png
  • GQ01-E120-TRB_R1_raw_per_base_quality.png
  • GQ02-E120-TRA_R1_raw_per_base_quality.png
  • GQ02-E120-TRB_R1_raw_per_base_quality.png
  • GQ03-E120-TRA_R1_raw_per_base_quality.png
  • GQ03-E120-TRB_R1_raw_per_base_quality.png
  • GQ04-E120-TRA_R1_raw_per_base_quality.png
  • GQ04-E120-TRB_R1_raw_per_base_quality.png
  • GQ05-E120-TRA_R1_raw_per_base_quality.png
  • GQ05-E120-TRB_R1_raw_per_base_quality.png
  • GQ06-E120-TRA_R1_raw_per_base_quality.png
  • GQ06-E120-TRB_R1_raw_per_base_quality.png
  • GQ07-E120-TRA_R1_raw_per_base_quality.png
  • GQ07-E120-TRB_R1_raw_per_base_quality.png
  • GQ08-E120-TRA_R1_raw_per_base_quality.png
  • GQ08-E120-TRB_R1_raw_per_base_quality.png


Figure 3.1 Sequence quality across all bases on raw reads. Y axis: Phred Quality Scores.

  • GQ01-E120-TRA_clean_per_base_quality.png
  • GQ01-E120-TRB_clean_per_base_quality.png
  • GQ02-E120-TRA_clean_per_base_quality.png
  • GQ02-E120-TRB_clean_per_base_quality.png
  • GQ03-E120-TRA_clean_per_base_quality.png
  • GQ03-E120-TRB_clean_per_base_quality.png
  • GQ04-E120-TRA_clean_per_base_quality.png
  • GQ04-E120-TRB_clean_per_base_quality.png
  • GQ05-E120-TRA_clean_per_base_quality.png
  • GQ05-E120-TRB_clean_per_base_quality.png
  • GQ06-E120-TRA_clean_per_base_quality.png
  • GQ06-E120-TRB_clean_per_base_quality.png
  • GQ07-E120-TRA_clean_per_base_quality.png
  • GQ07-E120-TRB_clean_per_base_quality.png
  • GQ08-E120-TRA_clean_per_base_quality.png
  • GQ08-E120-TRB_clean_per_base_quality.png


Figure 3.2 Sequence quality across all bases on trimmed reads. Y axis: Phred Quality Scores.

Table 3.1 Raw sequencing data quality statistics

Sample         Total_read  Q20_read  Q20(%)  Q30_read  Q30(%)
GQ01-E120-TRA  25,000      24,984    99.94   20,969    83.88
GQ01-E120-TRB  25,000      24,976    99.90   20,574    82.30
GQ02-E120-TRA  25,000      24,983    99.93   21,462    85.85
GQ02-E120-TRB  25,000      24,964    99.86   20,916    83.66
GQ03-E120-TRA  25,000      24,989    99.96   20,956    83.82
GQ03-E120-TRB  25,000      24,979    99.92   21,279    85.12
GQ04-E120-TRA  25,000      24,980    99.92   21,627    86.51
GQ04-E120-TRB  27,500      27,478    99.92   23,572    85.72
GQ05-E120-TRA  2,500       2,498     99.92   2,105     84.20
GQ05-E120-TRB  27,500      27,473    99.90   23,446    85.26
GQ06-E120-TRA  25,000      24,976    99.90   21,348    85.39
GQ06-E120-TRB  27,500      27,468    99.88   23,390    85.05
GQ07-E120-TRA  27,500      27,486    99.95   23,443    85.25
GQ07-E120-TRB  2,500       2,498     99.92   1,988     79.52
GQ08-E120-TRA  27,500      27,478    99.92   23,370    84.98
GQ08-E120-TRB  25,000      24,975    99.90   20,862    83.45


Table 3.2 Trimmed sequencing data quality statistics

Sample         Total_read  Q20_read  Q20(%)  Q30_read  Q30(%)
GQ01-E120-TRA  22,811      22,811    100.00  22,811    100.00
GQ01-E120-TRB  22,869      22,869    100.00  22,868    100.00
GQ02-E120-TRA  23,040      23,040    100.00  23,039    100.00
GQ02-E120-TRB  22,657      22,657    100.00  22,657    100.00
GQ03-E120-TRA  21,586      21,586    100.00  21,586    100.00
GQ03-E120-TRB  21,996      21,996    100.00  21,996    100.00
GQ04-E120-TRA  21,879      21,879    100.00  21,879    100.00
GQ04-E120-TRB  23,979      23,979    100.00  23,979    100.00
GQ05-E120-TRA  2,298       2,298     100.00  2,298     100.00
GQ05-E120-TRB  24,263      24,263    100.00  24,263    100.00
GQ06-E120-TRA  22,917      22,917    100.00  22,917    100.00
GQ06-E120-TRB  23,798      23,798    100.00  23,798    100.00
GQ07-E120-TRA  24,035      24,035    100.00  24,035    100.00
GQ07-E120-TRB  2,272       2,272     100.00  2,272     100.00
GQ08-E120-TRA  23,719      23,719    100.00  23,718    100.00
GQ08-E120-TRB  22,623      22,623    100.00  22,623    100.00
 

4. TCR CDR3 Analysis

 

4.1 Sequence Alignment

The assembled reads were aligned against the IMGT database using IgBLAST to identify the best-matching germline V(D)J genes [5]. Because the TCR CDR3 region carries information on V, D and J gene usage, this report focuses on CDR3 analysis. The alignment results are shown in Tables 4.1.1 and 4.1.2. The complete output for all samples is in the 'TCR_mapping' directory and can be accessed using the link below:

CDR3 alignment results




Table 4.1.1 CDR3 alignment result for TCR alpha

Freq    Count  CDR3nt                                            CDR3aa            V_gene           J_gene     V_end_pos  J_start_pos  V_end_del  J_start_del
1.5179  208    TGTGCTGTTCTTAATGCTGGTGGTACTAGCTATGGAAAGCTGACATTT  CAVLNAGGTSYGKLTF  TRAV21*01        TRAJ52*01  8          11           4          1
1.1749  161    TGCCCTAGGAGAGCACTTACTTTT                          CPRRALTF          TRAV26-2*01      TRAJ5*01   3          6            12         11
0.4014  55     TGTGCTGTGATGGATAGCAACTATCAGTTAATCTGG              CAVMDSNYQLIW      TRAV1-2*01       TRAJ33*01  10         10           4          0
0.3868  53     TGTGCTGTAAGCAGAGGCTCAACCCTGGGGAGGCTATACTTT        CAVSRGSTLGRLYF    TRAV36/DV7*01    TRAJ18*01  8          11           5          4
0.343   47     TGTGCTGTGCAGGACCTATTAACCAGTGGCTCTAGGTTGACCTTT     CAVQDLLTSGSRLTF   TRAV20*01        TRAJ58*01  13         20           0          7
0.2481  34     TGTGCAGCCCCCAATGCTGGTGGTACTAGCTATGGAAAGCTGACATTT  CAAPNAGGTSYGKLTF  TRAV13-1*01      TRAJ52*01  8          12           5          2
0.2262  31     TGTGCTTATAGGAGCAGCAGAGATGACAAGATCATCTTT           CAYRSSRDDKIIF     TRAV38-2/DV8*01  TRAJ30*01  15         17           1          4
0.2189  30     TGTGCTGTGATGGATAGCAGCTATAAATTGATCTTC              CAVMDSSYKLIF      TRAV1-2*01       TRAJ12*01  10         8            4          1
0.2189  30     TGTGCTGTGAGAGATAGCAACTATCAGTTAATCTGG              CAVRDSNYQLIW      TRAV1-2*01       TRAJ33*01  14         12           0          2
0.1897  26     TGTGCTGCCATGGATAGCAACTATCAGTTAATCTGG              CAAMDSNYQLIW      TRAV1-2*01       TRAJ33*01  7          10           7          0
0.1751  24     TGTGCTTATAGACGTGGCTCTAGGTTGACCTTT                 CAYRRGSRLTF       TRAV38-2/DV8*01  TRAJ58*01  11         13           5          12
0.1678  23     TGTGCTGTCATGGATAGCAACTATCAGTTAATCTGG              CAVMDSNYQLIW      TRAV1-2*01       TRAJ33*01  8          10           6          0
0.1533  21     TGCCTCGTGGGTGGGGCGATGGGAGGTGCTGACGGACTCACCTTT     CLVGGAMGGADGLTF   TRAV4*01         TRAJ45*01  13         21           3          11
0.1314  18     TGTGAAAATCGGCGGGGCAACACAGGCAAACTAATCTTT           CENRRGNTGKLIF     TRAV13-2*01      TRAJ37*01  4          15           9          7
0.1241  17     TGTGCCGTGGAGAGGATGGATAGCAGCTATAAATTGATCTTC        CAVERMDSSYKLIF    TRAV12-2*01      TRAJ12*01  9          13           4          0
0.1241  17     TGTGCAGAGAATTCCCCTACCTCAGGAACCTACAAATACATCTTT     CAENSPTSGTYKYIF   TRAV13-2*01      TRAJ40*01  12         16           1          1
0.1168  16     TGTGCTCTGATCGCCCAGGCAGGAACTGCTCTGATCTTT           CALIAQAGTALIF     TRAV9-2*01       TRAJ15*01  10         14           4          4
0.1168  16     TGTGCCGTGGAGAGGATGGATAGCAGCTATAAATTGATCTTC        CAVERMDSSYKLIF    TRAV12-2*01      TRAJ12*01  9          13           4          0
0.1095  15     TGTGCTGTGCCGCCGGTATCAGGAGGAAGCTACATACCTACATTT     CAVPPVSGGSYIPTF   TRAV20*01        TRAJ6*01   10         17           3          3
0.1022  14     TGTGCTGTGAAGGATAGCAACTATCAGTTAATCTGG              CAVKDSNYQLIW      TRAV1-2*01       TRAJ33*01  10         11           4          1

Shown is the partial result of one sample (GQ01-E120). Freq: the frequency (percentage) of this clone in the entire population (e.g. 1.5 = 1.5%). Count: read count of the clone. CDR3nt: CDR3 nucleotide sequence. CDR3aa: CDR3 amino acid sequence. V_gene: best-aligned germline V gene. D_gene: best-aligned germline D gene. J_gene: best-aligned germline J gene. V_end_pos: the end position of the V gene in the CDR3 nucleotide sequence. D_start_pos: the start position of the D gene in the CDR3 nucleotide sequence. D_end_pos: the end position of the D gene in the CDR3 nucleotide sequence. J_start_pos: the start position of the J gene in the CDR3 nucleotide sequence. V_end_del: the number of nucleotides deleted from the 3' end of the V gene. D_start_del: the number of nucleotides deleted from the 5' end of the D gene. D_end_del: the number of nucleotides deleted from the 3' end of the D gene. J_start_del: the number of nucleotides deleted from the 5' end of the J gene. "-1": gene sequence not defined.
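The relationship between the CDR3nt and CDR3aa columns can be checked with a short translation sketch. It uses the standard genetic code; the function name `translate_cdr3` is hypothetical and is not part of the analysis pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cdr3(nt):
    """Translate an in-frame CDR3 nucleotide sequence into amino acids."""
    nt = nt.upper()
    if len(nt) % 3 != 0:
        raise ValueError("CDR3 length is not a multiple of three (out of frame)")
    # Codons containing ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt), 3))


# Second row of Table 4.1.1.
print(translate_cdr3("TGCCCTAGGAGAGCACTTACTTTT"))  # CPRRALTF
```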



Table 4.1.2 CDR3 alignment result for TCR beta

Freq    Count  CDR3nt                                                  CDR3aa              V_gene       D_gene    J_gene      V_end_pos  D_start_pos  D_end_pos  J_start_pos  V_end_del  D_start_del  D_end_del  J_start_del
0.777   94     TGTGCCAGCAGTTTATCAAGACAGGGTAGGGGGGGTACGCAGTATTTT        CASSLSRQGRGGTQYF    TRBV27*01    TRBD1*01  TRBJ2-3*01  17         27           33         35           0          5            1          8
0.5125  62     TGTGCCAGCAGTCGACAGAGAGGGGGCAATCAGCCCCAGCATTTT           CASSRQRGGNQPQHF     TRBV12-3*01  TRBD1*01  TRBJ1-5*01  12         13           18         25           5          2            5          2
0.5042  61     TGTGCCAGCAGTTTAGACGCGACAACAGATACGCAGTATTTT              CASSLDATTDTQYF      TRBV28*01    .         TRBJ2-3*01  15         -1           -1         24           2          -1           -1         3
0.3472  42     TGTGCCAGCAGTTTGAGAGGGGGGGGACTCCGGGAGCAGTACTTC           CASSLRGGGLREQYF     TRBV12-3*01  TRBD2*01  TRBJ2-7*01  14         19           26         33           3          9            0          7
0.3224  39     TGTGCCAGCAGTTTTCTCGACGGAGCGGCAGGAGTGGATACGCAGTATTTT     CASSFLDGAAGVDTQYF   TRBV7-8*01   TRBD2*01  TRBJ2-3*01  14         23           28         36           3          6            5          6
0.3224  39     TGCGCCAGCAGCCTTCGGACAGGACCAAAGCAGTACTTC                 CASSLRTGPKQYF       TRBV5-1*01   TRBD1*01  TRBJ2-7*01  12         16           23         28           4          1            4          8
0.2728  33     TGCGCCAGCAGCCAATGGACAGGAATCAATCAGCCCCAGCATTTT           CASSQWTGINQPQHF     TRBV5-1*01   TRBD1*01  TRBJ1-5*01  12         16           23         26           4          1            4          3
0.2149  26     TGTGCCACTACGTTGCAGGGGGTGGATGGGGCCAACGTCCTGACTTTC        CATTLQGVDGANVLTF    TRBV6-5*01   TRBD1*01  TRBJ2-6*01  7          15           22         26           10         4            1          3
0.2066  25     TGCGCCAGCAGCCAAGATCTAACAGTCGAAAACATTCAGTACTTC           CASSQDLTVENIQYF     TRBV4-3*01   .         TRBJ2-4*01  17         -1           -1         28           0          -1           -1         5
0.1653  20     TGTGCCAGCAGCGCGAAAAACTATGGCTACACCTTC                    CASSAKNYGYTF        TRBV27*01    .         TRBJ1-2*01  11         -1           -1         18           6          -1           -1         2
0.1571  19     TGCAGTGCCTTCATGGTGGGGGATGAGCAGTTCTTC                    CSAFMVGDEQFF        TRBV20-1*01  TRBD1*01  TRBJ2-1*01  8          17           22         22           6          6            1          8
0.1571  19     TGTGCCAGCAGTGAAGTGAGCGGCTCCGGAGGAGATACGCAGTATTTT        CASSEVSGSGGDTQYF    TRBV6-1*01   TRBD2*01  TRBJ2-3*01  16         18           23         32           1          6            5          5
0.1488  18     TGCGCCAGCAGCTTGAATTTTCTGTCCGGGAACCCCTACAATGAGCAGTTCTTC  CASSLNFLSGNPYNEQFF  TRBV4-1*01   .         TRBJ2-1*01  12         -1           -1         34           5          -1           -1         2
0.1488  18     TGTGCCAGCAGCTTCCCGAGGTTGGCAGATACGCAGTATTTT              CASSFPRLADTQYF      TRBV11-2*01  .         TRBJ2-3*01  14         -1           -1         25           3          -1           -1         4
0.1405  17     TGCGCCAGCAGCCACTTAGGGGGAGAAGGCTACGAGCAGTACTTC           CASSHLGGEGYEQYF     TRBV5-1*01   TRBD1*01  TRBJ2-7*01  12         17           23         29           4          5            1          3
0.124   15     TGCGCCAGCAGCCGCGGATTAGGGACAACAAGCACAGATACGCAGTATTTT     CASSRGLGTTSTDTQYF   TRBV5-1*01   TRBD1*01  TRBJ2-3*01  12         21           27         30           4          0            6          0
0.124   15     TGTGCCAGCAGCGCCCACACCGGGGAGCTGTTTTTT                    CASSAHTGELFF        TRBV9*01     .         TRBJ2-2*01  13         -1           -1         16           3          -1           -1         3
0.1157  14     TGCAGCGACAGGGAATACAATGAGCAGTTCTTC                       CSDREYNEQFF         TRBV29-1*01  TRBD1*01  TRBJ2-1*01  7          6            13         15           7          2            3          4
0.1075  13     TGTGCCAGCAGCTTACGGGGGGGACAGGGCGGAGAGACCCAGTACTTC        CASSLRGGQGGETQYF    TRBV7-2*01   TRBD1*01  TRBJ2-5*01  15         16           21         32           2          6            1          4

Shown is the partial result of one sample (GQ01-E120).



 

4.2 CDR3 Amino Acid Abundancy Analysis

The outputs for the CDR3 amino acid abundancy analysis for all the samples are in the 'TCR_analysis/CDR3_abundancy' directory and can be accessed using the link below:

CDR3 amino acid abundancy





Table 4.2.1 CDR3 amino acid abundancy for TCR alpha

Freq    Count  CDR3aa            V_gene           D_gene  J_gene
1.9101  261    CAVLNAGGTSYGKLTF  TRAV21*01        .       TRAJ52*01
1.5735  215    CPRRALTF          TRAV26-2*01      .       TRAJ5*01
0.7977  109    CAVMDSNYQLIW      TRAV1-2*01       .       TRAJ33*01
0.6733  92     CAVSRGSTLGRLYF    TRAV36/DV7*01    .       TRAJ18*01
0.4245  58     CAVQDLLTSGSRLTF   TRAV20*01        .       TRAJ58*01
0.3879  53     CAVRDSNYQLIW      TRAV1-2*01       .       TRAJ33*01
0.3074  42     CAAMDSNYQLIW      TRAV1-2*01       .       TRAJ33*01
0.2927  40     CAVMDSSYKLIF      TRAV1-2*01       .       TRAJ12*01
0.2708  37     CAVERMDSSYKLIF    TRAV12-2*01      .       TRAJ12*01
0.2708  37     CAYRSSRDDKIIF     TRAV38-2/DV8*01  .       TRAJ30*01
0.2708  37     CAVLDSNYQLIW      TRAV1-2*01       .       TRAJ33*01
0.2561  35     CAAPNAGGTSYGKLTF  TRAV13-1*01      .       TRAJ52*01
0.2122  29     CAYRRGSRLTF       TRAV38-2/DV8*01  .       TRAJ58*01
0.1903  26     CALIAQAGTALIF     TRAV9-2*01       .       TRAJ15*01
0.1683  23     CAENSPTSGTYKYIF   TRAV13-2*01      .       TRAJ40*01
0.1683  23     CLVGGAMGGADGLTF   TRAV4*01         .       TRAJ45*01
0.161   22     CAYLGNTPLVF       TRAV38-2/DV8*01  .       TRAJ29*01
0.1317  18     CAVKDSNYQLIW      TRAV1-2*01       .       TRAJ33*01
0.1317  18     CENRRGNTGKLIF     TRAV13-2*01      .       TRAJ37*01

Shown is the partial result of one sample (GQ01-E120). Count: read count of the clone. Freq: the frequency (percentage) of this clone in the entire population (e.g. 1.5 = 1.5%). CDR3aa: CDR3 amino acid sequence.
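One way to arrive at an amino-acid-level abundance table is to merge nucleotide clonotypes that encode the same CDR3 amino acid sequence and recompute the frequencies. The sketch below assumes clones are grouped by CDR3aa together with their V and J genes; that grouping key, the function name and the read counts in the example are assumptions for illustration, not the pipeline's documented behavior.

```python
from collections import Counter


def collapse_by_amino_acid(clones):
    """Merge clones sharing (CDR3aa, V gene, J gene) and return rows of
    (key, count, frequency in percent), sorted by decreasing count.

    clones: iterable of (cdr3aa, v_gene, j_gene, count) tuples.
    """
    counts = Counter()
    for cdr3aa, v_gene, j_gene, count in clones:
        counts[(cdr3aa, v_gene, j_gene)] += count
    total = sum(counts.values())
    return [(key, n, round(100 * n / total, 4)) for key, n in counts.most_common()]


# Hypothetical input: two nucleotide variants encode the same amino acid sequence.
clones = [
    ("CAVMDSNYQLIW", "TRAV1-2*01", "TRAJ33*01", 55),
    ("CAVMDSNYQLIW", "TRAV1-2*01", "TRAJ33*01", 23),
    ("CPRRALTF", "TRAV26-2*01", "TRAJ5*01", 22),
]
for key, count, freq in collapse_by_amino_acid(clones):
    print(freq, count, *key)
```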



Table 4.2.2 CDR3 amino acid abundancy for TCR beta

Freq    Count  CDR3aa              V_gene       D_gene    J_gene
0.9425  114    CASSLSRQGRGGTQYF    TRBV27*01    TRBD1*01  TRBJ2-3*01
0.7275  88     CASSLDATTDTQYF      TRBV28*01    .         TRBJ2-3*01
0.6531  79     CASSRQRGGNQPQHF     TRBV12-3*01  TRBD1*01  TRBJ1-5*01
0.463   56     CASSLRGGGLREQYF     TRBV12-3*01  TRBD2*01  TRBJ2-7*01
0.3886  47     CASSLRTGPKQYF       TRBV5-1*01   TRBD1*01  TRBJ2-7*01
0.3803  46     CASSFLDGAAGVDTQYF   TRBV7-8*01   TRBD2*01  TRBJ2-3*01
0.2728  33     CASSQWTGINQPQHF     TRBV5-1*01   TRBD1*01  TRBJ1-5*01
0.2728  33     CATTLQGVDGANVLTF    TRBV6-5*01   TRBD1*01  TRBJ2-6*01
0.2232  27     CASSLNFLSGNPYNEQFF  TRBV4-1*01   .         TRBJ2-1*01
0.2232  27     CASSQDLTVENIQYF     TRBV4-3*01   .         TRBJ2-4*01
0.2067  25     CASSAKNYGYTF        TRBV27*01    .         TRBJ1-2*01
0.1984  24     CSAFMVGDEQFF        TRBV20-1*01  TRBD1*01  TRBJ2-1*01
0.1901  23     CASSFPRLADTQYF      TRBV11-2*01  .         TRBJ2-3*01
0.1901  23     CASSEVSGSGGDTQYF    TRBV6-1*01   TRBD2*01  TRBJ2-3*01
0.1736  21     CASSHLGGEGYEQYF     TRBV5-1*01   TRBD1*01  TRBJ2-7*01
0.1653  20     CASSAHTGELFF        TRBV9*01     .         TRBJ2-2*01
0.1571  19     CASSRGLGTTSTDTQYF   TRBV5-1*01   TRBD1*01  TRBJ2-3*01
0.1405  17     CASSPVASGRGEQYF     TRBV7-8*01   TRBD2*01  TRBJ2-7*01
0.124   15     CSDREYNEQFF         TRBV29-1*01  TRBD1*01  TRBJ2-1*01

Shown is the partial result of one sample (GQ01-E120).


 

4.3 CDR3 Length Distribution

CDR3 amino acid sequences were further extracted and the length distribution was plotted as in Figure 4.3.


  • GQ01-E120-TRA_CDR3_len.png
  • GQ01-E120-TRB_CDR3_len.png
  • GQ02-E120-TRA_CDR3_len.png
  • GQ02-E120-TRB_CDR3_len.png
  • GQ03-E120-TRA_CDR3_len.png
  • GQ03-E120-TRB_CDR3_len.png
  • GQ04-E120-TRA_CDR3_len.png
  • GQ04-E120-TRB_CDR3_len.png
  • GQ05-E120-TRA_CDR3_len.png
  • GQ05-E120-TRB_CDR3_len.png
  • GQ06-E120-TRA_CDR3_len.png
  • GQ06-E120-TRB_CDR3_len.png
  • GQ07-E120-TRA_CDR3_len.png
  • GQ07-E120-TRB_CDR3_len.png
  • GQ08-E120-TRA_CDR3_len.png
  • GQ08-E120-TRB_CDR3_len.png

Figure 4.3 CDR3 amino acid sequence length distribution.
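A length distribution of this kind can be tabulated from the CDR3aa and Count columns. The sketch below weights each length by read count, which is an assumption (the plots could equally count unique clonotypes); the function name is hypothetical.

```python
from collections import Counter


def cdr3_length_distribution(clones):
    """Return {CDR3 length in amino acids: percent of reads}, sorted by length.

    clones: iterable of (cdr3aa, count) pairs; lengths are weighted by read count.
    """
    by_length = Counter()
    for cdr3aa, count in clones:
        by_length[len(cdr3aa)] += count
    total = sum(by_length.values())
    return {length: 100 * n / total for length, n in sorted(by_length.items())}


# First three rows of Table 4.2.1 (CDR3aa, Count).
clones = [("CAVLNAGGTSYGKLTF", 261), ("CPRRALTF", 215), ("CAVMDSNYQLIW", 109)]
for length, percent in cdr3_length_distribution(clones).items():
    print(length, round(percent, 2))
```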

 

4.4 Top V(D)J Analysis

V(D)J usage of the CDR3 sequences was further analyzed. CDR3 sequences were collapsed by V(D)J combination. Figures 4.4.1 and 4.4.2 show the proportion of the top ten combinations in the entire population.


  • GQ01-E120-TRA_vj_stackbar.png
  • GQ02-E120-TRA_vj_stackbar.png
  • GQ03-E120-TRA_vj_stackbar.png
  • GQ04-E120-TRA_vj_stackbar.png
  • GQ05-E120-TRA_vj_stackbar.png
  • GQ06-E120-TRA_vj_stackbar.png
  • GQ07-E120-TRA_vj_stackbar.png
  • GQ08-E120-TRA_vj_stackbar.png

Figure 4.4.1 Top VJ combination usage for TCR alpha.

  • GQ01-E120-TRB_vdj_stackbar.png
  • GQ02-E120-TRB_vdj_stackbar.png
  • GQ03-E120-TRB_vdj_stackbar.png
  • GQ04-E120-TRB_vdj_stackbar.png
  • GQ05-E120-TRB_vdj_stackbar.png
  • GQ06-E120-TRB_vdj_stackbar.png
  • GQ07-E120-TRB_vdj_stackbar.png
  • GQ08-E120-TRB_vdj_stackbar.png

Figure 4.4.2 Top VDJ combination usage for TCR beta.
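The collapsing step behind these stacked bars can be sketched as follows: reads are summed per gene combination, the top n combinations are kept, and the remainder is reported as a single share. The function name and the example read counts are hypothetical.

```python
from collections import Counter


def top_gene_combinations(clones, n=10):
    """Collapse clones on their gene combination.

    clones: iterable of (genes, count), where genes is a (V, J) or (V, D, J) tuple.
    Returns (top, other): the n most abundant combinations with their proportion
    of all reads, and the combined proportion of everything else.
    """
    totals = Counter()
    for genes, count in clones:
        totals[tuple(genes)] += count
    grand_total = sum(totals.values())
    top = [(genes, count / grand_total) for genes, count in totals.most_common(n)]
    other = max(1.0 - sum(share for _, share in top), 0.0)
    return top, other


# Hypothetical TCR beta clones; the first and third share one VDJ combination.
clones = [
    (("TRBV27*01", "TRBD1*01", "TRBJ2-3*01"), 94),
    (("TRBV12-3*01", "TRBD1*01", "TRBJ1-5*01"), 62),
    (("TRBV27*01", "TRBD1*01", "TRBJ2-3*01"), 6),
    (("TRBV28*01", ".", "TRBJ2-3*01"), 38),
]
top, other = top_gene_combinations(clones, n=2)
print(top, other)
```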

 

4.5 V Gene Usage Analysis

To assess germline gene coverage, V gene annotation information was extracted for all the sequences and the total count of each germline gene was calculated and summarized in Table 4.5.1 (TCR alpha) and Table 4.5.2 (TCR beta).


Table 4.5.1 TCR alpha V gene usage

V_geneGQ01-E120-TRAGQ02-E120-TRAGQ03-E120-TRAGQ04-E120-TRAGQ05-E120-TRAGQ06-E120-TRAGQ07-E120-TRAGQ08-E120-TRA
TRAV166969364861751508505573
TRAV1024224120120513193182157
TRAV129528629268031019311109940
TRAV131807154515871483142126013351272
TRAV14221
TRAV1611810911411610110144128
TRAV1735630629429044362405412
TRAV19546155578165159174
TRAV231432728235954588545595
TRAV2082674873372929280349349
TRAV2198499399396843497538512
TRAV2214512315812313114136131
TRAV23178
TRAV24728265643685878
TRAV25981051148918119131127
TRAV2696995288490264680590678
TRAV2736734431734031279315338
TRAV29301
TRAV39684948713151115168
TRAV3017913716114120151159154
TRAV34252834207334230
TRAV3521519419417029210257212
TRAV3699
TRAV381904185118981766112105712401142
TRAV39781098888128712494
TRAV445841044742127311299285
TRAV40303025212223324
TRAV4118417815516315210217179
TRAV516816813716121212247200
TRAV68388747113180191169
TRAV873269672067084918971935
TRAV945947841040661639570674
TRAV7NANA2NANANANANA



Table 4.5.2 TCR beta V gene usage

V_geneGQ01-E120-TRBGQ02-E120-TRBGQ03-E120-TRBGQ04-E120-TRBGQ05-E120-TRBGQ06-E120-TRBGQ07-E120-TRBGQ08-E120-TRB
TRBV1112NANANANANA
TRBV1033728931632235229537278
TRBV1124624926029139934040339
TRBV12125811661187127599982498821
TRBV13241821204847239
TRBV1460586364113101972
TRBV1520117017420924919831194
TRBV1633239525
TRBV1817013417116328420430222
TRBV1917912917013722818023197
TRBV215614818016835420128226
TRBV201298116612631250148012501531285
TRBV21292042204036234
TRBV23181211102834235
TRBV2417714315013819611920112
TRBV2570657163148871072
TRBV2741845341943533626438262
TRBV2867559162865459347450422
TRBV29805828748792101495991953
TRBV321917622322640727639282
TRBV3015715316616916013512119
TRBV452952452658281670069734
TRBV52079189921752185244120332341712
TRBV6955927958102013021037101998
TRBV71828173218741931276223542162105
TRBV920421722322639334340330
TRBV247

The distribution of V gene usage is also illustrated in bar graphs (Figures 4.5.1 and 4.5.2).


  • GQ01-E120-TRA_v_gene_stat.png
  • GQ02-E120-TRA_v_gene_stat.png
  • GQ03-E120-TRA_v_gene_stat.png
  • GQ04-E120-TRA_v_gene_stat.png
  • GQ05-E120-TRA_v_gene_stat.png
  • GQ06-E120-TRA_v_gene_stat.png
  • GQ07-E120-TRA_v_gene_stat.png
  • GQ08-E120-TRA_v_gene_stat.png

Figure 4.5.1 V gene usage of TCR alpha.

  • GQ01-E120-TRB_v_gene_stat.png
  • GQ02-E120-TRB_v_gene_stat.png
  • GQ03-E120-TRB_v_gene_stat.png
  • GQ04-E120-TRB_v_gene_stat.png
  • GQ05-E120-TRB_v_gene_stat.png
  • GQ06-E120-TRB_v_gene_stat.png
  • GQ07-E120-TRB_v_gene_stat.png
  • GQ08-E120-TRB_v_gene_stat.png

Figure 4.5.2 V gene usage of TCR beta.
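The V gene usage tables list names such as TRAV1 and TRBV12 rather than full allele names, which suggests that counts were pooled at the subgroup level. The sketch below shows one way to do that pooling; the regular expression, the function names and the read counts are assumptions, not the pipeline's documented method.

```python
import re
from collections import Counter

# Leading "TRAV12" / "TRBV5" part of an IMGT-style name such as "TRAV12-2*01".
V_SUBGROUP = re.compile(r"^(TR[ABDG]V\d+)")


def v_subgroup(v_gene):
    """Reduce an allele-level V gene name to its subgroup, e.g. TRAV1-2*01 -> TRAV1."""
    match = V_SUBGROUP.match(v_gene)
    if match is None:
        raise ValueError(f"unrecognized V gene name: {v_gene!r}")
    return match.group(1)


def v_gene_usage(clones):
    """Sum read counts per V subgroup; clones is an iterable of (v_gene, count)."""
    usage = Counter()
    for v_gene, count in clones:
        usage[v_subgroup(v_gene)] += count
    return dict(usage)


# Hypothetical read counts for four alpha-chain clones.
print(v_gene_usage([
    ("TRAV1-2*01", 55),
    ("TRAV1-2*01", 30),
    ("TRAV38-2/DV8*01", 31),
    ("TRAV21*01", 208),
]))
```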



 

4.6 Clonal Type Distribution

The clonal type distribution was analyzed based on V and J gene usage and the pairing between them. The results for each sample are shown in Figures 4.6.1 and 4.6.2.


  • GQ01-E120-TRA_chord_diagram.png
  • GQ02-E120-TRA_chord_diagram.png
  • GQ03-E120-TRA_chord_diagram.png
  • GQ04-E120-TRA_chord_diagram.png
  • GQ05-E120-TRA_chord_diagram.png
  • GQ06-E120-TRA_chord_diagram.png
  • GQ07-E120-TRA_chord_diagram.png
  • GQ08-E120-TRA_chord_diagram.png

Figure 4.6.1 Chord diagram of TCR alpha

  • GQ01-E120-TRB_chord_diagram.png
  • GQ02-E120-TRB_chord_diagram.png
  • GQ03-E120-TRB_chord_diagram.png
  • GQ04-E120-TRB_chord_diagram.png
  • GQ05-E120-TRB_chord_diagram.png
  • GQ06-E120-TRB_chord_diagram.png
  • GQ07-E120-TRB_chord_diagram.png
  • GQ08-E120-TRB_chord_diagram.png

Figure 4.6.2 Chord diagram of TCR beta

 

4.7 Correlation Analysis

To explore the relationship across samples, correlation coefficients were calculated based on CDR3 amino acid sequences. The result is shown in Figure 4.7.



Figure 4.7 Sample correlation based on CDR3 amino acid sequences.
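The report does not state which correlation coefficient was used. As one possible reading, the sketch below computes a Pearson correlation between two samples over the union of their CDR3 amino acid sequences, treating a sequence absent from a sample as frequency 0. The function names and the example frequencies are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined (division by zero) if either vector is constant


def sample_correlation(freq_a, freq_b):
    """Correlate two samples given {CDR3aa: frequency} dictionaries."""
    keys = sorted(set(freq_a) | set(freq_b))
    return pearson([freq_a.get(k, 0.0) for k in keys],
                   [freq_b.get(k, 0.0) for k in keys])


# Hypothetical frequencies (percent) for two samples.
sample_1 = {"CASSLSRQGRGGTQYF": 0.9, "CASSLDATTDTQYF": 0.7, "CSDREYNEQFF": 0.1}
sample_2 = {"CASSLSRQGRGGTQYF": 0.8, "CASSLDATTDTQYF": 0.5, "CASSAHTGELFF": 0.2}
print(round(sample_correlation(sample_1, sample_2), 3))
```

A rank-based coefficient such as Spearman's would be less sensitive to a few dominant clones; the choice depends on what the comparison is meant to capture.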

 

4.8 Diversity Analysis

The clonal abundance distribution was calculated with confidence intervals derived via bootstrapping. The clonal diversity of the repertoire was assessed using the Hill diversity index [6]. Abundance curves are shown in Figure 4.8.1 and diversity curves in Figure 4.8.2.



Figure 4.8.1 Abundance curves.

Figure 4.8.2 Diversity curves.
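For reference, the Hill diversity of order q [6] is (sum of p_i^q)^(1/(1-q)), where p_i is the relative abundance of clone i; at q = 1 it is defined by its limit, the exponential of the Shannon entropy. The sketch below evaluates it for a list of clone counts. The function name and the example counts are hypothetical, and the range of q plotted in Figure 4.8.2 is not specified here.

```python
import math


def hill_diversity(counts, q):
    """Hill diversity of order q for a list of clone read counts."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit as q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


even = [10, 10, 10, 10]    # four equally abundant clones
skewed = [97, 1, 1, 1]     # one dominant clone
for q in (0, 1, 2):
    print(q, round(hill_diversity(even, q), 3), round(hill_diversity(skewed, q), 3))
```

At q = 0 the index equals the number of distinct clones; larger q gives more weight to abundant clones, so a repertoire dominated by a few expanded clones shows a steeper decline along the diversity curve.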

 

5. References

[1] Murphy K. Janeway's Immunobiology. New York: Garland Science (2012).

[2] Bolger, AM. et al., Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. (2014) 30(15): 2114-2120.

[3] Lefranc, MP. et al., The international ImMunoGeneTics database. Nucleic Acids Res. (1999) 27 (1): 209-212.

[4] Andrews S. et al., FastQC: a quality control tool for high throughput sequence data. (2010) Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.

[5] Ye J. et al., IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. (2013) 41: W34-W40.

[6] Hill, M. et al., Diversity and evenness: a unifying notation and its consequences. Ecology (1973) 54:427-432.