Immunoglobulin Repertoire Sequencing Analysis Report

 

1. Project Summary

 

2. Workflow Overview

 

2.1 Experiment Workflow

The enormous diversity of B and T lymphocyte receptors is key to the adaptive immune system [1]. For high-throughput sequencing of antibody variable domains, total RNA was extracted and reverse transcribed using proprietary gene-specific primers that bind to the constant regions (Figure 1.1). The resulting cDNA was amplified with proprietary gene-specific primers to create next-generation sequencing libraries, which were then subjected to deep sequencing.

Figure 1.1 Immune repertoire sequencing (IgSeq) workflow

 

2.2 Bioinformatics Analysis Workflow

The raw sequencing data are in FASTQ format and can be found in the FASTQ folder. The raw data were subjected to quality filtering. Sequences that passed quality filtering were mapped against either the IMGT database [2] or customized databases to find the best germline V(D)J gene matches. CDR sequences were then further characterized and analyzed.


Figure 2.1 Bioinformatics analysis workflow

 

3. Data Quality Analysis

Raw FASTQ files were first subjected to quality assessment [4] (Figure 3.1.1). Bases with poor quality scores (Q<20) were removed using Trimmomatic (v0.30) [3]. After quality trimming, reads shorter than 150 bp were also removed from subsequent processing. The trimmed data were then subjected to a second quality assessment [4] (Figure 3.1.2).
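As a rough illustration of the trimming rule described above, the following Python sketch removes low-quality bases from the 3' end of a read and then applies the 150 bp length filter. It is a simplified stand-in and not Trimmomatic's algorithm: the function names, the Phred+33 encoding, and the decision to trim only trailing bases are assumptions made for this example.

```python
MIN_QUALITY = 20   # Phred threshold from the text (Q<20 is treated as poor)
MIN_LENGTH = 150   # reads shorter than this are discarded after trimming


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding is assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_trailing(seq, quality_line, min_quality=MIN_QUALITY):
    """Remove bases from the 3' end until one reaches min_quality."""
    scores = phred_scores(quality_line)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], quality_line[:end]


def passes_length_filter(seq, min_length=MIN_LENGTH):
    """Keep only reads that are still long enough after trimming."""
    return len(seq) >= min_length
```

A read would be kept when `passes_length_filter(trim_trailing(seq, qual)[0])` is true.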

  • ASC-S-vH_R1_raw_per_base_quality.png
  • ASC-S-vH_R2_raw_per_base_quality.png
  • ASC-S-vL_R1_raw_per_base_quality.png
  • ASC-S-vL_R2_raw_per_base_quality.png
  • PBMC-0-S-vH_R1_raw_per_base_quality.png
  • PBMC-0-S-vH_R2_raw_per_base_quality.png
  • PBMC-0-S-vL_R1_raw_per_base_quality.png
  • PBMC-0-S-vL_R2_raw_per_base_quality.png
  • PBMC-7-S-vH_R1_raw_per_base_quality.png
  • PBMC-7-S-vH_R2_raw_per_base_quality.png
  • PBMC-7-S-vL_R1_raw_per_base_quality.png
  • PBMC-7-S-vL_R2_raw_per_base_quality.png


Figure 3.1.1 Sequence quality across all bases on raw reads.

  • ASC-S-vH_R1_clean_per_base_quality.png
  • ASC-S-vH_R2_clean_per_base_quality.png
  • ASC-S-vL_R1_clean_per_base_quality.png
  • ASC-S-vL_R2_clean_per_base_quality.png
  • PBMC-0-S-vH_R1_clean_per_base_quality.png
  • PBMC-0-S-vH_R2_clean_per_base_quality.png
  • PBMC-0-S-vL_R1_clean_per_base_quality.png
  • PBMC-0-S-vL_R2_clean_per_base_quality.png
  • PBMC-7-S-vH_R1_clean_per_base_quality.png
  • PBMC-7-S-vH_R2_clean_per_base_quality.png
  • PBMC-7-S-vL_R1_clean_per_base_quality.png
  • PBMC-7-S-vL_R2_clean_per_base_quality.png


Figure 3.1.2 Sequence quality across all bases on clean reads.

After quality trimming, each Read1 and Read2 sequence pair was merged based on overlapping sequences. Merged reads shorter than 200 bp were removed from subsequent analysis. The statistics of each processing step are summarized in Table 3.1. The length distribution of the merged contigs is shown in Figure 3.2.
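The merging step can be sketched as follows. This is a minimal illustration that assumes Read2 is the reverse complement of the far end of the fragment and that the overlap matches exactly. The merging tool actually used is not named in this report, and the function names and the `min_overlap` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
MIN_MERGED_LENGTH = 200  # merged reads shorter than this are dropped


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge read1 with the reverse complement of read2.

    Returns the merged sequence, or None when no exact overlap of at
    least min_overlap bases exists (mismatches are not tolerated here).
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    # Try the longest possible overlap first, then shorter ones.
    for size in range(longest, min_overlap - 1, -1):
        if read1[-size:] == mate[:size]:
            return read1 + mate[size:]
    return None


def passes_merged_length_filter(merged, min_length=MIN_MERGED_LENGTH):
    """Apply the 200 bp filter to a merged read (None means not merged)."""
    return merged is not None and len(merged) >= min_length
```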


Table 3.1 Raw sequencing data quality statistics

Sample | Raw Count | % of Raw (>Q20) | % of Raw (>Q30) | Clean Count | % of Clean (>Q20) | % of Clean (>Q30) | % of Trimmed | Merged Count | % of Merged | Mapped Count | % of Mapped
ASC-S-vH | 1e+06 | 83.31 | 70.39 | 420350 | 99.08 | 94.94 | 42.03 | 300554 | 30.06 | 172313 | 57.33
ASC-S-vL | 1e+06 | 85.27 | 73.77 | 447107 | 99.05 | 94.96 | 44.71 | 333677 | 33.37 | 119172 | 35.71
PBMC-0-S-vH | 1e+06 | 83.78 | 71.34 | 448438 | 99.09 | 94.97 | 44.84 | 283774 | 28.38 | 173686 | 61.21
PBMC-0-S-vL | 1e+06 | 85.08 | 73.35 | 431477 | 99.03 | 94.91 | 43.15 | 319008 | 31.9 | 114945 | 36.03
PBMC-7-S-vH | 1e+06 | 83.71 | 71 | 439184 | 99.09 | 94.97 | 43.92 | 297230 | 29.72 | 171697 | 57.77
PBMC-7-S-vL | 1e+06 | 85.15 | 73.38 | 427074 | 99.03 | 94.93 | 42.71 | 320174 | 32.02 | 116052 | 36.25
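The percentage columns in Table 3.1 appear to follow from the count columns. The sketch below recomputes them for ASC-S-vH under the reading that % of Trimmed and % of Merged are relative to the raw count, while % of Mapped is relative to the merged count. This interpretation is inferred from the numbers themselves and is not stated in the report. The raw count printed as 1e+06 is taken to be 1,000,000.

```python
def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# ASC-S-vH counts from Table 3.1 (the raw count is printed as 1e+06).
raw_count = 1_000_000
clean_count = 420_350
merged_count = 300_554
mapped_count = 172_313

pct_trimmed = percent(clean_count, raw_count)      # compare with "% of Trimmed"
pct_merged = percent(merged_count, raw_count)      # compare with "% of Merged"
pct_mapped = percent(mapped_count, merged_count)   # compare with "% of Mapped"
```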

  • Contig_length_ASC-S-vH.png
  • Contig_length_ASC-S-vL.png
  • Contig_length_PBMC-0-S-vH.png
  • Contig_length_PBMC-0-S-vL.png
  • Contig_length_PBMC-7-S-vH.png
  • Contig_length_PBMC-7-S-vL.png


Figure 3.2 Assembled sequence length distribution.

 

4. Immunoglobulin Sequence Analysis

 

4.1 Sequence Alignment

The assembled reads were aligned against the IMGT database using IgBLAST [5] to identify the best germline V(D)J gene matches. The alignment results for individual sequences are shown in Tables 4.1.1 and 4.1.2. The complete output for all samples is in the ‘Report\Clonal_tables’ directory and can be accessed using the link below:

Clonal tables




Table 4.1.1 VDJ gene mapping of clonal reads (heavy chain)

Read_name | CDR1nt | CDR1aa | CDR2nt | CDR2aa | CDR3nt | CDR3aa | Top_V_Gene | Top_D_Gene | Top_J_Gene
M04670:18:000000000-ATLPC:1:2106:8878:1748:ACTCGCTA+CCTAGAGT | GGTGACTCCATCAGCAGTATTAGTTACTAC | GDSISSISYY | ATCTATTATAGTGGGAACACC | IYYSGNT | TGTGCGAGACATGGGGCGAGTAGTGCTAGTTGTTACCACGACTATTGG | CARHGASSASCYHDYW | IGHV3-5*02 | IGHD6-2*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:19468:2597:ACTCGCTA+CCTAGAGT | GGTGCCTCCGTCAGTAGTTACTAT | GASVSSYY | ATCTCTTTCAGTGGGATCACC | ISFSGIT | TGTGCGAGCCCCTGTTGGTATGACTGGAAGTGTCTCATGGACGTCTGG | CASPCWYDWKCLMDVW | IGHV3-2*02 | IGHD1-1*02 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:6990:3052:ACTCGCTA+CCTAGAGT | GGATTCACCTTCAGTAGCTATGCT | GFTFSSYA | ATATCATATGATGGAAGTAATAAA | ISYDGSNK | TGTGCGAGAGGTTTGGATCTGGGGGGCTACTACTACGGTATGGACGTCTGG | CARGLDLGGYYYGMDVW | IGHV5-17*02 | IGHD1-1*01 | IGHJ1*01
M04670:18:000000000-ATLPC:1:2106:8601:2756:ACTCGCTA+CCTAGAGT | GGTGGGTCCTTTAGTGGTTACGAC | GGSFSGYD | ATCAGTCACAGTGGAAGTATC | ISHSGSI | TGTGCGAGACTTCCCATCAGGAGATCCGGGCTCCATAATGATGCTTTTGATATATGG | CARLPIRRSGLHNDAFDIW | IGHV3-2*02 | IGHD2-3*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:6561:2747:ACTAGCTA+CCTAGAGT | NNNNNNNNNNNNNNNNNNNNNNNN | XXXXXXXX | NNNNNNNNNNNNNNNNNNNNNNNN | XXXXXXXX | TGTGTTAGAAGGCACCCAGCACCAACTGGCAACATTTTTGACTTCTGG | CVRRHPAPTGNIFDFW | IGHV5-9*02 | IGHD4-1*02 | IGHJ2*01
M04670:18:000000000-ATLPC:1:2106:12327:1762:ACTCGCTA+CCTAGAGT | GGTGACTCCATCACCAGAGGTGATTACTTC | GDSITRGDYF | NNNNNNNNNNNNNNNNNNNNN | XXXXXXX | TGTGCGAGAGGTTCGGCCCCGACGGGGAACAACTACTTCGACCCCTGG | CARGSAPTGNNYFDPW | IGHV3-2*02 | IGHD2-1*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:15690:2814:ACTCGCTA+CCTAGAGT | GGGTTCTCACTCAACACTCGTGGAACGACT | GFSLNTRGTT | ATTTATTGGGATGGTGATGAC | IYWDGDD | TGTGCACACGGACGCCCAGACTGGGGAGCAGATGCTTTTGATGTCTGG | CAHGRPDWGADAFDVW | IGHV8-12*01 | IGHD4-1*01 | IGHJ1*01
M04670:18:000000000-ATLPC:1:2106:23881:2561:ACTCGCTA+CCTAGAGT | GGTGGGTCCTTTAGTGGTTACGAC | GGSFSGYD | ATCAGTCACAGTGGAAGTATC | ISHSGSI | TGTGCGAGACTTCCCATCAGGAGATCCGGACTCCTTAATGATGCTTTTGATATCTGG | CARLPIRRSGLLNDAFDIW | IGHV3-2*02 | IGHD2-3*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:11036:3087:ACTCGCTA+CCTAGAGT | GGATTCTCACTGAGTCATAGTGGAGTGGGT | GFSLSHSGVG | ATTTATTGGGATGATGATAAA | IYWDDDK | TGTGCACACAGGGATGTAGTCACCTTTGACTCTTGG | CAHRDVVTFDSW | IGHV8-13*01 | IGHD1-1*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:17653:2689:ACTCGCTA+CCTAGAGT | GGTTACACCTTTACCAGCTATGGT | GYTFTSYG | ATCAGCGCTTACAATGGTAAGACA | ISAYNGKT | TGTGCGAGAGGGCAGTGGAACTATGATATAAGTGGAGCGTGGGACTACTGG | CARGQWNYDISGAWDYW | IGHV1S126*01 | IGHD2-4*01 | IGHJ4*01
M04670:18:000000000-ATLPC:1:2106:15780:2769:ACTCGCTA+CCTAGAGT | NNNNNNNNNNNNNNNNNNNNNNNN | XXXXXXXX | NNNNNNNNNNNNNNNNNNNNNNNN | XXXXXXXX | TGTGCGAGAGATTACCACTGGTCTGGGTTTGACTACTGG | CARDYHWSGFDYW | IGHV5-17*01 | IGHD2-4*01 | IGHJ2*01
M04670:18:000000000-ATLPC:1:2106:12947:3010:ACTCGCTA+CCTAGAGT | GGTGGGTCGTTTAGTGGTTTCGAC | GGSFSGFD | ATCAGCCACACTGGAACTACG | ISHTGTT | TGTGCGCGAATTCCCATGAGGAGAACCGGGGTCAACGATGATGCCTTTGATATGTGG | CARIPMRRTGVNDDAFDMW | IGHV3-2*02 | IGHD2-3*01 | IGHJ3*02
M04670:18:000000000-ATLPC:1:2106:13319:1512:ACTCGCTA+CCTAGAGT | NNNNNNNNNNNNNNNNNNNNNNNNNNN | XXXXXXXXX | NNNNNNNNNNNNNNNNNNNNN | XXXXXXX | TGTGCCCGTATAGTGGGAGGTAGTGTGGACTACTGG | CARIVGGSVDYW | IGHV3-1*02 | IGHD1-1*01 | IGHJ4*01
M04670:18:000000000-ATLPC:1:2106:14336:3105:ACTCGCTA+CCTAGAGT | GATGGCTCCATCATCAGTGGTAATTACTAC | DGSIISGNYY | CGTATCTCT | RIS | TGTGCGAGATTTCTCGCTGGGAGTCAGTACCTTAACTACTGG | CARFLAGSQYLNYW | IGHV3-4*02 | IGHD5-1*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:11082:2663:ACTCGCTA+CCTAGAGT | GGTTACACCTTTACCAGATATGGT | GYTFTRYG | ACTAGCACTCACGATGAGGACTCA | TSTHDEDS | TGTGCGAGAGATTGGGACGGGAGAAACGACTGCTTCGACCCCTGG | CARDWDGRNDCFDPW | IGHV1-85*01 | IGHD4-1*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:16412:3064:ACTCGCTA+CCTAGAGT | GGTTACACCTTAACCACTTATGGT | GYTLTTYG | ATCGGCGCTTACAATGGTAACACA | IGAYNGNT | TGTGCGACAGACGGGCAGCAGCTGGTTCCGATGTCCGCCTGG | CATDGQQLVPMSAW | IGHV1-81*01 | IGHD5-7*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:14422:2879:ACTCGCTA+CCTAGAGT | GGTGCCTCCATCAGCACTAGTGATTACTAC | GASISTSDYY | ATCTATTATAGTGGGAGTAGC | IYYSGSS | TGTGCGAGACATTCTACAGATAGTACCGAGAAGTTCGACCCCTGG | CARHSTDSTEKFDPW | IGHV3-5*02 | IGHD2-5*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:10907:2923:ACTCGCTA+CCTAGAGT | GGGTTCTCACTCAACACTCCTGGAGTGTGT | GFSLNTPGVC | ATTGATTGGGATGATGATAAG | IDWDDDK | TGCGCACGGACGGATTGCAGTAACTACGGATGTTGGTTCGACCCCTGG | CARTDCSNYGCWFDPW | IGHV8-12*01 | IGHD2-5*01 | IGHJ3*01
M04670:18:000000000-ATLPC:1:2106:16525:2101:ACTCGCTA+CCTAGAGT | GGGGACAGTGTCTCTAGCAACAGTGCTACT | GDSVSSNSAT | ACATACTACAGGTCCAAGTGGTATAAT | TYYRSKWYN | TGTACAAGAGGCTATAAGCAGCAAGACTACTGG | CTRGYKQQDYW | IGHV3S1*01 | IGHD3-2*02 | IGHJ4*01
M04670:18:000000000-ATLPC:1:2106:20471:3646:ACTCGCTA+CCTAGAGT | GGTGCCTCCATTAACAGTGGTAGTTACTAT | GASINSGSYY | TTCTATACTACTGGGAGGACC | FYTTGRT | TGTGCGAGAGATCCCCCTCCCTACGTAATGGGAGCGCAGGCAGTGTGG | CARDPPPYVMGAQAVW | IGHV3-5*02 | IGHD5-7*01 | IGHJ1*01

A partial result for one sample (ASC-S-vH) is shown. Most of the sequences were truncated due to format limitations. Read_name: FASTQ header of the DNA sequence; CDR[1-3]nt/CDR[1-3]aa: the nucleotide/amino acid sequences of CDR1, CDR2, and CDR3; Top_V_Gene: the best-matched germline V gene; Top_D_Gene: the best-matched germline D gene; Top_J_Gene: the best-matched germline J gene. Runs of 'N' in nucleotide columns and runs of 'X' in amino acid columns indicate that no valid sequence was returned.
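The relationship between the nucleotide and amino acid columns can be checked with a short translation routine. The sketch below uses the standard genetic code and is only an illustration, not the pipeline's own code. Mapping any codon that contains N to X and ignoring an incomplete trailing codon are assumptions chosen to match the placeholder convention described above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt_seq):
    """Translate complete codons; codons with N or other symbols become X."""
    usable = len(nt_seq) - len(nt_seq) % 3   # ignore an incomplete last codon
    codons = (nt_seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

For the first read in Table 4.1.1, `translate("GGTGACTCCATCAGCAGTATTAGTTACTAC")` reproduces the reported CDR1aa value `GDSISSISYY`.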

Table 4.1.2 VJ gene mapping of clonal reads (light chain)

Read_name | CDR1nt | CDR1aa | CDR2nt | CDR2aa | CDR3nt | CDR3aa | Top_V_Gene | Top_J_Gene
M04670:21:000000000-ATFAC:1:2106:21903:1604:GGAGCTAC+TCGACTAG | CAGGGCGTTGCCAGTGGT | QGVASG | GATGCCTCC | DAS | TGTCAGCAGTATAATAACTGGCCTTTGTGGACGTTC | CQQYNNWPLWTF | IGKV11-125*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:11438:1763:GGAGCTAC+TCGACTAG | CAGAGTGTTAGTAGCAAC | QSVSSN | GGCGCATCGAC | GAS | TGTCAGCAGTATGATAACTGGCCTTTGTGGACGTTC | CQQYDNWPLWTF | IGKV5-45*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:10469:1752:GGAGCTAC+TCGACTAG | CAGACTATTAGCATCACCTAC | QTISITY | GGTGCATCC | GAS | TGTCAGCAGTACGGTAGTTCACCGTGGACGTTC | CQQYGSSPWTF | IGKV18-36*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:14873:1850:GGAGCTAC+TCGACTAG | NNNNNNNNNNNNNNNNNN | XXXXXX | NNNNNCTCC | XXS | TGTCAGCAATATTTTGTTACTCCGCTCACTTTC | CQQYFVTPLTF | IGKV18-36*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:16330:1899:GGAGCTAC+TCGACTAG | NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN | XXXXXXXXXXX | NNNNNNNNN | XXX | TGCATGCAATCTCTACAGACTCCTCTCACTTTC | CMQSLQTPLTF | IGKV2-a*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:13147:1941:GGAGCTAC+TCGACTAG | CAGACCATTACCAATTCC | QTITNS | GGTACATCC | GTS | TGTCAACAGAGTTACACCATCCCCTGGACGTTC | CQQSYTIPWTF | IGKV11-125*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:11304:1625:GGAGCTAC+TCGACTAG | NNNNNNNNNNNNNNNNNN | XXXXXX | GGTGCATCG | GAS | TGTCAGCAGTATAATAACTGGCCTTTGTGGACGTTC | CQQYNNWPLWTF | IGKV18-36*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:10681:1583:GGAGCTAC+TCGACTAG | CAGAGCATTAGCAGTGGT | QSISSG | GATGCCTCC | DAS | TGTCAACAATATAATAATTACCCCATCACCTTC | CQQYNNYPITF | IGKV11-125*01 | IGKJ4*01
M04670:21:000000000-ATFAC:1:2106:17161:1570:GGAGCTAC+TCGACTAG | CAGAGTATTAGTACCCAG | QSISTQ | GAGGCATCT | EAS | TGCCTACAATATTTTTATTATTGGACGTTC | CLQYFYYWTF | IGKV11-125*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:15148:1742:GGAGCTAC+TCGACTAG | CAGAGTGTTAGCAGCTAC | QSVSSY | GATGCATCC | DAS | TGTCAGCAGTATAATAACTACTGGACGTTC | CQQYNNYWTF | IGKV18-36*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:12690:1861:GGAGCTAC+TCGACTAG | CAGAGCATTAGCGTCTAT | QSISVY | ACTGCATCC | TAS | TGTCAACAGAGTTACAGTCGCCCTCGGACGTTC | CQQSYSRPRTF | IGKV11-125*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:12369:1900:GGAGCTAC+TCGACTAG | NNNNNNNNNNNNNNNNNN | XXXXXX | NNNNNNNNN | XXX | TGTCAACAGCTTAGTAGTTACCCGCTCACTTTC | CQQLSSYPLTF | IGKV11-125*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:11496:1948:GGAGCTAC+TCGACTAG | AAGAGTGTTTTATCCATCTCCAACGACAAAAACTAC | KSVLSISNDKNY | TGGTCTTCT | WSS | TGTCAGCAGTATTATGATAAGCCGGTCACTTTC | CQQYYDKPVTF | IGKV8-30*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:13285:1814:GGAGCTAC+TCGACTAG | CAGGGCATAAGCAGGGGT | QGISRG | TATGCCTCC | YAS | TGTCAACAGTTTAATCAGTATCCCATCACGTTC | CQQFNQYPITF | IGKV11-125*01 | IGKJ4*01
M04670:21:000000000-ATFAC:1:2106:16163:1988:GGAGCTAC+TCGACTAG | CAGGACATTGGCAGTTCT | QDIGSS | GGTGCATCC | GAS | TGTCAACAGCTTAAAAGTTACCCCATCAATTTC | CQQLKSYPINF | IGKV11-125*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:12925:2184:GGAGCTAC+TCGACTAG | CAGAGTGTTAGCAGCAAC | QSVSSN | GGTGCATCG | GAS | TGTCAGCAGTATAATAACTGGCCTTTGTGGACGTTC | CQQYNNWPLWTF | IGKV5-45*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:13761:2193:GGAGCTAC+TCGACTAG | NNNNNNNNNNNNNNNNNN | XXXXXX | NNNNNNNNN | XXX | TGTCAGCAGTATGATAACTGGCCTTTGTGGACGTTC | CQQYDNWPLWTF | IGKV16-104*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:19380:2309:GGAGCTAC+TCGACTAG | CAGACTATTGGTGCCGACTAC | QTIGADY | GCTGCCTCC | AAS | TGTCAGCAGTATGGTACTTCACTTAGGACGTTC | CQQYGTSLRTF | IGKV18-36*01 | IGKJ1*01
M04670:21:000000000-ATFAC:1:2106:19537:2312:GGAGCTAC+TCGACTAG | CAGAGCTTTGGCTGCTGC | QSFGCC | GGTGCCTCC | GAS | TGTCACCAGCGAAGTAGCTGGCCTCCGTTCACTTTC | CHQRSSWPPFTF | IGKV18-36*01 | IGKJ4*01
M04670:21:000000000-ATFAC:1:2106:24104:2288:CGAGCTAC+TCGACTAG | CAGAGTGTTCGCGGCAACTAC | QSVRGNY | GGTGC | G | TGTCAACAATATGGTAGCTCCACGTGGACGTTC | CQQYGSSTWTF | IGKV18-36*01 | IGKJ1*01

Shown is the partial result of one sample (ASC-S-vL). Most of the sequences were truncated due to format limitation.

 

4.2 CDR Abundance Analysis

To assess sequence abundance, reads with the same CDR sequences were collapsed and the count of each unique sequence was calculated. The aggregation was performed at three levels: sequences were first collapsed on the nucleotide sequences of CDR1, CDR2, and CDR3 (Tables 4.2.1 and 4.2.2), then on the CDR1, CDR2, and CDR3 amino acid sequences (Tables 4.2.3 and 4.2.4), and finally on the CDR3 amino acid sequence alone (Tables 4.2.5 and 4.2.6). The complete results for all samples are in the Report/Enrichment_tables/ folder: the CDR1,2,3 enrichment analysis is in the “CDR_all_enrichment” directory, the CDR amino acid sequence enrichment analysis is in the “CDR_aa_enrichment” directory, and the CDR3 enrichment analysis is in the “CDR3_aa_enrichment” directory. They can also be accessed using the links below:

CDR_all_enrichment
CDR_aa_enrichment
CDR3_aa_enrichment
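The three aggregation levels described in this section can be illustrated with a small counting routine. The following Python sketch assumes each read is available as a dictionary keyed by the column names used in the tables. The `collapse` function and the level constants are hypothetical names introduced for this example and are not part of the delivered pipeline.

```python
from collections import Counter

NT_LEVEL = ("CDR1nt", "CDR2nt", "CDR3nt")    # level 1: CDR1,2,3 nucleotides
AA_LEVEL = ("CDR1aa", "CDR2aa", "CDR3aa")    # level 2: CDR1,2,3 amino acids
CDR3_LEVEL = ("CDR3aa",)                     # level 3: CDR3 amino acids only


def collapse(records, key_fields):
    """Sum read counts over records that share the same values in key_fields."""
    totals = Counter()
    for record in records:
        key = tuple(record[field] for field in key_fields)
        # One read per record unless the record already carries a Count.
        totals[key] += record.get("Count", 1)
    return totals
```

Calling `collapse(reads, NT_LEVEL)`, `collapse(reads, AA_LEVEL)`, and `collapse(reads, CDR3_LEVEL)` on the same reads gives progressively fewer and larger groups, because synonymous nucleotide variants merge at the amino acid level.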






Table 4.2.1 CDR enrichment by CDR1,2,3 nucleotide sequence (heavy chain)

Count | CDR1nt | CDR1aa | CDR2nt | CDR2aa | CDR3nt | CDR3aa | Top_V_Gene | Top_J_Gene
11289 | GGTGGGTCGTTTAGTGGTTTCGAC | GGSFSGFD | ATCAGCCACACTGGAACTACG | ISHTGTT | TGTGCGCGAATTCCCATGAGGAGAACCGGGGTCAACGATGATGCCTTTGATATGTGG | CARIPMRRTGVNDDAFDMW | IGHV3-2*02 | IGHJ3*02
3100 | GGTGGGTCCTTTAGTGGTTACGAC | GGSFSGYD | ATCAGTCACAGTGGAAGTATC | ISHSGSI | TGTGCGAGACTTCCCATCAGGAGATCCGGACTCCTTAATGATGCTTTTGATATCTGG | CARLPIRRSGLLNDAFDIW | IGHV3-2*02 | IGHJ3*01
2339 | GGATTCATCTTCAGCACCTACTGG | GFIFSTYW | ATTAATGAAGATGGCAGGATTACC | INEDGRIT | TGTGTTAGAAGGCACCCAGCACCAACTGGCAACATTTTTGACTTCTGG | CVRRHPAPTGNIFDFW | IGHV5-6-3*01 | IGHJ2*01
1980 | GGTGACTCCATCAGCAATACTAGATATTAC | GDSISNTRYY | ATATATAATAGTGGAAATATC | IYNSGNI | TGTGCGGGGCACGTTTGGAACTACGAAGTTGACTACTGG | CAGHVWNYEVDYW | IGHV3-5*02 | IGHJ2*01
1452 | GGTGTCGCCATCACTAGTTTCCAC | GVAITSFH | ATATATCACAATGGAGACACC | IYHNGDT | TGTGCGAGAGTTGATGCAATCATTGAAATGGACTACTTCTACGGTCTGGACGTCTGG | CARVDAIIEMDYFYGLDVW | IGHV3-2*02 | IGHJ1*01
1339 | GGTGACTCCATCAGCAATACTAGATATTAC | GDSISNTRYY | ATATATAATAGTAGAAATATC | IYNSRNI | TGTGCGGGGCACGTTTGGAACTACGAAGTTGACTACTGG | CAGHVWNYEVDYW | IGHV3-5*02 | IGHJ2*01
976 | GGGTTCTCACTCAATACTCGTGGAACGACT | GFSLNTRGTT | ATTTATTGGGATGATGATAGC | IYWDDDS | TGTGCACACGGACGCCCAGACTGGGGAGCAGATGCTTTTGATGTCTGG | CAHGRPDWGADAFDVW | IGHV8-12*01 | IGHJ1*01
931 | GGTGGGTCCTTTAGTGGTTACGAC | GGSFSGYD | ATCAGTCACAGTGGAAGTATC | ISHSGSI | TGTGCGAGACTTCCCATCAGGAGATCCGGGCTCCATAATGATGCTTTTGATATATGG | CARLPIRRSGLHNDAFDIW | IGHV3-2*02 | IGHJ3*01
829 | GGCGGCTCCATCAGTAGCAGTAGTTACCAC | GGSISSSSYH | ATCTATTATAGTGGGAGCACG | IYYSGST | TGTGCGAGTCGTCGAAATGAACCTGGAGGGTGGTTCGACTCCTGG | CASRRNEPGGWFDSW | IGHV3-1*01 | IGHJ3*01
753 | GGTGGGTCATTTAGTGGTTACGAC | GGSFSGYD | ATCAGTCACAGTGGAAGTATC | ISHSGSI | TGTGCGAGACTTCCCATGAGGAGATCCGGGCTCCTTAATGATGCTTTTGATATCTGG | CARLPMRRSGLLNDAFDIW | IGHV3-2*02 | IGHJ3*01
705 | GGTTACACCTTTACTGATTATGCT | GYTFTDYA | ATCAGCGTTTCCAATGGTAAAACA | ISVSNGKT | TGTGCGAGAGCGTTTCAACCTCAAGTCTGGGTCGGGGAGTCTTATCTCGACTACTGG | CARAFQPQVWVGESYLDYW | IGHV1-77*01 | IGHJ2*01
666 | GGTGGGTCCTTTAGTGGTTACGAC | GGSFSGYD | ATCAGTCACAGTGGAAGTACC | ISHSGST | TGTGCGAGACTTCCCATCAGGAGATCCGGGCTCCATAATGATGCTTTTGATATATGG | CARLPIRRSGLHNDAFDIW | IGHV3-2*02 | IGHJ3*01
626 | NNNNNNNNNNNNNNNNNNNNNNNNNNN | XXXXXXXXX | NNNNNNNNNNNNNNNNNNNNN | XXXXXXX | TGTGCGCGAATTCCCATGAGGAGAACCGGGGTCAACGATGATGCCTTTGATATGTGG | CARIPMRRTGVNDDAFDMW | IGHV12-3*01 | IGHJ3*02
611 | GGGTTCTCATTCACTTCTAGTGGACGGGGT | GFSFTSSGRG | ATTTATTGGGATGACGATAAG | IYWDDDK | TGTGCACACAGACCACCATATCAAGGGTACTACTACTTTGACTATTGG | CAHRPPYQGYYYFDYW | IGHV8-12*01 | IGHJ2*01
602 | GGGTTCTCATTCAATACTCTTGGAACGACT | GFSFNTLGTT | ATTTATTGGGATGATGATAGC | IYWDDDS | TGTGCACACGGACGCCCAGACTGGGGAGAAGATGCTTTTGATGTCTGG | CAHGRPDWGEDAFDVW | IGHV8-12*01 | IGHJ1*01
572 | GGCTTCTCATTCAGTTCTAGTGGACTGGGT | GFSFSSSGLG | ATTTATTGGGATGATGATAAG | IYWDDDK | TGTGCACACAGACCGCCATATCAACGGTATTACTACTTTGACTATTGG | CAHRPPYQRYYYFDYW | IGHV8-6*01 | IGHJ2*01
514 | GGATACACGTTCACCAATTATGCT | GYTFTNYA | ATGGACCCGAATAGCGGCGACACA | MDPNSGDT | TGTGCGAGGACCAACTGGGCAGCCTACGGTGTCCCCGACTACTGG | CARTNWAAYGVPDYW | IGHV1-72*04 | IGHJ4*01
485 | NNNNNNNNNNNNNNNNNNNNNNNNNNN | XXXXXXXXX | NNNNNNNNNNNNNNNNNNNNN | XXXXXXX | TGTGCGCGAATTCCCATGAGGAGAACCGGGGTCAACGATGATGCCTTTGATATGTGG | CARIPMRRTGVNDDAFDMW | IGHV12-3*02 | IGHJ3*02
459 | GGTGACTCCGTCAGCAATCATAAATACTAC | GDSVSNHKYY | ATCTATTCTGGTGGGAACACC | IYSGGNT | TGTGCGGGGCACAATTGGAATTACGAGGTTGACTACTGG | CAGHNWNYEVDYW | IGHV3-5*02 | IGHJ4*01
455 | GGATTCTCACTCACTACTCGTGGAGGGGGT | GFSLTTRGGG | CTTTATTGGGATGAAAAGACA | LYWDEKT | TGTGCACACCGTATTGGTTCGGAGACATACTTCGACTACTGG | CAHRIGSETYFDYW | IGHV8-12*01 | IGHJ2*01

Shown is the partial result of one sample (ASC-S-vH).

Table 4.2.2 CDR enrichment by CDR1,2,3 nucleotide sequence (light chain)


Collapsed nucleotide sequences that had no replicates (i.e., were observed only once) were removed from subsequent analysis to improve sequence accuracy and result reliability.
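A minimal sketch of this replicate filter is given below. It reads "no replicates" as a collapsed count of exactly one. The function name and the `min_count` parameter are assumptions made for this example.

```python
def drop_singletons(nt_counts, min_count=2):
    """Keep only collapsed nucleotide sequences seen at least min_count times."""
    return {seq: n for seq, n in nt_counts.items() if n >= min_count}
```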

Table 4.2.3 CDR enrichment by CDR amino acid sequence (heavy chain)

Count | CDR1aa | CDR2aa | CDR3aa | Top_V_Gene | Top_J_Gene
11412 | GGSFSGFD | ISHTGTT | CARIPMRRTGVNDDAFDMW | IGHV3-2*02 | IGHJ3*02
3317 | GGSFSGYD | ISHSGSI | CARLPIRRSGLLNDAFDIW | IGHV3-2*02 | IGHJ3*01
2444 | GFIFSTYW | INEDGRIT | CVRRHPAPTGNIFDFW | IGHV5-6-3*01 | IGHJ2*01
2037 | GDSISNTRYY | IYNSGNI | CAGHVWNYEVDYW | IGHV3-5*02 | IGHJ2*01
1634 | GFSLNTRGTT | IYWDDDS | CAHGRPDWGADAFDVW | IGHV8-12*01 | IGHJ1*01
1472 | GVAITSFH | IYHNGDT | CARVDAIIEMDYFYGLDVW | IGHV3-2*02 | IGHJ1*01
1347 | GDSISNTRYY | IYNSRNI | CAGHVWNYEVDYW | IGHV3-5*02 | IGHJ2*01
1169 | GGSFSGYD | ISHSGSI | CARLPMRRSGLLNDAFDIW | IGHV3-2*02 | IGHJ3*01
965 | GGSFSGYD | ISHSGSI | CARLPIRRSGLHNDAFDIW | IGHV3-2*02 | IGHJ3*01
836 | GGSISSSSYH | IYYSGST | CASRRNEPGGWFDSW | IGHV3-1*01 | IGHJ3*01
763 | GYTFTDYA | ISVSNGKT | CARAFQPQVWVGESYLDYW | IGHV1-77*01 | IGHJ2*01
692 | GGSFSGYD | ISHSGST | CARLPIRRSGLHNDAFDIW | IGHV3-2*02 | IGHJ3*01
637 | XXXXXXXXX | XXXXXXX | CARIPMRRTGVNDDAFDMW | IGHV12-3*01 | IGHJ3*02
616 | GFSFTSSGRG | IYWDDDK | CAHRPPYQGYYYFDYW | IGHV8-12*01 | IGHJ2*01
609 | GFSLSTTGVG | IYWDDDR | CARQNSGYDWNSRCYDYW | IGHV8-12*01 | IGHJ4*01
604 | GFSFNTLGTT | IYWDDDS | CAHGRPDWGEDAFDVW | IGHV8-12*01 | IGHJ1*01
577 | GFSFSSSGLG | IYWDDDK | CAHRPPYQRYYYFDYW | IGHV8-6*01 | IGHJ2*01
519 | GYTFTNYA | MDPNSGDT | CARTNWAAYGVPDYW | IGHV1-72*04 | IGHJ4*01
509 | GFSLNTRGTT | IYWDGDD | CAHGRPDWGADAFDVW | IGHV8-12*01 | IGHJ1*01
498 | XXXXXXXXX | XXXXXXX | CARIPMRRTGVNDDAFDMW | IGHV12-3*02 | IGHJ3*02

Shown is the partial result of one sample (ASC-S-vH).

Table 4.2.4 CDR enrichment by CDR amino acid sequence (light chain)

Count | CDR1aa | CDR2aa | CDR3aa | Top_V_Gene | Top_J_Gene
7248 | QGISRG | YAS | CQQFNQYPITF | IGKV11-125*01 | IGKJ4*01
4116 | QGIGSY | AAS | CQQLSSYPLTF | IGKV11-125*01 | IGKJ1*01
3135 | QGIGNE | AAS | CLQHKNHVWTF | IGKV11-125*01 | IGKJ1*01
2934 | QSISSG | DAS | CQQYNNYPITF | IGKV11-125*01 | IGKJ4*01
1919 | QAIGGG | YAS | CQQFNAYPITF | IGKV11-125*01 | IGKJ4*01
1473 | QSFGCC | GAS | CHQRSGWPPFTF | IGKV18-36*01 | IGKJ4*01
1374 | QSISSG | DAS | CQQYNSYPITF | IGKV11-125*01 | IGKJ4*01
1373 | QSFGCC | GAS | CHQRSDWPPFTF | IGKV18-36*01 | IGKJ4*01
1280 | QSVSSN | DAS | CQQYNNWPLWTF | IGKV5-45*01 | IGKJ1*01
804 | QSVLHSSNNKNY | WSS | CQQHYIIPWTF | IGKV8-30*01 | IGKJ1*01
788 | QSISSG | NAS | CQQYNVYPITF | IGKV11-125*01 | IGKJ4*01
735 | QSVSSN | GAS | CQQYNNWPLWTF | IGKV5-45*01 | IGKJ1*01
593 | QGINTA | DAS | CQHFNSFPLAF | IGKV11-125*01 | IGKJ1*01
579 | QSVSSNY | GAS | CQQYGTSPWTF | IGKV18-36*01 | IGKJ1*01
578 | ESVSSY | DAS | CQQRSGWPWTF | IGKV18-36*01 | IGKJ1*01
539 | QSFGCC | GAS | CHQRSSWPPFTF | IGKV18-36*01 | IGKJ4*01
535 | QDIGSS | GAS | CQQLKSYPINF | IGKV11-125*01 | IGKJ1*01
449 | RGVGSN | GAS | CQQYDDWPPWTF | IGKV18-36*01 | IGKJ1*01
448 | QSISSG | DAS | CQQYNAYPITF | IGKV11-125*01 | IGKJ4*01
419 | QSVSSN | GAS | CQQYDNWPLWTF | IGKV5-45*01 | IGKJ1*01

Shown is the partial result of one sample (ASC-S-vL).

Table 4.2.5 CDR enrichment by CDR3 amino acid sequence (heavy chain)

Count | CDR3aa | Top_V_Gene | Top_J_Gene
11978 | CARIPMRRTGVNDDAFDMW | IGHV3-2*02 | IGHJ3*02
3733 | CARLPIRRSGLLNDAFDIW | IGHV3-2*02 | IGHJ3*01
3662 | CAGHVWNYEVDYW | IGHV3-5*02 | IGHJ2*01
2971 | CVRRHPAPTGNIFDFW | IGHV5-6-3*01 | IGHJ2*01
2778 | CAHGRPDWGADAFDVW | IGHV8-12*01 | IGHJ1*01
2213 | CAHGRPDWGEDAFDVW | IGHV8-12*01 | IGHJ1*01
1877 | CARLPIRRSGLHNDAFDIW | IGHV3-2*02 | IGHJ3*01
1842 | CARVDAIIEMDYFYGLDVW | IGHV3-2*02 | IGHJ1*01
1527 | CAGHNWNYEVDYW | IGHV3-5*02 | IGHJ4*01
1403 | CARDGKRTYSYDRGEDYW | IGHV5-17*01 | IGHJ4*01
1298 | CARLPMRRSGLLNDAFDIW | IGHV3-2*02 | IGHJ3*01
1228 | CARIPMRRTGVNDDAFDMW | IGHV12-3*02 | IGHJ3*02
950 | CARQNSGYDWNSRCYDYW | IGHV8-12*01 | IGHJ4*01
855 | CASRRNEPGGWFDSW | IGHV3-1*01 | IGHJ3*01
827 | CARAFQPQVWVGESYLDYW | IGHV1-77*01 | IGHJ2*01
655 | CARCRGDSNYGWYDPW | IGHV3-5*02 | IGHJ3*01
645 | CARIPMRRTGVNDDAFDMW | IGHV12-3*01 | IGHJ3*02
638 | CAHRPPYQGYYYFDYW | IGHV8-12*01 | IGHJ2*01
603 | CAHRPPYQRYYYFDYW | IGHV8-6*01 | IGHJ2*01
577 | CATGPTMVMLDYW | IGHV1-64*01 | IGHJ2*01

Shown is the partial result of one sample (ASC-S-vH).

Table 4.2.6 CDR enrichment by CDR3 amino acid sequence (light chain)

Count | CDR3aa | Top_V_Gene | Top_J_Gene
8864 | CQQFNQYPITF | IGKV11-125*01 | IGKJ4*01
5516 | CQQLSSYPLTF | IGKV11-125*01 | IGKJ1*01
4171 | CQQYNNWPLWTF | IGKV5-45*01 | IGKJ1*01
3892 | CLQHKNHVWTF | IGKV11-125*01 | IGKJ1*01
3741 | CQQYNNYPITF | IGKV11-125*01 | IGKJ4*01
2536 | CQQFNAYPITF | IGKV11-125*01 | IGKJ4*01
2162 | CHQRSDWPPFTF | IGKV18-36*01 | IGKJ4*01
2079 | CHQRSGWPPFTF | IGKV18-36*01 | IGKJ4*01
1744 | CQQHYIIPWTF | IGKV8-30*01 | IGKJ1*01
1737 | CQQRSGWPWTF | IGKV18-36*01 | IGKJ1*01
1697 | CQQYNSYPITF | IGKV11-125*01 | IGKJ4*01
1668 | CLQHYNHVWTF | IGKV11-125*01 | IGKJ1*01
1022 | CQQYNVYPITF | IGKV11-125*01 | IGKJ4*01
996 | CQQYGTSPWTF | IGKV18-36*01 | IGKJ1*01
987 | CQQYNNWPLWTF | IGKV18-36*01 | IGKJ1*01
780 | CHQRSSWPPFTF | IGKV18-36*01 | IGKJ4*01
722 | CQQLKSYPINF | IGKV11-125*01 | IGKJ1*01
690 | CQQYDNWPLWTF | IGKV5-45*01 | IGKJ1*01
686 | CQHFNSFPLAF | IGKV11-125*01 | IGKJ1*01
683 | CQQYNNYWTF | IGKV18-36*01 | IGKJ1*01

Shown is the partial result of one sample (ASC-S-vL).

 

4.3 CDR3 Length Distribution

CDR3 amino acid sequences were extracted and their length distribution is plotted in Figure 4.3.
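A length distribution of this kind can be tabulated from collapsed CDR3 counts as sketched below. Whether the plotted distribution weights each CDR3 by its read count or counts unique sequences only is not stated in this report. The sketch assumes read-count weighting, and the function name is hypothetical.

```python
from collections import Counter


def cdr3_length_distribution(cdr3_counts):
    """Map CDR3 amino acid length to the total read count at that length."""
    lengths = Counter()
    for cdr3_aa, count in cdr3_counts.items():
        lengths[len(cdr3_aa)] += count
    return dict(sorted(lengths.items()))


# Three CDR3 sequences and counts taken from Table 4.2.5 (ASC-S-vH).
example = {
    "CARIPMRRTGVNDDAFDMW": 11978,
    "CARLPIRRSGLLNDAFDIW": 3733,
    "CAGHVWNYEVDYW": 3662,
}
distribution = cdr3_length_distribution(example)
```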

  • CDR3_len_ASC-S-vH.png
  • CDR3_len_ASC-S-vL.png
  • CDR3_len_PBMC-0-S-vH.png
  • CDR3_len_PBMC-0-S-vL.png
  • CDR3_len_PBMC-7-S-vH.png
  • CDR3_len_PBMC-7-S-vL.png


Figure 4.3 CDR3 amino acid sequence length distribution.

 

4.4 Top V(D)J Analysis

V(D)J usage was analyzed by collapsing sequences on their V(D)J combination. The proportions of the top ten combinations in the heavy and light chain populations are shown in Figure 4.4.1 and Figure 4.4.2, respectively.
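The collapse on gene combinations can be sketched as follows. The example assumes one dictionary per read with the Top_V_Gene, Top_D_Gene, and Top_J_Gene fields from the clonal tables. The function name is hypothetical, and light chains would use only the V and J fields.

```python
from collections import Counter

HEAVY_FIELDS = ("Top_V_Gene", "Top_D_Gene", "Top_J_Gene")
LIGHT_FIELDS = ("Top_V_Gene", "Top_J_Gene")


def top_combinations(records, fields, n=10):
    """Return the n most frequent gene combinations with their read fractions."""
    counts = Counter(tuple(record[field] for field in fields) for record in records)
    total = sum(counts.values())
    return [(combo, count / total) for combo, count in counts.most_common(n)]
```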

  • VDJ_comp_ASC-S-vH.png
  • VDJ_comp_PBMC-0-S-vH.png
  • VDJ_comp_PBMC-7-S-vH.png


Figure 4.4.1 Ig top VDJ combination usage for heavy chains.

  • VJ_comp_ASC-S-vL.png
  • VJ_comp_PBMC-0-S-vL.png
  • VJ_comp_PBMC-7-S-vL.png


Figure 4.4.2 Ig top VJ combination usage for light chains.

 

4.5 V Gene Usage Analysis

To assess germline gene usage, V gene annotation information was extracted for all the sequences and the total count of each germline gene was calculated and summarized in Tables 4.5.1 and 4.5.2.
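The gene labels in Tables 4.5.1 and 4.5.2 (for example IGHV3, IGHV1S126, IGKV11) look like the Top_V_Gene allele names with the allele suffix and any positional suffix removed. The sketch below implements that reading. The rule is inferred from the labels and is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def v_gene_label(allele_name):
    """Reduce an allele name such as 'IGHV3-5*02' to the label 'IGHV3'."""
    gene = allele_name.split("*", 1)[0]   # drop the allele suffix, e.g. '*02'
    return gene.split("-", 1)[0]          # drop the positional suffix, e.g. '-5'


def v_gene_usage(top_v_genes):
    """Count reads per V gene label for one sample."""
    return Counter(v_gene_label(name) for name in top_v_genes)
```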

Table 4.5.1 Heavy chain V gene usage

Top_V_Gene | ASC-S-vH | PBMC-0-S-vH | PBMC-7-S-vH
IGHV1263853392028208
IGHV10141818681583
IGHV10S316107
IGHV10S41NA1
IGHV12385825483551
IGHV1331420
IGHV1491917381189
IGHV14S42NANA
IGHV1521NANA
IGHV16612386406
IGHV1S12577258322
IGHV1S126647523317
IGHV1S127510436250
IGHV1S13034NA
IGHV1S13210154
IGHV1S134381568
IGHV1S13641328
IGHV1S137119
IGHV1S142NA1
IGHV1S155118109
IGHV1S171115
IGHV1S187258141
IGHV1S19412103
IGHV1S2220192
IGHV1S26697270322
IGHV1S2821216
IGHV1S2914266
IGHV1S3441412
IGHV1S4519025120
IGHV1S464081032600
IGHV1S5272
IGHV1S50318358
IGHV1S55726
IGHV1S56178146635
IGHV1S61485341
IGHV1S811115524
IGHV2422121
IGHV3616754018756460
IGHV3S188002198415880
IGHV3S716212
IGHV422114
IGHV5359953522036499
IGHV5S123177103
IGHV5S21266166113
IGHV5S2418111
IGHV5S459944
IGHV5S9128
IGHV6455333428
IGHV7148113331406
IGHV8265393052022482
IGHV8S22896
IGHV9363744
IGHV1S16NA1NA
IGHV1S47NA1NA
IGHV2S3NA1NA


Table 4.5.2 Light chain V gene usage

Top_V_Gene | ASC-S-vL | PBMC-0-S-vL | PBMC-7-S-vL
IGKV1742049245979
IGKV10506309351
IGKV11495163639041839
IGKV12425677435958
IGKV131155385
IGKV14327128245
IGKV15126111001443
IGKV16377521431970
IGKV17232
IGKV18318774272938078
IGKV191274324
IGKV2115917141566
IGKV3623757
IGKV41001832
IGKV51003387028996
IGKV6483744597
IGKV8806681318794
IGKV9852734
IGLV1NA2NA


The distribution of V gene usage is also illustrated in bar graphs (Figure 4.5.1 and Figure 4.5.2).

  • v_gene_stat_ASC-S-vH.png
  • v_gene_stat_PBMC-0-S-vH.png
  • v_gene_stat_PBMC-7-S-vH.png


Figure 4.5.1 V gene usage of heavy chains.

  • v_gene_stat_ASC-S-vL.png
  • v_gene_stat_PBMC-0-S-vL.png
  • v_gene_stat_PBMC-7-S-vL.png


Figure 4.5.2 V gene usage of light chains.

 

4.6 Clustering Analysis

To analyze CDR3 amino acid usage frequency, CDR3 sequences were clustered according to similarity (threshold: 0.8). A summary of the top 20 clusters of each sample is given in Table 4.6. The complete results for all samples are in the 'Report\Clustering_summary_tables' folder and can be accessed using the link below:

Clustering tables
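The clustering tool and similarity measure are not specified in this report. Purely as an illustration of similarity-threshold clustering, the sketch below greedily groups CDR3 amino acid sequences, defining similarity as one minus the edit distance divided by the longer sequence length. That definition, the greedy strategy, and all function names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """1.0 for identical sequences, lower as more edits are needed."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_cluster(cdr3_counts, threshold=0.8):
    """Assign each CDR3 to the first cluster whose representative is similar enough.

    Sequences are visited from most to least abundant, so each cluster's
    representative is its most abundant member.
    """
    clusters = []
    for seq, count in sorted(cdr3_counts.items(), key=lambda item: -item[1]):
        for cluster in clusters:
            if similarity(seq, cluster["representative"]) >= threshold:
                cluster["members"][seq] = count
                break
        else:
            clusters.append({"representative": seq, "members": {seq: count}})
    return clusters
```

Under this sketch, a cluster's total count is `sum(cluster["members"].values())` and its unique count is `len(cluster["members"])`.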



The logo of the most representative sequence of each cluster [6] is in the 'Report\Clustering_weblogo' folder and can be viewed using the links below:

Weblogo




Table 4.6 Top 20 CDR3 clusters and analysis summary

Cluster_ID | Total_cnt | Freq_total_cnt | Unique_cnt | Freq_unique_cnt | CDR3_seq
Cluster_0 | 14678 | 8.5182 | 89 | 1.3035 | Cluster_0
Cluster_1 | 10147 | 5.8887 | 126 | 1.8453 | Cluster_1
Cluster_2 | 6939 | 4.027 | 65 | 0.952 | Cluster_2
Cluster_3 | 6641 | 3.854 | 57 | 0.8348 | Cluster_3
Cluster_4 | 3964 | 2.3005 | 28 | 0.4101 | Cluster_4
Cluster_5 | 3318 | 1.9256 | 58 | 0.8494 | Cluster_5
Cluster_6 | 2574 | 1.4938 | 35 | 0.5126 | Cluster_6
Cluster_7 | 2007 | 1.1647 | 28 | 0.4101 | Cluster_7
Cluster_8 | 1737 | 1.008 | 26 | 0.3808 | Cluster_8

A partial result for one sample (ASC-S-vH) is shown. Cluster_ID: the identification number of the cluster; Total_cnt: the total clonal count of CDR3 sequences in the cluster; Freq_total_cnt: Total_cnt as a percentage of the total read count of the sample; Unique_cnt: the number of unique CDR3 sequences in the cluster; Freq_unique_cnt: Unique_cnt as a percentage of the total unique sequence count of the sample; CDR3_seq: the most abundant CDR3 amino acid sequence; CDR3_length: the length of the most abundant CDR3 amino acid sequence. The weblogo of the representative sequence of each cluster can be found in the ‘Top20_clusters/weblogo’ directory.

 

4.7 Diversity Analysis

The clonal abundance distribution was calculated with confidence intervals derived via bootstrapping. The clonal diversity of the repertoire was assessed using a diversity index [7]. Abundance curves are shown in Figure 4.7.1 and diversity curves in Figure 4.7.2.
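Reference [7] defines a family of diversity indices, the Hill numbers, D(q) = (sum of p_i^q)^(1/(1-q)), where p_i is the fraction of reads in clone i and q is the order. The sketch below computes D(q) from clone counts and a percentile bootstrap interval by resampling reads. The software and settings actually used for Figures 4.7.1 and 4.7.2 are not stated in this report, so the function names, the number of bootstrap replicates, and the resampling scheme are assumptions.

```python
import math
import random


def hill_diversity(counts, q):
    """Hill number of order q for a list of clone counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-9:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in proportions))
    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))


def bootstrap_interval(counts, q, n_boot=200, alpha=0.05, seed=0):
    """Percentile confidence interval from multinomial resampling of reads."""
    rng = random.Random(seed)
    total = sum(counts)
    clones = range(len(counts))
    estimates = []
    for _ in range(n_boot):
        # Draw `total` reads with replacement, proportional to clone size.
        sample = rng.choices(clones, weights=counts, k=total)
        resampled = [0] * len(counts)
        for clone in sample:
            resampled[clone] += 1
        estimates.append(hill_diversity(resampled, q))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Evaluating `hill_diversity` over a range of q values gives one diversity curve. Low q emphasizes the number of clones, and high q emphasizes the dominant clones.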



Figure 4.7.1 Abundance curves.


Figure 4.7.2 Diversity curves.

 

5. References

[1] Murphy K. Janeway's Immunobiology. New York: Garland Science (2012).

[2] Lefranc MP. et al., The international ImMunoGeneTics database. Nucleic Acids Res. (1999) 27(1): 209-212.

[3] Bolger AM. et al., Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics (2014) 30(15): 2114-2120.

[4] Andrews S. et al., FastQC: a quality control tool for high throughput sequence data. (2010) Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.

[5] Ye J. et al., IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. (2013) 41: W34-W40.

[6] Crooks GE. et al., WebLogo: a sequence logo generator. Genome Res. (2004) 14(6): 1188-1190.

[7] Hill M. Diversity and evenness: a unifying notation and its consequences. Ecology (1973) 54: 427-432.