Preprocessing

The preprocessing module assesses read quality and removes technical artifacts, such as adapter sequences and low-quality bases, before alignment.

Workflow

graph LR
    A[Raw FASTQ] --> B[FastQC Raw]
    B --> C[Adaptive Trimming]
    C --> D[FastQC Trimmed]
    D --> E[Processed FASTQ]

Quality Control (FastQC)

The pipeline automatically runs FastQC on every raw FASTQ file.

  • Results are stored in results/qc/fastqc-raw/.
  • Key metrics (Sequence Quality, Adapter Content) are used to drive the downstream trimming process.
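
As an illustration of how such metrics can be read programmatically, the sketch below extracts the per-module status (pass, warn, or fail) from the text of a FastQC `fastqc_data.txt` report. The function name `parse_fastqc_statuses` and the sample report are assumptions made for this example; this is not necessarily how 3t-seq reads the reports internally.

```python
def parse_fastqc_statuses(text: str) -> dict[str, str]:
    """Map each FastQC module name to its status (pass, warn, or fail)."""
    statuses = {}
    for line in text.splitlines():
        # Module headers look like ">>Adapter Content<TAB>warn";
        # ">>END_MODULE" closes a module and carries no status.
        if line.startswith(">>") and line != ">>END_MODULE":
            name, _, status = line[2:].partition("\t")
            statuses[name] = status.strip()
    return statuses


# Shortened, made-up report content for demonstration only.
example = (
    "##FastQC\t0.12.1\n"
    ">>Basic Statistics\tpass\n"
    ">>END_MODULE\n"
    ">>Per base sequence quality\tfail\n"
    ">>END_MODULE\n"
    ">>Adapter Content\twarn\n"
    ">>END_MODULE\n"
)
statuses = parse_fastqc_statuses(example)
print(statuses["Adapter Content"])
```

The resulting dictionary is keyed by the module names FastQC itself prints, so the two metrics mentioned above can be looked up as "Per base sequence quality" and "Adapter Content".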

Trimming (Trimmomatic)

Trimmomatic is used to remove adapters and low-quality bases.

Adaptive Trimming

By default, 3t-seq uses its Adaptive Trimming algorithm. This dynamically selects the best adapter file and quality thresholds based on the specific library's properties.
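
The exact heuristics of Adaptive Trimming are not described on this page. Purely to illustrate the idea of deriving Trimmomatic steps from QC results, here is a hypothetical sketch: the function `choose_trimming_steps`, the decision rules, and the thresholds are all assumptions and should not be read as the 3t-seq algorithm.

```python
def choose_trimming_steps(statuses: dict[str, str], paired_end: bool) -> str:
    """Hypothetical heuristic, NOT the 3t-seq implementation."""
    steps = []

    # Clip adapters only when FastQC flags adapter content.
    if statuses.get("Adapter Content") in ("warn", "fail"):
        # TruSeq3-PE.fa and TruSeq3-SE.fa are adapter files shipped with Trimmomatic.
        adapters = "TruSeq3-PE.fa" if paired_end else "TruSeq3-SE.fa"
        steps.append(f"ILLUMINACLIP:{adapters}:2:30:10")

    # Use a stricter sliding-window threshold for poor-quality libraries
    # (both thresholds are assumed values).
    if statuses.get("Per base sequence quality") == "fail":
        steps.append("SLIDINGWINDOW:4:20")
    else:
        steps.append("SLIDINGWINDOW:4:15")

    steps.append("MINLEN:36")
    return " ".join(steps)


clean = choose_trimming_steps(
    {"Adapter Content": "pass", "Per base sequence quality": "pass"},
    paired_end=True,
)
noisy = choose_trimming_steps(
    {"Adapter Content": "fail", "Per base sequence quality": "fail"},
    paired_end=False,
)
print(clean)
print(noisy)
```

The point of the sketch is only the shape of the decision: QC statuses go in, a per-library Trimmomatic step string comes out.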

Manual Configuration

To use fixed parameters for a specific library, set adaptive to false and list the Trimmomatic steps in extra_params in config.yaml:

sequencing_libraries:
  - name: Sample1
    trimmomatic:
      adaptive: false
      extra_params: "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"
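
To make clear what the extra_params string represents, the following sketch shows how such a string could be appended to a paired-end Trimmomatic command line. The helper `trimmomatic_pe_command` and the file naming scheme are assumptions for illustration; the command that 3t-seq actually builds may differ.

```python
import shlex


def trimmomatic_pe_command(sample: str, extra_params: str, threads: int = 4) -> list[str]:
    """Assemble a paired-end Trimmomatic call (file names are hypothetical)."""
    return [
        "trimmomatic", "PE", "-threads", str(threads),
        # Two inputs, then paired/unpaired outputs for each mate.
        f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz",
        f"{sample}_R1.paired.fastq.gz", f"{sample}_R1.unpaired.fastq.gz",
        f"{sample}_R2.paired.fastq.gz", f"{sample}_R2.unpaired.fastq.gz",
        # Each whitespace-separated token is one Trimmomatic step,
        # applied in the order given.
        *shlex.split(extra_params),
    ]


cmd = trimmomatic_pe_command(
    "Sample1",
    "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36",
)
print(shlex.join(cmd))
```

Because Trimmomatic applies steps in the order they appear, the order of the tokens in extra_params matters: here adapters are clipped first and the minimum-length filter runs last.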

Parameters & Defaults

| Parameter | Default | Description |
| --- | --- | --- |
| trimmomatic.adaptive | true | Enable/disable Adaptive Trimming. |
| trimmomatic.extra_params | - | Fixed Trimmomatic modules (e.g., CROP:100). |

Results

| Location | Description |
| --- | --- |
| results/trim/ | Contains the trimmed FASTQ files. |
| results/qc/fastqc-raw/ | FastQC reports for the raw input data. |
| results/qc/fastqc-trimmed/ | FastQC reports for the data after trimming. |
| results/trim/_shared/ | Deduplicated shared trimmed files (internal). |