`umi_combine`¶

Purpose¶

Aggregates read-level evidence into UMI-level counts across candidate loci.
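As an illustration of what "UMI-level counts" means, the sketch below collapses reads that share a locus and UMI into a single observation, then counts UMIs per allele. It is a minimal sketch under stated assumptions, not the step's actual algorithm: the `(locus, umi, allele)` tuple layout, the majority-vote consensus, and the function name `collapse_to_umi_counts` are all hypothetical.

```python
from collections import Counter, defaultdict


def collapse_to_umi_counts(read_observations):
    """Collapse read-level observations into UMI-level allele counts.

    read_observations: iterable of (locus, umi, allele) tuples.
    This layout is an assumption made for illustration only.
    Returns {locus: Counter({allele: number_of_umis})}.
    """
    # Step 1: tally the alleles seen by reads sharing the same (locus, UMI).
    per_umi = defaultdict(Counter)
    for locus, umi, allele in read_observations:
        per_umi[(locus, umi)][allele] += 1

    # Step 2: each UMI contributes one vote, for its most frequent allele.
    umi_counts = defaultdict(Counter)
    for (locus, _umi), alleles in per_umi.items():
        consensus, _n_reads = alleles.most_common(1)[0]
        umi_counts[locus][consensus] += 1
    return dict(umi_counts)


reads = [
    ("chr1:100", "AACG", "A"),
    ("chr1:100", "AACG", "A"),
    ("chr1:100", "AACG", "G"),  # minority read within the same UMI
    ("chr1:100", "TTGC", "G"),
    ("chr1:200", "AACG", "C"),
]
umi_level = collapse_to_umi_counts(reads)
```

Here three reads carrying UMI `AACG` at `chr1:100` count as one UMI supporting `A`, so read depth no longer inflates the evidence.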

Upstream¶

mpileup (the `in_filter_bam` input itself is produced by `bam_processing`)

Required inputs¶

in_filter_bam
filter_mpileup_file
db_path (chunk database from mpileup)
sequence_type

Input interpretation¶

Input key	Source	Required	Interpretation
`in_filter_bam`	`bam_processing` output	Yes	Filtered BAM used to aggregate read evidence into UMI-level counts.
`filter_mpileup_file`	`mpileup` output	Yes	Candidate loci table that defines UMI aggregation targets.
`db_path` / chunk manifest	`mpileup` output	Yes	Chunk metadata controlling parallel chunk processing.
`sequence_type`	top-level config	Yes	Controls mode-specific handling during evidence aggregation.

Parameters¶

Most controls are currently internal defaults of the step implementation (for example, the number of chunk-level parallel workers and the buffer sizes). The thread count is inherited from `run.threads`.

Parameter interpretation¶

Parameter area	Interpretation
`run.threads` / runtime parallel settings	Controls chunk-level parallel processing throughput.
internal filtering flags	Filtering of duplicate, secondary, QC-fail, and supplementary reads follows implementation defaults unless your version's config exposes these flags.
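The four read categories named above correspond to standard FLAG bits in the SAM specification. The sketch below shows how such a filter is typically expressed as a bit mask. Which categories this step excludes by default is not stated here, so excluding all four is an assumption, and `keep_read` is a hypothetical helper rather than part of the step's API.

```python
# FLAG bit values as defined in the SAM specification.
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_QC_FAIL = 0x200        # read failed platform/vendor quality checks
FLAG_DUPLICATE = 0x400      # PCR or optical duplicate
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment

# Assumption: all four categories are excluded by default.
DEFAULT_EXCLUDE_MASK = (
    FLAG_SECONDARY | FLAG_QC_FAIL | FLAG_DUPLICATE | FLAG_SUPPLEMENTARY
)


def keep_read(flag, exclude_mask=DEFAULT_EXCLUDE_MASK):
    """Return True if none of the excluded FLAG bits are set on the read."""
    return (flag & exclude_mask) == 0


# FLAG 99 = paired + proper pair + mate reverse + first in pair: kept.
primary_ok = keep_read(99)
# FLAG 1024 marks a duplicate: dropped under the assumed default mask.
duplicate_ok = keep_read(1024)
```

Passing a narrower `exclude_mask` models a version in which only some of the flags are filtered.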

Outputs¶

spot_count_file: output_dir/umi_combine/spot.count.parquet
error_count_file: output_dir/umi_combine/error.count.parquet
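To check that both files exist before running a downstream step, the paths can be resolved from the run's output directory. This is a small sketch: the helper names `umi_combine_outputs` and `missing_outputs` are hypothetical, and only the relative paths listed above are taken from this page.

```python
from pathlib import Path


def umi_combine_outputs(output_dir):
    """Build the expected umi_combine output paths under output_dir."""
    step_dir = Path(output_dir) / "umi_combine"
    return {
        "spot_count_file": step_dir / "spot.count.parquet",
        "error_count_file": step_dir / "error.count.parquet",
    }


def missing_outputs(output_dir):
    """Return the output keys whose files are not present on disk."""
    return [
        key
        for key, path in umi_combine_outputs(output_dir).items()
        if not path.is_file()
    ]


paths = umi_combine_outputs("results")
```

The files can then be opened with any Parquet reader, for example `pandas.read_parquet` with a Parquet engine such as pyarrow installed. The column schema is not documented on this page, so inspect it before relying on specific column names.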

Tuning notes¶

This step is parallelized over chunk files loaded from split_chunk.db.
Both outputs are written in Parquet format and are consumed by the genotyping and RNA_feature steps.
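The chunk-level pattern can be pictured as a map over independent chunks followed by a merge of the partial counts. The sketch below is illustrative only. The in-memory chunk layout, the worker `process_chunk`, and the use of a thread pool are assumptions; the actual step reads its chunks from the chunk database and may use a different worker model.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical per-chunk worker: count distinct UMIs per locus.

    chunk: list of (locus, umi) pairs, one per read (assumed layout).
    """
    distinct = {(locus, umi) for locus, umi in chunk}
    return Counter(locus for locus, _umi in distinct)


def combine_chunks(chunks, threads=4):
    """Process chunks in parallel and merge the per-chunk counts.

    Assumes no locus spans two chunks; otherwise a UMI seen in both
    chunks would be counted twice by this simple merge.
    """
    total = Counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for partial in pool.map(process_chunk, chunks):
            total.update(partial)
    return total


chunks = [
    [("chr1:100", "AACG"), ("chr1:100", "AACG"), ("chr1:100", "TTGC")],
    [("chr2:50", "GGTA")],
]
totals = combine_chunks(chunks, threads=2)
```

Because each chunk is processed independently, adding workers helps only while there are more chunks than workers.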
