merge_feature
Purpose
Merges RNA, spatial, read, and mappability feature tables into one integrated feature matrix and computes filtration tags.
Upstream
- spatial_feature
- mappability_feature
- read_feature
- RNA_feature
Required inputs
- RNA_feature
- spatial_feature
- read_feature
- mappability_feature
Input interpretation
| Input key | Source step | Required | Interpretation |
|---|---|---|---|
| RNA_feature | RNA_feature | Yes | RNA-level annotation feature table. |
| spatial_feature / spatial_feature_results | spatial_feature | Yes | Spatial feature table or chunk manifest to be merged. |
| read_feature / read_feature_results | read_feature | Yes | Read-level feature table or chunk manifest to be merged. |
| mappability_feature | mappability_feature | Yes | Mappability/confounder feature table. |
| phasing_result | phasing | Recommended | Phasing summary that enriches merged features before filtration. |
| cluster_event_result | phasing | Recommended | Cluster-event flags used by filtration grouping (CLUSTER_EVENTS). |
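As a rough illustration of how keyed feature tables can be combined, the sketch below joins rows on the locus key (#chrom, pos, ref, alt). It is not the step's actual implementation: the inner join across required tables, the left join for recommended tables, and every non-key column name (`gene`, `spot_count`, `alt_reads`, `mappability`) are assumptions made for the example.

```python
from typing import Dict, Iterable, Sequence, Tuple

Key = Tuple[str, int, str, str]  # (#chrom, pos, ref, alt)
Row = Dict[str, object]
KEY_COLUMNS = ("#chrom", "pos", "ref", "alt")


def index_rows(rows: Iterable[Row]) -> Dict[Key, Row]:
    """Index rows by locus key, keeping only the non-key columns as values."""
    indexed: Dict[Key, Row] = {}
    for row in rows:
        key = (str(row["#chrom"]), int(row["pos"]), str(row["ref"]), str(row["alt"]))
        indexed[key] = {k: v for k, v in row.items() if k not in KEY_COLUMNS}
    return indexed


def merge_tables(
    required: Sequence[Iterable[Row]],
    optional: Sequence[Iterable[Row]] = (),
) -> Dict[Key, Row]:
    """Keep loci present in every required table, then add optional columns.

    The join semantics here are an assumption, not taken from the pipeline.
    """
    required_idx = [index_rows(table) for table in required]
    optional_idx = [index_rows(table) for table in optional]
    shared = set(required_idx[0])
    for idx in required_idx[1:]:
        shared &= set(idx)
    merged: Dict[Key, Row] = {key: {} for key in sorted(shared)}
    for idx in required_idx + optional_idx:
        for key, combined in merged.items():
            combined.update(idx.get(key, {}))
    return merged


# Toy tables with hypothetical feature columns.
locus = {"#chrom": "chr1", "pos": 100, "ref": "A", "alt": "G"}
rna = [{**locus, "gene": "GENE1"}]
spatial = [{**locus, "spot_count": 12}]
read = [
    {**locus, "alt_reads": 7},
    {"#chrom": "chr2", "pos": 5, "ref": "C", "alt": "T", "alt_reads": 1},
]
mappability = [{**locus, "mappability": 0.98}]
phasing = []  # recommended input with no rows for these loci

merged = merge_tables([rna, spatial, read, mappability], optional=[phasing])
```

In this toy run the chr2 locus is dropped because it appears only in the read table, and the empty phasing table leaves the merged row unchanged.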
Parameters (steps.feature_filtration)
This step uses group-based filtration switches under steps.feature_filtration. If the section is omitted, the default set of enabled groups defined in the code is applied.
| Group key | Interpretation |
|---|---|
| ASE | Filter loci flagged by allele-specific expression evidence. |
| hFDR | Filter loci with high hFDR-style artifact risk. |
| imprinted | Filter loci in imprinted-gene contexts. |
| homopolymer | Filter loci affected by homopolymer-related artifacts. |
| PON | Filter loci present in panel-of-normals evidence. |
| RNA_editing | Filter loci overlapping known RNA-editing contexts. |
| ABNORMAL_MISMATCHES | Filter loci with high or biased mismatch patterns. |
| LOW_READ_DIVERSITY | Filter loci with poor alternative-read diversity. |
| HIGH_MULTIPLE_MAPPIN | Filter loci with high multiple-mapping signal. |
| WIDE_DISTRIBUTION | Filter loci with a broad, suspicious mutant-probability distribution. |
| NEAR_READ_END | Filter loci dominated by near-read-end signal. |
| CLUSTER_EVENTS | Filter loci associated with cluster-like event patterns. |
| LOW_MAPQ | Filter low mapping-quality loci. |
| LOW_BASEQ | Filter low base-quality loci. |
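The switch behavior described above can be sketched as follows. The shape of the configuration section (a mapping from group key to a boolean), the rule that switches override defaults, and the default set itself are all assumptions for illustration; the real default lives in the pipeline code and may differ. Group keys are spelled exactly as they appear in the table.

```python
from typing import Mapping, Optional, Set

# Group keys copied verbatim from the table above.
KNOWN_GROUPS = (
    "ASE", "hFDR", "imprinted", "homopolymer", "PON", "RNA_editing",
    "ABNORMAL_MISMATCHES", "LOW_READ_DIVERSITY", "HIGH_MULTIPLE_MAPPIN",
    "WIDE_DISTRIBUTION", "NEAR_READ_END", "CLUSTER_EVENTS",
    "LOW_MAPQ", "LOW_BASEQ",
)

# Hypothetical default: assume every group is enabled unless switched off.
ASSUMED_DEFAULT_ENABLED = frozenset(KNOWN_GROUPS)


def resolve_enabled_groups(section: Optional[Mapping[str, bool]]) -> Set[str]:
    """Return the enabled groups for a (hypothetical) feature_filtration section."""
    if section is None:
        # Section omitted: fall back to the default enabled-group set.
        return set(ASSUMED_DEFAULT_ENABLED)
    unknown = set(section) - set(KNOWN_GROUPS)
    if unknown:
        raise KeyError(f"Unknown filtration groups: {sorted(unknown)}")
    enabled = set(ASSUMED_DEFAULT_ENABLED)
    for group, switched_on in section.items():
        if switched_on:
            enabled.add(group)
        else:
            enabled.discard(group)
    return enabled


default_groups = resolve_enabled_groups(None)
without_pon = resolve_enabled_groups({"PON": False})
```

Rejecting unknown keys is a design choice in this sketch: a misspelled group name would otherwise silently leave a filter enabled.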
Tuning notes
- Start with defaults for production runs.
- Disable only specific groups when investigating why candidates were filtered.
- Document any filtration-group changes to preserve run comparability.
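One lightweight way to document filtration-group changes is to store a small record of how a run's enabled groups differ from the defaults. The sketch below is only a suggestion: the function name, the record fields, and the three-group default set are hypothetical and not part of the pipeline.

```python
import json
from typing import Dict, Iterable, List


def describe_group_changes(
    default_groups: Iterable[str],
    enabled_groups: Iterable[str],
) -> Dict[str, List[str]]:
    """Summarize how a run's enabled groups differ from the default set."""
    default_set, enabled_set = set(default_groups), set(enabled_groups)
    return {
        "disabled_vs_default": sorted(default_set - enabled_set),
        "enabled_vs_default": sorted(enabled_set - default_set),
    }


# Hypothetical default set, shortened for illustration only.
assumed_defaults = ["LOW_MAPQ", "LOW_BASEQ", "PON"]
this_run = ["LOW_MAPQ", "LOW_BASEQ"]

change_record = json.dumps(
    describe_group_changes(assumed_defaults, this_run), sort_keys=True
)
print(change_record)
```

Saving such a record next to the run outputs makes it possible to tell later whether two runs used the same filtration groups.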
Outputs
- combine_feature: output_dir/all_feature.txt
- parquet mirror: output_dir/all_feature.parquet
Tuning notes
- Multi-index keys are #chrom, pos, ref, alt.
- Filtration summary is consolidated into the Filtration column (PASS or semicolon-separated tags).
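The Filtration column convention (PASS, or tags joined by semicolons) can be read and written with a few lines of Python. This is a sketch under assumptions: the text output is treated as tab-delimited, which the documentation does not state, and the sample rows are invented for the example.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple

Key = Tuple[str, int, str, str]  # (#chrom, pos, ref, alt)


def build_filtration(tags: Iterable[str]) -> str:
    """Join tags with semicolons (duplicates dropped), or return PASS if none."""
    unique = list(dict.fromkeys(tags))
    return ";".join(unique) if unique else "PASS"


def parse_filtration(value: str) -> List[str]:
    """Split a Filtration value into tags; PASS maps to an empty list."""
    value = value.strip()
    if not value or value == "PASS":
        return []
    return [tag for tag in value.split(";") if tag]


def load_filtration(handle: TextIO) -> Dict[Key, List[str]]:
    """Read locus keys and Filtration tags from a tab-delimited table (assumed format)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["#chrom"], int(row["pos"]), row["ref"], row["alt"]):
            parse_filtration(row["Filtration"])
        for row in reader
    }


# Invented sample standing in for all_feature.txt.
SAMPLE = (
    "#chrom\tpos\tref\talt\tFiltration\n"
    "chr1\t100\tA\tG\tPASS\n"
    "chr1\t250\tC\tT\tLOW_MAPQ;PON\n"
)

tags_by_locus = load_filtration(io.StringIO(SAMPLE))
passing = [key for key, tags in tags_by_locus.items() if not tags]
```

Here `passing` holds only the loci whose Filtration value is PASS, while `tags_by_locus` keeps the individual tags for every other locus.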