Invalid memory access in BCFtools 1.9
Loginsoft-2018-1004
August 18, 2018
CWE
CWE-476: NULL Pointer Dereference
Product Details
BCFtools is a program for variant calling and manipulating files in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. In order to avoid tedious repetition, throughout this document we will use “VCF” and “BCF” interchangeably, unless specifically noted.
Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF work in all situations. Unindexed VCF and BCF and streams work in most, but not all situations. In general, whenever multiple VCFs are read simultaneously, they must be indexed and therefore also compressed.
URL: https://samtools.github.io/bcftools
Vulnerable Versions
bcftools 1.9
Vulnerability Details
An invalid memory access was discovered in bcftools 1.9.
SYNOPSIS
Two issues were addressed in the parsing of a broken BCF file supplied as input. Both are invalid memory accesses.
1. Issue in main_vcfcall()
```
int main_vcfcall(int argc, char *argv[])
{
    char *ploidy_fname = NULL, *ploidy = NULL;
    args_t args;
    .
    .
    .
    if ( (args.flag & CF_INDEL_ONLY) && !is_indel ) continue;
    if ( (args.flag & CF_NO_INDEL) && is_indel ) continue;
    if ( (args.flag & CF_ACGT_ONLY) && (bcf_rec->d.allele[0][0]=='N' || bcf_rec->d.allele[0][0]=='n') ) continue; // [1] REF[0] is 'N'
```
When BCFtools parses a supplied BCF file, main_vcfcall() in vcfcall.c is called. It handles a broken BCF file incorrectly, which leaves NULL values inside the BCF record struct `bcf_rec`. Later in the code, the if statement at [1] reads `bcf_rec->d.allele[0][0]` for a comparison. The member `allele` is of type char**, and because the value it holds is 0, the access is a NULL pointer dereference that causes a segmentation fault.
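The failure mode at [1] can be illustrated with a small guard that refuses to read REF when the allele array was never populated. This is only a sketch under stated assumptions: `mock_decoded_t` and `ref_starts_with_n()` are hypothetical names invented for the illustration, not htslib or bcftools types, and this is not the fix that was applied upstream.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the decoded part of a BCF record.
 * Only the fields needed for this illustration are modelled. */
typedef struct {
    char **allele;   /* allele[0] is REF; NULL when decoding failed */
    int n_allele;    /* number of entries in allele */
} mock_decoded_t;

/* Returns true only when REF exists and its first base is 'N' or 'n'.
 * A record without decoded alleles yields false instead of a crash. */
static bool ref_starts_with_n(const mock_decoded_t *d)
{
    if (d == NULL || d->allele == NULL || d->n_allele < 1 || d->allele[0] == NULL)
        return false;
    char c = d->allele[0][0];
    return c == 'N' || c == 'n';
}
```

With such a guard, a record whose `allele` pointer is NULL is treated as "REF is not N" rather than being dereferenced. Rejecting the broken record earlier, as the actual patch does, is the stronger approach because it stops every later access as well.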
2. Issue in bcf_seqname()
The function main_vcfcall() calls set_ploidy(), which in turn calls the inline function bcf_seqname(), located in the htslib header file vcf.h. When returning its value, bcf_seqname() accesses members of the structure `hdr`.
```
static inline const char *bcf_seqname(const bcf_hdr_t *hdr, bcf1_t *rec)
{
    return hdr->id[BCF_DT_CTG][rec->rid].key;
}
```
`hdr` is a struct; its member `id` is accessed at index BCF_DT_CTG (hardcoded as 1).
`rec` is also a struct; its member `rid` (1) is used as the second index.
`key` is a const character pointer, a member of the struct stored in `hdr->id`.
The structure member `key` holds an invalid memory address, possibly due to a heap overflow, so accessing it raises a segmentation fault signal.
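The lookup described above can be made defensive by validating `rid` against the number of contig entries before indexing. The sketch below uses hypothetical mock types (`mock_hdr_t`, `mock_idpair_t`, `MOCK_DT_CTG`) and a hypothetical helper `seqname_checked()`. It assumes the header keeps a per-dictionary entry count next to the `id` arrays, and it is not the code that was committed upstream.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_DT_CTG 1   /* assumed index of the contig dictionary */

/* Hypothetical mirror of the key/value pair printed by gdb. */
typedef struct {
    const char *key;
    const void *val;
} mock_idpair_t;

/* Hypothetical header: n[i] is the number of entries in id[i]. */
typedef struct {
    int32_t n[3];
    mock_idpair_t *id[3];
} mock_hdr_t;

/* Returns the contig name for rid, or NULL when rid is out of range
 * or the contig dictionary is missing. */
static const char *seqname_checked(const mock_hdr_t *hdr, int32_t rid)
{
    if (hdr == NULL || hdr->id[MOCK_DT_CTG] == NULL)
        return NULL;
    if (rid < 0 || rid >= hdr->n[MOCK_DT_CTG])
        return NULL;
    return hdr->id[MOCK_DT_CTG][rid].key;
}
```

A caller would have to handle the NULL return, for example by reporting a parse error. The point of the sketch is that a `rid` taken from a corrupted record never reaches the array index unchecked.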
Fix: Both issues were the result of a broken BCF file. As part of the fix, a check was added in vcfcall.c that verifies the input was parsed correctly before the BCF record is processed.
```
+        if ( args.aux.srs->errnum || bcf_rec->errcode ) error("Error: could not parse the input VCF\n");
         if ( args.samples_map ) bcf_subset(args.aux.hdr, bcf_rec, args.nsamples, args.samples_map);
```
Commit: f9ab25129be77da536e03486327b9832c4bd6778
Analysis
gef➤ i r
rax            0x60f00000ee60      0x60f00000ee60
rbx            0x100               0x100
rcx            0x0                 0x0
rdx            0x611000009780      0x611000009780
rsi            0x611000009780      0x611000009780
rdi            0x60f00000ee60      0x60f00000ee60
rbp            0x7fffffffd030      0x7fffffffd030
rsp            0x7fffffffd020      0x7fffffffd020
r8             0x0                 0x0
r9             0x6110000098c0      0x6110000098c0
r10            0x8                 0x8
r11            0x611000009780      0x611000009780
r12            0x60200000e8d0      0x60200000e8d0
r13            0xffffffffa18       0xffffffffa18
r14            0x7fffffffd0c0      0x7fffffffd0c0
r15            0x0                 0x0
rip            0x53dae0            0x53dae0 <bcf_seqname+16>
eflags         0x202               [ IF ]
cs             0x33                0x33
ss             0x2b                0x2b
ds             0x0                 0x0
es             0x0                 0x0
fs             0x0                 0x0
gs             0x0                 0x0

   0x53dad4 <bcf_seqname+4>   sub    rsp, 0x10
   0x53dad8 <bcf_seqname+8>   mov    QWORD PTR [rbp-0x8], rdi
   0x53dadc <bcf_seqname+12>  mov    QWORD PTR [rbp-0x10], rsi
 → 0x53dae0 <bcf_seqname+16>  mov    rax, QWORD PTR [rbp-0x8]
   0x53dae4 <bcf_seqname+20>  add    rax, 0x18
   0x53dae8 <bcf_seqname+24>  mov    rdx, rax
   0x53daeb <bcf_seqname+27>  shr    rdx, 0x3
   0x53daef <bcf_seqname+31>  add    rdx, 0x7fff8000
   0x53daf6 <bcf_seqname+38>  movzx  edx, BYTE PTR [rdx]
gef➤ p hdr
$9 = (const bcf_hdr_t *) 0x60f00000ee60
gef➤ x/d 0x60f00000ee60
0x60f00000ee60: 11
gef➤ p hdr->id
$10 = {0x611000009c80, 0x60200000e330, 0x60600000ed80}
gef➤ x/d 0x611000009c80
0x611000009c80: 59120
gef➤ p 0x60200000e330
$11 = 0x60200000e330
gef➤ x/d 0x60200000e330
0x60200000e330: 58224
gef➤ x/d 0x60600000ed80
0x60600000ed80: 58096
gef➤ p hdr->id[1]
$12 = (bcf_idpair_t *) 0x60200000e330
gef➤ x/d 0x60200000e330
0x60200000e330: 58224
gef➤ ptype hdr->id[1][rec]
type = struct {
    const char *key;
    const bcf_idinfo_t *val;
}
gef➤ p hdr->id[1][rec->rid]
$13 = {
  key = 0x2ffffff00000002 <error: Cannot access memory at address 0x2ffffff00000002>,
  val = 0x2c00000300000004
}
gef➤ p hdr->id[1][rec.rid].key
$22 = 0x2ffffff00000002 <error: Cannot access memory at address 0x2ffffff00000002>
Backtrace
gef➤ bt
#0  bcf_seqname (hdr=0x60f00000ee60, rec=0x611000009780) at htslib-develop/htslib/vcf.h:757
#1  0x00000000005452b8 in set_ploidy (args=0x7fffffffd120, rec=0x611000009780) at vcfcall.c:550
#2  0x0000000000547d57 in main_vcfcall (argc=0x3, argv=0x7fffffffde10) at vcfcall.c:839
#3  0x0000000000411762 in main (argc=0x4, argv=0x7fffffffde08) at main.c:278
Proof of concept
bcftools call -c $POC
`call` is used for performing SNP/indel calling. SNP/indel calling is one of the most frequently performed types of next-generation sequencing analysis.
Timeline
Vendor Disclosure: 2018-08-16
Patch Release: 2018-08-17
Public Disclosure: 2018-08-18
Credit
Discovered by ACE Team – Loginsoft