Invalid memory access in BCFtools 1.9

Loginsoft-2018-1004

August 18, 2018

CWE

CWE-476: NULL Pointer Dereference

Product Details

BCFtools is a program for variant calling and manipulating files in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. To avoid tedious repetition, throughout this document we will use “VCF” and “BCF” interchangeably, unless specifically noted.

Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF work in all situations. Unindexed VCF and BCF and streams work in most, but not all situations. In general, whenever multiple VCFs are read simultaneously, they must be indexed and therefore also compressed.

URL: https://samtools.github.io/bcftools

Vulnerable Versions

bcftools 1.9

Vulnerability Details

An invalid memory access was discovered in bcftools 1.9.

SYNOPSIS

Two issues, both invalid memory accesses, were identified when a broken BCF file is parsed as input.

1. Issue in main_vcfcall()

```
int main_vcfcall(int argc, char *argv[])
{
    char *ploidy_fname = NULL, *ploidy = NULL;
    args_t args;
    .
    .
    .
    if ( (args.flag & CF_INDEL_ONLY) && !is_indel ) continue;
    if ( (args.flag & CF_NO_INDEL) && is_indel ) continue;
    if ( (args.flag & CF_ACGT_ONLY) && (bcf_rec->d.allele[0][0]=='N' || bcf_rec->d.allele[0][0]=='n') ) continue;   // [1] REF[0] is 'N'
```

When bcftools parses a supplied BCF file, main_vcfcall() in vcfcall.c is called. It handles a broken BCF file incorrectly, which leaves NULL values inside the BCF record struct `bcf_rec`. Later in the code, an if statement reads a structure member of type char** (`bcf_rec->d.allele`) for a comparison [1]. Because the value it contains is 0, the access is a NULL pointer dereference and causes a segmentation fault.
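The dereference at [1] can be illustrated with a minimal sketch of a defensive check. It assumes a simplified, hypothetical record type (`struct rec_sketch`) and helper name (`ref_starts_with_n`); neither is part of htslib or bcftools, and this is not the upstream fix.

```c
#include <stddef.h>

/* Hypothetical stand-in for the decoded part of a BCF record; not htslib's type. */
struct rec_sketch {
    int    n_allele;   /* number of decoded alleles */
    char **allele;     /* allele[0] is REF; may be NULL after a failed decode */
};

/* Returns 1 if REF starts with 'N' or 'n', 0 if it does not,
 * and -1 if the record has no usable REF allele. */
int ref_starts_with_n(const struct rec_sketch *rec)
{
    /* Check every pointer on the path before reading allele[0][0]. */
    if (rec == NULL || rec->allele == NULL || rec->n_allele < 1 || rec->allele[0] == NULL)
        return -1;
    char c = rec->allele[0][0];
    return (c == 'N' || c == 'n') ? 1 : 0;
}
```

A caller would treat the -1 result as a parse failure and stop, rather than continuing with the comparison.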

2. Issue in bcf_seqname()

The function main_vcfcall() calls set_ploidy(), which in turn calls the inline function bcf_seqname(), located in the htslib header file vcf.h. To compute its return value, bcf_seqname() accesses members of the structure `hdr`.

```
static inline const char *bcf_seqname(const bcf_hdr_t *hdr, bcf1_t *rec)
{
return hdr->id[BCF_DT_CTG][rec->rid].key;
}
```

`hdr` is a struct; its member `id` is indexed with BCF_DT_CTG (hardcoded as 1).
`rec` is also a struct; its member `rid` (1) supplies the second index.
`key` is a const character pointer, a member of the struct elements stored in `hdr->id`.

When the structure member `key` is accessed, this character pointer holds an invalid memory address, possibly because of a heap overflow, and the process receives a segmentation fault signal.
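One plausible way to obtain a garbage `key` like this is an index that falls outside the contig table, although the root cause is not confirmed here. The following is a minimal sketch of a range-checked lookup, assuming simplified, hypothetical types (`struct hdr_sketch`, `struct idpair_sketch`) and a hypothetical helper `seqname_checked`; none of these names exist in htslib.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real types are bcf_idpair_t and bcf_hdr_t. */
struct idpair_sketch {
    const char *key;
};

struct hdr_sketch {
    int n_ctg;                        /* number of contig entries */
    const struct idpair_sketch *ctg;  /* contig dictionary with n_ctg entries */
};

/* Returns the contig name for rid, or NULL if rid does not index the table. */
const char *seqname_checked(const struct hdr_sketch *hdr, int rid)
{
    if (hdr == NULL || hdr->ctg == NULL) return NULL;
    if (rid < 0 || rid >= hdr->n_ctg) return NULL;   /* reject out-of-range ids */
    return hdr->ctg[rid].key;
}
```

A NULL return lets the caller report a malformed record instead of reading past the end of the table.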

Fix: Both issues result from a broken BCF file. As part of the fix, a check was added in vcfcall.c that tests the reader and record error flags, so that an invalid input file is rejected with an error before the record is processed further.

```
+ if ( args.aux.srs->errnum || bcf_rec->errcode ) error("Error: could not parse the input VCF\n");
if ( args.samples_map ) bcf_subset(args.aux.hdr, bcf_rec, args.nsamples, args.samples_map);
```
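The idea behind the added line, stopping as soon as an error flag is set, can be sketched in isolation. The example below assumes a hypothetical record type (`struct flagged_rec`) and helper (`count_usable`) that model only the error flags; unlike the actual patch, which calls error() and exits, this sketch returns how many records were safe to process.

```c
#include <stddef.h>

/* Hypothetical record that carries only the error flag relevant here. */
struct flagged_rec {
    int errcode;   /* non-zero when parsing this record failed */
};

/* Walks the records in order and stops at the first one flagged as broken.
 * Returns the number of records that were safe to process. */
size_t count_usable(const struct flagged_rec *recs, size_t n, int reader_errnum)
{
    size_t i;
    if (recs == NULL || reader_errnum != 0) return 0;
    for (i = 0; i < n; i++) {
        if (recs[i].errcode != 0) break;   /* stop before touching record fields */
    }
    return i;
}
```

The check runs before any field of the record is read, which matches the placement of the added line ahead of bcf_subset().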

Commit: f9ab25129be77da536e03486327b9832c4bd6778

Analysis
```
gef➤ i r
rax 0x60f00000ee60 0x60f00000ee60
rbx 0x100 0x100
rcx 0x0 0x0
rdx 0x611000009780 0x611000009780
rsi 0x611000009780 0x611000009780
rdi 0x60f00000ee60 0x60f00000ee60
rbp 0x7fffffffd030 0x7fffffffd030
rsp 0x7fffffffd020 0x7fffffffd020
r8 0x0 0x0
r9 0x6110000098c0 0x6110000098c0
r10 0x8 0x8
r11 0x611000009780 0x611000009780
r12 0x60200000e8d0 0x60200000e8d0
r13 0xffffffffa18 0xffffffffa18
r14 0x7fffffffd0c0 0x7fffffffd0c0
r15 0x0 0x0
rip 0x53dae0 0x53dae0 <bcf_seqname+16>
eflags 0x202 [ IF ]
cs 0x33 0x33
ss 0x2b 0x2b
ds 0x0 0x0
es 0x0 0x0
fs 0x0 0x0
gs 0x0 0x0
```


```
     0x53dad4 <bcf_seqname+4>  sub    rsp, 0x10
     0x53dad8 <bcf_seqname+8>  mov    QWORD PTR [rbp-0x8], rdi
     0x53dadc <bcf_seqname+12> mov    QWORD PTR [rbp-0x10], rsi
 →   0x53dae0 <bcf_seqname+16> mov    rax, QWORD PTR [rbp-0x8]
     0x53dae4 <bcf_seqname+20> add    rax, 0x18
     0x53dae8 <bcf_seqname+24> mov    rdx, rax
     0x53daeb <bcf_seqname+27> shr    rdx, 0x3
     0x53daef <bcf_seqname+31> add    rdx, 0x7fff8000
     0x53daf6 <bcf_seqname+38> movzx  edx, BYTE PTR [rdx]
```


```
gef➤  p hdr
$9 = (const bcf_hdr_t *) 0x60f00000ee60
gef➤  x/d 0x60f00000ee60
0x60f00000ee60:	11
gef➤  p hdr->id
$10 = {0x611000009c80, 0x60200000e330, 0x60600000ed80}
gef➤  x/d 0x611000009c80
0x611000009c80:	59120
gef➤  p 0x60200000e330
$11 = 0x60200000e330
gef➤  x/d 0x60200000e330
0x60200000e330:	58224
gef➤  x/d 0x60600000ed80
0x60600000ed80:	58096
gef➤  p hdr->id[1]
$12 = (bcf_idpair_t *) 0x60200000e330
gef➤  x/d 0x60200000e330
0x60200000e330:	58224
gef➤  ptype hdr->id[1][rec]
type = struct {
    const char *key;
    const bcf_idinfo_t *val;
}
gef➤  p hdr->id[1][rec->rid]
$13 = {
  key = 0x2ffffff00000002 <error: Cannot access memory at address 0x2ffffff00000002>, 
  val = 0x2c00000300000004
}
gef➤  p hdr->id[1][rec.rid].key
$22 = 0x2ffffff00000002 <error: Cannot access memory at address 0x2ffffff00000002>
```


Backtrace
```
gef➤ bt
#0 bcf_seqname (hdr=0x60f00000ee60, rec=0x611000009780) at htslib-develop/htslib/vcf.h:757
#1 0x00000000005452b8 in set_ploidy (args=0x7fffffffd120, rec=0x611000009780) at vcfcall.c:550
#2 0x0000000000547d57 in main_vcfcall (argc=0x3, argv=0x7fffffffde10) at vcfcall.c:839
#3 0x0000000000411762 in main (argc=0x4, argv=0x7fffffffde08) at main.c:278
```



Proof of concept

```
bcftools call -c $POC
```

`call` performs SNP/indel calling, one of the most frequently performed types of next-generation sequencing analysis.

 

Timeline

Vendor Disclosure: 2018-08-16

Patch Release: 2018-08-17

Public Disclosure: 2018-08-18

 

Credit

Discovered by ACE Team – Loginsoft