Microbial genomes have recently been reconstructed from metagenomic datasets using binning approaches. However, different binning programs often produce inconsistent results, likely because they rely on different algorithms or statistical models. We present Binning_refiner, a pipeline that merges the results of different binning programs.
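The following Python sketch shows one way the results of two binning programs could be combined. It keeps only contigs that both programs binned and groups them by their pair of bin labels, so any contigs that either program separated end up in different refined bins. This intersection strategy, the function name `refine_bins`, the `min_bin_size` cutoff and the dictionary inputs are illustrative assumptions. They do not describe Binning_refiner's actual code or options.

```python
from collections import defaultdict


def refine_bins(bins_a, bins_b, contig_lengths, min_bin_size=0):
    """Hypothetical sketch: intersect two binning results.

    bins_a, bins_b: contig name -> bin label from each binning program
    contig_lengths: contig name -> length in bp
    min_bin_size: assumed cutoff; refined bins shorter than this are dropped
    """
    # Group contigs by (bin from program A, bin from program B);
    # contigs left unbinned by either program are discarded.
    groups = defaultdict(list)
    for contig, bin_a in bins_a.items():
        bin_b = bins_b.get(contig)
        if bin_b is not None:
            groups[(bin_a, bin_b)].append(contig)

    # Keep only groups whose total length reaches the size cutoff.
    refined = {}
    for (bin_a, bin_b), contigs in sorted(groups.items()):
        total_bp = sum(contig_lengths[c] for c in contigs)
        if total_bp >= min_bin_size:
            refined[f"{bin_a}_{bin_b}"] = sorted(contigs)
    return refined


if __name__ == "__main__":
    bins_a = {"c1": "A1", "c2": "A1", "c3": "A1", "c4": "A2"}
    bins_b = {"c1": "B1", "c2": "B1", "c3": "B2", "c4": "B2", "c5": "B2"}
    lengths = {"c1": 5000, "c2": 3000, "c3": 4000, "c4": 6000, "c5": 2000}
    print(refine_bins(bins_a, bins_b, lengths))
    # {'A1_B1': ['c1', 'c2'], 'A1_B2': ['c3'], 'A2_B2': ['c4']}
```

In this toy example, c3 is split from c1 and c2 because the second program placed it in a different bin, and c5 is discarded because the first program left it unbinned. An intersection of this kind trades completeness for purity.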
Binning_refiner’s performance was first assessed on the MBARC-26 mock dataset, which consists of shotgun sequences from a defined mixture of 23 bacterial and 3 archaeal strains with publicly available complete genomes. MetaBAT and MyCC were run with default parameters to provide the input bins for Binning_refiner. Precision (how pure a bin is) increased from 87% (MetaBAT) and 91% (MyCC) to 96% (Binning_refiner), whereas recall (how complete a bin is) decreased from 91% (MetaBAT) and 94% (MyCC) to 88% (Binning_refiner). Binning_refiner was also tested on metagenomic sequence data from three replicate bacterial communities on the surface of the marine alga Caulerpa filiformis. The contamination level of the refined bins was significantly lower than that of the bins from MetaBAT (p = 0.026) and MyCC (p = 0.004). The total length of the refined contamination-free bins was 39.2 Mbp, 1.8 times that of the MetaBAT bins (22.0 Mbp) and 3.4 times that of the MyCC bins (11.5 Mbp).
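The abstract defines precision and recall only informally. The sketch below uses one common convention, which is an assumption and not taken from the paper. Each bin is assigned to the reference genome that contributes the most base pairs to it. Precision is that genome's share of the bin's length, and recall is the fraction of that genome's assembled length captured by the bin. Both values are averaged over all bins. The function name `bin_precision_recall` and its inputs are hypothetical.

```python
from collections import Counter


def bin_precision_recall(bins, contig_genome, contig_lengths):
    """Hypothetical sketch of length-weighted precision/recall, averaged over bins.

    bins: bin name -> list of contig names (assumes at least one bin)
    contig_genome: contig name -> source genome (ground truth, e.g. a mock community)
    contig_lengths: contig name -> length in bp
    """
    # Total assembled length of each genome across all contigs, binned or not.
    genome_totals = Counter()
    for contig, genome in contig_genome.items():
        genome_totals[genome] += contig_lengths[contig]

    precisions, recalls = [], []
    for contigs in bins.values():
        # Base pairs each genome contributes to this bin.
        per_genome = Counter()
        for c in contigs:
            per_genome[contig_genome[c]] += contig_lengths[c]
        # Assign the bin to its dominant genome (assumed convention).
        genome, bp = per_genome.most_common(1)[0]
        precisions.append(bp / sum(per_genome.values()))  # purity of the bin
        recalls.append(bp / genome_totals[genome])        # completeness of the genome
    n = len(bins)
    return sum(precisions) / n, sum(recalls) / n


if __name__ == "__main__":
    contig_genome = {"c1": "g1", "c2": "g1", "c3": "g2", "c4": "g2"}
    lengths = {"c1": 600, "c2": 400, "c3": 900, "c4": 100}
    bins = {"bin1": ["c1", "c2", "c4"], "bin2": ["c3"]}
    print(bin_precision_recall(bins, contig_genome, lengths))
```

In this example, bin1 is dominated by g1 (1,000 of 1,100 bp), so its precision is about 0.91 and its recall is 1.0. Bin2 holds 900 of g2's 1,000 bp, so its precision is 1.0 and its recall is 0.9. Under these definitions, removing a contaminating contig raises precision. If that contig is left out of every bin, the recall of its source genome falls.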
Our results demonstrate that Binning_refiner can significantly reduce the contamination level of genome bins and increase the total size of contamination-free bins, making it a useful tool for improving the quality of genome bins derived from metagenomic data. Binning_refiner is implemented in Python3 and is freely available at: https://github.com/songweizhi/Binning_refiner.