Quick Start

After installing BIT and configuring the reference ChIP-seq database, you are ready to run an analysis. BIT accepts any of the following file types, each of which gives the chromosomal start and end positions of your epigenomic regions: .bed, .narrowPeak, .broadPeak, .bigNarrowPeak (.bb), or .csv.

Assuming your input file is located at “file_path/file_name” (acceptable formats: .bed, .bb, etc.) and you want to store the output in “output_path”, the command below compares your input regions against all available ChIP-seq reference datasets. It then runs the BIT model with a Gibbs sampler to obtain the Bayesian inference of the TR-level BIT scores. The output from each iteration is saved as an R data object named “file_name.rds”.

> BIT("file_path/file_name.bed", "output_path/", show = TRUE, plot_bar = TRUE, N = 5000, bin_width = 1000, genome = "hg38")
Loading and mapping peaks to bins...
Done loading.
Comparing the input regions with the pre-compiled reference ChIP-seq data, using a bin width of 1000 bps...
Loading meta table...
Starting alignment process...
  |==================================================| 100%
Alignment complete.
Starting BIT Gibbs sampler with 5000 iterations...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Gibbs sampling completed.
Output data saved as output_path/file_name.rds
Loading data from file...
Processing theta matrix and TR names...
Compiling results...
Results saved to output_path/file_name_rank_table.csv
BIT process completed.
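
The saved .rds file can be opened with base R. The internal layout of the object is not described here, so the sketch below only prints its top-level structure rather than assuming any field names. The helper name inspect_rds is hypothetical and not part of BIT.

```r
# Hypothetical helper (not part of BIT): print the top-level structure of a
# saved .rds file, or return NULL with a message if the file does not exist.
inspect_rds <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(invisible(NULL))
  }
  obj <- readRDS(path)      # base R reader for single-object .rds files
  str(obj, max.level = 1)   # show only the top level of the object
  invisible(obj)
}

# Path taken from the log output above
inspect_rds("output_path/file_name.rds")
```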

Several parameters have default values that you can change:

Note

  • show: TRUE / FALSE. Whether to display the ranking table. Default: TRUE.

  • plot.bar: TRUE / FALSE. Whether to plot the BIT scores of the top 10 TRs as a horizontal bar plot. Default: TRUE.

  • format: One of “bed”, “narrowPeak”, “broadPeak”, “bigNarrowPeak”, “csv”, or NULL. Specifies the format of the input file. Default: NULL. If set to NULL, BIT will automatically determine the file format based on its extension.

  • N: Integer. The number of iterations in the Gibbs sampler. Default: 5000.

  • bin_width: Integer. The width, in base pairs, of the non-overlapping bins into which the genome is divided. Default: 1000. Change this only if you have compiled a reference database with a different bin width.

  • burnin: NULL or an integer. The burn-in period, that is, the number of initial Gibbs iterations discarded when deriving the Bayesian inference for TR-level parameters. Default: NULL, in which case N/2 is used.

  • genome: “hg38” or “mm10”. The genome assembly from which the reference TR ChIP-seq data were collected.
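
To make bin_width and burnin more concrete, here is a minimal sketch in base R. It is not BIT's internal code: the helper names region_to_bins and resolve_burnin are hypothetical, the bin indexing assumes 0-based, half-open BED coordinates, and the draws are simulated rather than taken from a BIT run.

```r
# Hypothetical helper: indices of the non-overlapping bins that a region
# touches, assuming 0-based, half-open coordinates [start, end).
region_to_bins <- function(start, end, bin_width = 1000) {
  first_bin <- start %/% bin_width
  last_bin  <- (end - 1) %/% bin_width
  seq(first_bin, last_bin)
}

region_to_bins(1500, 3200)  # bins 1, 2 and 3

# Hypothetical helper: resolve the burn-in as the parameter list describes,
# falling back to N/2 when burnin is NULL.
resolve_burnin <- function(N, burnin = NULL) {
  if (is.null(burnin)) N %/% 2 else burnin
}

set.seed(1)
N <- 5000
draws <- rnorm(N, mean = -2, sd = 0.1)     # simulated draws, NOT BIT output
keep  <- draws[(resolve_burnin(N) + 1):N]  # drop the first N/2 iterations
length(keep)                               # 2500

# Posterior summary computed from the retained draws only
summary_stats <- c(mean = mean(keep), quantile(keep, c(0.025, 0.975)))
summary_stats
```

With the defaults (N = 5000, burnin = NULL), a summary like this would use only the last 2500 iterations.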

You can then load the rank table to check the results:

> Results_Table <- read.csv("output_path/file_name_rank_table.csv")
> head(Results_Table,10)
         TR   Theta_i     lower     upper  BIT_score BIT_score_lower BIT_score_upper Rank
1      CTCF -2.010571 -2.011593 -2.009676 0.11809745      0.11799114      0.11819079    1
2     RAD21 -2.028610 -2.031747 -2.025619 0.11623164      0.11590978      0.11653925    2
3      SMC3 -2.110100 -2.120542 -2.100907 0.10811898      0.10711622      0.10900866    3
4     SMC1A -2.181305 -2.193447 -2.170246 0.10144192      0.10034048      0.10245443    4
5      PHF2 -2.222166 -2.633869 -1.990377 0.09777755      0.06699025      0.12021697    5
6    GABPB1 -2.301673 -2.689461 -2.062208 0.09098450      0.06359813      0.11282466    6
7      CHD2 -2.317842 -2.370804 -2.270597 0.08965600      0.08542628      0.09358759    7
8      NFYA -2.353494 -2.396030 -2.313238 0.08678843      0.08347594      0.09003250    8
9    NELFCD -2.371794 -2.476460 -2.276018 0.08534898      0.07752502      0.09312872    9
10     NFYC -2.373795 -2.767204 -2.122314 0.08519290      0.05912233      0.10694685   10
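
The table does not state how BIT_score relates to Theta_i, but the printed values are consistent with a logistic transform, BIT_score = 1 / (1 + exp(-Theta_i)). The check below copies three rows from the table and recomputes the score with base R's plogis(). Treat the relationship as an observation about this output rather than a documented guarantee.

```r
# Theta_i and BIT_score for the top three TRs, copied from the table above
theta     <- c(CTCF = -2.010571, RAD21 = -2.028610, SMC3 = -2.110100)
bit_score <- c(CTCF = 0.11809745, RAD21 = 0.11623164, SMC3 = 0.10811898)

# plogis() is the logistic function 1 / (1 + exp(-x)) from the stats package
recomputed <- plogis(theta)

# Side-by-side comparison; the two columns agree up to rounding in the table
round(cbind(table = bit_score, recomputed = recomputed), 6)

# Largest absolute difference between the tabulated and recomputed scores
max_diff <- max(abs(recomputed - bit_score))
max_diff
```

The same transform applied to the lower and upper columns gives values that match BIT_score_lower and BIT_score_upper for CTCF, which suggests the score bounds are the transformed Theta_i bounds.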