万字长文教你入门biocondutor基石,小白都能看懂~

万字长文教你入门biocondutor基石,小白都能看懂~

探索生信之美,解构每一段代码的故事

大家好啊,我是风间琉璃。上周给大家介绍了基因组学可视化的基石——IRanges对象(小白都能学会的基因组可视化,不来看看吗?)。不知道大家学习得怎么样了呀?今天我将给大家另外一类基于IRanges对象更加贴近基因组数据分析的GRanges对象。其所需要使用的包——“GenomicRanges”是Bioconductor项目中代表基因位置信息的基础。在Biconductor包的层次中,是基于IRanges包构建的。而GenomicRanges则是全基因组序列处理包——“BSgenome”、转录本相关分析包——“GenimicFeatures”的基础;同时作为sam文件转化包——“Rsamtools”,Fastq文件读取包——“ShortRead”,GTF/bed文件读取包——“rtracklayer”的输出对象格式;并且也是基因注释包——“GenomicAlignments”以及变异位点注释包——“VariantAnnotation”所使用的对象。

总而言之,IRanges以及GRanges对象就是biconductor基因组数据处理的基石。我们的专题是“基因组学可视化”也就绕不开IRanges以及GRanges对象的处理。我希望大家不要害怕这些新东西,其实熟悉之后IRanges和GRanges对象就类似于我们熟知的matrix、dataframe、list结构一样。所以不要害怕,我们就直接开始吧~ 首先我们先加载包。

〇、加载包

ling#  if (!require("BiocManager"))#      install.packages("BiocManager")#  BiocManager::install("GenomicRanges")library(GenomicRanges)

一、构建GRanges对象

gr <- GRanges( seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), score = 1:10, GC = seq(1, 0, length=10))#查看一下gr## GRanges object with 10 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   101-111      - |         1  1.000000##   b     chr2   102-112      + |         2  0.888889##   c     chr2   103-113      + |         3  0.777778##   d     chr2   104-114      * |         4  0.666667##   e     chr1   105-115      * |         5  0.555556##   f     chr1   106-116      + |         6  0.444444##   g     chr3   107-117      + |         7  0.333333##   h     chr3   108-118      + |         8  0.222222##   i     chr3   109-119      - |         9  0.111111##   j     chr3   110-120      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

我们可以看到一共有五列:

分别是染色体信息——seqnames;位置范围信息——ranges;正负链信息——strand;以及自定义score和GC含量两列,最前面的a-j则是每个基因的基因名。


其中的ranges列则是我们上一周学到的IRanges对象,而seqname以及strand则是我们上周学习的Rle对象,大家还有印象嘛?除此之外GRanges对象还可以定义每条染色体长度——seqlengths等信息,我们之后再说。那么构建出GRanges对象了,我们可以怎么处理呢?

二、获取GRanges对象的相关信息

#染色体信息seqnames(gr)## factor-Rle of length 10 with 4 runs##   Lengths:    1    3    2    4##   Values : chr1 chr2 chr1 chr3## Levels(3): chr1 chr2 chr3#范围信息ranges(gr)## IRanges object with 10 ranges and 0 metadata columns:##         start       end     width##     <integer> <integer> <integer>##   a       101       111        11##   b       102       112        11##   c       103       113        11##   d       104       114        11##   e       105       115        11##   f       106       116        11##   g       107       117        11##   h       108       118        11##   i       109       119        11##   j       110       120        11#起点start(gr)##  [1] 101 102 103 104 105 106 107 108 109 110#终点end(gr)##  [1] 111 112 113 114 115 116 117 118 119 120#正负链信息strand(gr)## factor-Rle of length 10 with 5 runs##   Lengths: 1 2 2 3 2##   Values : - + * + -## Levels(3): + - *#基因名names(gr)##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"##提取GRanges信息granges(gr)## GRanges object with 10 ranges and 0 metadata columns:##     seqnames    ranges strand##        <Rle> <IRanges>  <Rle>##   a     chr1   101-111      -##   b     chr2   102-112      +##   c     chr2   103-113      +##   d     chr2   104-114      *##   e     chr1   105-115      *##   f     chr1   106-116      +##   g     chr3   107-117      +##   h     chr3   108-118      +##   i     chr3   109-119      -##   j     chr3   110-120      -##   -------##   seqinfo: 3 sequences from an unspecified genome; no seqlengths##提取metadata信息,GC以及scoremcols(gr)## DataFrame with 10 rows and 2 columns##       score        GC##   <integer> <numeric>## a         1  1.000000## b         2  0.888889## c         3  0.777778## d         4  0.666667## e         5  0.555556## f         6  0.444444## g         7  0.333333## h         8  0.222222## i         9  0.111111## j        10  0.000000#提取scoremcols(gr)$score##  [1]  1  2  3  4  5  6  7  8  9 10##染色体长度seqlengths(gr)## chr1 chr2 chr3 ##   NA   NA   NA#重新赋值——染色体长度seqlengths(gr) <- c(249250621, 243199373, 198022430)##再看一下seqlengths(gr)##      chr1      chr2      chr3 ## 249250621 243199373 198022430##多少个基因length(gr)## [1] 10

三、操作GRanges对象

##将一个GRanges对象分为两个,并以GRangesList对象保存sp <- split(gr, rep(1:2, each=5))sp## GRangesList object of length 2:## $`1`## GRanges object with 5 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   101-111      - |         1  1.000000##   b     chr2   102-112      + |         2  0.888889##   c     chr2   103-113      + |         3  0.777778##   d     chr2   104-114      * |         4  0.666667##   e     chr1   105-115      * |         5  0.555556##   -------##   seqinfo: 3 sequences from an unspecified genome## ## $`2`## GRanges object with 5 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   f     chr1   106-116      + |         6  0.444444##   g     chr3   107-117      + |         7  0.333333##   h     chr3   108-118      + |         8  0.222222##   i     chr3   109-119      - |         9  0.111111##   j     chr3   110-120      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genome##Granges对象的合并 c(sp[[1]], sp[[2]])## GRanges object with 10 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   101-111      - |         1  1.000000##   b     chr2   102-112      + |         2  0.888889##   c     chr2   103-113      + |         3  0.777778##   d     chr2   104-114      * |         4  0.666667##   e     chr1   105-115      * |         5  0.555556##   f     chr1   106-116      + |         6  0.444444##   g     chr3   107-117      + |         7  0.333333##   h     chr3   108-118      + |         8  0.222222##   i     chr3   109-119      - |         9  0.111111##   j     chr3   110-120      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genome##取子集gr[2:3]## GRanges object with 2 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   b     chr2   102-112      + |         2  0.888889##   c     chr2   103-113      + |         3  0.777778##   -------##   seqinfo: 3 sequences from an unspecified genome##取子集并提取特定的metdata列gr[2:3, "GC"]## GRanges object with 2 ranges and 1 metadata column:##     seqnames    ranges strand |        GC##        <Rle> <IRanges>  <Rle> | <numeric>##   b     chr2   102-112      + |  0.888889##   c     chr2   103-113      + |  0.777778##   -------##   seqinfo: 3 sequences from an unspecified genome##替换 singles <- split(gr, names(gr))##按照基因名进行分割为GRangesListgrMod <- grgrMod[2] <- singles[[1]]##实现替换head(grMod, n=3)##查看前三行## GRanges object with 3 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   101-111      - |         1  1.000000##   b     chr1   101-111      - |         1  1.000000##   c     chr2   103-113      + |         3  0.777778##   -------##   seqinfo: 3 sequences from an unspecified genome##重复GRanges对象rep(singles[[2]], times = 3)## GRanges object with 3 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   b     chr2   102-112      + |         2  0.888889##   b     chr2   102-112      + |         2  0.888889##   b     chr2   102-112      + |         2  0.888889##   -------##   seqinfo: 3 sequences from an unspecified genome##反转GRangesrev(gr)## GRanges object with 10 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   j     chr3   110-120      - |        10  0.000000##   i     chr3   109-119      - |         9  0.111111##   h     chr3   108-118      + |         8  0.222222##   g     chr3   107-117      + |         7  0.333333##   f     chr1   106-116      + |         6  0.444444##   e     chr1   105-115      * |         5  0.555556##   d     chr2   104-114      * |         4  0.666667##   c     chr2   103-113      + |         3  0.777778##   b     chr2   102-112      + |         2  0.888889##   a     chr1   101-111      - |         1  1.000000##   -------##   seqinfo: 3 sequences from an unspecified genome##提取前2个基因head(gr,n=2)## GRanges object with 2 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   101-111      - |         1  1.000000##   b     chr2   102-112      + |         2  0.888889##   -------##   seqinfo: 3 sequences from an unspecified genome##提取后2个基因tail(gr,n=2)## GRanges object with 2 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   i     chr3   109-119      - |         9  0.111111##   j     chr3   110-120      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genome##提取第2个到第4个GRanges对象window(gr, start=2,end=4)## GRanges object with 3 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   b     chr2   102-112      + |         2  0.888889##   c     chr2   103-113      + |         3  0.777778##   d     chr2   104-114      * |         4  0.666667##   -------##   seqinfo: 3 sequences from an unspecified genome##提取第2-3,7-9g个的GRanges对象gr[IRanges(start=c(2,7), end=c(3,9))]## GRanges object with 5 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   b     chr2   102-112      + |         2  0.888889##   c     chr2   103-113      + |         3  0.777778##   g     chr3   107-117      + |         7  0.333333##   h     chr3   108-118      + |         8  0.222222##   i     chr3   109-119      - |         9  0.111111##   -------##   seqinfo: 3 sequences from an unspecified genome

四、GRanges对象内位置区间操作

##整合位置区间rangeg <- gr[1:3]#提取前三个基因GRanges对象g <- append(g, singles[[10]])#将第10个附加在g对象后start(g)#查看起点## [1] 101 102 103 110end(g)#查看终点## [1] 111 112 113 120width(g)#查看宽度## [1] 11 11 11 11range(g)##整合相近的区间## GRanges object with 3 ranges and 0 metadata columns:##       seqnames    ranges strand##          <Rle> <IRanges>  <Rle>##   [1]     chr1   101-111      -##   [2]     chr2   102-113      +##   [3]     chr3   110-120      -##   -------##   seqinfo: 3 sequences from an unspecified genome##翻转g#查看g## GRanges object with 4 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   101-111      - |         1  1.000000##   b     chr2   102-112      + |         2  0.888889##   c     chr2   103-113      + |         3  0.777778##   j     chr3   110-120      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genomeflank(g, 10)#以原GRanges对象起点为对称点,等长翻转(方向在+/-链上相反)## GRanges object with 4 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   112-121      - |         1  1.000000##   b     chr2    92-101      + |         2  0.888889##   c     chr2    93-102      + |         3  0.777778##   j     chr3   121-130      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genomeflank(g, 10, start=FALSE)#以原GRanges对象终点为对称点,等长翻转## GRanges object with 4 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1    91-100      - |         1  1.000000##   b     chr2   113-122      + |         2  0.888889##   c     chr2   114-123      + |         3  0.777778##   j     chr3   100-109      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genome##提升shift(g, 5)#范围往后推进5个碱基,与+/-链无关## GRanges object with 4 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   106-116      - |         1  1.000000##   b     chr2   107-117      + |         2  0.888889##   c     chr2   108-118      + |         3  0.777778##   j     chr3   115-125      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genome#重新定义范围大小resize(g, 30)##以转录起始位点为起点,与+/-链相关## GRanges object with 4 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1    82-111      - |         1  1.000000##   b     chr2   102-131      + |         2  0.888889##   c     chr2   103-132      + |         3  0.777778##   j     chr3    91-120      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genome##缩减为互不相交的范围reduce(g)## GRanges object with 3 ranges and 0 metadata columns:##       seqnames    ranges strand##          <Rle> <IRanges>  <Rle>##   [1]     chr1   101-111      -##   [2]     chr2   102-113      +##   [3]     chr3   110-120      -##   -------##   seqinfo: 3 sequences from an unspecified genome##染色体起点到基因起点的的GRangesg## GRanges object with 4 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   101-111      - |         1  1.000000##   b     chr2   102-112      + |         2  0.888889##   c     chr2   103-113      + |         3  0.777778##   j     chr3   110-120      - |        10  0.000000##   -------##   seqinfo: 3 sequences from an unspecified genomegaps(g)## GRanges object with 12 ranges and 0 metadata columns:##        seqnames        ranges strand##           <Rle>     <IRanges>  <Rle>##    [1]     chr1   1-249250621      +##    [2]     chr1         1-100      -##    [3]     chr1 112-249250621      -##    [4]     chr1   1-249250621      *##    [5]     chr2         1-101      +##    ...      ...           ...    ...##    [8]     chr2   1-243199373      *##    [9]     chr3   1-198022430      +##   [10]     chr3         1-109      -##   [11]     chr3 121-198022430      -##   [12]     chr3   1-198022430      *##   -------##   seqinfo: 3 sequences from an unspecified genome##去除重叠区域,得到差集disjoin(g)## GRanges object with 5 ranges and 0 metadata columns:##       seqnames    ranges strand##          <Rle> <IRanges>  <Rle>##   [1]     chr1   101-111      -##   [2]     chr2       102      +##   [3]     chr2   103-112      +##   [4]     chr2       113      +##   [5]     chr3   110-120      -##   -------##   seqinfo: 3 sequences from an unspecified genome##计算重叠的长度与次数coverage(g)## RleList of length 3## $chr1## integer-Rle of length 249250621 with 3 runs##   Lengths:       100        11 249250510##   Values :         0         1         0## ## $chr2## integer-Rle of length 243199373 with 5 runs##   Lengths:       101         1        10         1 243199260##   Values :         0         1         2         1         0## ## $chr3## integer-Rle of length 198022430 with 3 runs##   Lengths:       109        11 198022310##   Values :         0         1         0

五、多个GRanges对象位置区间操作

g2 <- head(gr, n=2)union(g, g2)#融合相同范围的GRanges对象内基因## GRanges object with 3 ranges and 0 metadata columns:##       seqnames    ranges strand##          <Rle> <IRanges>  <Rle>##   [1]     chr1   101-111      -##   [2]     chr2   102-113      +##   [3]     chr3   110-120      -##   -------##   seqinfo: 3 sequences from an unspecified genomeintersect(g, g2)#两个GRanges对象取交集## GRanges object with 2 ranges and 0 metadata columns:##       seqnames    ranges strand##          <Rle> <IRanges>  <Rle>##   [1]     chr1   101-111      -##   [2]     chr2   102-112      +##   -------##   seqinfo: 3 sequences from an unspecified genomesetdiff(g, g2)##找出差异位点以及区间## GRanges object with 2 ranges and 0 metadata columns:##       seqnames    ranges strand##          <Rle> <IRanges>  <Rle>##   [1]     chr2       113      +##   [2]     chr3   110-120      -##   -------##   seqinfo: 3 sequences from an unspecified genome##GRanges对象间的平行操作,即在GRanges间进行操作时,第一个GRanges对象的第一个元素对应第二个GRanges对象的第一个元素,以此类推。英文为parallel#所以在GRanges对象中的操作则是在对应的命令中添加p字母开头,比如:g3 <- g[1:2]g3## GRanges object with 2 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   a     chr1   101-111      - |         1  1.000000##   b     chr2   102-112      + |         2  0.888889##   -------##   seqinfo: 3 sequences from an unspecified genomeranges(g3[1]) <- IRanges(start=105, end=112)##替换g3的第一个元素范围#平行合并punion(g2, g3)## GRanges object with 2 ranges and 0 metadata columns:##     seqnames    ranges strand##        <Rle> <IRanges>  <Rle>##   a     chr1   101-112      -##   b     chr2   102-112      +##   -------##   seqinfo: 3 sequences from an unspecified genome#平行取交集pintersect(g2, g3)## GRanges object with 2 ranges and 3 metadata columns:##     seqnames    ranges strand |     score        GC       hit##        <Rle> <IRanges>  <Rle> | <integer> <numeric> <logical>##   a     chr1   105-111      - |         1  1.000000      TRUE##   b     chr2   102-112      + |         2  0.888889      TRUE##   -------##   seqinfo: 3 sequences from an unspecified genome#平行找差异psetdiff(g2, g3)## GRanges object with 2 ranges and 0 metadata columns:##     seqnames    ranges strand##        <Rle> <IRanges>  <Rle>##   a     chr1   101-104      -##   b     chr2   102-101      +##   -------##   seqinfo: 3 sequences from an unspecified genome##当然GRanges对象的相关位置区间还有很多,我们上一周提到的findOverlaps等等#这儿我们无法一一列举,如果同学们想知道更多,可以使用如下命令#  methods(class="GRanges")

在说完GRanges对象后,我们还需要知道GRanges的集合——GRangesList。

六、构建GRangesList对象以及获取相关信息

#构建GRangesList对象gr1 <- GRanges( seqnames = "chr2", ranges = IRanges(103, 106), strand = "+", score = 5L, GC = 0.45)gr2 <- GRanges( seqnames = c("chr1", "chr1"), ranges = IRanges(c(107, 113), width = 3), strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5))grl <- GRangesList("txA" = gr1, "txB" = gr2)grl## GRangesList object of length 2:## $txA## GRanges object with 1 range and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths## ## $txB## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   107-109      + |         3       0.3##   [2]     chr1   113-115      - |         4       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths##获取GRangesList对象的相关信息,类似于GRanges对象#######其输出的结果Rle list对象seqnames(grl)#染色体名## RleList of length 2## $txA## factor-Rle of length 1 with 1 run##   Lengths:    1##   Values : chr2## Levels(2): chr2 chr1## ## $txB## factor-Rle of length 2 with 1 run##   Lengths:    2##   Values : chr1## Levels(2): chr2 chr1ranges(grl)#范围## IRangesList object of length 2:## $txA## IRanges object with 1 range and 0 metadata columns:##           start       end     width##       <integer> <integer> <integer>##   [1]       103       106         4## ## $txB## IRanges object with 2 ranges and 0 metadata columns:##           start       end     width##       <integer> <integer> <integer>##   [1]       107       109         3##   [2]       113       115         3strand(grl)#正负链## RleList of length 2## $txA## factor-Rle of length 1 with 1 run##   Lengths: 1##   Values : +## Levels(3): + - *## ## $txB## factor-Rle of length 2 with 2 runs##   Lengths: 1 1##   Values : + -## Levels(3): + - *##但是下列命令与GRanges对象命令得到的结果不同length(grl)#GRanges的个数## [1] 2names(grl)#GRanges的名称## [1] "txA" "txB"seqlengths(grl)#GRanges的染色体长度## chr2 chr1 ##   NA   NA##查看每个GRanges对象的元素个数 elementNROWS(grl)## txA txB ##   1   2##GRangesList对象是否为空isEmpty(grl)## [1] FALSE##构建GRangesList的metadata,首先对每个GRanges进行命名mcols(grl) <- c("Transcript A","Transcript B")mcols(grl)## DataFrame with 2 rows and 1 column##            value##      <character>## txA Transcript A## txB Transcript B##将GRanges的metadata去列表化,得到dataframe格式mcols(unlist(grl))## DataFrame with 3 rows and 2 columns##         score        GC##     <integer> <numeric>## txA         5      0.45## txB         3      0.30## txB         4      0.50##将GRangesList转化GRangesul <- unlist(grl)ul## GRanges object with 3 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   txA     chr2   103-106      + |         5      0.45##   txB     chr1   107-109      + |         3      0.30##   txB     chr1   113-115      - |         4      0.50##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths

七、GRangeList对象内元素的操作

grl1 <- GRangesList( gr1 = GRanges("chr2", IRanges(3, 6)), gr2 = GRanges("chr1", IRanges(c(7,13), width = 3)))grl2 <- GRangesList( gr1 = GRanges("chr2", IRanges(9, 12)), gr2 = GRanges("chr1", IRanges(c(25,38), width = 3)))## 将两个GRangesList内的GRanges对象一一对应进行合并pc(grl1, grl2)## GRangesList object of length 2:## $gr1## GRanges object with 2 ranges and 0 metadata columns:##       seqnames    ranges strand##          <Rle> <IRanges>  <Rle>##   [1]     chr2       3-6      *##   [2]     chr2      9-12      *##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths## ## $gr2## GRanges object with 4 ranges and 0 metadata columns:##       seqnames    ranges strand##          <Rle> <IRanges>  <Rle>##   [1]     chr1       7-9      *##   [2]     chr1     13-15      *##   [3]     chr1     25-27      *##   [4]     chr1     38-40      *##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths##提取GRangeList的子集grl[1]#提取第一个GRangesList## GRangesList object of length 1:## $txA## GRanges object with 1 range and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsgrl[[1]]#提取第一个GRanges## GRanges object with 1 range and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsgrl["txA"]#提取“txA”GRangesList## GRangesList object of length 1:## $txA## GRanges object with 1 range and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsgrl$txB#提取“txB”这一GRanges## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   107-109      + |         3       0.3##   [2]     chr1   113-115      - |         4       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths##提取GRangesList特定列表的特定列grl[1, "score"]## GRangesList object of length 1:## $txA## GRanges object with 1 range and 1 metadata column:##       seqnames    ranges strand |     score##          <Rle> <IRanges>  <Rle> | <integer>##   [1]     chr2   103-106      + |         5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsgrl["txB", "GC"]## GRangesList object of length 1:## $txB## GRanges object with 2 ranges and 1 metadata column:##       seqnames    ranges strand |        GC##          <Rle> <IRanges>  <Rle> | <numeric>##   [1]     chr1   107-109      + |       0.3##   [2]     chr1   113-115      - |       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths##其他对GRangesList对象的元素操作,类似于GRanges对象rep(grl[[1]], times = 3)## GRanges object with 3 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   [2]     chr2   103-106      + |         5      0.45##   [3]     chr2   103-106      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsrev(grl)## GRangesList object of length 2:## $txB## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   107-109      + |         3       0.3##   [2]     chr1   113-115      - |         4       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths## ## $txA## GRanges object with 1 range and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthshead(grl, n=1)## GRangesList object of length 1:## $txA## GRanges object with 1 range and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthstail(grl, n=1)## GRangesList object of length 1:## $txB## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   107-109      + |         3       0.3##   [2]     chr1   113-115      - |         4       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthswindow(grl, start=1, end=1)## GRangesList object of length 1:## $txA## GRanges object with 1 range and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsgrl[IRanges(start=2, end=2)]## GRangesList object of length 1:## $txB## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   107-109      + |         3       0.3##   [2]     chr1   113-115      - |         4       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths

八、GRangeList对象内的位置信息区间操作

start(grl)## IntegerList of length 2## [["txA"]] 103## [["txB"]] 107 113end(grl)## IntegerList of length 2## [["txA"]] 106## [["txB"]] 109 115width(grl)## IntegerList of length 2## [["txA"]] 4## [["txB"]] 3 3##计算每个GRanges对象内的总宽度sum(width(grl))## txA txB ##   4   6##GRangeList内的基因向后平移位置信息shift(grl, 20)## GRangesList object of length 2:## $txA## GRanges object with 1 range and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   123-126      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths## ## $txB## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   127-129      + |         3       0.3##   [2]     chr1   133-135      - |         4       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths#GRangeList内的基因查看交集coverage(grl)## RleList of length 2## $chr2## integer-Rle of length 106 with 2 runs##   Lengths: 102   4##   Values :   0   1## ## $chr1## integer-Rle of length 115 with 4 runs##   Lengths: 106   3   3   3##   Values :   0   1   0   1#####对GRangeList对象间进行循环操作#####lapply(grl, length)##输出列表格式## $txA## [1] 1## ## $txB## [1] 2sapply(grl, length)#输出interger格式## txA txB ##   1   2##提升grl2 <- shift(grl, 10)names(grl2) <- c("shiftTxA", "shiftTxB")###GRangesList对象间进行一一对应合并mapply(c, grl, grl2)#mapply和Map功能相同## $txA## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   [2]     chr2   113-116      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths## ## $txB## GRanges object with 4 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   107-109      + |         3       0.3##   [2]     chr1   113-115      - |         4       0.5##   [3]     chr1   117-119      + |         3       0.3##   [4]     chr1   123-125      - |         4       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsMap(c, grl, grl2)## $txA## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   [2]     chr2   113-116      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths## ## $txB## GRanges object with 4 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   107-109      + |         3       0.3##   [2]     chr1   113-115      - |         4       0.5##   [3]     chr1   117-119      + |         3       0.3##   [4]     chr1   123-125      - |         4       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths##GRangesList中每个List内进行反转endoapply(grl, rev)## GRangesList object of length 2:## $txA## GRanges object with 1 range and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths## ## $txB## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   113-115      - |         4       0.5##   [2]     chr1   107-109      + |         3       0.3##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths#合并GRangesList中每个List内进行反转mendoapply(c, grl, grl2)## GRangesList object of length 2:## $txA## GRanges object with 2 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   [2]     chr2   113-116      + |         5      0.45##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths## ## $txB## GRanges object with 4 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr1   107-109      + |         3       0.3##   [2]     chr1   113-115      - |         4       0.5##   [3]     chr1   117-119      + |         3       0.3##   [4]     chr1   123-125      - |         4       0.5##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths##将GRangesList去交集Reduce(c, grl)## GRanges object with 3 ranges and 2 metadata columns:##       seqnames    ranges strand |     score        GC##          <Rle> <IRanges>  <Rle> | <integer> <numeric>##   [1]     chr2   103-106      + |         5      0.45##   [2]     chr1   107-109      + |         3      0.30##   [3]     chr1   113-115      - |         4      0.50##   -------##   seqinfo: 2 sequences from an unspecified genome; no seqlengths##如何大家还想进一步了解GRangesList的命令,可以使用#methods(class="GRangesList")

九、GRangesList和GRanges互相间的区间位置的操作

findOverlaps(gr, grl)##找到GRanges对象和GRangesList对象互相重叠区域## Hits object with 5 hits and 0 metadata columns:##       queryHits subjectHits##       <integer>   <integer>##   [1]         2           1##   [2]         3           1##   [3]         4           1##   [4]         5           2##   [5]         6           2##   -------##   queryLength: 10 / subjectLength: 2countOverlaps(gr, grl)##计算在gr(GRanges)对象中与grl对象的重叠区域## a b c d e f g h i j ## 0 1 1 1 1 1 0 0 0 0subsetByOverlaps(gr,grl)##提取与grl对象有交集的gr子集## GRanges object with 5 ranges and 2 metadata columns:##     seqnames    ranges strand |     score        GC##        <Rle> <IRanges>  <Rle> | <integer> <numeric>##   b     chr2   102-112      + |         2  0.888889##   c     chr2   103-113      + |         3  0.777778##   d     chr2   104-114      * |         4  0.666667##   e     chr1   105-115      * |         5  0.555556##   f     chr1   106-116      + |         6  0.444444##   -------##   seqinfo: 3 sequences from an unspecified genome##找出GRanges以及GRangesList对象内的每个元素重叠的次数findOverlaps(gr, grl, select="first")#以gr为query对象##  [1] NA  1  1  1  2  2 NA NA NA NAfindOverlaps(grl, gr, select="first")## [1] 2 5

十、实战

如果我们现在有一个GRanges对象,暂且命名为x。那么我们如何知道这个GRanges对象附近有哪些基因呢?

#提取Hg19版本的一直基因合集txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene#使用genomicFeature包的genes功能提取hg19的基因信息broads <- GenomicFeatures::genes(txdb)##   403 genes were dropped because they have exons located on both strands##   of the same reference sequence or on more than one reference sequence,##   so cannot be represented by a single genomic range.##   Use 'single.strand.genes.only=FALSE' to get all the genes in a##   GRangesList object, or use suppressMessages() to suppress this message.#x为我们随意设置的GRanges对象,为我们的查询对象x <- GRanges( seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), score = 1:10, GC = seq(1, 0, length=10))#提取在第1、2、3号染色体的所有基因subject <- broads[ seqnames(broads) %in% seqlevels(x) ]##找到与GRanges对象——x相近的元素的距离nearest(x, subject)##  [1] 3887 2979 2979 3710   99   99  571  571  410  410nearest(x)##如果没有subject,则x自己作为subject##  [1]  5  4  4  3  6  5  8  7 10  9##找到x之前存在在subject的最近基因precede(x, subject)##  [1]   NA 2979 2979 2979   99   99  571  571   NA   NA## 找到x之后存在在subject的最近基因follow(x, subject)##  [1] 3887   NA   NA 3710 3887   NA   NA   NA  410  410##找到在subject中与x最近的元素,默认k=1nearestKNeighbors(x, subject)## IntegerList of length 10## [[1]] 3887## [[2]] 2979## [[3]] 2979## [[4]] 3710## [[5]] 99## [[6]] 99## [[7]] 571## [[8]] 571## [[9]] 410## [[10]] 410nearestKNeighbors(x, subject, k=10)## IntegerList of length 10## [[1]] 3887 4146 110 4344 1938 48 1776 4608 3502 2566## [[2]] 2979 3034 4082 4157 2870 233 3622 4248 3309 3924## [[3]] 2979 3034 4082 4157 2870 233 3622 4248 3309 3924## [[4]] 3710 2979 3034 4082 4157 2870 233 3622 4248 3309## [[5]] 99 3887 4283 3731 1023 2172 4504 4969 2383 1655## [[6]] 99 4283 3731 1023 2172 4504 4969 2383 1655 2611## [[7]] 571 1885 1192 2864 3481 3704 2344 4673 3194 4990## [[8]] 571 1885 1192 2864 3481 3704 2344 4673 3194 4990## [[9]] 410 2309 2885 2010 18 273 328 113 2857 2813## [[10]] 410 2309 2885 2010 18 273 328 113 2857 2813###若无subject,x同样自己可以作为subjectnearestKNeighbors(x)## IntegerList of length 10## [[1]] 5## [[2]] 3## [[3]] 2## [[4]] 2## [[5]] 1## [[6]] 5## [[7]] 8## [[8]] 7## [[9]] 10## [[10]] 9nearestKNeighbors(x, k=10)## IntegerList of length 10## [[1]] 5 5## [[2]] 3 4## [[3]] 2 4## [[4]] 2 3## [[5]] 1 6 1## [[6]] 5## [[7]] 8## [[8]] 7## [[9]] 10 10## [[10]] 9 9####那么得到了我们的索引结果,如何知道是哪个基因呢?index=nearestKNeighbors(x, subject)[[1]]gene_id=subject[index,]gene_id#查看一下,基因id为653635## GRanges object with 1 range and 1 metadata column:##          seqnames      ranges strand |     gene_id##             <Rle>   <IRanges>  <Rle> | <character>##   653635     chr1 14362-29961      - |      653635##   -------##   seqinfo: 93 sequences (1 circular) from hg19 genome

正如前文所说的,GRanges对象是Bioconductor项目的基石,在我们的实际应用中具有非常重要的作用,大家看到最后真的非常不容易啦在写稿子的时候我一直在想怎么把GRanges对象的介绍写的既平易近人又干货满满,大家上手还快。希望这一篇稿子有助于你快速了解GRanges对象,并且能有不同的启发。好啦,我是风间琉璃,我们下期见~


往期传送门


手把手教你甲基化生信分析——甲基化minfi包的使用(一)

手把手教你甲基化生信分析—甲基化minfi包的使用(二)

手把手教你甲基化分析——甲基化CHAMP包的使用(一)

五分钟学会甲基化芯片处理,快上车!!!

一首歌的时间,我就掌握了这个生信技能

解锁高阶甲基化分析流程?你的SCI又多了一张图

2020年了!!这样的文章也能发近5分!

甲基化分析实战,将你的数据用在刀刃上!

小白都能学会的基因组可视化,不来看看吗?

万字长文教你入门biocondutor基石,小白都能看懂~

END
撰文丨风间琉璃
排版丨四金兄
值班 | 菠小萝
主编丨小雪球

欢迎大家关注解螺旋生信频道-挑圈联靠公号~

又是一年国自然报告季!
经常有小伙伴来问:

国自然标书不会写,怎么?

本周四老谈酸菜直播讲解:

如何写好一份国自然标书!

赶紧点击:预约

老谈酸菜助你冲刺国自然!

👇👇👇

周四酸谈直播基本信息

主题:《如何写好一份国自然标书》
日期:1月14日 18:00—20:00
(👆点击“预”就完事了!!!

长按识别二维码免费包邮领取!

万字长文教你入门biocondutor基石,小白都能看懂~

万字长文教你入门biocondutor基石,小白都能看懂~

生物医学科研方法

诺奖技术培训!第17期CRISPR基因编辑技术与动物建模学习班

2021-1-16 6:14:38

生物医学科研方法

​武大博后做社工,清北硕博去街道办!是大材小用?还是内卷下的逃离?

2021-1-16 6:24:47