apply - Overlap across dataframes in R

Question

asked Jan 24, 2021 in Technique by 深蓝

I am trying to check the overlap between one main file and several other files (overlap_files in the code below).

Main file:

chr1    8014812 8014812
chr1    22371954    22371954
chr1    35328666    35328666

Example of one of the overlap_files:

chr1    8014812 8014812
chr1    22371954    22371954

My code looks like this:

# Load variants
a1 <- read.table("main.txt", header=FALSE)

# Begin looping over the other files
overlap = lapply(overlap_files, function(x) {
  # Load in "x" file, skipping empty files (t stays NULL for those)
  t = if (!file.size(x) == 0) {
    read.table(x, header=FALSE)
  }
  # Overlap: for each row of a1, '1' if the same triple appears in t
  apply(a1, 1, function(x)
    ifelse(any(x[1]==t$V1 & x[2]==t$V2 & x[3]==t$V3), '1', '0'))
})

Although the first two rows exist in both files, in the output the first variant is marked as 0 (it should be 1), the second as 1 (correct) and the third as 0 (correct). It seems to be caused by the difference in length (8014812 has 7 digits, while the other two numbers have 8 digits). Is there a way to fix this? Thank you.
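A likely explanation for the digit-length effect, sketched under the assumption that read.table() parses V1 as character and V2/V3 as integers: apply() first converts the data frame with as.matrix(), and because V1 is a character column, the numeric columns are turned into strings with format(), which pads them to a common width. The 7-digit value becomes " 8014812" (with a leading space), so it no longer equals 8014812. One way around this is to skip apply() and compare one key string per row with %in%; make_key() is a hypothetical helper name, and the sketch assumes both files parse the coordinates with the same type.

```r
# Sample rows from the question, typed as read.table() would parse them
# (assumption: V1 character, V2/V3 integer).
a1 <- data.frame(V1 = c("chr1", "chr1", "chr1"),
                 V2 = c(8014812L, 22371954L, 35328666L),
                 V3 = c(8014812L, 22371954L, 35328666L))
t1 <- a1[1:2, ]  # stands in for one of the overlap files

# apply() works on as.matrix(a1). With a character column present, the
# integer columns are formatted to a common width, so the 7-digit value
# gains a leading space and the later string comparison fails.
m <- as.matrix(a1)
print(m[, "V2"])

# Workaround sketch: one key string per row, then a vectorized
# membership test that never goes through as.matrix().
make_key <- function(df) paste(df$V1, df$V2, df$V3, sep = ":")  # hypothetical helper
flags <- as.integer(make_key(a1) %in% make_key(t1))
print(flags)
```

If a file is empty and t is NULL, make_key(NULL) returns character(0), so every row is simply flagged 0.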

1 Answer

深蓝 · Answer 1 · 2021-01-24T02:53:20+0000

From your example, I am not entirely sure what separators your files use (tabs?).
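For reading the files, the separator may not matter much: with read.table()'s default sep = "", any run of spaces or tabs counts as one separator. A small check under that assumption, writing the sample rows to temporary files (the file names are arbitrary):

```r
# Write the same rows once tab-separated and once space-separated.
tab_file   <- tempfile(fileext = ".txt")
space_file <- tempfile(fileext = ".txt")
writeLines(c("chr1\t8014812\t8014812", "chr1\t22371954\t22371954"), tab_file)
writeLines(c("chr1    8014812 8014812", "chr1    22371954    22371954"), space_file)

# Default sep = "" splits on any whitespace, so both parse the same way.
from_tabs   <- read.table(tab_file, header = FALSE)
from_spaces <- read.table(space_file, header = FALSE)
str(from_tabs)
```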

Either way, I would propose the following approach:

1. Read in the files as data frames (one per file).
2. Use one of dplyr's join functions (e.g. dplyr::inner_join() or dplyr::semi_join()). They return all rows that match, and you can match across several columns at once with the by argument.
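The join approach could look roughly like this, assuming the dplyr package is installed. flag_overlap() is a hypothetical helper that returns a 1/0 flag for each row of the main file, in its original order, like the question's loop does:

```r
library(dplyr)

# Hypothetical helper: 1 if the V1/V2/V3 triple of a row in `main`
# also appears in `other`, 0 otherwise (order of `main` is kept).
flag_overlap <- function(main, other) {
  if (is.null(other) || nrow(other) == 0) return(integer(nrow(main)))
  hits <- other %>%
    distinct(V1, V2, V3) %>%   # avoid duplicating rows of `main`
    mutate(hit = 1L)
  main %>%
    left_join(hits, by = c("V1", "V2", "V3")) %>%   # values compared, not strings
    mutate(hit = coalesce(hit, 0L)) %>%
    pull(hit)
}

# Sample data from the question.
main <- data.frame(V1 = "chr1",
                   V2 = c(8014812L, 22371954L, 35328666L),
                   V3 = c(8014812L, 22371954L, 35328666L))
other <- main[1:2, ]
flags <- flag_overlap(main, other)
print(flags)
```

Inside the question's lapply(), the body would read the file (or use NULL for an empty one) and return flag_overlap(a1, that_data_frame). If only the overlapping rows are needed rather than a flag per row, dplyr::semi_join(main, other, by = c("V1", "V2", "V3")) returns them directly.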
