How to delete rows that start with a special character in R

Question:

I have a data frame and I want to delete all rows that start with #. Can anyone tell me how to do this? Thanks in advance.

#ID_REF = The name of the probe set, blank for control probes   
    #VALUE = The signal value calculated by MAS5, normalized    
    #ABS_CALL = The detection value calculated by the MAS5   
    #DETECTION P-VALUE = The detection p-value calculated by the MAS5   
    *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE* 
    AFFX-BioB-5_at 757.7 P 0.00039 
    AFFX-BioB-M_at 933.7 P 0.000095 
    AFFX-BioB-3_at 525.6 P 0.000095 
    AFFX-BioC-5_at 1999.5 P 0.000044 
    AFFX-BioC-3_at 2339.5 P 0.000044 
    AFFX-BioDn-5_at 4321.3 P 0.000044 
    AFFX-BioDn-3_at 9229.4 P 0.00007 
    AFFX-CreX-5_at 21949.9 P 0.000044 
    AFFX-CreX-3_at 26022.8 P 0.000044 
    AFFX-DapX-5_at 1171.1 P 0.00006 
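
If the data is already in a data frame, one minimal sketch is to keep only the rows whose first column does not begin with `#`, allowing for leading spaces. The data frame `df` and its column names below are hypothetical stand-ins for the question's data.

```r
# Hypothetical data frame standing in for the question's data;
# the column names are assumptions
df <- data.frame(
  ID_REF = c("#ID_REF = The name of the probe set",
             "    #VALUE = The signal value calculated by MAS5",
             "AFFX-BioB-5_at",
             "AFFX-BioB-M_at"),
  VALUE  = c(NA, NA, 757.7, 933.7),
  stringsAsFactors = FALSE
)

# TRUE for rows whose first column starts with '#', optionally after spaces/tabs
is_comment <- grepl("^[[:space:]]*#", df$ID_REF)

# Keep the other rows and renumber them
df_clean <- df[!is_comment, , drop = FALSE]
rownames(df_clean) <- NULL
print(df_clean)
```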

Try `read.delim('yourfile', comment.char = '#')` – akrun 2015-02-10 16:33:00
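
A minimal sketch of the `comment.char` idea on a few inline lines, assuming the fields are separated by whitespace (the default `sep = ""` of `read.table`) rather than by tabs as `read.delim` expects. The header row is left out and the column names are supplied by hand, because `DETECTION P-VALUE` contains a space and would split into two fields; the names used here are assumptions.

```r
# Hypothetical sample: comment lines (one indented) followed by data rows
txt <- c("#ID_REF = The name of the probe set",
         "    #VALUE = The signal value calculated by MAS5",
         "    AFFX-BioB-5_at 757.7 P 0.00039",
         "    AFFX-BioB-M_at 933.7 P 0.000095")

# comment.char = "#" discards everything from '#' to the end of the line;
# with the default whitespace separator, lines that start with '#'
# (even after leading spaces) are treated as blank and skipped
d <- read.table(text = txt, comment.char = "#", header = FALSE,
                col.names = c("ID_REF", "VALUE", "ABS_CALL", "DETECTION_P_VALUE"),
                stringsAsFactors = FALSE)
print(d)
```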


Possible duplicate of http://*.com/questions/28433328/skip-comment-line-in-csv-file-using-r – akrun 2015-02-10 16:36:07


@akrun, it removes some of the lines with '#', but combines all the data into one row – AwaitedOne 2015-02-10 16:38:37

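In some of the lines the comment character (#) is not the first character. One approach is to trim the leading and trailing spaces, remove the lines containing the comment character with grep ("lines2"), and then read the rest with read.csv.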

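lines <- readLines('awaited.csv')                # raw lines of the file
lines1 <- gsub('^ +| +$', '', lines)             # trim leading/trailing spaces
lines2 <- lines1[!grepl('^#|^.*#', lines1)]      # drop every line containing '#'
d1 <- read.csv(text=lines2, check.names=FALSE, stringsAsFactors=FALSE)  # parse the remaining lines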
str(d1) 
#'data.frame': 54682 obs. of 4 variables: 
# $ *ID_REF**   : chr "AFFX-BioB-5_at" "AFFX-BioB-M_at" "AFFX-BioB-3_at" "AFFX-BioC-5_at" ... 
# $ VALUE**   : num 758 934 526 2000 2340 ... 
# $ ABS_CALL**  : chr "P" "P" "P" "P" ... 
# $ DETECTION P-VALUE*: num 3.9e-04 9.5e-05 9.5e-05 4.4e-05 4.4e-05 4.4e-05 7.0e-05 4.4e-05 4.4e-05 6.0e-05 ... 
head(d1,3) 
#  *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE* 
#1 AFFX-BioB-5_at 757.7   P   3.9e-04 
#2 AFFX-BioB-M_at 933.7   P   9.5e-05 
#3 AFFX-BioB-3_at 525.6   P   9.5e-05 
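
Why the trimming step matters: with leading spaces, an anchored pattern such as `^#` does not match the indented comment lines. The answer's pattern `'^#|^.*#'` sidesteps this by dropping any line that contains `#` anywhere. A small check on two hypothetical lines (`trimws()` is base R, available since R 3.2.0):

```r
# Two hypothetical lines: an indented comment line and a data row
x <- c("    #VALUE = The signal value calculated by MAS5",
       "AFFX-BioB-5_at 757.7 P 0.00039")

starts_hash         <- grepl("^#", x)          # FALSE FALSE: '#' is not the first character
starts_hash_trimmed <- grepl("^#", trimws(x))  # TRUE FALSE: matches once spaces are removed

# Keep only the non-comment lines
kept <- x[!starts_hash_trimmed]
print(kept)
```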

Or you can remove all the characters before the # on each line (`sub('.*(#.*)', ...)`) and then use the `comment.char='#'` argument of read.csv.

d2 <- read.csv(text=sub('.*(#.*)', '\\1', lines),  # drop everything before '#'
    check.names=FALSE, stringsAsFactors=FALSE, comment.char='#')  # '#' lines become comments
dim(d2) 
#[1] 54682  4 
head(d2,3) 
#  *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE* 
#1 AFFX-BioB-5_at 757.7   P   3.9e-04 
#2 AFFX-BioB-M_at 933.7   P   9.5e-05 
#3 AFFX-BioB-3_at 525.6   P   9.5e-05 
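
What the `sub()` call does, shown on a few hypothetical lines: the greedy `.*` removes everything before the last `#` on a line, so indented comment lines now start with `#` and `comment.char` can discard them, while lines without `#` are left unchanged. One caveat of this sketch: a data line that itself contains `#` would be reduced to its trailing part and then dropped as a comment. The column names are assumptions.

```r
x <- c("#ID_REF = The name of the probe set",
       "    #VALUE = The signal value calculated by MAS5",
       "AFFX-BioB-5_at 757.7 P 0.00039",
       "AFFX-BioB-M_at 933.7 P 0.000095")

# Drop everything before the last '#' on each line; lines without '#' do not match
y <- sub(".*(#.*)", "\\1", x)
print(y)

# Every comment line now starts with '#', so comment.char removes it
d <- read.table(text = y, comment.char = "#",
                col.names = c("ID_REF", "VALUE", "ABS_CALL", "DETECTION_P_VALUE"),
                stringsAsFactors = FALSE)
print(d)
```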

Same error for me: `Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 2 did not have 2 elements` – AwaitedOne 2015-02-10 17:16:24


@AwaitedOne Try `fill = TRUE` – akrun 2015-02-10 17:24:15
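
`fill = TRUE` only pads short rows with blank fields; it does not show why the rows differ in length. One possible cause of this `scan()` error is a mismatch between the separator `read.csv` expects (`,`) and the one actually used in the file; this sketch assumes that cause. `count.fields()` reports how many fields each line has under a given `sep`. The lines and the helper `fields_per_line` are hypothetical.

```r
# Hypothetical lines: a header, a space-separated row, and a comma-separated row
txt <- c("ID_REF VALUE ABS_CALL P_VALUE",
         "AFFX-BioB-5_at 757.7 P 0.00039",
         "AFFX-BioB-M_at,933.7,P,0.000095")

# Hypothetical helper: number of fields on each line for a given separator
fields_per_line <- function(lines, sep) {
  con <- textConnection(lines)
  on.exit(close(con))
  count.fields(con, sep = sep)
}

n_ws    <- fields_per_line(txt, sep = "")   # whitespace: 4 4 1
n_comma <- fields_per_line(txt, sep = ",")  # comma:      1 1 4
print(n_ws)
print(n_comma)

# fill = TRUE pads the short third line instead of failing,
# which hides the mismatch rather than fixing it
d <- read.table(text = txt, header = TRUE, fill = TRUE)
print(d)
```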


I gave it a try, but it doesn't seem to work. I ran your code above. – AwaitedOne 2015-02-10 17:26:07