SAS: How do I find the nth instance of a character or group of characters in a string?

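Question:

I am trying to find a function that returns the index of the nth instance of a character.

For example, if I have the string ABABABBABSSSDDEE and I want to find the third instance of A, how would I do that? And what if I want to find the fourth instance of AB, i.e. the AB that starts at position 8 of ABABABBABSSSDDEE?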

data HAVE; 
    length STRING $50;  /* without this, input $ keeps only the first 8 characters */
    input STRING $; 
    datalines; 
ABABABBABSSSDDEE 
; 
RUN; 

What have you tried so far? Have you read up on regular expressions? SAS uses the Perl regular expression engine. – gwillie

You could also use FIND in a loop for simple cases, although regular expressions are the better approach for complex ones. – kl78

@gwillie No regex so far, but I will look into it. In the past I have used combinations of index and find with substr, but this next level of complexity may call for regex. TY.

data _null_; 
findThis = 'A'; *** substring to find; 
findIn = 'ADABAACABAAE'; **** the string to search; 
instanceOf=1; *** and the instance of the substring we want to find; 
pos = 0; 
len = 0; 
startHere = 1; 
endAt = length(findIn); 
n = 0; *** count occurrences of the pattern; 
pattern = '/' || findThis || '/'; 
rx = prxparse(pattern); 
CALL PRXNEXT(rx, startHere, endAt, findIn, pos, len); 
if pos le 0 then do; 
    put 'Could not find ' findThis ' in ' findIn; 
end; 
else do while (pos gt 0); 
    n+1; 
    if n eq instanceOf then leave; 
    CALL PRXNEXT(rx, startHere, endAt, findIn, pos, len); 
end; 
if n eq instanceOf then do; 
    put 'found ' instanceOf 'th instance of ' findThis ' at position ' pos ' in ' findIn; 
end; 
else do; 
    put 'No ' instanceOf 'th instance of ' findThis ' found'; 
end; 
run; 
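
The same PRXNEXT loop can also be run against a dataset rather than a literal. The following is only a sketch under a few assumptions: the dataset name nth_regex and the variables wanted, nth_pos, start and stop are illustrative names, the sample data is rebuilt here so the step runs on its own, and the search term contains no regex metacharacters.

```sas
/* Sample data from the question; $50 avoids the default 8-character truncation */
data have;
    length string $50;
    input string $;
    datalines;
ABABABBABSSSDDEE
;
run;

/* nth_regex and its variable names are illustrative, not part of the original answer */
data nth_regex;
    set have;
    retain rx;
    if _n_ = 1 then rx = prxparse('/AB/');  /* compile the pattern once */
    wanted  = 4;                /* which occurrence to locate */
    start   = 1;                /* PRXNEXT moves this past each match */
    stop    = length(string);   /* search up to the last non-blank character */
    n       = 0;
    nth_pos = 0;                /* stays 0 when there are fewer than WANTED matches */
    call prxnext(rx, start, stop, string, pos, len);
    do while (pos > 0);
        n = n + 1;
        if n = wanted then do;
            nth_pos = pos;
            leave;
        end;
        call prxnext(rx, start, stop, string, pos, len);
    end;
    put string= nth_pos=;
    keep string nth_pos;
run;
```

In ABABABBABSSSDDEE, AB starts at positions 1, 3, 5 and 8, so nth_pos should come out as 8. Because PRXNEXT resumes after the end of each match, occurrences found this way never overlap.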

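Below is a solution that uses the find() function with a do loop inside a data step. I then took that code and put it into proc fcmp to create my own function, find_n(). This should greatly simplify any task that needs this lookup, and it allows the code to be reused.

Define the data: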

data have; 
    length string $50; 
    input string $; 
    datalines; 
ABABABBABSSSDDEE 
; 
run; 

Do loop solution:

data want; 
    set have; 
    search_term = 'AB'; 
    nth_time = 4; 
    counter = 0; 
    last_find = 0; 

    start = 1; 
    pos = find(string,search_term,'',start); 
    do while (pos gt 0 and nth_time gt counter); 
        last_find = pos; 
        start = pos + 1; 
        counter = counter + 1; 
        /* resume one character past the last hit so adjacent occurrences (e.g. 'AA') are not skipped */
        pos = find(string,search_term,'',start); 
    end; 

    if nth_time eq counter then do;  
        put "The nth occurrence was found at position " last_find; 
    end; 
    else do; 
        put "Could not find the nth occurrence"; 
    end; 

run; 
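
Whether occurrences are allowed to overlap is a design choice when stepping through a string with find(). As a hedged sketch, assuming non-overlapping matches are wanted (which is what the regex approach gives), the search can resume after the whole match rather than one character further on. The dataset name nth_nonoverlap and the variable nth_position are illustrative, and the test string is chosen so that overlapping matches are possible.

```sas
/* Illustrative names throughout; 'AAAA' contains overlapping copies of 'AA' */
data nth_nonoverlap;
    string      = 'AAAA';
    search_term = 'AA';
    nth_time    = 2;
    counter     = 0;
    last_find   = 0;

    pos = find(string, search_term);
    do while (pos gt 0 and counter lt nth_time);
        last_find = pos;
        counter   = counter + 1;
        /* resume after the whole match, so occurrences cannot overlap;
           FIND returns 0 once the start position is past the end of the string */
        pos = find(string, search_term, pos + length(search_term));
    end;

    nth_position = ifn(counter eq nth_time, last_find, 0);
    put nth_position=;
run;
```

Here the second non-overlapping AA starts at position 3. A loop that advances a single character after each hit would instead report position 2, so it is worth deciding up front which behaviour a given task needs.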

Define the proc fcmp function:

Note: the function returns 0 if the nth occurrence cannot be found.

options cmplib=work.temp;  /* cmplib takes libref.dataset; the package level belongs only in outlib */

proc fcmp outlib=work.temp.temp; 

    function find_n(string $, search_term $, nth_time); 

        counter = 0; 
        last_find = 0; 

        start = 1; 
        pos = find(string,search_term,'',start); 
        do while (pos gt 0 and nth_time gt counter); 
            last_find = pos; 
            start = pos + 1; 
            counter = counter + 1; 
            /* resume one character past the last hit so adjacent occurrences (e.g. 'AA') are not skipped */
            pos = find(string,search_term,'',start); 
        end; 

        /* 0 signals that fewer than nth_time occurrences exist */
        result = ifn(nth_time eq counter, last_find, 0); 

        return (result); 
    endsub; 

run; 

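proc fcmp usage:

Note that the function is called twice here. The first call shows the solution to the original request; the second shows what happens when no match is found.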

data want; 
    set have; 
    nth_position = find_n(string, "AB", 4); 
    put nth_position =; 

    nth_position = find_n(string, "AB", 5); 
    put nth_position =; 
run; 
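
The question also asked about a single character, the third instance of A. A short sketch of that call follows. It assumes find_n has already been compiled by the proc fcmp step, that the cmplib option points at that library, and that the have dataset defined earlier exists; nth_char and third_a are illustrative names.

```sas
/* Assumes find_n is compiled and visible through the cmplib option,
   and that work.have holds ABABABBABSSSDDEE */
data nth_char;
    set have;
    third_a = find_n(string, "A", 3);  /* third occurrence of the single character A */
    put third_a=;
run;
```

A occurs at positions 1, 3, 5 and 8 in ABABABBABSSSDDEE, so third_a should be 5.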

Why the downvote?