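SAS: How do I find the nth instance of a character or group of characters within a string?
Question:
I'm trying to find a function that returns the index of the nth instance of a character.
For example, given the string ABABABBABSSSDDEE, how would I find the third instance of A? And what if I wanted the fourth instance of AB, which is the AB that follows the BB (ABABABB AB SSSDDEE)?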
data HAVE;
    input STRING :$20.;  /* explicit width: a bare $ defaults to 8 characters and would truncate the value */
datalines;
ABABABBABSSSDDEE
;
RUN;
Answer
data _null_;
    findThis = 'A';            *** substring to find;
    findIn = 'ADABAACABAAE';   *** the string to search;
    instanceOf = 1;            *** and the instance of the substring we want to find;
    pos = 0;
    len = 0;
    startHere = 1;
    endAt = length(findIn);
    n = 0;                     *** count occurrences of the pattern;
    pattern = '/' || findThis || '/';
    rx = prxparse(pattern);
    CALL PRXNEXT(rx, startHere, endAt, findIn, pos, len);
    if pos le 0 then do;
        put 'Could not find ' findThis ' in ' findIn;
    end;
    else do while (pos gt 0);
        n+1;
        if n eq instanceOf then leave;
        CALL PRXNEXT(rx, startHere, endAt, findIn, pos, len);
    end;
    if n eq instanceOf then do;
        put 'found ' instanceOf 'th instance of ' findThis ' at position ' pos ' in ' findIn;
    end;
    else do;
        put 'No ' instanceOf 'th instance of ' findThis ' found';
    end;
run;
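To tie this back to the question, here is a minimal sketch of the same PRXNEXT loop applied to the question's string and its "fourth AB" case. It assumes the search text lives in a variable declared longer than its value (the lengths $50 and $10 are arbitrary choices), which is why strip() is used to keep padding blanks out of the pattern. The search text is still interpreted as a regular expression, so characters such as . or * would keep their special meaning.

```sas
data _null_;
    length findIn $50 findThis $10;     /* illustrative lengths, not from the answer */
    findIn     = 'ABABABBABSSSDDEE';    /* string from the question */
    findThis   = 'AB';
    instanceOf = 4;

    /* strip() removes the blanks that pad findThis out to 10 characters */
    rx = prxparse('/' || strip(findThis) || '/');

    startHere = 1;
    endAt     = length(findIn);
    pos = 0;
    len = 0;
    n   = 0;

    call prxnext(rx, startHere, endAt, findIn, pos, len);
    do while (pos gt 0);
        n = n + 1;
        if n eq instanceOf then leave;
        call prxnext(rx, startHere, endAt, findIn, pos, len);
    end;

    /* report 0 when fewer than instanceOf matches exist */
    if n ne instanceOf then pos = 0;
    put findThis= instanceOf= pos=;
run;
```

For this input the log should show pos=8, since AB starts at positions 1, 3, 5 and 8.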
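Answer
Below is a solution that uses the find() function inside a DO loop in a data step. I then took that code and put it into proc fcmp to create my own function, find_n(). This greatly simplifies any task that needs this logic and allows the code to be reused.
Define the data:
data have;
    length string $50;
    input string $;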
datalines;
ABABABBABSSSDDEE
;
run;
DO loop solution:
data want;
    set have;
    search_term = 'AB';
    nth_time = 4;
    counter = 0;
    last_find = 0;
    start = 1;
    pos = find(string,search_term,'',start);
    do while (pos gt 0 and nth_time gt counter);
        last_find = pos;
        start = pos + 1;  /* resume one character past the last hit */
        counter = counter + 1;
        /* search from start, not start+1, so that adjacent matches such as 'A' in 'AAA' are not skipped */
        pos = find(string,search_term,'',start);
    end;
    if nth_time eq counter then do;
        put "The nth occurrence was found at position " last_find;
    end;
    else do;
        put "Could not find the nth occurrence";
    end;
run;
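One design choice the loop leaves open is whether overlapping matches should count. The following is a minimal sketch, assuming non-overlapping matches are wanted: it advances the search past the whole match by adding length(search_term) instead of moving a fixed number of characters. The test string 'AAAAA' and the variable values are illustrative only and do not come from the question.

```sas
data _null_;
    string      = 'AAAAA';   /* illustrative input */
    search_term = 'AA';
    nth_time    = 2;
    counter     = 0;
    last_find   = 0;

    pos = find(string, search_term);
    do while (pos gt 0 and counter lt nth_time);
        last_find = pos;
        counter   = counter + 1;
        /* jump past the whole match so the next hit cannot overlap it */
        pos = find(string, search_term, pos + length(search_term));
    end;

    /* report 0 when fewer than nth_time matches exist */
    if counter ne nth_time then last_find = 0;
    put last_find=;
run;
```

With these values the second non-overlapping AA starts at position 3, so the log should show last_find=3. A search that allowed overlaps would report position 2 instead.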
Define the proc fcmp function:
Note: the function returns 0 if the nth occurrence cannot be found.
options cmplib=work.temp;  /* CMPLIB= takes libref.dataset; the package name is only used in OUTLIB= */
proc fcmp outlib=work.temp.temp;
    function find_n(string $, search_term $, nth_time);
        counter = 0;
        last_find = 0;
        start = 1;
        pos = find(string,search_term,'',start);
        do while (pos gt 0 and nth_time gt counter);
            last_find = pos;
            start = pos + 1;  /* resume one character past the last hit */
            counter = counter + 1;
            /* search from start, not start+1, so adjacent matches are not skipped */
            pos = find(string,search_term,'',start);
        end;
        /* 0 signals that fewer than nth_time occurrences exist */
        result = ifn(nth_time eq counter, last_find, 0);
        return (result);
    endsub;
run;
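Example proc fcmp usage:
Note that this calls the function twice. The first call shows the solution to the original request. The second shows what happens when no match is found.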
data want;
    set have;
    nth_position = find_n(string, "AB", 4);
    put nth_position =;
    nth_position = find_n(string, "AB", 5);
    put nth_position =;
run;
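Why the downvote? –
What have you tried so far? Have you read up on 'regular expressions'? 'SAS' uses the 'PERL' regular expression engine. – gwillie
You could also use FIND in a loop for simple cases, although regular expressions are the better approach for complex ones... – kl78
@gwillie No 'regex' so far, but I will look into it. In the past I have used 'index' and 'find' in combination with 'substr', but this next level of complexity may require regex. TY. –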