Extracting docIDs and text from a file and putting them in a HashMap
Problem description:
I have text like this:
.I 1
.T
experimental investigation of the aerodynamics of a
wing in a slipstream .
.A
brenckman,m.
.B
j. ae. scs. 25, 1958, 324.
.W
experimental investigation of the aerodynamics of a
wing in a slipstream .
an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .
.I 2
.T
simple shear flow past a flat plate in an incompressible fluid of small
viscosity .
.A
ting-yili
.B
department of aeronautical engineering, rensselaer polytechnic
institute
troy, n.y.
.W
simple shear flow past a flat plate in an incompressible fluid of small
viscosity .the discussion here is restricted to two-dimensional incompressible steady flow .
.I 3
.T
the boundary layer in simple shear flow past a flat plate .
.A
m. b. glauert
.B
department of mathematics, university of manchester, manchester,
england
.W
the boundary layer in simple shear flow past a flat plate .
the boundary-layer equations are presented for steady
flow with no pressure gradient .
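I need a regular expression in Java that does the following: whenever it encounters ".I 1", it returns the text that follows ".W" and ends just before ".I 2".
Answer
I think the simplest approach is to find the first match with the following pattern: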
(?<=\.I\s1\s)[\W\w]+(?=\.I\s2)
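The text after ".W" can then be pulled out of that first match with a second pattern:
(?<=\.W\s)[\W\w]+
The first pattern gives this result: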
.T
experimental investigation of the aerodynamics of a
wing in a slipstream .
.A
brenckman,m.
.B
j. ae. scs. 25, 1958, 324.
.W
experimental investigation of the aerodynamics of a
wing in a slipstream .
an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .
Applying the second pattern to the first match then gives:
experimental investigation of the aerodynamics of a
wing in a slipstream .
an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .
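The two steps above can be sketched in isolation as follows. This is only an illustration: the class name `TwoStepDemo`, the helper names `record` and `body`, and the shortened `SAMPLE` text are assumptions made for the example and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepDemo {
    // Shortened sample in the same layout as the question's text (hypothetical data).
    static final String SAMPLE =
        ".I 1\n.T\nwing in a slipstream .\n.A\nbrenckman,m.\n.W\n"
      + "wing in a slipstream .\nan empirical evaluation .\n"
      + ".I 2\n.T\nsimple shear flow .\n.W\nsimple shear flow .\n";

    // Step 1: everything between ".I <id>" and ".I <nextId>".
    static String record(String text, int id, int nextId) {
        Pattern p = Pattern.compile(
            "(?<=\\.I\\s" + id + "\\s)[\\W\\w]+(?=\\.I\\s" + nextId + ")");
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    // Step 2: everything after ".W" inside a single record.
    static String body(String record) {
        Matcher m = Pattern.compile("(?<=\\.W\\s)[\\W\\w]+").matcher(record);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String rec = record(SAMPLE, 1, 2);
        System.out.println(body(rec));
    }
}
```

The lookbehind and lookahead keep the ".I" and ".W" markers out of the match, so no capturing groups are needed. `[\W\w]+` is a way of matching any character including newlines without setting a flag.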
In your case it could look like this:
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocExtractor {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        String text = " ... "; // your text here
        int lastId = 3; // the sample text contains three ".I" records

        // Second pattern: everything after ".W" inside one record.
        Pattern r2 = Pattern.compile("(?<=\\.W\\s)[\\W\\w]+");

        for (int i = 1; i <= lastId; i++) {
            // First pattern: everything between ".I i" and ".I i+1";
            // the last record runs to the end of the text instead.
            String end = (i == lastId) ? "$" : "\\.I\\s" + (i + 1);
            Pattern r1 = Pattern.compile("(?<=\\.I\\s" + i + "\\s)[\\W\\w]+(?=" + end + ")");
            Matcher m1 = r1.matcher(text);
            if (m1.find()) {
                Matcher m2 = r2.matcher(m1.group(0));
                if (m2.find()) {
                    hashMap.put(".I " + i, m2.group(0));
                }
            }
        }

        for (Map.Entry<String, String> item : hashMap.entrySet()) {
            System.out.println(item.getKey());
            System.out.println(item.getValue());
            System.out.println();
        }
    }
}
Result:
.I 2
simple shear flow past a flat plate in an incompressible fluid of small
viscosity .the discussion here is restricted to two-dimensional incompressible steady flow .
.I 1
experimental investigation of the aerodynamics of a
wing in a slipstream .
an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .
.I 3
the boundary layer in simple shear flow past a flat plate .
the boundary-layer equations are presented for steady
flow with no pressure gradient .
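The entries above are not printed in document order because HashMap does not guarantee any iteration order. If the order matters, or if the number of records is not known in advance, one possible variation is to scan all records in a single pass and store them in a LinkedHashMap. The sketch below is not the answer's method; the class name `AllRecords`, the `parse` helper, the sample text, and the assumption that lines end with "\n" and that ".I" and ".W" stand alone on their lines are all made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllRecords {
    // Hypothetical sample in the same layout as the question's text.
    static final String SAMPLE =
        ".I 1\n.T\ntitle one\n.W\nbody one\nmore of one\n"
      + ".I 2\n.T\ntitle two\n.W\nbody two\n";

    // MULTILINE lets ^ and $ match at line boundaries; DOTALL lets . match newlines.
    // Group 1 is the document ID, group 2 is the text after the ".W" line.
    private static final Pattern RECORD = Pattern.compile(
        "^\\.I (\\d+)\\n.*?^\\.W\\n(.*?)(?=^\\.I \\d+$|\\z)",
        Pattern.MULTILINE | Pattern.DOTALL);

    static Map<Integer, String> parse(String text) {
        Map<Integer, String> docs = new LinkedHashMap<>(); // keeps insertion order
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            docs.put(Integer.parseInt(m.group(1)), m.group(2).trim());
        }
        return docs;
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach((id, body) -> System.out.println(id + " -> " + body));
    }
}
```

Because the lookahead stops each match just before the next ".I" line (or at the end of the input), the loop does not need to know how many records exist.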
Thank you, Alexey. It works. – user3701435
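You're welcome! –
OK, now the question is: what patterns have you tried? –
With Java you need to turn on MULTILINE mode https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#MULTILINE Then something like \.I\s1.*?\.W(.*?)\.I\s2 should work (it needs some escaping). If the number after the I matters to you, you may need to add more groups. Alternatively, since the last thing you match seems to be part of the next thing you want to match, you may want to exclude it. I tend to write unit tests for this kind of thing and then tweak the regex until it works. Maybe you could post some code to show exactly what you need? –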
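One way to read the suggestion in the last comment is sketched here. It uses Pattern.DOTALL rather than MULTILINE, on the understanding that DOTALL is the flag that lets `.` match line terminators, while MULTILINE only changes where `^` and `$` match. The class name `SingleRegexDemo`, the `bodyBetween` helper, and the sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleRegexDemo {
    // Hypothetical sample in the same layout as the question's text.
    static final String SAMPLE =
        ".I 1\n.T\ntitle one\n.W\nbody one\n.I 2\n.T\ntitle two\n.W\nbody two\n";

    // Returns the text between the ".W" of record <id> and the ".I <nextId>" marker,
    // or null when no such pair of markers is found.
    static String bodyBetween(String text, int id, int nextId) {
        Pattern p = Pattern.compile(
            "\\.I\\s" + id + "\\s.*?\\.W\\s(.*?)\\.I\\s" + nextId + "\\s",
            Pattern.DOTALL); // DOTALL: '.' also matches newlines
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(bodyBetween(SAMPLE, 1, 2));
    }
}
```

With this form the last record has no following ".I" marker, so `bodyBetween(SAMPLE, 2, 3)` returns null; the final record would need its own pattern ending at the end of the input, as the answer does with `$`.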