Java: sequentially parsing information from a file
Problem description:
Let's say I have a file with a structure like this:
Line 0: 354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED
Line 1: 543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED
and so on...
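Now I want to extract the information that is marked in my example (see the gray background; that is, the number and the string that precede AA). The sequence AA is always present (and can be used as a break to skip to the next line), while the information strings vary in length.
What is the best way to parse this information? With if, then, else, or a buffered reader? Or is there some kind of parser that you can tell: read a number of some length XYZ, then read everything as a String until you find AA, then skip the rest of the line?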
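Answer
I would read the file line by line and match each line against a regular expression. I hope the comments in the code below are detailed enough.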
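import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The pattern to use
Pattern p = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");
// Read file line by line; try-with-resources closes the reader afterwards
try (BufferedReader br = new BufferedReader(new FileReader(myFile))) {
    String line;
    while ((line = br.readLine()) != null) {
        // Match line against our pattern
        Matcher m = p.matcher(line);
        if (m.find()) {
            // Line is valid, process it however you want
            // m.group(1) contains the number
            // m.group(2) contains the text between number and AA
            // (including the whitespace right before AA, so trim() it if needed)
        } else {
            // Line has invalid format (pattern does not match)
        }
    }
}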
Explanation of the regular expression (pattern) I use:
^([0-9]+)\s+(([^A]|A[^A])+)AA
^ matches the start of the line
([0-9]+) matches one or more digits (the number)
\s+ matches one or more whitespace characters
(([^A]|A[^A])+) matches a run of characters, each of which is either not an A or an A that is not followed by another A
AA matches the terminating AA
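To see the pattern at work, here is a minimal self-contained sketch that applies it to the two sample lines from the question. The class name RegexDemo and the helper parse are hypothetical names chosen for illustration, and the trim() call is an addition, since group 2 as written also captures the space before AA.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    // Same pattern as in the answer above
    private static final Pattern P =
            Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns {number, text}, or null if the line does not match
    static String[] parse(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return null;
        }
        // group(2) ends with the whitespace that precedes AA, so trim it
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] lines = {
            "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED",
            "543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED"
        };
        for (String line : lines) {
            String[] parsed = parse(line);
            System.out.println(parsed == null ? "no match" : parsed[0] + " -> " + parsed[1]);
        }
    }
}
```

Returning null for a non-matching line is only one possible policy; throwing an exception or collecting the bad lines would work just as well, depending on how strict the input format is.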
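Update in reply to a comment:
If every line has a leading | character, the expression looks like this:
^\|([0-9]+)\s+(([^A]|A[^A])+)AA
In Java, you need to escape it like this:
"^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA"
The character | has a special meaning in regular expressions (alternation), so it has to be escaped to match it literally.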
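As a small illustration of the escaping, the sketch below matches a sample line with a leading |. The sample line and the names PipeDemo and number are made up for this example. It also shows Pattern.quote, a standard java.util.regex method that turns an arbitrary string into a literal pattern, as an alternative to escaping by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeDemo {
    // Escaped by hand: \\| in Java source is \| in the regex, a literal pipe
    static final Pattern ESCAPED =
            Pattern.compile("^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Same pattern, letting Pattern.quote build the literal prefix
    static final Pattern QUOTED =
            Pattern.compile("^" + Pattern.quote("|") + "([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns the number from a line, or null if there is no match
    static String number(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "|354858 Some String That Is Important AA OTHER STUFF";
        System.out.println(number(ESCAPED, line));
        System.out.println(number(QUOTED, line));
    }
}
```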
Answer
Without more information, it is impossible to tell you which approach is best for your problem.
One solution could be:
String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
String[] split = s.substring(0, s.indexOf(" AA")).split(" ", 2);
System.out.println("split = " + Arrays.toString(split));
Output:
split = [354858, Some String That Is Important]
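Note that the snippet above assumes " AA" occurs in the line; if it does not, indexOf returns -1 and substring throws a StringIndexOutOfBoundsException. A minimal sketch of a guarded variant follows. The class name SplitDemo and the method splitBeforeMarker are hypothetical, and returning an empty Optional for lines that cannot be parsed is just one possible policy.

```java
import java.util.Arrays;
import java.util.Optional;

public class SplitDemo {
    // Hypothetical helper: splits "number text AA rest" into {number, text}
    static Optional<String[]> splitBeforeMarker(String s) {
        int end = s.indexOf(" AA");
        if (end < 0) {
            return Optional.empty(); // marker missing
        }
        String[] parts = s.substring(0, end).split(" ", 2);
        // Require both a number part and a text part
        return parts.length == 2 ? Optional.of(parts) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        splitBeforeMarker(s).ifPresent(p -> System.out.println("split = " + Arrays.toString(p)));
        System.out.println(splitBeforeMarker("no marker here").isPresent());
    }
}
```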
Answer
Here is a solution:
public static void main(String[] args) {
    InputStream source; // select a text source (should be a FileInputStream)
    {
        String fileContent = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED\n" +
                "543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        source = new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8));
    }
    // Decode with the same charset that was used to encode the content
    try (BufferedReader stream = new BufferedReader(new InputStreamReader(source, StandardCharsets.UTF_8))) {
        // Group 1: the number; group 2: the text up to the first " AA " (reluctant match)
        Pattern pattern = Pattern.compile("^([0-9]+) (.*?) AA .*$");
        while (true) {
            String line = stream.readLine();
            if (line == null) {
                break;
            }
            Matcher matcher = pattern.matcher(line);
            if (matcher.matches()) {
                String someNumber = matcher.group(1);
                String someText = matcher.group(2);
                // do something with someNumber and someText
            } else {
                throw new ParseException(line, 0);
            }
        }
    } catch (IOException | ParseException e) {
        e.printStackTrace(); // TODO ...
    }
}
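Answer
You can use a regular expression, but if you know that every line contains AA and that the content you want comes before AA, you can simply use substring(int, int) to get the part of the line up to AA: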
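public List<String> read(Path path) throws IOException {
    // Files.lines keeps the file open, so close the stream when done
    try (Stream<String> lines = Files.lines(path)) {
        return lines.map(this::parseLine)
                    .collect(Collectors.toList());
    }
}
public String parseLine(String line) {
    int index = line.indexOf("AA");
    // Assumes every line contains AA; otherwise index is -1 and substring throws
    return line.substring(0, index);
}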
Here is read written with a BufferedReader instead of a stream:
public List<String> read(Path path) throws IOException {
    List<String> content = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new FileReader(path.toFile()))) {
        String line;
        while ((line = reader.readLine()) != null) {
            content.add(parseLine(line));
        }
    }
    return content;
}
Answer
A non-Java 8 version: you can read the file line by line and drop the part of each line that starts at the charSequence AA:
final String charSequence = "AA";
String line;
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("yourfilename")));
try {
    while ((line = r.readLine()) != null) {
        int pos = line.indexOf(charSequence);
        if (pos > 0) {
            String myImportantStuff = line.substring(0, pos);
            // do something with your useful string
        }
    }
} finally {
    r.close();
}
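One caveat with the indexOf-based approaches: indexOf("AA") finds the first occurrence, so an important string that itself contains AA would be cut short. The sketch below assumes, which the question does not state, that the terminating AA is always a standalone token with a space on each side. The sample line containing "AAA" is invented for illustration, as are the names MarkerDemo, naive, and tokenAware.

```java
public class MarkerDemo {
    // Cuts at the first "AA", wherever it occurs
    static String naive(String line) {
        int pos = line.indexOf("AA");
        return pos >= 0 ? line.substring(0, pos) : line;
    }

    // Assumption: the real marker is " AA " with a space on each side
    static String tokenAware(String line) {
        int pos = line.indexOf(" AA ");
        return pos >= 0 ? line.substring(0, pos) : line;
    }

    public static void main(String[] args) {
        String line = "354858 Call AAA Support AA OTHER STUFF";
        System.out.println("[" + naive(line) + "]");      // cut inside "AAA"
        System.out.println("[" + tokenAware(line) + "]"); // cut at the standalone AA
    }
}
```

This only narrows the problem: if the important string can contain a standalone " AA " as well, or if AA can end a line with nothing after it, the format needs a less ambiguous delimiter or a stricter rule.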
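What you want is called a [regular expression](https://en.wikipedia.org/wiki/Regular_expression). – m0skit0
That is exactly what I was looking for, thank you! – Flatron
Are you sure that "AA" cannot appear inside "Some String That Is Important"? –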