Regex to parse each line of an Apache log file
Question:
I need help: I'm trying to parse an Apache log file with the regular expression below, but it doesn't work; matches() returns false.
private String accessLogRegex()
{
String regex1 = "^([\\d.]+)"; // Client IP
String regex2 = " (\\S+)"; // -
String regex3 = " (\\S+)"; // -
String regex4 = " \\[([\\w:/]+\\s[+\\-]\\d{4})\\]"; // Date
String regex5 = " \"(.+?)\""; // request method and url
String regex6 = " (\\d{3})"; // HTTP code
String regex7 = " (\\d+|(.+?))"; // Number of bytes
String regex8 = " \"([^\"]+|(.+?))\""; // Referer
String regex9 = " \"([^\"]+|(.+?))\""; // Agent
return regex1+regex2+regex3+regex4+regex5+regex6+regex7+regex8+regex9;
}
Pattern accessLogPattern = Pattern.compile(accessLogRegex());
Matcher entryMatcher;
String log = "64.242.88.10 | 2004-07-25.16:36:22 | \"GET /twiki/bin/rdiff/Main/ConfigurationVariables HTTP/1.1\" 401 1284 | Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)";
entryMatcher = accessLogPattern.matcher(log);
if(!entryMatcher.matches()){
System.out.println("" + index +" : couldn't be parsed");
}
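I've included a sample line from the Apache log above; its fields are separated by pipes ("|").
Answer
Is there a reason you want to use regular expressions? They are easy to get wrong and a maintenance nightmare...
An alternative is to use a library for this, for example this one.
That said, if you do want to use a regex, yours contains a number of errors with respect to the sample log line you gave: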
String regex1 = "^([\\d.]+)"; // while quite liberal, this should work
String regex2 = " (\\S+)"; // matches the first pipe
String regex3 = " (\\S+)"; // this will match the date field
String regex4 = " \\[([\\w:/]+\\s[+\\-]\\d{4})\\]"; // date has already been matched so this won't work, also this is all wrong
String regex5 = " \"(.+?)\""; // you're not matching the pipe character before the URL; also, why the ?
String regex6 = " (\\d{3})"; // HTTP code
String regex7 = " (\\d+|(.+?))"; // Why are you also matching any other characters than just digits?
String regex8 = " \"([^\"]+|(.+?))\""; // Your sample log line doesn't contain a referer
String regex9 = " \"([^\"]+|(.+?))\""; // Agent is not enclosed in quotes
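One way to confirm where the original pattern stops matching is to append the pieces one at a time and test each cumulative prefix with `Matcher.lookingAt()`, which only requires a match at the start of the input. This is just a debugging sketch; the names `RegexPrefixDebugger` and `firstFailingPiece` are made up for illustration, and the pieces are the nine strings from the question.

```java
import java.util.regex.Pattern;

final class RegexPrefixDebugger {
    // The nine pieces exactly as posted in the question.
    static final String[] ORIGINAL_PIECES = {
        "^([\\d.]+)",
        " (\\S+)",
        " (\\S+)",
        " \\[([\\w:/]+\\s[+\\-]\\d{4})\\]",
        " \"(.+?)\"",
        " (\\d{3})",
        " (\\d+|(.+?))",
        " \"([^\"]+|(.+?))\"",
        " \"([^\"]+|(.+?))\""
    };

    // The pipe-separated sample line from the question.
    static final String SAMPLE =
        "64.242.88.10 | 2004-07-25.16:36:22 | "
        + "\"GET /twiki/bin/rdiff/Main/ConfigurationVariables HTTP/1.1\" "
        + "401 1284 | Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)";

    // Returns the 0-based index of the first piece whose addition makes the
    // cumulative pattern fail to match a prefix of the input, or -1 if
    // every cumulative prefix still matches.
    static int firstFailingPiece(String[] pieces, String input) {
        StringBuilder cumulative = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            cumulative.append(pieces[i]);
            Pattern p = Pattern.compile(cumulative.toString());
            if (!p.matcher(input).lookingAt()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int idx = firstFailingPiece(ORIGINAL_PIECES, SAMPLE);
        System.out.println("first failing piece index: " + idx);
    }
}
```

For the sample line this reports index 3, i.e. regex4: the first pipe and the date have already been consumed by the two `\\S+` groups, so the next character is a pipe rather than the `[` that regex4 expects.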
One possible regex solution is this:
String regex1 = "^([\\d.]+)"; // digits and dots: the IP
String regex2 = " \\|"; // why match any character if you *know* there is a pipe?
String regex3 = " ((?:\\d+[-:.])+\\d+)"; // match the date; don't capture the inner group as we are only interested in the full date
String regex4 = " \\|"; // pipe
String regex5 = " \"(.+)\""; // request method and url
String regex6 = " (\\d{3})"; // HTTP code
String regex7 = " (\\d+)"; // Number of bytes
String regex8 = " \\|"; // pipe again
String regex9 = " (.+)"; // The rest of the line is the user agent
Of course, this may need further tweaking if other log lines do not follow exactly the same format.
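Put together, the pieces above can be exercised against the sample line roughly as follows. This is a minimal sketch, not a hardened parser: the class name `PipeLogParser` and the `parse` helper are hypothetical, and it assumes every line has exactly the pipe-separated layout of the sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PipeLogParser {
    // Same pieces as the suggested solution, concatenated in order.
    private static final Pattern LINE = Pattern.compile(
        "^([\\d.]+)"                 // group 1: client IP
        + " \\|"                     // literal pipe
        + " ((?:\\d+[-:.])+\\d+)"    // group 2: date, e.g. 2004-07-25.16:36:22
        + " \\|"                     // literal pipe
        + " \"(.+)\""                // group 3: request method, URL, protocol
        + " (\\d{3})"                // group 4: HTTP status code
        + " (\\d+)"                  // group 5: number of bytes
        + " \\|"                     // literal pipe
        + " (.+)");                  // group 6: user agent (rest of line)

    // The sample line from the question, with the inner quotes escaped.
    static final String SAMPLE =
        "64.242.88.10 | 2004-07-25.16:36:22 | "
        + "\"GET /twiki/bin/rdiff/Main/ConfigurationVariables HTTP/1.1\" "
        + "401 1284 | Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)";

    // Returns the six captured fields, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parse(SAMPLE);
        if (fields == null) {
            System.out.println("couldn't be parsed");
        } else {
            for (String field : fields) {
                System.out.println(field);
            }
        }
    }
}
```

Returning null for a non-matching line keeps the sketch short; in real code you might prefer an `Optional` or a small record type for the fields.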
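Thanks @brain99. Using the parser lib, I'm trying to pass the time in the format yyyy-MM-dd.HH:mm:ss for the '%{format}t' string. –
If you use the library I linked, sure - just configure the log format correctly (I believe it should be something like 'String logformat = "%h |%{%Y-%m-%d.%H:%M:%...'). You can then choose to retrieve the date as a timestamp, or have individual fields (year, month, day, ...) in your POJO. – brain99
Wow, you're the best... Could you please make the logformat above work for any pipe-separated Apache log file? Thanks –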
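If you stay with the regex instead of the library, the captured date string can still be split into individual fields with the standard `java.time` API. This is a swapped-in alternative, not how the linked library does it; `LogDateParser` is a hypothetical name, and the pattern assumes the yyyy-MM-dd.HH:mm:ss layout of the sample line with no time zone.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogDateParser {
    // '-', '.' and ':' are non-letter characters, so DateTimeFormatter
    // treats them as literals in the pattern.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd.HH:mm:ss");

    // Throws DateTimeParseException if the text does not fit the pattern.
    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime t = parse("2004-07-25.16:36:22");
        System.out.println(t.getYear() + " " + t.getMonthValue() + " " + t.getDayOfMonth());
    }
}
```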