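WhatWeb Source Code Analysis: Execution Flow
The first post covered part of the WhatWeb source code. This post records a debugging session with WhatWeb and the execution flow I pieced together from it.
Before debugging, it helps to run WhatWeb's help command to see every option it offers and get a rough idea of what it can do.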
ruby whatweb -h
.$$$ $. .$$$ $.
$$$$ $$. .$$$ $$$ .$$$$$$. .$$$$$$$$$$. $$$$ $$. .$$$$$$$. .$$$$$$.
$ $$ $$$ $ $$ $$$ $ $$$$$$. $$$$$ $$$$$$ $ $$ $$$ $ $$ $$ $ $$$$$$.
$ `$ $$$ $ `$ $$$ $ `$ $$$ $$' $ `$ `$$ $ `$ $$$ $ `$ $ `$ $$$'
$. $ $$$ $. $$$$$$ $. $$$$$$ `$ $. $ :' $. $ $$$ $. $$$$ $. $$$$$.
$::$ . $$$ $::$ $$$ $::$ $$$ $::$ $::$ . $$$ $::$ $::$ $$$$
$;;$ $$$ $$$ $;;$ $$$ $;;$ $$$ $;;$ $;;$ $$$ $$$ $;;$ $;;$ $$$$
$$$$$$ $$$$$ $$$$ $$$ $$$$ $$$ $$$$ $$$$$$ $$$$$ $$$$$$$$$ $$$$$$$$$'
WhatWeb - Next generation web scanner version 0.4.8-dev.
Developed by Andrew Horton aka urbanadventurer and Brendan Coles.
Homepage: http://www.morningstarsecurity.com/research/whatweb
Usage: whatweb [options] <URLs>
TARGET SELECTION:
<TARGETs> Enter URLs, hostnames, IP adddresses,
filenames, or nmap-format IP address ranges.
--input-file=FILE, -i Read targets from a file. You can pipe
hostnames or URLs directly with -i /dev/stdin.
TARGET MODIFICATION:
--url-prefix Add a prefix to target URLs.
--url-suffix Add a suffix to target URLs.
--url-pattern Insert the targets into a URL.
e.g. example.com/%insert%/robots.txt
AGGRESSION:
The aggression level controls the trade-off between speed/stealth and
reliability.
--aggression, -a=LEVEL Set the aggression level. Default: 1.
1. Stealthy Makes one HTTP request per target and also
follows redirects.
3. Aggressive If a level 1 plugin is matched, additional
requests will be made.
4. Heavy Makes a lot of HTTP requests per target. URLs
from all plugins are attempted.
HTTP OPTIONS:
--user-agent, -U=AGENT Identify as AGENT instead of WhatWeb/0.4.8-dev.
--header, -H Add an HTTP header. eg "Foo:Bar". Specifying a
default header will replace it. Specifying an
empty value, e.g. "User-Agent:" will remove it.
--follow-redirect=WHEN Control when to follow redirects. WHEN may be
`never', `http-only', `meta-only', `same-site',
`same-domain' or `always'. Default: always.
--max-redirects=NUM Maximum number of redirects. Default: 10.
AUTHENTICATION:
--user, -u=<user:password> HTTP basic authentication.
--cookie, -c=COOKIES Use cookies, e.g. 'name=value; name2=value2'.
PROXY:
--proxy <hostname[:port]> Set proxy hostname and port.
Default: 8080.
--proxy-user <username:password> Set proxy user and password.
PLUGINS:
--list-plugins, -l List all plugins.
--info-plugins, -I=[SEARCH] List all plugins with detailed information.
Optionally search with keywords in a comma
delimited list.
--search-plugins=STRING Search plugins for a keyword.
--plugins, -p=LIST Select plugins. LIST is a comma delimited set
of selected plugins. Default is all.
Each element can be a directory, file or plugin
name and can optionally have a modifier, +/-.
Examples: +/tmp/moo.rb,+/tmp/foo.rb
title,md5,+./plugins-disabled/
./plugins-disabled,-md5
-p + is a shortcut for -p +plugins-disabled.
--grep, -g=STRING Search for STRING in HTTP responses. Reports
with a plugin named Grep.
--custom-plugin=DEFINITION Define a custom plugin named Custom-Plugin,
Examples: ":text=>'powered by abc'"
":version=>/powered[ ]?by ab[0-9]/"
":ghdb=>'intitle:abc \"powered by abc\"'"
":md5=>'8666257030b94d3bdb46e05945f60b42'"
"{:text=>'powered by abc'}"
--dorks=PLUGIN List Google dorks for the selected plugin.
OUTPUT:
--verbose, -v Verbose output includes plugin descriptions.
Use twice for debugging.
--colour,--color=WHEN control whether colour is used. WHEN may be
`never', `always', or `auto'.
--quiet, -q Do not display brief logging to STDOUT.
--no-errors Suppress error messages.
LOGGING:
--log-brief=FILE Log brief, one-line output.
--log-verbose=FILE Log verbose output.
--log-errors=FILE Log errors.
--log-xml=FILE Log XML format.
--log-json=FILE Log JSON format.
--log-sql=FILE Log SQL INSERT statements.
--log-sql-create=FILE Create SQL database tables.
--log-json-verbose=FILE Log JSON Verbose format.
--log-magictree=FILE Log MagicTree XML format.
--log-object=FILE Log Ruby object inspection format.
--log-mongo-database Name of the MongoDB database.
--log-mongo-collection Name of the MongoDB collection.
Default: whatweb.
--log-mongo-host MongoDB hostname or IP address.
Default: 0.0.0.0.
--log-mongo-username MongoDB username. Default: nil.
--log-mongo-password MongoDB password. Default: nil.
PERFORMANCE & STABILITY:
--max-threads, -t Number of simultaneous threads. Default: 25.
--open-timeout Time in seconds. Default: 15.
--read-timeout Time in seconds. Default: 30.
--wait=SECONDS Wait SECONDS between connections.
This is useful when using a single thread.
HELP & MISCELLANEOUS:
--short-help Short usage help.
--help, -h Complete usage help.
--debug Raise errors in plugins.
--version Display version information.
EXAMPLE USAGE:
* Scan example.com.
./whatweb example.com
* Scan reddit.com slashdot.org with verbose plugin descriptions.
./whatweb -v reddit.com slashdot.org
* An aggressive scan of wired.com detects the exact version of WordPress.
./whatweb -a 3 www.wired.com
* Scan the local network quickly and suppress errors.
whatweb --no-errors 192.168.0.0/24
* Scan the local network for https websites.
whatweb --no-errors --url-prefix https:// 192.168.0.0/24
* Scan for crossdomain policies in the Alexa Top 1000.
./whatweb -i plugin-development/alexa-top-100.txt \
--url-suffix /crossdomain.xml -p crossdomain_xml
OPTIONAL DEPENDENCIES
--------------------------------------------------------------------------------
To enable MongoDB logging install the mongo gem.
To enable character set detection and MongoDB logging install the rchardet gem.
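As you can see, WhatWeb offers a rich set of options. Here I run WhatWeb with the -v option to fingerprint one specific target and use that run to trace the execution flow.
Set a breakpoint at line 680 of the whatweb source and start debugging.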
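The code from the breakpoint through line 741 initializes variables; within it, GetoptLong.new builds the command-line option parser. Execution then continues, and lines 743 to 933 parse the values supplied by the user, as shown below: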
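Here we passed only the -v option. With a breakpoint at line 745, we see the following:
The variable verbose is incremented by 1. Execution then jumps to the terminal colour configuration, which is set according to the operating system type.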
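To make the option-parsing step concrete, here is a minimal sketch of how a GetoptLong-based parser can count repeated -v flags. The helper name parse_args, the settings hash, and the three options chosen are assumptions made for illustration; this is not WhatWeb's actual code.

```ruby
require 'getoptlong'

# Hypothetical helper: parse a WhatWeb-like argument list.
# GetoptLong always reads from ARGV, so we replace its contents first.
def parse_args(argv)
  ARGV.replace(argv)
  opts = GetoptLong.new(
    ['--verbose', '-v', GetoptLong::NO_ARGUMENT],
    ['--aggression', '-a', GetoptLong::REQUIRED_ARGUMENT],
    ['--plugins', '-p', GetoptLong::REQUIRED_ARGUMENT]
  )

  settings = { verbose: 0, aggression: 1, plugin_selection: nil }
  opts.each do |opt, arg|
    case opt
    when '--verbose'    then settings[:verbose] += 1        # each -v adds 1
    when '--aggression' then settings[:aggression] = arg.to_i
    when '--plugins'    then settings[:plugin_selection] = arg
    end
  end

  # Whatever is left in ARGV after option parsing is treated as targets.
  settings[:targets] = ARGV.dup
  settings
end

settings = parse_args(['-v', 'example.com'])
puts settings.inspect
```

With only -v given, plugin_selection stays nil in this sketch, which mirrors the state described in the walkthrough.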
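Stepping further, the code decides which detection plugins to use. If no custom plugin is specified, the default plugins are loaded. Because I passed only -v, use_custom_plugin is false and plugin_selection is nil. Let's step into PluginSupport.load_plugins, the function that loads the plugin directories.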
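Inside load_plugins, we can see that the default directories are set.
The function searches these directories for plugin files. The code below loads the plugin files it finds.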
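The directory search can be sketched roughly as follows. The helper name find_plugin_files and the "*.rb" glob are assumptions for illustration; the real load_plugins may select files differently.

```ruby
require 'tmpdir'

# Hypothetical helper: collect Ruby plugin files from a list of directories.
def find_plugin_files(dirs)
  dirs.flat_map { |dir| Dir.glob(File.join(dir, '*.rb')) }.sort
end

# Demo against a throwaway directory so the sketch runs anywhere.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'title.rb'), "# plugin stub\n")
  File.write(File.join(dir, 'notes.txt'), "not a plugin\n")
  puts find_plugin_files([dir]).map { |f| File.basename(f) }.inspect
end
```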
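Next, step into the load_plugin function and follow the load f call on line 298; see the three screenshots below.
Taken together, they make the assignment that follows easier to understand.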
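The effect of load f is that the plugin file is executed and the plugin it describes ends up registered in memory. A minimal sketch of that pattern, under the assumption that a plugin file calls a define method with a block, looks like this. SketchPlugin and its methods are hypothetical stand-ins, not WhatWeb's Plugin class.

```ruby
require 'tmpdir'

# Hypothetical registry class standing in for a plugin base class.
class SketchPlugin
  @registered = {}

  class << self
    attr_reader :registered

    # Called from inside a plugin file: builds the plugin and registers it.
    def define(name, &block)
      plugin = new(name)
      plugin.instance_eval(&block)
      @registered[name] = plugin
    end
  end

  attr_reader :name

  def initialize(name)
    @name = name
    @matches = []
  end

  # Acts as both setter (with an argument) and getter (without one).
  def matches(list = nil)
    @matches = list if list
    @matches
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.rb')
  File.write(path, <<~PLUGIN)
    SketchPlugin.define 'Demo' do
      matches [{ text: 'powered by abc' }]
    end
  PLUGIN
  load path # executing the file registers the plugin
end

puts SketchPlugin.registered.keys.inspect
```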
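Further on there is a call to a function that optimizes the plugins:
Stepping in, we see that it further refines the plugin detection scripts.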
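The post does not show what this optimization does in detail. One common form such a step takes, and this is an assumption rather than something confirmed above, is compiling pattern strings into Regexp objects once, so they are not rebuilt for every target. The rule format and the name precompile_rules are hypothetical.

```ruby
# Hypothetical rule format: { regexp: "pattern string" } or { regexp: /compiled/ }.
# Rules without a :regexp key are returned unchanged.
def precompile_rules(rules)
  rules.map do |rule|
    pattern = rule[:regexp]
    pattern.is_a?(String) ? rule.merge(regexp: Regexp.new(pattern)) : rule
  end
end

rules = precompile_rules([{ regexp: 'powered[ ]?by ab[0-9]' }, { text: 'hello' }])
puts rules.first[:regexp].class
puts rules.first[:regexp].match?('powered by ab3')
```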
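After stepping out of that function and continuing, we reach the definition of the HTTP request headers. They can be user-defined or left at their defaults.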
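The help text for --header describes the merge rules: a custom header replaces a default one, and an empty value such as "User-Agent:" removes it. A minimal sketch of those rules follows; build_headers and DEFAULT_HEADERS are hypothetical names, and header names are compared case-sensitively here for simplicity.

```ruby
# Default taken from the help text ("instead of WhatWeb/0.4.8-dev").
DEFAULT_HEADERS = { 'User-Agent' => 'WhatWeb/0.4.8-dev' }.freeze

# Hypothetical helper: merge "Name:Value" strings into the default headers.
def build_headers(custom)
  headers = DEFAULT_HEADERS.dup
  custom.each do |line|
    name, value = line.split(':', 2).map(&:strip)
    if value.nil? || value.empty?
      headers.delete(name)   # empty value removes the header
    else
      headers[name] = value  # otherwise add or replace it
    end
  end
  headers
end

puts build_headers(['Foo:Bar']).inspect
puts build_headers(['User-Agent:']).inspect
```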
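There is not much to say about that. Moving on, the next step filters the target URLs:
Stepping in:
The regular expressions here deserve attention. The first matches IP range strings written like 192.168.0.1-200; the second is used to rule out a single IP address, so a plain IP is not treated as a range.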
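The two-regex idea can be illustrated as below. The patterns and helper names are my own approximations for the sketch, not necessarily the expressions in the WhatWeb source, and only the last-octet form of range is expanded.

```ruby
# Illustrative patterns (assumptions, not copied from WhatWeb).
RANGE_CHARS = /\A[0-9.\-*\/]+\z/ # digits, dots, dashes, wildcards, slashes
SINGLE_IP   = /\A[\d.]+\z/       # digits and dots only

# A string is treated as a range if it looks range-like but is not a plain IP.
def ip_range?(str)
  str.match?(RANGE_CHARS) && !str.match?(SINGLE_IP)
end

# Expand "a.b.c.X-Y" into individual addresses; anything else is returned as-is.
def expand_last_octet_range(str)
  m = str.match(/\A(\d+\.\d+\.\d+\.)(\d+)-(\d+)\z/)
  return [str] unless m

  (m[2].to_i..m[3].to_i).map { |n| "#{m[1]}#{n}" }
end

puts ip_range?('192.168.0.1-200')
puts ip_range?('192.168.0.1')
puts expand_last_octet_range('192.168.0.1-3').inspect
```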
Next, the URLs are normalized:
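A minimal sketch of what normalization can involve, assuming it applies the --url-prefix and --url-suffix options from the help text and adds a default scheme when none is present. The helper normalize_target and the choice of http:// as the default are assumptions.

```ruby
require 'uri'

# Hypothetical helper: apply prefix/suffix, then make sure a scheme is present.
def normalize_target(target, prefix: '', suffix: '')
  url = "#{prefix}#{target}#{suffix}"
  url = "http://#{url}" unless url.match?(%r{\A[a-z][a-z0-9+.-]*://}i)
  URI.parse(url).to_s # raises URI::InvalidURIError on malformed input
end

puts normalize_target('example.com')
puts normalize_target('192.168.0.1', prefix: 'https://')
puts normalize_target('example.com', suffix: '/crossdomain.xml')
```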
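Now comes the most important part: processing the specified URL and collecting the fingerprint information. This is the core of the code. I spent a long time debugging it and found it confusing, because it is multithreaded and the thread switches make it easy to lose track. I may not explain it completely, but here is my attempt.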
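This calls the next_target function to obtain the target URL. Stepping into next_target:
This is the assignment step. At the end there is a check on whether the number of recent targets exceeds 100: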
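The post does not say what happens once the limit of 100 is exceeded. As an illustration only, the sketch below assumes the oldest entries are dropped so the list of recent targets stays bounded. TargetFeed and RECENT_LIMIT are hypothetical names.

```ruby
RECENT_LIMIT = 100 # the threshold mentioned in the walkthrough

# Hypothetical target source that remembers the most recent targets handed out.
class TargetFeed
  attr_reader :recent

  def initialize(urls)
    @pending = urls.dup
    @recent = []
  end

  # Returns the next target, or nil when none are left.
  def next_target
    url = @pending.shift
    return nil if url.nil?

    @recent << url
    # Assumption: trim the oldest entries once the limit is exceeded.
    @recent.shift while @recent.size > RECENT_LIMIT
    url
  end
end

feed = TargetFeed.new((1..105).map { |i| "http://host#{i}.example" })
while feed.next_target; end
puts feed.recent.size
```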
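After stepping out of next_target, execution continues into a loop that resembles a do...while construct. It jumps to a check on whether the number of threads exceeds the default limit.
Stepping further, execution enters the Thread.new(do) do |thistarget| block, shown below: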
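The general shape of this loop, one thread per target with a cap on how many run at once, can be sketched as follows. This is a simplified illustration with hypothetical names (scan_all, max_threads); the "scan" is a placeholder string rather than a real HTTP request.

```ruby
# Hypothetical sketch: start one thread per target, but never let more than
# max_threads run at the same time.
def scan_all(targets, max_threads: 3)
  results = Queue.new # thread-safe queue from Ruby core
  threads = []

  targets.each do |target|
    # Wait until a slot frees up before starting another thread.
    sleep 0.01 while threads.count(&:alive?) >= max_threads

    # Passing target as an argument gives each thread its own copy
    # of the reference, bound to the block parameter thistarget.
    worker = Thread.new(target) do |thistarget|
      results << "scanned #{thistarget}"
    end
    threads << worker
  end

  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

puts scan_all(%w[a.example b.example c.example d.example]).sort.inspect
```

Passing the target through Thread.new rather than capturing the loop variable is what keeps each thread working on the target it was started for.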
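Stepping in, we enter the function that initializes the target:
While debugging, execution switches back and forth between several code segments. Continuing on:
Some parameters are set here, and then the target URL is requested. The resulting HTTP response contains all of its fields in detail.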
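As a rough illustration of "set some parameters, then request the URL", the sketch below prepares a Net::HTTP client and a GET request using the timeout defaults listed in the help text (15 and 30 seconds). The helper build_client is hypothetical, and the request is only constructed, not sent.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: prepare an HTTP client and request for a target URL.
def build_client(url, open_timeout: 15, read_timeout: 30)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = open_timeout
  http.read_timeout = read_timeout

  request = Net::HTTP::Get.new(uri.request_uri, { 'User-Agent' => 'WhatWeb/0.4.8-dev' })
  [http, request]
end

http, request = build_client('http://example.com')
puts "#{request.method} #{request.path} via #{http.address}:#{http.port}"

# Sending it would look like this (left commented out to avoid network access):
# response = http.request(request)
# puts response.code
# response.each_header { |name, value| puts "#{name}: #{value}" }
```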
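Stepping further, we reach the part that matches the HTTP response against the plugin files:
Stepping in, this is the code that implements the matching: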
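The matching idea can be sketched with the rule keys that appear in the --custom-plugin examples from the help text (:text and :md5), plus a :regexp key that I assume here for pattern rules. The function names and the plugins hash layout are hypothetical, and only the response body is examined.

```ruby
require 'digest'

# Hypothetical rule check: a rule matches if its single criterion holds.
def rule_matches?(rule, body)
  return body.include?(rule[:text]) if rule[:text]
  return body.match?(rule[:regexp]) if rule[:regexp]
  return Digest::MD5.hexdigest(body) == rule[:md5] if rule[:md5]

  false
end

# Returns the names of all plugins with at least one matching rule.
def identify(plugins, body)
  plugins.select { |_name, rules| rules.any? { |rule| rule_matches?(rule, body) } }.keys
end

plugins = {
  'ABC' => [{ text: 'powered by abc' }],
  'XYZ' => [{ regexp: /xyz[0-9]/ }]
}
puts identify(plugins, '<p>powered by abc</p>').inspect
```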
The implementation above involves the use of locks.
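Why a lock is needed can be shown with a small sketch: several scanning threads write into one shared collection, so each write is wrapped in Mutex#synchronize. The function collect_results and the placeholder finding string are assumptions for illustration, not WhatWeb's code.

```ruby
# Hypothetical sketch: threads append findings to a shared array under a lock.
def collect_results(targets)
  lock = Mutex.new
  results = []

  threads = targets.map do |target|
    Thread.new(target) do |thistarget|
      finding = "#{thistarget}: ok" # placeholder for real plugin output
      # Only one thread at a time may modify the shared array.
      lock.synchronize { results << finding }
    end
  end

  threads.each(&:join)
  results.sort
end

puts collect_results(%w[a.example b.example c.example]).inspect
```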
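The final step is outputting the results.
This analysis of the execution flow is still fairly rough. The next post will dig deeper into some of the implementation details.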