mpirun: unrecognized argument mca
Problem description:
I have a C++ solver that I need to run in parallel with the following command:
nohup mpirun -np 16 ./my_exec > log.txt &
This command runs my_exec independently on 16 of the processors available on my node. It used to work perfectly.
Last week the HPC department performed an OS upgrade, and now, when launching the same command, I get two warning messages (one per processor). The first one is:
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:              tamnun
  Registerable memory:     32768 MiB
  Total memory:            98294 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
Then I get output from my code telling me it believes I launched only 1 instance of the code (Nprocs = 1 instead of 16):
# MPI IS ON; Nprocs = 1
Filename = ../input/odtParam.inp

# MPI IS ON; Nprocs = 1

***** Error, process 0 failed to create ../data/data_0/, or it was already there
Finally, the second warning message is:
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          tamnun (PID 17446)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
After searching around online, I tried to set the MCA parameter mpi_warn_on_fork to 0, as the warning message suggests, with the following command:
nohup mpirun --mca mpi_warn_on_fork 0 -np 16 ./my_exec > log.txt &
which produced this error message:
[[email protected]] match_arg (./utils/args/args.c:194): unrecognized argument mca
[[email protected]] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[[email protected]] parse_args (./ui/mpich/utils.c:2964): error parsing input array
[[email protected]] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
I am using RedHat 6.7 (Santiago). I have contacted the HPC department, but since I am at a university it may take a day or two to get a response. Any help or guidance would be greatly appreciated.
EDIT:
It turns out that I compiled my code with Open MPI's mpic++ but ran the executable with Intel's mpirun, hence the errors (after the OS upgrade, Intel's mpirun was set as the default). I had to put the path to Open MPI's mpirun at the beginning of the $PATH environment variable.
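As a sketch, the fix looked like this (the /usr/local/openmpi/bin directory below is a hypothetical install location; use wherever your cluster actually keeps Open MPI):

```shell
# Show which mpirun the shell currently resolves (if any) and its version banner.
# Open MPI prints "mpirun (Open MPI) x.y.z"; Hydra-based launchers identify
# themselves as MPICH or Intel MPI instead.
command -v mpirun || echo "no mpirun on PATH yet"
mpirun --version 2>&1 | head -n 1

# Prepend the Open MPI bin directory so its mpirun shadows the Intel one.
# /usr/local/openmpi/bin is an assumed path; adjust to your installation.
export PATH=/usr/local/openmpi/bin:$PATH
command -v mpirun || echo "directory above does not exist on this machine"
```

Putting this in ~/.bashrc makes the choice persistent across login sessions.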
The code now runs as expected, but I still get the first warning message above (it no longer advises me to use the MCA parameter mpi_warn_on_fork). I think (but am not sure) this is an issue the HPC department needs to fix.
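For reference, on Mellanox mlx4 hardware the kernel module parameters that the first warning points to can be inspected as below. The mlx4_core module name and its log_num_mtt / log_mtts_per_seg parameters are an assumption (they apply to ConnectX-era HCAs; other adapters use different knobs), and raising them requires root, so this is usually the HPC department's job:

```shell
# Registerable memory is bounded by roughly:
#   max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE
# These sysfs files exist only when the mlx4_core module is loaded.
for p in log_num_mtt log_mtts_per_seg; do
    f=/sys/module/mlx4_core/parameters/$p
    [ -r "$f" ] && echo "$p = $(cat "$f")" || echo "$p: not available (mlx4_core not loaded?)"
done
```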
Answer:
[[email protected]] match_arg (./utils/args/args.c:194): unrecognized argument mca
[[email protected]] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[[email protected]] parse_args (./ui/mpich/utils.c:2964): error parsing input array
^^^^^
[[email protected]] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
^^^^^
You are using MPICH in the last case. MPICH is not Open MPI, and its process launcher does not recognize the --mca parameter, which is specific to Open MPI (MCA stands for Modular Component Architecture, the basic framework that Open MPI is built on). A typical case of multiple MPI implementations getting mixed up.
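A quick way to spot this kind of mix-up is to ask each tool which implementation it belongs to, for example with a sketch like this:

```shell
# Print where each MPI tool resolves on PATH and its version banner, so a
# compiler/launcher mismatch (e.g. Open MPI's mpic++ next to an MPICH or
# Intel MPI mpirun) is visible at a glance.
for tool in mpirun mpic++; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool -> $(command -v "$tool")"
        "$tool" --version 2>&1 | head -n 1
    else
        echo "$tool not found on PATH"
    fi
done
```

If the two banners name different implementations, binaries compiled with one wrapper are being launched by the other's process manager.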
You have a typo in your command: it is mpi_warn_on_fork (you wrote "work"). – Marco
Ha, right, I used the correct command; the typo was made when posting the question. – solalito