Showing results for tags 'parallel'.



Found 9 results

  1. I am receiving the following error while trying to run a job on a Linux cluster:

       (Scratch disk space usage for starting iteration = 2704 MB)
       *** ERROR # 158 ***
       Error encountered when accessing the scratch file.
       error from subroutine xdslrs
       Solver error no. = -702
       This may be caused by insufficient disk space or some other system resource related limitations.
       Try one or more of the following workarounds if you think otherwise and if this issue happens on Windows.
       - Resubmit a job.
       - Avoid writing scratch files in the drive where the Operating System is installed (start the job on other drive or use TMPDIR/-tmpdir options).
       - Disable real time protection at minimum for files with extension rs~ .
       - Use of environment variable OS_SCRATCH_EXT=txt may help.
       This error was detected in subroutine adjslvtm.

       *** ERROR # 5019 ***
       Specified temprature vectors (1 - 1) out of allowed range (1 - 0).
       This error occurs in module "snsdrv".

       ************************************************************************
       RESOURCE USAGE INFORMATION
       --------------------------
       MAXIMUM MEMORY USED                                    4985 MB
       IN ADDITION 177 MB MEMORY WAS ALLOCATED FOR TEMPORARY USE
       INCLUDING MEMORY FOR MUMPS                             2929 MB
       MAXIMUM DISK SPACE USED                                5939 MB
       INCLUDING DISK SPACE FOR MUMPS                         3934 MB
       ************************************************************************

     I've tried some of the troubleshooting suggestions in the error log without any luck:
     - I've pointed the scratch drive to a large file system (petabytes of storage).
     - I've set OS_SCRATCH_EXT=txt.
     - Since this is a high-performance computing environment, I don't have access to "real time protection" options; however, I'm not aware of any virus protection running on the cluster, and the OS_SCRATCH_EXT=txt setting should work around virus-scan issues anyway, since I expect the system would let text files pass.

     Note that I am running this problem as a parallel job with the following command:

       $ALTAIR_HOME/scripts/invoke/init_solver.sh -mpi pl -ddm -np 2 -nt 8 -scr $SCRATCH -outfile $WORK <filename>.fem

     Some other relevant notes:
     - If I run this job in serial (i.e., without -mpi pl -ddm -np 2), I don't get the above error, so this appears to be something that arises only for MPI jobs.
     - I've tried running this job with -mpi i, but my system doesn't seem to be set up for Intel MPI (it is unable to find the required .so files).
     - Cluster node information: dual-socket Xeon E5-2690 v3 (Haswell), 12 cores per socket (24 cores/node), 2.6 GHz; 64 GB DDR4-2133 (8 x 8 GB dual-rank x8 DIMMs); hyperthreading enabled, giving 48 threads (logical CPUs) per node.
     - The $SCRATCH drive I'm pointing to is a network drive. I tried running -scr slow=1,$SCRATCH, but I still get the same error.
     - When I make the above DDM call, I request 2 nodes and a total of 48 MPI processes (although I'm not using them all).

     Thoughts?
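     In case it helps frame suggestions, here is roughly what I plan to try next: pointing the scratch directory at node-local disk instead of the network file system, using the -scr/-tmpdir options mentioned in the error text above. The /tmp/$USER path below is only a placeholder for whatever node-local storage the cluster actually provides:

       mkdir -p /tmp/$USER/scratch
       $ALTAIR_HOME/scripts/invoke/init_solver.sh -mpi pl -ddm -np 2 -nt 8 \
           -scr /tmp/$USER/scratch -outfile $WORK <filename>.fem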
  2. Hello everyone! I am using HyperWorks 14.0. I have two 8-core machines and I want to run a large analysis. Is it possible to run it in parallel across both machines using SPMD? Kindly guide me on how to do this. Thanks in advance. P.S.: I have the needed licenses.
  3. AcuSolve uses a mix of shared and distributed memory parallel message passing. This hybrid MPI approach works very well for large compute systems and allows simulations to be scaled to thousands of compute cores. The suitability of a given simulation for scaling to thousands of processors depends on a number of factors, including the model size, the architecture of the compute platform, the network infrastructure, and the physics being solved. However, there are some rough guidelines that can be used to determine an appropriate number of cores for a given simulation. For compute systems that utilize a high-speed message passing network (e.g. InfiniBand, Myrinet), AcuSolve is expected to scale nearly linearly down to approximately 10,000 finite element nodes per subdomain. In other words, AcuSolve should have nearly perfect parallel efficiency as long as there are at least 10,000 finite element nodes in each subdomain. If the number of processors is increased beyond this level, simulations will still run faster, but the parallel efficiency may not be ideal. To fully optimize the run time of your simulation on a given compute cluster, it is necessary to perform a scalability study: vary the number of processors and monitor the run time. The aforementioned guidelines can nevertheless be used as a rough estimate. The AcuSolve log file contains a value called the "interface node fraction", which may also be of interest when determining how many processors are suitable for your parallel run. This value reports the fraction of nodes in the model that fall on subdomain boundaries. AcuSolve typically scales linearly when this value is less than ~0.15.
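     As a quick back-of-the-envelope check of the 10,000-nodes-per-subdomain guideline above, one can divide the total node count by that threshold; the mesh size below is only an assumed example value:

       NODES=2400000                     # total finite element nodes in the model (assumed example)
       MIN_NODES_PER_SUBDOMAIN=10000     # guideline from the text above
       echo $(( NODES / MIN_NODES_PER_SUBDOMAIN ))   # ~240 cores before parallel efficiency is expected to degrade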
  4. Method 1:
       acuRun -np 6 -hosts node1,node2
     Result: the node list is repeated to use the specified number of processors. The processes are assigned: node1, node2, node1, node2, node1, node2

     Method 2:
       acuRun -np 6 -hosts node1,node1,node2,node2,node1,node2
     Result: the processes are assigned: node1, node1, node2, node2, node1, node2

     Method 3:
       acuRun -np 6 -hosts node1:2,node2:4
     Result: the processes are assigned: node1, node1, node2, node2, node2, node2
  5. Hello. I'm trying to use GPU acceleration on Windows 7, but I get the following error:

       mpid: CreateProcess failed: Cannot execute C:/Apl/Altair/12.0/acusolve/win64/bin/acuSolve-gpu-pmpi.exe
       acuRun: *** ERROR: error occurred executing acuSolve-gpu
       acuRun: Fri Nov 29 10:09:35 20

     I have installed CUDA 5.5 and have a CUDA-capable GPU. Any idea what could be wrong?
  6. 1. The head node needs to be able to RSH or SSH (without a password prompt) to each compute node, and each compute node needs to be able to RSH or SSH (without a password prompt) to the head node.
     2. The installation and problem directories need to be 'seen' in the same location on the head node and on the compute nodes. Basically this means NFS-mounted disks or the like.
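     A quick way to sanity-check both requirements before launching (node1 and /path/to/problem_dir below are placeholders for an actual compute node and working directory):

       ssh node1 hostname                      # must return without a password prompt
       ssh node1 ls /path/to/problem_dir       # the problem directory must resolve to the same path on every node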
  7. The binding of processes to compute cores is not handled by AcuSolve itself. However, when using HP-MPI as the message passing interface, it is possible to control how the processes are distributed on each host. Consider an example involving 2 compute nodes having dual-socket motherboards, with a quad-core processor in each socket (a total of 8 cores per node). A typical core map is shown below, illustrating the socket ID and processor rank of each core:

       Socket ID    CPU ranks
       0            0, 2, 4, 6
       1            1, 3, 5, 7

     With this in mind, the following environment variable can be used to force HP-MPI to fill the cores by rank ID:

       setenv MPIRUN_OPTIONS="-cpu_bind=v,rank"

     When this is set, the first process on the host will be assigned to socket 0 (filling the core with rank 0), the second process to socket 1 (filling the core with rank 1), and so on. The appropriate acuRun command to place 1 process on each socket of a dual-socket, quad-core system would simply be:

       acuRun -np 4 -hosts host1,host2
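     For shells other than csh/tcsh (the setenv syntax shown above), the same setting would be written with export; this is simply the bash equivalent of the example above:

       export MPIRUN_OPTIONS="-cpu_bind=v,rank"
       acuRun -np 4 -hosts host1,host2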
  8. Early versions of OFED had a bug in the implementation of the fork() function. This function is needed by AcuSolve to properly launch parallel processes. This bug is known to appear in OFED 1.1. To determine the version of OFED installed on your system, execute the following command:

       rpm -q -a | grep ofed

     If you are having trouble launching AcuSolve in parallel and the OFED version is 1.1, set the following environment variable before launching the solver:

       setenv ACUSIM_LIC_TYPE "LIGHT"

     This will force AcuSolve to spawn the parallel processes using a method that works around the bug in OFED. Note that the bug was fixed in OFED 1.2 and newer.
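     A small sketch that combines the check and the workaround so the variable is only set when OFED 1.1 is actually detected (bash; it assumes the installed package name contains the version string, which may vary by distribution):

       if rpm -q -a | grep ofed | grep -q '1\.1'; then
           export ACUSIM_LIC_TYPE="LIGHT"
       fi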
  9. HP-MPI requires a remote shell command to spawn remote processes. AcuSolve allows users to select this remote shell via the -rsh command line parameter. This allows users to use standard UNIX/Linux utilities such as rsh or ssh, in addition to custom wrapper scripts that may be necessary on some systems. Although this provides a high level of flexibility, most systems simply use ssh to perform the remote shell calls. However, this requires that the system be set up to permit password-free logins. To accomplish this, execute the following sequence of commands from a shell prompt:

       $ ssh-keygen -t dsa

     Press return when prompted for a password (i.e. leave it blank). Then:

       $ cd ~/.ssh
       $ cat id_dsa.pub >> authorized_keys
       $ chmod go-rwx authorized_keys
       $ chmod go-w ~ ~/.ssh
       $ cp /etc/ssh/ssh_config $HOME/.ssh/config
       $ echo "CheckHostIP no" >> $HOME/.ssh/config
       $ echo "StrictHostKeyChecking no" >> $HOME/.ssh/config

     It may be necessary to repeat the above procedure using rsa instead of dsa (i.e. ssh-keygen -t rsa).
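     For reference, the rsa variant mentioned above follows the same pattern; newer OpenSSH builds may reject DSA keys, in which case this is the variant that actually takes effect:

       $ ssh-keygen -t rsa
       $ cd ~/.ssh
       $ cat id_rsa.pub >> authorized_keys
       $ chmod go-rwx authorized_keys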