data gathering, pre-processing, model development and validation, and inverse design. Discovering new materials involves complex calculations that demand HPC systems. The research facility's HPC system comprises servers, storage, networking, and GPUs. This study examines the on-site research infrastructure and the HPC system design used to discover new materials, and it states assertions about the system, its users, and its applications based on novel-material modeling use cases. The facility's HPC system is designed to handle the high-performance computing workloads required for material modeling and discovery, and its infrastructure was carefully selected to provide high computing power, large memory capacity, scalability, and fast connectivity (Fig. 2):
1) Management node: The infrastructure server manages all system resources, including compute units, GPUs, storage, and networking, and schedules jobs. Dell EMC PowerEdge R740 servers with Intel Xeon Silver 4210 processors form the backbone of this system. The Intel Xeon Silver 4210 runs at a base clock of 2.2 GHz, turbo boosts to 3.2 GHz, and provides 10 cores and 20 threads with an IPC of 2, which is sufficient to coordinate large-scale machine learning jobs. The server is configured with 192 GB of memory, allowing efficient control of system resources, and its 24 DIMM slots support up to 3 terabytes of RAM. Each unit is fitted with one CPU and can accommodate up to three double-width or six single-width GPUs (see the benchmark study [9]).
2) Compute node: Compute nodes run the machine learning workloads. Dell EMC PowerEdge C6525 servers powered by AMD EPYC 7742 processors form the system's compute core. A Dell EMC PowerEdge C6525 server populated with four AMD EPYC 7742 processors provides 256 cores in total. The rack-mounted C6525 chassis occupies two vertical rack units and holds four nodes, each powered by AMD EPYC 7742 processors. The AMD EPYC 7742 CPU has 64 cores and 128 threads, a base clock of 2.25 GHz, and a boost clock of 3.4 GHz. Each node supports up to 4 terabytes of RAM (see the CPU benchmark results [10]).
3) GPU: The current High-Performance Computing (HPC) system uses deep-learning-optimized NVIDIA A100 Tensor Core [11] Graphics Processing Units to accelerate the research facility's varied Machine Learning (ML) workloads, improving overall system efficiency. The Tensor Cores in the NVIDIA A100 accelerate matrix calculations, making the GPU well suited to ML workloads. Each A100 provides 6,912 CUDA cores and 40 GB of HBM2 memory with about 1.6 TB/s of bandwidth, and operates at clock speeds of 1.41 GHz and 1.54 GHz. Its Ampere architecture makes it well suited to high-performance computing.
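To give a concrete sense of how these Tensor Cores are exercised in practice, the short sketch below assumes a PyTorch-based ML stack (the text does not name the frameworks in use) and enables TF32 and FP16 mixed precision for a matrix multiplication on the A100; the matrix sizes are arbitrary placeholders.

import torch

# Hypothetical check that a CUDA device (ideally an A100) is present.
assert torch.cuda.is_available(), "No CUDA device found"
print(torch.cuda.get_device_name(0))

# Allow TF32 so ordinary float32 matmuls are routed through Tensor Cores.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Mixed-precision (FP16) matmul, also executed on Tensor Cores.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b
print(c.dtype, c.shape)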
4) Storage: Dell EMC PowerScale OneFS provides scalable, high-performance storage for the large machine learning datasets in the HPC system. The OneFS storage solution can be expanded to meet workload needs, providing high-capacity, high-performance storage. Dell EMC Isilon scale-out Network Attached Storage [3] is well suited to machine learning (ML) tasks because of its high throughput and IOPS. A PowerScale OneFS cluster can hold up to 50 petabytes of machine learning data and provides a computing platform optimized for ML. OneFS supports the NFS, SMB, and HDFS protocols and offers data compression, deduplication, and protection.
5) Networking: Mellanox InfiniBand HDR networking connects the HPC system's components at high speed. It optimizes data transfer between the infrastructure server, compute nodes, GPUs, and storage, improving overall system efficiency. Mellanox InfiniBand HDR adapters deliver 200 Gb/s per port with low latency and high bandwidth, which makes them well suited to HPC and AI/ML workloads, and are available with PCIe Gen4 and Gen5 host interfaces [12]. The research facility uses four Mellanox InfiniBand HDR switches with 40 ports each, giving an aggregate fabric bandwidth of 4 switches x 40 ports x 200 Gb/s = 32,000 Gb/s, i.e., 32 Tb/s (4 TB/s) of unidirectional bandwidth.
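A minimal sketch of this aggregate-bandwidth arithmetic, using only the switch count, port count, and port speed quoted above:

# Aggregate InfiniBand fabric bandwidth from the figures given in the text.
NUM_SWITCHES = 4        # Mellanox InfiniBand HDR switches
PORTS_PER_SWITCH = 40   # HDR ports per switch
PORT_SPEED_GBPS = 200   # Gb/s per HDR port (unidirectional)

aggregate_gbps = NUM_SWITCHES * PORTS_PER_SWITCH * PORT_SPEED_GBPS
print(f"Aggregate bandwidth: {aggregate_gbps} Gb/s "
      f"= {aggregate_gbps / 1000} Tb/s "
      f"= {aggregate_gbps / 8000} TB/s")
# -> Aggregate bandwidth: 32000 Gb/s = 32.0 Tb/s = 4.0 TB/s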
A. Performance Evaluation
The expected workloads are data pre-processing, machine learning model construction and validation, and ANSYS software simulations to find novel materials with the desired properties. The system comprises one infrastructure node, 100 compute nodes, 50 GPU nodes, one storage unit, and a four-switch Mellanox InfiniBand HDR network. The on-site high-performance computing (HPC) equipment is organized as a cluster for parallel computing. This cluster architecture is optimized for parallel processing, making it well suited to machine learning tasks on large datasets: workloads are distributed across numerous nodes, improving performance and processing speed.
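As one illustration of such distribution, the sketch below assumes an MPI-based Python workflow (mpi4py; the text does not specify the middleware) with a hypothetical preprocess() function and file list, scattering chunks of a dataset across ranks and gathering the results on the root rank:

# Run with, e.g.: mpirun -n <ranks> python preprocess_mpi.py  (script name is hypothetical)
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def preprocess(path):
    # Hypothetical per-file pre-processing step (cleaning, feature extraction, ...).
    return f"features({path})"

if rank == 0:
    # Hypothetical list of raw material-data files, split into one chunk per rank.
    files = [f"sample_{i}.dat" for i in range(1000)]
    chunks = [files[i::size] for i in range(size)]
else:
    chunks = None

# Each rank receives its own chunk and processes it in parallel.
my_chunk = comm.scatter(chunks, root=0)
my_results = [preprocess(f) for f in my_chunk]

# The root rank collects the per-rank results.
all_results = comm.gather(my_results, root=0)
if rank == 0:
    total = sum(len(r) for r in all_results)
    print(f"Processed {total} files across {size} ranks")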
B. TFLOPS Calculation for the On-Premises HPC System
The dual-CPU Dell EMC PowerEdge R740 server [20] has a peak node performance of CPU speed (2.2 GHz) x number of CPU cores (10) x CPU IPC (2) x number of CPUs per node (2) = 88 GFLOPS. The PowerEdge C6525 servers [21] deliver CPU speed (2.25 GHz) x number of CPU cores (64) x CPU IPC (2) x number of CPUs per node (2) = 576 GFLOPS per node, or 576 GFLOPS x 100 nodes = 57,600 GFLOPS. The processing capability of the NVIDIA A100 Tensor Core GPU is reported to be up to 19.5 TFLOPS. We assume that the storage and networking components do not contribute to the TFLOPS capacity of the system. The overall peak capacity is then obtained as follows: infrastructure node (88 GFLOPS) + compute nodes (57,600 GFLOPS) = 57,688 GFLOPS = 57.688 TFLOPS; GPUs: 19.5 TFLOPS x 50 GPU nodes = 975 TFLOPS; total peak performance of the existing HPC system = 975 TFLOPS + 57.688 TFLOPS = 1,032.688 TFLOPS ≈ 1.03 PFLOPS.
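The following minimal Python sketch reproduces this back-of-the-envelope estimate from the clock speeds, core counts, and node counts quoted above:

# Peak-performance estimate for the on-premises HPC system (figures from the text).
def node_gflops(clock_ghz, cores, ipc, cpus_per_node):
    # Peak GFLOPS of one server node: clock x cores x IPC x CPUs.
    return clock_ghz * cores * ipc * cpus_per_node

infra_gflops = node_gflops(2.2, 10, 2, 2)            # Dell EMC PowerEdge R740
compute_gflops = node_gflops(2.25, 64, 2, 2) * 100   # 100 x PowerEdge C6525
cpu_tflops = (infra_gflops + compute_gflops) / 1000

gpu_tflops = 19.5 * 50                               # 50 x NVIDIA A100

total_tflops = cpu_tflops + gpu_tflops
print(f"CPU: {cpu_tflops:.3f} TFLOPS, GPU: {gpu_tflops} TFLOPS, "
      f"total: {total_tflops:.3f} TFLOPS ({total_tflops / 1000:.2f} PFLOPS)")
# -> CPU: 57.688 TFLOPS, GPU: 975.0 TFLOPS, total: 1032.688 TFLOPS (1.03 PFLOPS)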
The High-Performance Computing (HPC) architecture employed for material discovery comprises an infrastructure node, 100 compute nodes,