The Coaraci computer cluster:
Coaraci is a computer cluster with 13008 AMD EPYC Rome processing cores and 42 NVIDIA A30 GPGPUs. Its nodes have different specifications:
Login node: 16 cores at 2.8GHz, 64GB of memory, 2x 480GB SSD, and 2x InfiniBand HDR100.
Management node: 32 cores at 2.8GHz, 256GB of memory, 2x 480GB SSD, and 2x InfiniBand HDR100.
Visualization node: 16 cores at 2.8GHz, 64GB of memory, 2x 480GB SSD, and 1x NVIDIA Tesla T4 GPU.
256 Compute nodes: 48 cores at 2.8GHz, 128GB of memory, 1x 480GB SSD, and InfiniBand HDR100.
14 GPU nodes: 48 cores at 2.8GHz, 128GB of memory, 1x 480GB SSD, InfiniBand HDR100, and 3x NVIDIA A30 24GB GPUs.
Storage is handled by a Fat node with 148 cores at 2.8GHz, 2TB of memory, 4x 12TB HDD, and InfiniBand HDR100, plus a JBOD for storage and metadata with 80x 16TB HDDs (1.28 PB in total) and 4x 3.84TB SSDs (15.36TB in total).
The Kahuna computer cluster:
Architecture
Rack layout (summarized from the diagram): GigE switches (24-port and 48-port, with redundant power supplies) and InfiniBand switches; 20 compute nodes (n001-n020) in groups of four; 60 graphic nodes (gn001-gn060); the head nodes Kahuna-adm and Kahuna-bkp; the login node (Service1); the UV20 node (Service3); the storage node (Service2); and a console.
Total: 85 nodes.
Cluster Composition – Machines with Xeon Processors
1 Head node with 24 threads Intel Xeon E5-2643 v3 at 3.40GHz, 128GB of memory, and one Nvidia Quadro 5000 GPU.
1 Login node with 40 HT threads Intel Xeon E5-2660 v2 at 2.20GHz, 128GB of memory, and one Nvidia Quadro 5000 GPU.
1 Viz-Server node with 40 HT threads Intel Xeon E5-2660 v2 at 2.20GHz, 128GB of memory, and one Nvidia Quadro 5000 GPU.
32 Graphic nodes with 40 HT threads Intel Xeon E5-2670 v2 at 2.50GHz, 64GB of memory, and 2 Nvidia Tesla K20M GPUs.
1 UV20 node with 64 HT threads Intel Xeon E5-4650L at 2.60GHz, 1TB of memory, and one Intel Xeon Phi co-processor with 57 threads.
28 Graphic nodes with 48 HT threads Intel Xeon E5-2670 v3 at 2.30GHz, 64GB of memory, and 2 Nvidia Tesla K40M GPUs.
20 Compute nodes with 48 HT threads Intel Xeon E5-2670 v3 at 2.30GHz and 64GB of memory.
1 Storage node with 32 HT threads Intel Xeon E5-2660 at 2.20GHz, 128GB of memory, and 72x 4TB HDDs in RAID6.
Total Cores – Xeon Processors only
1896 physical cores
Total Cluster Memory
6.632 TB
Theoretical Performance (in flops) – Theoretical Calculation for the CPUs
- 64 Intel Xeon E5-2670 v2 @ 2.50GHz processors (32 nodes, 2 processors each): 2.50 GHz x 10 CPU cores x 10 flops/cycle x 2 processors = 500 Gflops/node
- 32 nodes x 500 Gflops/node = 16 Tflops
- 4 Intel Xeon E5-4650L @ 2.60GHz processors (1 node): 2.60 GHz x 8 CPU cores x 8 flops/cycle x 4 processors = 665 Gflops/node
- 96 Intel Xeon E5-2670 v3 @ 2.30GHz processors (48 nodes, 2 processors each): 2.30 GHz x 12 CPU cores x 12 flops/cycle x 2 processors = 662.4 Gflops/node
- 48 nodes x 662.4 Gflops/node = 31.795 Tflops
Total Tflops for the Entire Cluster
- 16 Tflops + 0.665 Tflops + 31.795 Tflops ≈ 48.46 Tflops (this sum is checked below)
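The total above can be reproduced with a quick shell calculation; this is only a sketch of the arithmetic, using the per-node figures listed above (bc is assumed to be available on the login node):
# theoretical peak = clock (GHz) x cores x flops/cycle x sockets, summed over all node groups (result in Gflops)
echo "32*(2.50*10*10*2) + 1*(2.60*8*8*4) + 48*(2.30*12*12*2)" | bc -l
# result: 48460.80 Gflops, i.e. roughly 48.46 Tflops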
Operating system available on the cluster:
SUSE Linux Enterprise Server 12 SP1
Kernel: 3.12.59-60.45-default
The main software packages available on the cluster are:
* All software is installed so that every node of the cluster has access to it and so that several versions can be installed without library conflicts.
PBSPro 13.1.0.16
Ambertools 15
Cuda 6.5, Cuda 7.5, Cuda 8.0
Python 2.7 and Python 3.4
Anaconda Python 2.7, Anaconda Python 3.4, and Anaconda Python 3.6
NAMD 2.10, NAMD 2.11, and NAMD 2.12 *All with local, multi-node, and GPU options
VMD 1.9.2
Cmake 3.0.2
GCC 4.3, GCC 4.8, GCC 4.9, and GCC 6.2
OpenMPI 1.4.4, OpenMPI 1.6.5, OpenMPI 1.8.3, and OpenMPI 3.0.0
Gromacs 4.6.7, Gromacs 5.0.2, Gromacs-mpi-cuda 5.0.2, and Gromacs-mpi-cuda 5.1.4
FFTW 3.3
Orca 3.0.2 and Orca 4
PGI Compilers 14.9
Intel Compilers 2015, Intel Compilers 2013, and Impi (Intel MPI)
XMGRACE 5.1.24
Boost 1.58
Openmx 3.7
CGAL 4.6
R-Base
Gamess
Lammps
Gaussian g09
Mpiblast 1.6.0
NWChem 6.6
JRE 1.8
Blast HTC
CCP 4-7.0
Eigen 3
HPCToolkit, version 2016.12
OpenFOAM
Ovito 2.4.2
Papi 5.5.1
Rism3d
Math Libs
LAPACK, BLAS, and MKL.
Administration and Monitoring Software
Software Management Cluster 3.3
Ganglia 3.6.0
Nagios
To use any software from the list, it must first be loaded through the module system, as in the command summary below (a full example session follows the list):
module avail (lists the available modules)
module load software/module (loads the chosen module)
module list (lists the loaded modules)
module unload software/module (unloads the chosen module)
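A typical session stacks a compiler, an MPI library, and an application before running a job. The module names below (gcc/6.2, openmpi/3.0.0, gromacs/5.0.2) are only illustrative; the exact names must be taken from the output of module avail:
module avail
module load gcc/6.2 openmpi/3.0.0      # illustrative names; check "module avail" for the real ones
module load gromacs/5.0.2              # illustrative application module
module list                            # confirms which modules are loaded
module unload gromacs/5.0.2            # removes only the chosen module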
Processing Queues
Jobs and Queues
The submission of jobs to the processing nodes is accomplished through queues managed by PBSPro software.
Currently the following queues are defined on the Kahuna server (a sample submission script is sketched after the list):
route: uses a maximum of 400 CPUs per user; the time limit is 72h.
bigmem: runs on the service3 (UV20) node, which has 1TB of memory and the Intel Xeon Phi co-processor.
longa: uses a maximum of 400 CPUs per user; the time limit is 480h.
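As an illustration only, a minimal PBS Pro submission script for the route queue could look like the sketch below; the job name, resource request, module name, and executable are hypothetical and must be adapted to the actual job:
#!/bin/bash
#PBS -N example_job                       # job name (illustrative)
#PBS -q route                             # one of the queues above: route, bigmem, or longa
#PBS -l select=2:ncpus=24:mpiprocs=24     # example resource request; adjust to the job
#PBS -l walltime=72:00:00                 # must respect the queue's time limit
cd $PBS_O_WORKDIR                         # run from the directory where the job was submitted
module load openmpi/3.0.0                 # illustrative module name; check "module avail"
mpirun -np 48 ./my_program                # hypothetical executable; replace with the actual command
The script is submitted with qsub script.sh, and the job status can be followed with qstat -u $USER.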
Contact
For any question or suggestion, please contact: