SISSA / ICTP · Trieste, Italy

Gabriel Pedde


Master's student in High Performance Computing at SISSA & ICTP. Building parallel and GPU-accelerated scientific solvers — from cache-aware single-core kernels to distributed simulations on Leonardo and LUMI — and applying that toolkit to ML/AI systems: LLM fine-tuning with RL, multi-agent LLM orchestration, and Bayesian inference on real scientific data.

01

about

I work at the intersection of numerical methods, parallel architectures, and — increasingly — ML systems. Most of my time goes into making physics solvers run faster (vectorising single-core kernels, distributing them with MPI, offloading hot loops to GPUs), and applying that same toolkit to LLM fine-tuning, multi-agent LLM orchestration on cluster GPUs, and Bayesian inference on real data.

The repositories on this site are coursework, group projects, and solo experiments. The HPC ones build with CMake or Make and ship with benchmarks where they make sense; the ML ones are reproducible end-to-end (data fetched from public archives, training reports checked in).

whoami ~/profile
$ whoami
gabriel.pedde
$ cat ./role
MHPC student @ SISSA / ICTP
$ uname -m
x86_64 / sm_80 / aarch64
$ echo $INTERESTS
numerical-pdes parallel-solvers gpu
$ mpirun -n ∞ ./curiosity
02

stack

languages

C++17 / 20 C Fortran Python

parallelism

CUDA OpenACC MPI OpenMP cuBLAS

scientific libraries

PETSc SLEPc HDF5 (parallel) FFTW3-MPI ParaView

ml / ai

PyTorch Transformers TRL CrewAI Ollama scikit-learn

tooling

CMake Linux SLURM Git Bash
benchmarked on Leonardo (CINECA) · LUMI (EuroHPC) · Ulysses (SISSA) · Argo (ICTP) · local Linux
03

featured projects

llm · rl 01 / 09

llama-rloo-reasoning

From-scratch REINFORCE / RLOO with PPO-style clipping and a KL penalty, applied to fine-tuning Llama-3.2-1B-Instruct for <think> reasoning on GSM8K. A miniature, didactic version of the recipe behind DeepSeek-R1-style reasoners — rule-based reward, no preference data, no learned reward model. Includes an ablation over KL strength and group size with a write-up of the format-vs-reasoning trade-off.

PyTorch · Transformers · TRL · RLOO
repository →
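
As a rough illustration of the ingredients named in the llama-rloo-reasoning card, the sketch below shows a leave-one-out baseline, a PPO-style clipped term, and a KL-shaped reward in plain Python. It is not taken from the repository: the function names, the default `eps` and `beta`, and the choice to subtract the KL estimate from the reward (one common option among several) are all assumptions made for the example.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for one prompt's group of sampled completions."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # Baseline for sample i is the mean reward of the other k - 1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]


def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate for one sample (to be maximised)."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def kl_shaped_rewards(rewards, kl_estimates, beta=0.05):
    """Assumed form of the KL penalty: subtract beta * KL from each reward."""
    return [r - beta * kl for r, kl in zip(rewards, kl_estimates)]


if __name__ == "__main__":
    # Rule-based rewards for four completions of the same prompt (made-up values).
    group = [1.0, 0.0, 0.0, 1.0]
    print(rloo_advantages(group))
    print(clipped_objective(ratio=1.5, advantage=1.0))
```

Because each sample is compared against the mean of its siblings, the advantages in a group sum to zero, which is why the group size is a natural thing to ablate.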
multi-agent llm 02 / 09

llm-opinion-dynamics

Treats the LLM as a particle in a statistical-mechanics system: sweeps the qwen2.5 family from 0.5b to 14b across eight sampling temperatures, then studies single-agent distributions and N=4 multi-agent opinion dynamics over 10 rounds. Headline finding: the genuine control parameter is model size, not sampling temperature. Built on local Ollama and CrewAI, with a SLURM template for CINECA Leonardo.

Ollama · CrewAI · SLURM · Leonardo
repository →
bayesian ml 03 / 09

gp-kepler-from-scratch

Quasi-periodic Gaussian Process built from scratch — hand-written kernel, log marginal likelihood via Cholesky factorisation, scipy.optimize for hyperparameters, posterior predictive from the standard equations — fit to a real Kepler stellar light curve to recover a candidate rotation period. Lomb–Scargle seeds the period prior; data fetched live from the STScI archive.

NumPy · SciPy · Astropy · GPs
repository →
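
The Cholesky route to the log marginal likelihood mentioned in the gp-kepler-from-scratch card can be sketched in a few lines. This is a minimal stdlib-only version for illustration, not the repository's NumPy code. The kernel parametrisation (`amp`, `length`, `gamma`, `period`), the zero-mean assumption, and the white-noise term `sigma` are assumptions, since quasi-periodic kernels are written in several equivalent ways.

```python
import math


def quasi_periodic_kernel(t1, t2, amp, length, gamma, period):
    """Squared-exponential envelope times a periodic term (assumed form)."""
    dt = t1 - t2
    return amp**2 * math.exp(
        -dt**2 / (2.0 * length**2) - gamma * math.sin(math.pi * dt / period) ** 2
    )


def cholesky(K):
    """Lower-triangular L with L L^T = K, for a symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def log_marginal_likelihood(t, y, amp, length, gamma, period, sigma):
    """log p(y | t, hyperparameters) for a zero-mean GP with white noise sigma."""
    n = len(t)
    K = [
        [
            quasi_periodic_kernel(t[i], t[j], amp, length, gamma, period)
            + (sigma**2 if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    L = cholesky(K)
    # Forward substitution L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    data_fit = -0.5 * sum(v * v for v in z)
    # log det K = 2 * sum(log L_ii), so -0.5 * log det K = -sum(log L_ii).
    complexity = -sum(math.log(L[i][i]) for i in range(n))
    return data_fit + complexity - 0.5 * n * math.log(2.0 * math.pi)
```

An optimiser such as scipy.optimize would then minimise the negative of this value over the hyperparameters, typically in log space so they stay positive.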
gpu computing 04 / 09

GPU Offloading — MHPC

GPU portfolio from first CUDA kernels up to a production-quality Lattice Boltzmann fluid solver. Shared-memory transposes with bank-conflict avoidance, distributed matmul via MPI + cuBLAS (Cannon-style), and a multi-GPU 2D Jacobi solver.

CUDA · OpenACC · MPI · OpenMP
repository →
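
For readers unfamiliar with the Cannon-style matmul named in the GPU Offloading card, here is a serial Python simulation of its communication pattern. It is a teaching sketch under simplifying assumptions, not the MPI + cuBLAS code: every simulated rank on a q × q grid holds a single scalar instead of a matrix block, and the list rotations stand in for the neighbour shifts that MPI would perform.

```python
def cannon_matmul(A, B):
    """Multiply two q x q matrices with Cannon's shift pattern (1 x 1 blocks)."""
    q = len(A)
    # Initial skew: row i of A rotates left by i, column j of B rotates up by j.
    a = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[0.0] * q for _ in range(q)]
    for _ in range(q):
        # Local multiply-accumulate on every simulated rank (i, j).
        for i in range(q):
            for j in range(q):
                C[i][j] += a[i][j] * b[i][j]
        # Shift A one step left along rows and B one step up along columns.
        a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b = [[b[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return C


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In a real implementation the scalar product becomes a local block GEMM, which is the step a library such as cuBLAS would handle on each GPU.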
distributed 05 / 09

Parallel Computing

Eight progressive HPC projects: distributed identity and matmul, Cannon's algorithm, OpenMP fundamentals, Jacobi solvers (pure MPI → hybrid MPI+OpenMP → parallel HDF5), and a 3D diffusion solver with FFTW3-MPI. Each project compares blocking, non-blocking, and collective comms.

MPI · OpenMP · HDF5 · FFTW3
repository →
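
The Jacobi solvers in the Parallel Computing card all share the same core update. The sketch below is a serial, pure-Python version of that five-point sweep for the Laplace equation with fixed boundary values. The function names, the tolerance, and the stopping rule are assumptions for the example. In an MPI version each rank would run the same sweep on its own slab after exchanging halo rows with its neighbours.

```python
def jacobi_step(u):
    """One Jacobi sweep: each interior point becomes the mean of its 4 neighbours."""
    ny, nx = len(u), len(u[0])
    new = [row[:] for row in u]  # boundary values are copied unchanged
    max_change = 0.0
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
            max_change = max(max_change, abs(new[i][j] - u[i][j]))
    return new, max_change


def jacobi_solve(u, tol=1e-10, max_iters=10000):
    """Iterate until the largest pointwise update drops below tol."""
    for iteration in range(1, max_iters + 1):
        u, change = jacobi_step(u)
        if change < tol:
            return u, iteration
    return u, max_iters


if __name__ == "__main__":
    grid = [[1.0] * 6 for _ in range(6)]
    for i in range(1, 5):
        for j in range(1, 5):
            grid[i][j] = 0.0
    solution, iterations = jacobi_solve(grid)
    print(iterations, solution[2][2])
```

The sweep reads only the old grid and writes only the new one, so the updates are independent of each other. That independence is what makes Jacobi a convenient first target for MPI, OpenMP, and GPU ports.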
microarchitecture 06 / 09

Single-Core Optimization

Annotated C examples for why code performs the way it does on modern hardware: cache hierarchies (memory mountains, blocked transpose, AoS vs. SoA), branch prediction, loop reordering and unrolling for ILP, software prefetching, sparse-matrix layouts, FP rounding. Benchmarked across laptop, Leonardo, and LUMI.

C · perf · cache · SIMD
repository →
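
The blocked transpose mentioned in the Single-Core Optimization card comes down to a change in loop structure, which the sketch below shows in Python. It illustrates only the tiling pattern: the cache benefit appears in compiled code over contiguous arrays, not in Python lists, and the function name and default tile size are assumptions.

```python
def blocked_transpose(a, block=32):
    """Transpose a rows x cols matrix tile by tile."""
    rows, cols = len(a), len(a[0])
    out = [[None] * rows for _ in range(cols)]
    # Outer loops walk over tiles; inner loops stay inside one tile, so in a
    # compiled language both the reads and the writes touch a small working set.
    for ii in range(0, rows, block):
        for jj in range(0, cols, block):
            for i in range(ii, min(ii + block, rows)):
                for j in range(jj, min(jj + block, cols)):
                    out[j][i] = a[i][j]
    return out


if __name__ == "__main__":
    matrix = [[r * 3 + c for c in range(3)] for r in range(5)]
    print(blocked_transpose(matrix, block=2))
```

The `min(...)` bounds handle matrices whose dimensions are not multiples of the tile size, an edge case that also has to be covered in a C version.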
eigenvalue solver 07 / 09

SLEPc Schrödinger Solver

Real-space finite-difference eigenvalue solvers for the time-independent Schrödinger equation on PETSc / SLEPc. 2D and 3D grids, Dirichlet and periodic boundaries, MPI through DMDA, VTK output. Validated against particle-in-a-box, hydrogen, and the 2D Kronig-Penney lattice.

PETSc · SLEPc · MPI · C++
repository →
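
The particle-in-a-box validation named in the SLEPc Schrödinger Solver card can be illustrated in one dimension without any eigensolver. The sketch below is an assumption-laden simplification of the 2D and 3D PETSc / SLEPc setup: it uses units with ħ = m = 1, a second-order finite-difference Laplacian with Dirichlet walls, and the fact that a sampled sine wave is an exact eigenvector of that discrete operator. All function names are hypothetical.

```python
import math


def apply_hamiltonian(psi, h):
    """H psi for H = -1/2 d^2/dx^2, with psi = 0 just outside the grid."""
    n = len(psi)
    out = []
    for j in range(n):
        left = psi[j - 1] if j > 0 else 0.0
        right = psi[j + 1] if j < n - 1 else 0.0
        out.append(-0.5 * (left - 2.0 * psi[j] + right) / h**2)
    return out


def box_energy(k, n=200, length=1.0):
    """Rayleigh quotient of the k-th sampled box eigenfunction on n interior points."""
    h = length / (n + 1)
    psi = [math.sin(k * math.pi * (j + 1) * h / length) for j in range(n)]
    h_psi = apply_hamiltonian(psi, h)
    return sum(p * q for p, q in zip(psi, h_psi)) / sum(p * p for p in psi)


def exact_box_energy(k, length=1.0):
    """Continuum energy E_k = k^2 pi^2 / (2 L^2) in units with hbar = m = 1."""
    return (k * math.pi / length) ** 2 / 2.0


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, box_energy(k), exact_box_energy(k))
```

The discrete energies sit slightly below the continuum values, and the gap shrinks as O(h²). A real-space solver should reproduce that convergence behaviour when the grid is refined.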
atmospheric sim 08 / 09

Best Practice — Thermal Bubble

Group project (with C. Veraldi and Zhaokun): a 2D Fortran atmospheric simulation modelling a thermal vapour bubble rising in dry air, with potential-temperature forcing on a fixed grid. Configurable through namelists, parallelised with MPI + OpenACC, SLURM templates for Leonardo.

Fortran · MPI · OpenACC · SLURM
repository →
modern c++ 09 / 09

Advanced C++

Modern C++17 from fundamentals through templates and metaprogramming, then extending into parallel scientific computing. Smart pointers, STL with custom timers, a 2D heat-equation Jacobi solver, MPI distributed memory, std::execution shared memory, GoogleTest, and Conway's Game of Life in SFML.

C++17 · MPI · GoogleTest · SFML
repository →
04

currently

let's talk.

About a role, a paper, or a tricky parallel bug.