Gabriel Pedde

llm · rl 01 / 09

llama-rloo-reasoning

From-scratch REINFORCE / RLOO with PPO-style clipping and a KL penalty, applied to fine-tuning Llama-3.2-1B-Instruct for <think> reasoning on GSM8K. A miniature, didactic version of the recipe behind DeepSeek-R1-style reasoners — rule-based reward, no preference data, no learned reward model. Includes an ablation over KL strength and group size with a write-up of the format-vs-reasoning trade-off.

PyTorch · Transformers · TRL · RLOO

repository →
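The leave-one-out baseline that gives RLOO its name can be sketched in a few lines. This is a minimal NumPy illustration, not the repository's code: the function name `rloo_advantages` and the `(num_prompts, group_size)` reward layout are assumptions made for the example, and the clipping and KL terms are left out.

```python
import numpy as np

def rloo_advantages(rewards):
    """Leave-one-out advantages for a (num_prompts, group_size) reward array.

    Hypothetical helper for illustration; the layout is an assumption.
    """
    rewards = np.asarray(rewards, dtype=float)
    k = rewards.shape[1]
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    # Baseline for sample i is the mean reward of the other k - 1 samples
    # drawn for the same prompt.
    baseline = (rewards.sum(axis=1, keepdims=True) - rewards) / (k - 1)
    return rewards - baseline

# One prompt, four sampled completions, binary rule-based reward.
adv = rloo_advantages([[1.0, 0.0, 0.0, 1.0]])
print(adv)
```

Because each sample's baseline excludes its own reward, the advantages within a group sum to zero, and a group where every completion gets the same reward contributes no gradient.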

multi-agent llm 02 / 09

llm-opinion-dynamics

Treat the LLM as a particle in a statistical-mechanics system: sweep the Qwen2.5 family from 0.5B to 14B parameters across eight sampling temperatures, then study single-agent output distributions and N=4 multi-agent opinion dynamics over 10 rounds. Headline finding: the genuine control parameter is model size, not sampling temperature. Runs on local Ollama with CrewAI; includes a SLURM template for CINECA Leonardo.

Ollama · CrewAI · SLURM · Leonardo

repository →

bayesian ml 03 / 09

gp-kepler-from-scratch

Quasi-periodic Gaussian Process built from scratch — hand-written kernel, log marginal likelihood via Cholesky factorisation, scipy.optimize for hyperparameters, posterior predictive from the standard equations — fit to a real Kepler stellar light curve to recover a candidate rotation period. Lomb–Scargle seeds the period prior; data fetched live from the STScI archive.

NumPy · SciPy · Astropy · GPs

repository →
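The kernel and the Cholesky-based log marginal likelihood described above could look roughly like the sketch below. It assumes one common quasi-periodic form (squared-exponential envelope times an exp-sine-squared term); the parameter names `amp`, `length`, `gamma`, and `period`, the synthetic data, and the NumPy-only solves are choices made for this example, not taken from the repository.

```python
import numpy as np

def quasi_periodic_kernel(t1, t2, amp, length, gamma, period):
    """Squared-exponential envelope times a periodic (exp-sine-squared) term."""
    tau = t1[:, None] - t2[None, :]
    envelope = np.exp(-0.5 * (tau / length) ** 2)
    periodic = np.exp(-gamma * np.sin(np.pi * tau / period) ** 2)
    return amp**2 * envelope * periodic

def log_marginal_likelihood(t, y, yerr, amp, length, gamma, period):
    """Zero-mean GP log marginal likelihood via a Cholesky factorisation."""
    K = quasi_periodic_kernel(t, t, amp, length, gamma, period)
    K[np.diag_indices_from(K)] += yerr**2  # white-noise variance on the diagonal
    L = np.linalg.cholesky(K)              # K = L @ L.T
    # Two solves with the Cholesky factor give alpha = K^{-1} y.
    # (A dedicated triangular solver would be cheaper; np.linalg.solve keeps
    # the sketch NumPy-only.)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    n = len(t)
    return -0.5 * (y @ alpha + log_det + n * np.log(2.0 * np.pi))

# Synthetic stand-in for a light curve: a period-3 sinusoid plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 50)
y = np.sin(2.0 * np.pi * t / 3.0) + 0.1 * rng.standard_normal(t.size)
lml = log_marginal_likelihood(t, y, 0.1, amp=1.0, length=10.0, gamma=2.0, period=3.0)
print(lml)
```

Hyperparameter fitting then amounts to handing the negative of this function to an optimiser, with the period initialised from a periodogram peak.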

gpu computing 04 / 09

GPU Offloading — MHPC

GPU portfolio from first CUDA kernels up to a production-quality Lattice Boltzmann fluid solver. Shared-memory transposes with bank-conflict avoidance, distributed matmul via MPI + cuBLAS (Cannon-style), and a multi-GPU 2D Jacobi solver.

CUDA · OpenACC · MPI · OpenMP

repository →
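For readers unfamiliar with the Jacobi solver mentioned above, here is a single-process NumPy sketch of the iteration for the 2D Laplace equation with fixed (Dirichlet) boundaries. It is an illustration only: the function names, tolerance, and test problem are assumptions, and a multi-GPU version would additionally split the grid across devices and exchange halo rows each step.

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi sweep: each interior point becomes the mean of its 4 neighbours."""
    new = u.copy()  # boundary values are carried over unchanged
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
    )
    return new

def solve_laplace(u0, tol=1e-8, max_iters=20000):
    """Iterate until the largest pointwise update drops below tol."""
    u = u0.copy()
    for it in range(1, max_iters + 1):
        new = jacobi_step(u)
        if np.max(np.abs(new - u)) < tol:
            return new, it
        u = new
    return u, max_iters

# Test problem: boundary values taken from u(x, y) = x, which is harmonic,
# so the converged interior should reproduce the same linear profile.
n = 17
x = np.linspace(0.0, 1.0, n)
exact = np.tile(x, (n, 1))
u0 = exact.copy()
u0[1:-1, 1:-1] = 0.0  # wipe the interior, keep the boundary
u, iters = solve_laplace(u0)
print(iters, np.max(np.abs(u - exact)))
```

The sweep reads only the previous iterate, which is what makes Jacobi straightforward to parallelise: every interior point can be updated independently.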

distributed 05 / 09

Parallel Computing

Eight progressive HPC projects: distributed identity and matmul, Cannon's algorithm, OpenMP fundamentals, Jacobi solvers (pure MPI → hybrid MPI+OpenMP → parallel HDF5), and a 3D diffusion solver with FFTW3-MPI. Each project compares blocking, non-blocking, and collective comms.

MPI · OpenMP · HDF5 · FFTW3

repository →
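Cannon's algorithm is easier to follow when the process grid is simulated in a single process. The sketch below stands in for a p × p grid of MPI ranks with nested Python lists of blocks; the function name `cannon_matmul` and this list-based "communication" are illustrative assumptions, not the projects' MPI code.

```python
import numpy as np

def cannon_matmul(A, B, p):
    """Simulate Cannon's algorithm on a p x p grid of block-owning 'processes'."""
    n = A.shape[0]
    if A.shape != (n, n) or B.shape != (n, n) or n % p != 0:
        raise ValueError("need square n x n matrices with n divisible by p")
    b = n // p

    def blocks(M):
        return [[M[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)]
                for i in range(p)]

    a_blk, b_blk = blocks(A), blocks(B)
    c_blk = [[np.zeros((b, b)) for _ in range(p)] for _ in range(p)]

    # Initial skew: block row i of A shifts left by i,
    # block column j of B shifts up by j.
    a_blk = [[a_blk[i][(j + i) % p] for j in range(p)] for i in range(p)]
    b_blk = [[b_blk[(i + j) % p][j] for j in range(p)] for i in range(p)]

    for _ in range(p):
        # Local multiply-accumulate on every grid position.
        for i in range(p):
            for j in range(p):
                c_blk[i][j] += a_blk[i][j] @ b_blk[i][j]
        # Circular shift: A one step left, B one step up.
        a_blk = [[a_blk[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        b_blk = [[b_blk[(i + 1) % p][j] for j in range(p)] for i in range(p)]

    return np.block(c_blk)

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
C = cannon_matmul(A, B, p=3)
print(np.allclose(C, A @ B))
```

In a real MPI version each list rotation would become a neighbour exchange along a row or column of the process grid, so every rank only ever holds one block of A and one block of B.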

microarchitecture 06 / 09

Single-Core Optimization

Annotated C examples showing why code performs the way it does on modern hardware: cache hierarchies (memory mountains, blocked transpose, AoS vs. SoA), branch prediction, loop reordering and unrolling for ILP, software prefetching, sparse-matrix layouts, floating-point rounding. Benchmarked across a laptop, Leonardo, and LUMI.

C · perf · cache · SIMD

repository →

eigenvalue solver 07 / 09

SLEPc Schrödinger Solver

Real-space finite-difference eigenvalue solvers for the time-independent Schrödinger equation on PETSc / SLEPc. 2D and 3D grids, Dirichlet and periodic boundaries, MPI through DMDA, VTK output. Validated against particle-in-a-box, hydrogen, and the 2D Kronig-Penney lattice.

PETSc · SLEPc · MPI · C++

repository →
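The particle-in-a-box validation can be illustrated with a small dense analogue. The sketch below is a 1D NumPy stand-in, assuming units where ħ = m = 1 and a box of length 1; the project itself works on 2D and 3D grids with PETSc / SLEPc, and the function name `box_hamiltonian` is hypothetical.

```python
import numpy as np

def box_hamiltonian(n, length=1.0):
    """Finite-difference H = -(1/2) d^2/dx^2 on n interior points.

    Dirichlet boundaries (psi = 0 at both walls) are implied by leaving
    the wall points out of the matrix. Units: hbar = m = 1.
    """
    h = length / (n + 1)
    main = np.full(n, 1.0 / h**2)       # diagonal of -(1/2) * second difference
    off = np.full(n - 1, -0.5 / h**2)   # first off-diagonals
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

n = 200
energies = np.linalg.eigvalsh(box_hamiltonian(n))[:3]  # three lowest levels
exact = 0.5 * (np.pi * np.arange(1, 4)) ** 2           # E_k = (k * pi)^2 / 2
rel_err = np.abs(energies - exact) / exact
print(energies, rel_err)
```

The discretisation error of the second-difference stencil grows with the level index, so the lowest states agree best with the analytic values and refining the grid tightens all of them.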

atmospheric sim 08 / 09

Best Practice — Thermal Bubble

Group project (with C. Veraldi and Zhaokun): a 2D Fortran atmospheric simulation modelling a thermal vapour bubble rising in dry air, with potential-temperature forcing on a fixed grid. Configurable through namelists, parallelised with MPI + OpenACC, SLURM templates for Leonardo.

Fortran · MPI · OpenACC · SLURM

repository →

modern c++ 09 / 09

Advanced C++

Modern C++17 from fundamentals through templates and metaprogramming, then extending into parallel scientific computing. Smart pointers, STL with custom timers, a 2D heat-equation Jacobi solver, MPI distributed memory, std::execution shared memory, GoogleTest, and Conway's Game of Life in SFML.

C++17 · MPI · GoogleTest · SFML

repository →
