llm · rl
01 / 09
llama-rloo-reasoning
From-scratch REINFORCE / RLOO with PPO-style clipping and a
KL penalty, applied to fine-tuning Llama-3.2-1B-Instruct for
<think> reasoning on GSM8K. A miniature, didactic version
of the recipe behind DeepSeek-R1-style reasoners — rule-based reward, no
preference data, no learned reward model. Includes an ablation over KL
strength and group size with a write-up of the format-vs-reasoning trade-off.
PyTorch · Transformers · TRL · RLOO
repository →
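As a rough illustration of the leave-one-out baseline that gives RLOO its name, the sketch below computes per-sample advantages for one group of completions drawn from the same prompt. The function name `rloo_advantages` and the 0/1 rewards are assumptions made for this example and are not taken from the repository; the clipping and KL terms are deliberately left out.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for one group of sampled completions.

    Each sample's baseline is the mean reward of the OTHER samples in the
    group, so no learned value function is needed.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # Baseline for sample i: (total - r_i) / (k - 1)
    return [r - (total - r) / (k - 1) for r in rewards]


# Hypothetical rule-based rewards: 1.0 for a correct final answer, else 0.0.
group = [1.0, 0.0, 0.0, 1.0]
advantages = rloo_advantages(group)
```

The advantages of a group always sum to zero, so a group in which every completion receives the same reward contributes no gradient signal.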
multi-agent llm
02 / 09
llm-opinion-dynamics
Treat the LLM as a particle in a statistical-mechanics system: sweep the
Qwen2.5 family from 0.5B to 14B parameters across
eight temperatures, study single-agent distributions and N=4 multi-agent
opinion dynamics over 10 rounds. Headline finding: the genuine control
parameter is model size, not sampling temperature. Local
Ollama, CrewAI, SLURM template for CINECA Leonardo.
Ollama · CrewAI · SLURM · Leonardo
repository →
bayesian ml
03 / 09
gp-kepler-from-scratch
Quasi-periodic Gaussian Process built from scratch —
hand-written kernel, log marginal likelihood via Cholesky factorisation,
scipy.optimize for hyperparameters, posterior predictive from
the standard equations — fit to a real Kepler stellar light curve to recover
a candidate rotation period. Lomb–Scargle seeds the period prior; data
fetched live from the STScI archive.
NumPy · SciPy · Astropy · GPs
repository →
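The Cholesky route to the log marginal likelihood mentioned above can be sketched in a few lines of NumPy. This is a minimal example under stated assumptions: the kernel parameterisation (`amp`, `ell`, `gamma`, `period`), the white-noise term, and the synthetic sine data are illustrative choices and are not taken from the repository.

```python
import numpy as np


def quasi_periodic_kernel(t1, t2, amp, ell, gamma, period):
    """Squared-exponential envelope multiplied by a periodic term (one common form)."""
    dt = t1[:, None] - t2[None, :]
    return amp**2 * np.exp(
        -dt**2 / (2.0 * ell**2) - gamma * np.sin(np.pi * dt / period) ** 2
    )


def log_marginal_likelihood(t, y, amp, ell, gamma, period, noise):
    """log p(y | t, theta) for a zero-mean GP, computed via K = L L^T."""
    n = len(t)
    K = quasi_periodic_kernel(t, t, amp, ell, gamma, period) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    z = np.linalg.solve(L, y)  # solves L z = y, so z @ z equals y^T K^{-1} y
    # log|K| = 2 * sum(log(diag(L))), hence the un-halved sum below
    return -0.5 * z @ z - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)


# Synthetic stand-in for a light curve: a pure sinusoid with a 3-unit period.
t = np.linspace(0.0, 10.0, 25)
y = np.sin(2.0 * np.pi * t / 3.0)
lml = log_marginal_likelihood(t, y, amp=1.0, ell=5.0, gamma=1.0, period=3.0, noise=0.1)
```

A hyperparameter fit would typically pass the negative of this value to an optimiser such as `scipy.optimize.minimize`, usually over log-transformed parameters to keep them positive.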
gpu computing
04 / 09
GPU Offloading — MHPC
GPU portfolio from first CUDA kernels up to a production-quality
Lattice Boltzmann fluid solver. Shared-memory transposes
with bank-conflict avoidance, distributed matmul via MPI + cuBLAS
(Cannon-style), and a multi-GPU 2D Jacobi solver.
CUDA · OpenACC · MPI · OpenMP
repository →
distributed
05 / 09
Parallel Computing
Eight progressive HPC projects: distributed identity and matmul,
Cannon's algorithm, OpenMP fundamentals, Jacobi solvers (pure MPI →
hybrid MPI+OpenMP → parallel HDF5), and a 3D diffusion solver with FFTW3-MPI.
Each project compares blocking, non-blocking, and collective comms.
MPI · OpenMP · HDF5 · FFTW3
repository →
microarchitecture
06 / 09
Single-Core Optimization
Annotated C examples for why code performs the way it does on
modern hardware: cache hierarchies (memory mountains, blocked transpose,
AoS vs. SoA), branch prediction, loop reordering and unrolling for ILP,
software prefetching, sparse-matrix layouts, FP rounding. Benchmarked
across laptop, Leonardo, and LUMI.
C · perf · cache · SIMD
repository →
eigenvalue solver
07 / 09
SLEPc Schrödinger Solver
Real-space finite-difference eigenvalue solvers for the time-independent
Schrödinger equation on PETSc / SLEPc. 2D and 3D grids, Dirichlet and periodic
boundaries, MPI through DMDA, VTK output. Validated against particle-in-a-box,
hydrogen, and the 2D Kronig–Penney lattice.
PETSc · SLEPc · MPI · C++
repository →
atmospheric sim
08 / 09
Best Practice — Thermal Bubble
Group project (with C. Veraldi and Zhaokun): a 2D Fortran
atmospheric simulation modelling a thermal vapour bubble rising in dry air,
with potential-temperature forcing on a fixed grid. Configurable through
namelists, parallelised with MPI + OpenACC, SLURM templates for Leonardo.
Fortran · MPI · OpenACC · SLURM
repository →
modern c++
09 / 09
Advanced C++
Modern C++17 from fundamentals through templates and metaprogramming, then
extending into parallel scientific computing. Smart pointers, STL with
custom timers, a 2D heat-equation Jacobi solver, MPI distributed memory,
std::execution shared memory, GoogleTest, and Conway's Game
of Life in SFML.
C++17 · MPI · GoogleTest · SFML
repository →