You may need to use the gpu_memory_limit and/or lora_on_cpu config options to avoid running out of memory. If you still run out of CUDA memory, you can try merging in system RAM instead.
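One common way to force a job into system RAM is to hide the GPUs from the process by setting the CUDA_VISIBLE_DEVICES environment variable to an empty string. The following is a minimal sketch of that approach, not the tool's own command: the helper names cpu_only_env and run_on_cpu are hypothetical, and the child command is a placeholder that only echoes the variable it inherited.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty CUDA_VISIBLE_DEVICES makes CUDA-based libraries see no GPUs,
    # so frameworks such as PyTorch fall back to the CPU and system RAM.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_on_cpu(command):
    """Run `command` (a list of arguments) in a child process with GPUs hidden."""
    return subprocess.run(
        command,
        env=cpu_only_env(),
        capture_output=True,
        text=True,
        check=True,
    )


# Placeholder child command (an assumption for illustration only).
# Replace it with your tool's actual merge command.
result = run_on_cpu(
    [sys.executable, "-c", "import os; print(repr(os.environ['CUDA_VISIBLE_DEVICES']))"]
)
print(result.stdout.strip())
```

Merging on the CPU is typically much slower than on a GPU and needs enough free system RAM to hold the full model, so it is best kept as a fallback for when the config options above are not enough.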