LlamaFactory is an easy-to-use and efficient framework for fine-tuning large language models. It can fine-tune hundreds of pre-trained models locally without writing any code, and it supports popular model families such as Llama, Mistral, Qwen, and Phi. It streamlines the LLM training pipeline, from (continued) pre-training and supervised fine-tuning to RLHF, at lower cost and with better efficiency. By combining GPTQ quantization with the LoRA method, it can fine-tune a 30B model on a single RTX 4090 GPU. It further supports FlashAttention and vLLM to accelerate the training and inference of LLMs.
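
To make the no-code workflow concrete, a typical run is driven by a small YAML config passed to the framework's CLI. The sketch below shows what a LoRA supervised fine-tuning config might look like; the exact option names, dataset name, and file paths are illustrative assumptions and should be checked against the project's own example configs:

```yaml
# llama3_lora_sft.yaml -- hypothetical LoRA SFT config (names are assumptions)

# model: which pre-trained checkpoint to fine-tune
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

# method: supervised fine-tuning with LoRA adapters
stage: sft
do_train: true
finetuning_type: lora

# dataset: an instruction-tuning dataset registered with the framework
dataset: alpaca_en
template: llama3
cutoff_len: 1024

# training hyperparameters
output_dir: saves/llama3-8b-lora-sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

The config would then be handed to the command-line entry point (again, command shape is an assumption), e.g. `llamafactory-cli train llama3_lora_sft.yaml`, so the whole pipeline runs without any training code being written by the user.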