# Unsloth LoRA scripts

## Installation

Just run `chmod +x setup.sh && ./setup.sh` on the pod. Remember to `export HF_HOME=/workspace/hf` (pointing at the PVC on whichever pod you are using).

1. Clone the repository:

```bash
git clone https://git.hye.su/mira/unsloth-train-scripts.git
cd unsloth-train-scripts
```

2. Install PyTorch and Unsloth:

```bash
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
pip install gdown  # Optional: only needed for Google Drive datasets
```

## Project Structure

```
unsloth-lora-training/
├── config.py         # Configuration settings
├── data_loader.py    # Dataset loading and processing
├── model_handler.py  # Model initialization and PEFT setup
├── trainer.py        # Training loop and metrics
├── main.py           # Main training script
└── README.md         # This file
```

## Configuration

All configuration settings are managed in `config.py`. The main configuration class is `TrainingConfig`. To modify the default configuration, edit the `TrainingConfig` class in `config.py` (a sketch of overriding these defaults in code instead appears at the end of this README):

```python
@dataclass
class TrainingConfig:
    base_model: str = "unsloth/Qwen2.5-7B"
    max_seq_length: int = 16384
    # ... modify other parameters as needed
```

## Usage

Download the dataset from Google Drive with gdown (a quick dataset sanity check is sketched at the end of this README):

```bash
gdown --fuzzy 'https://drive.google.com/file/d/1mqhq69dsSOK7ep7trTjRd3FMagTFTrzF/view?usp=sharing'
```

Train with Weights & Biases logging, optionally as a sweep:

```bash
python main.py \
    --dataset /workspace/dataset_v3.0_alpaca_noinstr_filtered_6144.json \
    --output_dir /workspace/output \
    --wandb_project qwen2.5-lora \
    --wandb_entity luwoyuki-zhtl \
    --wandb_sweep
```

Train a different base model, passing a Hugging Face Hub token (`"secret"` is a placeholder):

```bash
python main.py \
    --base_model mistralai/Mistral-7B-v0.1 \
    --dataset path/to/your/dataset.json \
    --output_dir ./custom_output \
    --hub_token "secret"
```

### Using Google Drive Dataset

Train using a dataset stored on Google Drive:

```bash
python main.py \
    --dataset https://drive.google.com/file/d/your_file_id/view \
    --output_dir ./drive_output
```
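
### Dataset sanity check

The example dataset filename above suggests Alpaca-style records. The exact schema `data_loader.py` expects is not documented here, so the following is only a minimal sketch of a pre-training sanity check; the field names mentioned in the comment (`instruction`, `input`, `output`) are an assumption based on the standard Alpaca format, and the `noinstr` variant presumably omits `instruction`.

```python
import json

# Quick look at a local dataset file before launching a training run.
# NOTE: the key names below are an assumption (standard Alpaca schema:
# "instruction", "input", "output"); check data_loader.py for the schema
# this repo actually expects.
with open("/workspace/dataset_v3.0_alpaca_noinstr_filtered_6144.json") as f:
    rows = json.load(f)

print(f"{len(rows)} records")
print(f"first record keys: {sorted(rows[0].keys())}")
```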
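
### Overriding `TrainingConfig` in code

Since `TrainingConfig` is a dataclass, its defaults can also be overridden at construction time rather than by editing `config.py` in place. A minimal sketch, assuming only the two fields shown in the Configuration section above; any other parameter names would be guesses, so check `config.py` for the full list.

```python
from config import TrainingConfig

# Override only the fields documented above; every other parameter
# keeps its default from config.py.
config = TrainingConfig(
    base_model="unsloth/Qwen2.5-7B",
    max_seq_length=8192,  # e.g. a shorter context to fit a smaller GPU
)
print(config)
```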