# Unsloth LoRA scripts

## Installation

Just run `chmod +x setup.sh && ./setup.sh` on the pod. Remember to `export HF_HOME=/workspace/hf` (pointing at the PVC on whichever pod you are using).

1. Clone the repository:

```bash
git clone https://git.hye.su/mira/unsloth-train-scripts.git
cd unsloth-train-scripts
```

2. Install PyTorch and Unsloth:

```bash
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
pip install gdown  # Optional: only needed for Google Drive datasets
```

## Project Structure

```
unsloth-lora-training/
├── config.py         # Configuration settings
├── data_loader.py    # Dataset loading and processing
├── model_handler.py  # Model initialization and PEFT setup
├── trainer.py        # Training loop and metrics
├── main.py           # Main training script
└── README.md         # This file
```

## Configuration

All configuration settings are managed in `config.py`. The main configuration class is `TrainingConfig`. To modify the default configuration, edit the `TrainingConfig` class in `config.py` (a sketch of overriding these defaults in code instead appears at the end of this README):

```python
@dataclass
class TrainingConfig:
    base_model: str = "unsloth/Qwen2.5-7B"
    max_seq_length: int = 16384
    # ... modify other parameters as needed
```

## Usage

Download the dataset from Google Drive with gdown (a quick dataset sanity check is sketched at the end of this README):

```bash
gdown --fuzzy 'https://drive.google.com/file/d/1mqhq69dsSOK7ep7trTjRd3FMagTFTrzF/view?usp=sharing'
```

Train with Weights & Biases logging, optionally as a sweep:

```bash
python main.py \
    --dataset /workspace/dataset_v3.0_alpaca_noinstr_filtered_6144.json \
    --output_dir /workspace/output \
    --wandb_project qwen2.5-lora \
    --wandb_entity luwoyuki-zhtl \
    --wandb_sweep
```

Train a different base model, passing a Hugging Face Hub token (`"secret"` is a placeholder):

```bash
python main.py \
    --base_model mistralai/Mistral-7B-v0.1 \
    --dataset path/to/your/dataset.json \
    --output_dir ./custom_output \
    --hub_token "secret"
```

### Using Google Drive Dataset

Train using a dataset stored on Google Drive:

```bash
python main.py \
    --dataset https://drive.google.com/file/d/your_file_id/view \
    --output_dir ./drive_output
```
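
### Dataset sanity check

The example dataset filename above suggests Alpaca-style records. The exact schema `data_loader.py` expects is not documented here, so the following is only a minimal sketch of a pre-training sanity check; the field names mentioned in the comment (`instruction`, `input`, `output`) are an assumption based on the standard Alpaca format, and the `noinstr` variant presumably omits `instruction`.

```python
import json

# Quick look at a local dataset file before launching a training run.
# NOTE: the key names below are an assumption (standard Alpaca schema:
# "instruction", "input", "output"); check data_loader.py for the schema
# this repo actually expects.
with open("/workspace/dataset_v3.0_alpaca_noinstr_filtered_6144.json") as f:
    rows = json.load(f)

print(f"{len(rows)} records")
print(f"first record keys: {sorted(rows[0].keys())}")
```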
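
### Overriding `TrainingConfig` in code

Since `TrainingConfig` is a dataclass, its defaults can also be overridden at construction time rather than by editing `config.py` in place. A minimal sketch, assuming only the two fields shown in the Configuration section above; any other parameter names would be guesses, so check `config.py` for the full list.

```python
from config import TrainingConfig

# Override only the fields documented above; every other parameter
# keeps its default from config.py.
config = TrainingConfig(
    base_model="unsloth/Qwen2.5-7B",
    max_seq_length=8192,  # e.g. a shorter context to fit a smaller GPU
)
print(config)
```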