# Unsloth LoRA scripts

## Installation

Run `chmod +x setup.sh && ./setup.sh` on the pod.

Remember to `export HF_HOME=/workspace/hf` so the Hugging Face cache is stored on the pod's persistent volume (PVC).

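If `HF_HOME` is unset or mistyped, the Hugging Face libraries fall back to a cache under the home directory, which may not survive a pod restart. A quick check before training can catch this. The snippet below is a minimal sketch and not part of this repository: the helper name `resolve_hf_home` is hypothetical, and it only mirrors the documented fallback order (`HF_HOME`, then `XDG_CACHE_HOME/huggingface`, then `~/.cache/huggingface`).

```python
import os
from pathlib import Path


def resolve_hf_home(env=None):
    """Return the directory the Hugging Face libraries would treat as HF_HOME.

    Hypothetical helper: reproduces the documented fallback order only.
    """
    env = os.environ if env is None else env
    value = env.get("HF_HOME")
    if value:
        return Path(value).expanduser()
    # Fall back to $XDG_CACHE_HOME/huggingface, or ~/.cache/huggingface.
    cache_root = env.get("XDG_CACHE_HOME")
    base = Path(cache_root) if cache_root else Path.home() / ".cache"
    return base / "huggingface"


if __name__ == "__main__":
    hf_home = resolve_hf_home()
    print(f"HF_HOME resolves to: {hf_home}")
    # Assumption: the persistent volume is mounted at /workspace.
    if Path("/workspace") not in hf_home.parents:
        print("warning: cache is not under /workspace and may not persist")
```

Running it on the pod after the `export` should print a path under `/workspace`.
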
1. Clone the repository:

```bash
git clone https://git.hye.su/mira/unsloth-train-scripts.git
cd unsloth-train-scripts
```

2. Install PyTorch and Unsloth:

```bash
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
pip install gdown  # Optional: only needed for Google Drive datasets
```

## Project Structure

```
unsloth-lora-training/
├── config.py         # Configuration settings
├── data_loader.py    # Dataset loading and processing
├── model_handler.py  # Model initialization and PEFT setup
├── trainer.py        # Training loop and metrics
├── main.py           # Main training script
└── README.md         # This file
```

## Configuration

All configuration settings live in `config.py`, in the `TrainingConfig` dataclass. To change the defaults, edit that class:

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    base_model: str = "unsloth/Qwen2.5-7B"
    max_seq_length: int = 16384
    # ... modify other parameters as needed
```

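The usage examples in this README pass flags such as `--base_model` on the command line, so the defaults can evidently be overridden per run. How `main.py` wires this up is not shown here. The following is only a sketch of one common pattern, under the assumption that each flag maps to a dataclass field of the same name. The function `config_from_args` is hypothetical, and the class is reduced to the two fields shown above.

```python
import argparse
from dataclasses import dataclass, fields, replace


@dataclass
class TrainingConfig:
    # Only the two fields shown in this README; the real class has more.
    base_model: str = "unsloth/Qwen2.5-7B"
    max_seq_length: int = 16384


def config_from_args(argv=None):
    """Build a TrainingConfig, letting CLI flags override the defaults."""
    parser = argparse.ArgumentParser()
    for f in fields(TrainingConfig):
        # f.type is the annotated class (str or int); None means "flag not given".
        parser.add_argument(f"--{f.name}", type=f.type, default=None)
    args = parser.parse_args(argv)
    overrides = {k: v for k, v in vars(args).items() if v is not None}
    # replace() copies the default config and applies only the given overrides.
    return replace(TrainingConfig(), **overrides)


if __name__ == "__main__":
    print(config_from_args())
```

Keeping the defaults in the dataclass and treating flags as overrides means a run with no flags still matches what `config.py` declares.
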
## Usage

Download the dataset with `gdown`:

```bash
gdown --fuzzy 'https://drive.google.com/file/d/1mqhq69dsSOK7ep7trTjRd3FMagTFTrzF/view?usp=sharing'
```

Run training with Weights & Biases logging and a sweep:

```bash
python main.py \
    --dataset /workspace/dataset_v3.0_alpaca_noinstr_filtered_6144.json \
    --output_dir /workspace/output \
    --wandb_project qwen2.5-lora \
    --wandb_entity luwoyuki-zhtl \
    --wandb_sweep
```

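Before starting a long run, it can be worth confirming that the dataset file parses and has the fields you expect. The sketch below is not part of this repository. It assumes the file is a JSON array of objects, which the Alpaca-style file name suggests but this README does not confirm. The function names are hypothetical.

```python
import json
from collections import Counter


def summarize_records(records):
    """Return (record count, {key: number of records containing that key})."""
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    key_counts = Counter()
    for record in records:
        if not isinstance(record, dict):
            raise ValueError("expected every record to be a JSON object")
        key_counts.update(record.keys())
    return len(records), dict(key_counts)


def summarize_file(path):
    """Load a JSON dataset from disk and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_records(json.load(fh))


if __name__ == "__main__":
    import sys

    count, keys = summarize_file(sys.argv[1])
    print(f"{count} records; key frequencies: {keys}")
```

A key that appears in fewer records than the total points at rows that the loader may need to skip or fill in.
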
Train with a different base model and pass a Hub token:

```bash
python main.py \
    --base_model mistralai/Mistral-7B-v0.1 \
    --dataset path/to/your/dataset.json \
    --output_dir ./custom_output \
    --hub_token "secret"
```

### Using Google Drive Dataset

Train using a dataset stored on Google Drive:

```bash
python main.py \
    --dataset https://drive.google.com/file/d/your_file_id/view \
    --output_dir ./drive_output
```
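
Accepting either a local path or a Drive link in `--dataset` implies some dispatch on the argument. How `data_loader.py` does this is not shown in this README. The sketch below is one plausible approach, with hypothetical names (`drive_file_id`, `fetch_dataset`) and a hypothetical default target file. It uses `gdown.download` with `fuzzy=True`, the library counterpart of the `gdown --fuzzy` command above.

```python
import re

# Matches share links of the form https://drive.google.com/file/d/<id>/view
_DRIVE_FILE_RE = re.compile(r"drive\.google\.com/file/d/([\w-]+)")


def drive_file_id(source):
    """Return the Google Drive file ID in `source`, or None for anything else."""
    match = _DRIVE_FILE_RE.search(source)
    return match.group(1) if match else None


def fetch_dataset(source, target="dataset.json"):
    """Download Drive links with gdown; return other paths unchanged."""
    if drive_file_id(source) is None:
        return source
    import gdown  # imported lazily because it is an optional dependency

    # fuzzy=True lets gdown extract the file ID from a full share URL.
    return gdown.download(url=source, output=target, fuzzy=True)
```

Importing `gdown` only inside the Drive branch keeps it optional for runs that use local files.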