Chinese -> English finetuning datasets
`train.en` and `train.zh` are from here
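If the two files follow the usual parallel-corpus convention (line N of `train.zh` is the translation of line N of `train.en`), loading the pairs is straightforward. That line-alignment is an assumption on my part, not something stated above; a minimal sketch:

```python
# Minimal sketch for loading the parallel files, ASSUMING train.en and
# train.zh are line-aligned (not confirmed above -- verify before use).
with open("train.zh", encoding="utf-8") as zh, \
     open("train.en", encoding="utf-8") as en:
    pairs = [(z.strip(), e.strip()) for z, e in zip(zh, en)]

print(len(pairs), "sentence pairs")
print(pairs[0])
```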
The actual dataset and `.sqlite` file.
It's missing the epubs dir I used for paragraph rebuilding... I accidentally deleted the dir, sorry :c
What I did was Google a sentence from chapter 1 of a novel, scrape 50-60 chapters from either Webnovel or some aggregator, then unzip the epub into a directory named after the `book_id`.
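For reference, a rough sketch of that unpacking step: epubs are plain zip archives, so `zipfile` can extract them. The `epubs/` layout and the function name here are my guesses based on the description above, not the exact script I used.

```python
import zipfile
from pathlib import Path

def unpack_epub(epub_path: str, book_id: str, out_root: str = "epubs") -> Path:
    """Extract an epub (just a zip archive) into epubs/<book_id>/."""
    out_dir = Path(out_root) / book_id
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(out_dir)
    return out_dir

# e.g. unpack_epub("45-jsys.epub", "45-jsys")
```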
GuoFeng dataset chapter spread:
```sql
select book_id, count(*) as chapter_count
from chapters
group by book_id
order by chapter_count desc;
```
| book_id | chapter_count |
|---|---|
| 45-jsys | 2262 |
| 93-yzsslfmmd | 1733 |
| 2-xzltq | 1718 |
| 19-ysmmjwn | 1546 |
| 52-mfwz | 1254 |
| 86-wzxajddyx | 1188 |
| 34-xwdrcsh | 1172 |
| 25-dgfsngm | 942 |
| 53-gmzz | 798 |
| 6-yh1frhjqjysy | 763 |
| 141-fyyysndy | 745 |
| 37-scrj | 539 |
| 95-cjjyyhy | 516 |
| 99-jjl | 220 |
There are 21 more books with 60 chapters each, and the rest have 50 or fewer.
However, I didn't import many epubs. There are 153 books in the dataset in total, and the most important part of the GuoFeng-Webnovel dataset is the Chinese raws and the more or less decent mapping between paragraphs (there are some mistakes, which sucks). I only used 19 epubs, and not many of the paragraphs actually matched.
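If you want to gauge how bad the paragraph mapping is per book, something like the sketch below works against the `.sqlite` file. Only the `chapters(book_id)` table is confirmed by the query above; the `paragraphs` table and its `zh`/`en` columns here are hypothetical, so adjust them to the actual schema.

```python
import sqlite3

conn = sqlite3.connect("dataset.sqlite")  # filename is an assumption
cur = conn.execute(
    """
    select book_id,
           count(*) as total,
           sum(case when en is not null then 1 else 0 end) as matched
    from paragraphs  -- hypothetical table/columns, check the real schema
    group by book_id
    order by matched * 1.0 / total asc
    """
)
for book_id, total, matched in cur:
    print(f"{book_id}: {matched}/{total} paragraphs matched")
```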