Maybe we can merge projects?
EvoTrees:
- More loss functions supported
- Cleaner code
- Working on integrating with Julia ecosystem
MemoryConstrainedTreeBoosting:
- 4-5x faster on CPU
- Early stopping
MemoryConstrainedTreeBoosting.jl is a library I've been working on for a couple of years. It lets me control the loading and binning of data, so I can (a) do feature engineering in Julia and (b) use all the memory on my machine for the binned data. I've also spent a lot of time on speed, because 10% faster ≈ training done 1 day sooner for my data sets. I think it is quite fast. I didn't bother documenting the library until today, however.
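(A rough sketch of the binning idea, not the library's actual API: each Float32 feature gets bucketed into a small number of quantile bins stored as UInt8, so the training data costs 1 byte per value instead of 4. The bin_feature helper and the 64-bin count here are just for illustration.)
using Statistics: quantile
function bin_feature(col; bin_count = 64)
    # Split points at evenly spaced quantiles of the column (bin_count - 1 of them).
    splits = quantile(col, range(0, 1; length = bin_count + 1)[2:end-1])
    # searchsortedfirst maps each value to a bucket index in 1:bin_count, which fits in a UInt8.
    return UInt8[searchsortedfirst(splits, x) for x in col]
end
X = rand(Float32, 1_000_000, 100)                                          # 400 MB of Float32s
X_binned = mapreduce(j -> bin_feature(view(X, :, j)), hcat, 1:size(X, 2))  # 100 MB of UInt8s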
With our powers combined...!
A benchmark is below, run with 4 threads on my 2013 quad-core i7-4960HQ.
pkg> add https://github.com/brianhempel/MemoryConstrainedTreeBoosting.jl
using Statistics
using StatsBase: sample
using Revise
using EvoTrees
nrounds = 200
# EvoTrees params
params_evo = EvoTreeRegressor(T=Float32,
    loss=:logistic, metric=:logloss,
    nrounds=nrounds,
    λ=0.5, γ=0.0, η=0.05,
    max_depth=6, min_weight=1.0,
    rowsample=1.0, colsample=0.5, nbins=64)
# MemoryConstrainedTreeBoosting params
params_mctb = (
    weights = nothing,
    bin_count = 64,
    iteration_count = nrounds,
    min_data_weight_in_leaf = 1.0,
    l2_regularization = 0.5,
    max_leaves = 32,
    max_depth = 6,
    max_delta_score = 1.0e10, # Before shrinkage.
    learning_rate = 0.05,
    feature_fraction = 0.5, # Per tree.
    bagging_temperature = 0.0,
)
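# Rough parameter correspondence between the two configurations above (my reading, for
# comparability): λ ↔ l2_regularization, η ↔ learning_rate, colsample ↔ feature_fraction,
# nbins ↔ bin_count, min_weight ↔ min_data_weight_in_leaf.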
nobs = Int(1e6)
num_feat = Int(100)
@info "testing with: $nobs observations | $num_feat features."
X = rand(Float32, nobs, num_feat)
Y = Float32.(rand(Bool, size(X, 1)))
@info "evotrees train CPU:"
params_evo.device = "cpu"
@time m_evo = fit_evotree(params_evo, X, Y);
@time fit_evotree(params_evo, X, Y);
@info "evotrees predict CPU:"
@time pred_evo = EvoTrees.predict(m_evo, X);
@time EvoTrees.predict(m_evo, X);
import MemoryConstrainedTreeBoosting
@info "MemoryConstrainedTreeBoosting train CPU:"
@time bin_splits, trees = MemoryConstrainedTreeBoosting.train(X, Y; params_mctb...);
@time MemoryConstrainedTreeBoosting.train(X, Y; params_mctb...);
@info "MemoryConstrainedTreeBoosting predict CPU, JITed:"
save_path = tempname()
MemoryConstrainedTreeBoosting.save(save_path, bin_splits, trees)
unbinned_predict = MemoryConstrainedTreeBoosting.load_unbinned_predictor(save_path)
@time pred_mctb = unbinned_predict(X)
@time unbinned_predict(X)

$ JULIA_NUM_THREADS=4 julia --project=. experiments/benchmarks_v2.jl
[ Info: testing with: 1000000 observations | 100 features.
[ Info: evotrees train CPU:
98.929771 seconds (64.89 M allocations: 21.928 GiB, 2.12% gc time)
83.160324 seconds (187.35 k allocations: 18.400 GiB, 1.69% gc time)
[ Info: evotrees predict CPU:
2.458015 seconds (4.50 M allocations: 246.320 MiB, 38.75% compilation time)
1.598223 seconds (4.59 k allocations: 4.142 MiB)
[ Info: MemoryConstrainedTreeBoosting train CPU:
20.320708 seconds (16.04 M allocations: 2.480 GiB, 1.48% gc time, 0.01% compilation time)
15.954224 seconds (3.10 M allocations: 1.714 GiB, 2.66% gc time)
[ Info: MemoryConstrainedTreeBoosting predict CPU, JITed:
14.364365 seconds (11.80 M allocations: 692.582 MiB, 25.95% compilation time)
0.778851 seconds (40 allocations: 30.520 MiB)
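For reference, comparing the second (post-compilation) runs above: 83.2 s vs 16.0 s ≈ 5.2× faster training, and 1.60 s vs 0.78 s ≈ 2.1× faster unbinned prediction.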