Run evolutionary feature engineering

Usage

evolve_features(
  data,
  target_col,
  task = "classification",
  generations = 10,
  pop_size = 10,
  cv_folds = 3,
  evaluation_strategy = "cv",
  split_ratio = c(0.6, 0.2, 0.2),
  split_ids = NULL,
  early_stopping_rounds = 3,
  evaluator = "lightgbm",
  dynamic_population = TRUE,
  crossover_type = "both",
  threads = 8,
  max_clustering_size = 5000,
  seed = NULL,
  verbose = TRUE
)

Arguments

data

A data.frame or data.table containing the features and the target column.

target_col

Name of the target column in data.

task

Type of prediction task: "classification" (default) or "regression".

generations

Maximum number of generations to run (default 10).

pop_size

Population size, i.e. the number of candidates per generation (default 10).

cv_folds

Number of cross-validation folds (default 3).

evaluation_strategy

Strategy used to evaluate candidate recipes: "cv" (default) or "split".

split_ratio

A numeric vector of length 2 or 3 defining train/validation/holdout proportions (e.g. c(0.6, 0.2, 0.2)).

split_ids

An optional character vector of split assignments (e.g. "train", "val", "holdout"). Defaults to NULL.
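
As an illustration, assignments of this kind could be generated from proportions like those in split_ratio. The sketch below uses base R only. make_split_ids() is a hypothetical helper, not part of this function's API, and it assumes one label per row of data, which this page does not state explicitly.

```r
# Hypothetical helper (not part of the documented API): builds one
# "train"/"val"/"holdout" label per row from three proportions.
make_split_ids <- function(n, ratio = c(0.6, 0.2, 0.2)) {
  stopifnot(length(ratio) == 3, abs(sum(ratio) - 1) < 1e-8)
  n_train <- round(ratio[1] * n)
  n_val <- round(ratio[2] * n)
  n_holdout <- n - n_train - n_val  # remainder, so the counts always sum to n
  labels <- rep(c("train", "val", "holdout"),
                times = c(n_train, n_val, n_holdout))
  sample(labels)  # shuffle so the split does not follow row order
}

set.seed(42)  # make the shuffle reproducible
ids <- make_split_ids(nrow(iris))
table(ids)
```

The resulting vector has the same length as the number of rows and could then be supplied as split_ids, assuming the function matches labels to rows by position.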

early_stopping_rounds

Stop if fitness does not improve for this many generations (default 3).
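
The stopping rule can be pictured with a small base R sketch. It is a generic patience counter, not the package's implementation. stop_generation() is a hypothetical helper, and the sketch assumes that higher fitness is better and that only a strict improvement resets the counter.

```r
# Hypothetical helper (not part of the documented API).
# Given a per-generation fitness history (assumed: higher is better),
# returns the generation at which the search would stop.
stop_generation <- function(fitness, early_stopping_rounds = 3) {
  best <- -Inf
  stale <- 0  # generations since the last strict improvement
  for (g in seq_along(fitness)) {
    if (fitness[g] > best) {
      best <- fitness[g]
      stale <- 0
    } else {
      stale <- stale + 1
    }
    if (stale >= early_stopping_rounds) {
      return(g)
    }
  }
  length(fitness)  # never triggered: all generations were run
}

history <- c(0.80, 0.82, 0.85, 0.85, 0.84, 0.85, 0.90)
stop_generation(history)  # 6: generations 4, 5 and 6 bring no improvement
```

In this toy history the later value of 0.90 is never reached, which is the usual trade-off of a small patience value.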

evaluator

The ML model used to evaluate candidates: "lightgbm" (default) or "xgboost".

dynamic_population

Logical. If TRUE, population expands dynamically during stagnation.

crossover_type

Crossover type: "both" (default, 50% random / 50% union), "random", or "union"

threads

Number of threads to use for parallel execution (default 8)

max_clustering_size

Maximum number of unique training rows to cluster (default 5000; use 0 or NULL for no limit).

seed

Optional integer seed for reproducibility.

verbose

Logical. If TRUE, prints progress.
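
Examples

A minimal call might look like the sketch below. It assumes the package that provides evolve_features() is attached and that the chosen evaluator is installed. The toy data set and the argument values are illustrative choices, and the structure of the returned object is not described on this page, so the sketch only inspects it with str().

```r
# Toy data: a two-class subset of iris (assumption: any data.frame with a
# target column is accepted; this page lists no further requirements).
df <- droplevels(subset(iris, Species != "setosa"))

# evolve_features() comes from the package this page documents; the call is
# skipped when that package is not attached.
if (exists("evolve_features", mode = "function")) {
  result <- evolve_features(
    data = df,
    target_col = "Species",
    task = "classification",
    generations = 5,
    pop_size = 10,
    evaluation_strategy = "cv",
    cv_folds = 3,
    evaluator = "lightgbm",
    threads = 2,
    seed = 123,
    verbose = FALSE
  )
  str(result, max.level = 1)  # the result structure is not documented here
}
```

Setting seed makes repeated runs comparable, and a small threads value keeps the example light on shared machines.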