Optuna

Optunaは、ハイパーパラメータの最適化を効率的に行うためのPythonライブラリである。データ駆動科学では、このような最適化作業は不可欠なステップであり、そのニーズを満たすツールとして利用されている。主に数値、カテゴリ、離散のハイパーパラメータを対象とし、目的関数を定義することで、サンプリング方法に基づく探索が行われる。探索過程を視覚的に表示する機能もあり、ユーザーが最適化の進行を直感的に把握することができる。

Trial of Optuna

1. Introduction

Optuna is a Python library designed to efficiently perform hyperparameter optimization. In this article, we will use the Higgs Boson classification dataset (https://openml.org/search?type=data&sort=version&status=any&order=asc&exact_name=Higgs&id=45570) and apply Optuna to optimize the hyperparameters of a neural network classifier. This tutorial assumes that you are running the code on Google Colaboratory.

2. Installation

Run the following command to install Optuna:

!pip install optuna

3. Execution

First, set up PyTorch to use the GPU if available.

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Next, retrieve and preprocess the Higgs dataset.

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset
data = fetch_openml(name="Higgs", version=1)
X, y = data.data, data.target.astype(int)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to PyTorch tensors and move to GPU
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).to(device)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32).to(device)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.long).to(device)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.long).to(device)

Then, define a single-hidden-layer neural network in PyTorch. The hyperparameter hidden_size (the number of units in the hidden layer) is passed as an argument.

import torch.nn as nn
import torch.optim as optim

class DNN(nn.Module):
    def __init__(self, hidden_size):
        super(DNN, self).__init__()
        self.fc1 = nn.Linear(28, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, 2) 
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

Next, define the optimization function for Optuna. The hyperparameters hidden_size (hidden layer size), lr (learning rate), and batch_size are tuned. The function trains the model and returns the test accuracy.

import optuna

def objective(trial):
    # Define the search space
    hidden_size = trial.suggest_int('hidden_size', 32, 256)
    lr = trial.suggest_loguniform('lr', 1e-4, 1e-1)
    batch_size = trial.suggest_categorical('batch_size', [32, 64, 128])
    epochs = 20  # Number of epochs

    # Move model to GPU
    model = DNN(hidden_size).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    # Create DataLoader
    train_dataset = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

    # Training loop
    for epoch in range(epochs):
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            outputs = model(X_batch)
            loss = criterion(outputs, y_batch)
            loss.backward()
            optimizer.step()

    # Evaluate on test data
    with torch.no_grad():
        y_pred = model(X_test_tensor)
        y_pred_classes = torch.argmax(y_pred, dim=1)
        accuracy = (y_pred_classes == y_test_tensor).sum().item() / len(y_test_tensor)

    return accuracy

Finally, run Optuna to optimize the hyperparameters. By default, the TPESampler (a Bayesian optimization-based sampler) is used.

study = optuna.create_study(direction='maximize')  # Maximize accuracy
study.optimize(objective, n_trials=30)  # Run 30 trials

# Display the best hyperparameters
print("Best trial:")
print(f"  Accuracy: {study.best_value:.4f}")
print(f"  Params: {study.best_params}")

After about 23 minutes, the optimization completes, and the best hyperparameters are displayed as follows:

Best trial:
  Accuracy: 0.7214
  Params: {'hidden_size': 50, 'lr': 0.0008382275385421477, 'batch_size': 64}

4. Visualization of Results

The following code plots the accuracy of each trial, producing a figure like the one below.

import matplotlib.pyplot as plt
import pandas as pd

df = study.trials_dataframe()

plt.figure(figsize=(8, 5))
plt.plot(df["number"], df["value"], marker="o", linestyle="-")
plt.xlabel("Trial Number")
plt.ylabel("Accuracy")
plt.title("Optuna Optimization Progress")
plt.grid()
plt.show()

We can also visualize the relationship between hyperparameters and accuracy using a scatter plot. The optimal hyperparameters are located toward the lower-left area. Note that this 2D plot does not include information about the batch size.

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
scatter = sns.scatterplot(x=df["params_hidden_size"], y=df["params_lr"], size=df["value"], hue=df["value"], palette="coolwarm", sizes=(20, 200))

plt.legend(title="Accuracy", bbox_to_anchor=(1.05, 1), loc="upper left")

plt.xlabel("Hidden Size")
plt.ylabel("Learning Rate")
plt.title("Hyperparameter Impact on Accuracy")
plt.show()

5. Conclusion

Using Optuna, we were able to easily perform hyperparameter optimization based on Bayesian optimization. In this example, we used a neural network with a single hidden layer, but further improvements such as increasing the number of hidden layers could enhance accuracy.