HOME > Optuna

Optuna

Optuna is a Python library designed to efficiently optimize hyperparameters. In data-driven science, such optimization tasks are essential steps, and it serves as a tool to meet these needs. It primarily targets numeric, categorical and discrete hyperparameters. By defining an objective function, a search is performed based on sampling methods. The library also has visualization capabilities that allow the user to intuitively understand the progress of the optimization.

Information
Case / Article
Learning Content
Review
Developer’s Voice

Review

2025/11/07

Trial of Optuna

1. Introduction

Optuna is a Python library designed to efficiently perform hyperparameter optimization. In this article, we will use the Higgs Boson classification dataset (https://openml.org/search?type=data&sort=version&status=any&order=asc&exact_name=Higgs&id=45570) and apply Optuna to optimize the hyperparameters of a neural network classifier. This tutorial assumes that you are running the code on Google Colaboratory.

2. Installation

Run the following command to install Optuna:

!pip install optuna

3. Execution

First, set up PyTorch to use the GPU if available.

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Next, retrieve and preprocess the Higgs dataset.

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset
data = fetch_openml(name="Higgs", version=1)
X, y = data.data, data.target.astype(int)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to PyTorch tensors and move to GPU
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).to(device)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32).to(device)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.long).to(device)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.long).to(device)

Then, define a single-hidden-layer neural network in PyTorch. The hyperparameter hidden_size (the number of units in the hidden layer) is passed as an argument.

import torch.nn as nn
import torch.optim as optim

class DNN(nn.Module):
    def __init__(self, hidden_size):
        super(DNN, self).__init__()
        self.fc1 = nn.Linear(28, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, 2) 
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

Next, define the optimization function for Optuna. The hyperparameters hidden_size (hidden layer size), lr (learning rate), and batch_size are tuned. The function trains the model and returns the test accuracy.

import optuna

def objective(trial):
    # Define the search space
    hidden_size = trial.suggest_int('hidden_size', 32, 256)
    lr = trial.suggest_loguniform('lr', 1e-4, 1e-1)
    batch_size = trial.suggest_categorical('batch_size', [32, 64, 128])
    epochs = 20  # Number of epochs

    # Move model to GPU
    model = DNN(hidden_size).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    # Create DataLoader
    train_dataset = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

    # Training loop
    for epoch in range(epochs):
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            outputs = model(X_batch)
            loss = criterion(outputs, y_batch)
            loss.backward()
            optimizer.step()

    # Evaluate on test data
    with torch.no_grad():
        y_pred = model(X_test_tensor)
        y_pred_classes = torch.argmax(y_pred, dim=1)
        accuracy = (y_pred_classes == y_test_tensor).sum().item() / len(y_test_tensor)

    return accuracy

Finally, run Optuna to optimize the hyperparameters. By default, the TPESampler (a Bayesian optimization-based sampler) is used.

study = optuna.create_study(direction='maximize')  # Maximize accuracy
study.optimize(objective, n_trials=30)  # Run 30 trials

# Display the best hyperparameters
print("Best trial:")
print(f"  Accuracy: {study.best_value:.4f}")
print(f"  Params: {study.best_params}")

After about 23 minutes, the optimization completes, and the best hyperparameters are displayed as follows:

Best trial:
  Accuracy: 0.7214
  Params: {'hidden_size': 50, 'lr': 0.0008382275385421477, 'batch_size': 64}

4. Visualization of Results

The following code plots the accuracy of each trial, producing a figure like the one below.

import matplotlib.pyplot as plt
import pandas as pd

df = study.trials_dataframe()

plt.figure(figsize=(8, 5))
plt.plot(df["number"], df["value"], marker="o", linestyle="-")
plt.xlabel("Trial Number")
plt.ylabel("Accuracy")
plt.title("Optuna Optimization Progress")
plt.grid()
plt.show()

We can also visualize the relationship between hyperparameters and accuracy using a scatter plot. The optimal hyperparameters are located toward the lower-left area. Note that this 2D plot does not include information about the batch size.

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
scatter = sns.scatterplot(x=df["params_hidden_size"], y=df["params_lr"], size=df["value"], hue=df["value"], palette="coolwarm", sizes=(20, 200))

plt.legend(title="Accuracy", bbox_to_anchor=(1.05, 1), loc="upper left")

plt.xlabel("Hidden Size")
plt.ylabel("Learning Rate")
plt.title("Hyperparameter Impact on Accuracy")
plt.show()

5. Conclusion

Using Optuna, we were able to easily perform hyperparameter optimization based on Bayesian optimization. In this example, we used a neural network with a single hidden layer, but further improvements such as increasing the number of hidden layers could enhance accuracy.