🧠 From Notebook to Pipeline: A Deep Learning Developer's Journey (Part 1)
Every machine learning project starts with a spark of an idea. Mine was simple: "Could I build a model to tell the difference between pizza, steak, and sushi?" Like many developers, I opened my go-to tool for exploration: a Jupyter Notebook. It felt like the perfect place to start, but I quickly discovered that a notebook is just a starting point. The real journey—the one filled with frustrating setbacks and "aha!" moments—was turning that experimental script into a reliable, reusable machine learning system.
The First Big Question: Is My Model Actually Any Good?
After some initial coding, I had a simple classifier that seemed to work. But my process was a mess. I was manually changing hyperparameters in code cells, re-running the notebook, and scribbling results in a notepad. I was getting numbers, but I couldn't trust them. This led to the first major challenge: how can I prove my model's performance in a scientific, repeatable way?
This single question broke down into a series of smaller, interconnected problems.
Problem #1: My experiments were chaotic and biased.
My first attempt at a solution was to bring order to the chaos. I wrote a nested loop to systematically iterate through every combination of model (EfficientNetB0, B2, B4), dataset size, and training duration.
Moving from random tweaks to a structured experiment plan like this was the first step toward building a reliable model.
This was a huge step forward! But it created a new problem: I was drowning in a sea of print() statements. Comparing the results of run #3 with run #17 was a nightmare of scrolling and squinting.
```python
# The first step toward a real system: a structured experiment loop
for data_name, train_dataloader in train_dataloaders_dict.items():
    for epochs in num_epochs_list:
        for model_name in models_list:
            ...  # train and evaluate ...
```
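To make the plan concrete, here's a tiny runnable sketch of how such a grid enumerates every run. The dataset names, epoch counts, and model names below are illustrative stand-ins, not the exact values I used:

```python
from itertools import product

# Hypothetical experiment grid: every combination becomes one named run
dataset_names = ["data_10_percent", "data_20_percent"]
num_epochs_list = [5, 10]
models_list = ["effnetb0", "effnetb2"]

for data_name, epochs, model_name in product(dataset_names, num_epochs_list, models_list):
    print(f"[run] data={data_name} | model={model_name} | epochs={epochs}")
```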
Problem #2: I couldn't visualize the story of my training.
I realized I didn't just need results; I needed a narrative. I needed to see the learning curves to understand how each model was behaving. This is where TensorBoard came in. But my first attempt was, again, a mess. All my logs were jumbled into one confusing timeline.
TensorBoard made it easy to visually compare all eight experiments. The superiority of the EffNetB2 model trained on 20% of the data (the top-performing line) became immediately obvious.
The insight wasn't just to use TensorBoard, but to be deliberate about how I organized my logs. I wrote a small utility function to create a clean, timestamped directory structure for every single run, which made the above visualization possible.
```python
# This function was the key to unlocking readable, comparable logs
from datetime import datetime
from pathlib import Path
from torch.utils.tensorboard import SummaryWriter

def create_writer(experiment_name, model_name, extra=None):
    timestamp = datetime.now().strftime("%Y-%m-%d")
    log_dir = Path("runs") / timestamp / experiment_name / model_name
    if extra:  # optional tag, e.g. the epoch count
        log_dir = log_dir / extra
    return SummaryWriter(log_dir=str(log_dir))
```
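As a quick illustration (the argument values here are made up), every run then gets its own writer and its own directory, so nothing collides:

```python
# One writer per run -> one clean, comparable curve in TensorBoard
writer = create_writer(experiment_name="data_20_percent",
                       model_name="effnetb2",
                       extra="10_epochs")
writer.add_scalar("Loss/train", 0.42, global_step=0)  # dummy value for illustration
writer.close()
# logs land under runs/<date>/data_20_percent/effnetb2/10_epochs
```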
Finally, I could see the story. I had a dashboard that clearly showed which models were learning fastest and which were overfitting. The data was now pointing to EfficientNetB2 as the most promising candidate.
The Second Big Question: How Do I Squeeze Out More Performance?
My experiments were now reliable, but the best model's accuracy was still just "okay." I knew the answer was Transfer Learning, but I soon learned that knowing the name of a technique is very different from implementing it correctly.
The core concept of feature extraction: keep the pre-trained 'backbone' (the feature learner) and only train a new, small 'head' (the classifier) on our specific data.
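In torchvision the backbone/head split is visible right on the model object. Here's a minimal sketch of loading the pre-trained weights (I use the B0 variant here to match the snippets later in the post):

```python
import torchvision

# Pull down the ImageNet-pre-trained weights and build the model around them
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
model = torchvision.models.efficientnet_b0(weights=weights)
# model.features is the "backbone"; model.classifier is the "head" we'll replace
```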
Problem #3: My GPU was crying and my training was slow.
My first attempt was naive. I loaded a pre-trained EfficientNet, swapped the final layer for my 3-class classifier, and hit "train." My GPU fan spun up like a jet engine, and the estimated training time was in hours, not minutes.
The "aha!" moment came after digging into how transfer learning truly works. I was trying to retrain the entire network. The solution was to freeze the backbone.
The proof is in the numbers. After freezing the backbone, the number of trainable parameters dropped from over 4 million to just 3,843, dramatically speeding up training.
```python
# The crucial insight: only train the tiny new part of the model
for param in model.features.parameters():
    param.requires_grad = False  # freeze the millions of learned parameters
```
By freezing the vast majority of the network, I was only training the tiny "head" I had added. The training time plummeted, and my GPU could finally breathe.
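For the curious, here's roughly what that tiny head looks like in code (a sketch following torchvision's EfficientNet-B0 layout; the dropout value mirrors the library default rather than anything I tuned). The 3,843 figure above is just the arithmetic of a 1280-to-3 linear layer: 1280 x 3 weights plus 3 biases.

```python
from torch import nn

# Swap the 1000-class ImageNet head for a 3-class food head
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280, out_features=3),  # 1280*3 + 3 = 3,843 trainable params
)
```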
Problem #4: A better model was performing worse. Why?
With my training loop now fast and efficient, I hit another wall. My accuracy was inexplicably poor. I spent hours debugging my code before realizing the problem wasn't in my logic, but in my data.
The pre-trained EfficientNet model expects images to be a specific size, with specific normalization values. My manually crafted image transforms were close, but not perfect. It was like speaking the same language but with a slightly wrong accent; the model was getting confused. The fix, once found, was beautifully simple:
```python
import torchvision

# This one line fixed a mountain of hidden data-mismatch issues
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
auto_transforms = weights.transforms()
```
Using the transforms that came packaged with the pre-trained weights ensured my data was prepared exactly the way the model expected, and the accuracy immediately jumped.
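To see what you get (and where it plugs in), here's a small sketch; the dataset path is a placeholder, and the exact printed summary varies by torchvision version:

```python
from torchvision import datasets

# Inspect the packaged preprocessing, then hand it straight to the dataset
print(auto_transforms)  # resize/crop sizes and normalization stats
train_data = datasets.ImageFolder(
    "data/pizza_steak_sushi/train",   # placeholder path, not my exact layout
    transform=auto_transforms,        # auto_transforms from the snippet above
)
```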
The Final Step: From a Monolithic Script to a Modular System
I now had a reliable, high-performing pipeline. But it was all trapped inside a single, massive 500-line script. It worked, but it wasn't a piece of software; it was a monolith.
The final challenge of this phase was to perform the Great Refactor. I systematically broke the monolithic script apart, piece by piece, into a dedicated Python package called food_vision. Each file was given a single, clear responsibility:
- data_setup.py: Does one thing: creates DataLoaders.
- model_builder.py: Does one thing: builds EfficientNet models.
- engine.py: Does one thing: runs the training and testing loops.
- utils.py: Holds the helpers for saving models and creating TensorBoard writers.
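To show how the pieces snap together, here's a hypothetical train.py entry point; every function name and signature below is my illustrative guess at the package's API, not the exact original:

```python
import torch
from food_vision import data_setup, engine, model_builder, utils

# Hypothetical wiring of the refactored modules into one training run
train_dl, test_dl, class_names = data_setup.create_dataloaders(
    train_dir="data/pizza_steak_sushi/train",
    test_dir="data/pizza_steak_sushi/test",
    batch_size=32,
)
model = model_builder.create_effnetb0(num_classes=len(class_names))
engine.train(model=model,
             train_dataloader=train_dl,
             test_dataloader=test_dl,
             optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
             loss_fn=torch.nn.CrossEntropyLoss(),
             epochs=5,
             writer=utils.create_writer("refactor_check", "effnetb0"))
utils.save_model(model=model, target_dir="models", model_name="effnetb0.pth")
```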
This process was the bridge from being a data scientist to being an MLOps engineer. The result wasn't just a script that ran; it was a system that could be imported, tested, and extended.
The ML brain was finally complete. It was validated, packaged, and ready. But it was a brain trapped in a jar, only accessible via my terminal. It felt like a powerful engine with no car around it. The next, even bigger challenge was giving it a body—an interactive web interface that anyone could use.
My struggle with web frameworks, background processes, and user experience is a whole other story. If you want to see how I built the UI for this model, you can follow along in Part 2 of this series!