
Object Oriented Dataset with Python and PyTorch - Part 3: Cats and Dogs

gguasti
Xilinx Employee

This is Part Three of the Object Oriented Dataset with Python and PyTorch blog series. For Part One, see here.

For Part Two, see here.

Part 3: repetita iuvant(*): Cats and Dogs

(*) a Latin phrase meaning "Repeated things help"

In this entry we repeat the procedure from Part Two on a cats and dogs dataset, and extend it with dataset splitting and a simple training run.

 

Typically, a simple dataset is organized as a folder tree: for example, a cats folder and a dogs folder, each with Train, Validation, and Test subfolders.

By organizing our dataset as an object we can avoid the complexity of the folder tree. In this application all pictures are saved in the same folder.

We just need a label file telling us which sample is a dog and which is a cat. The code to automatically create the label file is included below.


Even though each picture's filename already acts as a label, a labels.txt file is created on purpose: each row contains the filename and the label (cat = 0, dog = 1).

At the end of the example we will review two methods to split the database with PyTorch and train an extremely simple model.

In [ ]:
data_path = './raw_data/dogs_cats/all'
import os

# list every picture in the folder
files = [f for f in os.listdir(data_path)]
#for f in files:
#    print(f)

# write one row per picture: "<filename> <label>", with cat = 0 and dog = 1
# "w" mode regenerates the file from scratch, so re-running the cell does not duplicate rows
with open(data_path + '/' + "labels.txt", "w") as myfile:
    for f in files:
        if f.split('.')[0] == 'cat':
            label = 0
        elif f.split('.')[0] == 'dog':
            label = 1
        else:
            print("ERROR in recognizing the label of file " + f)
            continue

        myfile.write(f + ' ' + str(label) + '\n')
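
As a quick sanity check (this cell is an addition, not part of the original notebook), we can print the first few rows of the generated file:

In [ ]:
# optional check: print the first rows of the generated label file
with open(data_path + '/labels.txt') as f:
    for _ in range(3):
        print(f.readline().strip())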
In [106]:
from PIL import Image
import matplotlib.pyplot as plt

raw_data_path = './raw_data/dogs_cats/all'
im_example_cat = Image.open(raw_data_path + '/' + 'cat.1070.jpg')
im_example_dog = Image.open(raw_data_path + '/' + 'dog.1070.jpg')

fig, axs = plt.subplots(1, 2, figsize=(10, 3))

axs[0].set_title('should be a cat')
axs[0].imshow(im_example_cat)

axs[1].set_title('should be a dog')
axs[1].imshow(im_example_dog)
plt.show()
 
[Figure: the two example pictures, "should be a cat" and "should be a dog"]
 

Do not forget to refresh the sample list:

In [ ]:
import csv
import functools

# DataInfoTuple and myFunc (the sorting key) are defined in Part Two

# discard the sample list built for the previous dataset
del sample_list

@functools.lru_cache(1)
def getSampleInfoList(raw_data_path):
    sample_list = []
    with open(str(raw_data_path) + '/labels.txt', mode = 'r') as f:
        reader = csv.reader(f, delimiter = ' ')
        for i, row in enumerate(reader):
            imgname = row[0]
            label = int(row[1])
            sample_list.append(DataInfoTuple(imgname, label))
    sample_list.sort(reverse=False, key=myFunc)
    # print("DataInfoTuple: samples list length = {}".format(len(sample_list)))
    return sample_list

The database object creation is as simple as a line of code:

In [114]:
mydataset = MyDataset(isValSet_bool = None, raw_data_path = raw_data_path, norm = False, resize = True, newsize = (64, 64))
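
As a quick check (an addition to the original post), we can inspect one sample; this assumes, as in Part Two, that MyDataset returns an (image tensor, label) tuple:

In [ ]:
# optional check: each sample should be a 3 x 64 x 64 tensor plus its integer label
img_t, label = mydataset[0]
print(img_t.shape, label)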
 

If you want to normalize the images, you need to calculate the mean and standard deviation across the dataset and then regenerate it with normalization enabled.

The code is included below for completeness.

In [ ]:
import torch
from torchvision import transforms

# stack all images along a new last dimension, then compute the per-channel mean and std
imgs = torch.stack([img_t for img_t, _ in mydataset], dim = 3)
im_mean = imgs.view(3, -1).mean(dim=1).tolist()
im_std = imgs.view(3, -1).std(dim=1).tolist()
del imgs

normalize = transforms.Normalize(mean=im_mean, std=im_std)

# regenerate the dataset with normalization enabled
mydataset = MyDataset(isValSet_bool = None, raw_data_path = raw_data_path, norm = True, resize = True, newsize = (64, 64))

Splitting the database into Train, Validation, and Test sets

The next step is necessary for the training phase. Usually the entire database of samples is scrambled and then split into three sets: Train, Validation, and Test.

If the dataset is organized as a tensor of data and a tensor of labels, you could for example call 'sklearn.model_selection.train_test_split' twice:

first to split it into Train and Test, and then to split Train again into Train and Validation (taking 0.25 of the remaining 80% yields a 60/20/20 split).

It would look something like this:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)

 

However, we want to keep the dataset as an Object and PyTorch helps us to make this operation simple.

As an example let's create the Train and Validation set only.

Method 1:

Here we scramble the indices and then use them to build the samplers and data loaders:

In [ ]:
n_samples = len(mydataset)
# how many samples will go into the Validation set
n_val = int(0.2 * n_samples)
# important! scramble the dataset. We start by scrambling the indices.
shuffled_indices = torch.randperm(n_samples)
# first step is to split the indices
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]
train_indices, val_indices
In [ ]:
from torch.utils.data.sampler import SubsetRandomSampler
batch_size = 64

train_sampler = SubsetRandomSampler(train_indices)
valid_sampler = SubsetRandomSampler(val_indices)

train_loader = torch.utils.data.DataLoader(mydataset, batch_size=batch_size, sampler=train_sampler)
validation_loader = torch.utils.data.DataLoader(mydataset, batch_size=batch_size, sampler=valid_sampler)
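
As a quick check (not in the original post), we can pull one minibatch from the loader; with 64x64 RGB images and batch_size = 64 the expected shape is (64, 3, 64, 64):

In [ ]:
# optional check: fetch one minibatch and inspect its shape
imgs, labels = next(iter(train_loader))
print(imgs.shape, labels.shape)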
 

Method 2

Here is an example of scrambling and splitting the dataset object directly, using torch.utils.data.random_split. The code style is more abstract:

In [116]:
train_size = int(0.9 * len(mydataset))
valid_size = len(mydataset) - train_size  # avoids rounding issues when the length is not divisible by 10
train_dataset, valid_dataset = torch.utils.data.random_split(mydataset, [train_size, valid_size])

# uncomment if you need Test dataset too
#test_size = valid_size
#train_size = train_size - test_size
#train_dataset, test_dataset = torch.utils.data.random_split(train_dataset, [train_size, test_size])

len(mydataset), len(train_dataset), len(valid_dataset)
Out[116]:
(25000, 22500, 2500)

Model definition

In [41]:
import torch.nn as nn
import torch.nn.functional as F
n_out = 2
In [ ]:
# very minimal NN
# Expected accuracy 0.66

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding = 1)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding = 1)
        self.fc1 = nn.Linear(8*16*16, 32)  # 8 channels x 16x16 after two 2x2 max-pools on a 64x64 input
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
        out = F.max_pool2d(torch.tanh(self.conv2(out)), 2)
        #print(out.shape)
        out = out.view(-1,8*16*16)
        out = torch.tanh(self.fc1(out))
        out = self.fc2(out)
        return out
In [131]:
# deeper model - but training time starts becoming prohibitive on my CPU

class ResBlock(nn.Module):
    def __init__(self, n_chans):
        super(ResBlock, self).__init__()
        self.conv = nn.Conv2d(n_chans, n_chans, kernel_size=3, padding=1)
        self.batch_norm = nn.BatchNorm2d(num_features=n_chans)
    def forward(self, x):
        out = self.conv(x)
        out = self.batch_norm(out)
        out = torch.relu(out)
        return out + x
In [177]:
class Net(nn.Module):
    def __init__(self, n_chans1=32, n_blocks=10):
        super(Net, self).__init__()
        self.n_chans1 = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(n_chans1, n_chans1, kernel_size=3, padding=1)
        # note: replicating the same ResBlock instance means all n_blocks blocks share one set of weights
        self.resblocks = nn.Sequential(* [ResBlock(n_chans=n_chans1)] * n_blocks)
        self.fc1 = nn.Linear(n_chans1 * 8 * 8, 32)
        self.fc2 = nn.Linear(32, 2)
    def forward(self, x):
        out = F.max_pool2d(torch.relu(self.conv1(x)), 2)    # 64x64 -> 32x32
        out = self.resblocks(out)
        # conv3 is applied twice, so both applications share the same weights
        out = F.max_pool2d(torch.relu(self.conv3(out)), 2)  # 32x32 -> 16x16
        out = F.max_pool2d(torch.relu(self.conv3(out)), 2)  # 16x16 -> 8x8
        out = out.view(-1, self.n_chans1 * 8 * 8)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out
model = Net(n_chans1=32, n_blocks=5)
 

Let's display the size of the model:

In [178]:
model = Net()
numel_list = [p.numel() for p in model.parameters() if p.requires_grad == True]
sum(numel_list), numel_list
Out[178]:
(85090, [864, 32, 9216, 32, 9216, 32, 32, 32, 65536, 32, 64, 2])
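
Reading the list (a breakdown added here for clarity): conv1 has 3*32*3*3 = 864 weights plus 32 biases; conv3 and the ResBlock convolution each have 32*32*3*3 = 9216 weights plus 32 biases; the ResBlock batch norm adds 32 + 32; fc1 has (32*8*8)*32 = 65536 weights plus 32 biases; and fc2 has 32*2 = 64 weights plus 2 biases, for 85090 parameters in total. Note that the ResBlock parameters appear only once even though several blocks are stacked, because nn.Sequential replicates the same block instance, so all blocks share their weights.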
 

A simple and clever trick to catch shape mismatches and other mistakes: run the model once in the forward direction on a single sample before training it:

In [180]:
model(mydataset[0][0].unsqueeze(0))
# unsqueeze is needed to add a dimension and emulate the batch
Out[180]:
tensor([[0.7951, 0.6417]], grad_fn=<AddmmBackward>)
 

It works!

Training the model

This was not the goal of the article, but... once we have the model it is fun to train it, especially since PyTorch provides the DataLoader for free.

The DataLoader's task is to sample minibatches from a Dataset, with flexible sampling strategies. We will shuffle the dataset before loading the minibatches. For reference see https://pytorch.org/docs/stable/data.html

In [181]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print("Training on device {}.".format(device))
 
Training on device cpu.
In [182]:
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=64, shuffle=False) # note, we do not need to shuffle here
In [183]:
import datetime

def training_loop(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()  # accumulate the loss over all minibatches of the epoch
        if epoch == 1 or epoch % 5 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(), epoch, float(loss_train)))
In [184]:
model = Net()

# to start from a previously saved model instead:
# models_data_path = './raw_data/models'
# model.load_state_dict(torch.load(models_data_path + '/cats_dogs.pt'))
In [185]:
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop(
    n_epochs = 20,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)
 
2020-09-15 19:33:03.105620 Epoch 1, Training loss 224.0338312983513
2020-09-15 20:01:35.993491 Epoch 5, Training loss 153.11289536952972
2020-09-15 20:36:51.486071 Epoch 10, Training loss 113.09166505932808
2020-09-15 21:11:37.375586 Epoch 15, Training loss 85.17814277857542
2020-09-15 21:46:05.792975 Epoch 20, Training loss 59.60428727790713
In [189]:
for loader in [train_loader, valid_loader]:
    correct = 0
    total = 0
    with torch.no_grad():
        for imgs, labels in loader:
            outputs = model(imgs)
            _, predicted = torch.max(outputs, dim=1)  # the index of the largest output is the predicted class
            total += labels.shape[0]
            correct += int((predicted == labels).sum())
    print("Accuracy: %f" % (correct / total))
 
Accuracy: 0.956756
Accuracy: 0.830800
 

Not a great performance, but the goal was just to verify that the dataset organized as a Python object works and that we can train a generic model with it.

Please also consider that in order to speed up training with my CPU, all pictures have been downsampled to 64x64.
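
To close the loop, here is a quick way to look at a single prediction. This cell is an addition; it assumes, as above, that the dataset returns (image tensor, integer label) pairs and simply maps the predicted index back to the cat = 0 / dog = 1 convention used in labels.txt:

In [ ]:
# optional: classify one validation sample and map the index back to a class name
classes = ['cat', 'dog']             # cat = 0, dog = 1, as in labels.txt
img_t, label = valid_dataset[0]
with torch.no_grad():
    out = model(img_t.unsqueeze(0))  # unsqueeze adds the batch dimension
    predicted = out.argmax(dim=1).item()
print('predicted:', classes[predicted], '- label:', classes[label])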

In [187]:
models_data_path = './raw_data/models'
torch.save(model.state_dict(), models_data_path + '/cats_dogs.pt')
In [ ]:
# in case we want to load a previously saved model
model = Net()
model.load_state_dict(torch.load(models_data_path + '/cats_dogs.pt'))
 

Appendix

Understanding DIM

The way to understand the "dim" argument of PyTorch's sum or mean is that the specified dimension is the one that gets collapsed. So when it collapses dimension 0 (the rows), only one row remains: the operation runs column-wise.

In [ ]:
a = torch.randn(2, 3)
a
In [ ]:
torch.mean(a)
In [ ]:
torch.mean(a, dim=0) # now collapsing rows, only one row will result
In [ ]:
torch.mean(a, dim=1) # now collapsing columns, only one column will remain
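
The original cells do not show their outputs; for reference, reducing a (2, 3) tensor with dim=0 leaves 3 elements (one per column), while dim=1 leaves 2 elements (one per row). A minimal check:

In [ ]:
a = torch.randn(2, 3)
print(torch.mean(a, dim=0).shape)  # torch.Size([3]) - the 2 rows were collapsed
print(torch.mean(a, dim=1).shape)  # torch.Size([2]) - the 3 columns were collapsed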
 

References