CS194 Project 4

Classification and Segmentation

Xuxin Cheng CS194-agv

Part 1: Image Classification

1.1 Network Structure

The original data is first normalized to [-1, 1] and then fed into the CNN.
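One common way to reach the [-1, 1] range is to scale pixels to [0, 1] and then subtract 0.5 and divide by 0.5. The sketch below assumes raw 8-bit pixel values in 0..255; the helper names `normalize_pixel` and `normalize_image` are illustrative and not taken from the project code.

```python
def normalize_pixel(value, max_value=255.0):
    """Map a raw pixel in [0, max_value] to [-1, 1]."""
    scaled = value / max_value      # now in [0, 1]
    return (scaled - 0.5) / 0.5     # now in [-1, 1]


def normalize_image(image):
    """Apply normalize_pixel to every entry of a 2D list of pixels."""
    return [[normalize_pixel(p) for p in row] for row in image]
```

With this mapping, a black pixel (0) becomes -1 and a white pixel (255) becomes 1.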

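| Layer | Kernel size | Input dimension | Output dimension |
|---|---|---|---|
| conv1 | 5 | n×1×28×28 | n×32×28×28 |
| max pool | 2 | n×32×28×28 | n×32×14×14 |
| conv2 | 5 | n×32×14×14 | n×32×14×14 |
| max pool | 2 | n×32×14×14 | n×32×7×7 |
| fully connected layer 1 | - | n×(32×7×7) | n×64 |
| fully connected layer 2 | - | n×64 | n×10 |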

*n is the batch size
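The spatial sizes in the table can be checked with the usual convolution and pooling size formulas. This is a minimal sketch: a padding of 2 for the 5×5 convolutions is an assumption inferred from the fact that they preserve the spatial size, and the pooling stride is assumed equal to the pooling kernel. The function names are hypothetical.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial output size of max pooling with stride == kernel (assumed)."""
    return size // kernel


def trace_classifier_shapes(size=28, channels=32):
    """Return (layer, channels, spatial size) per layer and the flattened size."""
    shapes = []
    size = conv_out(size, kernel=5, padding=2)   # padding 2 is an assumption
    shapes.append(("conv1", channels, size))
    size = pool_out(size, 2)
    shapes.append(("max pool 1", channels, size))
    size = conv_out(size, kernel=5, padding=2)   # padding 2 is an assumption
    shapes.append(("conv2", channels, size))
    size = pool_out(size, 2)
    shapes.append(("max pool 2", channels, size))
    flattened = channels * size * size           # input width of fully connected layer 1
    return shapes, flattened
```

Calling `trace_classifier_shapes()` yields spatial sizes 28, 14, 14, 7 and a flattened size of 32×7×7 = 1568, matching the table.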

 

1.2 Hyperparameters

| Learning rate | Batch size | Number of epochs | L2 regularization weight |
|---|---|---|---|
| 0.0007 | 64 | 40 | 0.001 |

 

1.3 Results

1.3.1 Losses and Accuracy

  

 

1.3.2 Class accuracy

| Class | Accuracy (%) |
|---|---|
| T-shirt | 83.50 |
| Trouser | 98.30 |
| Pullover | 90.00 |
| Dress | 92.30 |
| Coat | 91.00 |
| Sandal | 98.10 |
| Shirt | 71.60 |
| Sneaker | 98.40 |
| Bag | 97.80 |
| Ankle_boot | 95.00 |

The class "Shirt" is the hardest to classify (71.60%). A likely explanation is that shirts look very similar to T-shirts: "T-shirt" has the second-lowest accuracy (83.50%), which suggests that the network confuses these two classes.
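Per-class accuracy and the most frequent confusions can be computed from the true and predicted labels as sketched below. The function names and the toy labels in the usage note are hypothetical and only illustrate the bookkeeping; they are not the project's evaluation code or data.

```python
from collections import Counter


def per_class_accuracy(labels, predictions):
    """Return {class: accuracy in percent} for every class present in labels."""
    correct = Counter()
    total = Counter()
    for truth, pred in zip(labels, predictions):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {cls: 100.0 * correct[cls] / total[cls] for cls in total}


def confusion_counts(labels, predictions):
    """Count each (true class, predicted class) pair that was misclassified."""
    return Counter(
        (truth, pred) for truth, pred in zip(labels, predictions) if truth != pred
    )
```

For example, with the toy inputs `labels = ["Shirt", "Shirt", "T-shirt", "T-shirt", "Bag"]` and `predictions = ["T-shirt", "Shirt", "T-shirt", "Shirt", "Bag"]`, the confusion counts contain one ("Shirt", "T-shirt") error and one ("T-shirt", "Shirt") error, which is the kind of pattern that would support the explanation above.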

 

 

1.3.3 Filter visualization

*The conv2 visualization is produced by concatenating all 32×32 (output channels × input channels) 5×5 2D filters into a single image, as described here.
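A minimal sketch of such a concatenation is shown below, using plain nested lists. The layout (one row of tiles per output channel, one column of tiles per input channel) is an assumption, and `tile_filters` is a hypothetical helper, not the code used for the figure.

```python
def tile_filters(weights):
    """Tile filters into one 2D grid.

    weights[o][i] is a k x k filter (list of lists) for output channel o and
    input channel i. The result has (out_channels * k) rows and
    (in_channels * k) columns.
    """
    out_ch = len(weights)
    in_ch = len(weights[0])
    k = len(weights[0][0])
    grid = [[0.0] * (in_ch * k) for _ in range(out_ch * k)]
    for o in range(out_ch):
        for i in range(in_ch):
            for r in range(k):
                for c in range(k):
                    # Place filter (o, i) in tile row o, tile column i.
                    grid[o * k + r][i * k + c] = weights[o][i][r][c]
    return grid
```

For 32×32 filters of size 5×5 this gives a 160×160 image, which can then be rescaled and displayed as grayscale.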

 

Part 2: Semantic Segmentation

2.1 Network Structure

 

The network structure is shown in the following table. The parameters of the conv_transpose layers are computed as

$o = s(i - 1) + k - 2p$

where $o$ is the output size, $s$ is the stride, $i$ is the input size, $p$ is the padding, and $k$ is the kernel size.

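| Layer | Kernel size | Padding | Stride | Input dimension | Output dimension |
|---|---|---|---|---|---|
| conv1 | 5 | 2 | 1 | n×1×256×256 | n×64×256×256 |
| max pool | 2 | - | - | n×64×256×256 | n×64×128×128 |
| conv2 | 3 | 1 | 1 | n×64×128×128 | n×128×128×128 |
| max pool | 2 | - | - | n×128×128×128 | n×128×64×64 |
| conv3 | 3 | 1 | 1 | n×128×64×64 | n×256×64×64 |
| max pool | 2 | - | - | n×256×64×64 | n×256×32×32 |
| conv_transpose1 | 4 | 1 | 2 | n×256×32×32 | n×128×64×64 |
| conv_transpose2 | 4 | 1 | 2 | n×128×64×64 | n×64×128×128 |
| conv_transpose3 | 4 | 1 | 2 | n×64×128×128 | n×16×256×256 |
| conv4 | 1 | 0 | 1 | n×16×256×256 | n×5×256×256 |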
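The conv_transpose rows can be checked against the standard output-size relation for a transposed convolution, o = s(i - 1) + k - 2p, which assumes no output padding and no dilation. The function name `conv_transpose_out` is illustrative.

```python
def conv_transpose_out(i, k, s, p):
    """Output size of a transposed convolution (no output padding, no dilation).

    i: input size, k: kernel size, s: stride, p: padding.
    """
    return s * (i - 1) + k - 2 * p


def trace_decoder(size=32, layers=3):
    """Apply the kernel 4, stride 2, padding 1 setting from the table repeatedly."""
    sizes = []
    for _ in range(layers):
        size = conv_transpose_out(size, k=4, s=2, p=1)
        sizes.append(size)
    return sizes
```

With kernel 4, stride 2 and padding 1, each layer exactly doubles the spatial size, so the three layers map 32 to 64, 128 and 256, undoing the three max-pooling steps.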

The network structure was inspired by this paper.

 

2.2 Hyperparameters

| Learning rate | Batch size | Number of epochs | L2 regularization weight |
|---|---|---|---|
| 1e-3 | 16 | 80 | 1e-3 |

 

2.3 Results

2.3.1 Losses and accuracy

The AP did not improve significantly between epoch 40 and epoch 80, so I stopped training at epoch 80.

 

| Class | Color | Average Precision |
|---|---|---|
| others | black | 0.6849 |
| facade | blue | 0.7916 |
| pillar | green | 0.2269 |
| window | orange | 0.8432 |
| balcony | red | 0.5738 |
| Average AP | - | 0.6241 |
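The reported average is consistent with an unweighted mean of the five per-class AP values, as this short check shows. The variable names are illustrative; the numbers are copied from the table.

```python
# Per-class average precision, copied from the results table.
class_ap = {
    "others": 0.6849,
    "facade": 0.7916,
    "pillar": 0.2269,
    "window": 0.8432,
    "balcony": 0.5738,
}

# Unweighted mean over classes (assumed to be how the average was computed).
mean_ap = sum(class_ap.values()) / len(class_ap)

# Classes sorted from worst to best AP.
ranked = sorted(class_ap, key=class_ap.get)
```

Rounded to four decimals, `mean_ap` matches the reported 0.6241, and `ranked` starts with pillar and balcony, the two weakest classes.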

 

2.3.2 Some examples

From the statistics and the image results, we can see that the network does not perform well on pillars (AP 0.2269) and balconies (AP 0.5738).