# Classification and Segmentation

## Part 1: Image Classification

### 1.1 Network Structure

The original data is first normalized to [-1, 1] and then fed into the CNN.
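As a minimal sketch of this preprocessing step, assuming the raw pixels are 8-bit intensities in [0, 255] (the report does not state the raw range), the mapping to [-1, 1] could look like the following. The function name `normalize_pixels` is hypothetical.

```python
def normalize_pixels(pixels, max_value=255):
    """Map raw intensities in [0, max_value] linearly onto [-1, 1]."""
    # Scale to [0, 1] first, then stretch and shift: x -> 2 * (x / max_value) - 1.
    return [2.0 * (p / max_value) - 1.0 for p in pixels]


row = [0, 51, 255]  # illustrative values only
print(normalize_pixels(row))
```

With torchvision, the same mapping is commonly obtained with `ToTensor()` followed by `Normalize((0.5,), (0.5,))`; whether the original code did it this way is not stated here.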

*n is the batch size.

### 1.3 Results

#### 1.3.2 Class accuracy

The class "shirt" has the lowest accuracy. A likely explanation is that a shirt looks very similar to a T-shirt: "T-shirt" has the second-lowest accuracy, which suggests that the network confuses these two classes.
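One way to check this kind of explanation is to compute per-class accuracy together with the most frequent (true, predicted) error pairs. The sketch below uses toy labels for illustration only; they are not the results of this report, and the helper names are hypothetical.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return {class: fraction of its samples predicted correctly}."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in total}


def most_common_confusions(y_true, y_pred, top=1):
    """Return the most frequent (true, predicted) pairs among the errors."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top)


# Toy labels for illustration only; these are not the report's results.
y_true = ["shirt", "shirt", "shirt", "T-shirt", "T-shirt", "bag"]
y_pred = ["T-shirt", "shirt", "T-shirt", "shirt", "T-shirt", "bag"]

print(per_class_accuracy(y_true, y_pred))
print(most_common_confusions(y_true, y_pred))
```

If the top error pairs are dominated by shirt/T-shirt in both directions, that supports the confusion hypothesis.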

#### 1.3.3 Filter visualization

*The conv2 visualization is built by concatenating all 32x32 channels of the 5x5 2D filters into a single image, as described here.
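A rough sketch of such a concatenation is given below. It assumes the conv2 weights are indexed as `weights[out][in][row][col]`; the exact layout used for the figure is not specified here, so the arrangement (output channels as row blocks, input channels as column blocks) is an assumption, and `tile_filters` is a hypothetical helper.

```python
def tile_filters(weights):
    """Lay out weights[out][in] (each a k x k filter) as one 2D grid.

    Row block `out` and column block `in` hold the filter that connects
    input channel `in` to output channel `out`.
    """
    k = len(weights[0][0])
    grid = []
    for out_filters in weights:        # one band of k rows per output channel
        for r in range(k):
            row = []
            for filt in out_filters:   # one k-wide block per input channel
                row.extend(filt[r])
            grid.append(row)
    return grid


# Zero-filled stand-in with the shape mentioned above: 32 x 32 filters of 5 x 5.
weights = [[[[0.0] * 5 for _ in range(5)] for _ in range(32)] for _ in range(32)]
grid = tile_filters(weights)
print(len(grid), len(grid[0]))  # 160 160
```

The resulting 160x160 grid can then be passed to any image display routine, for example matplotlib's `imshow`.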

## Part 2: Semantic Segmentation

### 2.1 Network Structure

The network structure is shown in the following table. The parameters of the conv_transpose layers are computed as

$o=s(i-1)-2p+k$

where $o$ is the output size, $s$ is the stride, $i$ is the input size, $p$ is the padding, and $k$ is the kernel size.
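The formula can be checked with a small helper. This is a sketch; the function name and the example settings are illustrative and are not taken from the network table. The formula assumes a dilation of 1 and no output padding, in which case it agrees with the output shape documented for PyTorch's `ConvTranspose2d`.

```python
def conv_transpose_output_size(i, k, s=1, p=0):
    """Output size o = s * (i - 1) - 2 * p + k of a transposed convolution."""
    return s * (i - 1) - 2 * p + k


# Illustrative settings (not taken from the report's table):
print(conv_transpose_output_size(i=16, k=4, s=2, p=1))  # 32, doubles the size
print(conv_transpose_output_size(i=16, k=3, s=1, p=1))  # 16, keeps the size
```

Choosing $k=4$, $s=2$, $p=1$ is a common way to upsample by exactly a factor of two, since $2(i-1)-2+4=2i$.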

The network structure was inspired by this paper.

### 2.3 Results

#### 2.3.1 Losses and accuracy

AP did not improve significantly between epoch 40 and epoch 80, so I stopped training at epoch 80.

#### 2.3.2 Some examples

From the statistics and the example images, we can see that the network performs poorly on pillars and balconies.