{"body":"10:37 PM <Atharva> I am pretty sure the implementation of transposed convolution layer has multiple interrelated mistakes\n10:37 PM <Atharva> Also, I think one thing can be changed in convolutional layer as well\n10:37 PM <rcurtin> agreed, I am disappointed by the spam...\n10:38 PM <Atharva> rcurtin: Is it happening to all freenode channels?\n10:38 PM <rcurtin> most that I am in, yes\n10:38 PM <rcurtin> I wonder if it is happening in #gsoc, I'll join there\n10:38 PM <zoq> Atharva: If you can provide a simple test case with the expected output, I'll take a look, or perhaps you already fixed it?\n10:39 PM <Atharva> zoq: I can fix it, I know how. I just thought I will discuss with you first.\n10:39 PM <zoq> Atharva: What's your idea with the conv layer?\n10:39 PM <zoq> Atharva: Oh, nice, yeah sure.\n10:39 PM <Atharva> I think I will start with the transposed convolution, because with conv it's just performance improvement\n10:40 PM <Atharva> So, the output size of transposed conv is being calculated incorrectly\n10:40 PM <Atharva> according to this paper https://arxiv.org/pdf/1603.07285.pdf\n10:40 PM <zoq> Atharva: Okay, I think any improvement would have a huge effect on the overall model performance.\n10:40 PM <Atharva> Yes\n10:41 PM <Atharva> Sorry for being slow, I am a little confused what to start with, there are a lot of things\n10:41 PM <Atharva> I will try my best to explain it all\n10:42 PM <zoq> I think Shikhar opened an issue regarding the output formula, let's see if I can find it.\n10:43 PM <Atharva> zoq: That's great, but the problem is not just with the formula\n10:43 PM <zoq> https://github.com/vdumoulin/conv_arithmetic/issues/18, no response so far\n10:44 PM <Atharva> The forward function of Transposed Conv is flipping the filter, that's not needed in this case\n10:44 PM <Atharva> It performs full convolution irrespective of stride and padding, which is incorrect\n10:46 PM <Atharva> This brings us to the stride, it's true that the effective 
final stride in transposed conv layer is always 1, but the operation does depend on the stride of the equivalent conv layer\n10:46 PM <Atharva> for example 4.5 and 4.6 of the above paper\n10:49 PM <zoq> I see the issue with the full convolution and the stride parameter, but I'm not sure about not flipping the kernel.\n10:49 PM <Atharva> I think for trans conv layer, the stride parameter should take s instead of s`. s` is always 1, but we need s (stride of corresponding conv operation of the given trans conv layer)\n10:49 PM <Atharva> We can flip the kernel in the backward function instead of forward\n10:50 PM <Atharva> it seems more apt\n10:50 PM <Atharva> It won't matter mathematically I think\n10:50 PM <zoq> yeah, that's the same\n10:52 PM <Atharva> Also, we are never inserting zeros between input units when the stride is not 1, section 4.5 and 4.6 of that paper\n10:52 PM <Atharva> What we instead do is always perform full convolution operation which is incorrect and extremely inefficient.\n10:53 PM <Atharva> For example, in my encoder network, I have a conv layer that goes from 64x64 to 32x32 with s = 2, p = 2, k = 5\n10:54 PM <zoq> I think, inserting zeros is a result of the incorrect dimension, which I think isn't affecting the output\n10:54 PM <Atharva> But, for the transposed equivalent in the decoder, I am forced to use a kernel size of 33 to go from 32x32 to 64x64 (a full convolution)\n10:54 PM <zoq> I see\n10:55 PM <Atharva> As mentioned in the paper, we are always doing 4.3 which they have said to be extremely inefficient \n10:56 PM <Atharva> The transposed conv layer took 30 seconds on my laptop while the conv layer took about 0.5\n10:56 PM <Atharva> We need to correct the output size formula in transposed conv\n10:56 PM <zoq> yeah, especially if you use such a huge kernel size\n10:56 PM <zoq> agreed\n10:57 PM <Atharva> Yes, sadly that's the only option right now\n10:57 PM <Atharva> we also need to take the stride of the equivalent conv layer instead of 
actual stride of transposed conv (which is always 1)\n10:57 PM <zoq> right\n10:58 PM <Atharva> We need to insert zeros between the input units when it's > 1\n10:58 PM <Atharva> Similarly, the backward function is also wrong\n10:58 PM <Atharva> It always performs a valid convolution with no padding on the error matrix even when it's needed\n10:58 PM <zoq> yeah, same issues\n10:59 PM <Atharva> I think, it will be better to use valid convolution for both forward and backward in transposed conv and take care of padding and zeros in between manually\n11:00 PM <zoq> that would also speed things up\n11:00 PM <Atharva> Yes\n11:01 PM <Atharva> Also, a minor thing, we can use the correct typenames `ForwardConvolutionRule` and `BackwardConvolutionRule` for the corresponding functions :)\n11:02 PM <zoq> :)\n11:02 PM <Atharva> about conv layer now\n11:02 PM <zoq> any corrections are much appreciated\n11:03 PM <Atharva> The backward function of conv layer performs a full convolution irrespective of the output(input) size it needs to output which leads to too many unnecessary operations.\n11:04 PM <Atharva> for example, let's say a conv layer gets input 5x5 with k 4x4 and padding 2, s =1\n11:05 PM <Atharva> it goes to 6x6\n11:06 PM <Atharva> in the backward function, what happens is, it pads the 6x6 input error to 12x12 first, then takes it to 9x9 with k(inverted) 4x4 and then only uses the centre 5x5 of it\n11:06 PM <Atharva> we should instead just pad 6x6 to 8x8 and directly take it to 5x5\n11:10 PM <zoq> interested to see the performance improvement, i think this would mostly affect bigger kernels\n11:10 PM <Atharva> Yes, should I change the conv part or just the transposed conv?\n11:11 PM <Atharva> I am not sure how much better the performance will be in conv layer after that change, it was just something I noticed\n11:12 PM <zoq> it should be faster, so if you like to take a look into it, I'm happy to merge this in\n11:13 PM <Atharva> zoq: Great! 
I should open a PR tomorrow if I don't run into any issues\n11:13 PM <zoq> wow, that's fast\n11:13 PM <zoq> thanks!\n11:14 PM <Atharva> zoq: Happy to help, I had to clear a lot of concepts to solve this, it was fun!\n11:14 PM <Atharva> I think we should discuss some implementation details for transposed conv\n11:14 PM <Atharva> 1) we take stride of equivalent conv operation\n11:14 PM <zoq> what I really like is that this will affect all sorts of models including the rl code\n11:14 PM <Atharva> 2) change the output formula\n11:15 PM <Atharva> zoq: That's great! even the transposed conv?\n11:16 PM <zoq> Right now, only the GAN and VAE code will use the transposed conv operation, but who knows.\n11:16 PM <Atharva> Yeah, right\n11:16 PM <Atharva> so I will continue\n11:17 PM <Atharva> 3) we need to add zeros between input units if s > 1, this we will do in the forward function and not in the naive conv class, is that okay?\n11:17 PM <Atharva> so, we will just perform valid conv operation after taking care of the zeros\n11:18 PM <zoq> about 3) fine with me, that way we don't have to touch the naive conv code\n11:18 PM <Atharva> Yes\n11:19 PM <Atharva> 4) same thing with backward, we don't touch naive conv code\n11:19 PM <zoq> right\n11:19 PM <Atharva> We will need to take care of the case when s > 1, because we only have to take alternate points from the output\n11:21 PM <zoq> hm, in this case it would be nice to modify the conv rules, don't you think?\n11:21 PM <Atharva> Oh, so that we can take care of it when pointers in the valid conv function?\n11:21 PM <Atharva> Yeah, I think it will be more efficient\n11:23 PM <zoq> in this case we would have to modify all conv rules, but we could start with the naive rule\n11:23 PM <Atharva> oh, okay. 
I don't think I know how the other conv rules work, can you suggest me something to read on it\n11:23 PM <zoq> but I agree it should be faster if we do it inside the rule class\n11:24 PM <zoq> the other one is based on fft\n11:25 PM <zoq> don't think there is an easy way to skip the input, as we could do for the naive rule\n11:26 PM <zoq> let's focus on the naive rule, I think it will outperform the fft rule afterwards (for small kernels)\n11:26 PM <Atharva> Oh, is it okay if we do it after gsoc? \n11:27 PM <zoq> so, we could remove the code\n11:27 PM <Atharva> Yeah, I think\n11:27 PM <zoq> of course\n11:28 PM <Atharva> 5) for conv layer backward function, for that change, we would need to manually add padding and use valid conv instead of full conv\n11:29 PM <zoq> sounds reasonable\n11:30 PM <zoq> Btw. everyone should think about the final report and put some work into it.\n11:31 PM <Atharva> Yes, will surely do\n11:32 PM <Atharva> I was thinking I will create a repo on my account and explain what I did over the summers with links to the PR and some results to show\n11:35 PM <zoq> Yeah, if you like you can also write the report in the form of a blog post, something like:\n11:35 PM <zoq> - http://www.mlpack.org/gsocblog/implementation-of-tree-types-summary.html\n11:35 PM <zoq> - http://www.mlpack.org/gsocblog/deep-reinforcement-learning-methods-summary.html\n11:35 PM <zoq> - http://www.mlpack.org/gsocblog/summary-of-lsh-changes-for-gsoc-2016.html\n11:36 PM <zoq> but that's up to you. 
As for me the final report is somewhat of a living document, which can be updated even after GSoC has ended, like if we change/merge something afterwards I think the report should reflect that.\n11:41 PM <zoq> Also, I think it's important that the report is visible, so if anyone is interested in what you did over the summer, there is an easy way to find out (the GSoC page will link to the final report).\n","name":"","extension":"txt","url":"https://www.irccloud.com/pastebin/bEq3kpwu","modified":1533757066,"id":"bEq3kpwu","size":9528,"lines":107,"own_paste":false,"theme":"","date":1533757066}