Why do we use x−y rather than x+y in the definition of the convolution? Is it just convention? (If we are thinking of convolutions as weighted averages, for instance against "good kernels," it should make no difference.)
That is, why define (f∗g)(x) = ∫ f(y) g(x−y) dy rather than (f∗g)(x) = ∫ f(y) g(x+y) dy?
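To make the two candidates concrete, here is a minimal sketch of their discrete analogues, with sums over the integers in place of integrals. The names `conv_minus` and `conv_plus`, the dict representation of finitely supported sequences, and the sample values are all assumptions made for illustration, not standard notation.

```python
# Finitely supported sequences on the integers, stored as {index: value};
# any index not present is treated as 0. (Illustrative representation.)

def conv_minus(f, g, x):
    """Discrete analogue of the x - y definition: sum over y of f(y) * g(x - y)."""
    return sum(fy * g.get(x - y, 0) for y, fy in f.items())

def conv_plus(f, g, x):
    """Discrete analogue of the x + y candidate: sum over y of f(y) * g(x + y)."""
    return sum(fy * g.get(x + y, 0) for y, fy in f.items())

# Hypothetical sample data; g is deliberately not symmetric about 0.
f = {0: 1, 1: 2, 2: 3}
g = {0: 1, 1: 1}

xs = range(-3, 4)

minus_fg = [conv_minus(f, g, x) for x in xs]
minus_gf = [conv_minus(g, f, x) for x in xs]
plus_fg = [conv_plus(f, g, x) for x in xs]
plus_gf = [conv_plus(g, f, x) for x in xs]

print("x - y, f then g:", minus_fg)
print("x - y, g then f:", minus_gf)
print("x + y, f then g:", plus_fg)
print("x + y, g then f:", plus_gf)
```

On this sample, swapping f and g leaves the x−y sums unchanged, while the x+y sums come out reflected in x. This is only a numerical check on one example, not an argument for either definition.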
Edit: I'm finding it really hard to choose a best answer. There are at least three very good ones here.