So, I've finished educating myself on this topic.
The source of my confusion was that I didn't understand that the DTFT took a discrete time signal to a continuous frequency representation. Now I understand that there are 4 different ways doing a Fourier transform:
1. Continuous unbounded time <=> Continuous unbounded frequency (Fourier transform)
2. Continuous bounded time <=> Discrete unbounded frequency (Fourier series)
3. Discrete unbounded time <=> Continuous bounded frequency (Discrete time fourier transform / DTFT)
4. Discrete bounded time <=> Discrete bounded frequency (Discrete fourier transform / DFT)
I only knew about #1, 2, and 4 and I didn't really understand how they were related. Now I understand that you can transform between bounded and unbounded via periodic summation. Once both are bounded, you immediately get the DFT. The sampling theorem is just an obvious consequence of the DTFT, but I was trying to shoe-horn it into my understanding of the DFT.
The sinc(x) function just pops out of a DTFT followed by its continuous-time inverse:
x(t) = 1/2pi Int_w [Sum_m x[m] e^{-iwm)] e^{iwt} dw = Sum_m x[m] sinc(t-m)
I knew that you could fit a finite series of data points by a finite series of sine waves (DFT), so that was what I imagined the frequency domain of the DTFT to look like: a series of discrete impulses. Now that I understand it is continuous, I realize that DTFT is actually the sum of one helix spanning the frequency band per data point. Obviously, a helix turns into a sinc function when you invert the transform.
To answer my question about convergence:
function Y = fit_sinc(y,m)
n=size(y,2);
r=n/m;
s=m/n;
Z=sinc(-n+r:r:n-r);
X=zeros(1,m);
X(1:s:m-1)=y;
Y = conv(X,Z,"same");
endfunction
n=64;
m=4096;
X=rand(1,n);
plot([0:n-1],X,'o',[0:n/m:n-1/m],fit_sinc(X,m),'-',[0:n/m:n/2-1/m],fit_sinc(X(1:n/2),m/2),'-')
The sinc function fits nice and local due to the the division by x. My original concern that the fit is not unique still stands, despite claims to the contrary made in this thread. As long as the time series is discrete AND bounded, the DTFT does not actually provide a unique fit. The "wiggles" between samples from potential sinc functions corresponding to data points beyond the cutoff are missing. However, the error due to this truncation is bounded above by an alternating harmonic series, and quite ok in practice (due to random cancellation). Nevertheless, one could construct a worst-case signal that has an arbitrarily large reconstruction error due to truncation.
Anyway, I get it now. Yay.
EDIT: Thanks for the pointer to sinc(x), xnor. It was the realization that the Fourier Transform of sinc is continuous that started me on the golden path.