Here are two lines of attack.<p>Correlation in time and space could be exploited. There's a technique called "superresolution" where you combine a number of slightly displaced images, estimate the displacements, and use subpixel techniques to resample the image at a higher resolution.<p>Superresolution might work well in the "bad video" situation where somebody is handholding a camcorder, since the hand shake supplies the small displacements between frames.<p>More broadly, there's the general possibility of modelling the environment -- for instance, you can scale up graphics for old video games by having a model for high-resolution details that people find plausible. If an object is seen in close-up, details could be transferred to the same object when it appears further away.
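<p>To make the superresolution idea concrete, here is a minimal one-dimensional sketch in Python. It assumes the displacements are already known and are whole steps on the high-resolution grid; a real system would have to estimate fractional shifts from the frames themselves and deal with blur and noise. The names `downsample` and `shift_and_add` are made up for illustration.

```python
def downsample(signal, factor, offset):
    """Simulate one low-res frame: keep every `factor`-th sample from `offset`."""
    return signal[offset::factor]


def shift_and_add(frames, offsets, factor, length):
    """Put each low-res sample back on the high-res grid at its (assumed known)
    offset and average wherever several frames land on the same position."""
    total = [0.0] * length
    count = [0] * length
    for frame, offset in zip(frames, offsets):
        for i, value in enumerate(frame):
            pos = offset + i * factor
            if pos < length:
                total[pos] += value
                count[pos] += 1
    # Positions that no frame covered stay None; a real system would interpolate.
    return [t / c if c else None for t, c in zip(total, count)]


# A "scene" sampled at full resolution, then observed as two displaced
# half-resolution frames (the hand shake, in the camcorder analogy).
scene = [0, 1, 4, 9, 16, 25, 36, 49]
offsets = [0, 1]
frames = [downsample(scene, 2, o) for o in offsets]
restored = shift_and_add(frames, offsets, 2, len(scene))
```

<p>Each frame alone holds only half the samples, but because the frames are displaced relative to each other, together they cover the whole grid. With fewer distinct displacements than the scaling factor, some positions stay empty and have to be filled in by interpolation.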
A really long shot:
1. Crawl the web for images and store them.
2. For every frame, find similar images to form a subset of the image store to search (I'll call it the search set).
3. Divide the frame into blocks of a suitable size (possibly depending on the format and compression of the video).
4. For each block, find similar blocks in the search set and use extrapolation to generate a better picture.<p>I know this would take time, but it's worth a try.
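<p>Steps 3 and 4 could be sketched roughly like this in Python. It only covers the matching part, using the sum of squared differences as the similarity measure, which is my assumption since the steps above don't name one. The crawl, the per-frame image search, and the extrapolation step are left out, and `split_blocks`, `ssd` and `best_match` are hypothetical helper names.

```python
def split_blocks(image, size):
    """Yield (row, col, block) for each non-overlapping size x size block."""
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]


def ssd(a, b):
    """Sum of squared pixel differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_match(block, candidates):
    """Return the candidate block with the smallest SSD to `block`."""
    return min(candidates, key=lambda cand: ssd(block, cand))


# A tiny greyscale "frame" and a hand-made search set of 2x2 blocks.
frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
search_set = [
    [[0, 0], [0, 0]],
    [[12, 11], [9, 10]],
    [[190, 205], [198, 201]],
]
matches = [(r, c, best_match(b, search_set)) for r, c, b in split_blocks(frame, 2)]
```

<p>In a real version the search set would probably hold pairs of a low-resolution block and its high-resolution original, so that a match on the low-resolution side tells you which detail to paste in. A brute-force scan like this would also be far too slow for a crawled image store, so some kind of index would be needed.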
I think the closest existing thing to your idea is upscaling:
<a href="http://en.wikipedia.org/wiki/Video_scaler" rel="nofollow">http://en.wikipedia.org/wiki/Video_scaler</a>