To build DualReal, we use a real dual-camera setup (Fig. 1(a)) to collect \(50\) pairs of focused-defocused videos at a resolution of \(1920 \times 1080\). We display images and videos on an LCD screen (Fig. 1(b)) and shoot towards the screen with the dual-camera rig to capture the videos. A shutter release cable keeps the two cameras synchronized. Both cameras are fixed on a tripod and placed as close together as possible, as shown in Fig. 1(a), to reduce the occluded area between the focused and defocused videos. Because our framework performs optical flow rather than stereo matching, we do not require traditional checkerboard-based camera calibration. The displayed images are from the 5K dataset, and the displayed videos are from popular movies. One of the collected frame pairs is shown in Fig. 2.
The original videos are available at this link.
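Because alignment is done with dense optical flow, the defocused frame can be warped per-pixel onto the focused viewpoint without any calibration step. Below is a minimal sketch of this idea in Python, using OpenCV's Farneback flow purely as a stand-in for the flow module in our framework; the function name and parameter settings are illustrative assumptions, not the actual implementation.

# Minimal sketch: warp a defocused frame onto its focused counterpart with
# dense optical flow, so no checkerboard calibration is required.
# cv2.calcOpticalFlowFarneback is only a stand-in for the paper's flow module.
import cv2
import numpy as np

def align_defocused_to_focused(focused_bgr, defocused_bgr):
    f_gray = cv2.cvtColor(focused_bgr, cv2.COLOR_BGR2GRAY)
    d_gray = cv2.cvtColor(defocused_bgr, cv2.COLOR_BGR2GRAY)
    # Dense flow from the focused view to the defocused view.
    flow = cv2.calcOpticalFlowFarneback(
        f_gray, d_gray, None,
        pyr_scale=0.5, levels=4, winsize=21,
        iterations=3, poly_n=7, poly_sigma=1.5, flags=0)
    h, w = f_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Sample the defocused frame at the flowed coordinates, i.e. warp it
    # into the focused frame's pixel grid.
    return cv2.remap(defocused_bgr, map_x, map_y, cv2.INTER_LINEAR)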
To build DualSynthetic, we generate the synthetic image pairs through the following steps (a code sketch follows the list):
(1) Resample the original image into a mosaic of RGB subpixels (modeled as a \(3 \times 3\) grid with [R, G, B; R, G, B; R, G, B]) to simulate the image displayed on the LCD.
(2) Apply a random projective transformation to the image to simulate different relative positions and orientations of the display LCD and the camera.
(3) Apply Gaussian blur when simulating the defocused camera, where the sigma value of the Gaussian filter is a random value between \(3.2\) and \(4.0\).
(4) Resample the image using the Bayer CFA to simulate the RAW data.
(5) Apply MATLAB's demosaic function to convert the RAW data into RGB images, and then scale the results to compensate for brightness changes introduced during processing.
(6) Optionally, add a foreground to the image. When simulating the defocused camera, we also apply Gaussian blur to the foreground of the defocused image, with the sigma value of the Gaussian filter randomly sampled between \(2.0\) and \(2.5\).
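The Python sketch below walks through steps (1)-(6) under simplifying assumptions: float32 RGB inputs in [0, 255], an RGGB Bayer layout, OpenCV demosaicing in place of MATLAB's demosaic function, and illustrative helper names, corner-jitter range, and brightness-compensation rule. It is not the exact generation code.

import cv2
import numpy as np

def subpixel_mosaic(img):
    # (1) Expand each pixel into a 3x3 block whose three columns carry the
    # R, G, and B channels, i.e. [R, G, B; R, G, B; R, G, B].
    big = np.repeat(np.repeat(img, 3, axis=0), 3, axis=1)
    out = np.zeros_like(big)
    for c in range(3):                         # subpixel column c keeps channel c
        out[:, c::3, c] = big[:, c::3, c]
    return out

def random_projective(img, rng, jitter=0.05):
    # (2) Random homography, parameterized here by jittering the four image
    # corners by up to `jitter` of the image size (an assumed range).
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = (src + rng.uniform(-jitter, jitter, src.shape) * [w, h]).astype(np.float32)
    return cv2.warpPerspective(img, cv2.getPerspectiveTransform(src, dst), (w, h))

def bayer_and_demosaic(img, target_mean):
    # (4) Sample an RGGB Bayer CFA to obtain single-channel RAW data.
    raw = np.zeros(img.shape[:2], np.float32)
    raw[0::2, 0::2] = img[0::2, 0::2, 0]       # R
    raw[0::2, 1::2] = img[0::2, 1::2, 1]       # G
    raw[1::2, 0::2] = img[1::2, 0::2, 1]       # G
    raw[1::2, 1::2] = img[1::2, 1::2, 2]       # B
    # (5) Demosaic (OpenCV stands in for MATLAB's demosaic), then rescale so
    # the mean brightness matches the ground truth.
    rgb = cv2.cvtColor(np.clip(raw, 0, 255).astype(np.uint8),
                       cv2.COLOR_BayerBG2RGB).astype(np.float32)
    return rgb * (target_mean / max(rgb.mean(), 1e-6))

def add_foreground(img, fg_rgb, mask, rng, defocus=False):
    # (6) Optional foreground composite; for the defocused view, blur the
    # foreground (and its alpha mask) with sigma drawn from [2.0, 2.5].
    if defocus:
        sigma = rng.uniform(2.0, 2.5)
        fg_rgb = cv2.GaussianBlur(fg_rgb, (0, 0), sigma)
        mask = cv2.GaussianBlur(mask, (0, 0), sigma)
    a = mask[..., None]                        # alpha in [0, 1]
    return a * fg_rgb + (1.0 - a) * img

def synth_pair(gt, rng):
    # One (focused, defocused) pair from a ground-truth RGB image `gt`.
    screen = random_projective(subpixel_mosaic(gt.astype(np.float32)), rng)
    focused = bayer_and_demosaic(screen, gt.mean())                    # moire-prone
    blurred = cv2.GaussianBlur(screen, (0, 0), rng.uniform(3.2, 4.0))  # (3)
    defocused = bayer_and_demosaic(blurred, gt.mean())
    return focused, defocused

Here rng is a numpy.random.Generator, e.g. np.random.default_rng(0); resizing the 3x-enlarged subpixel mosaic back to the camera resolution is omitted for brevity.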
The original image serves as the ground-truth image. We select images from the 5K dataset as backgrounds and collect cartoon characters from the internet as foregrounds. One of the samples in DualSynthetic is shown in Fig. 3.
The original images are available at this link.
The original video frames come from the well-known Sintel stereo video dataset. Sintel consists of \(23\) stereo videos with left and right views, and each video contains \(20\)-\(50\) frames. In each video, the first pair of frames is generated following steps (1)-(6) above. Moiré patterns are added to the left-view frames and blur to the right-view frames to produce the input focused and defocused frames, respectively; the left-view frames without moiré patterns serve as ground truth. Subsequent frames are assumed to undergo small, stable camera motion relative to the previous frame rather than random motion, so the projective transformation in step (2) is not randomly regenerated. Instead, we keep the screen fixed and apply a constant camera translation for each frame. For each video, the constant translation value is randomly selected between \(5\) and \(20\). One of the samples in DualSyntheticVideo is shown in Fig. 4.
The original frames are available at this link.
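For the per-frame motion just described, the sketch below composes the first frame's homography from step (2) with a constant translation. The interpretation of the value in [5, 20] as a pixel offset along a fixed random direction is an illustrative assumption.

import numpy as np

def per_frame_homographies(H0, n_frames, rng):
    # Compose the first frame's homography H0 with a constant translation so
    # the simulated camera moves steadily while the screen stays fixed.
    t = rng.uniform(5.0, 20.0)                 # constant magnitude per video
    theta = rng.uniform(0.0, 2.0 * np.pi)      # assumed: fixed random direction
    dx, dy = t * np.cos(theta), t * np.sin(theta)
    homographies = []
    for k in range(n_frames):
        T = np.array([[1.0, 0.0, k * dx],
                      [0.0, 1.0, k * dy],
                      [0.0, 0.0, 1.0]])
        homographies.append(T @ H0)            # feed into steps (3)-(6) per frame
    return homographies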
@ARTICLE{DuDemoire2025,
  author={Dong, Xuan and Sun, Xiangyuan and Wang, Xia and Song, Jian and Li, Ya and Li, Weixin},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Video Demoireing using Focused-Defocused Dual-Camera System},
  year={2025},
  volume={},
  number={},
  pages={1-15},
  keywords={Video demoireing; Focused-defocused dual camera},
  doi={10.1109/TPAMI.2025.3596700}}