
Background Remover 1.0: An inside view

Introduction

[Figure: Background remover result]

Hi there! I am happy to present a new service from our company: Background Remover. It replaces the background of a video with any other image or video, much like Zoom or Skype do, but with higher picture quality. The underlying technology can also be used in video production: for example, to produce a YouTube video with a green background when no physical green screen is available, or as a plugin for streaming software on Twitch. This article is written for technical specialists. I will begin with a short overview of the service and then explain how it works inside.

First, upload your video to the site: click the button in the center of the screen and pick any file from your PC or Mac.

[Figure: Background remover result]

Then, within a few seconds, you can select a background for one of the frames of the uploaded video.

[Figure: Background remover screen]

At the moment, there are 7 backgrounds to choose from: 4 static pictures and 3 videos. After you select a new background and press the button, the service starts processing the video. Videos longer than 10 seconds are trimmed to 10 seconds so as not to overload the service. When processing is finished, the page updates and you can preview and download the processed video.
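The 10-second limit amounts to keeping only the leading frames that fit into that duration. The Python sketch below shows the arithmetic, assuming the frame rate and total frame count are already known (with OpenCV they could be read from a `cv2.VideoCapture` via the `cv2.CAP_PROP_FPS` and `cv2.CAP_PROP_FRAME_COUNT` properties). The function name `frames_to_process` is hypothetical and not part of the service.

```python
import math

MAX_SECONDS = 10  # duration limit stated in the post


def frames_to_process(fps: float, total_frames: int, max_seconds: float = MAX_SECONDS) -> int:
    """Return how many leading frames fit into the time limit (hypothetical helper)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Whole frames that fit into max_seconds at this frame rate.
    limit = math.floor(fps * max_seconds)
    return min(total_frames, limit)


# A 25-second clip at 30 fps (750 frames) is cut to its first 300 frames,
# while a 4-second clip (120 frames) is left untouched.
print(frames_to_process(30, 750))  # 300
print(frames_to_process(30, 120))  # 120
```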

[Figure: Background remover screen]

Currently, the service gives the best picture quality when the video shows a “talking head”, i.e. a person's upper body, as in a typical Skype or Zoom call. Now let me explain how the service works inside.

Video processing

The core work is done by a neural network that takes a video frame as input and returns a single-channel image, which we call the “mask”. White pixels in the mask mark foreground pixels in the frame, and black pixels mark the background. The mask is then lightly denoised with OpenCV algorithms, which fix some of the network's mistakes. Finally, the frame is composited over the new background using the mask as the alpha channel.
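The post does not say which OpenCV functions are used for the cleanup, so the sketch below covers only the final compositing step. It assumes the mask is an 8-bit single-channel image (255 = foreground, 0 = background) and that the new background has already been resized to the frame size; `composite` is a hypothetical name. Intermediate mask values, such as those along hair or edges, produce a proportional blend of frame and background.

```python
import numpy as np


def composite(frame: np.ndarray, background: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend frame over background using a single-channel uint8 mask as alpha.

    frame, background: H x W x 3 uint8 images of the same size.
    mask: H x W uint8, 255 = foreground, 0 = background.
    """
    if frame.shape != background.shape or frame.shape[:2] != mask.shape:
        raise ValueError("frame, background and mask must have matching sizes")
    # Scale the mask to [0, 1] and add a channel axis so it broadcasts over the 3 colors.
    alpha = mask.astype(np.float32)[..., None] / 255.0
    blended = alpha * frame.astype(np.float32) + (1.0 - alpha) * background.astype(np.float32)
    # Round back to 8-bit pixel values.
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


frame = np.full((2, 2, 3), 200, np.uint8)        # uniform gray "person"
background = np.zeros((2, 2, 3), np.uint8)        # black background
mask = np.array([[255, 0], [128, 0]], np.uint8)   # one half-transparent pixel
result = composite(frame, background, mask)
```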

[Figure: Video processing]

The neural network consists of two modules: “Semantic Segmentation” and “Image Matting”. The first module takes the original image and outputs a trimap, a mask in which every pixel belongs to one of three classes: “foreground”, “background” or “uncertain”. The uncertain pixels show up as a gray strip along the edge of the person in the image.
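The post does not describe how the segmentation module arrives at the three classes. One common approach, shown here purely as an assumption, is to threshold a per-pixel foreground probability: confident pixels become foreground or background, and everything in between, typically a thin band along the subject's outline, becomes “uncertain”. The thresholds `low`/`high` and the 0/128/255 encoding are illustrative choices, not values from the service.

```python
import numpy as np

# Conventional trimap encoding (an assumption; the post does not specify values).
BACKGROUND, UNCERTAIN, FOREGROUND = 0, 128, 255


def probabilities_to_trimap(prob: np.ndarray, low: float = 0.05, high: float = 0.95) -> np.ndarray:
    """Map per-pixel foreground probabilities in [0, 1] to a 3-class trimap."""
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")
    # Start with everything uncertain, then mark the confident pixels.
    trimap = np.full(prob.shape, UNCERTAIN, dtype=np.uint8)
    trimap[prob <= low] = BACKGROUND
    trimap[prob >= high] = FOREGROUND
    return trimap
```

Widening the gap between `low` and `high` makes the gray band thicker, which leaves more of the edge for the matting module to resolve.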

[Figure: Video processing]

The trimap is then passed to the “Image Matting” module, which returns the final mask.
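A frequent post-processing step in matting pipelines, though not confirmed by the post, is to trust the trimap wherever it is certain and use the matting output only inside the uncertain band. The sketch below assumes the trimap uses 0 for background and 255 for foreground and that the matting model returns a float alpha in [0, 1]; `finalize_alpha` is a hypothetical name.

```python
import numpy as np


def finalize_alpha(trimap: np.ndarray, predicted_alpha: np.ndarray) -> np.ndarray:
    """Keep the trimap's known regions; take the matting output only where uncertain.

    trimap: H x W uint8 with 0 = background, 255 = foreground, anything else = uncertain.
    predicted_alpha: H x W float array in [0, 1] from the matting model.
    Returns an H x W uint8 mask (0..255).
    """
    alpha = np.clip(predicted_alpha, 0.0, 1.0)
    alpha = np.where(trimap == 255, 1.0, alpha)  # certain foreground stays opaque
    alpha = np.where(trimap == 0, 0.0, alpha)    # certain background stays transparent
    return np.rint(alpha * 255.0).astype(np.uint8)
```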

How the Background Remover web part works

The following is a structural diagram of the service:

[Figure: Workflow scheme]

The service consists of two main components: the “Master Server” and the “Worker”. The master server serves static files, handles user requests and video uploads, and builds the task queue. It is written in Django using Django REST framework.

Workers do the actual data processing. Each worker takes a task from the queue, downloads the file, processes it according to the task and returns the result. There are two types of tasks: ‘preview’ and ‘video’. A preview task produces a single frame in .png format with an alpha channel; a video task produces the processed video.
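The worker's job can be sketched as “download, then dispatch on task type”. In this Python sketch the `Task` fields, the `input_url` field and the injected `download`/`process_*` callables are all hypothetical stand-ins; the real worker would call the neural network and a video encoder where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Task:
    task_id: int
    kind: str        # "preview" or "video", the two task types named in the post
    input_url: str   # hypothetical field: where the worker downloads the upload from


def handle_task(
    task: Task,
    download: Callable[[str], bytes],
    process_preview: Callable[[bytes], bytes],
    process_video: Callable[[bytes], bytes],
) -> bytes:
    """Download the input and dispatch on task type; returns the result payload."""
    handlers: Dict[str, Callable[[bytes], bytes]] = {
        "preview": process_preview,  # expected to return one PNG frame with alpha
        "video": process_video,      # expected to return the processed video file
    }
    # Reject unknown task types before spending time on the download.
    if task.kind not in handlers:
        raise ValueError(f"unknown task type: {task.kind!r}")
    data = download(task.input_url)
    return handlers[task.kind](data)
```

Passing the download and processing steps in as functions keeps the dispatch logic testable without a network or a GPU.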

Workers communicate with the master server through a separate API, which lets a worker receive a task, claim it and report its status.
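The important property of “occupying” a task is that two workers must never take the same one. The sketch below models the master's side in memory, making the claim atomic with a lock and accepting status updates only from the claiming worker. `TaskBoard`, its methods and the status names are hypothetical; in the real service this logic sits behind Django HTTP endpoints whose names the post does not give.

```python
import threading
from typing import Dict, Optional

QUEUED, IN_PROGRESS, DONE, FAILED = "queued", "in_progress", "done", "failed"


class TaskBoard:
    """In-memory stand-in for the master's task queue (hypothetical, not the real API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._status: Dict[int, str] = {}  # dicts keep insertion order, so oldest first
        self._owner: Dict[int, str] = {}

    def add(self, task_id: int) -> None:
        with self._lock:
            self._status[task_id] = QUEUED

    def claim(self, worker_id: str) -> Optional[int]:
        """Atomically hand the oldest queued task to one worker, or return None."""
        with self._lock:
            for task_id, status in self._status.items():
                if status == QUEUED:
                    self._status[task_id] = IN_PROGRESS
                    self._owner[task_id] = worker_id
                    return task_id
            return None

    def report(self, worker_id: str, task_id: int, status: str) -> None:
        """Accept a status update only from the worker that claimed the task."""
        if status not in (IN_PROGRESS, DONE, FAILED):
            raise ValueError(f"bad status: {status!r}")
        with self._lock:
            if self._owner.get(task_id) != worker_id:
                raise PermissionError("task is not claimed by this worker")
            self._status[task_id] = status

    def status(self, task_id: int) -> str:
        with self._lock:
            return self._status[task_id]
```

With a database-backed queue, the same guarantee would typically come from a transaction or a conditional update rather than a process-local lock.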

Epilogue

We are currently testing this technology, so you can process your files on our website for free, with certain restrictions. As mentioned earlier, videos longer than 10 seconds are trimmed to 10 seconds. The resolution is limited to 1920x1080 (1080p), and only certain file types are accepted; the site lists the supported types, which include the most common formats.
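An upload check for these restrictions could look like the Python sketch below. The 1920x1080 limit comes from the post; the extension whitelist is hypothetical (the post only says the site lists the supported types), and the sketch assumes landscape orientation, so a portrait 1080x1920 video would be rejected. `check_upload` is a hypothetical name.

```python
from pathlib import Path
from typing import List

MAX_WIDTH, MAX_HEIGHT = 1920, 1080  # resolution limit stated in the post
# Hypothetical whitelist: the real list of supported types is published on the site.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def check_upload(filename: str, width: int, height: int) -> List[str]:
    """Return a list of problems; an empty list means the upload is accepted."""
    problems = []
    ext = Path(filename).suffix.lower()  # case-insensitive extension check
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if width > MAX_WIDTH or height > MAX_HEIGHT:
        problems.append(f"resolution {width}x{height} exceeds {MAX_WIDTH}x{MAX_HEIGHT}")
    return problems
```

Returning every problem at once, instead of failing on the first one, lets the site show the user all reasons for a rejected upload in a single message.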
