2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)

December 1 – 4, 2020, Virtual Conference

Tutorials


From Low to Super Resolution and Beyond

Dr. Chi-Wah Kok and Dr. Wing-Shan Tam


Date: 1 December 2020
Time: AM

Dr. Chi-Wah Kok
CVX Semiconductor Pty Ltd., Adelaide, South Australia, Australia
eekok@ieee.org

Chi-Wah Kok was born in Hong Kong. He received his PhD degree from the University of Wisconsin–Madison. Since 1992, he has worked with various semiconductor companies, research institutions, and universities, including AT&T Labs Research, Holmdel; SONY U.S. Research Labs; Stanford University; Hong Kong University of Science and Technology; Hong Kong Polytechnic University; City University of Hong Kong; and Lattice Semiconductor. He founded Canaan Semiconductor Ltd. in Hong Kong in 2016 and CVX Semiconductor Pty Ltd in Adelaide, Australia in 2019; both are fabless IC design companies with mixed-signal IC products for consumer electronics. Dr. Kok embraces new technologies to meet fast-changing market requirements, and he has extensively applied signal processing techniques to improve circuit topologies, designs, and fabrication technologies within Canaan and CVX. This includes the application of semidefinite programming to circuit design optimization, abstract algebra to the improvement of switched-capacitor circuit topologies, and nonlinear optimization methods to optimize high-voltage MOSFET layout and fabrication. Dr. Kok has authored and co-authored four books: CMOS Voltage References: An Analytical and Practical Perspective (Wiley-IEEE); Precoder-Equalizer Systems in Communications (Pearson Education); Manufacturing Processes for Engineering Materials, Fifth Edition (Pearson Education); and Digital Image Interpolation in MATLAB (Wiley-IEEE). A new book, “Digital Image Denoising in MATLAB”, will be published by Wiley-IEEE in 2021.

Dr. Wing-Shan Tam
CVX Semiconductor Pty Ltd., Adelaide, South Australia, Australia
wstam@ieee.org

Wing-Shan Tam was born in Hong Kong. She received her BEng degree in electronic engineering from The Chinese University of Hong Kong, her MSc degree in electronic and information engineering from The Hong Kong Polytechnic University, and her PhD degree in electronic engineering from the City University of Hong Kong in 2004, 2007, and 2011, respectively. Currently, she is the Engineering Manager of CVX Semiconductor Pty Ltd, and she has worked in the semiconductor business since 2004. Her research interests include mixed-signal integrated circuit design for data conversion and power management, and image interpolation. Dr. Tam has co-authored the books CMOS Voltage References: An Analytical and Practical Perspective (Wiley-IEEE) and Digital Image Interpolation in MATLAB (Wiley-IEEE). A new book, “Digital Image Denoising in MATLAB”, will be published by Wiley-IEEE in 2021.


Abstract

The tutorial starts with an introduction to digital image interpolation and single-image super-resolution. It continues with the definition of various image interpolation performance measurement indices, including both objective and subjective indices. The core of this tutorial is the application of covariance-based interpolation to achieve high visual quality in image interpolation and single-image super-resolution. Layer by layer, it progresses from covariance-based edge-directed image interpolation techniques, which exploit a stochastic image model without an explicit edge map, to iterative covariance-correction-based image interpolation. The edge-based interpolation incorporates properties of the human visual system to achieve visually pleasant high-resolution results. At each layer, the tutorial presents the pros and cons of each image model and interpolation technique, solutions to alleviate the visual artifacts of each technique, and innovative modifications that overcome the limitations of traditional edge-directed image interpolation, including spatially adaptive pixel intensity estimation, pixel intensity correction, error propagation mitigation, covariance window adaptation, and iterative covariance correction. The tutorial extends from theoretical and analytical discussions to detailed implementations in MATLAB. The audience will take home implementation details, as well as an understanding of the performance and complexity of the interpolation algorithms discussed in this tutorial.
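To make the covariance-based idea concrete, here is a minimal Python sketch (an illustration for this write-up, not the tutorial's MATLAB code) of how edge-directed interpolation weights can be estimated from a local window in the spirit of classical covariance-based (NEDI-style) methods: each pixel is regressed on its four diagonal neighbours, so the least-squares weights reflect the local covariance, and hence the edge orientation, without an explicit edge map.

```python
import numpy as np

def nedi_weights(window):
    # Regress each interior pixel on its four diagonal neighbours; the
    # least-squares solution captures the local covariance structure
    # (and hence the edge orientation) without an explicit edge map.
    rows, cols = window.shape
    C, y = [], []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            C.append([window[i - 1, j - 1], window[i - 1, j + 1],
                      window[i + 1, j - 1], window[i + 1, j + 1]])
            y.append(window[i, j])
    C = np.asarray(C, dtype=float)
    y = np.asarray(y, dtype=float)
    # a = (C^T C)^(-1) C^T y, lightly regularised for flat windows
    return np.linalg.solve(C.T @ C + 1e-6 * np.eye(4), C.T @ y)

# Toy usage: estimate one missing high-resolution sample from its four
# diagonal low-resolution neighbours, weighted by the local statistics.
rng = np.random.default_rng(0)
window = rng.random((7, 7))              # local low-resolution window
a = nedi_weights(window)
diag = np.array([window[2, 2], window[2, 4],
                 window[4, 2], window[4, 4]])
estimate = float(a @ diag)
```

The iterative covariance-correction variants covered in the tutorial refine such window statistics over multiple passes; this sketch shows only the basic single-pass estimation step.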


Screen Content Coding in Recently Developed Video Coding Standards

Dr. Xiaozhong Xu and Dr. Shan Liu


Date: 1 December 2020
Time: AM

Dr. Xiaozhong Xu
Tencent Media Lab, 2747 Park Blvd, Palo Alto, CA 94306, USA
xiaozhongxu@tencent.com

Xiaozhong Xu has been a Principal Researcher and Manager of Multimedia Standards at Tencent Media Lab, Palo Alto, CA, USA, since 2017. He was with MediaTek USA Inc., San Jose, CA, USA, as a Senior Staff Engineer and Department Manager of Multimedia Technology Development from 2013 to 2017. Prior to that, from 2011 to 2013, he worked for Zenverge (acquired by NXP in 2014), a semiconductor company focusing on multi-channel video transcoding ASIC design. He also held technical positions at Thomson Corporate Research (now Technicolor) and Mitsubishi Electric Research Laboratories. His research interest lies in the general area of multimedia, including video and image coding, processing, and transmission. He has been an active participant in video coding standardization activities for over fifteen years. He has successfully contributed to various standards, including H.264/AVC and its extensions, AVS1 and AVS3 (China), HEVC and its extensions, MPEG-5 EVC, and the most recent H.266/VVC standard. He served as a core experiment (CE) coordinator and a key technical contributor for screen content coding developments in various video coding standards (HEVC, VVC, EVC, and AVS3). Xiaozhong Xu received the B.S. and Ph.D. degrees in electronics engineering from Tsinghua University, Beijing, China, and the M.S. degree in electrical and computer engineering from the Polytechnic School of Engineering, New York University, NY, USA.

Dr. Shan Liu
Tencent Media Lab, 2747 Park Blvd, Palo Alto, CA 94306, USA
shanl@tencent.com

Shan Liu received the B.Eng. degree in electronic engineering from Tsinghua University and the M.S. and Ph.D. degrees in electrical engineering from the University of Southern California. She is now a Tencent Distinguished Scientist and General Manager of Tencent Media Lab. She was formerly Director of the Media Technology Division at MediaTek USA. She was also formerly with MERL, Sony Electronics, and Sony Computer Entertainment America (now Sony Interactive Entertainment). She has been actively contributing to international standards for over a decade and served as co-editor of HEVC SCC and the emerging VVC. She has had numerous technical contributions adopted into various standards, such as HEVC, VVC, OMAF, DASH, and PCC. She has also directly contributed to, and led the development effort of, products that have served hundreds of millions of users. Dr. Liu holds more than 150 granted US and global patents and has published more than 80 journal and conference papers. She served on the Industrial Relationship Committee of the IEEE Signal Processing Society (2014-2015) and is on the Editorial Board of IEEE Transactions on Circuits and Systems for Video Technology (2018-2021). She was VP of Industrial Relations and Development of the Asia-Pacific Signal and Information Processing Association (2016-2017) and was named an APSIPA Industrial Distinguished Leader in 2018. She was appointed Vice Chair of the IEEE Data Compression Standards Committee in 2019. Her research interests include audio-visual, high-volume, immersive and emerging media compression, intelligence, transport, and systems.


Abstract

In recent years, screen content video, which includes computer-generated text, graphics, and animations, has drawn more attention than ever, as many related applications have become very popular. However, conventional video codecs are typically designed to handle camera-captured, natural video. Screen content video, on the other hand, exhibits distinct signal characteristics and different levels of human visual sensitivity to distortions. To address the need for efficient coding of such content, a number of coding tools have been specifically developed, achieving great advances in coding efficiency.

The importance of screen content applications is underscored by the fact that all recently developed video coding standards include screen content coding (SCC) features. Nevertheless, the considerations for including SCC tools differ considerably across these standards. Each standard typically adopts only a subset of the known tools. Further, when a particular coding tool is adopted in more than one standard, its technical features may vary considerably from one standard to another.

All of this has caused confusion both for researchers who want to explore SCC beyond the state of the art and for engineers who want to choose a codec particularly suitable for their targeted products. Information on SCC technologies in general, and on the specific tool designs in these standards, is therefore of great interest. This tutorial provides an overview and comparative study of screen content coding (SCC) technologies across several recently developed video coding standards, namely HEVC SCC, VVC, AVS3, AV1, and EVC. In addition to the technical introduction, the tutorial discusses the performance and the design/implementation complexity aspects of the SCC tools, aiming to provide a detailed and comprehensive report. The overall performance of these standards is also compared in the context of SCC. The SCC tools under discussion are listed below (a toy sketch of palette-mode indexing follows the lists):

Screen content coding specific technologies:

  • Intra block copy (IBC)
  • Palette mode coding (PLT)
  • Transform Skip Residue Coding (TSRC)
  • Block based differential pulse code modulation (BDPCM)
  • Intra string copy (ISC)
  • Deblocking filter (DBK)


Screen content coding related technologies:

  • Integer motion vector difference (IMVD)
  • Intra subblock partitioning (ISP)
  • Geometrical partition blending off (GPBO)
  • Adaptive Color Transform (ACT)
  • Hash based motion estimation (HashME)
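As a rough illustration of one of the tools above, the following Python sketch (a toy model, not any standard's normative process) mimics the idea behind palette mode coding: a block is represented by a small colour table plus a per-pixel index map, which is very compact for text and graphics regions with few distinct colours. Escape-pixel handling and index-run coding are omitted.

```python
import numpy as np

def palette_encode(block, max_colors=4):
    # Collect the distinct colours in the block and keep the most
    # frequent ones as the palette (escape-pixel handling omitted).
    pixels = block.reshape(-1, block.shape[-1])
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    palette = colors[np.argsort(-counts)[:max_colors]]
    # Map every pixel to the index of its nearest palette entry.
    dist = np.linalg.norm(block[:, :, None, :].astype(float)
                          - palette[None, None, :, :].astype(float), axis=-1)
    index_map = dist.argmin(axis=-1)
    return palette, index_map

# Toy usage on a screen-content-like block with few distinct colours.
block = np.zeros((8, 8, 3), dtype=np.uint8)
block[:, 4:] = (255, 0, 0)                  # "text" over a background
palette, index_map = palette_encode(block)  # coded as table + index map
```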


Recent Advances in End-to-End Learned Image and Video Compression

Prof. Wen-Hsiao Peng and Prof. Hsueh-Ming Hang


Date: 1 December 2020
Time: AM

Prof. Wen-Hsiao Peng
Computer Science Dept., National Chiao Tung University, Taiwan
wpeng@cs.nctu.edu.tw

Wen-Hsiao Peng (M’09-SM’13) received the B.S., M.S., and Ph.D. degrees from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 1997, 1999, and 2005, respectively, all in electronics engineering.

He was with the Intel Microprocessor Research Laboratory, Santa Clara, CA, USA, from 2000 to 2001, where he was involved in the development of International Organization for Standardization (ISO) Moving Picture Experts Group (MPEG)-4 fine granularity scalability and demonstrated its application in 3-D peer-to-peer video conferencing. Since 2003, he has actively participated in the ISO MPEG digital video coding standardization process and contributed to the development of the High Efficiency Video Coding (HEVC) standard and MPEG-4 Part 10 Advanced Video Coding Amd.3 Scalable Video Coding standard. His research group at NCTU is one of the few university teams around the world that participated in the Call-for-Proposals on HEVC and its Screen Content Coding extensions. He is currently a Professor with the Computer Science Department, NCTU. He was a Visiting Scholar with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA, from 2015 to 2016. He has authored over 70 technical papers in the field of video/image processing and communications and over 60 standards contributions. His research interests include video/image coding, deep/machine learning, multimedia analytics, and computer vision.

Dr. Peng is a Technical Committee Member of the Visual Signal Processing and Communications and the Multimedia Systems and Applications tracks of the IEEE Circuits and Systems Society (CASS). He was Technical Program Co-chair for 2011 IEEE VCIP, 2017 IEEE ISPACS, and 2018 APSIPA ASC; Publication Chair for 2019 IEEE ICIP; Area Chair for IEEE ICME and VCIP; and Review Committee Member for IEEE ISCAS. He served as AEiC for Digital Communications, Lead Guest Editor, Guest Editor, and SEB Member for IEEE JETCAS; Associate Editor for IEEE TCSVT; and Guest Editor for IEEE TCAS-II. More recently, he was elected Distinguished Lecturer of APSIPA and Chair-Elect of the IEEE CASS VSPC Technical Committee.

Prof. Hsueh-Ming Hang
Electronic Engineering Dept., National Chiao Tung University, Taiwan
hmhang@nctu.edu.tw

Hsueh-Ming Hang received the B.S. and M.S. degrees from National Chiao Tung University, Hsinchu, Taiwan, in 1978 and 1980, respectively, and the Ph.D. degree in Electrical Engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1984. From 1984 to 1991, he was with AT&T Bell Laboratories, Holmdel, NJ; he then joined the Electronics Engineering Department of National Chiao Tung University (NCTU), Hsinchu, Taiwan, in December 1991. From 2006 to 2009, he was appointed Dean of the EECS College at National Taipei University of Technology (NTUT). From 2014 to 2017, he served as the Dean of the ECE College at NCTU. He has been actively involved in the international MPEG standards since 1984, and his current research interests include multimedia compression, spherical image/video processing, and deep-learning-based image/video processing.

He was an associate editor (AE) of the IEEE Transactions on Image Processing (1992-1994, 2008-2012) and the IEEE Transactions on Circuits and Systems for Video Technology (1997-1999). He is a co-editor and contributor of the Handbook of Visual Communications published by Academic Press in 1995. He was an IEEE Circuits and Systems Society Distinguished Lecturer (2014-2015) and was a Board Member of the Asia-Pacific Signal and Information Processing Association (APSIPA) (2013-2018) and a General Co-chair of IEEE International Conference on Image Processing (ICIP) 2019. He is a recipient of the IEEE Third Millennium Medal and is a Fellow of IEEE and IET and a member of Sigma Xi.


Abstract

DCT-based transform coding has been adopted by international standards (ISO JPEG, ITU-T H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades.

Recently developed deep learning technology may provide a new direction for constructing high-compression image/video coding systems. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of scheme (often trained end-to-end) has good potential for further improving compression efficiency.

In the first part of this tutorial, we shall (1) briefly summarize the progress on this topic over the past three or so years, including an overview of CLIC results and the JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. Recently published autoencoder-based schemes can achieve PSNR similar to BPG (Better Portable Graphics, the H.265 still-image format) and superior subjective quality (e.g., in terms of MS-SSIM), especially at very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms using the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area indicate that end-to-end trained video compression can achieve rate-distortion performance comparable or superior to HEVC/H.265. CLIC at CVPR 2020 also created, for the first time, a new track dedicated to P-frame coding.
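As a flavour of the autoencoder-based designs discussed in the second part, the sketch below (a minimal PyTorch illustration under our own assumptions, not a specific scheme from the tutorial) shows the typical end-to-end training setup: an analysis transform, a quantisation proxy, a synthesis transform, and a rate-distortion loss R + λ·D. The rate term here is a crude placeholder for a learned entropy model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAutoencoderCodec(nn.Module):
    # Analysis transform -> quantisation proxy -> synthesis transform.
    def __init__(self):
        super().__init__()
        self.analysis = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 32, 5, stride=2, padding=2))
        self.synthesis = nn.Sequential(
            nn.ConvTranspose2d(32, 64, 5, stride=2, padding=2,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 5, stride=2, padding=2,
                               output_padding=1))

    def forward(self, x):
        y = self.analysis(x)
        # Additive uniform noise stands in for rounding during training.
        y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)
        return self.synthesis(y_hat), y_hat

model = TinyAutoencoderCodec()
x = torch.rand(1, 3, 64, 64)             # toy input image batch
x_hat, y_hat = model(x)
distortion = F.mse_loss(x_hat, x)
rate_proxy = y_hat.abs().mean()          # crude stand-in for -log p(y_hat)
loss = rate_proxy + 100.0 * distortion   # lambda trades rate vs. distortion
loss.backward()
```

In practice the rate term is computed from a learned probability model over the quantised latents; the lambda value shown is an arbitrary choice for illustration.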


Versatile Video Coding – Algorithms and Specification

Mathias Wien and Benjamin Bross


Date: 1 December 2020
Time: PM

Dr. Mathias Wien
RWTH Aachen University, Kopernikusstr. 16, 52074 Aachen, Germany
wien@lfb.rwth-aachen.de

Mathias Wien received the Diploma and Dr.-Ing. degrees from Rheinisch-Westfälische Technische Hochschule Aachen (RWTH Aachen University), Aachen, Germany, in 1997 and 2004, respectively. In 2018, he completed his habilitation, qualifying him as an independent scientist in the field of visual media communication. He was with the Institut für Nachrichtentechnik, RWTH Aachen University (head: Prof. Jens-Rainer Ohm) as a researcher from 1997 to 2006, and as senior researcher and head of administration from 2006 to 2018. Since July 2018, he has been with the Lehrstuhl für Bildverarbeitung, RWTH Aachen University (head: Prof. Dorit Merhof) as senior researcher, leader of the Visual Media Communication group, and head of administration. His research interests include image and video processing; immersive, space-frequency adaptive, and scalable video compression; and robust video transmission.

Mathias has been an active contributor to H.264/AVC, HEVC, and VVC. He has participated in and contributed to ITU-T VCEG, ISO/IEC MPEG, the Joint Video Team (JVT), the Joint Collaborative Team on Video Coding (JCT-VC), and the Joint Video Experts Team (JVET) of VCEG and ISO/IEC MPEG. He served as a co-editor of the scalability amendment to H.264/AVC (SVC). In the aforementioned standardization bodies, he has co-chaired and coordinated several ad hoc groups as well as tool and core experiments. Mathias has published more than 60 scientific articles and conference papers in the area of video coding and has co-authored several patents in this area.

Mathias is a member of the IEEE Signal Processing Society and the IEEE Circuits and Systems Society. He is a member of the IEEE CASS TC VSPC. He was Technical Program Co-Chair of PCS 2019, is Technical Program Co-Chair of PCS 2021, and has co-organized and co-chaired special sessions at IEEE VCIP and PCS. He was the Corresponding Guest Editor of an IEEE JETCAS Special Issue on Immersive Video Coding and Transmission. He co-organized and co-chaired the Grand Challenge on Video Compression Technology at IEEE ICIP 2017. He serves as an associate editor for IEEE Transactions on Circuits and Systems for Video Technology and for Signal Processing: Image Communication. Mathias has further authored and co-authored more than 200 standardization documents. He has published the Springer textbook “High Efficiency Video Coding: Coding Tools and Specification”, which fully covers Version 1 of HEVC.


Dr. Benjamin Bross
Fraunhofer Heinrich Hertz Institute, Einsteinufer 37, 10587 Berlin, Germany
benjamin.bross@hhi.fraunhofer.de

Benjamin Bross received the Dipl.-Ing. degree in electrical engineering from RWTH Aachen University, Aachen, Germany, in 2008. In 2009, he joined the Fraunhofer Institute for Telecommunications – Heinrich Hertz Institute, Berlin, Germany, where he currently heads the Video Coding Systems group at the Video Coding & Analytics Department and is a part-time lecturer at the HTW University of Applied Sciences Berlin. Since 2010, Benjamin has been very actively involved in the ITU-T VCEG | ISO/IEC MPEG video coding standardization processes as a technical contributor, coordinator of core experiments, and chief editor of the High Efficiency Video Coding (HEVC) standard [ITU-T H.265 | ISO/IEC 23008-2] and the emerging Versatile Video Coding (VVC) standard. In addition to his involvement in standardization, Benjamin coordinates standard-compliant software implementation activities. These include the development of an HEVC encoder that is currently deployed in broadcast for HD and UHD TV channels, as well as optimized implementations of VVC.

Besides giving talks on recent video coding technologies, Benjamin Bross is an author or co-author of several fundamental HEVC- and VVC-related publications, and an author of two book chapters, on HEVC and on inter-picture prediction techniques in HEVC. He received the IEEE Best Paper Award at the 2013 IEEE International Conference on Consumer Electronics – Berlin, the SMPTE Journal Certificate of Merit in 2014, and an Emmy Award at the 69th Engineering Emmy Awards in 2017 as part of the Joint Collaborative Team on Video Coding for its development of HEVC.


Abstract

The tutorial provides an overview of the latest emerging video coding standard, VVC (Versatile Video Coding), to be jointly published by ITU-T and ISO/IEC. It has been developed by the Joint Video Experts Team (JVET), consisting of ITU-T Study Group 16 Question 6 (known as VCEG) and ISO/IEC JTC 1/SC 29/WG 11 (known as MPEG). VVC has been designed to achieve significantly improved compression capability compared to previous standards such as HEVC, and at the same time to be highly versatile for effective use in a broadened range of applications. Key application areas for VVC include ultra-high-definition video (e.g., 4K or 8K resolution), video with high dynamic range and wide colour gamut (e.g., with transfer characteristics specified in Rec. ITU-R BT.2100), and video for immersive media applications such as 360° omnidirectional video, in addition to the applications commonly addressed by prior video coding standards. Important design criteria for VVC have been low computational complexity on the decoder side and friendliness to parallelization at various algorithmic levels. VVC was finalized in July 2020 and is expected to enter the market very soon.

The tutorial details the video-layer coding tools specified in VVC and develops the concepts behind the selected design choices. While many tools, or variants thereof, have been available before, the VVC design reveals many improvements compared to previous standards, which result in compression gain and implementation friendliness. Furthermore, new tools such as the Adaptive Loop Filter and Matrix-based Intra Prediction have been adopted, which contribute significantly to the overall performance. The high-level syntax of VVC has been redesigned compared to previous standards such as HEVC in order to enable dynamic sub-picture access as well as major scalability features already in version 1 of the specification.
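As a rough intuition for one of these new tools, the following Python sketch (a toy approximation with hypothetical random weights, not the normative VVC process) mimics the structure of Matrix-based Intra Prediction: the reference boundary is downsampled, multiplied by a trained matrix, and the reduced prediction is upsampled to the full block.

```python
import numpy as np

def mip_predict(top, left, A, b):
    # Reduce the reference boundary by averaging sample pairs, apply a
    # trained affine map, then upsample to the full 8x8 block.
    red_top = top.reshape(4, -1).mean(axis=1)
    red_left = left.reshape(4, -1).mean(axis=1)
    boundary = np.concatenate([red_top, red_left])   # 8 reduced samples
    reduced = (A @ boundary + b).reshape(4, 4)       # 4x4 reduced prediction
    return np.kron(reduced, np.ones((2, 2)))         # simple upsampling

# Toy usage with hypothetical weights; VVC actually selects among sets
# of trained matrices signalled by a mode index, and its upsampling is
# a linear interpolation rather than sample repetition.
rng = np.random.default_rng(0)
top = np.linspace(100.0, 120.0, 8)       # reference row above the block
left = np.linspace(100.0, 90.0, 8)       # reference column to the left
A = rng.random((16, 8)) / 8.0            # hypothetical trained matrix
b = np.zeros(16)                         # hypothetical offset vector
pred = mip_predict(top, left, A, b)      # 8x8 predicted block
```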

Learning Objectives
The tutorial shall enable the participants to understand the design principles and concepts behind the specification of VVC. They shall recognize and understand the innovations of VVC compared to previous standards (especially HEVC). The tutorial shall educate the participants to further explore related knowledge resources and, specifically, the specification text itself.


Learned image and video compression with deep neural networks

Prof. Dong Xu, Dr. Guo Lu, Mr. Ren Yang and Dr. Radu Timofte


Date: 1 December 2020
Time: PM

Prof. Dong Xu
School of Electrical and Information Engineering, The University of Sydney, Australia
dong.xu@sydney.edu.au

Dong Xu is Chair in Computer Engineering at the School of Electrical and Information Engineering, The University of Sydney, Australia. After receiving his PhD degree in 2005, he worked as a postdoctoral research scientist at Columbia University from 2006 to 2007 and as a faculty member at Nanyang Technological University from 2007 to 2015. He has published more than 100 papers in IEEE Transactions and top-tier conferences, among which two co-authored works (with his former PhD students) won the prestigious IEEE T-MM Prize Paper Award and the CVPR Best Student Paper Award. His publications have received over 18,000 citations on Google Scholar. He was selected as a Clarivate Analytics Highly Cited Researcher in the field of Engineering in 2018 and received the IEEE Computational Intelligence Society Outstanding Early Career Award in 2017. Dr. Xu is or was on the editorial boards of T-IP (2017-present), T-MM (2016-2018), T-CSVT (2016-2017), T-PAMI (2014-2019), and T-NNLS (2013-2017), as well as four other journals. He has also served as a guest editor of ten special issues in IJCV, T-NNLS, T-CSVT, T-CYB, IEEE Multimedia, ACM TOMM, CVIU, and other journals. He will serve as a General Co-chair of MLSP 2021 and served as a Program Co-chair of the IEEE Signal and Data Science Forum (sponsored by the IEEE SPS) in 2016, a Program Co-chair of ICME 2014, and a Program Co-chair of PCM 2012. Moreover, he served as a steering committee member of ICME (2016-2017); an area chair of AAAI 2020, ICCV 2017, ACM MM 2017, ECCV 2016, CVPR 2012, ICIP (2015-2020), and MMSP (2016-2018); and a track chair of ICPR 2016. He was also involved in the organization committees of many international conferences, such as GlobalSIP 2019, MMSP 2019, ICIP 2017, MMSP 2016, VCIP 2015, ChinaSIP 2015, and GlobalSIP 2015. He received the Best Associate Editor Award of T-CSVT in 2017. He is a Fellow of the IEEE.


Dr. Guo Lu
School of Computer Science, Beijing Institute of Technology, China
luguo2014@sjtu.edu.cn

Guo Lu received his PhD degree from Shanghai Jiao Tong University in 2020 and his B.S. degree from Ocean University of China in 2014. Currently, he is an assistant professor with the School of Computer Science, Beijing Institute of Technology, China. His research interests include image and video processing, video compression, and computer vision. His work has been published in top-tier journals and conferences (e.g., T-PAMI, T-IP, CVPR, and ECCV).


Mr. Ren Yang
ETH Zurich, Switzerland
reyang@ee.ethz.ch

Ren Yang is a doctoral student at ETH Zurich, Switzerland. He received his M.Sc. degree in 2019 from the School of Electronic and Information Engineering, Beihang University, China, and his B.Sc. degree from Beihang University in 2016. His research interests mainly include computer vision and video compression. He has published several papers in top international journals and conference proceedings, such as IEEE T-PAMI, IEEE T-IP, IEEE T-CSVT, CVPR, ICCV, and ICME. He serves as a reviewer for top conferences and journals, such as ECCV, ACCV, IEEE T-IP, IEEE T-CSVT, IEEE T-MM, Elsevier’s SPIC, Elsevier’s NEUCOM, and IEEE Access. He won the Three Minute Thesis Competition at IEEE ICME 2019.


Dr. Radu Timofte
Computer Vision Laboratory, ETH Zurich, Switzerland
radu.timofte@vision.ee.ethz.ch

Radu Timofte is a lecturer and research group leader in the Computer Vision Laboratory at ETH Zurich, Switzerland. He obtained his PhD degree in Electrical Engineering from KU Leuven, Belgium, in 2013, his MSc from the University of Eastern Finland in 2007, and his Dipl. Eng. from the Technical University of Iasi, Romania, in 2006. He serves as a reviewer for top journals (such as TPAMI, TIP, IJCV, TNNLS) and conferences (ICCV, CVPR, NeurIPS) and is an associate editor for top journals: Elsevier CVIU, IEEE Trans. PAMI, Elsevier Neurocomputing, and SIAM Journal on Imaging Sciences. He has served as area chair/SPC for ICCV’19, ECCV’20, ACCV’18/’20, CVPR’21, and IJCAI’19/’20/’21. He received a NIPS 2017 Best Reviewer Award. His work has received multiple awards, including the best student paper award at BMVC 2019, a best scientific paper award at ICPR 2012, and the honorable mention award at FG 2017, and his team has won a number of challenges, including traffic sign detection (IJCNN 2013), age estimation (ICCV 2015), and real-world super-resolution (ICCV 2019). He is a co-founder of Merantix and a co-organizer of the NTIRE, CLIC, AIM, and PIRM events. His current research interests include sparse and collaborative representations, deep learning, optical flow, image/video compression, restoration, and enhancement.


Abstract

This tutorial reviews the recent progress in deep learning based data compression, including image compression and video compression. In past years, deep learning techniques have been successfully applied to a large number of computer vision and image processing tasks. However, for the data compression task, traditional approaches (e.g., block-based motion estimation and motion compensation) are still widely employed in mainstream codecs. Considering their powerful representation capability, it is possible to improve data compression performance by employing advanced deep learning technologies. To this end, deep learning based compression approaches have recently received significant attention from both academia and industry in the fields of computer vision and image/video compression.
In this tutorial, we will introduce the deep learning techniques relevant to image compression and video compression. Specifically, we will first introduce the basic pipeline of traditional codecs, such as JPEG, H.264, and HEVC. Then, we will discuss the common network architectures for visual data compression and analyse different learning based entropy models. Based on these techniques, we will describe several widely used end-to-end optimized frameworks for visual data compression. In summary, our tutorial will cover both traditional data coding techniques and popular learning based visual data compression algorithms, helping audiences with different backgrounds learn about recent progress in this emerging research area.
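As one concrete example of the learned entropy models mentioned above, the sketch below (our own minimal PyTorch illustration with toy inputs, not a scheme from the tutorial) estimates the bit cost of quantised latents under a conditional Gaussian model, in the spirit of the hyperprior family of learned image codecs.

```python
import torch

def gaussian_bits(y_hat, mu, sigma):
    # Probability mass of each quantised latent under a Gaussian model,
    # integrated over its quantisation bin [y - 0.5, y + 0.5].
    gauss = torch.distributions.Normal(mu, sigma)
    pmf = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
    return -torch.log2(pmf.clamp_min(1e-9)).sum()

# Toy usage: the better the (mu, sigma) predicted by a hyperprior or
# context model, the fewer bits the latents cost under this estimate.
y_hat = torch.round(torch.randn(1, 32, 4, 4))   # toy quantised latents
mu = torch.zeros_like(y_hat)                    # predicted means
sigma = torch.ones_like(y_hat)                  # predicted scales
bits = gaussian_bits(y_hat, mu, sigma)
```

During training this differentiable bit estimate serves as the rate term of the rate-distortion loss; at test time an arithmetic coder driven by the same probabilities produces the actual bitstream.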