Video Compression - Miscellaneous
English-language materials |
|||
Authors | Article title | Description | Rating |
Nikos Grammalidis, Dimitris Beletsiotis, and Michael G. Strintzis | Sprite Generation and Coding in Multiview Image Sequences |
Abstract—A novel algorithm for the generation of background sprite images from multiview image sequences is presented. A dynamic programming algorithm, first proposed in [1] using a multiview matching cost, as well as pure geometrical constraints, is used to provide an estimate of the disparity field and to identify occluded areas. By combining motion, disparity, and occlusion information, a sprite image corresponding to the first (main) view at the first time instant is generated. Image pixels from other views that are occluded in the main view are also added to the sprite. Finally, the sprite coding method defined by MPEG-4 is extended for multiview image sequences based on the generated sprite. Experimental results are presented, demonstrating the performance of the proposed technique and comparing it with standard MPEG-4 coding methods applied independently to each view. RAR 883 KB |
|
Javier Zamora, Stephen Jacobs, Alexandros Eleftheriadis, Shih-Fu Chang, and Dimitris Anastassiou, Fellow | A Practical Methodology for Guaranteeing Quality of Service for Video-on-Demand |
Abstract—A novel and simple approach for defining end-to-end quality of service (QoS) in video-on-demand (VoD) services is presented. Using this approach, we derive a schedulable region for a video server which guarantees end-to-end QoS, where a specific QoS required in the video client translates into a QoS specification for the video server. Our methodology is based on a generic model for VoD services, which is extendible to any VoD system. In this kind of system, both the network and the video server are potential sources of QoS degradation. Specifically, we examine the effect that impairments in the video server and video client have on the video quality perceived by the end user. The Columbia VoD testbed is presented as an example to validate the model through experimental results. Our model can be connected to network QoS admission-control models to create a unified approach for admission control of incoming video requests in the video server and network. RAR 601 KB |
|
Jiwu Huang, Yun Q. Shi, and Yi Shi | Embedding Image Watermarks in DC Components |
Abstract—Both watermark structure and embedding strategy affect robustness of image watermarks. Where should watermarks be embedded in discrete cosine transform (DCT) domain in order for the invisible image watermarks to be robust? Though many papers in the literature agree that watermarks should be embedded in perceptually significant components, dc components are explicitly excluded from watermark embedding. In this letter, a new embedding strategy for watermarking is proposed based on a quantitative analysis on the magnitudes of DCT components of host images. We argue that more robustness can be achieved if watermarks are embedded in dc components since dc components have much larger perceptual capacity than any ac components. Based on this idea, an adaptive watermarking algorithm is presented. We incorporate the feature of texture masking and luminance masking of the human visual system into watermarking. Experimental results demonstrate that the invisible watermarks embedded with the proposed watermark algorithm are very robust. RAR 425 KB |
|
L. Lentola, G. M. Cortelazzo, E. Malavasi, and A. Baschirotto | Design of SC Filters for Video Applications |
Abstract—This work examines the practical issues concerning the realization of switched-capacitor video-rate finite-impulse response filters based on standard CMOS technology. The capacitor spread/sensitivity tradeoff is explored for a number of architectural solutions and an architecture suitable to the challenges of the video field is identified. The nonidealities of switches and operational amplifiers, which must be accounted for at video-rate, are analyzed and practical solutions to prevent performance deterioration are proposed. The implementation of a concrete design case is discussed and presented as a demonstration of the technological solutions proposed. RAR 323 KB |
|
Marcus Magnor and Bernd Girod | Data Compression for Light-Field Rendering |
Abstract—Two light-field compression schemes are presented. The codecs are compared with regard to compression efficiency and rendering performance. The first proposed coder is based on video-compression techniques that have been modified to code the four-dimensional light-field data structure efficiently. The second coder relies entirely on disparity-compensated image prediction, establishing a hierarchical structure among the light-field images. Coding performance of both schemes is evaluated using publicly available light fields of synthetic, as well as real-world, scenes. Compression ratios vary between 100:1 and 2000:1, depending on reconstruction quality and light-field scene characteristics. RAR 405 KB |
|
Jeong-Kwon Kim, Kyeong Ho Yang, and Choong Woong Lee, Fellow | Document Image Compression by Nonlinear Binary Subband Decomposition and Concatenated Arithmetic Coding |
Abstract—This paper proposes a new subband coding approach to compression of document images, which is based on nonlinear binary subband decomposition followed by the concatenated arithmetic coding. We choose to use the sampling-exclusive OR (XOR) subband decomposition to exploit its beneficial characteristics to conserve the alphabet size of symbols and provide a small region of support while providing the perfect reconstruction property. We propose a concatenated arithmetic coding scheme to alleviate the degradation of predictability caused by subband decomposition, where three high-pass subband coefficients at the same location are concatenated and then encoded by an octave arithmetic coder. The proposed concatenated arithmetic coding is performed based on a conditioning context properly selected by exploiting a nature of the sampling-XOR subband filter bank as well as taking the advantage of noncausal prediction capability of subband coding. We also introduce a unicolor map to efficiently represent large uniform regions frequently appearing in document images. Simulation results show that each of the functional blocks proposed in the paper performs very well, and consequently, the proposed subband coder provides good compression of document images. RAR 202 KB |
|
Chin-Chen Chang, Jer-Sheng Chou, and Tung-Shou Chen | An Efficient Computation of Euclidean Distances Using Approximated Look-Up Table |
Abstract—For fast vector quantization (VQ) encoding, we present in this paper a new method to speed up the calculation of the squared Euclidean distance between two vectors. We call it the approximated look-up table (ALUT) method. This method considers the frequency of each squared number that occurs in the equation of squared Euclidean distances, and generates a more practical table to store squared numbers. ALUT makes use of this table and some simple operations to speed up the calculation of squared Euclidean distances. From the VQ simulation results, we see that ALUT saves memory and produces better image quality compared with some other methods. It is a suitable method for VLSI implementation. RAR 355 KB |
|
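As a rough illustration of the table look-up idea mentioned in the abstract above, the following sketch replaces each multiplication in the squared Euclidean distance with a look-up into a precomputed table of squares. It is not the ALUT method itself: the table here is exact and complete, whereas the paper describes an approximated, more compact table. The names `SQUARES`, `squared_distance_lut`, and `nearest_codeword`, and the assumption of 8-bit samples, are hypothetical.

```python
# Exact table of squares for every absolute difference of two 8-bit samples.
# Assumption: samples lie in 0..255; the paper's ALUT uses an approximated table instead.
SQUARES = [d * d for d in range(256)]


def squared_distance_lut(x, y):
    """Squared Euclidean distance between two equal-length 8-bit vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Each term is a table look-up rather than a multiplication.
    return sum(SQUARES[abs(a - b)] for a, b in zip(x, y))


def nearest_codeword(vector, codebook):
    """Index of the codeword closest to `vector` (full-search VQ encoding)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance_lut(vector, codebook[i]))
```

For example, `nearest_codeword([10, 10], [[0, 0], [12, 9], [200, 200]])` selects the second codeword, since its squared distance is the smallest of the three.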
Debargha Mukherjee, Jong Jin Chae, Sanjit K. Mitra, Fellow, and B. S. Manjunath | A Source and Channel-Coding Framework for Vector-Based Data Hiding in Video |
Abstract—Digital data hiding is a technology being developed for multimedia services, where significant amounts of secure data is invisibly hidden inside a host data source by the owner, for retrieval only by those authorized. The hidden data should be recoverable even after the host has undergone standard transformations, such as compression. In this paper, we present a source and channel coding framework for data hiding, allowing any tradeoff between the visibility of distortions introduced, the amount of data embedded, and the degree of robustness to noise. The secure data is source coded by vector quantization, and the indices obtained in the process are embedded in the host video using orthogonal transform domain vector perturbations. Transform coefficients of the host are grouped into vectors and perturbed using noise-resilient channel codes derived from multidimensional lattices. The perturbations are constrained by a maximum allowable mean-squared error that can be introduced in the host. Channel-optimized vector quantization can be used for increased robustness to noise. The generic approach is readily adapted to make retrieval possible for applications where the original host is not available to the retriever. The secure data in our implementations are low spatial and temporal resolution video, and sampled speech, while the host data is QCIF video. The host video with the embedded data is H.263 compressed, before attempting retrieval of the hidden video and speech from the reconstructed video. The quality of the extracted video and speech is shown for varying compression ratios of the host video. RAR 565 KB |
|
Bai-Jue Shieh, Yew-San Lee, and Chen-Yi Lee | A High-Throughput Memory-Based VLC Decoder with Codeword Boundary Prediction |
Abstract—In this paper, we present a high-throughput memory-based VLC decoder with codeword boundary prediction. The required information for prediction is added to the proposed branch models. Based on an efficient scheme, these branch models and the Huffman tree structure are mapped onto memory modules. Taking the prediction information, the decompression scheme can determine the codeword length before the decoding procedure is completed. Therefore, a parallel-processor architecture can be applied to the VLC decoder to enhance the system performance. With a clock rate of 100 MHz, a dual-processor decoding process can achieve a decompression rate of up to 72.5 Msymbols/s on the average. Consequently, the proposed VLC decompression scheme meets the requirements of current and advanced multimedia applications. RAR 200 KB |
|
Jae Ho Jeon, Young Seo Park, and Hyun Wook Park | A Fast Variable-Length Decoder Using Plane Separation |
Abstract—This paper has developed a fast variable-length decoder which uses a plane separation technique to reduce the processing time of the feedback path in the decoder. The developed decoder performs two shift processes and a decision process concurrently. Therefore, the processing time in the feedback path of our developed variable-length decoder can be improved and is determined by the longest time among the three processes, not by the sum of their processing times. Our simulation results show that the total processing time of our developed decoder is about 30% better than that of Sun and Lei’s decoder and their modified decoder when they are implemented with a field-programmable logic device. RAR 140 KB |
|
Osama K. Al-Shaykh, Eugene Miloslavsky, Toshio Nomura, Ralph Neff, and Avideh Zakhor | Video Compression Using Matching Pursuits |
Abstract—The use of matching pursuit (MP) to code video using overcomplete Gabor basis functions has recently been introduced. In this paper, we propose new functionalities such as SNR scalability and arbitrary shape coding for video coding based on matching pursuit. We improve the performance of the baseline algorithm presented earlier by proposing a new search and a new position coding technique. The resulting algorithm is compared to the earlier one and to DCT-based coding. As the bit rate decreases, the distortion introduced by matching pursuit coding takes the form of a gradually increasing blurriness or loss of detail. RAR 918 KB |
|
Jin Woo Park, Kun Woen Song, Ho Young Lee, Jae Yeal Nam, and Yeong-Ho Ha | Topological Surgery Encoding Improvements Based on Adaptive Bit Allocation and DFSVQ |
Abstract—In this paper, new methods to improve the encoding of connectivity and geometry of the topological surgery scheme are proposed. In connectivity compression, after obtaining the vertex and triangle spanning trees by decomposing a three-dimensional object, bits are adaptively allocated to each run of two spanning trees on a threshold basis. The threshold is the length of a binary number of the maximum run length. If a run length exceeds the threshold, it is represented by a binary number of the run length. Otherwise, it is represented by a bit sequence. Therefore, compression efficiency is enhanced through an adaptive bit allocation to each run of two spanning trees. In geometry compression, since vertices represented by three-dimensional vectors are stored according to the order of travelling along the vertex spanning tree by depth-first searching, they have geometrical closeness. The geometry compression efficiency can be improved if the local characteristics of vectors are considered. Therefore, dynamic finite state vector quantization, which has subcodebooks depending on a local characteristic of vectors, is used to encode the geometry information. As it dynamically constructs a subcodebook by predicting an input vector’s state, it produces less distortion and gives better visual quality than conventional methods. RAR 394 KB |
|
C. De Vleeschouwer and B. Macq | Subband Dictionaries for Low-Cost Matching Pursuits of Video Residues |
Abstract—“Matching pursuits” is a signal expansion technique whose efficiency for video coding has already been largely demonstrated in the MPEG-4 framework. In this paper, our attention focuses on complexity issues. First, the most expensive step of the signal expansion process is significantly sped up by exploiting results achieved in the wavelet and multiresolution theory. A subband dictionary is proposed as an alternative to the Gabor dictionary that has been used up to now. Equivalent levels of quality are achieved with both dictionaries, but the computational cost is significantly reduced when using the subband one. Then, we explain how, with any dictionary, the linearity of the inner product could be exploited to further speed up the process in return for an increased amount of memory. RAR 603 KB |
|
? | Special Issue on Representation and Coding of Images and Video II |
RAR 125 KB |
|
Wenwu Zhu, Zixiang Xiong, and Ya-Qin Zhang | Multiresolution Watermarking for Images and Video |
Abstract—This paper proposes a unified approach to digital watermarking of images and video based on the two- and three-dimensional discrete wavelet transforms. The hierarchical nature of the wavelet representation allows multiresolutional detection of the digital watermark, which is a Gaussian distributed random vector added to all the high-pass bands in the wavelet domain. We show that when subjected to distortion from compression or image halftoning, the corresponding watermark can still be correctly identified at each resolution (excluding the lowest one) in the wavelet domain. Computational savings from such a multiresolution watermarking framework are obvious, especially for the video case. RAR 569 KB |
|
Tihao Chiang and Dimitris Anastassiou, Fellow | Hierarchical HDTV/SDTV Compatible Coding Using Kalman Statistical Filtering |
Abstract—This paper addresses the issue of hierarchical coding of digital television. A two-layer coding scheme is presented to provide compatibility of standard-definition television (SDTV) and high-definition television (HDTV). The scheme is based on a spatio-temporal pyramid coding technique. We address the problem of interlaced-to-interlaced two-layer compatible coding where both layers are interlaced. The resolution translation is important for the visual quality of the SDTV layer and for the performance of the HDTV layer. A motion-compensated up/down deinterlacing scheme is used to improve the performance. A spatio-temporal averaging technique is used to give a better compatible prediction so that the HDTV layer has a high compression performance. To offer an improved prediction, systematic analysis of the remaining statistical redundancy of the enhancement signal is conducted. Based on an autoregressive model of the difference signal, a Kalman statistical filtering is used to exploit such a redundancy. We combine a recursive filtering and discrete cosine transform (DCT) coding using QR decomposition, where Q is an orthonormal matrix and R is an upper triangular matrix. The error accumulation is cancelled in the DCT frequency domain. Our results show peak signal-to-noise-ratio improvements over simulcast as high as 1.2 dB. The new technique, which is referred to as spatial scalability using a Kalman filter (SSKF), achieves a comparable or better picture quality than that of a nonscalable approach for high-quality video coding. The near optimal performance is demonstrated by the white Gaussian noise property of the residual signal. RAR 317 KB |
|
Ravi Krishnamurthy, John W. Woods, Fellow, and Pierre Moulin | Frame Interpolation and Bidirectional Prediction of Video Using Compactly Encoded Optical-Flow Fields and Label Fields |
Abstract—We consider the problems of motion-compensated frame interpolation (MCFI) and bidirectional prediction in a video coding environment. These applications generally require good motion estimates at the decoder. In this paper, we use a multiscale optical-flow-based motion estimator that provides smooth, natural motion fields under bit-rate constraints. These motion estimates scale well with change in temporal resolution and provide considerable flexibility in the design and operation of coders and decoders. In the MCFI application, this estimator provides excellent interpolated frames that are superior to those of conventional motion estimators, both visually and in terms of peak signal-to-noise ratio (PSNR). We also consider the effect of occlusions in the bidirectional prediction application and introduce a dense label field that complements our motion estimator. This label field enables us to adaptively weight the forward and backward predictions and gives us substantial visual and PSNR improvements in the covered/uncovered regions of the sequence. RAR 690 KB |
|
D. Wilson and Mohammed Ghanbari | Exploiting Interlayer Correlation of SNR Scalable Video |
Abstract—There is a significant interlayer correlation between the base and enhancement layers of two-layer signal-to-noise-ratio (SNR) scalable coders. By delaying the enhancement-layer bitstream, spatial and temporal interlayer correlations can be minimized or even made negative. Such an arrangement improves the statistical multiplexing gain to the extent that, even taking into account the normally higher bit rate of SNR scalable coders, negatively correlated two-layer sources can be more efficiently multiplexed than their single-layer counterparts. RAR 340 KB |
|
P. M. B. van Roosmalen, R. L. Lagendijk, and J. Biemond | Correction of Intensity Flicker in Old Film Sequences |
Abstract—Temporal intensity flicker is a common artifact in old film sequences. Removing disturbing temporal fluctuations in image intensities is desirable because it increases both the subjective quality and, where image sequences are stored in a compressed format, the coding efficiency. We describe a robust technique that corrects intensity flicker automatically by equalizing local frame means and variances in the temporal sense. The low complexity of our method makes it suitable for hardware implementation. We tested the proposed method on sequences with artificially added intensity flicker and on original film material. The results show a great improvement. RAR 443 KB |
|
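The equalization of means and variances described in the abstract above can be illustrated with a minimal sketch. It makes simplifying assumptions that the abstract does not: statistics are computed over the whole frame rather than locally, the reference mean and standard deviation are supplied by the caller (for example from a previously corrected frame), and intensities are clipped to 0..255. The function name `equalize_frame` is hypothetical and this is not the authors' algorithm.

```python
from statistics import fmean, pstdev


def equalize_frame(frame, ref_mean, ref_std):
    """Rescale a grayscale frame (list of rows) to a reference mean and std.

    Assumption: global statistics are used; the paper equalizes *local*
    means and variances, which this sketch does not attempt.
    """
    flat = [p for row in frame for p in row]
    mean = fmean(flat)
    std = pstdev(flat)
    # A flat frame has no contrast to rescale, so only its mean is shifted.
    gain = ref_std / std if std > 0 else 1.0
    return [
        [min(255.0, max(0.0, (p - mean) * gain + ref_mean)) for p in row]
        for row in frame
    ]
```

Applying the function to every frame of a sequence with slowly varying reference statistics would suppress frame-to-frame brightness and contrast jumps, at the cost of ignoring spatially varying flicker.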
Emile Sahouria and Avideh Zakhor | Content Analysis of Video Using Principal Components |
Abstract—We use principal component analysis (PCA) to reduce the dimensionality of features of video frames for the purpose of content description. This low-dimensional description makes practical the direct use of all the frames of a video sequence in later analysis. The PCA representation circumvents or eliminates several of the stumbling blocks in current analysis methods and makes new analyses feasible. We demonstrate this with two applications. The first accomplishes high-level scene description without shot detection and key-frame selection. The second uses the time sequences of motion data from every frame to classify sports sequences. RAR 851 KB |
|
Limin Wang and André Vincent | Bit Allocation and Constraints for Joint Coding of Multiple Video Programs |
Abstract—Recent studies have shown that joint coding is more efficient and effective than independent coding for compression of multiple video programs [3]–[7]. Unlike independent coding, joint coding is able to dynamically distribute the channel capacity among video programs according to their respective complexities and hence achieve a more uniform picture quality. This paper examines the bit-allocation issues for joint coding of multiple video programs and provides a bit-allocation strategy that results in a uniform picture quality among programs as well as within a program. To prevent the encoder/decoder buffers from overflowing and underflowing, further constraints on bit allocation are also discussed. RAR 370 KB |
|
Shi Hwa Lee, Dae-Sung Cho, Yu-Shin Cho, Se Hoon Son, Euee S. Jang, Jae-Seob Shin, and Yang Seock Seo | Binary Shape Coding Using Baseline-Based Method |
Abstract—Here, we propose a new shape-coding algorithm called baseline-based binary shape coding (BBSC), where the outer or inner contours of an arbitrarily shaped object are represented by traced one-dimensional (1-D) data from the baseline with turning points (TP’s). There are two coding modes, i.e., the intra and inter modes as in texture coding. In the intra mode, the differential values of the neighboring 1-D distance values and TP’s corresponding to the given shape are encoded by the entropy coder. In the inter mode, object identification, global shape matching, and local contour matching are employed for motion compensation/estimation. Lossy shape coding is enabled by variable sampling in each contour segment or by allowing some predefined error when performing motion compensation. We compare the proposed method with the bitmap-based method of context-based arithmetic encoding (CAE). Simulation results show that the proposed method is better than CAE in coding efficiency for intra mode and better in subjective quality for both intra and inter modes, although the CAE method has performed better than the proposed method in inter mode. RAR 1460 KB |
|
K. W. Ng and Kai-hau A. Yeung | Analysis on Disk Scheduling for Special User Functions |
Abstract—Previous studies on disk scheduling for video services were usually based on computer simulation. In this paper, we present analysis on disk scheduling for video services. This paper first gives a short review on the various disk-scheduling algorithms. It then concentrates on the analysis of two major disk-scheduling algorithms, namely, CLOOK and LOOK. The purpose of the analysis is to obtain the maximum number of simultaneous users supported by systems using these two algorithms. The results of the analysis show that the CLOOK algorithm performs slightly better than the LOOK algorithm in video applications. Then, this paper discusses disk scheduling for supporting special user functions such as “forward search” and “reverse search.” It is shown that the maximum number of user streams supported drops dramatically when such user functions are used. A technique called redundancy for special user functions (RESUF) is then studied in this paper. Analysis of the technique shows that it can keep the I/O demands almost constant under all user request conditions. RAR 321 KB |
|
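To make the difference between the two scheduling algorithms named in the abstract above concrete, here is a small sketch of the order in which LOOK and CLOOK (C-LOOK) would service a set of pending track requests. It follows the textbook definitions of the two policies and assumes the head initially sweeps toward higher track numbers; the function names and the example request list are hypothetical, and the sketch does not reproduce the paper's analysis of supported user streams.

```python
def look_order(head, requests):
    """LOOK: sweep upward to the last pending request, then reverse direction."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down


def clook_order(head, requests):
    """C-LOOK: sweep upward, jump back to the lowest pending request, sweep upward again."""
    up = sorted(r for r in requests if r >= head)
    wrapped = sorted(r for r in requests if r < head)
    return up + wrapped


if __name__ == "__main__":
    pending = [10, 95, 60, 20, 80]
    print(look_order(50, pending))   # [60, 80, 95, 20, 10]
    print(clook_order(50, pending))  # [60, 80, 95, 10, 20]
```

The two orders differ only in how the requests below the starting head position are visited: LOOK services them on the return sweep, while C-LOOK services them in ascending order after the jump, so every request is always served in the same direction.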
Alan Hanjalic and HongJiang Zhang | An Integrated Scheme for Automated Video Abstraction Based on Unsupervised Cluster-Validity Analysis |
Abstract—Key frames and previews are two forms of a video abstract, widely used for various applications in video browsing and retrieval systems. We propose in this paper a novel method for generating these two abstract forms for an arbitrary video sequence. The underlying principle of the proposed method is the removal of the visual-content redundancy among video frames. This is done by first applying multiple partitional clustering to all frames of a video sequence and then selecting the most suitable clustering option(s) using an unsupervised procedure for cluster-validity analysis. In the last step, key frames are selected as centroids of obtained optimal clusters. Video shots, to which key frames belong, are concatenated to form the preview sequence. RAR 350 KB |
|
Patrick Bouthemy, Marc Gelgon, and Fabrice Ganansia | A Unified Approach to Shot Change Detection and Camera Motion Characterization |
Abstract—This paper describes an original approach to partitioning of a video document into shots. Instead of an interframe similarity measure which is directly intensity based, we exploit image motion information, which is generally more intrinsic to the video structure itself. The proposed scheme aims at detecting all types of transitions between shots using a single technique and the same parameter set, rather than a set of dedicated methods. The proposed shot change detection method is related to the computation, at each time instant, of the dominant image motion represented by a two-dimensional affine model. More precisely, we analyze the temporal evolution of the size of the support associated with the estimated dominant motion. Besides, the computation of the global motion model supplies by-products, such as qualitative camera motion description, which we describe in this paper, and other possible extensions, such as mosaicing and mobile zone detection. Results on videos of various content types are reported and validate the proposed approach. RAR 2098 KB |
|
Yi-Jen Chiu and Toby Berger, Fellow | A Software-Only Videocodec Using Pixelwise Conditional Differential Replenishment and Perceptual Enhancements |
Abstract—Designing a videocodec involves a four-way tradeoff among computational complexity, data rate, picture quality, and latency. Rapid advancement in very large-scale integration technology has provided CPU’s with enough power to accommodate a software-only videocodec. Accordingly, computational complexity has resurfaced as a major element in this tradeoff. With a view toward significantly reducing computational complexity relative to standards-based videocodecs, we introduce a pixelwise conditional differential replenishment scheme to compress video via perception-sensitive decomposition of difference frames into a facsimile map and an intensity vector. Our schemes, which apply techniques from facsimile, are transform free. Some of them also involve no motion compensation and hence are completely free of block-based artifacts and particularly computationally economical. The fusion of our facsimile-based video-coding schemes and spatio-temporal perceptual-coding techniques facilitates powerful software-only video conferencing on today’s medium- and high-end personal computers. Indeed, assuming that a frame-capture driver has been provided, our motion-compensation-free approach has yielded a software-only, full-duplex, full-color videoconferencing system that conveys high-quality, CIF/Q-NTSC-sized video at 30 frames per second on 200-MHz Pentium PC’s sending less than 300 Kbps in each direction. We also present new spatio-temporal compression techniques for perceptual coding of video. These techniques, motivated by the classical psychological experiments that led to formulation of the Weber–Fechner law, allow videocodec systems to capitalize on properties of the human visual system. Some of our spatio-temporal perceptual techniques not only apply to our proprietary pixelwise conditional differential replenishment schemes that we describe for video conferencing but also can readily be incorporated into today’s popular video standards.
Temporal correlation usually can be reduced significantly via forward, backward, or interpolative prediction based on motion compensation. The remaining spatial correlation in the temporal prediction error images can be reduced via transform coding. In addition to spatial and temporal redundancies, perceptual redundancy also has become a target [1]. RAR 279 KB |
|
Zhishi Peng, Yih-Fang Huang, Daniel J. Costello, and Robert L. Stevenson | A Pyramidal Image Coder Using Generalized Rank-Ordered Prediction Filter |
Abstract—This paper presents a lossy image compression scheme that employs a generalized rank-ordered prediction filter for pyramidal image coding. The proposed prediction method renders significantly reduced variance of the quantizer input. Consequently, the quality of the decompressed image is much enhanced due to the greatly reduced quantization distortion. Both analytical and simulation results show that the proposed scheme yields high-quality performance. RAR 161 KB |
|
Gregory J. Conklin and Sheila S. Hemami | A Comparison of Temporal Scalability Techniques |
Abstract—A temporally scalable video coding algorithm allows extraction of video of multiple frame rates from a single coded stream. In recent years, several video coding techniques have been proposed that provide temporal scalability using subband coding, both without and with motion compensation. With a two-band subband decomposition applied hierarchically, frame rates halve after each filtering operation. Alternatively, motion-compensated prediction (as used in MPEG) can provide temporal scalability and the same frame rates as temporal subband coding through strategic placement of reference frames and selective decoding of frames. This paper compares three temporal coding techniques with respect to providing temporal scalability: temporal subband coding (TSB), motion-compensated temporal subband coding (MC-TSB), and motion-compensated prediction (MCP). Predicted rate-distortion performances at full and lower frame rates and experimental quantitative and visual performances from coding several video sequences are compared. The comparison is explicitly for temporal coding when the dimensionality of the subsequent source coding is held constant; any spatial or higher dimensional source coding can follow. In theory and in practice, MCP and MC-TSB always outperform TSB. For high-bit-rate full-frame-rate video, the performances of MCP and MC-TSB are approximately equivalent. However, to provide temporal scalability, MCP clearly provides the best performance in terms of visual quality, quantitative quality, and bit rate of the lower frame-rate video. RAR 524 KB |
|
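The statement in the abstract above that a hierarchically applied two-band decomposition halves the frame rate at each level can be sketched with a temporal Haar filter pair. This is only an assumed, simplest-possible choice of filters with no motion compensation (the TSB case); the paper's actual filters are not given in the abstract. Frames are represented as flat lists of pixel values, and the function names are hypothetical.

```python
def temporal_haar(frames):
    """Split a frame sequence into low-pass and high-pass temporal bands.

    Assumption: unnormalized Haar filters (average and half-difference) applied
    to consecutive frame pairs, with no motion compensation.
    """
    if len(frames) % 2:
        raise ValueError("need an even number of frames")
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([(x + y) / 2 for x, y in zip(a, b)])
        high.append([(x - y) / 2 for x, y in zip(a, b)])
    return low, high


def inverse_temporal_haar(low, high):
    """Reconstruct the original frame sequence from the two bands."""
    frames = []
    for l, h in zip(low, high):
        frames.append([x + y for x, y in zip(l, h)])  # even-indexed frame
        frames.append([x - y for x, y in zip(l, h)])  # odd-indexed frame
    return frames
```

Decoding only the low band gives a half-rate sequence; applying `temporal_haar` again to that low band gives a quarter-rate sequence, and so on, which is the hierarchical structure the abstract refers to.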
Peter Pirsch, Fellow, and Hans-Joachim Stolberg | VLSI Implementations of Image and Video Multimedia Processing Systems |
Abstract—An overview of very large scale integrated (VLSI) implementations of multimedia processing systems is given with particular emphasis on architectures for image and video processing. Alternative design approaches are discussed for dedicated image and video processing circuits and for programmable multimedia processors. Current design examples of dedicated and programmable architectures are reviewed, and the techniques employed to improve the performance for multimedia processing are therein identified. Future trends in multimedia processing systems are anticipated with respect to current developments in emerging image and video multimedia applications. RAR 287 KB |
|
Ching C. Lin and Chung J. Kuo | Two-Dimensional Rank-Order Filter by Using Max–Min Sorting Network |
RAR 164 KB |
|
? | Special Issue on Representation and Coding of Images and Video I |
RAR 131 KB |
|
Yong Rui, Thomas S. Huang, Michael Ortega, and Sharad Mehrotra | Relevance Feedback: A Power Tool for Interactive Content-Based Image Retrieval |
Abstract—Content-based image retrieval (CBIR) has become one of the most active research areas in the past few years. Many visual feature representations have been explored and many systems built. While these research efforts establish the basis of CBIR, the usefulness of the proposed approaches is limited. Specifically, these efforts have relatively ignored two distinct characteristics of CBIR systems: 1) the gap between high-level concepts and low-level features, and 2) subjectivity of human perception of visual content. This paper proposes a relevance feedback based interactive retrieval approach, which effectively takes into account the above two characteristics in CBIR. During the retrieval process, the user’s high-level query and perception subjectivity are captured by dynamically updated weights based on the user’s feedback. The experimental results over more than 70 000 images show that the proposed approach greatly reduces the user’s effort of composing a query, and captures the user’s information need more precisely. RAR 388 KB |
|
Eddy De Greef, Francky Catthoor, and Hugo De Man | Program Transformation Strategies for Memory Size and Power Reduction of Pseudoregular Multimedia Subsystems |
Abstract—In this paper, a program transformation strategy is presented that is able to reduce the buffer size and power consumption for a relatively large class of (pseudo)regular data-dominated signal processing algorithms. Our methodology is targeted toward an implementation on programmable processors, but most of the principles remain valid for a custom processor implementation. As power and area cost are crucial in the context of embedded multimedia applications, this strategy can be very valuable. The feasibility of our approach is demonstrated on a representative high-speed video processing algorithm for which we obtain a substantial reduction of the area and power consumption compared to the classical approaches. RAR 328 KB |
|
Gong-San Yu, Max M.-K. Liu, and Michael W. Marcellin | POCS-Based Error Concealment for Packet Video Using Multiframe Overlap Information |
Abstract—This paper proposes a new error concealment algorithm for packet video, effectively eliminating error propagation effects. Most standard video CODEC’s use motion compensation to remove temporal redundancy. With such motion-compensated interframe processing, any packet loss may generate serious error propagation over more than ten consecutive frames. This kind of error propagation leads to perceptually annoying artifacts. Thus, proper error concealment algorithms need to be used to reduce this effect. The proposed algorithm adopts a one-pixel block overlap structure. With the redundancy information from the damaged frame and the following frames, the proposed POCS-based method can have consistently high performance in error concealment. According to experimental results, the proposed algorithm can successfully eliminate visible error propagation. In addition, the proposed algorithm is very robust. Experimental results show that it can have good error concealment results, even when the damaged frame loses all DCT coefficients. RAR 384 KB |
|
K. Masselos, P. Merakos, T. Stouraitis, and C. E. Goutis | A Novel Algorithm for Low-Power Image and Video Coding |
Abstract—A novel scheme for low-power image and video coding and decoding is presented. It is based on vector quantization, and reduces its memory requirements, which form a major disadvantage in terms of power consumption. The main innovation is the use of small codebooks, and the application of simple but efficient transformations to the codewords during coding to compensate for the quality degradation introduced by the small codebook size. In this way, the small codebooks are computationally extended, and the coding task becomes computation based rather than memory based, leading to significant power consumption reduction. The parameters of the transformations depend on the image block under coding, and thus the small codebooks are dynamically adapted each time to this specific image block, leading to image qualities comparable to or better than those corresponding to classical vector quantization. The algorithm leads to power savings of a factor of 10 in coding and of a factor of 3 in decoding at least, in comparison to classical full-search vector quantization. Both image quality and power consumption highly depend on the size of the codebook that is used. RAR 120 KB |
|
M. Bi, S. H. Ong, and Y. H. Ang | Integer-Modulated FIR Filter Banks for Image Compression |
Abstract—A method for designing perfect-reconstruction M-channel integer-modulated filter banks (IMFB’s) is proposed. Given the filter coefficient word length, many IMFB’s satisfying the perfect reconstruction conditions can be obtained. Methods to select the best IMFB for the purpose of image compression are studied, and it is demonstrated that the IMFB obtained by maximizing the coding gain is the most suitable for image coding. Simulation results show that the PSNR of the integer-modulated FIR filter bank designed by maximizing the coding gain with the AR(1) model is close to that of the real-valued cosine-modulated filter bank of the same order. RAR 172 KB |
|
Kazuto Kamikura, Hiroshi Watanabe, Hirohisa Jozawa, Hiroshi Kotera, and Susumu Ichinose | Global Brightness-Variation Compensation for Video Coding |
Abstract—In this paper, a global brightness-variation compensation (GBC) scheme is proposed to improve the coding efficiency for video scenes that contain global brightness variations caused by fade in/out, camera-iris adjustment, flicker, illumination change, and so on. In this scheme, a set of two brightness-variation parameters, which represent multiplier and offset components of the brightness variation in the whole frame, is estimated. The brightness variation is then compensated using the parameters. We also propose a method to apply the GBC scheme to component signals for video coding. Furthermore, a block-by-block ON/OFF control method for GBC is introduced to improve coding performance even for scenes including local variations in brightness caused by camera flashes, spotlights, and the like. Simulation results show that the proposed GBC scheme with the ON/OFF control method improves the peak signal-to-noise ratio (PSNR) by 2–4 dB when the coded scenes contain brightness variations. RAR 819 KB |
|
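The two-parameter model in the abstract above (a multiplier and an offset for the whole frame) can be sketched as follows. The abstract does not say how the parameters are estimated, so the least-squares fit here is an assumption, and `estimate_gbc` and `compensate` are hypothetical names.

```python
def estimate_gbc(reference, current):
    """Fit current ~ gain * reference + offset over all pixels.

    Both frames are flat lists of luminance values. Least squares is an
    assumed estimator, not necessarily the one used in the paper.
    """
    n = len(reference)
    mean_r = sum(reference) / n
    mean_c = sum(current) / n
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_r == 0:
        # Flat reference frame: the gain is undetermined, use a pure offset.
        return 1.0, mean_c - mean_r
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(reference, current))
    gain = cov / var_r
    return gain, mean_c - gain * mean_r


def compensate(reference, gain, offset):
    """Apply the brightness model to the reference frame before prediction."""
    return [gain * r + offset for r in reference]


reference = [10.0, 20.0, 30.0, 40.0]
current = [0.5 * r + 4.0 for r in reference]  # simulated fade-out
gain, offset = estimate_gbc(reference, current)
print(gain, offset)                           # 0.5 4.0
```

A block-by-block ON/OFF control, as described in the abstract, could compare the prediction error of each block with and without `compensate` and keep the better option.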
Nikos Grammalidis and Michael G. Strintzis | Disparity and Occlusion Estimation in Multiocular Systems and Their Coding for the Communication of Multiview Image Sequences |
Abstract—An efficient disparity estimation and occlusion detection algorithm for multiocular systems is presented. A dynamic programming algorithm, using a multiview matching cost as well as pure geometrical constraints, is used to estimate disparity and to identify the occluded areas in the extreme left and right views. A significant advantage of the proposed approach is that the exact number of views in which each point appears (is not occluded) can be determined. The disparity and occlusion information obtained may then be used to create virtual images from intermediate viewpoints. Furthermore, techniques are developed for the coding of occlusion and disparity information, which is needed at the receiver for the reproduction of a multiview sequence using the two encoded extreme views. Experimental results illustrate the performance of the proposed techniques. RAR 599 KB |
|
Dyi-Long Yang and Chin-Hsing Chen | Data Dependence Analysis and Bit-Level Systolic Arrays of the Median Filter |
Abstract—The data dependence of the delete-and-insert sort algorithm for median filtering is analyzed in this paper. It is shown that because of data dependence, the fastest throughput rate and the most efficient pipeline scheme cannot be used concurrently. A modified delete-and-insert sort algorithm avoiding the above dilemma and its bit-level systolic array implementation are proposed in this paper. The throughput rate of the proposed architecture is equal to one-half (output/clocks) the maximum throughput allowed by the delete-and-insert sort algorithm, and the clock cycle time is equal to the propagation delay of a simple combinational circuit. Its speed is about 1.5 times faster than the existing bit-level systolic array designed by using the same delete-and-insert sort algorithm. The proposed architecture can be designed to operate at different word lengths and different window sizes. It is modular, regular, and of local interconnections and therefore amenable to VLSI implementation. RAR 306 KB |
|
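The delete-and-insert idea behind the median filter in the abstract above can be shown in software. This is only a sequential sketch of the basic algorithm, assuming an odd window length; it does not model the modified algorithm or the bit-level systolic array proposed in the paper, and `running_median` is a hypothetical name.

```python
from bisect import bisect_left, insort


def running_median(samples, window):
    """Median of every full sliding window, keeping the window sorted.

    For each new sample, the oldest sample is deleted from the sorted
    window and the new one is inserted at its sorted position.
    """
    if window % 2 == 0 or window > len(samples):
        raise ValueError("window must be odd and no longer than the input")
    sorted_win = sorted(samples[:window])
    mid = window // 2
    medians = [sorted_win[mid]]
    for i in range(window, len(samples)):
        oldest = samples[i - window]
        del sorted_win[bisect_left(sorted_win, oldest)]  # delete step
        insort(sorted_win, samples[i])                   # insert step
        medians.append(sorted_win[mid])
    return medians


print(running_median([5, 1, 9, 3, 7, 2], 3))  # [5, 3, 7, 3]
```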
Xia Wan and C.-C. Jay Kuo | A New Approach to Image Retrieval with Hierarchical Color Clustering |
Abstract—After performing a thorough comparison of different quantization schemes in the RGB, HSV, YUV, and CIE L*u*v* color spaces, we propose to use color features obtained by hierarchical color clustering based on a pruned octree data structure to achieve efficient and robust image retrieval. With the proposed method, multiple color features, including the dominant color, the number of distinctive colors, and the color histogram, can be naturally integrated into one framework. A selective filtering strategy is also described to speed up the retrieval process. Retrieval examples are given to illustrate the performance of the proposed approach. RAR 460 KB |
|
Ioannis Pitas | A Method for Watermark Casting on Digital Images |
Abstract—Watermark casting on digital images is an important problem since it affects many aspects of the information market. We propose a method for casting digital watermarks on images, and we analyze its effectiveness. The satisfaction of some basic demands in this area is examined, and a method for producing digital watermarks is proposed. Moreover, issues like immunity to subsampling and image-dependent watermarks are examined, and simulation results are provided for the verification of the above-mentioned topics. RAR 175 KB |
|
K. Aizawa, H. Ohno, Y. Egi, T. Hamamoto, M. Hatori, H. Maruyama, and J. Yamazaki | On Sensor Image Compression |
Abstract—In this paper, we propose a novel image sensor which compresses image signals on the sensor plane. Since an image signal is compressed on the sensor plane by making use of the parallel nature of image signals, the amount of signal read out from the sensor can be significantly reduced. Thus, the potential applications of the proposed sensor are high pixel rate cameras and processing systems which require very high speed imaging or very high resolution imaging. The very high bandwidth is the fundamental limitation to the feasibility of those high pixel rate sensors and processing systems. Conditional replenishment is employed for the compression algorithm. In each pixel, the current pixel value is compared to that in the last replenished frame. The value and the address of the pixel are extracted and coded if the magnitude of the difference is greater than a threshold. Analog circuits have been designed for processing in each pixel. A first prototype of a VLSI chip has been fabricated. Some results of experiments obtained by using the first prototype are shown in this paper. RAR 172 KB |
|
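The conditional replenishment rule described in the abstract above is simple enough to sketch in software. The paper implements it with analog circuits in each pixel; the version below is only a behavioral illustration with hypothetical names, using nested lists as frames.

```python
def conditional_replenishment(current, reference, threshold):
    """Select the pixels that must be transmitted for one frame.

    current, reference: 2-D lists of equal shape, where reference holds the
    last replenished value of each pixel.
    Returns (updates, new_reference); updates is a list of
    (row, col, value) entries for pixels that changed by more than threshold.
    """
    updates = []
    new_reference = [row[:] for row in reference]
    for r, (cur_row, ref_row) in enumerate(zip(current, reference)):
        for c, (cur, ref) in enumerate(zip(cur_row, ref_row)):
            if abs(cur - ref) > threshold:
                updates.append((r, c, cur))  # address and value are coded
                new_reference[r][c] = cur    # pixel is replenished
    return updates, new_reference


reference = [[10, 10], [10, 10]]
current = [[10, 30], [12, 10]]
updates, reference = conditional_replenishment(current, reference, threshold=5)
print(updates)    # [(0, 1, 30)]
print(reference)  # [[10, 30], [10, 10]]
```

Pixels that stay within the threshold keep their old reference value, so slow drift accumulates until it eventually crosses the threshold and triggers an update.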
G. Calvagno, C. Ghirardi, G. A. Mian, and R. Rinaldo | Modeling of Subband Image Data for Buffer Control |
Abstract—In this work we develop an adaptive scheme for quantization of subband or transform coded frames in a typical video sequence coder. Using a generalized Gaussian model for the subband or transform coefficients, we present a procedure to determine the optimum dead-zone quantizer for a given entropy of the quantizer output symbols. We find that, at low bit rates, the dead-zone quantizer offers better performance than the uniform quantizer. The model is used to develop an adaptive procedure to update the quantizer parameters on the basis of the state of a channel buffer with constant output rate and variable input rate. We compare the accuracy of the generalized Gaussian model in predicting the actual bit rate to that achievable using the simpler and more common Laplacian model. Experimental results show that the generalized Gaussian model outperforms the Laplacian model, and that it can be effectively used in a practical scheme for buffer control. RAR 219 KB |
|
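For readers unfamiliar with the dead-zone quantizer named in the abstract above, here is a generic sketch. It is not the optimum quantizer derived in the paper: the bin layout, the midpoint reconstruction, and the function names are assumptions chosen for illustration.

```python
import math


def deadzone_quantize(x, step, deadzone):
    """Map x to an integer index; inputs with |x| <= deadzone / 2 map to 0."""
    half = deadzone / 2.0
    mag = abs(x)
    if mag <= half:
        return 0
    k = int(math.floor((mag - half) / step)) + 1
    return k if x > 0 else -k


def deadzone_dequantize(k, step, deadzone):
    """Reconstruct at the midpoint of bin k (an assumed reconstruction rule)."""
    if k == 0:
        return 0.0
    mag = deadzone / 2.0 + (abs(k) - 0.5) * step
    return mag if k > 0 else -mag


indices = [deadzone_quantize(x, step=4, deadzone=8) for x in (3, 5, -9)]
print(indices)                                       # [0, 1, -2]
print(deadzone_dequantize(-2, step=4, deadzone=8))   # -10.0
```

With `deadzone` equal to `step`, this reduces to an ordinary uniform (midtread) quantizer; a wider dead zone sends more small coefficients to zero, which lowers the entropy of the output symbols.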
François Michaud, Chon Tam Le Dinh, and Gérard Lachiver | Fuzzy Detection of Edge-Direction for Video Line Doubling |
Abstract—Video line doubling can be realized using time-space interpolation filters. To improve their performances for moving diagonal lines, we have developed a fuzzy edge-direction detector. This fuzzy detector works by identifying small pixel variations in five orientations (0°, ±45°, and ±60°) and by using rules to infer the prevailing direction. This direction is then used to spatially rotate the interpolation filter. For the fuzzy detector, three fuzzy sets are used to characterize the inputs, and two rule bases have been validated. This article presents the characteristics of the fuzzy edge-direction detector along with the methodology used for fuzzy sets positioning. Detection and interpolation results are also presented. RAR 145 KB |
|
J. Scharcanski and A. N. Venetsanopoulos | Edge Detection of Color Images Using Directional Operators |
Abstract—This paper discusses an approach for detecting edges in color images. A color image is represented by a vector field, and the color image edges are detected as differences in the local vector statistics. These statistical differences can include local variations in color or spatial image properties. The proposed approach can easily accommodate concepts, such as multiscale edge detection, as well as the latest developments in vector order statistics for color image processing. A distinction between the proposed approach and previous approaches for color edge detection using vector order statistics is that, besides the edge magnitude, the local edge direction is also provided. Note that edge direction information is a relevant feature to a variety of image analysis tasks (e.g., texture analysis). RAR 385 KB |
|
Tsuhan Chen, Cassandra T. Swain, and Barry G. Haskell | Coding of Subregions for Content-Based Scalable Video |
Abstract—We propose a scheme for coding subregions in video scenes to provide content-based scalable video. For each region, a special color is used to represent the nonobject area, and the resulting frames are coded using conventional video coding algorithms. At the decoder, the region shape is recovered based on chroma keying, and hence, content-based manipulations are made possible. A number of techniques that eliminate boundary artifacts common to region-based coding are presented. In this scheme, no explicit shape coding is needed, and advantages of existing coding algorithms are retained. This scheme was submitted to ISO MPEG-4 and performed very well in the subjective tests. RAR 472 KB |
|
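The decoder-side shape recovery by chroma keying described in the abstract above can be sketched as a simple color-distance test. The Euclidean distance, the threshold value, and the name `recover_shape` are assumptions for illustration; the paper's boundary-artifact techniques are not modeled.

```python
def recover_shape(frame, key_color, threshold):
    """Return a binary mask with 1 for object pixels and 0 for key-colored ones.

    frame: 2-D list of (c0, c1, c2) color tuples from the decoded picture.
    A pixel is treated as nonobject when it lies within `threshold` of the
    key color, which tolerates small coding errors in the key-colored area.
    """
    limit = threshold ** 2

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return [[1 if dist2(px, key_color) > limit else 0 for px in row]
            for row in frame]


key = (0, 255, 0)
frame = [[(0, 255, 0), (200, 30, 40)],
         [(3, 250, 4), (0, 255, 0)]]
print(recover_shape(frame, key, threshold=20))  # [[0, 1], [0, 0]]
```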
Prepared by Sergey Grishin and Dmitriy Vatolin