Video Compression - Hardware
English-language materials |
|||
Authors | Article title | Description | Rating |
Amaury Aubel, Ronan Boulic, and Daniel Thalmann | Real-Time Display of Virtual Humans: Levels of Details and Impostors |
Abstract: Rendering and animating in real time a multitude of articulated characters presents a real challenge, and few hardware systems are up to the task. Up to now, little research has been conducted to tackle the issue of real-time rendering of numerous virtual humans. This paper presents a hardware-independent technique that improves the display rate of animated characters by acting on the sole geometric and rendering information. We first review the acceleration techniques traditionally in use in computer graphics and highlight their suitability to articulated characters. We then show how impostors can be used to render virtual humans. We introduce concrete case studies that demonstrate the effectiveness of our approach. Finally, we tackle the visibility issue. RAR, 650 KB |
|
Daniel F. Zucker, Ruby B. Lee, and Michael J. Flynn | Hardware and Software Cache Prefetching Techniques for MPEG Benchmarks |
Abstract: With the popularity of multimedia acceleration instructions such as MMX, MPEG decompression is increasingly executed on general purpose processors instead of dedicated MPEG hardware. The gap between processor speed and memory access means that a significant amount of time is spent in the memory system. As processors get faster, both in terms of higher clock speeds and increased instruction level parallelism, the time spent in the memory system becomes even more significant. Data prefetching is a well-known technique for improving cache performance. While several studies have examined prefetch strategies for scientific and commercial applications, this paper focuses on video applications. Data is presented for three types of hardware-prefetching schemes: the stream buffer, the stride prediction table (SPT), and the stream cache, as well as a new software-directed prefetching technique based on emulation of the hardware SPT. Up to 90% of the misses that would otherwise occur with no prefetching are eliminated. The stream cache can cut execution time by more than half with the addition of a relatively small amount of additional hardware. Software prefetching achieves nearly equal performance with minimal additional hardware. Techniques presented in this paper can be used to improve performance in a general-purpose CPU or an embedded MPEG processor. Performance gains achieved for MPEG benchmarks apply equally effectively to similar multimedia applications. RAR, 285 KB |
|
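The stride prediction table mentioned in the abstract above can be illustrated with a small simulation. This is only a minimal sketch of the general idea (remember the last address and stride per load instruction, and prefetch once the stride repeats); the class and function names, the table size, and the "stride seen twice" confirmation rule are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SptEntry:
    last_addr: int           # address touched by the previous execution of this load
    stride: int = 0          # difference between the last two addresses
    confirmed: bool = False  # True once the same non-zero stride repeated


class StridePredictionTable:
    """Toy direct-mapped stride prediction table (hypothetical layout)."""

    def __init__(self, size=64):
        self.size = size
        self.table = {}  # slot index -> (pc tag, SptEntry)

    def access(self, pc, addr):
        """Record a load at instruction `pc` touching `addr`.

        Returns the address to prefetch, or None if no stable stride is known.
        """
        slot = pc % self.size
        tagged = self.table.get(slot)
        if tagged is None or tagged[0] != pc:
            # First time we see this load (or the slot was used by another one).
            self.table[slot] = (pc, SptEntry(last_addr=addr))
            return None
        entry = tagged[1]
        stride = addr - entry.last_addr
        entry.confirmed = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        return addr + stride if entry.confirmed else None


def count_covered(trace, table):
    """Count accesses whose address had already been predicted for prefetching."""
    prefetched = set()
    covered = 0
    for pc, addr in trace:
        if addr in prefetched:
            covered += 1
        target = table.access(pc, addr)
        if target is not None:
            prefetched.add(target)
    return covered
```

With a regular trace such as `[(0x40, 8 * i) for i in range(10)]`, the first accesses are spent learning the stride and the remaining ones are predicted; an irregular trace produces no predictions. A real cache model would also have to account for capacity, timing, and pollution, which this sketch ignores.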
Sunho Chang, Bum-Sik Kim, and Lee-Sup Kim | A Programmable 3.2-GOPS Merged DRAM Logic for Video Signal Processing |
Abstract: This paper proposes a programmable high-performance architecture of datapath in the merged DRAM logic (MDL) for video signal processing. A model of a datapath in the programmable MDL is generated, and two basic parameters, total required clock cycles (TRCC) and DRAM access rate (DAR), are defined by analysis of the model. Design guidelines are suggested for the optimized video signal processor based on the modeling and analysis of the MDL. The inverse discrete cosine transform (IDCT) and motion compensation (MC) of the video signal processing are analyzed in the MDL architecture. Two measures, TRCC and DAR, are determined such that the data bandwidth between DRAM and logic is not a bottleneck in the MDL architecture. The efficient datapath is designed based on these design guidelines. The datapath has processing units (ALU, MAC, and Barrel Shifter) with splittabilities of data and multi-port SRAM. The maximum performance of the proposed datapath with 200-MHz clock frequency is 3.2 GOPS for 8-bit video signals, which can deal with decoding high-level (1920 × 1080) in MPEG. The proposed MDL architecture has 2.1–4.8 times higher performance compared with conventional dedicated hardware chips. It can also be used for other multimedia signal processing due to its programmability. RAR, 173 KB |
|
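The figures quoted in the abstract above (200 MHz, 3.2 GOPS, 1920 × 1080) allow a quick back-of-the-envelope check. The sketch below only divides the stated numbers; the frame rate of 30 frames per second is an assumption added for illustration and does not come from the abstract, and the function names are hypothetical.

```python
def ops_per_cycle(peak_ops_per_s, clock_hz):
    """Operations that must complete each clock cycle to reach the peak rate."""
    return peak_ops_per_s / clock_hz


def ops_per_pixel(peak_ops_per_s, width, height, frames_per_s):
    """Peak operation budget available per decoded pixel."""
    return peak_ops_per_s / (width * height * frames_per_s)


PEAK_OPS = 3_200_000_000   # 3.2 GOPS, from the abstract
CLOCK_HZ = 200_000_000     # 200 MHz, from the abstract
FRAME_RATE = 30            # assumption, not stated in the abstract

parallelism = ops_per_cycle(PEAK_OPS, CLOCK_HZ)
budget = ops_per_pixel(PEAK_OPS, 1920, 1080, FRAME_RATE)
print(f"{parallelism:.0f} operations per cycle, about {budget:.1f} operations per pixel")
```

The first number is the degree of 8-bit parallelism implied by the stated peak rate; the second depends entirely on the assumed frame rate.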
Ayman Elnaggar and Hussein M. Alnuweiri | A New Multidimensional Recursive Architecture for Computing The Discrete Cosine Transform |
Abstract: This paper presents a novel recursive algorithm for generating higher order multidimensional ( -D) discrete cosine transform (DCT) by combining the computation of 2 identical lower order (smaller size) DCT architectures. One immediate outcome of our results is the true “scalability” of the DCT computation. Basically, an -D DCT computation can be constructed from exactly one stage of smaller DCT computations of the same dimension. This is useful for both hardware and software solutions, in which a very efficient smaller size -D DCT core has been developed, and a larger DCT computation is required. The resulting DCT networks have very simple modular structure, highly regular topology, and use simple arithmetic units. RAR, 139 KB |
|
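The idea of building a larger DCT from smaller ones, as described in the abstract above, can be illustrated in one dimension. The sketch below is not the paper's multidimensional architecture; it uses the classical even/odd (Lee-style) split of an unnormalized DCT-II of length N into two DCT-IIs of length N/2, and checks it against a direct evaluation. The function names are chosen here for illustration.

```python
import math


def dct_direct(x):
    """Unnormalized DCT-II computed straight from the definition, O(N^2)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dct_recursive(x):
    """Unnormalized DCT-II of a power-of-two length, built from two half-size DCTs."""
    n = len(x)
    if n == 1:
        return [float(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    half = n // 2
    # Sums of mirrored samples feed the even-indexed outputs.
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    # Scaled differences feed an auxiliary half-size DCT for the odd outputs.
    diffs = [
        (x[i] - x[n - 1 - i]) / (2.0 * math.cos(math.pi * (i + 0.5) / n))
        for i in range(half)
    ]
    even = dct_recursive(sums)
    aux = dct_recursive(diffs)
    out = [0.0] * n
    for k in range(half):
        out[2 * k] = even[k]
        # Odd outputs are sums of neighbouring auxiliary coefficients.
        out[2 * k + 1] = aux[k] + (aux[k + 1] if k + 1 < half else 0.0)
    return out
```

Calling `dct_recursive([1, 2, 3, 4, 5, 6, 7, 8])` should agree with `dct_direct` on the same input up to floating-point rounding. The recursion makes the modular structure visible: each stage consists only of additions, subtractions, and one scaling per sample around two identical smaller transforms.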
François Charot, Gwendal Le Fol, Pascal Lemonnier, Charles Wagner, Ronan Barzic, and Christian Bouville | Toward Hardware Building Blocks for Software-Only Real-Time Video Processing: The MOVIE Approach |
Abstract: The goal of the MOVIE very large-scale integration chip is to facilitate the development of software-only solutions for real-time video processing applications. This chip can be seen as a building block for single-instruction, multiple-data processing, and its architecture has been designed so as to facilitate high-level language programming. The basic architecture building block associates a subarray of computation processors with an I/O processor. A module can be seen as a small linear, systolic-like array of processing elements, connected at each end to the I/O processor. The module can communicate with its two nearest neighbors via two communication ports. The chip architecture also includes three 16-bit video ports. One important aspect in the programming environment is the C-stolic programming language. C-stolic is a C-like language augmented with parallel constructs, which allow the differentiation between the array controller variables (scalar variables) and the local variables in the array structure (systolic variables). A statement operating on systolic variables implies a simultaneous execution on all the cells of the structure. Implementation examples of MOVIE-based architectures dealing with video compression algorithms are given. RAR, 292 KB |
|
Santanu Dutta, Vijay Mehra, Weiwen (Vivian) Zhu, Deepak Singh, Marcel Janssens, Ramakrishna Vengalasetti, Boaz Ben-Nun, Pardha Pothana, Venkat Adusumilli, Nahid (Mansuripur) King, John Yen-Han Huang, Lie (Laura) Ling, Chris Nelson, Jai Bannur, and Sarah Wu | Architecture and Design of a Talisman-Compatible Multimedia Processor |
Abstract: This paper describes the architecture, functionality, and design of a Talisman-compatible multimedia processor (TM-PC) from Philips Semiconductors. “Talisman” [1]–[3] is the code name of a new graphics and multimedia hardware architecture (from Microsoft Corp.) that aims at achieving the performance of high-end three-dimensional graphics workstations at consumer price points. TM-PC is a programmable processor with a high-performance, very long instruction word central processing unit (CPU) core. The CPU core, aided by an array of peripheral devices (multimedia coprocessors and input–output units), facilitates concurrent processing of audio, video, graphics, and communication data. Designed specifically for the Microsoft Talisman project, TM-PC is a derivative of Philips’ TM-1 [4]–[7] media processor and is tailored to be used in a variety of PC-based functions as a plug-in board on the peripheral component interconnect (PCI) bus. In the design of TM-PC, the functionality of most of the blocks from TM-1 has been kept unchanged; the primary changes in the existing blocks have been in the main memory and the PCI interfaces, and a new block, called VPB, has been added to support virtual frame buffer functionality as well as video graphics adapter and Soundblaster emulation capability. The major emphasis of this paper is on the design details of the new VPB module and an explanation of how it fits with the rest of the TM-1 design. RAR, 711 KB |
|
An-Yeu Wu, K. J. Ray Liu, and Arun Raghupathy | System Architecture of an Adaptive Reconfigurable DSP Computing Engine |
Abstract: Modern digital signal processing (DSP) applications call for computationally intensive data processing at very high data rates. In order to meet the high-performance/low-cost constraints, the state-of-the-art video processor should be a programmable design which performs various tasks in video applications without sacrificing the computational power and the manufacturing cost in exchange for such flexibility. Currently, general-purpose programmable DSP processor and application-specific integrated circuit (ASIC) design are the two major approaches for data processing in practical implementations. In order to meet the high-speed/low-cost constraint, it is desirable to have a programmable design that has the flexibility of the general-purpose DSP processor while the computational power is similar to ASIC designs. In this paper, we present the system architecture of an adaptive reconfigurable DSP computing engine for numerically intensive front-end audio/video communications. The proposed system is a massively parallel architecture that is capable of performing most low-level computationally intensive data processing including finite impulse response/infinite impulse response (FIR/IIR) filtering, subband filtering, discrete orthogonal transforms (DT), adaptive filtering, and motion estimation for the host processor in DSP applications. Since the properties of each programmed function such as parallelism and pipelinability have been fully exploited in this design, the computational speed of this computing engine can be as fast as ASIC designs that are optimized for individual specific applications. We also show that the system can be easily configured to perform multirate FIR/IIR/DT operations at negligible hardware overhead. Since the processing elements are operated at half of the input data rate, we are able to double the processing speed on-the-fly based on the same system architecture without using high-speed/full-custom circuits. The programmable/high-speed features of the proposed design make it very suitable for cost-effective video-rate DSP applications. RAR, 771 KB |
|
Santanu Dutta, Kevin J. O’Connor, Wayne Wolf, and Andrew Wolfe | A Design Study of a 0.25-μm Video Signal Processor |
Abstract: This paper presents a detailed design study of a high-speed, single-chip architecture for video signal processing (VSP), developed as part of the Princeton VSP Project. In order to define the architectural parameters by examining the area and delay tradeoffs, we start by designing parameterizable versions of key modules, and we perform VLSI modeling experiments in a 0.25-μm process. Based on the properties of these modules, we propose a VLIW (very long instruction word) VSP architecture that features 32–64 operations per cycle at clock rates well in excess of 600 MHz, and that includes a significant amount of on-chip memory. VLIW architectures provide predictable, efficient, high performance, and benefit from mature compiler technology. As explained later, a VLIW video processor design requires flexible, high-bandwidth interconnect at fast cycle times, and presents some unique VLSI tradeoffs and challenges in maintaining high clock rates while providing high parallelism and utilization. RAR, 517 KB |
|
N. Ranganathan, N. Vijaykrishnan, and N. Bhavanishankar | A Linear Array Processor with Dynamic Frequency Clocking for Image Processing Applications |
Abstract: The need for high-performance image processing systems has led to the design and development of several application-specific parallel processing systems. In this paper, a SIMD linear array processor with dynamic frequency clocking is proposed for real-time image processing applications. The architecture uses a novel concept called dynamic frequency clocking which allows the processor to vary the clock frequency dynamically based on the operation being performed. A VLSI chip based on the proposed architecture has been designed and verified using the Cadence design tools. The chip will operate between 400 and 50 MHz based on the operation being performed. Several low-level image processing tasks have been mapped onto the architecture to evaluate the system performance and to demonstrate the effectiveness of the dynamic frequency clocking scheme. RAR, 220 KB |
|
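The benefit of dynamic frequency clocking described in the abstract above can be estimated with a simple timing model. The sketch below is an illustration under loud assumptions: every operation is taken to finish in one clock cycle, and the per-operation frequencies and the operation mix are invented values that merely fall inside the 50 to 400 MHz range quoted in the abstract.

```python
def run_time_s(op_counts, op_freq_hz, dynamic=True):
    """Execution time of an operation mix under a one-cycle-per-operation model.

    dynamic=True  : each operation runs at its own clock frequency.
    dynamic=False : one fixed clock, limited by the slowest operation in use.
    """
    if dynamic:
        return sum(count / op_freq_hz[op] for op, count in op_counts.items())
    slowest_hz = min(op_freq_hz[op] for op in op_counts)
    return sum(op_counts.values()) / slowest_hz


# Hypothetical per-operation clock rates (not from the paper).
OP_FREQ_HZ = {"shift": 400e6, "add": 200e6, "multiply": 50e6}
# Hypothetical operation mix for some image-processing kernel.
OP_COUNTS = {"shift": 1000, "add": 1000, "multiply": 1000}

dynamic_s = run_time_s(OP_COUNTS, OP_FREQ_HZ, dynamic=True)
fixed_s = run_time_s(OP_COUNTS, OP_FREQ_HZ, dynamic=False)
print(f"dynamic: {dynamic_s * 1e6:.1f} us, fixed: {fixed_s * 1e6:.1f} us")
```

In this model the gain comes entirely from fast operations no longer waiting for a clock period sized for the slowest one; the more the mix is dominated by simple operations, the larger the gain.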
Santanu Dutta, Wayne Wolf, and Andrew Wolfe | A Methodology to Evaluate Memory Architecture Design Tradeoffs for Video Signal Processors |
Abstract: This paper develops a methodology for the design of the memory and the memory-processor communication network in video signal processors. The memory subsystem is the bottleneck of most video computing systems and its design requires evaluating tradeoffs between area, cycle time, and utilization. We emphasize the need to consider technological and circuit-level issues during the design of a system architecture, particularly video signal processing (VSP) systems, and present a systematic method whereby the organization of the memory architecture (the granularity of memory partitioning and the size and type of interconnection network) can be analyzed and its cycle-time approximated before a detailed design is undertaken. We show how variations in sizes and circuit configurations help determine the variations in delay of both memory and network, and how the delay curves, thus determined, can be used to design, compare, and choose from different memory-system architectures; we also describe a technique that can be used to identify the on-chip-off-chip boundary with respect to a hierarchical memory-system design for a memory-intensive VSP module. All of our results are validated via layout and simulation of prototype circuits in two different process technologies. Motion estimation and discrete cosine transform (DCT) being two of the most important tasks in video processing, we use the design of a motion estimator and that of a DCT unit as examples to illustrate the high-level issues in designing the memory architecture for a VSP module. The analysis presented for the motion estimator and the DCT unit can also be applied to other processing blocks belonging to the system. RAR, 514 KB |
|
See also the materials:
- On color spaces
- On JPEG
- On JPEG-2000
Prepared by Sergey Grishin and Dmitriy Vatolin