Book Summary

Programming Massively Parallel Processors: A Hands on Approach, 3/Ed
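Price: 79,000 KRW
Author: Kirk
Book type: Foreign (imported) book
Publisher: Morgan Kaufmann
Language: English
Publication date: 2016-12
Pages: 576
ISBN: 9780128119860
Purchase information: Available at online and offline bookstores.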

  • Book Information

    Detailed Description

    Table of Contents
    Dedication
    Preface
    Target Audience
    How to Use the Book
    Illinois–NVIDIA GPU Teaching Kit
    Online Supplements
    Acknowledgements
    Chapter 1. Introduction
    Abstract
    1.1 Heterogeneous Parallel Computing
    1.2 Architecture of a Modern GPU
    1.3 Why More Speed or Parallelism?
    1.4 Speeding Up Real Applications
    1.5 Challenges in Parallel Programming
    1.6 Parallel Programming Languages and Models
    1.7 Overarching Goals
    1.8 Organization of the Book
    References
    Chapter 2. Data parallel computing
    Abstract
    2.1 Data Parallelism
    2.2 CUDA C Program Structure
    2.3 A Vector Addition Kernel
    2.4 Device Global Memory and Data Transfer
    2.5 Kernel Functions and Threading
    2.6 Kernel Launch
    2.7 Summary
    References
    Chapter 3. Scalable parallel execution
    Abstract
    3.1 CUDA Thread Organization
    3.2 Mapping Threads to Multidimensional Data
    3.3 Image Blur: A More Complex Kernel
    3.4 Synchronization and Transparent Scalability
    3.5 Resource Assignment
    3.6 Querying Device Properties
    3.7 Thread Scheduling and Latency Tolerance
    3.8 Summary
    Chapter 4. Memory and data locality
    Abstract
    4.1 Importance of Memory Access Efficiency
    4.2 Matrix Multiplication
    4.3 CUDA Memory Types
    4.4 Tiling for Reduced Memory Traffic
    4.5 A Tiled Matrix Multiplication Kernel
    4.6 Boundary Checks
    4.7 Memory as a Limiting Factor to Parallelism
    4.8 Summary
    Chapter 5. Performance considerations
    Abstract
    5.1 Global Memory Bandwidth
    5.2 More on Memory Parallelism
    5.3 Warps and SIMD Hardware
    5.4 Dynamic Partitioning of Resources
    5.5 Thread Granularity
    5.6 Summary
    References
    Chapter 6. Numerical considerations
    Abstract
    6.1 Floating-Point Data Representation
    6.2 Representable Numbers
    6.3 Special Bit Patterns and Precision in IEEE Format
    6.4 Arithmetic Accuracy and Rounding
    6.5 Algorithm Considerations
    6.6 Linear Solvers and Numerical Stability
    6.7 Summary
    References
    Chapter 7. Parallel patterns: convolution: An introduction to stencil computation
    Abstract
    7.1 Background
    7.2 1D Parallel Convolution—A Basic Algorithm
    7.3 Constant Memory and Caching
    7.4 Tiled 1D Convolution with Halo Cells
    7.5 A Simpler Tiled 1D Convolution—General Caching
    7.6 Tiled 2D Convolution With Halo Cells
    7.7 Summary
    7.8 Exercises
    Chapter 8. Parallel patterns: prefix sum: An introduction to work efficiency in parallel algorithms
    Abstract
    8.1 Background
    8.2 A Simple Parallel Scan
    8.3 Speed and Work Efficiency
    8.4 A More Work-Efficient Parallel Scan
    8.5 An Even More Work-Efficient Parallel Scan
    8.6 Hierarchical Parallel Scan for Arbitrary-Length Inputs
    8.7 Single-Pass Scan for Memory Access Efficiency
    8.8 Summary
    8.9 Exercises
    References
    Chapter 9. Parallel patterns—parallel histogram computation: An introduction to atomic operations and privatization
    Abstract
    9.1 Background
    9.2 Use of Atomic Operations
    9.3 Block versus Interleaved Partitioning
    9.4 Latency versus Throughput of Atomic Operations
    9.5 Atomic Operation in Cache Memory
    9.6 Privatization
    9.7 Aggregation
    9.8 Summary
    Reference
    Chapter 10. Parallel patterns: sparse matrix computation: An introduction to data compression and regularization
    Abstract
    10.1 Background
    10.2 Parallel SpMV Using CSR
    10.3 Padding and Transposition
    10.4 Using a Hybrid Approach to Regulate Padding
    10.5 Sorting and Partitioning for Regularization
    10.6 Summary
    References
    Chapter 11. Parallel patterns: merge sort: An introduction to tiling with dynamic input data identification
    Abstract
    11.1 Background
    11.2 A Sequential Merge Algorithm
    11.3 A Parallelization Approach
    11.4 Co-Rank Function Implementation
    11.5 A Basic Parallel Merge Kernel
    11.6 A Tiled Merge Kernel
    11.7 A Circular-Buffer Merge Kernel
    11.8 Summary
    Reference
    Chapter 12. Parallel patterns: graph search
    Abstract
    12.1 Background
    12.2 Breadth-First Search
    12.3 A Sequential BFS Function
    12.4 A Parallel BFS Function
    12.5 Optimizations
    12.6 Summary
    References
    Chapter 13. CUDA dynamic parallelism
    Abstract
    13.1 Background
    13.2 Dynamic Parallelism Overview
    13.3 A Simple Example
    13.4 Memory Data Visibility
    13.5 Configurations and Memory Management
    13.6 Synchronization, Streams, and Events
    13.7 A More Complex Example
    13.8 A Recursive Example
    13.9 Summary
    References
    A13.1 Code Appendix
    Chapter 14. Application case study—non-Cartesian magnetic resonance imaging: An introduction to statistical estimation methods
    Abstract
    14.1 Background
    14.2 Iterative Reconstruction
    14.3 Computing FHD
    14.4 Final Evaluation
    References
    Chapter 15. Application case study—molecular visualization and analysis
    Abstract
    15.1 Background
    15.2 A Simple Kernel Implementation
    15.3 Thread Granularity Adjustment
    15.4 Memory Coalescing
    15.5 Summary
    References
    Chapter 16. Application case study—machine learning
    Abstract
    16.1 Background
    16.2 Convolutional Neural Networks
    16.3 Convolutional Layer: A Basic CUDA Implementation of Forward Propagation
    16.4 Reduction of Convolutional Layer to Matrix Multiplication
    16.5 cuDNN Library
    References
    Chapter 17. Parallel programming and computational thinking
    Abstract
    17.1 Goals of Parallel Computing
    17.2 Problem Decomposition
    17.3 Algorithm Selection
    17.4 Computational Thinking
    17.5 Single Program, Multiple Data, Shared Memory and Locality
    17.6 Strategies for Computational Thinking
    17.7 A Hypothetical Example: Sodium Map of the Brain
    17.8 Summary
    References
    Chapter 18. Programming a heterogeneous computing cluster
    Abstract
    18.1 Background
    18.2 A Running Example
    18.3 Message Passing Interface Basics
    18.4 Message Passing Interface Point-to-Point Communication
    18.5 Overlapping Computation and Communication
    18.6 Message Passing Interface Collective Communication
    18.7 CUDA-Aware Message Passing Interface
    18.8 Summary
    Reference
    Chapter 19. Parallel programming with OpenACC
    Abstract
    19.1 The OpenACC Execution Model
    19.2 OpenACC Directive Format
    19.3 OpenACC by Example
    19.4 Comparing OpenACC and CUDA
    19.5 Interoperability with CUDA and Libraries
    19.6 The Future of OpenACC
    Chapter 20. More on CUDA and graphics processing unit computing
    Abstract
    20.1 Model of Host/Device Interaction
    20.2 Kernel Execution Control
    20.3 Memory Bandwidth and Compute Throughput
    20.4 Programming Environment
    20.5 Future Outlook
    References
    Chapter 21. Conclusion and outlook
    Abstract
    21.1 Goals Revisited
    21.2 Future Outlook
    Appendix A. An introduction to OpenCL
    A.1 Background
    A.2 Data Parallelism Model
    A.3 Device Architecture
    A.4 Kernel Functions
    A.5 Device Management and Kernel Launch
    A.6 Electrostatic Potential Map in OpenCL
    A.7 Summary
    Appendix B. THRUST: a productivity-oriented library for CUDA
    B.1 Background
    B.2 Motivation
    B.3 Basic Thrust Features
    B.4 Generic Programming
    B.5 Benefits of Abstraction
    B.6 Best Practices
    Appendix C. CUDA Fortran
    C.1 CUDA Fortran and CUDA C Differences
    C.2 A First CUDA Fortran Program
    C.3 Multidimensional Array in CUDA Fortran
    C.4 Overloading Host/Device Routines with Generic Interfaces
    C.5 Calling CUDA C via ISO_C_Binding
    C.6 Kernel Loop Directives and Reduction Operations
    C.7 Dynamic Shared Memory
    C.8 Asynchronous Data Transfers
    C.9 Compilation and Profiling
    C.10 Calling Thrust from CUDA Fortran
    Appendix D. An introduction to C++ AMP
    D.1 Core C++ AMP Features
    D.2 Details of the C++ AMP Execution Model
    D.3 Managing Accelerators
    D.4 Tiled Execution
    D.5 C++ AMP Graphics Features
    D.6 Summary
    Reference
    Index
Copyright © 2001-2019 도서출판 홍릉. All Rights Reserved.