Book Summary

Programming Massively Parallel Processors: A Hands-on Approach, 3rd Edition

Price: 79,000 KRW

Author	Kirk
Book type	Foreign book
Publisher	Morgan Kaufmann
Language	English
Publication date	2016-12
Pages	576
ISBN	9780128119860
Purchasing	Available from online and offline bookstores.

Book Information

Detailed Description

Table of Contents
Dedication
Preface
Target Audience
How to Use the Book
Illinois–NVIDIA GPU Teaching Kit
Online Supplements
Acknowledgements
Chapter 1. Introduction
Abstract
1.1 Heterogeneous Parallel Computing
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Speeding Up Real Applications
1.5 Challenges in Parallel Programming
1.6 Parallel Programming Languages and Models
1.7 Overarching Goals
1.8 Organization of the Book
References
Chapter 2. Data parallel computing
Abstract
2.1 Data Parallelism
2.2 CUDA C Program Structure
2.3 A Vector Addition Kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Kernel Launch
2.7 Summary
References
Chapter 3. Scalable parallel execution
Abstract
3.1 CUDA Thread Organization
3.2 Mapping Threads to Multidimensional Data
3.3 Image Blur: A More Complex Kernel
3.4 Synchronization and Transparent Scalability
3.5 Resource Assignment
3.6 Querying Device Properties
3.7 Thread Scheduling and Latency Tolerance
3.8 Summary
Chapter 4. Memory and data locality
Abstract
4.1 Importance of Memory Access Efficiency
4.2 Matrix Multiplication
4.3 CUDA Memory Types
4.4 Tiling for Reduced Memory Traffic
4.5 A Tiled Matrix Multiplication Kernel
4.6 Boundary Checks
4.7 Memory as a Limiting Factor to Parallelism
4.8 Summary
Chapter 5. Performance considerations
Abstract
5.1 Global Memory Bandwidth
5.2 More on Memory Parallelism
5.3 Warps and SIMD Hardware
5.4 Dynamic Partitioning of Resources
5.5 Thread Granularity
5.6 Summary
References
Chapter 6. Numerical considerations
Abstract
6.1 Floating-Point Data Representation
6.2 Representable Numbers
6.3 Special Bit Patterns and Precision in IEEE Format
6.4 Arithmetic Accuracy and Rounding
6.5 Algorithm Considerations
6.6 Linear Solvers and Numerical Stability
6.7 Summary
References
Chapter 7. Parallel patterns: convolution: An introduction to stencil computation
Abstract
7.1 Background
7.2 1D Parallel Convolution—A Basic Algorithm
7.3 Constant Memory and Caching
7.4 Tiled 1D Convolution with Halo Cells
7.5 A Simpler Tiled 1D Convolution—General Caching
7.6 Tiled 2D Convolution With Halo Cells
7.7 Summary
7.8 Exercises
Chapter 8. Parallel patterns: prefix sum: An introduction to work efficiency in parallel algorithms
Abstract
8.1 Background
8.2 A Simple Parallel Scan
8.3 Speed and Work Efficiency
8.4 A More Work-Efficient Parallel Scan
8.5 An Even More Work-Efficient Parallel Scan
8.6 Hierarchical Parallel Scan for Arbitrary-Length Inputs
8.7 Single-Pass Scan for Memory Access Efficiency
8.8 Summary
8.9 Exercises
References
Chapter 9. Parallel patterns—parallel histogram computation: An introduction to atomic operations and privatization
Abstract
9.1 Background
9.2 Use of Atomic Operations
9.3 Block versus Interleaved Partitioning
9.4 Latency versus Throughput of Atomic Operations
9.5 Atomic Operation in Cache Memory
9.6 Privatization
9.7 Aggregation
9.8 Summary
Reference
Chapter 10. Parallel patterns: sparse matrix computation: An introduction to data compression and regularization
Abstract
10.1 Background
10.2 Parallel SpMV Using CSR
10.3 Padding and Transposition
10.4 Using a Hybrid Approach to Regulate Padding
10.5 Sorting and Partitioning for Regularization
10.6 Summary
References
Chapter 11. Parallel patterns: merge sort: An introduction to tiling with dynamic input data identification
Abstract
11.1 Background
11.2 A Sequential Merge Algorithm
11.3 A Parallelization Approach
11.4 Co-Rank Function Implementation
11.5 A Basic Parallel Merge Kernel
11.6 A Tiled Merge Kernel
11.7 A Circular-Buffer Merge Kernel
11.8 Summary
Reference
Chapter 12. Parallel patterns: graph search
Abstract
12.1 Background
12.2 Breadth-First Search
12.3 A Sequential BFS Function
12.4 A Parallel BFS Function
12.5 Optimizations
12.6 Summary
References
Chapter 13. CUDA dynamic parallelism
Abstract
13.1 Background
13.2 Dynamic Parallelism Overview
13.3 A Simple Example
13.4 Memory Data Visibility
13.5 Configurations and Memory Management
13.6 Synchronization, Streams, and Events
13.7 A More Complex Example
13.8 A Recursive Example
13.9 Summary
References
A13.1 Code Appendix
Chapter 14. Application case study—non-Cartesian magnetic resonance imaging: An introduction to statistical estimation methods
Abstract
14.1 Background
14.2 Iterative Reconstruction
14.3 Computing FᴴD
14.4 Final Evaluation
References
Chapter 15. Application case study—molecular visualization and analysis
Abstract
15.1 Background
15.2 A Simple Kernel Implementation
15.3 Thread Granularity Adjustment
15.4 Memory Coalescing
15.5 Summary
References
Chapter 16. Application case study—machine learning
Abstract
16.1 Background
16.2 Convolutional Neural Networks
16.3 Convolutional Layer: A Basic CUDA Implementation of Forward Propagation
16.4 Reduction of Convolutional Layer to Matrix Multiplication
16.5 cuDNN Library
References
Chapter 17. Parallel programming and computational thinking
Abstract
17.1 Goals of Parallel Computing
17.2 Problem Decomposition
17.3 Algorithm Selection
17.4 Computational Thinking
17.5 Single Program, Multiple Data, Shared Memory and Locality
17.6 Strategies for Computational Thinking
17.7 A Hypothetical Example: Sodium Map of the Brain
17.8 Summary
References
Chapter 18. Programming a heterogeneous computing cluster
Abstract
18.1 Background
18.2 A Running Example
18.3 Message Passing Interface Basics
18.4 Message Passing Interface Point-to-Point Communication
18.5 Overlapping Computation and Communication
18.6 Message Passing Interface Collective Communication
18.7 CUDA-Aware Message Passing Interface
18.8 Summary
Reference
Chapter 19. Parallel programming with OpenACC
Abstract
19.1 The OpenACC Execution Model
19.2 OpenACC Directive Format
19.3 OpenACC by Example
19.4 Comparing OpenACC and CUDA
19.5 Interoperability with CUDA and Libraries
19.6 The Future of OpenACC
Chapter 20. More on CUDA and graphics processing unit computing
Abstract
20.1 Model of Host/Device Interaction
20.2 Kernel Execution Control
20.3 Memory Bandwidth and Compute Throughput
20.4 Programming Environment
20.5 Future Outlook
References
Chapter 21. Conclusion and outlook
Abstract
21.1 Goals Revisited
21.2 Future Outlook
Appendix A. An introduction to OpenCL
A.1 Background
A.2 Data Parallelism Model
A.3 Device Architecture
A.4 Kernel Functions
A.5 Device Management and Kernel Launch
A.6 Electrostatic Potential Map in OpenCL
A.7 Summary
Appendix B. THRUST: a productivity-oriented library for CUDA
B.1 Background
B.2 Motivation
B.3 Basic Thrust Features
B.4 Generic Programming
B.5 Benefits of Abstraction
B.6 Best Practices
Appendix C. CUDA Fortran
C.1 CUDA Fortran and CUDA C Differences
C.2 A First CUDA Fortran Program
C.3 Multidimensional Array in CUDA Fortran
C.4 Overloading Host/Device Routines with Generic Interfaces
C.5 Calling CUDA C via ISO_C_Binding
C.6 Kernel Loop Directives and Reduction Operations
C.7 Dynamic Shared Memory
C.8 Asynchronous Data Transfers
C.9 Compilation and Profiling
C.10 Calling Thrust from CUDA Fortran
Appendix D. An introduction to C++ AMP
D.1 Core C++ AMP Features
D.2 Details of the C++ AMP Execution Model
D.3 Managing Accelerators
D.4 Tiled Execution
D.5 C++ AMP Graphics Features
D.6 Summary
Reference
Index