
Book summary

Hadoop in Practice
Price: 49,000 won
Author: Alex Holmes
Book type: Foreign book
Publisher: Manning Publications
Language: English
Publication date: 2012-10
Pages: 536
ISBN: 9781617290237
Purchase information: Available at online and offline bookstores.

  • Book information

    Detailed description

    Part 1 Background and Fundamentals

    1 Hadoop in a heartbeat
    1.1 What is Hadoop?
    1.2 Running Hadoop
    1.3 Chapter summary
    Part 2 Data Logistics

    2 Moving data in and out of Hadoop
    2.1 Key elements of ingress and egress
    2.2 Moving data into Hadoop
    Technique 1 Pushing system log messages into HDFS with Flume
    Technique 2 An automated mechanism to copy files into HDFS
    Technique 3 Scheduling regular ingress activities with Oozie
    Technique 4 Database ingress with MapReduce
    Technique 5 Using Sqoop to import data from MySQL
    Technique 6 HBase ingress into HDFS
    Technique 7 MapReduce with HBase as a data source
    2.3 Moving data out of Hadoop
    Technique 8 Automated file copying from HDFS
    Technique 9 Using Sqoop to export data to MySQL
    Technique 10 HDFS egress to HBase
    Technique 11 Using HBase as a data sink in MapReduce
    2.4 Chapter summary
    3 Data serialization—working with text and beyond
    3.1 Understanding inputs and outputs in MapReduce
    3.2 Processing common serialization formats
    Technique 12 MapReduce and XML
    Technique 13 MapReduce and JSON
    3.3 Big data serialization formats
    Technique 14 Working with SequenceFiles
    Technique 15 Integrating Protocol Buffers with MapReduce
    Technique 16 Working with Thrift
    Technique 17 Next-generation data serialization with MapReduce
    3.4 Custom file formats
    Technique 18 Writing input and output formats for CSV
    3.5 Chapter summary
    Part 3 Big Data Patterns

    4 Applying MapReduce patterns to big data
    4.1 Joining
    Technique 19 Optimized repartition joins
    Technique 20 Implementing a semi-join
    4.2 Sorting
    Technique 21 Implementing a secondary sort
    Technique 22 Sorting keys across multiple reducers
    4.3 Sampling
    Technique 23 Reservoir sampling
    4.4 Chapter summary
    5 Streamlining HDFS for big data
    5.1 Working with small files
    Technique 24 Using Avro to store multiple small files
    5.2 Efficient storage with compression
    Technique 25 Picking the right compression codec for your data
    Technique 26 Compression with HDFS, MapReduce, Pig, and Hive
    Technique 27 Splittable LZOP with MapReduce, Hive, and Pig
    5.3 Chapter summary
    6 Diagnosing and tuning performance problems
    6.1 Measuring MapReduce and your environment
    6.2 Determining the cause of your performance woes
    Technique 28 Investigating spikes in input data
    Technique 29 Identifying map-side data skew problems
    Technique 30 Determining if map tasks have an overall low throughput
    Technique 31 Small files
    Technique 32 Unsplittable files
    Technique 33 Too few or too many reducers
    Technique 34 Identifying reduce-side data skew problems
    Technique 35 Determining if reduce tasks have an overall low throughput
    Technique 36 Slow shuffle and sort
    Technique 37 Competing jobs and scheduler throttling
    Technique 38 Using stack dumps to discover unoptimized user code
    Technique 39 Discovering hardware failures
    Technique 40 CPU contention
    Technique 41 Memory swapping
    Technique 42 Disk health
    Technique 43 Networking
    6.3 Visualization
    Technique 44 Extracting and visualizing task execution times
    6.4 Tuning
    Technique 45 Profiling your map and reduce tasks
    Technique 46 Avoid the reducer
    Technique 47 Filter and project
    Technique 48 Using the combiner
    Technique 49 Blazingly fast sorting with comparators
    Technique 50 Collecting skewed data
    Technique 51 Reduce skew mitigation
    6.5 Chapter summary
    Part 4 Data Science

    7 Utilizing data structures and algorithms
    7.1 Modeling data and solving problems with graphs
    Technique 52 Find the shortest distance between two users
    Technique 53 Calculating FoFs
    Technique 54 Calculate PageRank over a web graph
    7.2 Bloom filters
    Technique 55 Parallelized Bloom filter creation in MapReduce
    Technique 56 MapReduce semi-join with Bloom filters
    7.3 Chapter summary
    8 Integrating R and Hadoop for statistics and more
    8.1 Comparing R and MapReduce integrations
    8.2 R fundamentals
    8.3 R and Streaming
    Technique 57 Calculate the daily mean for stocks
    Technique 58 Calculate the cumulative moving average for stocks
    8.4 Rhipe—Client-side R and Hadoop working together
    Technique 59 Calculating the CMA using Rhipe
    8.5 RHadoop—a simpler integration of client-side R and Hadoop
    Technique 60 Calculating CMA with RHadoop
    8.6 Chapter summary
    9 Predictive analytics with Mahout
    9.1 Using recommenders to make product suggestions
    Technique 61 Item-based recommenders using movie ratings
    9.2 Classification
    Technique 62 Using Mahout to train and test a spam classifier
    9.3 Clustering with K-means
    Technique 63 K-means with a synthetic 2D dataset
    9.4 Chapter summary
    Part 5 Taming the Elephant

    10 Hacking with Hive
    10.1 Hive fundamentals
    10.2 Data analytics with Hive
    Technique 64 Loading log files
    Technique 65 Writing UDFs and compressed partitioned tables
    Technique 66 Tuning Hive joins
    10.3 Chapter summary
    11 Programming pipelines with Pig
    11.1 Pig fundamentals
    11.2 Using Pig to find malicious actors in log data
    Technique 67 Schema-rich Apache log loading
    Technique 68 Reducing your data with filters and projection
    Technique 69 Grouping and counting IP addresses
    Technique 70 IP Geolocation using the distributed cache
    Technique 71 Combining Pig with your scripts
    Technique 72 Combining data in Pig
    Technique 73 Sorting tuples
    Technique 74 Storing data in SequenceFiles
    11.3 Optimizing user workflows with Pig
    Technique 75 A four-step process to working rapidly with big data
    11.4 Performance
    Technique 76 Pig optimizations
    11.5 Chapter summary
    12 Crunch and other technologies
    12.1 What is Crunch?
    12.2 Finding the most popular URLs in your logs
    Technique 77 Crunch log parsing and basic analytics
    12.3 Joins
    Technique 78 Crunch’s repartition join
    12.4 Cascading
    12.5 Chapter summary
    13 Testing and debugging
    13.1 Testing
    Technique 79 Unit Testing MapReduce functions, jobs, and pipelines
    Technique 80 Heavyweight job testing with the LocalJobRunner
    13.2 Debugging user space problems
    Technique 81 Examining task logs
    Technique 82 Pinpointing a problem Input Split
    Technique 83 Figuring out the JVM startup arguments for a task
    Technique 84 Debugging and error handling
    13.3 MapReduce gotchas
    Technique 85 MapReduce anti-patterns
    13.4 Chapter summary


    appendix A Related technologies
    appendix B Hadoop built-in ingress and egress tools
    appendix C HDFS dissected
    appendix D Optimized MapReduce join frameworks
    index
