Site reliability engineering : how Google runs production systems, edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy

Label
Site reliability engineering : how Google runs production systems
Title
Site reliability engineering
Title remainder
how Google runs production systems
Statement of responsibility
edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy
Contributor
Editor
Subject
Language
eng
Cataloging source
BTCTA
Dewey number
620.00452 SIT
Illustrations
illustrations
Index
index present
LC call number
HD9696.8.U64
LC item number
G6666 2016
Literary form
non fiction
Nature of contents
bibliography
http://library.link/vocab/relatedWorkOrContributorName
  • Beyer, Betsy
  • Jones, Chris
  • Petoff, Jennifer
  • Murphy, Niall Richard
http://library.link/vocab/subjectName
  • Google (Firm)
  • Systems engineering
  • Reliability (Engineering)
  • Internet industry
Target audience
adult
Label
Site reliability engineering : how Google runs production systems, edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy
Instantiates
Publication
Bibliography note
Includes bibliographical references (pages 501-512) and index
Carrier category
volume
Carrier category code
  • nc
Carrier MARC source
rdacarrier
Content category
text
Content type code
  • txt
Content type MARC source
rdacontent
Contents
Introduction. The production environment at Google, from the viewpoint of an SRE -- Principles. Embracing risk -- Service level objectives -- Eliminating toil -- Monitoring distributed systems -- The evolution of automation at Google -- Release engineering -- Simplicity -- Practices. Practical alerting from time-series data -- Being on-call -- Effective troubleshooting -- Emergency response -- Managing incidents -- Postmortem culture: learning from failure -- Tracking outages -- Testing for reliability -- Software engineering in SRE -- Load balancing at the frontend -- Load balancing in the datacenter -- Handling overload -- Addressing cascading failures -- Managing critical state: distributed consensus for reliability -- Distributed periodic scheduling with Cron -- Data processing pipelines -- Data integrity: what you read is what you wrote -- Reliable product launches at scale -- Management. Accelerating SREs to on-call and beyond -- Dealing with interrupts -- Embedding an SRE to recover from operational overload -- Communication and collaboration in SRE -- The evolving SRE engagement model -- Conclusions. Lessons learned from other industries
Control code
ocn930683030
Dimensions
24 cm
Edition
First edition.
Extent
xxiv, 524 pages
Isbn
9781491929124
Media category
unmediated
Media MARC source
rdamedia
Media type code
  • n
Note
338.761 SI86 ; XX-N ; [A15BP042, #18, 3@44.99, ES/ CSS]
Other physical details
illustrations
System control number
  • (Sirsi) 930683030
  • (OCoLC)930683030

Library Locations

    • Lee's Summit Branch
      150 NW Oldham Pkwy., Lee's Summit, MO, 64081, US
      38.915628 -94.400799
    • Parkville Branch
      8815 Tom Watson Pkwy., Parkville, MO, 64152, US
      39.2099 -94.68334
    • Riverside Branch
      2700 N.W. Vivion Road, Riverside, MO, 64150, US
      39.178749 -94.612022