A curated list of awesome Chaos Engineering resources.
What is Chaos Engineering?
Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. – Principles of Chaos Engineering website.
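The Principles of Chaos Engineering website frames such an experiment as a loop: define a measurable steady state, hypothesize that it holds in both a control group and an experimental group, inject a real-world fault into the experimental group, and try to disprove the hypothesis by looking for a difference. The Python below is a minimal sketch of that loop under stated assumptions. It uses a toy service whose steady state is its request success rate. `make_dependency`, `make_service`, `success_rate`, and `steady_state_holds` are hypothetical helpers, not part of any tool in this list, and the 30% failure rate and 1% tolerance are illustrative values.

```python
import random
from typing import Callable

# A "service" here is any zero-argument callable that returns True on success.
Call = Callable[[], bool]


def make_dependency(failure_rate: float, rng: random.Random) -> Call:
    """Hypothetical downstream call that fails with probability `failure_rate`."""
    def call() -> bool:
        return rng.random() >= failure_rate
    return call


def make_service(dependency: Call, attempts: int) -> Call:
    """Hypothetical service that tries its dependency up to `attempts` times."""
    def handle_request() -> bool:
        # any() stops at the first successful attempt.
        return any(dependency() for _ in range(attempts))
    return handle_request


def success_rate(service: Call, requests: int) -> float:
    """Steady-state metric: the fraction of requests that succeed."""
    return sum(service() for _ in range(requests)) / requests


def steady_state_holds(control: Call, experimental: Call,
                       requests: int = 10_000, tolerance: float = 0.01) -> bool:
    """Hypothesis: control and experimental success rates differ by at most `tolerance`."""
    difference = success_rate(control, requests) - success_rate(experimental, requests)
    return abs(difference) <= tolerance


if __name__ == "__main__":
    rng = random.Random(42)                # seeded so runs are reproducible
    healthy = make_dependency(0.0, rng)    # control group: no injected fault
    flaky = make_dependency(0.3, rng)      # experimental group: 30% of calls fail

    # A single attempt per request cannot absorb the injected fault.
    print(steady_state_holds(make_service(healthy, 1), make_service(flaky, 1)))
    # Five attempts per request should keep the success rate within tolerance.
    print(steady_state_holds(make_service(healthy, 5), make_service(flaky, 5)))
```

With one attempt, the experimental group succeeds only about 70% of the time, so the hypothesis is falsified. With five attempts, all of them fail together only about 0.3^5 ≈ 0.24% of the time, so the difference stays well inside the assumed 1% tolerance. Comparing against a control group, rather than a fixed threshold, keeps unrelated background noise from being blamed on the injected fault.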
Contents
- Culture
- Books
- Education
- Notable Tools
- Cloud Services
- Papers
- Blogs & Newsletters
- Conferences & Meetups
- Forums
Culture
- Principles of Chaos Engineering
- Chaos Community
- Chaos Engineering
- O’Reilly Velocity San Jose 2017: Precision Chaos
- The Discipline of Chaos Engineering
- Chaos Monkey for Fun and Profit
- Fault Injection in Production: Making the case for resilience testing
- Lord of Chaos – Becoming a Chaos Engineer
- Chaos testing – Preventing failure by instigation
- Orchestrated Chaos
- Choose your own adventure: Chaos Engineering – Video & Slides
- AMA Chaos Engineering + DiRT
- SRECON17: Principles of Chaos Engineering
- Chaos & Intuition Engineering at Netflix
- Mastering Chaos – A Netflix Guide to Microservices
- Too big to test: Breaking a production brokerage platform without causing financial devastation
- Inside Azure Search: Chaos Engineering
- Netflix, the Simian Army, and the culture of freedom and responsibility
- FIT: Failure Injection Testing
- The Netflix Simian Army
- Automated Failure Testing
- The Verification of a Distributed System by Caitie McCaffrey
- The Journey to Chaos Engineering begins with a single step – Bruce Wong and James Burns (Twilio)
- Chaos Engineering by Lorin Hochstein
- Aaron Rinehart – ChaoSlingr: Introducing Security based Chaos Testing
- Chaos Engineering – Casey Rosenthal
- The Road to Chaos – Velocity 2017 – video & slides
- How Netflix DDoS’d Itself To Help Protect the Entire Internet
- 10 Years of Crashing Google
- Weathering the Unexpected
- SRECON17: Breaking Things on Purpose
- PuppetConf 2016: Chaos Patterns – Architecting for Failure in Distributed Systems
- Ship More, Sink Less – Changing Chaos Engineering and Distributed Tracing
- Cloudcast – Discipline of Chaos Engineering
- Software Engineering Daily – Failure Injection with Kolton Andrus podcast
- Responding to Failures in Playback Features with Haley Tucker podcast
- “Antics, drift, and chaos” by Lorin Hochstein
- re:Invent 2017: Nora Jones Describes Why We Need More Chaos – Chaos Engineering, That Is
- Failure Friday: Four Years On
- Monkeys & Lemurs and Locusts, Oh my!
- Practical Chaos Engineering
- Chaos Day in the Met Office Cloud
- Cloud Native and Chaos Engineering
- Chaos Engineering with Kolton Andrus
- “GameDay” – Achieving Resilience through Chaos Engineering
- Chaos Engineering: the history, principles, and practice
- Embracing the Chaos of Chaos Engineering
- Designing Services for Resilience: Netflix Lessons
- Chaos Engineering: A cheat sheet
- How to convince your boss and make them say “Yes!” to Chaos Engineering?
- Why the World Needs More Resilient Systems
- Chaos Architecture
- Gremlin’s Tammy Bütow on the Business Side of Chaos Engineering
- Kubernetes Chaos Engineering: Lessons Learned
- Chaos Engineering: managing complexity by breaking things
- Podcast: Database Chaos with Tammy Bütow
- LinkedOut: A Request-Level Failure Injection Framework
Books
- Chaos Engineering: Building Confidence in System Behavior through Experiment
- Site Reliability Engineering: How Google Runs Production Systems
- The Practice Of Cloud System Administration: Designing and Operating Large Distributed Systems
- Antifragile Systems and Teams
Education
- A Chaos Engineering Bootcamp for O’Reilly Velocity 2017 – Slides & Source code
- Your First Chaos Experiment
- Chaos Engineering 101
- A Primer on Automating Chaos
- Intro to Chaos Engineering
- Learn the basics of the Chaos Toolkit
- How to Run a GameDay
- Build System Confidence with Chaos Engineering
- How we break things at Twitter: failure testing
- Run Chaos Experiments Without Risking Your Job
- A Guide to Your First Chaos Day
- Planning Your Own Chaos Day
- How To Install Distributed Tensorflow on GCP and Perform Chaos Engineering Experiments
- Monitoring Your Chaos Experiments
- Increasing the Resilience of APIs with Chaos Engineering
- 3 key steps for running chaos engineering experiments
Notable Tools
- Chaos Monkey – A resiliency tool that helps applications tolerate random instance failures.
- The Simian Army – A suite of tools for keeping your cloud operating in top form.
- orchestrator – MySQL replication topology management and HA.
- kube-monkey – An implementation of Netflix’s Chaos Monkey for Kubernetes clusters.
- Gremlin Inc. – Failure as a Service.
- Pumba – Chaos testing and network emulation for Docker containers (and clusters).
- Chaos Toolkit – A chaos engineering toolkit to help you build confidence in your software system.
- ChaoSlingr – Introducing Security Chaos Engineering. ChaoSlingr focuses primarily on experiments against AWS infrastructure that proactively surface system security failures.
- PowerfulSeal – Adds chaos to your Kubernetes clusters, so that you can detect problems in your systems as early as possible. It kills targeted pods and takes VMs up and down.
- drax – DC/OS Resilience Automated Xenodiagnosis tool. It helps to test DC/OS deployments by applying a Chaos Monkey-inspired, proactive and invasive testing approach.
- WireMock – API mocking (service virtualization) that enables modeling real-world faults and delays.
- MockLab – API mocking (service virtualization) as a service that enables modeling real-world faults and delays.
- Pod-Reaper – A rules-based pod-killing container. Pod-Reaper is designed to kill pods that meet specific conditions, which makes it usable for chaos testing in Kubernetes.
- Muxy – A chaos testing tool for simulating real-world distributed system failures.
- Toxiproxy – A TCP proxy to simulate network and system conditions for chaos and resiliency testing.
- Blockade – Docker-based utility for testing network failures and partitions in distributed applications.
- chaos-lambda – Randomly terminate ASG instances during business hours.
- Namazu – Programmable fuzzy scheduler for testing distributed systems.
- Chaos Monkey for Spring Boot – Injects latencies, exceptions, and terminations into Spring Boot applications.
- Byte-Monkey – Bytecode-level fault injection for the JVM. It works by instrumenting application code on the fly to deliberately introduce faults like exceptions and latency.
- GomJabbar – Chaos Monkey for your private cloud.
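Several tools above, such as Chaos Monkey for Spring Boot and Byte-Monkey, work by injecting latency or exceptions into application code. The Python decorator below is a minimal sketch of that idea, assuming faults are injected per function call with fixed probabilities. It is not the API of either tool (those are configured through application settings or bytecode instrumentation); `inject_faults`, `InjectedFault`, `latency_rate`, `exception_rate`, `delay_s`, and `get_price` are hypothetical names chosen for illustration.

```python
import functools
import random
import time
from typing import Any, Callable, Optional, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class InjectedFault(RuntimeError):
    """Exception raised by the wrapper to simulate a failing call."""


def inject_faults(latency_rate: float = 0.0, exception_rate: float = 0.0,
                  delay_s: float = 0.5,
                  rng: Optional[random.Random] = None) -> Callable[[F], F]:
    """Return a decorator that randomly delays and/or fails calls to a function."""
    rand = rng if rng is not None else random.Random()

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if rand.random() < latency_rate:
                time.sleep(delay_s)  # simulate a slow dependency
            if rand.random() < exception_rate:
                raise InjectedFault(f"injected fault in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper  # type: ignore[return-value]
    return decorator


@inject_faults(exception_rate=0.2, rng=random.Random(7))
def get_price(sku: str) -> int:
    """Hypothetical business function under test."""
    return 100


if __name__ == "__main__":
    failures = 0
    for _ in range(1_000):
        try:
            get_price("sku-1")
        except InjectedFault:
            failures += 1
    print(f"{failures} of 1000 calls failed")
```

Both rates default to zero, so the decorator is inert until a rate is set, and passing a seeded `random.Random` makes a fault sequence reproducible. Callers can then be checked for whether their retries, timeouts, and fallbacks handle the injected `InjectedFault` and delays as intended.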
Cloud Services
- Testing Amazon Aurora Using Fault Injection Queries
- Azure Fault Analysis Service; see also Induce controlled Chaos in Service Fabric clusters
Papers
- Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems
- Lineage-driven Fault Injection
- Automating Failure Testing Research at Internet Scale
- Principles of Antifragile Software
- Why is random testing effective for partition tolerance bugs?
- Chaos Engineering
- A Platform for Automating Chaos Experiments
- A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM
Blogs & Newsletters
- Netflix Technology Blog – Learn more about how Netflix designs, builds, and operates its systems and engineering organizations.
- Production Ready – A mailing list about building resilient infrastructure and tools.
- SRE Weekly – Weekly Site Reliability Newsletter.
- Site Reliability Engineering resources – A curated list of awesome Site Reliability and Production Engineering resources.
- SysAdvent – One article for each day of December, ending on the 25th article.
- Gremlin Blog – Blogs on Chaos Engineering from Gremlin Inc.
- O’Reilly Systems Engineering and Operations Newsletter – Weekly systems engineering and operations news and insights from industry insiders.
- GameDay Resources – Resources for getting started with GameDay and Chaos Engineering.
- LaunchDarkly Blog – Continuous delivery and feature flags blog.
Conferences & Meetups
- Chaos Conf – A day of Chaos Engineering demos, expert advice, and a chance to connect with peers who are putting chaos into practice at their companies.
- SRECon Conferences – The official SRE conference.
- LISA Conferences – Prominent conference about SysAdmin/DevOps/SRE.
- O’Reilly Velocity Conference – Prominent conference about Systems Engineering/DevOps/SRE.
- Chaos Engineering Community Meetup Group – Bay Area Meetup group for Chaos Engineers.
- London Chaos Engineering Community – London Area Meetup group for Chaos Engineers.
- Chaos Engineering Community – A collection of meetups across the globe about Chaos Engineering.
Forums
- Chaos Community Google Group
- Chaos Engineering LinkedIn Group
- Chaos Engineering Slack Community
- CNCF Chaos Engineering Working Group
- CNCF Chaos Engineering Working Group Slack: #chaosengineering (slack.cncf.io)
- CNCF Chaos Engineering Working Group GitHub
- Aaron Blohowiak
- Casey Rosenthal
- Mathias Lafeldt
- Nora Jones
- Tammy Bütow
- Bruce Wong
- Kolton Andrus
- Lorin Hochstein
- Peter Alvaro
- John Allspaw
- Charles Torre
- Russ Miles
- Aaron Rinehart
- Mikolaj Pawlikowski
Contributing
Please take a look at the contribution guidelines first. Contributions are always welcome!
This concludes today's post on dastergon/awesome-chaos-engineering, the Chaos Engineering "awesome" list. Thank you for reading.