name: inverse class: center, middle, inverse layout: true .header[.floatleft[.teal[Christopher Biggs] — DevOps for Dishwashers].floatright[.teal[@unixbigot] .logo[@accelerando_au]]] .footer[.floatleft[.hashtag[NDCSydney] Aug 2017]] --- name: callout class: center, middle, italic layout: true .header[.floatleft[.teal[Christopher Biggs] — DevOps for Dishwashers].floatright[.teal[@unixbigot] .logo[@accelerando_au]]] .footer[.floatleft[.hashtag[NDCSydney] Aug 2017]] --- layout: true .header[.floatleft[.teal[Christopher Biggs] — DevOps for Dishwashers].floatright[.teal[@unixbigot] .logo[@accelerando_au]]] .footer[.floatleft[.hashtag[NDCSydney] Jun 2017]] --- template: inverse # DevOps for Dishwashers ## Bringing grown-up practices to the Internet of Things .bottom.right[ Christopher Biggs, .logo[Accelerando Consulting]
@unixbigot .logo[@accelerando_au]
]

---
class: bulletsh4

# Who am I?

## Christopher Biggs — .teal[@unixbigot] — .logo[@accelerando_au]

.leftish[
#### Brisbane, Australia
#### Former developer, architect, development manager
#### Founder, .logo[Accelerando Consulting]
#### Full service consultancy - chips to cloud
#### ***IoT, DevOps, Big Data***
]

???

G'day folks, I'm Christopher Biggs. I've been involved off and on with embedded systems since my first professional role over 20 years ago, and nowadays I run a consultancy specialising in the Internet of Things, which is the new name for embedded systems that don't work properly.

We work with companies developing IoT devices to help them choose the right technologies and practices to build, test and deploy their products.

Now I've spoken in the past about the state of security in IoT, and while I agree that it's pretty awful, I'm not pessimistic. I made some predictions about how bad things could get, and what kinds of action would be needed, and the bad and good news is that these are coming true. I'm seeing the space rapidly mature and all the right things are being done to address the challenges.

---
layout: true
template: callout

.crumb[
# Agenda
## Problems
]

---

# Why DevOps?

???

Last month here in Sydney I talked about Internet of Scary Things, at IoT Sydney. That presentation is about how IoT is like the Wild West right now, and why DevOps is a must-do for IoT.

Traditionally hardware has been developed carefully and conservatively, and had long lifecycles. IoT brings the contradictory requirement of products that are set in stone, sometimes quite literally, but which connect to the wild Internet where sometimes a day can be an age.

So *this* talk is going to be about the *how*. How *you* can apply best practices from grown-up computers to the challenges of building and managing internet connected autonomous devices.

---

# Why Dishwashers?

???
I've called this talk DevOps for Dishwashers because recently the tale of an internet connected dishwasher made the news. A lot of people laughed, because why would a dishwasher need to be on the internet?

It took about 10 years for us to go from personal computers being networked only at need, to us considering any computer that isn't networked all the time to be broken. I expect the same for every other electric appliance. In the long term, I think the very word computer will become archaic, because everything will be a computer.

---

# *"Software is eating the world"*

## .right[-- Marc Andreessen]

### Wall St Journal, six years ago this week.

???

Software, as Marc Andreessen predicted, really truly will have eaten the world. This will have a lot of effects, but the one I want to focus on today is that we must rethink our concept of quality in the embedded space.

The price of an amazing future where everything can interact with everything else is that everything might interact with everything else.

---
layout: true
template: callout

.crumb[
# Agenda
## Problems
## DevOps?
]

---
template: inverse

# Interlude: What do I mean by "DevOps"?

???

Actually, before I go too far, I should define what I mean by DevOps. This is one of those technical terms that has been misused so much that it's dangerous to use it without confirming that the listener hears what you mean.

---

.fig40[ ![](devops.png "DevOps is NOT THIS")]

.spacedown[
## DevOps is not a ***thing you do***,
## it's the ***way you do things***.
]

???

DevOps is not a thing you do. DevOps engineer is not a job title. There's no such thing as a DevOps team. It's an easy mistake to make, in fact I once held the job title Head of DevOps.

What DevOps **is**, is accepting that development and operations are part of a spectrum, and that distinguishing between them is counterproductive.

---

.fig40[ ![](devops.png "DevOps is NOT THIS")]

.spacedown[
## *Empower* ***everyone***
## *to maximise* ***value.***
]

???
My definition of DevOps is: Evolving a culture and a toolset that empowers everyone to effectively maximise value through radical transparency and extreme agility. --- layout: true template: callout .crumb[ # Agenda ## Problems ## DevOps? ## Solutions ] --- template: inverse # *"When every Thing is connected, everything is connected"* ## .right[-- Me] ??? So today I'm going to look at how we pay the price of a connected future. The body of this presentation is about the practical things you can do to build quality products for the internet of things. I'll look at the lifecycle of an IoT product from inception to retirement, and what you can be doing to get the best outcomes at every stage. The three areas I'm going to look at are firstly choosing platforms that support rather than hinder quality outcomes, secondly shaping your development and quality practices to foster agility while preserving reliability, and thirdly working with your data in ways that promote interoperability and reusability. --- layout: true template: callout .crumb[ # Agenda # Landscape ] --- # Welcome to the Internet of Things ## pop. 10 Trillion ??? Before we dive into all that, I want to talk about the universe of discourse. In just three score years and ten, the span of a single human lifetime, we saw the ratio of computers to people increase by around one order of magnitude per decade. --- # 1936 (Information Pandemic Year Zero) ## 10
device/person ??? If you were a fresh-faced undergrad when Konrad Zuse was building his Z1 in 1936, you might still be around to celebrate your 100th birthday next year. Aside: University washout living in parents apartment --- # Mainframe era ## 10
device/person ??? Fast-forward thirty years --- # Minicomputer era ## 10
device/person

???

A decade later, two orders, inflationary era.

Computers have become indispensable, but this is not widely understood.

---

# Desktop era

## 10
device/person

???

Another decade, another two orders. Spreadsheets create Gordon Gekko.

---

# Mobile era

## 10
devices/person ??? Move out of inflationary era, diffusion, eating into the fabric of our lives. Life pre-smartphone is unimaginable --- # Cloud era .red[[YOU ARE HERE]] ## ~10
devices/person ??? Starting to lose count. Soon we won't even bother. --- # Internet of things ## 10
devices/person

???

That trend is not over. In the next decade I expect another tenfold increase, and at some point after that we're going to stop counting. Everything will be a computer. Every. Thing.

Hence, the Internet of Things, a world where humans are a zero point one percent impurity in a living network of machines. If we're wise we'll make friends with the machines when they're babies, because I don't think we can beat them in a fair fight.

---
layout: true
template: callout

.crumb[
# Agenda
# Landscape
# Challenges
]

---

.fig30[ ![](switchboard.gif)]

.spacedown[
# Solve the next problem, not the last one]

???

During the early twentieth century, one analyst predicted an oncoming apocalypse where civilisation would grind to a halt, because all available single women would be employed as telephone exchange operators, and no further rollout of the telephone network would be possible.

---

# Beware of false analogies and straight line trends

???

Of course, we now have robots running the switchboards, and even making and answering some of the calls. Instead of having a sizeable fraction of the workforce operating the communications network, we have millions of people in less developed countries trying to sell those people life insurance over the telephone, because the cost of communicating is now practically zero.

---

# Always be Questioning

???

If you wonder how we're going to curate and maintain a thousand devices per person, you're making the same kind of category error.

---

# Observe, Orient, Decide, Act

???

Strategy is about adapting your behaviour to circumstances. When a computer cost 3 months' wages, you shepherded it carefully. When computers cost less than a cup of coffee, they become consumables. They're sheep, not sheepdogs.

---
class: bulletsh4

.crumb[
# Landscape
# Challenges
## Risks
]

# *"Bad people will break your stuff"*

.bottom.right[ Do you want to know **more**?
"The Internet of Scary Things" [christopher.biggs.id.au/talk](http://christopher.biggs.id.au/talk)] ??? I could go on for hours about the security landscape in the internet of things, and in fact I've spoken on the subject in the past. The one-slide precis is that that bad people want to either steal your stuff, or break your stuff. It doesn't have to be many bad people, another side effect of that zero cost of communications is that if you're a terrible person, the 21st century is now a target-rich environment. --- class: bulletsh4 .crumb[ # Landscape # Challenges ## Risks ] # Everything is awful ??? I don't want to go off on a tangent about security in particular, but I find that the underlying causes security problems in IoT are an instructive microcosm of the wider challenges. Last October, the Mirai worm spread around the globe because of well known default passwords that never get changed. This month we learned that CIA has been able to insert spyware wherever they like because some software developers found it convenient to put back-doors into communications hardware. Choosing expedient but fragile programming languages brings conseuqences. The pattern here is a basic lack of professionalism and forethought. --- class: bulletsh4 .crumb[ # Landscape # Challenges ## Risks ] .fig30[ !(dumpster_fire.jpg)] .spacedown[ # Everything is awful ## and the awful is on fire] ??? Moving on, in April we got the endlessly entertaining story of the compromised internet dishwasher, which demonstrates how many developers aren't trained well enough to avoid common and completely avoidable pitfalls. The downside of ubiquitous communications is that there are no safe spaces. June's WannaCry malware outbreak showed us that when everything is software, everything has to be maintainable. Last month's second coming of a much more damaging successor hammered the point home. If you can't fix it, its no longer safe to own it. Firewalls and even Air Gaps are insufficient. 

---

# It's not rocket science

## No really, I mean actual rockets.

???

And a final anti-pattern, if you sit in the middle of the road, you'll get run over. Quality procedures that worked well last year might be a liability next. It turns out that the government software quality rules that were applied to election security in 2016 delivered little benefit and some potential detriment because they largely focused on concepts that aren't relevant any more.

Guidelines for safety critical software prohibit things like exceptions and recursion. This is more about preventing your space probe from overflowing memory and crashing into a planet than it is about software quality on modern embedded platforms.

---

.crumb[
# Landscape
# Challenges
## Risks
## Desiderata
]

.fig50[ ![](edna_no_capes.jpg) ]

.spacedown[
# Desiderata]

???

So let's summarise what I think a response to these challenges should look like, then we'll go into detail on each point.

---

# Select appropriate tools and platforms

???

First, choose the right platform and tool.

---

# Comprehensive identity management

???

Next, make your identity and security part of your framework so that you only build it once.

---

# Automate for developer and user convenience

???

Automate everything you possibly can.

---

# Testing and testability kept front-of-mind

???

Design for testability.

---

# Train, and Audit, and keep doing both

???

Train your teams. Share skills, watch YouTube videos, get consultants, go to conferences. Whatever your budget there's a way to improve.

---

# Monitor and react (automatically)

???

And lastly maintain awareness, and use that awareness to respond rapidly.

---
layout: true
template: callout

.crumb[
# Agenda
# Landscape
# Challenges
# Solutions
## Platforms
]

---
template: inverse

# Platforms

???

So let's talk about platforms.
Jez Humble, who literally wrote the book on Continuous Delivery, gave a keynote right here in this building at Agile Australia in June where he told a story about HP's printer teams. The cost of engineering was spiralling, yet quality was awful, and part of the reason was they were spending over a quarter of their time just rewriting existing software because pretty much every printer model used a different processor. They moved to a common architecture and massively reduced their reengineering cost. That meant that some printers had a more expensive processor than they needed, but the savings far outweighed this.

---

# People are more expensive than circuits

## .right[(Sorry, robots)]

???

My point here is that every hardware business has a person whose job is to look at every component and ask "can we delete that to save half a cent per unit". This is a person who's empowered to remove customer value.

The correct question when designing your hardware is: will this hardware platform add value in terms of resilience, longevity and supporting software quality?

---

# Hardware is DevOps too

## .right[(Robots, I hope this makes it up to you :)]

???

That's right, your platform is part of your DevOps team. Your goal should be to select a platform that supports your DevOps way of doing things.

So while selecting and building the hardware is by far the easiest part of the IoT lifecycle, it's the foundation that facilitates the rest.

---

# Open and well supported

???

This means that openness and friendliness are the first things you should look for. Select a processor and platform that has good development tools and will be around for a long time.

---

# Case study: .blue[**ARM v7**] and .red[**Debian Linux**]

???

Thanks to the drivers of Android phones and TV media boxes, there's a huge variety of ARM systems out there which are similar enough that hardware differences aren't really important.
You can develop to the one architecture and select an appropriate board from five bucks up.

---

.fig40[![Pi](raspberry_pi.jpg) ]

.spacedown[
# Meet the #3 top-selling computer of all time
]

???

In my projects we work with Raspberry Pi as the development platform. This brings the convenience of a huge selection of platform tools and a great developer community. We can use these in the lab, for CI workers, and even for the first hundred or so product units.

There's a wide selection of compatible lower cost designs to buy or build for mass market production. Here's one that starts at seven bucks.

---

# Artisanal free-range small-batch Linux?

## No.

???

The same goes in software: think about what happens if news of a nasty bug breaks on the Internet. The top tier vendors probably have a patch out in a day or two. If you picked some hipster Linux distribution that has fifty users and one maintainer, you may never see a patch.

You want your tools and systems to be a convenience, not a source of pain. If you're using a mainstream OS variant, and for IoT that means Red Hat or Debian Linux, then there's a critical mass of users that acts as a quality filter. There's not likely to be some hidden problem that you can't solve. If you're the only company in the world using some niche distribution, when you run into trouble you're on your own.

---

# Without the Internet, it's just a Thing.

???

I also think it's important for devices to be online as much as possible. Power and network constraints may mean that devices can't afford to be online continuously, and this is OK, but you do need to think about how to support DevOps processes within these constraints. I'll come back to this later.

---

# *"Is there anybody out there?"*

.right[-- Pink Floyd
"Continuous Dashboarding" [christopher.biggs.id.au/talk](http://christopher.biggs.id.au/talk)] ??? Simples approach - red line gauges. Artificial stupidity - ignore normal, what's left? Recent ELK stack release (in June) has ML anomaly detection. --- layout: true template: callout .crumb[ # Landscape # Challenges # Solutions ## Platforms ## Dev ## QA ## Deployment ] --- template: inverse # Deployment ??? All right, so you have some tested code that's been encapsulated in container images, and now you need a device on which to run it. --- .fig50[ !(orchestrate.jpg)] .spacedown[ # Orchestrate: Never do anything by hand. ] ??? Do not fall into the trap of turning yourself into a robot. Some people say that as a programmer if you ever have to do something more than twice, you should automate it. Well let me save you those two times. If you learn how to use orchestration systems to configure servers, you'll find that pretty soon its quicker to use them even for things that you only have to do once. Also you were wrong about only ever needing to do it once. --- # Build a provisioning workflow ## Customise a clean OS (via ethernet or emulation) ??? Here's something that I've seen over and over. A project starts with developers putting together a quick prototype by starting from a blank OS installation, installing some tools and editing the configuration. Then it's six months later and there's a new version of the OS and nobody remembers how to go from a clean OS to an app ready platform. And thats how many embedded devices end up running eight year old kernels that are riddled with bugs. --- # Robo-configure the target system from a provisioning system ## Then save a filesystem image ??? So, what's a better way. Remote admin of a target machine (saltstack over ssh) Target machine in a VM - eg vagrant Target machine in a container (dockerfile) --- # How do you create a provisioning system? ## Turtles all the way down! .bottom.right[ Do you want to know **more**?
.blue[github.com/unixbigot/kevin] ] ??? You pull yourself up by your bootstraps. Saltstack - serverless mode to create the first server --- # Case study - My orchestration scripts .leftish[ 1. Create a read only recovery partition 1. Install SaltStack orchestration minion
(now switch protocols)
1. Set timezone, locale, etc.
1. Change default passwords
1. Configure network
1. Provision message bus clients
1. Install language runtimes (nodejs, java etc.) if needed
1. Configure VPN client
1. Fetch initial application containers
]

???

The first part of my process is converting a vendor image to a product base image. This runs through a bunch of repetitive configuration that nobody wants to do by hand.

At the end of that process we join up to the output of the CI pipeline. CI takes source code and turns it into tested Docker images. The orchestration system fetches the tested Docker images and drops them onto a prepared operating system. So we've closed the circle, we have continuous deployment.

---

# Hey, that sounds a bit like PaaS

## Yeah, it does.

???

At this point you might be thinking this sounds a lot like what platform-as-a-service offerings such as OpenShift or Elastic Beanstalk do. Those systems provide a pre-prepared operating system which you don't have to care about, and you just drop your code into place.

Well, you'd be right. There are options for IoT where you get something very similar.

---

# Resin.io

## IoT PaaS with Linux and Docker

.bottom.right[ Do you want to know **more**?
"The Internet of Scary Things" [christopher.biggs.id.au/talk](http://christopher.biggs.id.au/talk)] ??? One of them is resin.io, which provides a base linux image which connects back to their cloud systems. You push your code to their git servers, and their system compiles it and pushes it down to one or more registered devices. They support a number of devices, and they take requests for which ones to add next. I'm beta testing a new device at the moment. --- # Amazon AWS Greengrass ## IoT PaaS built on AWS IOT + AWS Lambda ??? Amazon Greengrass is another system - if you're familiar with lambda, greengrass is basically on-premises lambda, you use the same mechanisms but instead of deploying to an anonymous container host that you never see, you're deploying to an embedded system that you nominate. --- # Mongoose-OS ## Multiplatform embedded OS with cloud integration and remote upgrade .bottom.right[ Do you want to know **more**?
"IoT in two Minutes" [christopher.biggs.id.au/talk](http://christopher.biggs.id.au/talk)
"Continuous Dashboarding" [christopher.biggs.id.au/talk](http://christopher.biggs.id.au/talk)] ??? Right now I'm just dumping all this data into an elastic search database, where we eyeball it from time to time, but there's many other things you can do with this information. --- # Case study: Log pooling for a building safety startup .leftish[ * Ram disk on local ARM devices * Streaming to cloud with Filebeat * Processing with Logstash * Set a storage budget and expire to meet the budget ] ??? One of my clients puts mesh networks into aged care facilities. The sensors talk to a local gateway which has a cellular uplink. The logs from the whole kaboodle funnel back to the cloud, the device keeps the last hour or so in memory in case technicians need it. --- layout: true template: callout .crumb[ # Landscape # Challenges # Solutions ## Platforms ## Dev ## QA ## Deployment ## Maintenance ## Monitoring ## Measurement ] --- template: inverse # Measurement (application data) ??? So that's the data from the system software. We can use all the same mechanisms for the application software, or we can use something bespoke. --- # Use orchestration message bus ## SaltStack message bus is the fast, lightweight ZeroMQ ??? First option is to use the message bus fabric that is maintained by the orchestration software. --- # Can you use your orchestration bus for application events? ## Yes, with care ??? Is this a good idea. I think so, I've done it as a proof of concept but not deployed it yet. * CLI tools to inject messages * Python library --- # Extend orchestration system with custom modules ??? More fancy: * Beacon plugin interface * Engine plugin interface on the server --- # Record as much as you can, digest later ## Shove all your client data in ElasticSearch .bottom.right[Purge oldest indexes until CFO stops whinging] ??? My approach to instrumentation is to overshare. I think its better to have logs and not need them than the reverse. 
--- # Case study: Saltstack plus ELK .leftish[ * Bridge orchestration bus to application message bus * Engine module at top level master (or intermediate) * Gateway messages to elasticsearch, via logstash * Want MQTT? You already built a PKI to deploy it in 2 minutes ] ??? The project with the subterranean sensors is fairly low volume, so I send the data up the orchestration bus. At the master there's a rule that sends it across to a logstash cluster. --- # Case study: MQTT plus ELK ## "Rapids Rivers Ponds" .leftish[ * MQTT brokers at each site * Broker in the cloud federates with on-site brokers * Logstash MQTT plugin subscribes to all events ] .bottom.right[ Do you want to know **more**?
"Implementing Microservice Architectures" [Fred George, YOW 2014](http://yowconference.com.au/slides/yow2014/George-ImplementingMicroserviceArchitectures.pdf)] ??? For another project, the data volume is so high I wanted a separate channel. The field devices run an MQTT broker, and the cloud hub connects to each of these and receives a firehose of data. Now we have a river of data which we pour into our data lake. --- layout: true template: callout .crumb[ # Landscape # Challenges # Solutions ## Platforms ## Dev ## QA ## Deployment ## Maintenance ## Monitoring ## Measurement ## Visualisation ] --- template: inverse # Visualisation ??? Okay I'm going to be real brief about visualisation. --- # Real time status ## Liveness, resources, environment ??? We've already looked at system performance data. I won't say any more about that. System administrators already know all about this stuff. --- # Measure your KPIs ## Whatever makes you money, count it ??? What people measure less often is the bottom line. Would you know if whatever makes you money stopped working? Remember the telco called OneTel. Their billing system was not very good about issuing bills. How long would you stay in business with no income? --- # Measure your KPIs ## Set high and low water marks, alert on them ??? Broken is not always as simple as on or off. Often you know that normal is within a certain range. So if things are not normal, you probably want to know. --- # Measure your KPIs ## Pay: Elastic and other vendors have commercial alert engines ??? Data analysis is big business and there's a ton of tools to choose from. --- # Measure your KPIs ## Free: Node-RED makes a good FOSS alerting engine .bottom.right[ Do you want to know **more**?
"Continuous Dashboarding" [christopher.biggs.id.au/talk](http://christopher.biggs.id.au/talk)] ??? If you're interested in rolling your own, have a look at Node Red which is a data flow processing system that's targeted at IoT. I've put a link to more info in these slides. --- # Longitudinal comparisons ## View long-term trends in KPIs ??? Two last things to consider, first the long term trend in your KPIs, are you going bust or boom. --- # Longitudinal comparisons ## Pay attention to device longevity, wear, etc. ??? Finally, are there parts in your devices that wear out. Storage and battery are the two that spring to mind, so you might want to consider tracking some metrics about these. Batteries are a particular area where there's lies, damned lies, and manufacturer capacity claims. --- layout: true template: callout class: bulletsh4 .crumb[ # Landscape # Challenges # Solutions # Coda ## Summary ] --- .fig40[ !(keep-calm.jpg) ] # Summary .left[ #### Lots of devices, too many to administer by hand #### Swimming in a soup of malware and bad actors #### Choose tools that support quality #### Pipelines for automated build/test/stage #### (Ab)use traditional cloud management tools for IoT Fleet #### Message bus all the things #### Big data now, play later ] ??? All right, let's summarise what I've talked about. We've looked at the threat landscape. We've worked through the produt lifecycle from development and testing, to deployment management and retirement. And we've covered some things you can do with the data that comes out the end. If you want to hear more about that last point, come along to YOW Data here in sydney next month, were myself and also a bunch of really smart people will talk about data. 
--- layout: true template: callout .crumb[ # Landscape # Challenges # Solutions # Coda ## Summary ## Resources ] --- # Resources, Questions .left[ #### My SaltStack rules for IoT - [github.com/unixbigot/kevin](https://github.com/unixbigot/kevin/) #### Related talks - [http://christopher.biggs.id.au/#talks](http://christopher.biggs.id.au/#talks) #### Me - Christopher Biggs - Twitter: .blue[@unixbigot] - Email: .blue[email@example.com] - Slides, and getting my advice: http://christopher.biggs.id.au/ - Accelerando Consulting - IoT, DevOps, Big Data - https://accelerando.com.au/ ] ??? Thanks for your time today, I'm happy to take questions in the few moments remaining and I'm here all week if you want to have a longer chat. Over to you.