IoT Troubleshooting Guide
Updated: Feb 5, 2020
The current wave of innovation in the Internet of Things (IoT) is set to create billions of internet-connected devices over the next few years. Many of these devices are new products that never existed before, or at least never had embedded electronics or a connection to the internet. This means that people and organisations that previously dealt only with mechanical products now have to understand electronics and digital solutions to achieve the benefits of IoT. With this happening right now, there is of course a massive increase in device troubleshooting. All organisations have issues with their products at one time or another, including Apple, Toyota, and Coca-Cola. A key difference between those organisations and the ones that don’t realise the same level of success is the way they address troubleshooting and improvement. As WaterGroup have been delivering IoT products and solutions such as smart water meters and other monitoring devices to the water industry for over 10 years (before it was called IoT), I have been involved in my fair share of troubleshooting remote devices. This article is a very simple set of rules, based on that experience, to serve as a guide to troubleshooting.
Rule #1 – Do the Easiest Thing First!
IoT solutions can be very complex systems with many potential points of failure both in the physical world, and in the digital. You will have to make choices on where to start looking for the cause of fault. Some of this troubleshooting can be done from the comfort of your desk, and some will require you to physically inspect the device out in the field. Significant time and costs can be invested in troubleshooting so the goal is to minimise this. Doing the easiest thing first is not being lazy, it is simply efficient.
For example: a remote wireless sensor configured to send data on a daily basis stops transmitting. There are a number of things you can do to investigate the cause of fault. The easiest is likely to be something you can do at a desk, such as checking the device’s battery voltage records (assuming your device records and sends this info, which it should). If the voltage has been decreasing for some time and is now below the usable threshold, then that is probably your problem. If the easiest thing doesn’t determine the cause of fault, see Rule #2.
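The battery check described above could be sketched roughly as follows. This is only an illustration: the `VoltageReading` type, the `battery_looks_depleted` function, the sample readings, and the 3.0 V threshold are all assumptions made for the example, not values from any real device or platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VoltageReading:
    day: date
    volts: float


def battery_looks_depleted(readings, min_usable_volts):
    """Return True if voltage has fallen over the period and is now below the threshold.

    `readings` must be ordered oldest to newest. The trend test is deliberately
    crude: it only compares the oldest reading with the latest one.
    """
    if len(readings) < 2:
        return False  # not enough history to judge a trend
    latest = readings[-1].volts
    was_declining = readings[0].volts > latest
    return was_declining and latest < min_usable_volts


# Hypothetical readings, for illustration only.
history = [
    VoltageReading(date(2020, 1, 1), 3.6),
    VoltageReading(date(2020, 1, 15), 3.3),
    VoltageReading(date(2020, 2, 1), 2.9),
]
print(battery_looks_depleted(history, min_usable_volts=3.0))
```

The usable threshold depends on your battery chemistry and hardware, so take it from the device’s documentation rather than from this sketch.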
Rule #2 – Do the Second Easiest Thing Second! (and so on…)
You will probably first become aware of an issue with a remote IoT device while sitting at a desk. Perhaps you received a phone call, a support ticket was raised, a push notification came to your phone, a red flag was raised on a hardware management platform, or you simply aren’t seeing the information you expect to see on a user interface. There are probably half a dozen things that you can check to determine the cause of fault within a few minutes and without getting up from your chair. These could include checking battery voltage, sensor calibration, event logs, etc. This process will quickly identify likely causes of fault, or at least will greatly narrow down your options for the next steps. If, as you move through the easiest things, you do not identify a likely cause of fault, a next step may be to call someone on site to let them know about the issue and ask for their input. Has there been a power failure on site? Has a cable been cut? Has the hardware been tampered with? Check these quick things before you spend time travelling out to site to start pulling things apart.
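Rules #1 and #2 together amount to working through checks in order of effort and stopping as soon as one points to a likely cause. A minimal sketch of that idea is shown here; the `Check` type, the effort estimates, and the stand-in check functions are hypothetical and exist only to make the ordering concrete.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    name: str
    effort_minutes: int                # rough estimate of how long the check takes
    run: Callable[[], Optional[str]]   # returns a likely cause, or None if nothing found


def first_likely_cause(checks):
    """Run checks from least to most effort; stop at the first one that finds a cause."""
    for check in sorted(checks, key=lambda c: c.effort_minutes):
        cause = check.run()
        if cause is not None:
            return check.name, cause
    return None


# Stand-in checks with made-up results, listed out of order on purpose.
checks = [
    Check("site visit", 240, lambda: "inspect hardware on site"),
    Check("event logs", 5, lambda: None),
    Check("battery voltage", 2, lambda: None),
    Check("call site contact", 15, lambda: "power failure reported on site"),
]
result = first_likely_cause(checks)
print(result)
```

In this made-up run, the two desk checks find nothing, the phone call reports a power failure, and the costly site visit is never triggered.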
Rule #3 – Document Everything
What day and time did the issue occur? What were the symptoms? What was done to troubleshoot it and what were the outcomes?
If an issue requires troubleshooting, there is a good chance that the same issue could happen again, especially if you have thousands of similar devices deployed. Documenting these issues, what caused them, and how you solved them is crucial to improving your products and solutions. This information should be fed back into your development pipeline to prevent these issues from happening again.
It is also important to document your troubleshooting process in case you don’t manage to resolve the issue on day one. With a written record of the steps you took and the outcomes, you will know where to start from the next day.
The information you document can also benefit your customers. A detailed troubleshooting report showing the effort you went to and how the issue was solved gives the customer peace of mind and shows that they are in good hands.
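One way to capture the questions above (when the issue occurred, the symptoms, what was done, and the outcomes) is a simple structured record. The sketch below is an assumption about how such a record might look; the `IssueRecord` and `TroubleshootingStep` names, their fields, and the sample device ID are invented for illustration.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
import json


@dataclass
class TroubleshootingStep:
    action: str
    outcome: str


@dataclass
class IssueRecord:
    device_id: str
    occurred_at: datetime
    symptoms: str
    steps: list = field(default_factory=list)
    root_cause: str = ""   # left empty until the cause has been proven
    resolution: str = ""

    def log_step(self, action, outcome):
        """Append one troubleshooting action and what it showed."""
        self.steps.append(TroubleshootingStep(action, outcome))

    def to_json(self):
        """Serialise the record; datetimes are written as strings."""
        return json.dumps(asdict(self), default=str, indent=2)


# Hypothetical example record.
record = IssueRecord(
    device_id="tank-017",
    occurred_at=datetime(2020, 2, 3, 9, 30),
    symptoms="No daily transmission received",
)
record.log_step("Checked battery voltage history", "Voltage steady, above threshold")
print(record.to_json())
```

Keeping each step alongside its outcome means the next day’s work can start from the last entry, and the same record can later be shared with product developers or summarised for the customer.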
Rule #4 – Plan Ahead and Bring Everything to Site
If your troubleshooting leads to a site visit, plan ahead and list any equipment and parts that you may need, depending on what could be causing the issue. Consider what replacement parts you may need to repair broken or faulty parts. Consider bringing an entire replacement product to get the system up and running again if needed. Bring everything with you. Having people on site can be one of the most time-consuming and costly aspects of troubleshooting. Aside from the time and cost of travel, troubleshooting on site is usually slower than at a work bench with all of your equipment at your fingertips. By bringing everything you might need, you greatly increase the chance that the issue can be resolved quickly.
Rule #5 – Think Like a Detective
As the expert doing the troubleshooting, understand that you don’t know what the cause of an issue is until you’ve proven it. Say, for example, that you have installed a remote water tank level monitoring device. Your client tells you “I’m seeing data on my dashboard telling me that the water level is low even though it rained last night. The tank should be full. This data must be wrong and the sensor needs recalibrating. Come out and fix it.” Until you’ve narrowed down the possibilities, the client’s statement is nothing more than a clue as to what the issue could be. Did anyone see the water level in the tank when the device last recorded the level? Is the sensor plugged in and connected to the tank? Is the sensor cable intact? Are the gutters clear so water can flow into the tank? Could the tank be leaking? Could the tank have been emptied for cleaning or servicing? Collect information and narrow down the possibilities, getting closer to the solution with each piece of information.
Rule #6 – Continually Improve
Your troubleshooting should be part of a continuous feedback loop with product and solution development. Follow Rule #3 Document Everything to collect the details of every issue, what it is caused by, and what resolved it. This information is gold for product developers and production managers looking to improve reliability and performance of IoT solutions.
Continual Improvement is the practice of making many small changes to equipment and processes over time to change overall outputs of operational systems for the better. This covers both the manufacturing of the products and the reduction of faults throughout the life of the product. Feeding troubleshooting outcomes back into the design and manufacturing stages is key to Continual Improvement.
Rule #7 – Keep the Customer Informed
The customer is the focus of your IoT products and solutions. Your offerings should be built around what the customer needs, and so should your business and support processes. If there is an issue, the customer should know about it. If you’re reluctant to contact the customer because you’re worried that they will be upset that there is an issue, recognise that this is the exact moment that you should pick up the phone and call them, even if it may be a difficult conversation. Wherever possible, your phone call or email (typed by you or an automatically generated alert) should be the first thing that tells your customer that there is an issue. It is always better to be on the front foot when addressing issues, rather than the customer finding out first and wondering what is going on.
By keeping the customer informed, you are providing service and building trust in you, your organisation, and your brand. Let your customer know that there is an issue, what that issue is, what steps you’re taking to resolve it, when it is resolved, and importantly what will prevent this issue in the future. Let them know that you’ve put measures in place to improve your products and avoid this issue occurring again.
Consider making troubleshooting guides available to your customers, covering common issues and how to resolve them. Sometimes simply letting the customer know the basics, like checking that the device is plugged into power and turned on, will save everyone time and hassle.
Informed customers are happy customers who have the trust to continue doing business with you and recommending you to others. If you are the customer and you are reading this, keep your IoT provider informed on what is happening with their products and what should be done to improve them.
1) Do the Easiest Thing First!
2) Do the Second Easiest Thing Second!
3) Document Everything
4) Plan Ahead and Bring Everything to Site
5) Think Like a Detective
6) Continually Improve
7) Keep the Customer Informed
I hope that this guide is useful for those entering the IoT world with new products and solutions. There are many great technical guides and thought pieces on troubleshooting if you search for them. There are also great tutorials for specific issues that may apply to your products and solutions. This article is my input on how to simplify troubleshooting and to ensure your products, solutions, and brand improve and continue to deliver results for your customers.
We are committed to the rapid development and improvement of IoT, smart water metering, and water efficiency within Australia and internationally.