Troubleshooting industrial control systems – Tips and pointers

Everything is hooked up but something isn’t working. It worked before but not right now. Or, it’s a new system and you haven’t got it working yet. In any case, troubleshooting industrial controls systems usually involves a repeatable approach or pattern of actions.

1. Break the system down into components.

There is a root cause to what is going wrong. Each system is made of of multiple components – software and hardware. To zero in on the root cause, check if  each part/component/sub-part of the system works. Example: PLC not communicating with VFD over Modbus. Questions to ask and things to check:
  • Check if the VFD responds to Modbus polls from a Modbus simulator on your computer.
  • Alternately, if there is another PLC, preferably identical, hook it up to the drive and download the same program. Check if it communicates.
  • If there is another VFD available, preferably one that is known to be functional. Also preferable if its of the same type, hook that up to the PLC. Check if it communicates.

Note: Check that the communication settings in the PLC and VFD are correct.

 The concept here is to test individual components to zero in on the root cause.

2. Go deep into the workings of each sub-component- including inputs, outputs and the software when possible.

This becomes easier if you know which subsection of each component may be suspect. To the communications breakdown example above, it could be the serial port ( hardware) or the serial communication settings( software). This provides for immediate leads to follow. 

In any case, when troubleshooting industrial controls address the obvious first, and then go deeper into each sub-component. Confirm that everything is plugged in and powered up correctly. Can’t count the number of times something wasn’t working … because it wasn’t powered up.

3. Bring in alternative/replacement components, if available.

This includes replacement cables. Depends on the stage / I.e development, legacy/installed. This is noted described in item 1 above. The implication here is to consider carrying spare parts to a site when possible. If this is not possible, consider alternatives like software to test for specific functionality.

4. Don’t alter too many characteristics at a time.

Issues are sometimes a combination of two things happening. Example: PLC won’t communicate with an HMI. The HMI was replaced and the problem persists. The old HMI was connected back in and the PLC replaced,- the problem remained. Replaced the PLC and HMI and figured out that some kind of wiring issue had taken out the serial ports on both devices.  Moral of the story: Check one component and then consider combinations.

5. Use software where possible.

One example noted in item 1 above. Another example for ethernet communications troubleshooting is a tool like Wireshark.
Use case:  Watch communication lines for packets coming through and sequence of events.

6. Document testing and troubleshooting efforts.

Write down the sequence of tested items. Makes a big difference after you go through several hours of troubleshooting and forget what has been tested. Also, take pictures as much as possible. Helps when recapping items covered or when researching sub components in front of your computer.

7. Consider the environment, timing and non- obvious of issues.

Time and space are at the core of things. Noise from close by systems, or an environmental effect at the same time of day may cause intermittent issues. This goes back to non-obvious factors to consider.
There may be non-obvious factors that may affect things. One example is a electrical noise issues. Co-located power wiring or bad grounding practices could affect this.
Another example of a non-obvious factor may be related to people or process factors. Example: an operator may be shutting down a device or parts of a system and not powering them up again correctly.  
Another example: someone updated the firmware to one of the devices and it doesn’t communicate to the other devices any more. The problem here may be more of a  process issue but knowing if the firmware was recently updated provides leads for next steps.

Leave a Reply

Your email address will not be published. Required fields are marked *