Monday, June 11, 2012

Temperature Monitoring with APC

In an effort to identify trouble areas of our datacenter, and to gather enough data to help justify some equipment to manage airflow in the server rows, I've begun rolling out temperature and humidity sensors throughout. One of our previous admins had partially deployed APC AP9340 EMUs, each of which can have 6 probes connected. These devices, however, were either never fully set up, or the setup was torn down before I got my grubby little mitts on them. Additionally, they appear to be from 2008. They do everything I want/need, but there are newer models out there that do more than I want/need (must have!)

The eventual goal is to have temperature/humidity probes at several locations in the aisles, and then temperature-only probes connected to some of the rack-mounted PDUs. I plan on using some sort of software (APC's StruxureWare Central, Zenoss, or something similar) to log and graph the ups and downs in the lab.

As it stands, that goal is coming along, but still distant. I have a trio of the AP9340s currently, but have recently ordered a pair of the new shiny NetBotz 200 EMUs, each of which will have a NetBotz 150 sensor pod daisy-chained off of it. Between the 7 devices, I'll have sufficient capacity to get a pretty broad picture of the DC. On top of this, our current standard rackmount PDU is the APC AP8941, which can take a temperature probe. Each one of these will have a temp probe connected. Since I care more about inlet temperature than the hot side of a rack, these probes will be dropped into the lower part of the front of the rack. If the rack contains a second PDU, I'll use that one to monitor the hot side. As I said above, I have all of the parts to do this either in house or inbound.

The tricky part is the actual monitoring. If I had unlimited budget, I'd roll out StruxureWare Central and be done with it. Since I'm working with...about nothing, I have to be more creative. I was graphing the first AP9340 that I deployed with Cacti. This worked reasonably well, but when I connected the second EMU, it refused to graph, and then the first one followed suit. No idea why. Cacti uses SNMP to pull the data from the EMUs (templates for the devices found here), and an snmpwalk would gather that data just fine.

Until I can roll out Zenoss or something else, I've created a sort of "field expedient" bit of monitoring, using cron and snmpwalk.

The AP9340s were the easier of the devices to get working with this. The little command that I run to get the data from these is as follows:

for i in `seq 1 6`; do echo -n 'Temp in ' && echo -n `snmpwalk -v1 -c public$i | sed -e 's/.*STRING: \"\(.*\)\"/\1/'` && echo -n ": "; snmpwalk -v1 -c public$i | sed -e 's/.*INTEGER: \([0-9]*\)/\1/'; done

This prints out something like the following:
Temp in Aisle 3, Cold 1: 65
Temp in Aisle 3, Hot 1: 79
Temp in Aisle 3, Cold 2: 67
Temp in Aisle 3, Hot 2: 77
Temp in Aisle 3, Cold 3: 63
Temp in Aisle 3, Hot 3: 72

The location name, as well as the temperatures, are pulled from SNMP.

To break that line down, we're doing a for loop, looping from 1 through 6.

for i in `seq 1 6`;

For each iteration through the loop, we echo "Temp in " (with -n to suppress the trailing newline), and then execute the snmpwalk to gather the location of the probe.

snmpwalk -v1 -c public$i

The $i is 1-6, depending on our current position in the for loop. You'll notice the echo -n and the ` before the first snmpwalk. We're actually executing this command inside its associated echo -n, so the newline is dropped from the end of the string and the output stays pretty. The subsequent sed command sits inside this same echo string. We pipe the snmpwalk output to the following:

sed -e 's/.*STRING: \"\(.*\)\"/\1/'

This looks for the word STRING: in the line, and then captures whatever sits between the subsequent double quotes into a regex capture group. Since the contents of the double quotes are the location, and that's the only part we care about, we throw out everything else and replace the whole line with just the location (referenced as \1 on the replace side of the regex).
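To see the capture group on its own, here's a small sketch that feeds sed a made-up line in the shape snmpwalk prints for a string value. The OID on the left is a placeholder I invented for the example, not the one the EMU really uses.

```shell
# Sample line in the format snmpwalk prints for a string value.
# The OID is a made-up placeholder, not the EMU's real one.
line='SNMPv2-SMI::enterprises.0.0.0.1 = STRING: "Aisle 3, Cold 1"'

# \(.*\) captures the text between the double quotes; \1 on the
# replace side swaps the whole line for just that captured text.
location=$(printf '%s\n' "$line" | sed -e 's/.*STRING: \"\(.*\)\"/\1/')

echo "$location"
```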

After we close out the echo, we pretty much repeat the process for the actual temperature, which is the second snmpwalk and sed. Here we're just looking for INTEGER, stripping out the number that follows it, and printing it.
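The same trick on the temperature side looks roughly like this. Again, the sample line and its OID are stand-ins I made up to show the shape of the output, and the location is hard-coded just to keep the sketch short.

```shell
# Made-up sample line for an integer value; the OID is a placeholder.
line='SNMPv2-SMI::enterprises.0.0.0.2 = INTEGER: 65'

# Keep only the digits that follow "INTEGER: ".
temp=$(printf '%s\n' "$line" | sed -e 's/.*INTEGER: \([0-9]*\)/\1/')

# echo -n holds the cursor on the same line so the value lands after the label.
echo -n "Temp in Aisle 3, Cold 1: "
echo "$temp"
```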

By replacing the OID in the second snmpwalk, we can get this to pull the humidity data off of a temperature/humidity probe.

The OIDs I use are as follows:  LOCATION of probe TEMPERATURE the probe is indicating HUMIDITY the probe is indicating

The last .1 on there becomes .2 for port 2, .3 for port 3, etc. I believe the name of the probe is in that vicinity too. Maybe the last three are 3.0.1, but I can't remember.

For pulling data off of the AP8941 PDUs, it's a bit more complicated. First off, you need the SNMP MIB for APC devices, found here (look for Firmware Upgrades: MIB, free compulsory upgrade required, try using bugmenot). Then, we need an even more magic and crazy bit of CLI-fu.

echo -n "Temp in " && snmpwalk -m ~/powernet403.mib -v1 -c public . | grep rPDU2IdentLocation.1 | tr -d '\n'| sed -e 's/.*STRING: \"\(.*\)\"/\1/' && echo -n ": "; snmpwalk -m ~/powernet403.mib -v1 -c public . | grep rPDU2SensorTempHumidityStatusTempF.1 | sed -e 's/.*INTEGER: \([0-9]*\)/scale=1;\1 \/ 10/' | bc

This starts out much like the previous one, though we have to point snmpwalk at the MIB with -m. The first deviation is the grep: the snmpwalk that pulls the location is kind of a shotgun blast. You get a ton of data and only want that one line. We use tr to cut the newline off the end of the output. We then do a second snmpwalk to get the temperature, which is where it gets weird. The AP8941s output the temperature via SNMP as a three-digit integer: the first two digits are the temperature to the left of the decimal point, and the third digit is the tenths place. Rather than use a floating-point value, it's just stored in a beefed-up int. Because of this, we have to divide the resulting data by 10 to get the real temperature.
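Here's a rough sketch of the grep and tr step by itself, run against a canned chunk of text instead of a live PDU. The three lines stand in for the much longer walk output. The object names are the ones used above, but the values and the exact line format are my assumption of what net-snmp prints with the MIB loaded.

```shell
# Stand-in for a full walk of the PDU; values are illustrative only.
walk_output='PowerNet-MIB::rPDU2SensorTempHumidityStatusTempC.1 = INTEGER: 168
PowerNet-MIB::rPDU2IdentLocation.1 = STRING: "Aisle 2, Rack 4 Front"
PowerNet-MIB::rPDU2SensorTempHumidityStatusTempF.1 = INTEGER: 623'

# grep keeps the one line we want, tr drops its newline,
# and sed reduces it to the quoted location.
location=$(printf '%s\n' "$walk_output" \
    | grep rPDU2IdentLocation.1 \
    | tr -d '\n' \
    | sed -e 's/.*STRING: \"\(.*\)\"/\1/')

echo "Temp in $location: "
```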

We pull the integer temperature with this:

snmpwalk -m ~/powernet403.mib -v1 -c public . | grep rPDU2SensorTempHumidityStatusTempF.1

We then pass it through sed:

sed -e 's/.*INTEGER: \([0-9]*\)/scale=1;\1 \/ 10/'

This matches the integer out of the temperature string, and then on the replace side wraps the integer in some additional text that bc understands (basically, give me 1 place past the decimal, then divide my int temp by 10). The last bit is piped through bc to get us the true floating point temperature.
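To make the sed-into-bc handoff concrete, here's a short sketch using a sample line (623 standing in for a reading of 62.3 degrees). It prints the intermediate text that sed builds before bc evaluates it. The sample line is an assumption about the output format, not something captured from a live device.

```shell
# Sample reading: 623 means 62.3 degrees F.
line='PowerNet-MIB::rPDU2SensorTempHumidityStatusTempF.1 = INTEGER: 623'

# sed rewrites the line into a tiny bc program.
expr=$(printf '%s\n' "$line" | sed -e 's/.*INTEGER: \([0-9]*\)/scale=1;\1 \/ 10/')
echo "$expr"

# bc evaluates it, keeping one digit past the decimal point.
temp=$(printf '%s\n' "$expr" | bc)
echo "$temp"
```

The first echo shows the text bc receives (scale=1;623 / 10), and the second shows the result bc hands back.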

Output looks like this:
 Temp in Aisle 2, Rack 4 Front: 62.3

I'm sure if you connected a humidity probe to the PDU you could grab that info, but I'm only connecting temperature probes there, so I haven't gone seeking the OID.

Currently, each device I'm pulling info off of has its own line in a script. There's a better way to do this, I just haven't bothered. Cron executes this once an hour and emails it to a few people.
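One possible shape for that "better way" is a loop over hostnames passed on the command line, sketched below. Everything structural here is hypothetical: the function names walk_pdu and report_pdu are mine, the hosts are whatever you pass in, and the MIB path is simply the one used above. It also walks each PDU once instead of twice, and swaps bc for shell arithmetic, which works because the reading is always a whole number of tenths.

```shell
#!/bin/sh
# Sketch only: function names and layout are illustrative, not the real script.
MIB="$HOME/powernet403.mib"

# Walk one PDU. Kept in its own function so the fetch step is easy to swap out.
walk_pdu() {
    snmpwalk -m "$MIB" -v1 -c public "$1" .
}

# Print "Temp in <location>: <temp>" for one PDU hostname.
report_pdu() {
    out=$(walk_pdu "$1")
    loc=$(printf '%s\n' "$out" | grep rPDU2IdentLocation.1 \
        | sed -e 's/.*STRING: \"\(.*\)\"/\1/')
    raw=$(printf '%s\n' "$out" | grep rPDU2SensorTempHumidityStatusTempF.1 \
        | sed -e 's/.*INTEGER: \([0-9]*\)/\1/')
    # Bail out if the PDU gave us nothing usable.
    [ -n "$raw" ] || { echo "No reading from $1" >&2; return 1; }
    # The reading is in tenths of a degree: 623 -> 62.3
    echo "Temp in $loc: $((raw / 10)).$((raw % 10))"
}

# One report line per hostname given on the command line.
for host in "$@"; do
    report_pdu "$host"
done
```

Run it with the PDU hostnames as arguments and let cron mail the output as before. With no arguments it does nothing.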

Hopefully I'll have more on this soon. I need to dig up the relevant OIDs for the NetBotz devices, and I'm hoping these are the same as either the AP9340s or AP8941s. I also need to find a handy place to deploy Zenoss, so that I can get some graphs and logging going.

1 comment:

  1. Here's an snmpwalk of this entire area from an APC PDU with a temperature/humidity sensor, showing the humidity.

    % snmpwalk -m PowerNet-MIB -c abcdef -v 1 enterprises.318.
    PowerNet-MIB::rPDU2SensorTempHumidityStatusIndex.1 = INTEGER: 1
    PowerNet-MIB::rPDU2SensorTempHumidityStatusModule.1 = INTEGER: 1
    PowerNet-MIB::rPDU2SensorTempHumidityStatusName.1 = STRING: "rack1-pduB"
    PowerNet-MIB::rPDU2SensorTempHumidityStatusNumber.1 = INTEGER: 1
    PowerNet-MIB::rPDU2SensorTempHumidityStatusType.1 = INTEGER: temperatureHumidity(2)
    PowerNet-MIB::rPDU2SensorTempHumidityStatusCommStatus.1 = INTEGER: commsOK(2)
    PowerNet-MIB::rPDU2SensorTempHumidityStatusTempF.1 = INTEGER: 726
    PowerNet-MIB::rPDU2SensorTempHumidityStatusTempC.1 = INTEGER: 226
    PowerNet-MIB::rPDU2SensorTempHumidityStatusTempStatus.1 = INTEGER: normal(4)
    PowerNet-MIB::rPDU2SensorTempHumidityStatusRelativeHumidity.1 = INTEGER: 39
    PowerNet-MIB::rPDU2SensorTempHumidityStatusHumidityStatus.1 = INTEGER: normal(4)