Wednesday, October 24, 2012

EMU Monitoring with Zenoss


That probably should actually read "attempted EMU monitoring with zenoss."

I'm using Zenoss 4, and it's really, horribly painful to use. What documentation I can find is for previous versions and doesn't seem to apply to v4.

There's an easy to use script that actually installs Zenoss. It seems to Just Work, though it assumes that you don't currently have MySQL installed or running, and if you do, you have to remove it. Not a problem on a separate VM or host, but if you're trying to share with another application that uses MySQL, you'll have to install it by hand.

After a bit of flailing, I finally figured out how to add a new monitoring template, so that I could pull in the AP9340s and NetBotz devices (go to Advanced -> Monitoring Templates, had to add a new template to the / path using the + sign at the bottom left).

You then have to select each individual data point and add it to a graph. The names seem to be shared across devices, so when I tried to change the name of a sensor on one device to reflect its location, it changed on all of them. There may be a way to work around this, but hell if I can figure it out.

I have yet to successfully load a MIB or ZenPack. The PowerNet MIB that I've been using with my script apparently has some sort of syntax error that Zenoss refuses to load. The handy ZenPacks that Zenoss has to pull this stuff in automatically don't seem to load under Zenoss 4.

Still going to play with it a bit, but I think we might need to look elsewhere for something useful. Aside from the graphing that I've been successful in doing with the NetBotz and 9340s, I've had better luck with bash. Zenoss LOOKS really promising, and has a lot of features that might be handy, but Nagios seems much easier to configure and use for alerting, and I can't get the stuff that Zenoss does that Nagios doesn't do to work at all.

Guess we'll see.

Friday, September 28, 2012

Login Attempts from the Wild 'Net

So, I've had a box with an ssh port exposed to the Internet for a month or two, now. In that time, I've logged 15947 failed connection attempts to the server. Some of them are legitimate, me just mistyping a password (2 to my user, probably a couple oopses from me on root).

Using lastb, I can see the last failed attempts to the host.

Looking at the list of users (and number of attempts) that people tried to log in as...

3544 r00t
3065 root
42 xbox
118 www
4 webadmin
44 sysadmin
286 admin
20 Admin
20 webmaster
108 webmail
50 web
2 vpnuser
4 nobody
32 apache

Those are some of the most interesting. The rest are random names of all ethnicities, random letters, assorted characters, etc. There may be a few other common logins, but I didn't do a full dive through the list.

The more interesting part is that it seems that most of these connections were from about a dozen IPs.
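Tallies like the list above can be pulled out of lastb with standard text tools. The following is a minimal sketch, assuming lastb's usual column layout (user name in column 1, source address in column 3); tally_column is a made-up helper name, and the columns can shift on entries that are missing a field.

```shell
# Count how often each value appears in one whitespace-separated column of
# lastb-style input read from stdin, most frequent first.
# tally_column is a hypothetical helper name, not part of any package.
tally_column() {
    # Skip blank lines and the trailing "btmp begins ..." line.
    awk -v col="$1" 'NF >= col && $1 != "btmp" { print $col }' |
        sort | uniq -c | sort -rn
}

# Typical use (reading the failed-login log usually needs root):
#   lastb | tally_column 1    # attempts per user name
#   lastb | tally_column 3    # attempts per source address
```

Running it against column 3 is a quick way to check whether the attempts really do come from a small number of addresses.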

I use fairly strong passwords, some of them are generated with mkpasswd (included when you install expect), some are generated by the password generator in KeePassX. I also don't allow ssh logins as root to that machine.

Since I started paying attention to this, I've started running fail2ban and DenyHosts. I probably don't need both, but I'll see what they turn up. I started DenyHosts first, and it scraped the existing logs and blacklisted a dozen IPs.

I was looking into using iptables to ratelimit connections, but it requires a particular option compiled into the kernel. My kernel doesn't ship with it enabled, and I really don't feel like recompiling the kernel with every update. I may have to find another method of doing this, though I believe that fail2ban and DenyHosts will help with this.

Guess the moral of the story is use strong passwords, and avoid using common login names, if possible. By using something like DenyHosts, fail2ban, or sshguard, you can slow down attackers as well. I used to have ssh on a non-standard port, which might help to some degree - at least until someone scans for open ports.

I'm also considering picking up something cheap - a raspberry pi, sheevaplug, or similar, and having it act as a border guard. I can connect to it and bounce into everything else in my network.


Friday, June 15, 2012

Accessing SNMP Temperature information on a NetBotz 200 + NetBotz 150

Right, so a delivery of shiny new toys just arrived for me to play with^W^W set up and do work with.

The NetBotz 200 is the modern, new version of the AP9340 EMU. It has the ability to chain up to a dozen NetBotz 150 sensor pods off of it, effectively adding 6 additional sensor ports per pod. This uses some goofy proprietary PoE-ish type thing to connect them, over your run-of-the-mill cat 5/RJ45 cable (don't forget the included terminators on the unused ports).

I picked up a pair of the 200s and a pair of the 150s. The 200s arrived first, so I fired one up at my desk to start poking around. It's very similar to the AP9340s, and actually uses the same OIDs.

Both the AP9340 and the NetBotz 200 use the following OIDs for the temperature and humidity, as well as naming:

1.3.6.1.4.1.318.1.1.10.4.2.3.1.3.0.1  - Sensor Name
1.3.6.1.4.1.318.1.1.10.4.2.3.1.4.0.1  - Sensor Location
1.3.6.1.4.1.318.1.1.10.4.2.3.1.5.0.1  - Temperature
1.3.6.1.4.1.318.1.1.10.4.2.3.1.6.0.1  - Humidity

Like before, if you increment the last digit you'll get the different ports (.1 is for port 1, .2 is port 2, .6 is port 6, etc). Since the NetBotz 150 pods are dumb units, and just add additional port capacity to the 200, my first inclination was that port 1 on the 150 would show up as port 7, giving it an OID of
1.3.6.1.4.1.318.1.1.10.4.2.3.1.4.0.7
This turned out to not be the case. A simple snmpwalk against  1.3.6.1.4.1.318.1.1.10.4.2.3.1
turned up that the second to last digit is significant for the device that the sensor is connected to. Thus, for the first 150 in the chain, you use the following OIDs:

1.3.6.1.4.1.318.1.1.10.4.2.3.1.3.1.1  - Sensor Name
1.3.6.1.4.1.318.1.1.10.4.2.3.1.4.1.1  - Sensor Location
1.3.6.1.4.1.318.1.1.10.4.2.3.1.5.1.1  - Temperature
1.3.6.1.4.1.318.1.1.10.4.2.3.1.6.1.1  - Humidity

My assumption is that the first port on the second 150 in the chain would be .2.1, third 150 would be .3.1, etc.
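The numbering scheme can be captured in a small helper so the OIDs don't have to be typed out by hand. This is only a sketch: emu_oid and EMU_SENSOR_TABLE are names invented here, and pod numbers above 1 rest on the same untested assumption as the paragraph above.

```shell
# Base of the sensor table shared by the AP9340 and the NetBotz 200.
EMU_SENSOR_TABLE=1.3.6.1.4.1.318.1.1.10.4.2.3.1

# emu_oid FIELD POD PORT  (hypothetical helper)
#   FIELD: name | location | temp | humidity
#   POD:   0 for the main unit, 1 for the first NetBotz 150 in the chain
#   PORT:  sensor port on that unit, 1 through 6
emu_oid() {
    case $1 in
        name)     column=3 ;;
        location) column=4 ;;
        temp)     column=5 ;;
        humidity) column=6 ;;
        *) echo "emu_oid: unknown field '$1'" >&2; return 1 ;;
    esac
    echo "$EMU_SENSOR_TABLE.$column.$2.$3"
}

emu_oid temp 1 1    # temperature, first 150 in the chain, port 1
```

The result can be handed straight to snmpwalk in place of a hard-coded OID.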

Now, to get these online in the DC, and to get a more serious way of monitoring them and the AP8941 PDUs than a cron job and a mad script.


Monday, June 11, 2012

Temperature Monitoring with APC

In an effort to identify trouble areas of our datacenter, and to gather enough data to help justify some equipment to manage airflow in the server rows, I've begun rolling out temperature and humidity sensors throughout. One of our previous admins had partially deployed APC AP9340 EMUs, which can have 6 probes connected. These devices, however, were either never fully set up, or the set up was torn down prior to me getting my grubby little mitts on them. Additionally, they appear to be from 2008 - they do everything I want/need, but there are newer models out there that do more than I want/need (must have!)

The eventual goal is to have temperature/humidity probes at several locations in the aisles, and then temperature only probes connected to some of the rack-mounted PDUs. I plan on using some sort of software - whether APC's StruxureWare Central, Zenoss, or something similar, to log and graph the ups and downs in the lab.

As it stands, that goal is coming along, but still distant. I have a trio of the AP9340s currently, but have recently ordered a pair of the new shiny NetBotz 200 EMUs, which each will have a NetBotz 150 sensor pod daisy chained off of it. Between the 7 devices, I'll have sufficient capacity to get a pretty broad picture of the DC. On top of this, our current standard rackmount PDU is the APC AP8941, which has the capability of taking a temperature probe. Each one of these will have a temp probe connected. Since I care more about inlet temperature than the hot side of a rack, these probes will be dropped into the lower part of the front of the rack. If the rack contains a second PDU, there I'll monitor the hot side. As I said above, I have all of the parts to do this either in house or inbound.

The tricky part is the actual monitoring. If I had unlimited budget, I'd roll out StruxureWare Central and be done with it. Since I'm working with...about nothing, I have to be more creative. The first AP9340 that I deployed, I was graphing with Cacti. This worked reasonably well, but when I connected the second EMU, it refused to graph, and then the first one followed suit. No idea why. Cacti uses SNMP to pull the data from the EMUs (templates for the devices found here), and an snmpwalk would gather that data just fine.

Until I can roll out Zenoss or something else, I've created a sort of "field expedient" bit of monitoring, using cron and snmpwalk.

The AP9340s were the easier of the devices to get working with this. The little command that I run to get the data from these is as follows:

for i in `seq 1 6`; do echo -n 'Temp in ' && echo -n `snmpwalk -v1 -c public 172.16.1.1 1.3.6.1.4.1.318.1.1.10.4.2.3.1.4.0.$i | sed -e 's/.*STRING: \"\(.*\)\"/\1/'` && echo -n ": "; snmpwalk -v1 -c public 172.16.1.1 1.3.6.1.4.1.318.1.1.10.4.2.3.1.5.0.$i | sed -e 's/.*INTEGER: \([0-9]*\)/\1/'; done

This prints out something like the following:
Temp in Aisle 3, Cold 1: 65
Temp in Aisle 3, Hot 1: 79
Temp in Aisle 3, Cold 2: 67
Temp in Aisle 3, Hot 2: 77
Temp in Aisle 3, Cold 3: 63
Temp in Aisle 3, Hot 3: 72

The location name, as well as the temperatures, are pulled from SNMP.

To break that line down, we're doing a for loop, looping from 1 through 6.

for i in `seq 1 6`;

For each iteration through the loop, we echo (with the -n to strip the trailing newline) "Temp in ", and then execute the snmpwalk to gather the location of the probe.

snmpwalk -v1 -c public 172.16.1.1 1.3.6.1.4.1.318.1.1.10.4.2.3.1.4.0.$i

The $i is 1-6, depending on our current location in our for loop. You'll notice the echo -n and the ` before the first snmpwalk. We're actually executing this command inside its associated echo -n, to strip the newline from the end of the string to make the output pretty. We also have the subsequent sed command in this same echo string. We pipe the snmpwalk output to the following:

sed -e 's/.*STRING: \"\(.*\)\"/\1/'

This looks for the word STRING: in the line, and then captures whatever sits between the double quotes that follow into a regex group. Since the contents of the double quotes are the location, and the only part we care about, we throw out everything else and replace it with just the location (marked as \1 on the replace side of the regex).
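To see the substitution on its own, it can be fed a canned line instead of live SNMP output. In this sketch, extract_string is a made-up wrapper name, and the sample line is an assumption about what snmpwalk prints when no MIB is loaded.

```shell
# Pull the quoted value out of an snmpwalk STRING line read from stdin.
# extract_string is a hypothetical wrapper around the same sed expression;
# the double quotes need no backslashes inside a single-quoted sed script.
extract_string() {
    sed -e 's/.*STRING: "\(.*\)"/\1/'
}

# Sample line in the shape snmpwalk prints without a MIB (assumed).
sample='SNMPv2-SMI::enterprises.318.1.1.10.4.2.3.1.4.0.1 = STRING: "Aisle 3, Cold 1"'
printf '%s\n' "$sample" | extract_string
```

This prints just Aisle 3, Cold 1, with the OID, type label, and quotes stripped away.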

After we close out the echo, we pretty much repeat the process for the actual temperature, which is the second snmpwalk and sed. Here we're just looking for INTEGER, stripping out the number that follows it, and printing it.

By replacing the OID in the second snmpwalk, we can get this to pull the humidity data off of a temperature/humidity probe.
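One way to make that swap without editing the one-liner each time is to wrap the loop in a function that takes the value column as an argument. This is a rough sketch rather than the script I actually run: emu_report is an invented name, and it assumes the same v1 "public" community as above.

```shell
# emu_report HOST LABEL COLUMN  (hypothetical wrapper around the loop above)
#   LABEL:  text printed before the location, e.g. "Temp" or "Humidity"
#   COLUMN: 5 for temperature, 6 for humidity
emu_report() {
    host=$1 label=$2 column=$3
    base=1.3.6.1.4.1.318.1.1.10.4.2.3.1
    for port in 1 2 3 4 5 6; do
        # Column 4 holds the location string for each port.
        location=$(snmpwalk -v1 -c public "$host" "$base.4.0.$port" |
            sed -e 's/.*STRING: "\(.*\)"/\1/')
        value=$(snmpwalk -v1 -c public "$host" "$base.$column.0.$port" |
            sed -e 's/.*INTEGER: \([0-9]*\)/\1/')
        printf '%s in %s: %s\n' "$label" "$location" "$value"
    done
}

# Example calls (these need a reachable EMU):
#   emu_report 172.16.1.1 Temp 5
#   emu_report 172.16.1.1 Humidity 6
```

Command substitution drops the trailing newline on its own, so the echo -n juggling isn't needed here.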

The OIDs I use are as follows:

1.3.6.1.4.1.318.1.1.10.4.2.3.1.4.0.1  LOCATION of probe
1.3.6.1.4.1.318.1.1.10.4.2.3.1.5.0.1 TEMPERATURE the probe is indicating
1.3.6.1.4.1.318.1.1.10.4.2.3.1.6.0.1 HUMIDITY the probe is indicating

The last .1 on there becomes .2 for port 2, .3 for port 3, etc. I believe the name of the probe is around that vicinity, maybe the last three are 3.0.1, I can't remember.

For pulling data off of the AP8941 PDUs, it's a bit more complicated. First off, you need the SNMP MIB for APC devices, found here (look for Firmware Upgrades: MIB, free compulsory upgrade required, try using bugmenot). Then, we need an even more magic and crazy bit of CLI-fu.

echo -n "Temp in " && snmpwalk -m ~/powernet403.mib -v1 -c public 172.16.1.2 .1.3.6.1.4.1.318.1.1.26.2 | grep rPDU2IdentLocation.1 | tr -d '\n'| sed -e 's/.*STRING: \"\(.*\)\"/\1/' && echo -n ": "; snmpwalk -m ~/powernet403.mib -v1 -c public 172.16.1.2 .1.3.6.1.4.1.318.1.1.26.10 | grep rPDU2SensorTempHumidityStatusTempF.1 | sed -e 's/.*INTEGER: \([0-9]*\)/scale=1;\1 \/ 10/' | bc

This starts out much like the previous one, though we have to point snmpwalk at the MIB with -m. The first deviation is the grep - the snmpwalk that pulls the location is kind of a shotgun blast: you get a ton of data, and only want that one line. We use tr to cut the newline off of the end of the output. We then do a second snmpwalk to get the temperature, which is where it gets weird. The AP8941s output the temperature via SNMP as a three digit integer - the first two digits are the temperature to the left of the decimal, and the third digit is the tenths place. Rather than use a floating point, it's just stored in a beefed up int. Because of this, we have to divide the resulting data by 10 to get the real temperature.

We pull the integer temperature with this:

snmpwalk -m ~/powernet403.mib -v1 -c public 172.16.1.2 .1.3.6.1.4.1.318.1.1.26.10 | grep rPDU2SensorTempHumidityStatusTempF.1

We then pass it through sed:

sed -e 's/.*INTEGER: \([0-9]*\)/scale=1;\1 \/ 10/'

This matches the integer out of the temperature string, and then on the replace side wraps the integer in some additional text that bc understands (basically, give me 1 place past the decimal, then divide my int temp by 10). The last bit is piped through bc to get us the true floating point temperature.
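If bc isn't installed, the same tenths conversion can be done with plain shell integer arithmetic. This is an alternative sketch, not what the one-liner above does; tenths_to_decimal is a made-up name, and it assumes a non-negative integer reading.

```shell
# tenths_to_decimal N  (hypothetical helper)
# Turn a reading stored in tenths, such as 623, into "62.3" using only
# shell integer arithmetic. Assumes N is a non-negative integer.
tenths_to_decimal() {
    echo "$(( $1 / 10 )).$(( $1 % 10 ))"
}

tenths_to_decimal 623
```

This prints 62.3. One small difference from bc: a reading below 10 comes out with a leading zero (0.5), where bc would print .5.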

Output looks like this:
Temp in Aisle 2, Rack 4 Front: 62.3

I'm sure if you connected a humidity probe to the PDU you could grab that info, but I'm only connecting temperature probes there, so I haven't gone seeking the OID.

Currently, each device I'm pulling info off of has its own line in a script. There's a better way to do this, I just haven't bothered. Cron executes this once an hour and emails it to a few people.
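The "better way" is probably a device list and a loop. A rough sketch follows, assuming the two one-liners above have been wrapped in functions that take the device address as their first argument; poll_devices, poll_emu, poll_pdu, and the list file format are all invented for illustration.

```shell
# poll_devices LIST_FILE  (hypothetical driver)
# LIST_FILE holds one device per line: "<type> <address>", where type is
# emu or pdu. Blank lines and lines starting with # are skipped.
# poll_emu and poll_pdu are placeholders for the two one-liners above,
# wrapped in functions that take the device address as $1.
poll_devices() {
    while read -r type address; do
        case $type in
            ''|'#'*) continue ;;
            # </dev/null keeps the pollers from eating the rest of the list.
            emu) poll_emu "$address" </dev/null ;;
            pdu) poll_pdu "$address" </dev/null ;;
            *)   echo "poll_devices: unknown type '$type'" >&2 ;;
        esac
    done < "$1"
}
```

Adding a new EMU or PDU then means adding one line to the list file rather than another line of script.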

Hopefully I'll have more on this, soon. I need to dig up the relevant OIDs for the NetBotz devices, I'm hoping these are the same as either the AP9340s or AP8941s. I also need to find a handy place to deploy Zenoss, so that I can get some graphs and logging going.