The NDC (Network Data Collector) project at GTE Labs was a small team working on software for monitoring and managing the growing complexity of the cellular telephone system. At the heart of the project was software that maintained a set of databases (Sybase and Informix) on machines scattered around the world. A collection of tools was built for collecting assorted interesting data from several systems and displaying it on a manager's workstation.

My part of the project was twofold: I supplied an SNMP agent that delivered the project's data, so that various vendors' SNMP-based management tools could be used. I also wrote a set of GUI tools for doing some of the management.

The SNMP agent was a clone of the public CMU agent, modified in previous projects to have an extensive logging and debugging capability. We went through several passes of defining MIB tables for various data in the databases, and developing tools that displayed (or sometimes modified) the data. An interesting aspect of this job was that the project team consisted mostly of people with an SQL background. I spent some time getting across to them that SNMP can't do everything that SQL can. In particular, SNMP can't use character strings as instances. This resulted in several redesigns of software written to use SQL.

An especially interesting part of the job was the opportunity to use Tcl/Tk to build some "real" management tools. These included such things as a window that showed the status of a set of NDC-related processes on a managed system. This is different from the usual process table, which normally shows only running processes. In this case, we needed to show the status of a process regardless of whether it was running. The general technique was to provide a "startup" script whose command-line arguments told it what to run. The script ran the program as a subprocess and noted various pieces of information about it. This information was put into files in the SNMP agent's directory, so that the agent could report the data. Process information included variables such as:

procName
    The process's name.
procCommand
    The shell command used to start the process.
procStartTime
    The time (in ISO format) that the process was last started.
procStopTime
    The time (in ISO format) that the process last exited.
procOperStatus
    Whether the process is alive, and if not, how it last died.
procAdminStatus
    Used to change the process's status.
procLogFile
    The process's logfile, if it has one.
procMessage
    A message string that the process can write.

One of the Tcl-based GUI tools was a Process script that showed the process table on a managed system. The procAdminStatus variable was used to implement remote start and kill buttons for these processes.
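
The startup-script technique described above can be sketched roughly as follows. This is a minimal illustration in Python, not the original script: the function names, the key=value file layout, the ".status" file name and the exact status strings are all assumptions made for the example. Only the variable names (procName, procCommand, procStartTime, procStopTime, procOperStatus) come from the list above; procAdminStatus, procLogFile and procMessage are left out to keep it short.

```python
import shlex
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def iso_now():
    """Current UTC time in ISO 8601 format."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def write_status(status_dir, name, fields):
    """Write the variables as key=value lines (an assumed file layout)."""
    path = Path(status_dir) / f"{name}.status"
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("".join(f"{key}={value}\n" for key, value in fields.items()))
    tmp.replace(path)  # rename, so a reader never sees a half-written file
    return path


def describe_exit(returncode):
    """Turn a subprocess return code into a procOperStatus string."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:  # on POSIX, a negative code means death by signal
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def run_monitored(name, argv, status_dir):
    """Run argv as a subprocess, recording its status before and after."""
    fields = {
        "procName": name,
        "procCommand": shlex.join(argv),
        "procStartTime": iso_now(),
        "procStopTime": "",
        "procOperStatus": "running",
    }
    write_status(status_dir, name, fields)
    returncode = subprocess.call(argv)  # blocks until the program exits
    fields["procStopTime"] = iso_now()
    fields["procOperStatus"] = describe_exit(returncode)
    write_status(status_dir, name, fields)
    return returncode
```

Because the status file outlives the program, a reader of that file can still report how and when the process last exited, which is the point of keeping this table separate from the ordinary process table.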

Part of the job was testing IBM's NetView package as a management tool. Unfortunately, like pretty much everyone else who has tried to deal with NetView, I found it to be a rather disappointing management tool. It is rather complicated for what it actually does, requires lengthy training, and often gives quite inaccurate results. I don't recommend NetView for this sort of management application.

One of the tools was a network map program whose display was somewhat similar to NetView's (and to that of most other network-management tools, for that matter). While able to use SNMP to do the usual sort of status display for nodes, it also used the NDC MIB to give more information. One of its capabilities was to accept SNMP TRAP messages, decode their content, and do such things as change the color of a node or display a blinking border around it. Thus the failure of a process would trigger a trap to all of the node's managers, and the node's icon would start blinking.
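
The step from a decoded trap to a change on the map can be sketched roughly as follows. This is an illustration in Python, not the original Tcl/Tk code: the `Trap` and `NodeIcon` types, the colors, and the assumption that process failures arrive as enterprise-specific traps are all hypothetical. Only the generic-trap numbers are taken from the SNMPv1 specification (RFC 1157).

```python
from dataclasses import dataclass

# Generic-trap codes defined by SNMPv1 (RFC 1157).
COLD_START, WARM_START, LINK_DOWN, LINK_UP = 0, 1, 2, 3
AUTH_FAILURE, EGP_NEIGHBOR_LOSS, ENTERPRISE_SPECIFIC = 4, 5, 6


@dataclass
class Trap:
    """An already-decoded trap (hypothetical shape)."""
    agent_addr: str
    generic_trap: int
    specific_trap: int = 0


@dataclass
class NodeIcon:
    """Display state for one node on the map (hypothetical shape)."""
    color: str = "green"
    blinking: bool = False


def apply_trap(icons, trap):
    """Update the icon of the node that sent the trap; return it, or None."""
    icon = icons.get(trap.agent_addr)
    if icon is None:
        return None  # trap from a node that is not on the map
    if trap.generic_trap == LINK_DOWN:
        icon.color = "red"
    elif trap.generic_trap == ENTERPRISE_SPECIFIC:
        # Assumption: NDC process failures are sent as enterprise-specific traps.
        icon.color = "red"
        icon.blinking = True
    return icon
```

Other trap types leave the icon unchanged in this sketch; a real map program would presumably also need a way to clear the blinking state once the problem is acknowledged or fixed.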

Such blinking is a very coarse warning, of course, saying little more than "Something's wrong over there". One of the menu items for a node was a "What's wrong?" item. It popped up a window that queried all the status variables in the NDC MIB and displayed a list of all the problems known on the machine.
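
The filtering behind a "What's wrong?" window might look something like the sketch below. It is written in Python for illustration and assumes the process table has already been fetched from the agent into a dictionary of rows; the SNMP queries themselves are not shown. The `find_problems` name, the row layout and the "running" status string are assumptions, not details of the original tool.

```python
def find_problems(rows):
    """Return one human-readable line per process that is not running.

    `rows` maps a table index to a dict of NDC process variables, as they
    might look after being fetched from the agent (hypothetical layout).
    """
    problems = []
    for index in sorted(rows):
        row = rows[index]
        status = row.get("procOperStatus", "unknown")
        if status == "running":  # assumed value for a live process
            continue
        name = row.get("procName") or f"process {index}"
        line = f"{name}: {status}"
        if row.get("procStopTime"):
            line += f" (stopped {row['procStopTime']})"
        if row.get("procMessage"):
            line += f" - {row['procMessage']}"
        problems.append(line)
    return problems
```

An empty result means no known problems on the machine, so the window could simply say so rather than show an empty list.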