This is a list of outstanding bugs in Alpha Server software.

cscp

Bug: cscp may not be able to copy a remote file back to the same server.

A problem has occasionally appeared when trying to use cscp to remotely copy a file using the same server for both source and target files. For example:

This seems to run into problems due to the overlapping use of the same struct to talk to "both" remote servers.

Workaround: If the remote system has two interfaces, you can use both of them to trick cscp into thinking it's talking to two different systems. Otherwise, all you can do is copy the file to some other machine and back.

Fix: At present, our scenarios don't include doing this, so it's a low priority problem. It should get fixed eventually ...


Bug: TCP's 40960-byte problem.

There's a problem with writing large blocks of data to a TCP connection in a single write() call: If you write more than 40960 bytes, the reader may not wake up from its select() call. Furthermore, the reader may not be able to read the very last byte in the stream until the writer sends more data.

You can test this by having the sender write 40961 bytes while the receiver is in select() with a timeout specified. The receiver may not wake up right away. When the timeout occurs, the receiver will be told that there's data in its input. If the receiver tries to read more than 40960 bytes, it will get only 40960, and a further read of 1 byte will hang or get EWOULDBLOCK. If the sender then writes one more byte (a newline, for example), the receiver will be able to get the 40961st byte, and may or may not get the extra byte. This will happen for any block size > 40960 bytes.
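
The test described above can be sketched roughly as follows. This is only an illustration over the loopback interface, not the test program that was actually used; the name run_probe and the 5-second timeout are made up for the example. Note that Python's sendall() may issue more than one underlying send(), so a C program doing a single write() would be a stricter test. On a system without the bug, the receiver wakes up promptly and gets all 40961 bytes.

```python
import select
import socket
import threading

BLOCK_SIZE = 40961  # one byte past the 40960-byte threshold described above


def run_probe(block_size=BLOCK_SIZE, timeout=5.0):
    """Write block_size bytes to a TCP connection and watch the receiver.

    Returns (woke_before_timeout, bytes_received). Hypothetical helper,
    not part of the CS software.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    listener.close()

    # The writer runs in its own thread so the receiver can sit in select().
    writer = threading.Thread(
        target=sender.sendall, args=(b"x" * block_size,), daemon=True
    )
    writer.start()

    woke_before_timeout = None
    received = 0
    while received < block_size:
        readable, _, _ = select.select([receiver], [], [], timeout)
        if woke_before_timeout is None:
            # Record whether the very first select() saw data in time.
            woke_before_timeout = bool(readable)
        if not readable:
            break  # stalled: the symptom this bug report describes
        chunk = receiver.recv(block_size - received)
        if not chunk:
            break  # peer closed early
        received += len(chunk)

    writer.join(timeout)
    sender.close()
    receiver.close()
    return woke_before_timeout, received
```

A result of (False, 0), or a byte count that stops at 40960, would match the symptoms above.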

Fix: The CS_TCP_MAX environment variable tells the CS software how much data can be sent in a single TCP write. Don't set this to more than 40960.
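
To illustrate what such a cap amounts to, here is a minimal sketch of a sender that never hands more than the limit to a single write. How the CS software itself interprets CS_TCP_MAX is not shown here; the fallback and clamping behavior, and the names tcp_write_limit, iter_chunks and send_capped, are assumptions made for the example.

```python
import os

TCP_MAX_SAFE = 40960  # largest single write the text above considers safe


def tcp_write_limit(env=os.environ):
    """Read CS_TCP_MAX; fall back to, and never exceed, TCP_MAX_SAFE.

    The fallback and the clamp are assumptions for this sketch.
    """
    try:
        value = int(env.get("CS_TCP_MAX", ""))
    except ValueError:
        return TCP_MAX_SAFE
    if value <= 0:
        return TCP_MAX_SAFE
    return min(value, TCP_MAX_SAFE)


def iter_chunks(data, limit):
    """Yield successive slices of data, each at most limit bytes long."""
    view = memoryview(data)
    for start in range(0, len(view), limit):
        yield view[start:start + limit]


def send_capped(sock, data, limit=None):
    """Send data over sock, never passing more than limit bytes per call."""
    if limit is None:
        limit = tcp_write_limit()
    for chunk in iter_chunks(data, limit):
        sock.sendall(chunk)
```

With a limit of 40960, a 40961-byte buffer goes out as one 40960-byte write followed by a 1-byte write.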


Bug: UDP's 9216-byte problem.

As Digital-Unix (OSF1) comes out of the box, there is a limit of 9216 bytes to a single UDP message. Anything larger is rejected by the kernel with an EMSGSIZE errno code.

This number can be increased by tweaking some kernel parameters. Do the following, as super-user, on all systems that are running cscp or csfd:

	% dbx -k /vmunix /dev/mem
	(dbx) print udp_sendspace, udp_recvspace
	(dbx) assign udp_sendspace = 65507
	(dbx) assign udp_recvspace = 65507
	(dbx) print udp_sendspace, udp_recvspace
	(dbx) q

If the first print command shows values of 65507, then this has already been done. What this does is increase the maximum UDP packet size to 65507 bytes.

You might ask "Why 65507? Isn't that a rather bizarre number?" Yes, it is. Experiments have determined that this is the largest UDP packet that OSF1 will actually accept and deliver. You can set the variables to larger values, but if you try to send a UDP message larger than 65507 bytes, it will not be delivered, and your program may hang for long periods when it tries to write larger packets.

This strange number actually comes from the fact that the packet length field in the IP header is 16 bits, and part of the packet is taken up by headers: 20 bytes for the IP header (without options) plus 8 bytes for the UDP header, 28 bytes in all. Thus 2^16-1-28 = 65507 is the largest data size that UDP can handle. You might well say "But IP includes fragmenting large packets into smaller ones, so why can't I write a larger packet and let the IP layer fragment it?" Good question. I don't have a good answer. Considering that any packet larger than the interface's MTU must be fragmented locally, and the size params to write() and sendto() are ints, there seems to be no obvious reason not to accept 4-GB buffers and fragment them into MTU-size packets. Perhaps in some future release.
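
The arithmetic, and a quick way to check the limit on a given system, can be sketched as below. This assumes IPv4 with a 20-byte header (no options); try_udp_send is a made-up helper for the example, and whether a full 65507-byte datagram is accepted depends on the kernel settings discussed above.

```python
import socket

IP_MAX_TOTAL_LENGTH = 2**16 - 1  # largest value of a 16-bit length field
IP_HEADER_BYTES = 20             # IPv4 header without options (assumed)
UDP_HEADER_BYTES = 8

MAX_UDP_PAYLOAD = IP_MAX_TOTAL_LENGTH - IP_HEADER_BYTES - UDP_HEADER_BYTES


def try_udp_send(size):
    """Send one datagram of size bytes over loopback.

    Returns None if sendto() accepted it, otherwise the errno it failed
    with (EMSGSIZE when the kernel rejects the size).
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))
        sender.sendto(b"x" * size, receiver.getsockname())
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sender.close()
        receiver.close()
```

Calling try_udp_send(MAX_UDP_PAYLOAD) and try_udp_send(MAX_UDP_PAYLOAD + 1) shows where a particular kernel draws the line.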

In any case, in order for the Alpha Server to use this larger packet size, you must also override its default packet size. Both cscp and csfd must have their CS_UDPMAX environment variable set to 65507:

	% setenv CS_UDPMAX 65507

This sounds like it should be something important, but it isn't, actually. The 7-times larger packet size makes for only a tiny increase in throughput. The probable reason is that the underlying hardware can't send packets of even the smaller size. Ethernet has a 1500-byte maximum packet size (MTU), and FDDI has a 4352-byte MTU. So in any case, UDP messages will be fragmented to fit the physical link and reassembled on the other side. The only real benefit to a larger UDP message size is that it decreases the number of system calls needed to send a large file, and thus uses slightly less CPU time on both ends. But since file copies are I/O bound, this doesn't make much difference.
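
To put rough numbers on this, the sketch below counts the send calls needed for a file and the IP fragments each UDP message turns into. It assumes IPv4 with a 20-byte header and the usual rule that every fragment except the last carries a multiple of 8 data bytes; the function names are invented for the example.

```python
IP_HEADER_BYTES = 20   # IPv4 header without options (assumed)
UDP_HEADER_BYTES = 8


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def datagrams_needed(file_bytes, udp_max):
    """Number of UDP messages (one send call each) to carry file_bytes."""
    return ceil_div(file_bytes, udp_max)


def fragments_per_datagram(udp_payload, mtu):
    """Number of IP fragments for one UDP message on a link with this MTU."""
    total = udp_payload + UDP_HEADER_BYTES   # bytes IP has to carry
    last_capacity = mtu - IP_HEADER_BYTES    # last fragment may be any size
    if total <= last_capacity:
        return 1
    # Every fragment but the last must carry a multiple of 8 bytes.
    full_capacity = last_capacity // 8 * 8
    return 1 + ceil_div(total - last_capacity, full_capacity)
```

For a 1-MB file (1048576 bytes), the default 9216-byte limit needs 114 send calls and 65507 needs 17. On Ethernet each 9216-byte message becomes 7 fragments and each 65507-byte message becomes 45, so the number of packets on the wire is nearly the same either way.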


Asymmetry in file-copy times.

With some pairs of machines, there is a significant difference in the time to "pull" files (csfd -> cscp) and to "push" them (cscp -> csfd) using the udp protocol. There is also a difference with tcp, but it is much smaller. I've been studying this problem, with only limited success in finding evidence.

When it happens, running at debug level 4 shows evidence of large numbers of packet losses when cscp sends a burst of packets to csfd's port. When sending in the other direction, cscp reports receiving the entire burst of packets, typically in the original order, and in less than one second. But csfd doesn't receive them all. The sending and receiving is done by the same routines in both cases, and the logs show the same number and sizes of packets.

Something that may be significant: Several people have reported that, when this slow copy occurs, other applications also slow down dramatically, and there is a lot of disk activity. However, running top or

doesn't show any evidence of swapping or paging activity.

Long months of testing have turned up strong evidence that swapping (or some other form of memory contention) is the culprit here. The main evidence is that the inner data-copy loops in cscp and csfd have been modified to invoke the cspipe program as a subprocess. This is just the data-copy loop run as a separate process. When this is done, the copies don't seem to slow down the way they do when cscp and csfd do the copy themselves. Since the code is the same (a simple read/write loop), the conclusion is that the larger sizes of the cscp and csfd processes are the explanation.
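
The shape of that experiment can be sketched as follows: the same read/write loop, run once inside the calling process and once in a small child process. This is only an illustration; the child here is a stand-in written for the example, not cspipe itself, and the 64-KB buffer size is an arbitrary choice.

```python
import subprocess
import sys

# Stand-in for cspipe (an assumption): a child that copies stdin to stdout.
CHILD_LOOP = (
    "import sys\n"
    "while True:\n"
    "    block = sys.stdin.buffer.read(65536)\n"
    "    if not block:\n"
    "        break\n"
    "    sys.stdout.buffer.write(block)\n"
)


def copy_in_process(src, dst, bufsize=65536):
    """Simple read/write loop run inside the (large) calling process."""
    total = 0
    while True:
        block = src.read(bufsize)
        if not block:
            break
        dst.write(block)
        total += len(block)
    return total


def copy_in_subprocess(src_path, dst_path):
    """Run the same loop in a small separate process."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        subprocess.run(
            [sys.executable, "-c", CHILD_LOOP],
            stdin=src, stdout=dst, check=True,
        )
```

Timing the two variants on the same file is the comparison described above; only the size of the process doing the copy differs.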