(I don't care about the back story. Take me to the configuration section)
Update June 29, 2004: I had too many problems with active sites being reported as 'down' by nsvhr, so I've installed Pound as my reverse proxy. It's very nice.
I'm currently running a number of OpenACS based sites (Integrated Badgertronics, Borkware, The Loudoun Symphony, and others) off of one IP address on a machine hosted by the fine folks at Acorn Hosting. I have one IP address at my disposal, so I run my sites as independent AOLserver instances running as back-ends, being fed from an AOLserver running nsvhr on the front end.
I quickly dismissed the idea of running everyone in one OpenACS instance. I would have collisions in page names, plus I've heard of problems with using subsites in this manner. I didn't particularly want to learn and debug all the subsite code, especially given the pressure of getting the sites back up and running. Also, the user communities and audiences amongst the different sites are very different, so it didn't make sense to lump them all together.
Virtual hosting uses the Host: header to decide which one of multiple back-end servers should handle the request. (The Host: header contains the hostname part of the request you see in your browser. For http://borkware.com, there would be a Host: borkware.com header in the HTTP request.)
nsvhr (which stands for NS Virtual Hosting, and uh, R-something) is the AOLserver module that looks at the Host: header and makes the decision which back-end to use. nsvhr can communicate with the back-ends in one of two ways. One way is by using unix domain sockets (via the nsunix module), which are pretty neat. It passes a file descriptor from one process to the other: the back-end gets the file descriptor of the network connection and then writes the resulting data through it. You can also use TCP sockets, which use standard networking calls to move data back and forth. Unix sockets are a more efficient transport. I use the TCP sockets, so my setup is one front-end AOLserver running nsvhr that takes every incoming request and forwards it over TCP to the back-end server for that site.
I originally tried nsvhr + nsunix sockets, but there were a couple of problems. The first is the front-end would go numb and stop accepting requests. I'd check my sites in the morning and find them unresponsive. Restarting the front-end would make everyone come alive again. I eventually set up the arsDigita keepalive to restart the front end if it became unresponsive, but it Just Felt Wrong having to do that.
Even worse, the back-ends would spaz out occasionally, going into tight loops reading from the unix socket. The server would still handle requests, but some threads would be stuck in loops, maxing out CPU usage. This is decidedly anti-social behavior for a shared server, and I didn't want to get kicked off the machine. So TCP sockets was the next thing to try.
A few changes go into front-end.tcl. The first is increasing the socktimeout for the nssock module. This fixes a problem where folks uploading big files via HTTP POST would get "Invalid HTTP Request" errors:

    ns_section ns/server/${server}/module/nssock
    ns_param socktimeout 240
    ...

Add nsvhr.so to your modules:

    ns_section ns/server/${server}/modules
    ns_param nsvhr ${bindir}/nsvhr.so

and configure it:

    ns_section "ns/server/${servername}/module/nsvhr"
    ns_param Method "GET"     ;# methods allowed to proxy (can have > 1)
    ns_param Method "POST"
    ns_param Method "HEAD"
    ns_param Timeout 600      ;# timeout waiting for back-end

I've got a 10 minute timeout waiting for the back-end for supporting large file uploads. I'm not sure if it's 100% necessary, but I haven't seen any bad behavior by having a large timeout there.
And then you give it the hosts to proxy in the ns/server/server-name/module/nsvhr/maps section:

    # hosts to proxy
    ns_section "ns/server/${servername}/module/nsvhr/maps"
    ns_param "loudounsymphony.org"        "http://loudounsymphony.org:8006"
    ns_param "loudounsymphony.org:80"     "http://loudounsymphony.org:8006"
    ns_param "www.loudounsymphony.org"    "http://loudounsymphony.org:8006"
    ns_param "www.loudounsymphony.org:80" "http://loudounsymphony.org:8006"
    ns_param "borkware.com"               "http://borkware.com:8007"
    ns_param "borkware.com:80"            "http://borkware.com:8007"
    ns_param "www.borkware.com"           "http://borkware.com:8007"
    ns_param "www.borkware.com:80"        "http://borkware.com:8007"
    ns_param "badgertronics.com"          "http://badgertronics.com:8008"
    ns_param "badgertronics.com:80"       "http://badgertronics.com:8008"
    ns_param "www.badgertronics.com"      "http://badgertronics.com:8008"
    ns_param "www.badgertronics.com:80"   "http://badgertronics.com:8008"

These settings tell nsvhr how to map incoming requests to back-end requests.
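
To make the mapping concrete, here is a minimal C sketch of the kind of lookup the maps section describes: take the value of the Host: header and find the back-end URL to forward to. It is not the nsvhr source. The struct host_map type, the lookup_backend() function, and the case-insensitive comparison are all assumptions made for illustration; only the host names and ports come from the configuration above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical map entry: a Host: value and the back-end URL it proxies to. */
struct host_map {
    const char *host;
    const char *backend;
};

/* A few entries copied from the maps section. */
static const struct host_map maps[] = {
    { "borkware.com",        "http://borkware.com:8007" },
    { "borkware.com:80",     "http://borkware.com:8007" },
    { "www.borkware.com",    "http://borkware.com:8007" },
    { "www.borkware.com:80", "http://borkware.com:8007" },
    { "badgertronics.com",   "http://badgertronics.com:8008" },
};

/* Return the back-end URL for a Host: value, or NULL if nothing matches.
 * Assumption: host names are compared without regard to case. */
static const char *lookup_backend(const char *host)
{
    size_t i;

    assert(host != NULL);
    for (i = 0; i < sizeof(maps) / sizeof(maps[0]); i++) {
        if (strcasecmp(maps[i].host, host) == 0) {
            return maps[i].backend;
        }
    }
    return NULL;
}
```

A request carrying Host: www.borkware.com:80 would resolve to http://borkware.com:8007, and a host with no entry gets NULL. Since each entry is matched as a whole string, the bare name and the name with :80 both need their own line, as they have in the configuration.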
Each back-end listens on its own port. Here is the relevant part of the borkware.tcl configuration file:

    set httpport 8007
    set hostname borkware.com
    set address 207.142.4.59

    ns_section ns/server/${server}/module/nssock
    ns_param address $address
    ns_param hostname $hostname
    ns_param port $httpport
    ns_param socktimeout 240

Due to the way nsvhr works, all of the back-ends were seeing the IP address of the front end as the request IP. So my server logs had all the request IPs the same, and it looked like some loser at 59.acornhosting.net was hammering my site. (No wait, that's me.)
Luckily I'm not afraid to dig into the AOLserver source and figure things out. (It's actually very beautiful code.) I added a header to the request between the front-end and the back-end. x-bork-ip: has the IP address of the true originator. TCPProxy() in nsvhr.c and Ns_ConnPeer() in nsd/conn.c each needed a bit of code to support that.
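
As a rough illustration (not the actual patch), the front-end half of the idea can be sketched in a few lines of C: format the client's address as an extra header line before the request goes on to the back-end. The function name format_peer_header() and the buffer handling are hypothetical; only the x-bork-ip: header name comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write "x-bork-ip: <peer_ip>\r\n" into out.
 * Returns the number of characters written (not counting the terminating
 * zero), or -1 if the line does not fit in outlen bytes. */
static int format_peer_header(char *out, size_t outlen, const char *peer_ip)
{
    int n;

    assert(out != NULL && peer_ip != NULL);
    n = snprintf(out, outlen, "x-bork-ip: %s\r\n", peer_ip);
    if (n < 0 || (size_t)n >= outlen) {
        return -1;  /* encoding error or truncated output */
    }
    return n;
}
```

The back-end half would then look for that header and, when it is present, report its value as the peer address instead of the address of the front-end's socket.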
The binary file upload problem was due to SockWrite() in nsvhr.c. It was doing a strlen() on the data going to the back end. If there happened to be a zero byte in the stream (like when uploading photos), the data would be truncated. The back-end would be sitting waiting for more data to come in, and eventually would fail with an "Error writing content: resource temporarily unavailable" error in ns_conncptofp. Explicitly passing in the length of the string to write fixes the problem (and also saves the CPU time of spinning over the string with strlen()).
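
The failure mode is easy to reproduce outside of AOLserver. The following is a minimal sketch, not the nsvhr code: struct sink stands in for the socket, and sink_write(), write_with_strlen(), and write_with_length() are hypothetical names. It contrasts measuring the payload with strlen() against passing the length explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a socket: collects whatever gets "sent". */
struct sink {
    unsigned char data[64];
    size_t used;
};

/* Append up to len bytes to the sink; returns the number of bytes accepted. */
static size_t sink_write(struct sink *s, const void *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    if (len > sizeof(s->data) - s->used) {
        len = sizeof(s->data) - s->used;
    }
    memcpy(s->data + s->used, buf, len);
    s->used += len;
    return len;
}

/* Buggy pattern: treats the payload as a C string, so everything from
 * the first zero byte onward is silently dropped. */
static size_t write_with_strlen(struct sink *s, const char *buf)
{
    return sink_write(s, buf, strlen(buf));
}

/* Fixed pattern: the caller says how many bytes there are. */
static size_t write_with_length(struct sink *s, const char *buf, size_t len)
{
    return sink_write(s, buf, len);
}
```

Given a 10-byte payload whose eighth byte is zero (say, the bytes G I F 8 9 a 0x10 0x00 0x20 0x00), write_with_strlen() sends only the first 7 bytes while write_with_length() sends all 10. The receiver of the short version is left waiting for bytes that never arrive.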
Here are some patches for nsvhr.c and nsd/conn.c.
I haven't done a lot of the spit and polish yet, like handling page not found errors, and fixing things when the back-end doesn't respond. I haven't had any problems with the back-end locking up, so I haven't been too worried. I also haven't done anything about SSL / https yet. I don't think SSL can be reverse-proxied like this. My plans currently only include one site that needs SSL, and it's not ready for prime time yet, so for me it's not too much of an issue.
Finding the SockWrite() problem took a while. The first thing I changed was the socktimeout parameter for nssock, since 30 seconds wasn't long enough for big amounts of data to make the trip from computer to DSL modem to front-end to back-end. A higher timeout (240 seconds) fixed that.
Kevin's next attempt at uploading stuff gave the "Error writing content: resource temporarily unavailable" error in ns_conncptofp, after about 4 minutes of uploading. WTF was going on?
So in trying it myself, the upload of an image would just hang. I didn't know if it was due to the new satellite internet system we have which replaced a very bad ISP (avoid Alltel for any of your business dealings), or some weird Mac+Mozilla issue.
I wanted to see what traffic was happening on each of the servers (the front-end and the badgertronics back-end). Was anyone receiving any data? Only part of it? Was some stuck in the queue on my end? After a bit of poking around in the code I discovered that Ns_SockRecv() in nsd/sock.c is the call that does reading on the sockets. I stuck in this right before the return:

    {
        int i;
        int blah;
        unsigned char saved;

        /* index of the byte to overwrite with a temporary terminator */
        blah = (nread == toread) ? nread - 1 : nread;
        saved = ((unsigned char*)buf)[blah];
        ((unsigned char*)buf)[blah] = 0;

        /* one log line per byte: index, hex value, character */
        for (i = 0; i < nread; i++) {
            Ns_Log (Notice, "TRACE: %d: %x : %c", i, ((char*)buf)[i], ((unsigned char*)buf)[i]);
        }

        /* the whole block as a string, up to the first zero byte */
        Ns_Log (Notice, "TRACE2: %s", (unsigned char*)buf);

        /* put the original byte back */
        ((unsigned char*)buf)[blah] = saved;
    }

This prints out each byte, as well as the whole block read. I stuck in a zero byte to terminate the printing, making sure to restore it when I was done.
The front-end was getting all of the data, so it wasn't sitting in
some modem buffer. The back-end was getting all the header data, but
none of the image data. It was stopping right after the MIME header
in the HTTP POST for the image which declared the rest of the data to
be a GIF image. I wasn't sure why it would be doing that. Maybe
something was looking at the data and only reading header information.
Looking closer at the output, there was a zero byte there. How suspicious. I used emacs to change all of the zero bytes to 1 bytes in a gif file and uploaded that. Almost instantly I got a "this is a corrupt image" from the identify program, so it definitely wasn't connectivity issues on my end. The zero byte is the culprit. That means someone is assuming the data is pure text and nothing binary. There's probably a call to strlen() somewhere in the process.
Working backwards from ns_conncptofp, everything looked like it was doing the Right Thing on the back-end, everyone passing lengths of buffers around. So the back-end was doing everything right. Looking at the nsvhr code, I found SockWrite() doing a strlen(), and that gets called in the TCP path of the virtual hosting. The call to strlen() was unnecessary since we could easily get the length of the data to be sent to the back end. Explicitly passing in a length fixed the problem, plus made it a hair more efficient, since strlen() is an O(N) operation, having to walk over every byte in the string.
So why the "Error writing content: resource temporarily unavailable" error in ns_conncptofp? That particular error is coming from NsTclWriteContentCmd, and the actual error is happening on the reading from the socket, rather than the writing to the file, so the error is misleading. The "resource temporarily unavailable" is simply the EAGAIN errno, meaning something timed out, try again. In this case, the timeout is waiting for more data from the front-end, which isn't coming since it stopped at the zero byte. The timeout happens, and EAGAIN is returned in errno.
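
For readers who haven't run into EAGAIN before, here is a small POSIX sketch of one way it shows up: a read on a descriptor that has no data available right now. It uses a non-blocking pipe rather than AOLserver's socket code, which is an assumption made to keep the example short, and errno_from_empty_read() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read from an empty, non-blocking pipe and return the errno that results.
 * Returns 0 if the read unexpectedly succeeds, or the errno of a setup
 * failure if the pipe could not be prepared. */
static int errno_from_empty_read(void)
{
    int fds[2];
    char buf[16];
    int flags;
    int result = 0;

    if (pipe(fds) != 0) {
        return errno;
    }
    flags = fcntl(fds[0], F_GETFL, 0);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        result = errno;
    } else if (read(fds[0], buf, sizeof(buf)) == -1) {
        /* The write end is still open but nothing was written,
         * so there is no data and no end-of-file either. */
        result = errno;
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The helper returns EAGAIN, the same errno value behind the "resource temporarily unavailable" message. The sender is still connected and nothing is broken on the reading side. There is simply nothing to read.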
And why was Kevin able to upload photos previously? He did those while I was still struggling with nsunix as the transport mechanism between the front-end and back-end. This was the first time he'd uploaded photos since I made the switchover.