• 28 July 2013 19:21:14

    Cluster move complete, a few bugs to fix

    posted by Dark0ne Game News
    We successfully completed the move of our databases over to a new centralised cluster yesterday, with the final site, Fallout 3 Nexus, moved in the afternoon. Moving the databases off the same servers that actually serve the site’s pages has freed up resources on those servers that can go towards handling more concurrent connections, and as such I’m hoping you’re finding the browsing experience on the sites this weekend, especially on Skyrim Nexus, is much more satisfactory.

    There are currently a few known bugs on the sites that we’ll be working on sorting over the next week. You do not need to report these errors to us as we already know about them:

    • Problems with people either being unable to log in or being unable to stay logged in through page loads. Generally what happens is you enter your username and password, click the submit button, the main page loads but you’re still not logged in. No error messages are shown. We’re not sure why this isn’t working for some people while it’s working for the majority, but we’ll look into fixing that first.
    • People have reported that the file servers sometimes report themselves as being overloaded or “struggling” despite other people downloading the same file perfectly fine. We assume that any NMM downloading errors are of a similar ilk for the same people (e.g. if manually downloading fails, so does downloading through NMM).
    • Issues with the Image Share not posting images and not linking to the correct forum topics.

    If you’re experiencing a problem that’s not on that list then by all means report it.

    While I’m very pleased that the cluster has worked out well, I’m still not happy. I want to be in a position where there’s not one single point of failure for the sites, so if a server goes down the sites still run perfectly. I also want to be in a position where increasing capacity is as simple as buying new servers. The old system wasn’t that simple, and I want a system that is that simple. Right now we’re serving about 700 concurrent connections/second to the sites (not the file servers, they do loads more than that) and we’re hitting that limit regularly. I want to be able to serve twice that much. We’ve spent over £33,000 ($50,000) on the database cluster and I’ve now laid out a further £18,000 ($28,000) on several static content servers that will serve all the images and static pages, and a further £40,000 ($61,000) on moving all our page serving into a specially developed cloud server system utilising 48 CPU cores and 192GB of RAM across 6 virtual servers, load balanced by two extra virtual servers to ensure redundancy. On top of that we’ve got to continue paying the bills on the old servers until everything is moved over, so we’re essentially paying double the amount it normally costs to keep things going from a hardware standpoint. Don’t get me started on bandwidth or the file servers, where we’ve got another 3 file servers on order at the moment. When you wonder where the Premium Membership money goes, this is where it goes.

    I’m happy with how things have gone so far and I’m looking forward to a point where we can stop dealing with making the sites run properly and get on with making the sites better and expanding the network. Frankly, if I were to have another year like this current one I’d seriously consider throwing in the towel. So fingers very much crossed.
