• 29 December 2013 10:50:35

    End of year network review

    posted by Dark0ne Site News
    At the start of the year I outlined my plans for 2013 with our focus this year on improving the stability and reliability of the service on offer. Now, quite clearly, this hasn't exactly gone to plan. We've hit snag after snag along the way, which has caused massive delays, dampened spirits and made for a rather sorry year for us. We have the most fun when we're coding exciting new features and updates to the sites that you can actually use. We do not have fun working on behind-the-scenes stuff, and it's made worse when it doesn't actually work out, but it really needs to be done.

    We started the year with a hefty sum of money to invest that had been saved up from before Skyrim was launched, primarily from Premium Membership fees. I'd been saving this money over the course of a few years so we could roll out a whole new server setup that would not only provide us with the power we needed but also the quick expandability necessary to deal with the ever-increasing traffic and load placed on the servers. This was most definitely the right course of action and I have no regrets on that decision at all. Look at it like you would the gears on your car. In second gear you can only reach a certain speed before you start redlining; you can't go any faster without switching to another gear. Problem is, when you've only got a 2-speed gearbox with no potential to go to third gear, the only thing you can do is rip out the old one and install a new 7-speed gearbox. And that's what we did. We installed our new gearbox, switched into third gear and opened the way to fourth, fifth, sixth and seventh gears in case we need them in the future. What we didn't take into account was just how hard it would be to switch gears and how many unforeseen circumstances you can actually run into.

    We ordered the database cluster servers at the beginning of February and received them at the beginning of March. We initially thought it would only be a matter of weeks to get things all set up and have the sites moved over to the new database cluster. Three months later, at the end of May, we thought we had it, but after a straight 48 hours of work we couldn't get it to work properly and had to accept defeat. At that point we paid a rather hefty sum of money for two separate professional consultants to come in and take a look at the setup. It wasn't until the end of July that we finally managed to get our database cluster set up and the sites moved over to it, 5 months after we'd originally started working on a project we thought would take 2-3 weeks. In September I proceeded to buy a further 2 cluster servers, taking the setup to 5 servers, each running 96GB of RAM and dual processors, for a combined 480GB of RAM and 80 CPU cores running at 2.1GHz (or a hypothetical 160 CPU cores with threading).
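
    As a quick sanity check on those cluster totals, here is a minimal Python sketch. The figure of 8 cores per processor is an assumption inferred from the quoted 80-core total (it isn't stated above), as is the two-threads-per-core figure behind the "hypothetical 160". The variable names are purely illustrative.

```python
# Figures quoted in the post.
servers = 5
ram_per_server_gb = 96
processors_per_server = 2  # "dual processors"

# Assumption: 8 cores per processor, inferred from the 80-core total.
cores_per_processor = 8
# Assumption: two hardware threads per core ("with threading").
threads_per_core = 2

total_ram_gb = servers * ram_per_server_gb
total_cores = servers * processors_per_server * cores_per_processor
total_threads = total_cores * threads_per_core

print(f"RAM: {total_ram_gb}GB, cores: {total_cores}, threads: {total_threads}")
```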

    The move to our database cluster helped to remove one issue only to highlight another major issue that needed to be rectified. While we could now serve the database requests, we were struggling with a bottleneck with the HTTP requests. Thankfully we'd already earmarked funds for a move to a Cloud setup that would form the basis of our expandability into the future. Towards the end of July I was in discussions with our server provider to have a Cloud setup specially requisitioned for us. They'd need to build it from the ground up for us, so we weren't actually given the keys until the middle of October. All the while the sites on the old setup were on their last legs. Well, they weren't really standing up so much as they were spluttering blood all over the place with multiple puncture wounds in this analogy. Once we got the keys we needed to run extensive tests and mock runs and get it ready for an eventual move over, which didn't occur until the start of December to coincide with our centralisation of the sites. By centralising the sites we managed to make our future jobs a lot easier. No longer were we dealing with 20 different databases for 20 different sites on multiple separate servers; we were now dealing with 1 database with 1 site on a distributed, but for all intents and purposes centralised, server setup. This simplifies things drastically, with the downside being that the problems that were originally localised mainly to just Skyrim Nexus were now problems for all the sites.

    Traffic is at an all-time high, as usual...

    If we were still on the old setup then you wouldn't be reading this right now, even if it did take you 5 page reloads and 20 minutes to reach this page. The old setup would simply have been incapable of handling this Christmas/Steam sale traffic. So you can picture this situation like a dramatic Indiana Jones scene if you wish: there's Harrison Ford (the Nexus sites in this analogy), stood on a crumbling platform about to collapse at any second. With a crack of his whip he hooks on to a low dangling tree branch and swings on to a new platform. Albeit this platform still doesn't seem too stable, but it's a darn sight better than the previous platform he was stood on, which has now fallen into a seemingly bottomless chasm. And this is where we find ourselves at the end of 2013: stood on a precarious platform right now, but the right platform, from which we need to build and solidify our position.

    The situation is extremely infuriating for us. I've said it before and I'll say it again, we know it's frustrating for you when you can't download your mods or update your file pages but my god, try working 18-20 hour days trying to sort it out in the background all the while trying to answer people's ferocious questions and trying to remain calm. I can't do it and I blow my lid sometimes. It is infuriating to have spent this much money and time on an issue that still isn't resolved. But it will be. It's not a matter of if, it's a matter of when.

    As users you get understandably upset. Let's do a quick Q&A on the regularly asked questions on this topic:

    Q: Isn't it about time you spent more money to fix this issue?

    We've spent close to £150,000 ($250,000) on sorting out our stability issues this year. Aside from the fact it's no longer about the amount of money that needs to be spent but what we need to do with what's already been bought, do you have a spare extra £150,000 lying around that we can use? Please? No? Didn't think so (if you do, by all means get in contact. I can offer you a bag of Nik Naks and half of a Panettone as payment).

    Q: It was way better before. Why not just go back?

    If you're talking from a Skyrim perspective: It wasn't. Your memory has failed you.

    If you're talking from any other Nexus site: it probably was, although it wouldn't be right now in the midst of all the Christmas traffic. You've been lumped in with the same lot as Skyrim now. Your fate is linked to Skyrim's as much as Skyrim's is to yours. It's a necessary temporary down-side in the interest of future stability.

    Q: Why do you keep performing "maintenance" at peak times?

    We put the sites into "maintenance mode" to give the servers time to catch their breath and recover. If we put the sites into maintenance mode they'll recover 90% of the time within 1 minute. If we don't then the problem will remain the same, or get worse, for hours.

    Q: Why don't you work on fixing the issues rather than releasing new NMM versions/updating the sites/eating/sleeping/leaving the house?

    I think by the 20,000+ words written on this very topic this year alone, and indeed the regular maintenance updates and redirects, you can in fact see that work is being done. However, of the 5 people (including myself) working on the sites, only 2 of us can help in this field.

    Telling us to not release an NMM update until the site stability issues are rectified would be like telling a UI designer at Microsoft to not release any fixes or updates to the Windows UI until they've sorted out all their security bugs. The UI designer doesn't work on that aspect of Windows, he doesn't know anything about it and he can't be roped in to help. Telling him to stop working until the bugs on a completely different aspect of the program are sorted would be dumb. And a waste of money.

    I'd like to have another full-time person on board to be able to handle this side of things but there are a few barriers in this regard. Money. The time it would take to hire the person. The time it would take to bring that person up to speed. All negatives for issues we need resolved right now, and not in 3-6 months' time.

    Q: I haven't been able to log in to NMM since you updated the sites at the beginning of December, what gives?

    So you found the forums and the "new topic" or "reply" buttons but you didn't notice the multiple news posts and indeed the 1,000 other threads and posts about this issue raised in the past week? Sometimes you really need to help yourself. Or do a search! You need to update to version 0.46.0 of NMM. Download it from the site and simply install it over your current NMM installation, making sure your folder locations are the same as your old NMM install. If you do this you won't lose any of your mods.

    Having said that, we are going to make documenting things in NMM a bit better over the next year.

    So what has 2014 got in store? Well hopefully we can resolve these issues once and for all very soon and get back to doing what we enjoy doing. I don't think I need to say anything more on that topic.

    At the start of the new year we're going to be decommissioning all our current file servers as they're close to their space capacity, replacing the current 19 file servers (15 for everyone and 4 for Premium Members) with 23 brand new ones (20 for everyone and 3 for Premium Members). Why a drop for Premium Members? A couple of reasons. We've bought Premium servers in various locations across the globe to act as alternatives to the normal servers. These haven't worked out as well as I'd hoped and they're often unreliable for people. I'll be replacing them with some top quality stuff in Dallas, Washington and Amsterdam that will cost twice as much. I want Premium Members to realise that the real bonus on the downloading front isn't the Premium-only servers but instead is the speed cap being removed and the ability to multi-thread your downloads in NMM. Some Premium Members come undone because they will only use the Premium-only servers when the normal servers are equally or oftentimes better suited for their connection.

    We haven't touched the Image Share section in quite some time. I want to overhaul many aspects of the Image Share section, implement some updated features like galleries, and remove those horrible image pages some users make that scroll on and on into infinity as the author has added their entire back catalogue into the image description. We'll replace it with a more suitable system that hopefully won't diminish what some people try to do with their descriptions but instead simply make it a lot more aesthetically pleasing and manageable for users. I'll begin consulting with certain users from the Image Share soon, likely to be ones that cause as little drama as possible. No need to get in contact with me, I'll get in contact with you if I'm interested. Image Share drama is the worst drama on the Nexus and I try to steer well clear of it as much as possible.

    I also want to begin work on a Videos section, much like the Image Share section but for YouTubers to showcase their work on the Nexus. I've no interest in hosting people's videos so this system will link straight into YouTube's API system. I think it'd be great to have all the top Skyrim mod video authors, for example, having their own channels on the Nexus from which they can easily showcase their work and the work of the mod authors and you can easily keep up to date with all of them. It won't take away from their subscriber base, it will simply augment it, and best of all for us it won't cost us any valuable bandwidth or server resources.

    On the NMM front we're still working on the 0.50 update with its profiling features. We want to get it right, and we also want to create a full backup system for users so that they can revert to a legacy system if needed. We've managed to gather valuable feedback from the people using the 0.50 alpha but what we cannot ascertain is how successful it's been for users as a percentage. We don't know how many people it hasn't worked for, e.g. how many people couldn't update to 0.50. We want to create a version of NMM that we'll likely dub NMM Legacy. This will be a version of NMM that we shall feature-freeze at 0.46.0 forever but that we'll always ensure is able to access the NMM web services (for logging in/downloading/mod version checking). It's our hope that each time we release a major update to NMM we'll ask you if you want to back up your current copy of NMM. If the update fails for any reason then your backup will work with NMM Legacy, so you won't have lost anything.

    We've also got that NMM design overhaul to look into. I'll be writing up a blog post about NMM soon as it seems some people are upset we're still in Beta after a couple of years. I'll set the record straight on that one because we're not coming out of Beta any time soon.

    I've absolutely no idea how long the stabilisation of our services is going to take. I'm not even going to hazard a guess because each time I do, each time I assume we're close, we hit another snag. We've now got a huge backlog of stuff to do after the centralisation, however, so I hope we can get some updates out thick and fast soon.

    I hope your Christmas, New Year and indeed your entire year have not been as stressful (or expensive) as ours. I hope that 2014 brings us all more success.
