Tuesday, February 23, 2010

Installing ejabberd on Ubuntu

I recently installed an ejabberd server on an Ubuntu box. Thanks to this nice document, the process was pretty straightforward. My experience was a little different from the author's, so I want to show the exact steps I took to make it work; maybe they will be helpful for you too.

The first step is to install the required package. You can use Synaptic Package Manager or just the command line:

sudo apt-get install ejabberd

During the installation a new user, ejabberd, is created in the system. This is the user the server runs as. When the installation finishes, the ejabberd server is started automatically. To configure the server you need to stop it first:

sudo /etc/init.d/ejabberd stop

The next step is to configure the administrator and the hosts. Open the /etc/ejabberd/ejabberd.cfg file for editing and make the following changes:

%% Admin user
{acl, admin, {user, "andrey", "jabber.ndpar.com"}}.
%% Hostname
{hosts, ["localhost", "ubuntu", "jabber.ndpar.com"]}.

For the admin you need to specify the user name and domain name that you want to use as a Jabber ID. The default domain is localhost, which works, but it's better to change it to something meaningful. The list of hostnames is tricky. In theory it's enough to list just localhost, but in practice that didn't work for me. After digging into some Erlang exceptions I got while registering the admin account (see the next step), I came to the conclusion that the list must contain the short hostname of the box. You can get it by running the hostname -s command (in my case it was "ubuntu"). You can add any other hostnames you like, but the short one is mandatory.

When you are done editing, start the server:

sudo /etc/init.d/ejabberd start

Now it's time to register the admin user we configured in the previous step. Run the following command, replacing the password placeholder with the actual password and using the user name and domain name from the ejabberd.cfg file:

sudo ejabberdctl register andrey jabber.ndpar.com xxxxxx

That's it! You now have a working XMPP server with one registered user. To verify that everything is OK, open the admin page of the server (http://jabber.ndpar.com:5280/admin) in your browser and check the statistics. You'll be asked for your JID and password, so use the information you entered in the previous step.
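
If a browser is not at hand, a quick TCP check can at least tell whether the server is listening. The following Java sketch is only an illustration: the class and method names are made up for this example, port 5222 is assumed to be the default XMPP client port (only 5280 comes from the steps above), and an open port does not prove that ejabberd is configured correctly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or the host name could not be resolved
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the server's host name as the first argument; defaults to localhost
        String host = args.length > 0 ? args[0] : "localhost";
        // 5280 is the admin page port used above; 5222 is an assumption (XMPP client port)
        for (int port : new int[] {5222, 5280}) {
            System.out.println(host + ":" + port + " listening: " + isListening(host, port, 2000));
        }
    }
}
```

Run it with the domain name from ejabberd.cfg as the argument, for example jabber.ndpar.com.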

As a note, I didn't create my own SSL certificate because for an isolated intranet the default one is quite enough. If you are not comfortable with that, feel free to create a new certificate following the steps from the original article.

Now you are ready to add the newly created account to your Jabber client. In Adium, for example, go to File -> Add Account -> Jabber and provide the server hostname/IP, JID and password.

Click the OK button, accept the security certificate permanently and go online.

Now, to really enjoy IM you need more users on your server. The best part is that you can create new users right from your Jabber client. You can actually do many things from the client without having to ssh to the remote server and run commands there. Just go to File -> your ejabberd account and choose whatever you need from the menu.

Pretty cool, eh — client and admin tool in one place.

Wednesday, February 10, 2010

Multithreaded XmlSlurper

Groovy's XmlSlurper is a nice tool for parsing XML documents, mostly because of the elegant GPath dot-notation. But how efficient is XmlSlurper when it comes to parsing thousands of XML documents per second? Let's run a simple test:

import org.junit.Test

class XmlParserTest {

    static int iterations = 1000

    def xml = """
        <root>
            <node1 aName='aValue'>
                <node1.1 aName='aValue'>1.1</node1.1>
                <node1.2 aName='aValue'>1.2</node1.2>
                <node1.3 aName='aValue'>1.3</node1.3>
            </node1>
            <node2 aName='aValue'>
                <node2.1 aName='aValue'>2.1</node2.1>
                <node2.2 aName='aValue'>2.2</node2.2>
                <node2.3 aName='aValue'>2.3</node2.3>
            </node2>
            <nodeN aName='aValue'>
                <nodeN.1 aName='aValue'>N.1</nodeN.1>
                <nodeN.2 aName='aValue'>N.2</nodeN.2>
                <nodeN.3 aName='aValue'>N.3</nodeN.3>
            </nodeN>
        </root>
    """

    // Parses the same document `iterations` times, creating a new XmlSlurper each time
    def parseSequential() {
        iterations.times {
            def root = new XmlSlurper().parseText(xml)
            assert 'aValue' == root.node1.@aName.toString()
        }
    }

    @Test void testSequentialXmlParsing() {
        long start = System.currentTimeMillis()
        parseSequential()
        long stop = System.currentTimeMillis()
        println "${iterations} XML documents parsed sequentially in ${stop - start} ms"
    }
}

I ran this test on my 4-core machine and got:

1000 XML documents parsed sequentially in 984 ms

Not really good (0.984 ms per document), but we didn't expect much from a single-threaded application. Let's parallelize the process:

class XmlParserTest {
    ...
    static int threadCount = 5
    ...
    @Test void testParallelXmlParsing() {
        def threads = []
        long start = System.currentTimeMillis()
        // Each thread parses `iterations` documents on its own
        threadCount.times {
            threads << Thread.start { parseSequential() }
        }
        // Wait for all threads before stopping the clock
        threads.each { it.join() }
        long stop = System.currentTimeMillis()
        println "${threadCount * iterations} XML documents parsed parallelly by ${threadCount} threads in ${stop - start} ms"
    }
}

And the result is

5000 XML documents parsed parallelly by 5 threads in 1750 ms

This is definitely better (0.35 ms per document), but it doesn't look like truly parallel processing: with real parallelism the total time should stay close to that of the sequential run instead of almost doubling.

The problem here is the default constructor of XmlSlurper. It does too much: first, it initializes an XML parser factory, which loads a bunch of classes; second, it creates a new XML parser, which is quite an expensive operation. Now imagine this happening a thousand times per second.
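
To get a feeling for that cost outside of Groovy, here is a rough Java sketch using plain JAXP. It assumes that the work of the default constructor boils down to SAXParserFactory.newInstance() plus newSAXParser() for every document, as described above. The class name, method names and sample XML are made up for illustration, and the timings depend on the JVM and on warm-up, so treat them as indicative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ParserSetupCost {
    static final byte[] XML =
        "<root><node1 aName='aValue'>1</node1></root>".getBytes(StandardCharsets.UTF_8);

    // Counts start-element events so that both variants can be checked for equal results
    static final class Counter extends DefaultHandler {
        int elements;
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Creates a new factory and a new parser for every document
    static int parseWithFreshParsers(int documents) throws Exception {
        Counter counter = new Counter();
        for (int i = 0; i < documents; i++) {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(XML), counter);
        }
        return counter.elements;
    }

    // Creates one parser up front and reuses it, one document at a time
    static int parseWithOneParser(int documents) throws Exception {
        Counter counter = new Counter();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        for (int i = 0; i < documents; i++) {
            parser.parse(new ByteArrayInputStream(XML), counter);
            parser.reset(); // restore the parser's original configuration before reuse
        }
        return counter.elements;
    }

    public static void main(String[] args) throws Exception {
        int documents = 1000;
        long t0 = System.nanoTime();
        parseWithFreshParsers(documents);
        long t1 = System.nanoTime();
        parseWithOneParser(documents);
        long t2 = System.nanoTime();
        System.out.println("fresh parser per document: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("one reused parser:         " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```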

Luckily, XmlSlurper has another constructor that takes an XML parser as a parameter, so we can create the parser up front and pass it to the slurper. Unfortunately, we cannot share one parser instance between slurpers running in different threads, because an XML parser is not thread-safe: you have to finish parsing one document before you can use the same parser for another.

The solution is to use a preconfigured pool of parsers. Let's create one based on the Apache commons-pool library.

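// XmlParserPoolableObjectFactory.java
import javax.xml.parsers.SAXParserFactory;
import org.apache.commons.pool.PoolableObjectFactory;

public class XmlParserPoolableObjectFactory implements PoolableObjectFactory {
    private SAXParserFactory parserFactory;

    public XmlParserPoolableObjectFactory() {
        // The factory is created once and reused for every pooled parser
        parserFactory = SAXParserFactory.newInstance();
    }
    public Object makeObject() throws Exception {
        return parserFactory.newSAXParser();
    }
    public boolean validateObject(Object obj) {
        return true;
    }
    // The remaining PoolableObjectFactory callbacks are left empty
    public void activateObject(Object obj) {}
    public void passivateObject(Object obj) {}
    public void destroyObject(Object obj) {}
}

// XmlParserPool.java
import org.apache.commons.pool.impl.GenericObjectPool;

public class XmlParserPool {
    private final GenericObjectPool pool;

    public XmlParserPool(int maxActive) {
        // Block the caller when all parsers are borrowed
        pool = new GenericObjectPool(new XmlParserPoolableObjectFactory(), maxActive,
                GenericObjectPool.WHEN_EXHAUSTED_BLOCK, 0);
    }
    public Object borrowObject() throws Exception {
        return pool.borrowObject();
    }
    public void returnObject(Object obj) throws Exception {
        pool.returnObject(obj);
    }
}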

Now we can change our test:

class XmlParserTest {
    static XmlParserPool parserPool = new XmlParserPool(1000)
    ...
    def parseSequential() {
        iterations.times {
            def parser = parserPool.borrowObject()
            def root
            try {
                root = new XmlSlurper(parser).parseText(xml)
            } finally {
                // Return the parser even if parsing fails, so the pool is not drained
                parserPool.returnObject(parser)
            }
            assert 'aValue' == root.node1.@aName.toString()
        }
    }
}

and run it again

1000 XML documents parsed sequentially in 203 ms
5000 XML documents parsed parallelly by 5 threads in 172 ms

That's much better (0.034 ms per document), and most importantly multi-threading really works now.
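
The borrow/return idea does not depend on commons-pool. As a minimal sketch, assuming a fixed number of parsers created up front, a similar blocking pool can be built on a JDK BlockingQueue. SimpleParserPool, borrow, giveBack and parseInParallel are hypothetical names for this example, not part of any library, and this is not the pool used for the measurements above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class SimpleParserPool {
    private final BlockingQueue<SAXParser> idle;

    // Creates all parsers up front, on the calling thread
    SimpleParserPool(int size) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.newSAXParser());
        }
    }

    // Blocks until a parser is free, so no parser is ever used by two threads at once
    SAXParser borrow() throws InterruptedException {
        return idle.take();
    }

    // Resets the parser and puts it back into the pool
    void giveBack(SAXParser parser) {
        parser.reset();
        idle.add(parser);
    }

    // Parses `documents` documents on each of `threads` threads; returns the number parsed
    static int parseInParallel(SimpleParserPool pool, int threads, int documents)
            throws InterruptedException {
        byte[] xml = "<root><node1 aName='aValue'/></root>".getBytes(StandardCharsets.UTF_8);
        AtomicInteger parsed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < documents; i++) {
                    SAXParser parser = null;
                    try {
                        parser = pool.borrow();
                        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
                        parsed.incrementAndGet();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    } finally {
                        // Always return the parser, even when parsing failed
                        if (parser != null) {
                            pool.giveBack(parser);
                        }
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return parsed.get();
    }

    public static void main(String[] args) throws Exception {
        SimpleParserPool pool = new SimpleParserPool(5);
        System.out.println(parseInParallel(pool, 5, 1000) + " documents parsed");
    }
}
```

A pool smaller than the number of threads still works; the extra threads simply wait in borrow() until a parser is given back.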

Resources

• Source code for this blog

• Article "Improve performance in your XML applications"

• GPath vs XPath

• commons-pool home page