If you ever worked with a big enough distributed system, you know that at some point you need to test how the system works with a large amount of traffic before deploying it to production.

One of the methods you can use to get more confident with your system is to simulate traffic on one end of the system and gather as much information as you can in order to understand what is happening.

Recently, when developing one of these systems, we needed to build a simulation that could test the system under a load of 1000 requests/minute.

The problem is that in order to do this, we shouldn't just generate 1000 events, sleep for a minute, and then generate 1000 more events. This behavior doesn't emulate the behavior of users in production at all and the system would probably perform in a very different way. What we need is a way to distribute those 1000 events over a minute, preferably in a way that resembles production users behavior.

One simple way to do this is model the users behavior as Poisson process. This way, the events will not be uniformly distributed over a minute, instead, they will follow a Poisson distribution over time.

We were developing this system in ruby, so I will explain how you can generate events that simulate a Poisson process using ruby and then check the results of observe to confirm they are close to what you would expect.

Generating events

I tried two methods to generate events modeled by a Poisson process.

Method 1: Sleeping between events

If a process can be modeled using a Poisson Process, the probability distribution of the waiting time until the next occurrence of an event is an exponential distribution 1.

This means that one way to model this system is to generate an event, sleep until the next event should be generated and then start over.

We can determine the time our process needs to sleep until the next event using the following code:

def sleepFor(rateParameter)
  -Math.log(1.0 - Random.rand) / rateParameter
end

Where rate parameter is the number of events that should happen in each unit of time. In our case, 1000 events per minute, that is about 17 events per second. The rate parameter is then our λ for our Poisson Process.

Jeff Presshing wrote a very good blog post that explains this rational in more detail.

So with our sleepFor function in place, to generate events in your system, we just need the following code:

require 'socket'

def sleepFor(rateParameter)
  -Math.log(1.0 - Random.rand) / rateParameter
end

def generate_event
  c = UDPSocket.new
  c.send('hello', 0, 'localhost', 6767)
  c.close
end

20000.times do
  generate_event

  sleep_for = sleepFor(17) # lambda = 17
  sleep(sleep_for)
end

Problem: Does sleep have enough granularity?

For a small rateParameter, this works as expected, see below to learn how you can confirm this. For a big number of events events per second (a big λ), the sleep function does not have sufficient granularity. This means that sleep does not actually sleep for enough time, so we need to use a different approach.

Method 2: Constantly check if an event should be generated

Another approach would be to not use sleep at all. Start an infinite loop and constantly check if we should generate an event in the current iteration. For example:

require 'socket'

LAMBDA = 17

def f_next_time(lambda)
  -Math.log(1 - Random.rand) / lambda
end

def generate_event
  c = UDPSocket.new
  c.send('hello', 0, 'localhost', 6767)
  c.close
end

start = Time.now.to_i
next_time = 0

2000.times do
  time = Time.now.to_f

  if time >= next_time
      generate_event
      next_time = time + f_next_time(LAMBDA)
  end
end

This solution is more CPU intensive, since it's executing an infinite loop checking if it should generate an event, but the results should be more precise, especially for a bigger lambda value.

Caveats

Get as much information about your system as much as you can

Your system is probably more complex than this, so you should measure and get as much information as possible in order to ensure that you are generating enough events and that those events are properly distributed in each second.

Time required to generate an event

In my example, the time to generate an event is actually negligible, but if your events take more time to generate than the inter-arrival time of the events, this will not work! Make your event generation is fast enough or try to find another solutions.

Observing events and saving data

To check if the generated events are following our Poisson process, we need to collect those events and group them by the second in which they happened, then we can analyze the results.

This can be done with the following code.

require 'json'
require 'socket'

def start_server port
  s = UDPSocket.new
  s.bind(nil, port)
  data = Hash.new(0)
  start = nil

  Signal.trap('SIGINT') do
    File.open('poisson.json','w') do |f|
      f.write(data.to_json)
    end

    puts data

    exit
  end

  while true
    msg, sender = s.recvfrom(256)
    start = Time.now.to_f if start.nil?
    bucket = (Time.now.to_f - start).round.to_i

    data[bucket] += 1
  end
end

start_server(6767)

Using this code, when the server is interrupted, using SIGINT (CTRL-C), the results will be saved to a json file than can then be analyzed.

Analyzing results

With the results collected using the above scripts, we can easily analyze the number of events generated in each second and check if they are distributed in time as we expect them to.

I used Python with matplotlib to process and visualize the results.

λ = 17 Events/Second

Since the results are already grouped by arrival second, we can plot the number of events that happened in each second.

We can then generate enough random samples using a Poisson distribution with λ = 17 and plot this data against our results.

If the plots are similar enough, then the events were generated following a Poisson process.

There are better and more exact ways to check this, for my problem this is just good enough. You should use whatever method you feel comfortable with.

With λ = 17 and using method 2, I got the following results:

lambda = 17

As you can see, the standard deviation and mean are very close to the values you would expected (between parenthesis) and plotted data seems to also match our expectations.

Conclusion

If you are trying to simulate traffic into your system don't just throw traffic sequentially into it. Try to find a way to simulate production traffic. Even if you end up not using a perfect model of expected traffic, using some kind of model is always better than just generate uniformly distributed events.

If you want to try to model your production traffic using a Poisson process, use method 2 describe above if you don't mind the CPU intensive algorithm or use method 1 if your λ is small enough.

Don't forget to always check your results against or expectations.



comments powered by Disqus