Computer Science Department, KSU
Source Code of Traffic Generation Module
Source Traffic Modeling and Generation is a subproject of
project - “Research and Development of IP Packet Processors”. Its objective is
to simulate various Internet traffic and generate IP packet flow at runtime in
order to test the functions and performance of a router application, which is
going to be implemented by other components of main project. This technical
report will discuss the characteristics of Internet traffic, some important
distributions used to describe different aspects of traffic, and several
effective ways to simulate Internet traffic. The algorithms of traffic
generation are given in report.
Source traffic modeling and
generation is a very important aspect of research about network communication.
It is a starting point and somehow a basis of network traffic processing.
First, an
accurate estimation to the network traffic is the basis to handle/control the
traffic. This is the task of traffic modeling. A good estimation to network
traffic (PDF, CDF, mean, variance, persistence of clustering, heavy tail, etc)
helps to solve the questions in the design of router’s algorithm. For example,
which parameter is crucial and is easy to get, how to handle short-term data
burst? It is also an important factor to
the resource configuration of routers and switches. For example, how large the
buffer should be for each communication channel?
Second,
traffic generation provides a powerful and flexible tool to test all kinds of
service designed for network communication. Developing such a tool is the
primary goal of this subproject.
Many works
have been done in this area and the approaches cover every aspects of Internet
traffic. For examples, San-gi Li and his group developed SMAQ for traffic
modeling and queuing analysis [8]. P. Tran-Gia discussed traffic modeling of
Wireless application [10]. While IPv6 has not yet been widely used, Liren Zhang
and others have shown their interest on IPv6 traffic and its QoS performance
[3].
Although it
is hard to dig deep into most popular questions about traffic modeling and
generation with this in-course project, we still do our best to discuss some
characteristics of Internet traffic, some important distributions used to
describe different aspects of traffic, and several effective approaches to generate
Internet traffic in following part of this technical report.
The purpose
of traffic modeling is to find out the statistical characteristics of various
Internet traffics. Many relative works have been done. Some results and
conclusions are shown in papers [1][2][11];
According to the requirement of our main project, traffic
modeling is less emphasized than traffic generation. But it is still important
because:
To perform
traffic modeling, we use TCPDUMP developed at Lawrence Berkeley National
Laboratory. It can be downloaded through URL address
(ftp://ftp.ee.lbl.gov/tcpdump.tar.Z). TCPDUMP is a powerful tool that allows us
to collect network packets and make some statistical analysis out of those dumps.
Dumped data are saved into flat files for later usage.
With the
data collected by TCPDUMP, some further analysis can be done using TCP-Reduce,
which is a collection of shell scripts for reducing a TCPDUMP trace file to a
summary of the corresponding TCP connections. TCP-Reduce give a list of
identified TCP/IP connections together with the related data. In particular,
the following data are collected:
ü Connection start time
ü Connection duration
ü Higher-level protocol carried
ü Port of connection
ü Number of bytes transferred between
server and client
ü IP Server & Client addresses
ü State the connection ended to
Another
software tool TCPTRACE, which is downloadable from Ohio University, can also be
used to analyze TCPDUMP files and report information such as elapsed time,
bytes and segments sent and received, retransmissions, round trip times, window
advertisements, throughput, and more. It can also produce a number of graphs
for further analysis.
Traffic generation and traffic modeling
are really two sides of a coin. Many important works have been done. Mark W.
Garrett gives anlysis, modeling and generation of self-similar VBR video
traffic [5]. Kai Below discussed the Internet traffic generation for large
simulation scenarios [6]. E. Casilari talked about how to use neural network in
the generation of ATM video traffic. [7]
Out traffic
generation will simulate Internet traffic by randomly generating traffic flow
for different applications. Each flow will contain a train of packets. The flow
could either come from real network traffic log or is generated synthetically
from given probability density functions. In following sections, we will
discuss several different approaches of Internet traffic generation.
3.1 Trace Traffic
The traffic
flow is generated from TCPDUMP trace file, which record the real Internet
traffic in a long enough time interval.
According to the requirement of main project, a flow is
described by 6 important elements:
v
The
number of IP packets in the flow
v
The
number of bytes in the flow
v
Staring
time of the flow (Unix time in ms)
v
Ending
time of the flow (Unix time in ms)
v
Transport
protocol: TCP (6), UDP (17), Others (47, ...)
v
Port
of transport protocol (Optional)
A sample list of flows is as follows:
Packets Bytes Start End Protocol Port
---------------------------------------------------------------
1 1500 3913644961 3913644961 6 0
1 40 3913644961 3913644961 6 0
1 40 3913644961 3913644961 6 0
1 40 3913644961 3913644961 6 0
1 40 3913644961 3913644961 6 0
1 48 3913644961 3913644961 6 0
1 48 3913644961 3913644961 6 0
1 40 3913644961 3913644961 6 32
3 132 3913644961 3913682958 6 0
2 80 3913644961 3913684145 6 0
8 9202 3913644961 3913686568 6 0
4 172 3913644961 3913692399 6 0
30 1200 3913644961 3913702043 6 64
24 33080 3913644961 3913702459 6 0
15 2922 3913644961 3913704792 17 0
1 88 3913644962 3913644962 17 0
1 94 3913644962 3913644962 17 0
---------------------------------------------------------------
Table-1
Obviously,
we can easily generate synthetic traffic flows like above table using
TCP-Reduce on TCPDUMP trace files. The IP packets for each flow can be obtained
separately from same group of TCPDUMP trace files. This work can be done by
either UNIX shell script or specially designed C/C++ code.
For test
purpose of IP packet processor and reducing the memory size of Queue in project
G7, only header information of IP packet is generated. The format of each
packet follows the IPv4 header definition and it is delivered as follows:
0 |
4 |
8 |
|
16 |
19 |
|
|
Version |
IHL |
Type of Service |
Total Length |
||||
Identification |
Flags |
Fragment Offset |
|||||
Time to Live |
Protocol |
Header checksum |
|||||
Source address |
|||||||
Destination address |
To generate
special flow for major applications like TCP/UDP, FTP, HTTP, Email, etc, we can
change the command line parameter and filter expression of TCPDUMP to get particular
flows and packets of flow required by our project.
3.2 Synthetic Traffic
Synthetic traffic is generated from
given Probability Density Functions (PDF). The generation of traffic is divided
into two parts: generation of IP packets, and synthetic flow.
Before we
discuss the detail of traffic generation, we list all of random distributions
and their generation algorithms used in this project. The distributions and
algorithms that generate random variates are based on the course notes – “Commonly used random distributions and
generation algorithms” of Prof. Peyravi
3.2.1 Random Distributions and Generation Algorithms
Binomial Distribution
Usage: Model the number of resource
available; number of packets that reach the destination without loss; number of
bits in error in a packet.
PDF: ‘x’ is the number of successes in a sequence
of n Bermoulli trials
Mean: np Variance:
np(1-p)
Generation
Algorithm:
Algorithm Binomial(n, p)
Begin
C = 0
For (i=0; i<n; i++)
S = rand() / RAND_MAX
If S ≤ p
C = C+1;
Return C;
End
Poisson Distribution
Usage: Model call arrivals, number
of tasks in the system, number of requests to a server, number of failed
components, and message length;
PDF:
‘x’ is the number of successes in a
sequence of Bernoulli trials’
, k = 0, 1, …
Mean: l Variance:
l
Generation
Algorithm:
Algorithm Poisson( l )
Begin
R =
n = 0
initialize array U
U[0] = rand() / RAND_MAX
DO
n = n+1;
U[i] = U[n-1] * rand() /
RAND_MAX
If U[n] ≤ R < U[n-1]
Break;
WHILE (1)
Return n
End
Geometric Distribution
Usage: Model the number of attempts
between successive failure (or successes), the number of packets in a message.
PDF: ‘x’ represents the number of
trials up to including the first success in a sequence of Bernoulli trials. ‘p’
denotes the probability of success, ‘1-p’ is the probability of failure, (0≤p≤1),
and x = 1, 2, …, ¥
Mean: Variance:
Generation
Algorithm:
Algorithm Geometric( p )
Begin
u = rand() / RAND_MAX
Return
End
Normal Distribution
Usage: model the sum of a large
number of independent random variables according to central limit theorem
PDF:
Mean: m Variance:
s2
Generation
Algorithm:
Algorithm Normal(n, m, s
)
Begin
U = rand() / RAND_MAX
For (i=1; i<n; i++) Do
U = U + rand() / RAND_MAX
T = (U – n/2) / sqrt(n/12)
Return (m + sT)
End
Uniform Distribution
Usage:
model the distance between source and destination
PDF:
Mean: (a + b)/2 Variance: (b-a)2/12
Generation
Algorithm:
Algorithm Uniform(a, b)
Begin
U = rand() / RAND_MAX
Return a + (b – a)U
End
Exponential Distribution
Usage:
Model the inter-arrival times between successive events
PDF: ‘l’ is mean arrival rate of events.
Mean: 1/l Variance: 1/l2
Generation
Algorithm:
Algorithm Exponential(l)
Begin
U = rand() / RAND_MAX
Return
End
Pareto Distribution
Usage: Model the
bursty/fractal/self-similar traffic, for example, packet interval time and
burst length
PDF: ‘a’ –
determine the mean and variance of random variables.
Mean and Variance: , a > 1
, a > 2
Generation
Algorithm:
Algorithm Pareto( a )
Begin
U = rand() / RAND_MAX
Return
End
3.2.2 Generation of IP Packets
Before generating traffic data, the
caller of traffic generation module should specify the value of several basic
parameters, which include the starting time of traffic, the duration of
traffic, and the type of traffic.
The IP packet header fields influenced by traffic generation
algorithm are:
·
Type
of Service (TOS)
·
Total
Length
·
Identification
Other fields are designated as follows:
·
Version
¬ 4
·
IHL
(Internet Head Length) ¬ 5
·
Flags
¬ Don’t care
·
Fragment
Offset ¬ Don’t care
·
Time
to Live ¬ 255
·
Protocol
¬ Specified by caller
·
Header
Checksum ¬ Don’t care
·
Source
Address ¬ Specified by caller
·
Destination
Address ¬ Specified by caller
Another
property not showing up in IP packet header is the interarrival times between
packets, which is also generated by our algorithm.
We have
several different approaches considering different situation of Internet
communication. In following discussion,
all approaches of traffic generation use above configurations.
No Error with Short Term Clustering
This type of traffic considers high
quality, error free communication network. It is supposed there is no data
error in packet during transmission and no packet is dropped between two
communicating entities due to network congestion or inappropriate configuration
of Internet route information.
Within the IP packet header, the value of TOS
is always generated through a random variable with Uniform Distribution. Ipv4 define 5 possible values for TOS. So we
always call Uniform(1,
5) to set TOS.
The length
of IP packet usually follows Poisson
distribution. The call to algorithm Poisson(l) will generate length of packet in
bytes – Total Length in IP packet header. Usually, we use same l to
generate all IP packets for one flow. But packets generation for different
flows will probably use different l-value. ‘l’
denotes the average length of IP packet.
Identification
is a number generated in sequence.
If the
clustering of data packet occurs in short term (small time scale), we can assume
the arrival pattern of IP packet is also in Poisson
distribution. Then, the interarrival time between consecutive packets
follows Exponential distribution. We
call Exponential(l) to generate the interarrival time between packets and we
apply the time value to IP packet in following way:
Before any
IP packet of a flow can be delivered to IP packet processor, all packets as
well as their starting times are generated first and saved in a priority queue.
Packets in queue are sorted by their starting times. Given starting time –Tp of one packet, its
successor’s starting time Ts
is calculated as:
When
traffic generation function is called to serve traffics, it can
1)
Deliver
whole priority queue to caller and let caller decide what to do next; OR
2)
Deliver
packets to an output stream specified by caller one by one. The pace and rate
pf delivery is controlled by the starting time of each packet.
Error with Short-Term Clustering
This type of traffic considers the
error/loss of data packet in unreliable network communication. The loss of data
packet may be caused by either network congestion or the quality of
communication channel. Packet could also be rejected by destination due to the
error in data packet.
We assume
the error/loss of packets may not happen at same location where our IP packet
processor runs, but our IP packet processor will observe the phenomenon that
same packet is transmitted repeatedly.
Traffic
generation in this circumstance is similar to error-free network communication
in many aspects except that a packet may be transmitted several times. Two
different approaches are used to simulate the packet transmission with respect
to errors.
Suppose
‘n’ is the number of packets we previously generated for a flow. ‘p’ is the
probability reflecting the quality of communication. We calculate the total
number of repeated packets as follows:
Algorithm
RepeatPacketsEST(n, p)
Begin
A[0] = n
M = 0
While Binomial( A[M], p )
> 1, Do
M = M + 1
A[M] = (A[M-1] – ëBinomial( A[M-1], p )û
Return
End
The total number of packets that
should be transmitted is:
(n + RepeatPacketsEST(n, p))
Suppose ‘n’
is the number of packets we previously generated for a flow. ‘p’ is the
probability that a packet can reach destination with no error. We calculate the
total number of packets that should be transmitted as:
Algorithm TotalPacketsEST(n, p)
Begin
C = 0;
For (i=0;
i<n; i++) Do
C = C
+ Geometric(p)
Return éCù
End
Error with Self-Similarity
This type of traffic considers
unreliable network communication. Different with above two situations, the packet
arrival pattern is no longer Poisson
distribution. The arrival time of packets show self-similarity. This will
change the distribution of interarrival times of packets from Exponential distribution to Pareto distribution.
The packet generation
is similar to ‘Error with Short-Term Clustering’. But the interarrival time
between consecutive packets is calculated using: Pareto(a)
Caller of
traffic generation algorithm can specify different value of ‘a’ to control the
distribution of interarrival times of packets. If a ≤ 2, the distribution
has infinite variance, and if a ≤ 1, the distribution has infinite mean
and variance. So, caller can generate traffic with Heavy-Tail though
controlling the value of ‘a’.
3.2.3 Synthetic Flow
As traffic flows in Table-1, each
flow is described by 1) the number of IP packets; 2) the number of bytes; 3)
starting time; 4) ending time; 5) transport protocol; and 5) port (optional).
These values should be given as properties of Event object in IP Packet Processor.
The basic
process of generating one traffic flow is as follows:
Traffic
generation algorithm provides several different ways to generate a bunch of
flows automatically. This is about generation of flow list like the one shown
in Table-1. Algorithm requires three parameters: the very beginning time (TS)
of earliest flow; the ending time (TE) of last flow; and the maximum
packet arrival rate R.
Traffic generation algorithm will create a list of flows as
follows:
Geometric(1/Uniform(1, R))
Generate all packets for the flow
The second
way to generate flows is using flow list generated by TCP-Reduce. That means
using real traffic log to generate synthetic traffic flows. The only difference
with above approach is the just flow list. Flow is generated synthetically as
before.
The third
way is, given maximum number of flows caller want, the m and s of
number of flows in system, traffic generation algorithm call Normal(n, m, s) to generate the number of flows in list.
All other works are same as first approach. The convenience is caller can
create a series of flow list that each contains a bunch of flows. This is
useful to test performance of IP Packet Processor with long-term traffic and
heavy congestion.
The C/C++ code can be found in
files: sourcetraffic.tar.gz. It is written under Linux and compiled using
‘gcc’.
The source traffic modeling and
generation is still ongoing research. We use some convenient tools like
TCPDUMP, TCP-Reduce to try to catch some essential characteristic of Internet
traffic. With this knowledge and experience, we create a-version
of traffic generation module and discussed the detail of our solution in this
technical report. Apparently, our approaches are not enough to simulate all
kinds of Internet traffic. But we will keep moving forward and add more traffic
generation algorithms in the future.