Skip to content
This repository has been archived by the owner on Jan 9, 2024. It is now read-only.

Performance drop over time #199

Open
junchen108 opened this issue Aug 25, 2018 · 7 comments
Open

Performance drop over time #199

junchen108 opened this issue Aug 25, 2018 · 7 comments

Comments

@junchen108
Copy link
Contributor

After having PriFi running for 45 to 60 minutes, we can usually experience some performance drop on the client side, which results

  • All network-related tasks become extremely slow, e.g. long loading session.
  • Cannot load anything and trigger timeout.

Restart PriFi on the client can fix the problem.

One of the potential roots of the issue is the ingress server. The following logs show (already after a long period of execution),

  1. Unusual long processing time between connections. (2 seconds, 4 seconds, and can be even longer).
08-25 19:42:33.179 29802-29954/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 590923822
08-25 19:42:35.579 29802-29954/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 2773462216
08-25 19:42:36.755 29802-29830/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 621338519
08-25 19:42:38.646 29802-29830/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 3599311296
08-25 19:42:42.547 29802-5690/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 1970092045
08-25 19:42:44.049 29802-29823/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 2778976067
  1. A lot of connection resets
08-25 19:42:47.777 29802-30716/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 2437827750
08-25 19:42:51.952 29802-29954/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 561995135
08-25 19:42:55.254 29802-29830/ch.epfl.prifiproxy E/GoLog: E : (stream-multiplexer.(*IngressServer).ingressConnectionReader: 191) - Ingress server: connectionReader error, read tcp 127.0.0.1:8080->127.0.0.1:56679: read: connection reset by peer
08-25 19:42:55.255 29802-29830/ch.epfl.prifiproxy E/GoLog: E : (stream-multiplexer.(*IngressServer).ingressConnectionReader: 191) - Ingress server: connectionReader error, read tcp 127.0.0.1:8080->127.0.0.1:56677: read: connection reset by peer
08-25 19:42:55.255 29802-5690/ch.epfl.prifiproxy E/GoLog: E : (stream-multiplexer.(*IngressServer).ingressConnectionReader: 191) - Ingress server: connectionReader error, read tcp 127.0.0.1:8080->127.0.0.1:56676: read: connection reset by peer
    E : (stream-multiplexer.(*IngressServer).ingressConnectionReader: 191) - Ingress server: connectionReader error, read tcp 127.0.0.1:8080->127.0.0.1:56673: read: connection reset by peer
08-25 19:43:03.246 29802-5690/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 2902954695
08-25 19:43:03.470 29802-29955/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 1253489795
08-25 19:43:03.753 29802-29830/ch.epfl.prifiproxy I/GoLog: 2 : (   stream-multiplexer.StartIngressServer: 103) - Ingress server just accepted a connection, assigning ID 3092217935
@junchen108 junchen108 changed the title Possible performance drop over time IngressServer Bug Sep 2, 2018
@junchen108 junchen108 changed the title IngressServer Bug Bugs in the ingress server Sep 2, 2018
@junchen108 junchen108 changed the title Bugs in the ingress server Performance drop over time Sep 2, 2018
@junchen108
Copy link
Contributor Author

junchen108 commented Sep 2, 2018

Updates 02.09.2018

Found some bugs in the ingress server implementation. One of these bugs should be the cause of this issue. @lbarman @italod @stevens-leblond @aledcuevas

Recap:

  • All "prifi-ed" connections are sent first to the ingress server (IG).
  • IG gives each connection an ID and store them into a data structure.
  • When data comes in, IG uses the ID to dispatch the correct connection and perform the write operation.

Bugs:

  1. IG uses a slice (dynamic array in go) to store connections, e.g. [conn1, conn2, conn3, conn4, ...]
    To dispatch the connection, a for loop is used, it iterates from the first element until the correct connection is found.

    Since this slice grows over time and new connections are always stored at the end, new write operations will all spend more and more time to finish.

    I am replacing the slice by a map, the performance is being tested.

  2. The ID that IG assigns to every new connection is not unique.

//generateID generates an ID from a private key
func generateRandomID() string {
	var n uint32
	binary.Read(rand.Reader, binary.LittleEndian, &n)

	return strconv.Itoa(int(n))
}

The generated random ID is a string from an integer, for example, "3503164205", "1433617593", etc.
However, IG uses only the first 4 bytes of the generated ID to identify connections.

// simplified and modified code
id := generateRandomID()
...
mc.ID_bytes = ID_bytes[0:4]
...
ID := incomingData[0:4]
...
if bytes.Equal(mc.ID_bytes, ID) {...}

Basically there are only 9 * 10 * 10 * 10 = 9000 possible IDs, and the duplication occurs very often.

This means that over time more and more connections are dispatched in a wrong way, the correct connection doesn't receive its data.

  1. The write operation v.conn.Write(data) can return broken pipe error. This error is not handled by the current code.

  2. IG stores all connections and never remove non-active connections from its data structure. I am not sure if we need to keep all connections, but I don't have a way to identify closed connections.

Summary:

The probability that the Bug 2 is the root of this "performance drop" issue is very high.

@junchen108 junchen108 added bug and removed help wanted labels Sep 2, 2018
@lbarman
Copy link
Collaborator

lbarman commented Sep 3, 2018

Thanks a lot @junchen108 for the analysis. I will try to confirm

@lbarman
Copy link
Collaborator

lbarman commented Sep 3, 2018

Each connection has its own randomID generated by sampling 32 random bits (one int32).

Basically there are only 9 * 10 * 10 * 10 = 9000 possible IDs, and the duplication occurs very often. 

Well perharps there's another bug here, but 4 bytes is 2^8 * 2^8 * 2^8 * 2^8 = 2^32, not 9000.
The bytes are taken from ID_byte[0:4], not id[0:4] (where your analysis would be correct).

BTW it seems that 32 bits is also perhaps too low: https://preshing.com/20110504/hash-collision-probabilities/, say we have 10k channels, probability of collision is 1/100 which is not negligible. We should log things there to be sure.

@junchen108
Copy link
Contributor Author

junchen108 commented Sep 3, 2018

@lbarman Hi, thanks for looking at this.

The randomID is generated by sampling 32 random bit, this is true.

func generateRandomID() string {
	var n uint32
	binary.Read(rand.Reader, binary.LittleEndian, &n)

	return strconv.Itoa(int(n))
}

But this function returns the string of this random int (strconv.Itoa(int(n))), for example "3503164205", "1433617593", etc.

Then

id := generateRandomID() // string of an int
mc := new(MultiplexedConnection)
mc.conn = conn
mc.ID = id
ID_bytes := []byte(id) // this line returns UTF-8 encoded version of the string 
mc.ID_bytes = ID_bytes[0:4] // UTF-8 encoded version of the first 4 characters

For instance, if random id = "3503164205", what the connection really gets is 0x33 0x35 0x30 0x33, which is the UTF-8 encoded version of '3' '5' '0' '3'.

Hence, I think there are only 9000 possibilities. (1-9, 0-9, 0-9, 0-9)

It will work, if we don't do strconv.Itoa(int(n)).

@lbarman
Copy link
Collaborator

lbarman commented Sep 3, 2018

Right, nice catch ! I let you do the change since you're already working on the 1st point you mentioned.

@junchen108
Copy link
Contributor Author

ok!

@junchen108
Copy link
Contributor Author

415548f should fix the ID collision bug.

There is one more thing that I want to do.
The egress server (EG) of the relay stores all the connections of the whole PriFi network into a map. The key is the ID that each connection gets from its client.

This is not ideal, because if there are only two clients that generate two same IDs, a collision occurs again.

To avoid this problem, EG should track the connections in its own way.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants