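The timeout you set is not necessarily the timeout your users get

Introduction

I have recently been helping a product team track down some hard problems. One of them: some users reported that a particular operation occasionally kept loading for a very long time, as if the timeout had no effect, even though the developers had clearly set the network request timeouts to 30s. Why? Was there a bug in OkHttp, or were users doing something wrong?

It took me three days of peeling back the layers to finally find the cause.

1. Confirming the problem

The user feedback collected by the product manager was vague, so to confirm that the problem really existed I needed data. I checked the tracking data for this request and found that a few dozen users had indeed spent more than 30s on it, and some as long as 90s. That is a very poor experience.

Could the developers have set the timeouts incorrectly when initializing OkHttpClient? I checked the initialization code: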

OkHttpClient.Builder httpClientBuilder = new OkHttpClient.Builder()
.readTimeout(30, TimeUnit.SECONDS)
.connectTimeout(30, TimeUnit.SECONDS)
.writeTimeout(30, TimeUnit.SECONDS)
.addInterceptor(new HeaderInterceptor())

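All three timeouts are clearly set to 30s, so there is nothing wrong here. That leaves two suspects: a bug in OkHttp, or a mistake in how we use OkHttp.

2. Where the OkHttp source uses the timeouts

When are the timeouts configured on OkHttpClient actually used?

readTimeout, connectTimeout and writeTimeout are used in two places: StreamAllocation and Http2Codec. Our request uses HTTP/1.1, so Http2Codec can be ignored.

In StreamAllocation's newStream() method the timeouts are used as follows: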

public HttpCodec newStream(OkHttpClient client, boolean doExtensiveHealthChecks) {
int connectTimeout = client.connectTimeoutMillis();
int readTimeout = client.readTimeoutMillis();
int writeTimeout = client.writeTimeoutMillis();
boolean connectionRetryEnabled = client.retryOnConnectionFailure();
try {
RealConnection resultConnection = findHealthyConnection(connectTimeout, readTimeout,
writeTimeout, connectionRetryEnabled, doExtensiveHealthChecks);
HttpCodec resultCodec;
if (resultConnection.http2Connection != null) {
resultCodec = new Http2Codec(client, this, resultConnection.http2Connection);
} else {
resultConnection.socket().setSoTimeout(readTimeout);
resultConnection.source.timeout().timeout(readTimeout, MILLISECONDS);
resultConnection.sink.timeout().timeout(writeTimeout, MILLISECONDS);
resultCodec = new Http1Codec(
client, this, resultConnection.source, resultConnection.sink);
}
synchronized (connectionPool) {
codec = resultCodec;
return resultCodec;
}
} catch (IOException e) {
throw new RouteException(e);
}
}

All three timeouts are used to configure the connection. First, look at findHealthyConnection():

/**
* Finds a connection and returns it if it is healthy. If it is unhealthy the process is repeated
* until a healthy connection is found.
*/
private RealConnection findHealthyConnection(int connectTimeout, int readTimeout,
int writeTimeout, boolean connectionRetryEnabled, boolean doExtensiveHealthChecks)
throws IOException {
while (true) {
RealConnection candidate = findConnection(connectTimeout, readTimeout, writeTimeout,
connectionRetryEnabled);
// If this is a brand new connection, we can skip the extensive health checks.
synchronized (connectionPool) {
if (candidate.successCount == 0) {
return candidate;
}
}
// Do a (potentially slow) check to confirm that the pooled connection is still good. If it
// isn't, take it out of the pool and start again.
if (!candidate.isHealthy(doExtensiveHealthChecks)) {
noNewStreams();
continue;
}
return candidate;
}
}

This method simply calls findConnection() in a loop until it finds a healthy connection. findConnection() looks like this:

/**
* Returns a connection to host a new stream. This prefers the existing connection if it exists,
* then the pool, finally building a new connection.
*/
private RealConnection findConnection(int connectTimeout, int readTimeout, int writeTimeout,
boolean connectionRetryEnabled) throws IOException {
Route selectedRoute;
synchronized (connectionPool) {
if (released) throw new IllegalStateException("released");
if (codec != null) throw new IllegalStateException("codec != null");
if (canceled) throw new IOException("Canceled");
RealConnection allocatedConnection = this.connection;
if (allocatedConnection != null && !allocatedConnection.noNewStreams) {
return allocatedConnection;
}
// Attempt to get a connection from the pool.
RealConnection pooledConnection = Internal.instance.get(connectionPool, address, this);
if (pooledConnection != null) {
this.connection = pooledConnection;
return pooledConnection;
}
selectedRoute = route;
}
if (selectedRoute == null) {
selectedRoute = routeSelector.next();
synchronized (connectionPool) {
route = selectedRoute;
refusedStreamCount = 0;
}
}
RealConnection newConnection = new RealConnection(selectedRoute);
synchronized (connectionPool) {
acquire(newConnection);
Internal.instance.put(connectionPool, newConnection);
this.connection = newConnection;
if (canceled) throw new IOException("Canceled");
}
newConnection.connect(connectTimeout, readTimeout, writeTimeout, address.connectionSpecs(),
connectionRetryEnabled);
routeDatabase().connected(newConnection.route());
return newConnection;
}

The three timeouts are used in the call to RealConnection's connect() method:

public void connect(int connectTimeout, int readTimeout, int writeTimeout,
List<ConnectionSpec> connectionSpecs, boolean connectionRetryEnabled) {
if (protocol != null) throw new IllegalStateException("already connected");
RouteException routeException = null;
ConnectionSpecSelector connectionSpecSelector = new ConnectionSpecSelector(connectionSpecs);
if (route.address().sslSocketFactory() == null) {
if (!connectionSpecs.contains(ConnectionSpec.CLEARTEXT)) {
throw new RouteException(new UnknownServiceException(
"CLEARTEXT communication not enabled for client"));
}
String host = route.address().url().host();
if (!Platform.get().isCleartextTrafficPermitted(host)) {
throw new RouteException(new UnknownServiceException(
"CLEARTEXT communication to " + host + " not permitted by network security policy"));
}
}
while (protocol == null) {
try {
if (route.requiresTunnel()) {
buildTunneledConnection(connectTimeout, readTimeout, writeTimeout,
connectionSpecSelector);
} else {
buildConnection(connectTimeout, readTimeout, writeTimeout, connectionSpecSelector);
}
} catch (IOException e) {
closeQuietly(socket);
closeQuietly(rawSocket);
socket = null;
rawSocket = null;
source = null;
sink = null;
handshake = null;
protocol = null;
if (routeException == null) {
routeException = new RouteException(e);
} else {
routeException.addConnectException(e);
}
if (!connectionRetryEnabled || !connectionSpecSelector.connectionFailed(e)) {
throw routeException;
}
}
}
}

When the route does not require a proxy tunnel (route.requiresTunnel() returns false), buildConnection() is called:

/** Does all the work necessary to build a full HTTP or HTTPS connection on a raw socket. */
private void buildConnection(int connectTimeout, int readTimeout, int writeTimeout,
ConnectionSpecSelector connectionSpecSelector) throws IOException {
connectSocket(connectTimeout, readTimeout);
establishProtocol(readTimeout, writeTimeout, connectionSpecSelector);
}

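This is where the paths split: connectTimeout and readTimeout are used for the socket connection, while readTimeout and writeTimeout are passed on to establishProtocol(), which sets up TLS and HTTP/2. Start with connectSocket():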

private void connectSocket(int connectTimeout, int readTimeout) throws IOException {
Proxy proxy = route.proxy();
Address address = route.address();
rawSocket = proxy.type() == Proxy.Type.DIRECT || proxy.type() == Proxy.Type.HTTP
? address.socketFactory().createSocket()
: new Socket(proxy);
rawSocket.setSoTimeout(readTimeout);
try {
Platform.get().connectSocket(rawSocket, route.socketAddress(), connectTimeout);
} catch (ConnectException e) {
ConnectException ce = new ConnectException("Failed to connect to " + route.socketAddress());
ce.initCause(e);
throw ce;
}
source = Okio.buffer(Okio.source(rawSocket));
sink = Okio.buffer(Okio.sink(rawSocket));
}

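From this we can see:

  • readTimeout ends up in rawSocket.setSoTimeout(). setSoTimeout() limits how long a read() on the InputStream may block once the connection is established, which is why readTimeout is used here.

  • connectTimeout is applied in a platform-specific way. On Android the call ends up in AndroidPlatform's connectSocket() method: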

    @Override public void connectSocket(Socket socket, InetSocketAddress address,
    int connectTimeout) throws IOException {
    try {
    socket.connect(address, connectTimeout);
    } catch (AssertionError e) {
    if (Util.isAndroidGetsocknameError(e)) throw new IOException(e);
    throw e;
    } catch (SecurityException e) {
    // Before android 4.3, socket.connect could throw a SecurityException
    // if opening a socket resulted in an EACCES error.
    IOException ioException = new IOException("Exception in connect");
    ioException.initCause(e);
    throw ioException;
    }
    }

    So this sets the socket's connect timeout, which is why connectTimeout is used.
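The difference between the two socket-level timeouts can be reproduced with the JDK alone. The following is a minimal sketch, not OkHttp code: the class name SocketTimeoutDemo and its method are made up for illustration. It connects to a local server that accepts the connection but never sends anything, so the connect timeout is never hit while the read timeout is.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of OkHttp.
class SocketTimeoutDemo {
    /**
     * Connects to a local server that accepts the connection but stays silent,
     * then reports whether read() gave up with a SocketTimeoutException.
     */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Same role as rawSocket.setSoTimeout(readTimeout): limits each blocking read().
            client.setSoTimeout(readTimeoutMs);
            // Same role as socket.connect(address, connectTimeout): limits connection setup only.
            client.connect(
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort()),
                    connectTimeoutMs);
            try (Socket accepted = server.accept()) {
                client.getInputStream().read(); // the server never writes, so this blocks
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the read timeout fired
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```

Note that each timeout covers only its own phase: a connection that is established quickly can still block in read() for the full read timeout.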

Back in RealConnection's buildConnection(): after connectSocket() returns, establishProtocol() is called:

private void establishProtocol(int readTimeout, int writeTimeout,
ConnectionSpecSelector connectionSpecSelector) throws IOException {
if (route.address().sslSocketFactory() != null) {
connectTls(readTimeout, writeTimeout, connectionSpecSelector);
} else {
protocol = Protocol.HTTP_1_1;
socket = rawSocket;
}
if (protocol == Protocol.HTTP_2) {
socket.setSoTimeout(0); // Framed connection timeouts are set per-stream.
Http2Connection http2Connection = new Http2Connection.Builder(true)
.socket(socket, route.address().url().host(), source, sink)
.listener(this)
.build();
http2Connection.start();
// Only assign the framed connection once the preface has been sent successfully.
this.allocationLimit = http2Connection.maxConcurrentStreams();
this.http2Connection = http2Connection;
} else {
this.allocationLimit = 1;
}
}

For an HTTPS connection, connectTls() is called:

private void connectTls(int readTimeout, int writeTimeout,
ConnectionSpecSelector connectionSpecSelector) throws IOException {
Address address = route.address();
SSLSocketFactory sslSocketFactory = address.sslSocketFactory();
boolean success = false;
SSLSocket sslSocket = null;
try {
// Create the wrapper over the connected socket.
sslSocket = (SSLSocket) sslSocketFactory.createSocket(
rawSocket, address.url().host(), address.url().port(), true /* autoClose */);
// Configure the socket's ciphers, TLS versions, and extensions.
ConnectionSpec connectionSpec = connectionSpecSelector.configureSecureSocket(sslSocket);
if (connectionSpec.supportsTlsExtensions()) {
Platform.get().configureTlsExtensions(
sslSocket, address.url().host(), address.protocols());
}
// Force handshake. This can throw!
sslSocket.startHandshake();
Handshake unverifiedHandshake = Handshake.get(sslSocket.getSession());
// Verify that the socket's certificates are acceptable for the target host.
if (!address.hostnameVerifier().verify(address.url().host(), sslSocket.getSession())) {
X509Certificate cert = (X509Certificate) unverifiedHandshake.peerCertificates().get(0);
throw new SSLPeerUnverifiedException("Hostname " + address.url().host() + " not verified:"
+ "\n certificate: " + CertificatePinner.pin(cert)
+ "\n DN: " + cert.getSubjectDN().getName()
+ "\n subjectAltNames: " + OkHostnameVerifier.allSubjectAltNames(cert));
}
// Check that the certificate pinner is satisfied by the certificates presented.
address.certificatePinner().check(address.url().host(),
unverifiedHandshake.peerCertificates());
// Success! Save the handshake and the ALPN protocol.
String maybeProtocol = connectionSpec.supportsTlsExtensions()
? Platform.get().getSelectedProtocol(sslSocket)
: null;
socket = sslSocket;
source = Okio.buffer(Okio.source(socket));
sink = Okio.buffer(Okio.sink(socket));
handshake = unverifiedHandshake;
protocol = maybeProtocol != null
? Protocol.get(maybeProtocol)
: Protocol.HTTP_1_1;
success = true;
} catch (AssertionError e) {
if (Util.isAndroidGetsocknameError(e)) throw new IOException(e);
throw e;
} finally {
if (sslSocket != null) {
Platform.get().afterHandshake(sslSocket);
}
if (!success) {
closeQuietly(sslSocket);
}
}
}

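This call performs the handshake and the certificate verification, and at the end the socket field is actually an SSLSocket. Note also that readTimeout and writeTimeout are not used here at all, so there was no need to pass them in.

3. Timeout settings on socket, source and sink

3.1 The main flow of timeout configuration

Back in StreamAllocation's newStream(): since we are on HTTP/1.1, the findHealthyConnection() call only made use of readTimeout and connectTimeout, and writeTimeout was not used at all.

After that, the following code runs: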

resultConnection.socket().setSoTimeout(readTimeout);
resultConnection.source.timeout().timeout(readTimeout, MILLISECONDS);
resultConnection.sink.timeout().timeout(writeTimeout, MILLISECONDS);
resultCodec = new Http1Codec(
client, this, resultConnection.source, resultConnection.sink);

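1) From the walkthrough above, both readTimeout and connectTimeout were applied to rawSocket (a java.net.Socket): readTimeout in RealConnection's connectSocket() and connectTimeout in AndroidPlatform's connectSocket(). Here, however, resultConnection.socket() returns the socket field rather than rawSocket. Over HTTPS the two are different objects, because socket is an SSLSocket, so this setSoTimeout() call does not duplicate the earlier one.

2) Where is source created? As analyzed above, in RealConnection's connectSocket() method: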

private void connectSocket(int connectTimeout, int readTimeout) throws IOException {
Proxy proxy = route.proxy();
Address address = route.address();
rawSocket = proxy.type() == Proxy.Type.DIRECT || proxy.type() == Proxy.Type.HTTP
? address.socketFactory().createSocket()
: new Socket(proxy);
rawSocket.setSoTimeout(readTimeout);
try {
Platform.get().connectSocket(rawSocket, route.socketAddress(), connectTimeout);
} catch (ConnectException e) {
ConnectException ce = new ConnectException("Failed to connect to " + route.socketAddress());
ce.initCause(e);
throw ce;
}
source = Okio.buffer(Okio.source(rawSocket));
sink = Okio.buffer(Okio.sink(rawSocket));
}

source takes rawSocket's input stream and wraps it with Okio.buffer(); sink does the same with rawSocket's output stream. Look at Okio.source() first:

public static Source source(Socket socket) throws IOException {
if(socket == null) {
throw new IllegalArgumentException("socket == null");
} else {
AsyncTimeout timeout = timeout(socket);
Source source = source((InputStream)socket.getInputStream(), (Timeout)timeout);
return timeout.source(source);
}
}

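An AsyncTimeout object is created here and used to implement the timeout. How exactly does it work? See the next subsection.

3.2 How AsyncTimeout works

The timeout() method in Okio that source() relies on is as follows: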

private static AsyncTimeout timeout(final Socket socket) {
return new AsyncTimeout() {
protected IOException newTimeoutException(IOException cause) {
InterruptedIOException ioe = new SocketTimeoutException("timeout");
if(cause != null) {
ioe.initCause(cause);
}
return ioe;
}
protected void timedOut() {
try {
socket.close();
} catch (Exception var2) {
Okio.logger.log(Level.WARNING, "Failed to close timed out socket " + socket, var2);
} catch (AssertionError var3) {
if(!Okio.isAndroidGetsocknameError(var3)) {
throw var3;
}
Okio.logger.log(Level.WARNING, "Failed to close timed out socket " + socket, var3);
}
}
};
}

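This creates an AsyncTimeout that overrides newTimeoutException() and timedOut(), both of which are defined in AsyncTimeout. The former produces the exception to throw when a timeout occurs (an InterruptedIOException if it is not overridden). The latter is the callback invoked when the timeout fires, and it performs whatever cleanup is needed (here, closing the socket).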
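Why does closing the socket work as a timeout? Because a thread blocked in read() is woken with an exception as soon as another thread closes the socket. The sketch below shows this with the JDK alone; it is not Okio code, and CloseOnTimeoutDemo plus the scheduled executor standing in for the watchdog are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Okio.
class CloseOnTimeoutDemo {
    /** Returns true if a read() with no SO_TIMEOUT was aborted by closing the socket. */
    static boolean blockedReadIsUnblockedByClose(long timeoutMs) {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(server.getInetAddress(), server.getLocalPort()));
            try (Socket accepted = server.accept()) {
                // Stand-in for timedOut(): close the socket once the deadline passes.
                watchdog.schedule(() -> {
                    try {
                        client.close();
                    } catch (IOException ignored) {
                        // nothing useful to do in this sketch
                    }
                }, timeoutMs, TimeUnit.MILLISECONDS);
                client.getInputStream().read(); // the server stays silent, so this blocks
                return false;
            } catch (SocketException e) {
                return true; // the blocked read was aborted because the socket was closed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            watchdog.shutdownNow();
        }
    }
}
```

In Okio the resulting exception is then translated by newTimeoutException() into the SocketTimeoutException that callers see.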

So how does AsyncTimeout implement the timeout, and could there be a bug in it?

The call chain is Sink.write()/Source.read() -> AsyncTimeout.enter() -> AsyncTimeout.scheduleTimeout(). scheduleTimeout() is the key method:

private static synchronized void scheduleTimeout(
AsyncTimeout node, long timeoutNanos, boolean hasDeadline) {
// Start the watchdog thread and create the head node when the first timeout is scheduled.
if (head == null) {
head = new AsyncTimeout();
new Watchdog().start();
}
long now = System.nanoTime();
if (timeoutNanos != 0 && hasDeadline) {
// Compute the earliest event; either timeout or deadline. Because nanoTime can wrap around,
// Math.min() is undefined for absolute values, but meaningful for relative ones.
node.timeoutAt = now + Math.min(timeoutNanos, node.deadlineNanoTime() - now);
} else if (timeoutNanos != 0) {
node.timeoutAt = now + timeoutNanos;
} else if (hasDeadline) {
node.timeoutAt = node.deadlineNanoTime();
} else {
throw new AssertionError();
}
// Insert the node in sorted order. The sorting happens here.
long remainingNanos = node.remainingNanos(now);
for (AsyncTimeout prev = head; true; prev = prev.next) {
if (prev.next == null || remainingNanos < prev.next.remainingNanos(now)) {
node.next = prev.next;
prev.next = node;
if (prev == head) {
AsyncTimeout.class.notify(); // Wake up the watchdog when inserting at the front.
}
break;
}
}
}

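This method does three things:

  • When the first timeout is scheduled (head is still null), it creates the head node and starts the Watchdog thread
  • It inserts the node into a linked list of all scheduled AsyncTimeout objects, sorted by remaining time from shortest to longest
  • If the node was inserted at the front of the list, it calls notify() to wake the waiting thread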
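The sorted insertion described above can be isolated into a small sketch. TimeoutQueueSketch and its Node class are hypothetical simplifications of AsyncTimeout (no synchronization, no watchdog), meant only to show how the list stays ordered and when a wake-up would be needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-threaded simplification of AsyncTimeout's queue.
class TimeoutQueueSketch {
    static final class Node {
        final String name;
        final long timeoutAt; // absolute instant on the System.nanoTime() scale
        Node next;

        Node(String name, long timeoutAt) {
            this.name = name;
            this.timeoutAt = timeoutAt;
        }

        long remainingNanos(long now) {
            return timeoutAt - now;
        }
    }

    private final Node head = new Node("head", 0); // sentinel, never times out

    /**
     * Inserts the node in ascending order of remaining time.
     * Returns true if it became the first real node, which is the case
     * where the real code notifies the watchdog.
     */
    boolean schedule(Node node, long now) {
        long remaining = node.remainingNanos(now);
        for (Node prev = head; ; prev = prev.next) {
            if (prev.next == null || remaining < prev.next.remainingNanos(now)) {
                node.next = prev.next;
                prev.next = node;
                return prev == head;
            }
        }
    }

    /** Names of the queued nodes, front to back. */
    List<String> names() {
        List<String> result = new ArrayList<>();
        for (Node n = head.next; n != null; n = n.next) {
            result.add(n.name);
        }
        return result;
    }
}
```

Only an insertion at the front can shorten the time the watchdog should sleep, which is why the wake-up is limited to that case.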

Which thread is waiting? The Watchdog. Its definition makes that clear:

private static final class Watchdog extends Thread {
public Watchdog() {
super("Okio Watchdog");
setDaemon(true);
}
public void run() {
while (true) {
try {
AsyncTimeout timedOut = awaitTimeout();
// Didn't find a node to interrupt. Try again.
if (timedOut == null) continue;
// Close the timed out node.
timedOut.timedOut();
} catch (InterruptedException ignored) {
}
}
}
}

And awaitTimeout() is:

private static synchronized AsyncTimeout awaitTimeout() throws InterruptedException {
// Get the next eligible node.
AsyncTimeout node = head.next;
// The queue is empty. Wait for something to be enqueued.
if (node == null) {
AsyncTimeout.class.wait();
return null;
}
long waitNanos = node.remainingNanos(System.nanoTime());
// The head of the queue hasn't timed out yet. Await that.
if (waitNanos > 0) {
// Waiting is made complicated by the fact that we work in nanoseconds,
// but the API wants (millis, nanos) in two arguments.
long waitMillis = waitNanos / 1000000L;
waitNanos -= (waitMillis * 1000000L);
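AsyncTimeout.class.wait(waitMillis, (int) waitNanos); // waitNanos has been split in two, e.g. 1000003 ns becomes 1 ms and 3 ns; waitNanos / 1000000L and waitNanos % 1000000L would give the same values, the code just uses a multiply and a subtract instead of a second division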
return null;
}
// The head of the queue has timed out. Remove it.
head.next = node.next;
node.next = null;
return node;
}

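Putting the two methods together: the Watchdog thread runs an infinite loop. On each iteration it looks at the first node after head (head itself is only a sentinel) and checks whether that node has timed out. If not, it waits. Otherwise it removes the node from the list and returns it, and since the node has already timed out, its timedOut() method is called right away.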
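The only fiddly arithmetic in awaitTimeout() is turning a nanosecond duration into the (millis, nanos) pair that Object.wait(long, int) expects. A minimal sketch of that step, using a hypothetical helper class WaitSplit:

```java
// Hypothetical helper, mirroring the arithmetic in awaitTimeout().
class WaitSplit {
    /** Splits a nanosecond duration into {millis, leftover nanos} for Object.wait(long, int). */
    static long[] split(long waitNanos) {
        long millis = waitNanos / 1_000_000L;
        long nanos = waitNanos - millis * 1_000_000L; // same value as waitNanos % 1_000_000L
        return new long[] {millis, nanos};
    }

    public static void main(String[] args) {
        long[] parts = split(1_000_003L);
        System.out.println(parts[0] + " ms + " + parts[1] + " ns");
    }
}
```

The leftover is always below 1,000,000, so the cast to int in the original code cannot overflow.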

3.3 System.nanoTime()

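One thing to note here is the difference between System.nanoTime() and System.currentTimeMillis():

  • System.nanoTime() returns nanoseconds measured from an arbitrary origin. The value can even be negative, because the origin may lie in the future. It is therefore not meant for absolute time but for measuring elapsed time, such as how long a piece of code takes to run, how long it takes to obtain a database connection, or how long a network call takes. Also, nanoTime offers nanosecond precision, but the values are not necessarily accurate to the nanosecond.
  • System.currentTimeMillis() returns the number of milliseconds since midnight UTC on January 1, 1970. new Date() is equivalent to new Date(System.currentTimeMillis()), since Date also has a Date(long date) constructor that takes the number of milliseconds since January 1, 1970.

So using System.nanoTime() to measure durations in Okio is a good choice: it gives enough precision and is unaffected by changes to the system clock. With System.currentTimeMillis(), a change to the system time during the wait could make the timeout fire early or late, which would clearly be unreliable.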
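Because nanoTime values are only meaningful relative to each other, deadlines should be compared by subtraction rather than with < or >. The sketch below illustrates this with a hypothetical class NanoDeadline; the subtraction in remainingNanos() has the same form as the remainingNanos(now) call seen in scheduleTimeout().

```java
// Hypothetical helper showing the usual nanoTime idioms.
class NanoDeadline {
    /** Remaining time until the deadline; a negative result means it has passed. */
    static long remainingNanos(long timeoutAt, long now) {
        // Subtraction stays correct even if the nanoTime counter wraps around.
        return timeoutAt - now;
    }

    /** Measures how long a piece of work takes, independent of wall-clock changes. */
    static long measureNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long now = Long.MAX_VALUE - 5;
        long timeoutAt = now + 15; // overflows to a negative number
        System.out.println("deadline value: " + timeoutAt);
        System.out.println("remaining: " + remainingNanos(timeoutAt, now));
    }
}
```

A direct comparison such as timeoutAt > now would give the wrong answer in the wrapped case, while the difference is still 15.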

3.4 Summary of OkHttp timeouts

Back at the start of section 3.1: the timeout() method called there belongs to the Timeout class:

public Timeout timeout(long timeout, TimeUnit unit) {
if (timeout < 0) throw new IllegalArgumentException("timeout < 0: " + timeout);
if (unit == null) throw new IllegalArgumentException("unit == null");
this.timeoutNanos = unit.toNanos(timeout);
return this;
}

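This simply converts the given duration to nanoseconds. timeoutNanos is later used by scheduleTimeout().

Combining the three subsections above gives the following conclusions:

  • Timeouts on Source and Sink objects are implemented by AsyncTimeout, a subclass of Timeout
  • All scheduled AsyncTimeout objects form a linked list
  • Each AsyncTimeout is inserted at the position that matches its remaining time
  • A daemon thread called Watchdog maintains the list. If the first node has not timed out yet, the thread waits; otherwise it removes the node from the list and calls its timedOut() method, which performs the required action, such as socket.close()

So far, the timeout implementation in OkHttp and Okio looks reliable and accurate, and I found no bug. The cause had to be somewhere else.

4. The default parameters turn out to be the culprit

Since OkHttp's timeout mechanism is fine, the next step is the code where the app calls OkHttp. It calls Call.enqueue() in Retrofit, so that is where to start.

If you have read the Retrofit source analysis on my blog, you will know that this Call is actually an OkHttpCall, a class that exists to bridge Retrofit and OkHttp. Its enqueue() method is: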

@Override public void enqueue(final Callback<T> callback) {
if (callback == null) throw new NullPointerException("callback == null");
okhttp3.Call call;
Throwable failure;
synchronized (this) {
if (executed) throw new IllegalStateException("Already executed.");
executed = true;
call = rawCall;
failure = creationFailure;
if (call == null && failure == null) {
try {
call = rawCall = createRawCall();
} catch (Throwable t) {
failure = creationFailure = t;
}
}
}
if (failure != null) {
callback.onFailure(this, failure);
return;
}
if (canceled) {
call.cancel();
}
call.enqueue(new okhttp3.Callback() {
@Override public void onResponse(okhttp3.Call call, okhttp3.Response rawResponse)
throws IOException {
Response<T> response;
try {
response = parseResponse(rawResponse);
} catch (Throwable e) {
callFailure(e);
return;
}
callSuccess(response);
}
@Override public void onFailure(okhttp3.Call call, IOException e) {
try {
callback.onFailure(OkHttpCall.this, e);
} catch (Throwable t) {
t.printStackTrace();
}
}
private void callFailure(Throwable e) {
try {
callback.onFailure(OkHttpCall.this, e);
} catch (Throwable t) {
t.printStackTrace();
}
}
private void callSuccess(Response<T> response) {
try {
callback.onResponse(OkHttpCall.this, response);
} catch (Throwable t) {
t.printStackTrace();
}
}
});
}

The main purpose of this method is to call okhttp3.Call's enqueue() and to translate the okhttp3.Call callbacks into Retrofit callbacks. The call here is an okhttp3.RealCall: OkHttpCall's createRawCall() calls serviceMethod.callFactory.newCall(), callFactory is the OkHttpClient, and OkHttpClient's newCall() returns a RealCall. RealCall's enqueue() is:

@Override public void enqueue(Callback responseCallback) {
synchronized (this) {
if (executed) throw new IllegalStateException("Already Executed");
executed = true;
}
captureCallStackTrace();
client.dispatcher().enqueue(new AsyncCall(responseCallback));
}

This method creates an AsyncCall and hands it to the dispatcher returned by dispatcher():

synchronized void enqueue(AsyncCall call) {
if (runningAsyncCalls.size() < maxRequests && runningCallsForHost(call) < maxRequestsPerHost) {
runningAsyncCalls.add(call);
executorService().execute(call);
} else {
readyAsyncCalls.add(call);
}
}

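This method is very important, because this is where the risk of users waiting longer than the timeout is hiding. Note the two conditions:

  • First, the number of running requests must be below maxRequests; otherwise the call goes into the ready queue. maxRequests defaults to 64
  • Second, runningCallsForHost(call) must be below maxRequestsPerHost. In other words, the number of running requests to the same host as this request must be below maxRequestsPerHost; otherwise the call goes into the ready queue first. maxRequestsPerHost defaults to a very small value, 5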
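The effect of the two limits is easy to see in a stripped-down model. DispatcherModel below is a hypothetical stand-in that tracks only host names; it mirrors the condition in Dispatcher.enqueue() but is not OkHttp code, and the host names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the admission check in Dispatcher.enqueue().
class DispatcherModel {
    final int maxRequests;
    final int maxRequestsPerHost;
    final List<String> running = new ArrayList<>(); // hosts of in-flight calls
    final Deque<String> ready = new ArrayDeque<>(); // hosts of calls waiting for a slot

    DispatcherModel(int maxRequests, int maxRequestsPerHost) {
        this.maxRequests = maxRequests;
        this.maxRequestsPerHost = maxRequestsPerHost;
    }

    /** Returns true if the call starts immediately, false if it is parked in the ready queue. */
    boolean enqueue(String host) {
        if (running.size() < maxRequests && runningCallsForHost(host) < maxRequestsPerHost) {
            running.add(host);
            return true;
        }
        ready.add(host);
        return false;
    }

    int runningCallsForHost(String host) {
        int count = 0;
        for (String h : running) {
            if (h.equals(host)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        DispatcherModel dispatcher = new DispatcherModel(64, 5); // the defaults quoted above
        for (int i = 0; i < 7; i++) {
            dispatcher.enqueue("api.example.com");
        }
        System.out.println("running=" + dispatcher.running.size()
                + " ready=" + dispatcher.ready.size());
    }
}
```

A queued call has not started its connect, read or write phase yet, so none of the three configured timeouts is counting while it sits in the ready queue.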

Now look at how the dispatcher's thread pool is created:

public synchronized ExecutorService executorService() {
if (executorService == null) {
executorService = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
new SynchronousQueue<Runnable>(), Util.threadFactory("OkHttp Dispatcher", false));
}
return executorService;
}

The dispatcher's thread pool is effectively unbounded, and in normal circumstances the default maxRequests of 64 is also plenty.
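That the pool itself is not the bottleneck can be checked with the same ThreadPoolExecutor configuration. In this sketch (DirectHandoffPoolDemo is a hypothetical name, and the default thread factory is used instead of OkHttp's), every submitted task blocks, so the pool has to create one thread per task instead of queueing anything.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of a pool configured like the dispatcher's executor.
class DirectHandoffPoolDemo {
    /** Submits n tasks that all block, then reports how many worker threads exist. */
    static int threadsCreatedFor(int n) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60,
                TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < n; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until the demo ends
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

A SynchronousQueue holds no tasks, so any waiting happens in the dispatcher's own ready queue, governed by maxRequests and maxRequestsPerHost, not inside the executor.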

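But! There is always a but!

What if the network is weak, requests are dense, and the timeout is set fairly high?

Then the following can happen:

  • The number of running requests exceeds maxRequests within a short time (to take an extreme case, within 3s). Every request issued after those 3s has to go into the ready queue first. If the network is bad enough that each connection is only closed when its timeout exception fires, a request issued after the 3s mark has to wait at least timeout - 3s before it even starts, and then up to its own timeout on top. The user therefore waits timeout - 3s + timeout, which is far more than timeout
  • The overall request rate is not high, but requests to one particular host happen to be dense within a short window (again, say 3s). Requests to that host issued after those 3s also have to go into the ready queue first, so for them too the user waits at least timeout - 3s + timeout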
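The arithmetic above can be written down as a rough worst-case model. It assumes that every request to the host runs for the full timeout and that the slots free up in waves of maxRequestsPerHost; real traffic is messier, so treat the numbers as an illustration. QueueWaitEstimate and its methods are hypothetical names.

```java
// Hypothetical worst-case model; all times are in seconds.
class QueueWaitEstimate {
    /**
     * Seconds from the start of the burst until the request at zero-based
     * position `index` fails, if every request runs for the full timeout.
     */
    static long worstCaseSeconds(int index, int maxRequestsPerHost, long timeoutSeconds) {
        long wavesBefore = index / maxRequestsPerHost; // full waves that must time out first
        return (wavesBefore + 1) * timeoutSeconds;
    }

    /** The same figure as seen by the user, who only started waiting at enqueuedAtSeconds. */
    static long userWaitSeconds(int index, int maxRequestsPerHost, long timeoutSeconds,
                                long enqueuedAtSeconds) {
        return worstCaseSeconds(index, maxRequestsPerHost, timeoutSeconds) - enqueuedAtSeconds;
    }

    public static void main(String[] args) {
        // Sixth request to the host (index 5), issued 3s into the burst, timeout 30s.
        System.out.println(userWaitSeconds(5, 5, 30, 3) + "s");
    }
}
```

With a 30s timeout and the default limit of 5, the sixth request issued 3s into the burst waits 30 - 3 + 30 = 57s in this model, and a request in the third wave can approach 90s.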

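Looking at the initialization code again, the app does not customize maxRequestsPerHost on the Dispatcher, which means that at most 5 requests per host can be in flight at once. The host that serves the request I was analyzing handles many other requests as well. The developers had also made a basic mistake here: when the user dismissed the loading dialog, the corresponding request was not cancelled, so it kept occupying a slot for nothing.

To verify this conclusion I looked at the logs of more than 10 users whose waits far exceeded the timeout. In every case it happened when their network had switched to 2G, which supports the conclusion.

5. The fix, and advice on using OkHttp

With the cause identified, the fix is straightforward, and it doubles as advice for using OkHttp:

  • When initializing OkHttp, set maxRequests and maxRequestsPerHost on the Dispatcher to values larger than the defaults
  • When the user dismisses the loading dialog, cancel the corresponding request promptly
  • Keep the timeouts as small as is practical (for example 10s). This reduces the load on the phone on weak networks and is also better for the user experience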
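As a configuration sketch, the first and third recommendations could look like the following. It needs the OkHttp 3 library on the classpath; the class name HttpClientFactory and the values 128 and 16 are illustrative assumptions, not numbers from the original investigation, and should be tuned for the app.

```java
import java.util.concurrent.TimeUnit;

import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;

// Hypothetical factory; the limits below are example values only.
class HttpClientFactory {
    static OkHttpClient create() {
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.setMaxRequests(128);       // default is 64
        dispatcher.setMaxRequestsPerHost(16); // default is 5

        return new OkHttpClient.Builder()
                .dispatcher(dispatcher)
                .connectTimeout(10, TimeUnit.SECONDS)
                .readTimeout(10, TimeUnit.SECONDS)
                .writeTimeout(10, TimeUnit.SECONDS)
                .build();
    }
}
```

For the second recommendation, keep a reference to the Call returned by Retrofit and invoke call.cancel() when the loading dialog is dismissed, so that the request stops occupying a slot in the dispatcher.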