This is the third part of our series on push technologies. In this part we will provide a primer on WebSockets and how they work.
Push technologies must cover a range of data and user needs. Parts one and two of this series discuss techniques for providing what amounts to real time server-to-server data exchange, with clients receiving pushed data outputs that can happen after the fact. However, in some cases, you actually need a server-to-client real time interactive exchange, which is where WebSockets come into play. The third part of this series discusses what you can do with WebSockets and how it differs from other push technologies.
What is WebSockets?
WebSockets makes it possible for a client to make a data request to a server, and then receive event-driven responses from that server in real time. Unlike many Web technologies, WebSockets doesn't use a request/response strategy where a connection is opened in the course of making the request, and then closed after it's initially fulfilled. In the case of WebSockets, the connection remains open. The fact that the client doesn't have to continuously poll the server for updates means that the application runs significantly faster and uses resources more efficiently. A WebSockets setup consists of the following elements:
- Client: Makes a request for specific data from the server.
- Websocket Gateway: Provides a websocket interface between client and server.
- Server: Sends updates to the client through the websocket gateway in real time.
The intent of WebSockets is to provide an alternative to HTTP for communication over TCP. The initial client request can take place over HTTP using an upgrade request header, after which, communication takes place using the WebSockets WS protocol. In other words, instead of sending a request to http://<something>, you send it to ws://<something>. Using this technique accomplishes the following goals:
- Reduce HTTP header overhead by transferring only essential information, which produces a significant reduction in resource usage and makes real time communication possible.
- Create a full duplex communication environment to get rid of the need for polling and the request/response architecture. Both client and server can push data in the direction needed whenever needed. The reduction of network traffic also serves to increase application speed by removing the latency normally encountered in Web communication.
- Use a single TCP connection to reduce resource usage.
- Maintain an open connection over TCP to make data streaming possible.
- Overcome limitations with existing technologies such as:
- Polling: The periodic, scheduled, request cycle on the part of the client to obtain information from the server, even when such information doesn't exist, wastes a huge number of resources.
- HTTP Streaming: Even though the connection remains open all of the time, the use of standard HTTP headers increases the file size and reduces efficiency.
To use WebSockets, you must have a compatible browser, which currently includes: Chrome, Edge, Firefox, Internet Explorer, Opera, and Safari. The need for a compatible browser also limits the usefulness of WebSockets (however, the limitation is insignificant unless you have a large client base that uses older browser technology). This article is based on a conventional WebSockets deployment involving a browser on the client side. However, server-to-server WebSocket integrations are a possibility; WebSocket clients are available for most major server platforms including: Node.js, PHP, Python, Ruby, .NET, and Java.
How Does WebSockets Work?
WebSockets are relatively straightforward. However, you can encounter a few oddities. For example, some setups rely on a WebSocket gateway to make it easier to continue to use the server without modification and some have the client interact directly with the server. The following diagram shows the most common setup.
Starting with a Handshake
In all cases, the client follows the same order of interacting with either the websocket gateway or the server. The session begins with the handshake, which relies on the HTTP GET method. What you want to do is upgrade the HTTP connection to a WS connection using a number of request headers like those shown here:
GET /chat HTTP/1.1 Host: example.com Upgrade: websocket Connection: Upgrade Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== Sec-WebSocket-Version: 13
The request must use the HTTP GET method and send the method call to the server's GET handler, which is at /chat in this case. This request must use HTTP 1.1. The various headers perform specific tasks:
- Host: Defines the host location.
- Upgrade: Specifies that the server should upgrade the request to a WebSocket, which uses the WS protocol.
- Connection: Defines the connection type.
- Sec-WebSocket-Key: Provides the server with the WebSocket key, which proves to the server that it has received a valid WebSocket request. This key is only used during the opening handshake and isn't the same as the key used to mask data (as explained later in the article). The key comes from an agreed upon source and may be issued by the API vendor. According to RFC6455 the client must generate a new random key for each request.
- Sec-WebSocket-Version: Determines the websocket version.
If the server can't handle the request, it responds with a 400 Bad Request error. Otherwise, it sends an HTTP response with the following headers:
HTTP/1.1 101 Switching Protocols Upgrade: websocket Connection: Upgrade Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
The two pieces of essential information in this case are the message, which is 101 Switching Protocols, and the Sec-WebSocket-Accept header. The header contains a key that verifies to the client that it has contacted the correct server. When the client doesn't receive the correct key, then the server is suspect and communication should end. Interestingly enough, the server derives this key from the base64-encoded SHA-1 of the concatenation of the Sec-WebSocket-Key that the client originally sent, so the Sec-WebSocket-Accept key will differ each time because the client's request key differs each time.
It's important to realize that the headings described in this section are the minimal headings used by client and server. All the normal HTTP headings apply. For example, the server can send the client a cookie using the standard techniques.
Establishing and Using the WS Connection
The use of the WS protocol means doing things in the WebSocket way, with events. Data moves between client and server using a series of messages that include data frames (described later in the article). No matter which language you use, you need to either write custom code or use a library that can receive and interpret the four WS events:
- Open: Occurs when the WebSockets connection is established between client and server using the initial handshake.
- Message: Happens during the entire open phase of the connection. Client and server can both push messages using the same bidirectional TCP connection. Messages can take several forms as described by an Opcode.
- Error: Signals that a communication problem has happened. The message always includes an error code, which may actually signal a normal event. For example, an error code of 1000 specifies that the connection closed normally.
- Close: Occurs when the websocket connection closes.
Viewing the Low-level Details
In working with WebSockets, you might find it helpful to see the details of the communication between client and server. Of course, you can use products like Wireshark to perform this task, but using Wireshark can be difficult and it isn't as if you can directly read the communication in human form. Fortunately, you have access to tools specifically designed to make working with WebSockets easy as shown in the following list:
Dealing with Data Frames
WebSockets rely on data frames to send and receive information. The main reason to use data frames is to enhance security--to make it much harder for an intruder to corrupt the message.