Introduction to HTTP

Hello! In this post we’re going to dig a bit into HTTP. I assume no prior networking knowledge, so feel free to read this article even if it’s your first step into computer networks. HTTP is a rather simple protocol, which makes it a good first one to learn. Please note that I go into (exceeding) depth, and you don’t need to read everything to get the gist. You decide how deep you want to go; there’s absolutely no need to read 100% of the text to get value out of it or to understand HTTP.

I do not explain every single term I use. This is mostly a fault of mine; I can’t know precisely what you do or don’t know about the things I write about. I highly recommend googling anything you don’t understand, and if you still find yourself with questions, I’m happy to help at ([email protected], @RainbowBash on Telegram).

Reading instructions aside, these are the contents of the article:

  • How HTTP defines requests and responses
  • What is an HTTP request?
  • GET, POST
  • Cookies
  • File uploads
  • What is an HTTP response?

Let’s roll!

REQUEST & RESPONSE

HTTP stands for Hypertext Transfer Protocol. It’s a stateless protocol (more on that in a moment) that your browser and the server it talks to use in order to request website content and receive it.

Stateless means that the protocol doesn’t keep any state between messages. Many protocols let you form a “conversation” in which both sides send commands and messages that depend on what came before. HTTP does not work that way. It sends out a request, it receives a reply. You want to continue? You’re going to send an entirely new request.

http request

GET /welcome.php HTTP/1.1
Host: host.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,he-IL;q=0.8,he;q=0.7
Cookie: PHPSESSID=ssh7qo6ioklmm94sh97fplpng7
Connection: close 

A request looks kinda like this. Let’s step through it.

HTTP METHOD

GET is a request method. There are multiple request methods:

  • GET
  • POST
  • HEAD
  • OPTIONS
  • PUT
  • DELETE
  • TRACE
  • CONNECT
  • PATCH

Let’s focus on the two main ones: GET and POST.

GET is basically “give me”; it is the most basic and essential request method in the protocol. It tells the HTTP server to give us whatever follows the GET keyword, which is a “path”: the location, relative to the server’s root, of the file we want it to serve us.

Headers

After that, we specify our protocol version using the “HTTP/1.1” bit. Why? Because according to the RFC (https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5), the only must-have part of our HTTP request is the first line: a method, a resource (the file), and the HTTP version.

GET /welcome.php HTTP/1.0
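As a minimal sketch of that first line, assuming Python and a hypothetical helper called parse_request_line (my own name, not part of any library), splitting on spaces is enough to recover the three mandatory parts:

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split an HTTP request line into (method, resource, version)."""
    # The three parts are separated by single spaces.
    method, resource, version = line.rstrip("\r\n").split(" ")
    return method, resource, version


method, resource, version = parse_request_line("GET /welcome.php HTTP/1.0\r\n")
print(method, resource, version)
```

A real server would also have to reject lines that don’t have exactly three parts; this sketch simply raises a ValueError in that case.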

The rest of the headers are “modifiers” in a way: they enable us to further improve our browsing experience and enrich the content we receive online. The format boils down to:

Header-Name: Header-Value; More Value, This is ok too;\r\n

where \r\n is the newline HTTP uses to differentiate between one header and the next. We can see proof of this in the RFC.
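To make the format concrete, here is a rough sketch in Python of how a client could assemble a request by hand. The build_request helper is hypothetical (I made it up for this example), and it only covers requests without a body:

```python
def build_request(method: str, path: str, headers: dict[str, str]) -> bytes:
    """Assemble a body-less HTTP/1.1 request as raw bytes."""
    lines = [f"{method} {path} HTTP/1.1"]
    # One "Name: value" line per header.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends in \r\n, and an empty line closes the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


raw = build_request("GET", "/welcome.php", {"Host": "host.com", "Connection": "close"})
print(raw)
```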

RFC specification:

       Accept         = "Accept" ":"
                        #( media-range [ accept-params ] )
       media-range    = ( "*/*"
                        | ( type "/" "*" )
                        | ( type "/" subtype )
                        ) *( ";" parameter )
       accept-params  = ";" "q" "=" qvalue *( accept-extension )
       accept-extension = ";" token [ "=" ( token | quoted-string ) ]

RFC: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Also, here.

HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
   protocol elements except the entity-body (see appendix 19.3 for
   tolerant applications). The end-of-line marker within an entity-body
   is defined by its associated media type, as described in section 3.7.

CR LF is \r\n; this marks the end of a line in the protocol. It applies to every single part except something called the entity-body, which we’ll learn about too. Also, remember not to use \n alone: the protocol defines only \r\n as the line break (the “tolerant applications” appendix mentioned above is about implementations that choose to accept a bare \n anyway).
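Going the other way, a small sketch (Python, with made-up sample headers) of how a receiver could split a header block on \r\n and pull out the name and value of each line:

```python
raw_headers = "Host: host.com\r\nConnection: close\r\n"

headers = {}
for line in raw_headers.split("\r\n"):
    if not line:
        continue  # the piece after the last \r\n is empty
    # Split on the first ":" only, since values may contain colons too.
    name, _, value = line.partition(":")
    headers[name.strip()] = value.strip()

print(headers)
```

This is a strict version that only understands \r\n; it is not how any particular server does it.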

Fun fact: in HTTP/1.0 the only thing we must include in our HTTP request is the first line.

Starting from HTTP/1.1, we must also include the Host header.

Host

The “Host” header allows us to specify exactly which website we actually want to receive when we talk to the server. It’s important to note that one server does not equal one website. One server can host hundreds of websites! It boils down to how we define a website. In order to do that, let’s define some terms.

  1. A website is a bunch of pages {index.php, welcome.php, tutorials.php} grouped together by an address.
  2. That address is usually called a domain {google.com, wtfismyip.com, reddit.com}.
  3. A page can mean multiple things; for now we can look at it as a file.

Now let’s imagine we have a bunch of folders, one per website, in a directory called “websites”, and our HTTP server works from this directory.

Every single time a browser client wants a page out of one of our websites, it needs to tell us the name of that website. Otherwise, our server has no idea which website the client wants!

Host: reddit.com

In HTTP/1.0 the base assumption was that 1 server = 1 website. However, concepts such as “Shared Hosting” allowed companies to host multiple sites on a single server (thus lowering maintenance costs for clients), and so HTTP/1.1 suddenly has this new addition.
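Here is a sketch of how a server could use the Host header to pick a folder. Everything in it is an assumption for illustration: the /websites directory, the list of known sites, and the resolve_page helper are mine, not something a specific HTTP server defines.

```python
from pathlib import PurePosixPath
from typing import Optional

WEBSITES_ROOT = PurePosixPath("/websites")  # hypothetical directory
KNOWN_SITES = {"reddit.com", "google.com", "wtfismyip.com"}


def resolve_page(host_header: str, path: str) -> Optional[PurePosixPath]:
    """Map a Host header and a request path to a file under WEBSITES_ROOT."""
    host = host_header.split(":")[0].lower()  # drop an optional ":port"
    if host not in KNOWN_SITES:
        return None  # we do not serve this website
    # Note: a real server must also guard against ".." in the path.
    return WEBSITES_ROOT / host / path.lstrip("/")


print(resolve_page("reddit.com", "/index.php"))
```

Without the Host header, the same request for /index.php would be ambiguous between all the folders.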

So yeah, request method, resource name, HTTP version, and Host header. What’s next?

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,he-IL;q=0.8,he;q=0.7
Cookie: PHPSESSID=ssh7qo6ioklmm94sh97fplpng7
Connection: close

Let’s continue looking at the headers. Whilst these aren’t necessary for the bare minimum HTTP experience, it is important for us to learn about them and understand them.

User-Agent

The User-Agent header is how our clients declare themselves to the server. Our client could be a desktop browser, a phone, a Python script, a command line client or even a bot. HTTP gives us an option to explain exactly what we use in order to talk HTTP to the server.

This information allows the server to adjust the reply we’re going to get from it, based on who we are. If we’re a phone, maybe it’s best to first show us a different page that also talks about the new application the site has? If we’re an old browser, maybe it’s best to restrict some of the website’s newer functionality and kindly ask the user to please upgrade to a better product? Or maybe the site just wants to better understand its userbase and know what it uses.

Our current line:

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36

We can recognize a bunch of information out of this straight away, such as the operating system (Linux on x86_64) and the browser (Chrome 81).

Remember: It’s user input. The traffic comes out of the user, so the user can decide what’s inside this value.
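To see how little the server can trust this value, here is a small sketch using Python’s standard urllib. The browser name is made up, and building the Request object sends nothing over the network:

```python
import urllib.request

# Hypothetical client name; the client is free to claim whatever it wants.
request = urllib.request.Request(
    "http://host.com/welcome.php",
    headers={"User-Agent": "TotallyARealBrowser/1.0"},
)

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```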

Accept

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9

This header tells the server what sort of content we expect to receive. The server may or may not respect our preference, but specifying exactly what we prefer, and in which order, lets us affect the type of content we get.

Maybe the server has several versions of the content we are interested in. That’s cool, but if we have a preference, why not stick to it? We attempt to dictate to the server what we want to see and, if that does not exist, the next-in-line content type we’re interested in. The “q” values are weights between 0 and 1; a type without a q counts as 1, so higher means more preferred. The textual values here (“text/html”, for example) are MIME types, which are basically identifiers for a type of content that has a specification, a file format, and is generally a thing people recognize. It could be an image, a video, a flash applet, or any of a dozen different things! Read more here if you want: https://stackoverflow.com/questions/3828352/what-is-a-mime-type
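A rough sketch of how a server might read those preferences, assuming Python and a hypothetical parse_accept helper. It only looks at the q parameter (a missing q counts as 1) and ignores everything else, and the header is a shortened version of the one above:

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media type, q) pairs, most preferred first."""
    preferences = []
    for item in header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # q defaults to 1 when it is not given
        for param in params:
            name, _, value = param.partition("=")
            if name == "q":
                quality = float(value)
        preferences.append((media_type, quality))
    # sorted() is stable, so equal q values keep their original order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
for media_type, quality in parse_accept(accept):
    print(quality, media_type)
```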

Accept-Encoding

Accept-Encoding: gzip, deflate

In this header, we tell the server which compression formats we are able to decompress, so the server can compress its response in a way we know how to open. Compression is extremely important in HTTP seeing how inefficient a protocol HTTP is (a topic I’ve yet to explain more about, but simply think about how many headers you’ve learnt so far and how… textual in nature they are. Assume there are more protocols around, and some of them aren’t so textual. Do we actually need our protocols to be understood by humans?)

In regards to how gzip works, you can read about it here: https://en.wikipedia.org/wiki/Gzip
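To get a feeling for what gzip buys us on textual content, here is a tiny sketch with Python’s standard gzip module. The body is a made-up, very repetitive HTML snippet, so real pages will compress less dramatically:

```python
import gzip

# Hypothetical response body: the same HTML line repeated 100 times.
body = b"<p>Hello, HTTP!</p>\n" * 100

compressed = gzip.compress(body)
restored = gzip.decompress(compressed)

print(len(body), "bytes before,", len(compressed), "bytes after")
```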

Accept-Language

Similar to how Accept works, Accept-Language tells the server which languages we prefer to see in the reply.

Cookies

Remember when we said HTTP does not save states? Well, the RFC says so.

But then there’s a new RFC saying otherwise 🙂

https://tools.ietf.org/html/rfc6265

HTTP cookies are a way for a website to know who we are. This is done by planting little textual “files” on our computers (in an implementation chosen by our browsers); these “files” hold small pieces of information.

Chrome can present this information to us in a nice way using its DevTools.

We can see each cookie holds a name, a value, a domain, a path (inside that domain), an expiration date, a cookie size and whether it’s “HTTP Only”, as well as “Secure” or “SameSite”.

Without going needlessly in-depth here (don’t worry, we do that soon enough), the name of the cookie is what the website calls the cookie it gives you. The server uses the “Set-Cookie” response header, followed by a cookie name, a “=” separator, and a value. Like so:

Set-Cookie: SIDCC=AJi4QfHiHQvwm0bOps6slcBRh1masRfEZqD-KPfAgWybrb9GBzNOeuoBgPIa7qnfRHZt2PGqdFSI; expires=Sun, 09-May-2021 12:40:04 GMT; path=/; domain=.google.com; 

A lot of cookies hold seemingly random information inside them. It may not mean a lot to us, but to the server it could be everything it needs to know about us. That works through concepts like unique identifiers and sessions; you can google them if you want. For now, we can pretend that long string is a key, and that the website uses this key to aggregate information about us in one way or another. Kinda like a struct has a name and a bunch of values associated with it.

We can see it will expire someday. Afterwards, we may need to request that cookie again. We may or may not get the same value again. See how this may apply in login systems? Good login, get cookie with your “random” name only you have. Bad login, you don’t get a cookie.

It has a path. That / path means it applies to the “web-root” of the site (all of it); sometimes it won’t be that way. So we can give cookies to only parts of the website, not all of it. Good practice. It’s better to limit resources and information to only the areas where they explicitly need to be used.

We can see the domain is “.google.com”. This means the cookie is sent to google.com and to every subdomain under it. Like a wildcard.

Stuff like “SameSite” and “HttpOnly” are extensions, flags that can change how this cookie works and behaves under certain requests or conditions. You can google about these to learn more.
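If you want to poke at a Set-Cookie value yourself, Python’s standard http.cookies module can take it apart. This is just a sketch: the cookie value “abc123” is a short stand-in for the long random string above, and the rest is copied from the example:

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
# load() takes the header value, without the "Set-Cookie: " prefix.
cookie.load(
    "SIDCC=abc123; expires=Sun, 09-May-2021 12:40:04 GMT; path=/; domain=.google.com"
)

morsel = cookie["SIDCC"]
print(morsel.value)       # the cookie's value
print(morsel["expires"])  # when the browser should forget it
print(morsel["path"])     # which part of the site it applies to
print(morsel["domain"])   # which hosts it is sent to
```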

Connection

This header specifies how the underlying network connection should behave in relation to our current browsing session. Basically, it specifies whether we want to keep the underlying connection open after this exchange. (Not an “HTTP connection”, there isn’t any; remember, stateless protocol. This is the transport underneath HTTP, TCP, which keeps the traffic going without mix-ups, errors, duplicates, or anything of the sort.)

The Connection header can be Keep-Alive, which tells the underlying connection not to disconnect. Our browser will use this value if it thinks it’ll talk a lot to this server, for example if the site uses multiple resources from it, or if it’s the site itself we’re talking to.

Otherwise, it can simply be Connection: close. Fun fact: this entire header is gone in HTTP/2, where connection-specific headers are not allowed!

http response

HTTP/1.1 302 Found
Date: Fri, 01 May 2020 19:42:21 GMT
Server: Apache/2.4.18 (Ubuntu)
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
location: index.php
Content-Length: 439
Connection: close
Content-Type: text/html; charset=UTF-8


<html>
   
   <head>
      <title>Net Tool v0.1 </title>
   </head>
   
   <body>
	<h1>Net Tool v0.1</h1>
	<form method="POST" action="">
	<select name="command">
		<option value="traceroute">traceroute</option>
		<option value="ping -c 1">ping</option>
	</select>
	<input type="text" name="host" value="8.8.8.8"/>
	<input type="submit" value="Execute!"/>
	</form>
	      <p><a href = "logout.php">Sign Out</a></p>
   </body>
   
</html>

Looks similar enough. Let’s break our example down.

Status Code

HTTP/1.1 302 Found

We start by confirming we’re on the same page and saying we’re talking the same protocol version (Otherwise the conversation wouldn’t have happened), and continue by giving a status code.

A status code, in a crude yet awfully reasonable explanation, goes like this:

  • 1xx: hold on
  • 2xx: here you go
  • 3xx: go away
  • 4xx: you fucked up
  • 5xx: I fucked up

Status codes are a bunch of numbers grouped together by a general sentiment, and each number has a different meaning. 404, for example, is “Not Found”. 200 would be “OK”, all’s fine. What we received, specifically, is 302: a redirect response. Note that a redirect can still carry a body (as you can see below the headers).

Redirect responses are usually accompanied by a Location header (which we can also see down there) specifying where the browser should direct the next request.
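The grouping by first digit is easy to sketch in code. This is my own illustration in Python (the parse_status_line helper and the class names are assumptions), leaning on the standard http.HTTPStatus enum for the official reason phrases:

```python
from http import HTTPStatus

# The first digit of the code tells us the general sentiment.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into (version, code, reason phrase)."""
    # The reason phrase may contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


version, code, reason = parse_status_line("HTTP/1.1 302 Found")
print(STATUS_CLASSES[code // 100], "/", HTTPStatus(code).phrase)
```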

Date

Date: Fri, 01 May 2020 19:42:21 GMT

This is the Date header, specifying exactly when the response before us was generated. Dates like this are extremely useful when I don’t want to needlessly ask the website for the same page over and over again: I can ask it whether anything has changed in the page since yesterday morning, and only if there is a change does it give me the page. The request header for this is called If-Modified-Since, and it is part of HTTP conditional requests (https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests), which you can read more about if you want.
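A minimal sketch of the server side of such a conditional request, in Python. The conditional_status helper and the “last modified” dates are hypothetical; a server that decides the page hasn’t changed answers 304 (Not Modified) with no body instead of 200:

```python
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since: str, last_modified: str) -> int:
    """Return 304 if the page is unchanged since the client's copy, else 200."""
    client_copy = parsedate_to_datetime(if_modified_since)
    page_changed = parsedate_to_datetime(last_modified)
    return 200 if page_changed > client_copy else 304


# The client last fetched the page on 01 May 2020.
print(conditional_status("Fri, 01 May 2020 19:42:21 GMT", "Thu, 30 Apr 2020 08:00:00 GMT"))
print(conditional_status("Fri, 01 May 2020 19:42:21 GMT", "Sat, 02 May 2020 08:00:00 GMT"))
```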

Expires

Expires: Thu, 19 Nov 1981 08:52:00 GMT

Next, there’s the Expires header. It marks the point in time until which the page counts as “fresh”, basically signaling to the client when it should expect a new version to exist. The client is supposed to respect this recommendation: it may reuse its copy until that moment, and should fetch a new version afterwards. A date in the past, like the one here, means the response is already stale and shouldn’t be reused.

The Cache-Control header serves a similar purpose, so I saw no reason to delve into it.
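As a sketch of the freshness check a client could do with an Expires value (Python, with a hypothetical is_fresh helper; real caches follow more rules than this):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires: str, now: datetime) -> bool:
    """A cached copy is fresh as long as the Expires moment has not passed."""
    return now < parsedate_to_datetime(expires)


# The moment our example response was generated (its Date header).
now = datetime(2020, 5, 1, 19, 42, 21, tzinfo=timezone.utc)
print(is_fresh("Thu, 19 Nov 1981 08:52:00 GMT", now))
```

An Expires date from 1981 is always in the past, so this response is never considered fresh.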

Server

Server: Apache/2.4.18 (Ubuntu)

The Server header specifies information about the underlying technologies the website uses in order to work. It is most notably filled in by the httpd software itself: software like Apache, nginx, IIS, LiteSpeed, Tomcat and more can be seen here from time to time.

Content-Length

Content-Length: 439

Content-Length is a header specifying the length, in bytes, of the body of the HTTP reply. This information is extremely important when, for example, we size up a file before downloading it: we always like to know how much a file weighs without first having to download it fully to learn its size.

So this is how it works! 🙂
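One subtlety worth checking yourself: the RFC counts the length in octets (bytes), and that is not always the same as the number of characters. A tiny Python sketch with a made-up body containing one non-ASCII letter:

```python
# "\u00e9" is the letter "é", which takes two bytes in UTF-8.
body = "<h1>Caf\u00e9</h1>"
encoded = body.encode("utf-8")

print(len(body))     # number of characters
print(len(encoded))  # number of bytes, which is what Content-Length must hold
```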

Content-Type

Content-Type: text/html; charset=UTF-8

Remember the MIME types I explained in the Accept header? It’s the same here. First we stated a preferred order of content types, and then we get one of them (at least this time).

This one specifies that the content below will be of textual HTML.

<html>
   ...
  
</html>

That’s the body. It’ll contain the client side code your browser parses, interprets and executes in order to present the website page for you. You can always see all of it one way or another, otherwise your browser will not be able to execute the instructions and do the work to present it.

If the browser can present the page for you, then you can open the code responsible for all of the work your browser did and understand it. It must be this way. If the browser can understand it, you can understand it.

HTTP Request Parameters (Get & Post)

Let’s look at an HTTP request with parameters.

GET /json?a=b&c=d HTTP/1.1
Host: wtfismyip.com
Connection: close
DNT: 1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,he-IL;q=0.8,he;q=0.7

We can focus on the first line and see how, after requesting /json, we add “?a=b&c=d”.

If we go back to the RFC here –

and look at all the possible symbols and their meaning

       token          = 1*<any CHAR except CTLs or separators>
       separators     = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT

we can see the question mark symbol is supposed to separate something. CTRL-F’ing a little more, we can actually find a section

3.2.2 http URL

which details a little bit more about this:

   The "http" scheme is used to locate network resources via the HTTP
   protocol. This section defines the scheme-specific syntax and
   semantics for http URLs.

   http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

Now we can see the ? symbol separates a query. What query?

Digging deeper into https://tools.ietf.org/html/rfc1738#section-3.3 (The URL specifier, which is itself part of the URI specifier – https://tools.ietf.org/html/rfc3986) we can get another hint about this. Also there’s information here: https://tools.ietf.org/html/rfc3986#section-3.4

We learn this is not an HTTP thing. This is a URI and URL thing.

3.  Syntax Components

   The generic URI syntax consists of a hierarchical sequence of
   components referred to as the scheme, authority, path, query, and
   fragment.

      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

      hier-part   = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty

   The scheme and path components are required, though the path may be
   empty (no characters).  When authority is present, the path must
   either be empty or begin with a slash ("/") character.  When
   authority is not present, the path cannot begin with two slash
   characters ("//").  These restrictions result in five different ABNF
   rules for a path (Section 3.3), only one of which will match any
   given URI reference.

   The following are two example URIs and their component parts:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose

HTTP borrows concepts already familiar from other protocols, which also use a thing called the URL scheme. HTTP uses this, sure, but it’s not HTTP itself or something it invented. It is simply how the protocol addresses its requests and carries additional information inside them.
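You can see the same five components fall out of the RFC’s example URI with Python’s standard urllib.parse. This is just a quick sketch to play with:

```python
from urllib.parse import urlsplit

parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")

print(parts.scheme)    # foo
print(parts.netloc)    # example.com:8042 (the "authority")
print(parts.path)      # /over/there
print(parts.query)     # name=ferret
print(parts.fragment)  # nose
```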

The query, or query string (also described here: https://en.wikipedia.org/wiki/Query_string), is a set of pairs separated by the “&” sign, where each pair works as a key-value pair. In our case

GET /json?a=b&c=d HTTP/1.1

‘a’ is a key, a name that we can use to open a “door” or retrieve a value. Here, the value is b.

We then use the & separator in order to make another statement, a new pair. ‘c’ holds the value d.
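The same key-value reading can be sketched with Python’s standard library, which already knows how to cut a request target at the “?” and split the pairs:

```python
from urllib.parse import parse_qsl, urlsplit

target = "/json?a=b&c=d"

# Everything after the "?" is the query string.
query = urlsplit(target).query
# Each "key=value" pair, separated by "&", becomes one dictionary entry.
params = dict(parse_qsl(query))

print(query)   # a=b&c=d
print(params)  # {'a': 'b', 'c': 'd'}
```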

This may look a little odd to you: how come we are making a request while also stating information?

In order to make a good request to a complex website page that does some logic with our request, maybe the page wants us to specify which part of it we want. Maybe it wants us to specify in what order the page should print itself. Maybe we’re getting an entire book, and we want to specify a single page out of that bunch. There are a lot of possible reasons. HTTP allows us to specify more information in order to help the website do its thing and give us precise information.

Alright, that’s enough about GET. Let’s learn more about POST.

POST

This is how our HTTP request would look if we did it with POST.

POST /json HTTP/1.1
Host: wtfismyip.com
Connection: close
DNT: 1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,he-IL;q=0.8,he;q=0.7
Content-Type: application/x-www-form-urlencoded
Content-Length: 7

a=b&c=d

Suddenly, our GET parameters are stuck at the bottom. How odd.

It seems as if we pushed the parameters to the end, and we’re not using any separator like (?) to mark where the query string starts and ends. That’s because this is no longer a query string, this is an entity-body.

We have a bunch of new headers. Let’s read about them.

14.13 Content-Length

   The Content-Length entity-header field indicates the size of the
   entity-body, in decimal number of OCTETs, sent to the recipient or,
   in the case of the HEAD method, the size of the entity-body that
   would have been sent had the request been a GET.

       Content-Length    = "Content-Length" ":" 1*DIGIT

   An example is

       Content-Length: 3495

   Applications SHOULD use this field to indicate the transfer-length of
   the message-body, unless this is prohibited by the rules in section
   4.4.

The entity-body, judging by the value of Content-Length in our packet (7), is probably “a=b&c=d”. You can count the characters and see it’s 7 too. So this “entity-body” thing must be it. Wait, we also wrote something about entity body in the start of this article.

HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
   protocol elements except the entity-body (see appendix 19.3 for
   tolerant applications). The end-of-line marker within an entity-body
   is defined by its associated media type, as described in section 3.7.

It appears we have a rule and now we see the exception of that rule.

CR LF, the characters we use to separate our HTTP lines in the protocol, doesn’t work the same with the entity-body. Right before it, it appears as if we have it twice!

Content-Length: 7 CR LF, CR LF. Two “enters”. Not one. Interesting.

Because there is no sign like “?” to mark the start, HTTP specifies a double CR LF (that is, an empty line) as the “separator” between the headers and this new content. Oddly enough, you can still pass GET parameters in this POST request, thus allowing you to do a POST request with GET parameters when you need it.
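Here is a sketch, in Python, of how a receiver can use that empty line to cut a raw request into headers and body. The request is a trimmed-down version of the POST above:

```python
raw = (
    b"POST /json HTTP/1.1\r\n"
    b"Host: wtfismyip.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 7\r\n"
    b"\r\n"
    b"a=b&c=d"
)

# The first empty line (CR LF CR LF) ends the headers; the rest is the body.
head, _, body = raw.partition(b"\r\n\r\n")
request_line, *header_lines = head.decode("ascii").split("\r\n")
headers = dict(line.split(": ", 1) for line in header_lines)

print(request_line)
print(headers["Content-Length"], len(body))
```

Content-Length and the actual number of body bytes agree, which is exactly what the receiver relies on to know where the body ends.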

Enough with the RFC tech, let’s focus on the packet. POST is a different HTTP request method that allows us to send significantly more information to the website than GET does.

In POST, we request a page, but we “post” information to it. The focus here is our information, we are the ones providing information to the page – this is used when we upload files and images to the site, when we perform login actions, or generally when we do any sort of activity that provides info to the website but also doesn’t quite fit in the context of a GET request.

GET requests should be OK to be stored in areas like browser history. Login attempts shouldn’t appear there. Just imagine username=admin&password=hunter2 appearing in your Chrome history. It’s not that fun.

So we have a request that can also store some information, and we have a request that has the dedicated role and responsibility of storing information. In some cases we can use both of these interchangeably, but we’d rather not. Please use GET and POST in their proper roles.

Is there a limit to how much we can fit into a query string? Well, kinda. Google links to a Stack Overflow answer that says:

RFC 2616 (Hypertext Transfer Protocol — HTTP/1.1) states there is no limit to the length of a query string (section 3.2.1). RFC 3986 (Uniform Resource Identifier — URI) also states there is no limit, but indicates the hostname is limited to 255 characters because of DNS limitations (section 2.3.3).

It then goes on to say how each browser handles query string limitations. It’s important to note, however, that the limits we care about don’t only operate at the browser level. The server can limit us too.

https://tools.ietf.org/html/rfc2616#section-10.4.15

10.4.15 414 Request-URI Too Long

   The server is refusing to service the request because the Request-URI
   is longer than the server is willing to interpret. This rare
   condition is only likely to occur when a client has improperly
   converted a POST request to a GET request with long query
   information, when the client has descended into a URI "black hole" of
   redirection (e.g., a redirected URI prefix that points to a suffix of
   itself), or when the server is under attack by a client attempting to
   exploit security holes present in some servers using fixed-length
   buffers for reading or manipulating the Request-URI.

Another interesting header we have is

Content-Type: application/x-www-form-urlencoded

Seems fairly complex, why does it look like that?

We get a hint at https://www.ietf.org/rfc/rfc1867.txt, which talks a bit about this “urlencoded” value, but it seems to talk about file uploads! We’re not talking about that yet.

Well, putting the RFCs aside for a moment, we can judge for ourselves: it seems our GET parameters simply skipped a bunch of lines and are now at the bottom of the request. That’s OK. This “x-www-form-urlencoded” value is, for us, the equivalent of a GET query string shoved elsewhere.
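Since it’s the same format as a query string, the same standard library function builds both. A small Python sketch that produces the body and the two headers that describe it:

```python
from urllib.parse import urlencode

# The same pairs we previously put after the "?" in the GET request.
body = urlencode({"a": "b", "c": "d"}).encode("ascii")

headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body)),
}

print(body)
print(headers)

# Characters that would clash with the separators get escaped.
print(urlencode({"q": "a b&c"}))
```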

So how do other values of this Content-Type look? Let’s look at file uploads!

file uploads

I uploaded an image to imgur.com. Looks like this.

POST /3/image?client_id=546c25a59c58ad7 HTTP/1.1
Host: api.imgur.com
Connection: close
Content-Length: 77421
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36
DNT: 1
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryaoztPeq54SLf263s
Accept: */*
Origin: https://imgur.com
Sec-Fetch-Site: same-site
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://imgur.com/a/xzXKgVm
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,he-IL;q=0.8,he;q=0.7
Cookie: frontpagebetav2=1; pp=9596005613620903; IMGURUIDJAFO=ee0952601773b117f0d0df82c19e53d65bddbce1bf0917bd8a62d83dfb9b2075; SESSIONDATA=%7B%22sessionCount%22%3A1%2C%22sessionTime%22%3A1587642545963%7D; is_authed=0; IMGURSESSION=35a7e8ee70bf8e88959b35daca635147; amplitude_id_f1fc2abcb6d136bd4ef338e7fc0b9d05imgur.com=eyJkZXZpY2VJZCI6IjUwOTMyOTk4LTRmYTktNGNkZS1hMWJhLThiYTk5MDM0Y2U2OVIiLCJ1c2VySWQiOm51bGwsIm9wdE91dCI6ZmFsc2UsInNlc3Npb25JZCI6MTU4OTAyNzk5NjU2MCwibGFzdEV2ZW50VGltZSI6MTU4OTAyODAzNjk3MSwiZXZlbnRJZCI6NiwiaWRlbnRpZnlJZCI6NCwic2VxdWVuY2VOdW1iZXIiOjEwfQ==

------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="image"; filename="large.jpg"
Content-Type: image/jpeg

ÿØÿàJFIFÿÿþLavc58.35.100ÿÛC""" ""%&%##"#&&(((00..88:EESÿÀŽ"ÿÄÿÄK	!1AQaq"2‘¡±ÑBRÁáð#br3C‚’ñS¢Â²$cÒDs5ƒâ%4“³òÿÄÿÄ1!1AQaq‘"2¡±B
…O‰lÛ§{Æ+ŸsLN§r·ôÀ­´LÜ3Ñiû»wg:×s±ž¦Àˍâ³!}ÇPð+ U:õ+Ð!Z3	©ÉB$bUA•"D©Z”ƒ2¡qÑBBD©H' •	SOB:Ž>JeU܃GyRÇz`ÌnM8×y¢Fc†ªÍ7M
¼ª=ªjâò8ßîžÅqR„È$J‘„ !BB *‚jŒj5©+J¿½‰Kj]Lð#Ñ4Ôw›•‘õ§ßßùP¹ÔÃiè¦êR	FOÝ}µ¦?~

úŸ¿¿’“eqi£´>»´
ÌsèáJ`¡xy55xÓÓMé¨ âÖCXz·o¢ºᘮñ譚ÚaÀ䞙*ÉdÌvêC£þ ¼ß¹ö…Ð&ÑN•µâº+†›!ZÕ†™…6ZjÌ6·E0uHڀœâ—<
•%Rë™ÕÄ|'äcÔ4:´«j'4;ŽÑš—ô=áÞ¤PŽ°¼6ý@¦,8l9z¤TDS.äÛú;£Ç.õ*`Àꢩ¤ê¡5aQ´$(:ÃÇêŸxÁêÀT©ËHÞ¨–ÓÜ~ð[J"ÐTé[e†µüv;>›;>©#qËÑY|5Ë5•ñàþõ*8Zzì#xÉ_lT‚£®Pº;Pín
’¼ª9¤ÍQ]•™8<lv½/<G]ŽnñˆðL,5áÉHR%®é1»¦<;Š´-Èӎ#Õ:ý3Þ1
dÊv&GTª©n£ÃÐó‘ñÀ -¤QސeUÜÓ§ÏÕN‘ É2]ë7Ô©Zö;ß^jùÅRt»JpÁGUô[ã▣jÇ6R:#ÑDYhn·»š[¾†¾[‰‹•™·À… µm¿è—4­¤,ÁjfµŠÀž3ïw«Ü-U´(CÚrpïR¦™D—ÛñôÃ#¼;ÐeÇmx£ˆQsÑüABgfÑÜRهBÇnU
Ÿzy´ì§qPûK·)¼«ýFsy¦þc3	ÞÒýÊ3;Σ¹GEõøNÙsÁ2+·‹BJ£žçgäilôÔ}ŸV÷*¬‘Ñj¶c7šÓRkµHøÃÂÓ^c-ø¬).¼5ó[Ìuàá-W¬îèïIN•W³Y"D-Ø"$a"TÔ3)Ù§$f©P˜W8UUn&ª[email protected].f¤‰
rÈ·$õrR ="r¢BPŸ¯r•Då р%3ä<Jyª/óޒŽLˆ©rUª~ö©ÙšöÓ§dº;½V©rJ}ìQµé¬…–YÁ^k֛g¤É¡RH„¨L	È@"¡ˆNH€D!*aÿÙ
------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="type"

file
------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="name"

large.jpg
------WebKitFormBoundaryaoztPeq54SLf263s--

While it may seem there’s quite a lot to unpack here, there really isn’t.

That huge bulk of binary information is the image I decided to upload to imgur.com. Let’s put that info aside. Let’s also take off the cookies, they’re not relevant here.

POST /3/image?client_id=546c25a59c58ad7 HTTP/1.1
Host: api.imgur.com
Connection: close
Content-Length: 77421
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36
DNT: 1
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryaoztPeq54SLf263s
Accept: */*
Origin: https://imgur.com
Sec-Fetch-Site: same-site
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://imgur.com/a/xzXKgVm
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,he-IL;q=0.8,he;q=0.7
Cookie: xxx

------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="image"; filename="large.jpg"
Content-Type: image/jpeg

info
------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="type"

file
------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="name"

large.jpg
------WebKitFormBoundaryaoztPeq54SLf263s--

That still looks like a lot. Let’s pretend this is a normal POST request. How would it look?

POST /3/image?client_id=546c25a59c58ad7 HTTP/1.1
Host: api.imgur.com
Connection: close
Content-Length: 35
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36
DNT: 1
Content-Type: application/x-www-form-urlencoded
Accept: */*
Origin: https://imgur.com
Sec-Fetch-Site: same-site
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://imgur.com/a/xzXKgVm
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,he-IL;q=0.8,he;q=0.7
Cookie: xxx

image=info&type=file&name=large.jpg

That looks familiar. So basically, here are the interesting parts of the file upload POST request:

Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryaoztPeq54SLf263s

------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="image"; filename="large.jpg"
Content-Type: image/jpeg

info
------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="type"

file
------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="name"

large.jpg
------WebKitFormBoundaryaoztPeq54SLf263s--

The Content-Type changed from urlencoded to multipart/form-data.

Then, it specifies a “boundary” value that the protocol uses to separate the parameters from one another.

The “image=info” becomes

------WebKitFormBoundaryaoztPeq54SLf263s
Content-Disposition: form-data; name="image"; filename="large.jpg"
Content-Type: image/jpeg

info
------WebKitFormBoundaryaoztPeq54SLf263s

And we see a second Content-Type! A Content-Type inside a Content-Type.

This is how HTTP does truly complex things, like encapsulating entirely different content (like a binary image file) inside a request that’s textual (the HTTP protocol is textual).

We can also see that this part’s Content-Disposition carries two parameters, name and filename; it is the only part that has two. That is special to file fields. The filename is customary and a bit redundant here, as we’re sending it either way in the “name” parameter.

As for the boundary, it can be almost anything you want. Replace that big string with “abc”, and as long as it’s the same in the header and in all the boundary lines, and it doesn’t show up inside the content itself, it’ll work.
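To try the “abc” idea, here is a sketch of a multipart encoder in Python. The build_multipart helper is hypothetical and only handles plain text fields (no files, no extra Content-Type per part). Note that inside the body each boundary line starts with two extra dashes, which is why the example above shows more dashes in the body than in the header:

```python
def build_multipart(fields: dict[str, str], boundary: str) -> bytes:
    """Encode simple text fields as a multipart/form-data body."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")  # each part starts with "--" + boundary
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # empty line between the part's headers and its value
        lines.append(value)
    lines.append(f"--{boundary}--")  # the closing line has a trailing "--"
    return ("\r\n".join(lines) + "\r\n").encode("utf-8")


body = build_multipart({"type": "file", "name": "large.jpg"}, boundary="abc")
content_type = "multipart/form-data; boundary=abc"

print(body.decode("utf-8"))
```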

We learn here that the Content-Type dictates how the entity-body separates its parts and how it behaves, and that the body can be quite wild and not nearly as clearly defined as the rest of the HTTP request. If we then learn more from Google about XMLHttpRequest, or application/json, or any other request type, we see this sort of behavior is “consistently inconsistent”. Each Content-Type, whether multipart/form-data, urlencoded or others, has different rules and different separators.

That’s it so far! I wanted to explain the request and reply details as best as I can. In the next parts we’ll talk about POST (multipart form data, urlencoded, JSON), auth (cookies, sessions, Authorization: Bearer), and discuss a bit about how we can send and receive HTTP requests and responses in comfortable ways in multiple tools and programming languages.

Hope you learnt something!
