Recently, I was trying to write a little client script for Apache Solr. It was just supposed to add some data, retrieve it, and delete it. I thought I’d just do the obvious things, but as it turns out, I actually had to read carefully.
You see, if I post a JSON document `{"id": "1", "title": "Doc 1"}` to http://localhost:8983/solr/my_collection/update/json/docs, it shows up somewhere other than http://localhost:8983/solr/my_collection/update/json/docs/1.
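For the curious, the round trip looked roughly like this. The indexing call is the `/update/json/docs` handler from my example above; the point is that to read the document back, you go through the query handler (`/select`), not anything like a per-document resource URL. Treat the exact parameters as a sketch of my setup rather than a Solr reference.

```sh
# Index the document (easy enough), committing right away so it is visible.
curl -X POST -H "Content-Type: application/json" \
  -d '{"id": "1", "title": "Doc 1"}' \
  "http://localhost:8983/solr/my_collection/update/json/docs?commit=true"

# There is no .../update/json/docs/1 to GET. Retrieval goes through a query:
curl "http://localhost:8983/solr/my_collection/select?q=id:1"
```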
Solr is essentially a search engine with an HTTP/JSON-based API. However, it isn’t exactly a REST API, because it doesn’t really follow the rules for a well-defined REST interface. I whine about this sometimes. Maybe one day, if I have time, I’ll even fix it. (I work for Lucidworks, which does most of the development of Solr.)
But my script experience reminded me that REST’s rules (okay, guidelines) are there for your protection. They make it really easy to write clients for your API, and they make the API behave in a predictable manner. If you really follow these rules, you can almost mount the thing you’re writing the API for as if it were a drive.
Here are the basics:
- HTTP `GET` should be used for all retrieval. It should never be used to create, update, or do things.
- HTTP `POST` should be used for creating. It shouldn’t be used to update or get a resource. If a URI had never existed before now and you’re going to create it and make it hold some data, use `POST`.
- HTTP `PUT` should be used for updating, meaning replacing a collection with different data. The URI should have existed before.
- HTTP `DELETE` should be used for deleting.
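To make that concrete, here is a minimal sketch of all four verbs against a hypothetical service. The `http://localhost/myservice/characters` URLs and the payloads are made up for illustration; they match the example used later in this post.

```sh
# Create: POST a new resource into the collection.
curl -X POST -H "Content-Type: application/json" \
  -d '{"id":"1","name":"Don Draper"}' \
  http://localhost/myservice/characters

# Retrieve: GET it back, with no side effects.
curl http://localhost/myservice/characters/1

# Update: PUT a full replacement at the URI that now exists.
curl -X PUT -H "Content-Type: application/json" \
  -d '{"id":"1","name":"Dick Whitman"}' \
  http://localhost/myservice/characters/1

# Delete: remove it.
curl -X DELETE http://localhost/myservice/characters/1
```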
Following from those basics:
- HTTP `GET` should never create anything or have side effects. `GET`s can be cached, so your system needs to tolerate this.
- HTTP `POST` should not be required to get or update something en masse.
- If you `POST` something to a URL, you should be able to `GET` it from that URL. If that thing is a collection, you should be able to `GET` each item from a subcontext of that URL. For example, `curl -X POST --header "Content-Type: application/json" -d '[{"id":"1","name":"Don Draper"}, {"id":"2","name":"Betty Draper"}, {"id":"3","name":"Joan Holloway"}]' http://localhost/myservice/characters` should allow `curl http://localhost/myservice/characters` to return that whole array, but `curl http://localhost/myservice/characters/1` should return just `{"id":"1","name":"Don Draper"}`. `DELETE` should work the same way, hitting either the collection or the item (see the sketch after this list).
- Never have verb URLs like `/addNew`. The HTTP methods provide the verbs.
- Use `POST` for appends. But there’s a bit of a hole to watch out for: what if you have a collection with 1,000 items and you want to add 500 without rewriting the original 1,000? In general, just `POST` the new items to the collection.
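Sticking with the same hypothetical `characters` service, the `DELETE` symmetry and the append case might look like this:

```sh
# Delete a single item...
curl -X DELETE http://localhost/myservice/characters/2

# ...or the whole collection.
curl -X DELETE http://localhost/myservice/characters

# Append: POST additional items to the existing collection instead of
# PUTting (and therefore rewriting) all of it.
curl -X POST -H "Content-Type: application/json" \
  -d '[{"id":"4","name":"Peggy Olson"}, {"id":"5","name":"Pete Campbell"}]' \
  http://localhost/myservice/characters
```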
For results, HTTP has return codes. These are the basic ones your services should return:
- 200: “Done, and it was okay.” Generally, your `GET`s return this code.
- 201: “Done, and created.” Generally, your `POST`s return this code.
- 204: “Done, and no body.” Generally, your `DELETE`s return this code.
- 400: “The client sent me junk, and I’m not going to mess with it.”
- 401: “Unauthorized; the client should authenticate first.”
- 403: “Not allowed. You’re logged in, but you don’t have permission to see or delete this thing.”
- 404: “Can’t find it.”
- 410: “It’s gone; it was marked as deleted.”
- 451: “The government made me not show it.”
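A quick way to check which of these codes your service actually returns is curl’s `-w` option, which can print just the status code (again against the hypothetical service above):

```sh
# Expect 200 for a GET of something that exists, 404 for something that doesn't.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost/myservice/characters/1

# Expect 204 for a DELETE that succeeds and returns no body.
curl -s -o /dev/null -w "%{http_code}\n" -X DELETE http://localhost/myservice/characters/1
```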
If you allow the client to specify “how much” or to filter the results, that should be done via query string parameters, for example `?page=1` or `?q="name:*Draper"`. This shouldn’t be baked into the URI paths of your API. It is also fine to protect the client from hurting itself by returning an error if it asks for a whole collection and that collection would come back as a million rows.
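Against the hypothetical `characters` collection, paging and filtering might look like this (the `page`, `rows`, and `q` parameter names are illustrative, not prescriptive):

```sh
# Page through the collection instead of pulling all of it at once.
curl "http://localhost/myservice/characters?page=1&rows=50"

# Filter with a query parameter rather than a special-purpose URI.
curl "http://localhost/myservice/characters?q=name:*Draper*"
```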
Finally, think about the future. Consider explicit versioning, as in http://localhost/myservice/v1/characters, so that when you inevitably break something and have a new version of your API, you can either stay backward-compatible or error out and tell the client why it can’t get in any longer.
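With an explicit version in the path, the migration story is straightforward; the `v2` path here is hypothetical:

```sh
# Old clients keep calling v1, which either stays backward-compatible
# or returns an explicit error explaining why it no longer works.
curl http://localhost/myservice/v1/characters

# New clients move to v2.
curl http://localhost/myservice/v2/characters
```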
In summary, your API should look nothing like Twitter’s crap “REST” API.