Yeah, you can cache onto S3. That wasn’t what I meant by bulk download. That leads into how you design APIs now, which is that you could serve a bulk cache off S3. But now we’re just talking about how we actually implement an API.
You’ve normally got a problem, because it becomes almost unusable to people, because they need something like an API to find the stuff [laughs] inside your S3. I’ve got you. That’s what we do in OpenSpending.
Even then, the functionality of API first is that disc. How do I put it? Normally, once you’ve put that, let’s say you did take your cache, your whole structure in API, onto S3.
Yeah. What I’m saying is, even then you can have enough of an explosion in the size of your cache space that you don’t do it. That’s why. The point is, in general, yes, you could.
We have a queue, right? You can reify the whole queue. We’ve done it, and you can look at the space constraints...
It depends. It varies. I’ve done this with OpenSpending.
First of all, by the way, we’re getting a little bit technical here, but to explode the query space of the official API into a completely reified form, to basically cache it onto S3, generally causes a huge explosion in space.
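As a rough illustration of that space explosion, here is a minimal sketch that counts how many distinct responses you would have to pre-compute if every filter combination of an API were cached as its own file. The dimension names and sizes are invented for the example; they are not figures from OpenSpending or any real dataset.

```python
def reified_query_count(dimensions):
    """Count the filter combinations when each dimension is either
    left unfiltered or pinned to exactly one of its values."""
    count = 1
    for values in dimensions.values():
        count *= len(values) + 1  # +1 for "no filter on this dimension"
    return count


# Hypothetical dimensions for a spending dataset (assumed, not from the talk).
dimensions = {
    "year": list(range(2005, 2015)),                  # 10 values
    "department": [f"dept-{i}" for i in range(50)],   # 50 values
    "region": [f"region-{i}" for i in range(12)],     # 12 values
    "category": [f"cat-{i}" for i in range(200)],     # 200 values
}

total = reified_query_count(dimensions)
print(f"{total:,} cached responses needed")
```

With only four modest dimensions this already comes to roughly 1.47 million cached responses, and every extra dimension multiplies the total again, which is why fully reifying a query API is often not practical.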
You could split it into the individual records. Even there, I would offer it is better, compared to running an API.
I know that you could say...
Granular APIs. Just to be clear, I don’t consider S3 an API in that sense. It’s basically just a bulk file download tool. Otherwise, we’re just starting to confuse the term API.
Whereas that is not true of the cost of supply over an API. It’s not zero.
One of the reasons I was explaining is, the logic of open data makes sense, because it’s reasonable to go to government and say the cost of supply of this data is essentially zero once you’ve produced it.
In my experience, whether it’s governments or even corporations, running those APIs has a significant cost ‑‑ possibly running to tens or hundreds of thousands of dollars a year ‑‑ versus providing raw bulk data, which has a cost in the dollars, or hundreds of dollars, or ...
The point I was saying, at some point they might converge, that’s fine.
With APIs, basically, the cost per request is not trivial. Whereas the cost of serving off S3 is, again, not truly zero, but it’s very, very low relative to the cost of serving each API request.
It should do. It’s not just accidental. That’s a good idea, because as an economic model it has a cost.
It might be that you get 10,000 requests a month free, it might be 20,000, but at some point you pay for requests.
So, the cost implications. Generally, it’s not necessarily reasonable to say people should have free APIs, that APIs should be costless. Most people, most companies in the world, charge for API usage at some level.
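The free-tier model described here can be sketched as a small calculation. The 10,000 free requests come from the example above; the per-request price is a made-up placeholder, not a quote from any real provider.

```python
def monthly_api_bill(requests, free_requests=10_000, price_per_request=0.001):
    """Bill only the requests above the free monthly allowance.

    price_per_request is a hypothetical figure chosen for illustration.
    """
    billable = max(0, requests - free_requests)
    return billable * price_per_request


for volume in (5_000, 10_000, 110_000, 1_010_000):
    print(f"{volume:>9,} requests -> ${monthly_api_bill(volume):,.2f}")
```

Below the allowance the bill is zero; past it, the cost grows linearly with traffic, which reflects the fact that every request costs the provider something to serve.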
But my point is that this system down here, which is an API, to run APIs they need a DevOps person, the APIs go down, they need to deal with spikes in traffic. The cost of running web servers for APIs, if you just look at even AWS pricing models, the cost ...
Perfect idea.
There’s a web API. Could we just not use that term, API? Let’s call that a format.
S3, my point is that, the cost of raw data read, let’s be clear what we mean by APIs. There’s a knowledge API, which is thin layer format. Otherwise, we’re overusing the word API.
In my experience, it’s orders of magnitude more expensive. You need sys admins and DevOps people. Having to run a bunch of APIs, they go down...
Open data makes sense because it’s cheap to give out data. It’s not cheap to write APIs. That way, you can...
You can also give out the software if you’re doing that. If other people want to run APIs, also a point about that, and this is actually the one. So that’s point one. The other point is, what’s the logic of open data?
Then, for the part of it that pulls it in to run an API, you’ve got a nice partition line here, a nice service break line where you can, A, test.
One of the nicer things about that is, you have a nicer sandboxing and partitions. For example, you can then open source stuff. You can say, "We’ve published from," whatever. You talked about the NDC, every week, pulling in the new version... That should go to S3, or whatever your ...
The thing I try to emphasize is saying, "Hey, rather than doing that directly, why don’t you...?" This should really be bulk export into this database. That bulk export also goes to that, say, S3. Then, from here, you pull it into your read API.
It depends on your model ‑‑ Read API, which is you pull the data into your own local database, so you don’t endanger the SLA on the main one. Then, you’ve got the world out here.
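The bulk-export-then-read-API pattern can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the internal read-write database, a CSV file in a temporary directory stands in for the object on S3, and the `spending` table and its columns are invented for the example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def bulk_export(internal_db, export_path):
    """Dump the whole table to a CSV file (stand-in for a file on S3)."""
    rows = internal_db.execute(
        "SELECT id, payee, amount FROM spending ORDER BY id"
    )
    with open(export_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "payee", "amount"])
        writer.writerows(rows)


def load_read_db(export_path):
    """Build a separate read-only copy from the export, so public
    queries never touch the internal database."""
    read_db = sqlite3.connect(":memory:")
    read_db.execute(
        "CREATE TABLE spending (id INTEGER PRIMARY KEY, payee TEXT, amount REAL)"
    )
    with open(export_path, newline="") as f:
        records = [
            (int(r["id"]), r["payee"], float(r["amount"]))
            for r in csv.DictReader(f)
        ]
    read_db.executemany("INSERT INTO spending VALUES (?, ?, ?)", records)
    read_db.commit()
    return read_db


def total_for_payee(read_db, payee):
    """Example read-API query, served entirely from the copy."""
    row = read_db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM spending WHERE payee = ?",
        (payee,),
    ).fetchone()
    return row[0]


# Hypothetical internal database with a few invented rows.
internal_db = sqlite3.connect(":memory:")
internal_db.execute(
    "CREATE TABLE spending (id INTEGER PRIMARY KEY, payee TEXT, amount REAL)"
)
internal_db.executemany(
    "INSERT INTO spending VALUES (?, ?, ?)",
    [(1, "Acme", 100.0), (2, "Acme", 50.5), (3, "Globex", 20.0)],
)

with tempfile.TemporaryDirectory() as tmp:
    export_path = Path(tmp) / "spending.csv"
    bulk_export(internal_db, export_path)   # step 1: bulk export
    read_db = load_read_db(export_path)     # step 2: pull into the read database

print(total_for_payee(read_db, "Acme"))
```

The export file is the partition line: anyone can download it as bulk data, and the read API is just one consumer of it, so heavy public traffic does not put the SLA of the internal system at risk.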
One design you could think about in the technical architecture, and it’s sometimes a little bit artificial, is this: you’ve got a database here, which is normally your internal read‑write database. It’s going to be Oracle or something in some ministry. One model you can have is like, "Hey, basically we ...
I guess, as a designer, with this architecture...
One of my points is, once you’ve got bulk data, you can build APIs. We’re being a little bit...but in terms of the principles, so go back here.
We’ll go take the data and build our own app. It was very successful. Millions of people visited. Obviously it completely substituted for other third‑party usage. It’s similar on the API a little bit, obviously government does need its own APIs, but as you go the route of API, you ...
Then, they did a police one. They decided they wanted to release police crime data. They said, "Well, this one? One of the things last time was, we didn’t have enough traffic on our...not enough people came to our website, so we’ll build our own map."
The best story I have on this one is that the UK government was releasing data, and they released, for example, financial data. A lot of people used the data, for example, The Guardian used it, and other people...
Personally, I am quite concerned about it. First of all, it leads to a point where you don’t need bulk data. No one’s asking for it.
There’s a trend in this direction.
That’s the way to deliver services. I’ve seen it in Finland recently, and so on.
I bet there are a lot of governments now, in the last 6 to 12 months, who are going, "Oh! API-first. We don’t need to do bulk data download. No one wants it. We just want to do APIs first."
In my experience, it does matter, a lot. It also goes back, the other reason I’m a bit concerned about APIs, why I get on a bit...
I got that, but what I’m offering, what I’m just saying is basically there can be a tendency when you also have this API focus, and people talking about open APIs generally that they still have to think that the software stuff, like you were talking about, the black box ...
They can evolve.
It’s a struggle to do that. APIs ultimately, first of all, if you get very rigid about APIs, for example you talked about your scorecard. There’s a good aspect of that, but there’s also a bad aspect, which is you can write an API at procurement time, that turns out to ...
If you look at, for example I’ve implemented S3‑based APIs for years, because everyone copied S3 APIs. If you look at the stack for Boto, and you look at Google, it’s a very, very powerful, rich company. They struggle to make it compatible with the Amazon API. You have a ...
What does it matter? The truth of it is, first of all, APIs are quite brittle.
We’ve got this Socrata open data API. It doesn’t matter that you’ve got...
They first of all try to pretend they’re open source, but then they’re not. Then they sometimes say, "Oh, you know, it doesn’t matter that we’re not open source, because everything is going open API."
I see this argument sometimes. For example, there’s a vendor who supplies data platforms called Socrata. Socrata make a big play of that, because they competed for years with CKAN. I’ve seen their pitches. They go on and on about how, "Look, we..."
One thing that I think is important, that I think at the moment, is that sometimes it’s like Google Takeout. People think a little bit, that if they have access to the data or they’ve got a well‑defined API, it doesn’t matter whether they buy proprietary or not buy it ...
You could also talk about freedoms act as freedoms, but yeah, exactly. I just say on the API stuff, it’s just open APIs have been so abused, and it needs to... On the point you said at the end about procurement, that is great.