Open Data 📖📡


Chatter about “open data” policies, philosophies, activism, and advocacy thereof.

1
 
 

I think I may have found a gem here:

EN (machine translation, emphasis mine):

Article 3/5.[¹ The communication of federal administrative authorities is clear and recognizable. Federal administrative bodies communicate in a politically and commercially neutral manner. The obligation to communicate in a politically neutral manner shall not apply to the administrative bodies referred to in the second paragraph of Article 1(f). ]¹

What about that exception? We have:

1°[¹ administrative instance:

(f) the federal government's strategic bodies referred to in the Royal Decree of 19 July 2001 on the installation of the federal public services strategic bodies and relating to the personnel of the federal public services designated to form part of the cabinet of a member of a government or a college of a Community or Region; ]¹

I’m not going on a chase to dig that up. But I would like to know if this means all “SPF …” agencies (SPF Economy, SPF Mobility, SPF Finances, SPF Foreign Affairs, etc.) are exempt from commercial neutrality.

I also wonder if this apparently accidental legal effect is also accidentally nullified by this clause:

Art. 9. Where the application for advertising relates to an administrative document of an administrative [¹ instance]¹ [² ...]² including a work protected by copyright, the authorization of the author or of the person to whom the rights of the author were transferred is not required to authorize the on-site consultation of the document or to provide explanations about it.

Because I suspect that shitty corps like Facebook have a clause that transfers copyright to Facebook, in which case a request to liberate FB publications by a public service can be brushed off. But then that raises another question. In Belgium, copyright holders cannot transfer their copyright (which is actually to protect the human creator). E.g. the creator of the Smurfs cartoon retains copyright ownership. But then if my understanding is true, does that mean Belgian law is catering just for the corner case of copyright being transferred outside of Belgium?

2
 
 

This is the 2024 update to the “Law of 11.04.1994”:

EN (machine translation):

Art.3/1.[¹ Federal administrative bodies inform citizens of federal regulations and, in particular, of the rights and obligations arising therefrom. This information includes at least the federal legislative and regulatory standards for the jurisdiction of the administrative body concerned. It is at least published on the website of the administrative body. ]¹


(1 Inserted by L 2024-05-12/18, art. 5, 007; Effective: 15-07-2024)

FR (original):

Art.3/1.[¹ Les instances administratives fédérales informent les citoyens de la réglementation fédérale et en particulier des droits et obligations qui en découlent. Cette information porte au moins sur les normes législatives et réglementaires fédérales relatives aux compétences de l'instance administrative concernée. Elle est à tout le moins publiée sur le site internet de l'instance administrative.]¹


(1 Inséré par L 2024-05-12/18, art. 5, 007; En vigueur : 15-07-2024)

The official website for federal statutes is https://www.ejustice.just.fgov.be/, which is an access-restricted website that blocks people on the Tor network.

3
 
 

(crossposted from !exclusive_public_resources)

Musk’s changes to Twitter:

  • must register as a member to get read access to content
  • members no longer see a non-biased chronological linear timeline; an opaque algorithm decides what to prioritise on the timelines, subject to Elon’s hard-right manipulation

It’s perhaps fair enough that boot-licking pawns decide to subject themselves to quasi-brainwashing manipulation. But when GOVERNMENTS use Twitter, people with a legitimate interest in gov communication are forced to register on an exclusive walled garden where they will come under influence of the extreme right (and thus climate denial among other garbage; hence the crosspost to !climate_action_individual).

How to fix this using open data laws:

Public content published on Twitter legally must be openly accessible to all people. Perhaps not by default but certainly by request. This means individuals can submit an open data request to gov agencies who use Twitter, requesting a copy of all content they publish on Twitter. The gov legally must satisfy the request. All tweets must be in an open machine-readable format (csv, json, or xml).

The dataset is “dynamic”, so I believe future updates must be added to the open data. But what I’ve seen in practice is the gov is not diligent about pushing the updates. They may need to be nagged. If they are nagged enough, perhaps they will decide Twitter is not worth it.

BTW, all of this applies to Facebook as well, noting that Cambridge Analytica is why Trump took power in 2016.

4
 
 

Like most libraries, the public libraries in Belgium have a GUI online search page to search their catalog. The websites are often Tor-hostile. Some of them work with a text browser but it’s a bit rough going. And of course it’s impossible for offline people to search for books or media.

The Belgian gov is generally obligated under the constitution and open data laws to share their data. So does that include libraries? I think it would be interesting to have a local copy of all book and movie titles that I can search without having to be online and without whatever limitations their UI creates.

Belgian libraries are subject to some degree of enshitification because they do not implement their own tech. They outsource to private entities like Cisco. And Cisco operates as cheaply as possible. Cisco will not give support and does not care if some people are marginalised. If their captive portal is broken on your device, or you have no GSM number to verify via the captive portal, there is no recourse.

It’s a bit of a blur with libraries what is public and what is private. If the media dataset is held by some private entity, I wonder if it’s regarded as non-public and thus not subject to being liberated by open data law.

5
 
 

In Belgium, the national train service runs a protectionist bot-hostile tor-hostile website that chains users to an enshitified js-plagued GUI webapp. You can only query one day and one destination at a time. It’s the typical shit-show that consumers give in to for this kind of website.

HOWEVER, Belgium’s open data law requires the gov to share any data they get with the public. And for some reason the gov maintains a DB of the train routes and schedules -- which means everyone gets the raw data as a bullshit-free CSV file (but sadly no prices, which fucks everything up as far as being able to avoid the enshitified web entirely).
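Once the schedule data is on disk as a CSV, it can be searched offline across several destinations and days in one go, which the web GUI does not allow. A minimal sketch, assuming hypothetical column names (`origin`, `destination`, `departure`) and made-up sample rows; the real file's headers are not known here and would need to be checked first.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded schedule file.
# The column names are assumptions, not the real dataset's headers.
SAMPLE = """origin,destination,departure
Brussels,Antwerp,2025-01-06T07:05
Brussels,Ghent,2025-01-06T07:20
Brussels,Antwerp,2025-01-07T08:05
Liege,Brussels,2025-01-06T09:00
"""

def find_departures(csv_file, origin, destinations, first_day, last_day):
    """Return rows from origin to any of the destinations within a date range."""
    wanted = set(destinations)
    matches = []
    for row in csv.DictReader(csv_file):
        day = row["departure"][:10]  # ISO date prefix, e.g. 2025-01-06
        if (row["origin"] == origin
                and row["destination"] in wanted
                and first_day <= day <= last_day):
            matches.append(row)
    return sorted(matches, key=lambda r: r["departure"])

rows = find_departures(io.StringIO(SAMPLE), "Brussels", ["Antwerp", "Ghent"],
                       "2025-01-06", "2025-01-07")
for r in rows:
    print(r["departure"], r["origin"], "->", r["destination"])
```

For a real file, `io.StringIO(SAMPLE)` would be replaced with `open(path, newline="")`.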

Does anyone know /why/ the gov gets that data? It would be useful to know what law compels SNCB to share the info because I wonder if other data can be liberated through the same mechanism (such as bus routes, flights, rideshares, etc). My first thought was customs and immigration must have a need-to-know, but the dataset covers both directions and IIRC it only has good coverage of domestic routes not international (strange).

6
 
 

cross-posted from: https://libretechni.ca/post/302171

The websites of trains, planes, buses, and ride shares have become bot-hostile and also tor-hostile. This forces us to make a manual labor-intensive effort of pointing and clicking through shitty proprietary GUIs. We cannot simply query for the cheapest trip over a span of time for specified parameters of our choice. We typically must also search one day per query.

Suppose I want to go to Paris, Lyon, Lille, or Marseilles, and I can leave any morning in the next 2 weeks. Finding the cheapest ticket requires 56 manual web queries (4 destinations × 14 days). And that’s for just one carrier. If I want to query both Flixbus and BlaBlaCar, we’re talking 112 queries. Then I have to keep notes - a shortlist of prospective tickets. Fuck me. Why do people tolerate this? (They probably just search less and take a suboptimal deal).
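The arithmetic above is just the size of a cross product, which is exactly what a machine could enumerate if the sites allowed it. A small sketch; the start date is an arbitrary assumption for illustration.

```python
from datetime import date, timedelta
from itertools import product

destinations = ["Paris", "Lyon", "Lille", "Marseilles"]
carriers = ["Flixbus", "BlaBlaCar"]
start = date(2025, 1, 6)  # arbitrary example start date (assumption)
days = [start + timedelta(days=n) for n in range(14)]

# One manual web query per (destination, day) pair for a single carrier.
per_carrier = list(product(destinations, days))
# Every carrier multiplies the workload again.
all_queries = list(product(carriers, destinations, days))

print(len(per_carrier))  # 56
print(len(all_queries))  # 112
```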

If we write web scraping software, the websites bogart their inventory with anti-bot protectionist mechanisms that would blacklist your IP address. Thereafter, we would not even be able to do manual searches. So of course a bot would have to run over Tor or a VPN. But those IPs are generally blocked outright anyway.

The solution: MitM software

We need some browser-independent middleware that collects the data and shares it. Ideally it would work like a special purpose socat command. It would have to do the TLS handshake with the travel site and offer a local unencrypted port for the GUI browser to connect to. That would be a generic tool comparable to Wireshark (or perhaps #Wireshark can even serve this purpose?) Then a site-specific program could monitor the traffic, parse it, and populate a local SQLite DB. Another tool could sync the local DB with a centralised cloud DB. A fourth tool could provide a UI to the DB that gives us the queries we need.
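The local SQLite piece of that pipeline could be sketched roughly as follows. This is only an illustration under assumptions: the table layout, the column names, and the `record_fare` helper are hypothetical, and the traffic capture and site-specific parsing steps are not shown.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per fare observation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fares (
    carrier     TEXT NOT NULL,
    origin      TEXT NOT NULL,
    destination TEXT NOT NULL,
    travel_date TEXT NOT NULL,   -- ISO date of the trip
    price_cents INTEGER NOT NULL,
    observed_at TEXT NOT NULL    -- UTC timestamp when the fare was seen
);
"""

def open_db(path=":memory:"):
    """Open (or create) the local fare database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def record_fare(conn, carrier, origin, destination, travel_date, price_cents):
    """Store one observed fare; a site-specific parser would call this."""
    observed_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute(
        "INSERT INTO fares VALUES (?, ?, ?, ?, ?, ?)",
        (carrier, origin, destination, travel_date, price_cents, observed_at),
    )
    conn.commit()

conn = open_db()
record_fare(conn, "Flixbus", "Brussels", "Paris", "2025-01-10", 1000)
record_fare(conn, "Flixbus", "Brussels", "Lille", "2025-01-10", 800)
count = conn.execute("SELECT COUNT(*) FROM fares").fetchone()[0]
print(count)
```

Storing every observation rather than overwriting keeps the price history that a later sync or trend estimate would need.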

A browser extension that monitors and shares would be an alternative solution -- but not as good. It would impose a particular browser. And it would be impossible to make the connection to the central DB over Tor while making the browser connection over a different network.

Fares often change daily, so the DB would of course timestamp fares. Perhaps an AI mechanism could approximate the price based on past pricing trends for a particular route. A Flixbus fare will start at 10 but climb to 40 on the day of travel. Stale price quotes would obviously be inexact but when the DB shows an interesting price and you search it manually, the DB would be updated. The route and schedule info would of course be quite useful (and unlikely to be stale).
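With timestamped observations, the query we actually want (cheapest trip across several destinations and a range of days) becomes a single SQL statement that also shows how stale each quote is. A sketch under assumptions: the `fares` table, its columns, and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fares (
    carrier TEXT, destination TEXT, travel_date TEXT,
    price REAL, observed_at TEXT)""")

# Made-up observations for illustration only.
conn.executemany("INSERT INTO fares VALUES (?, ?, ?, ?, ?)", [
    ("Flixbus", "Paris", "2025-01-10", 10.0, "2024-12-20T09:00:00"),
    ("Flixbus", "Paris", "2025-01-10", 25.0, "2025-01-05T09:00:00"),
    ("Flixbus", "Lille", "2025-01-11", 12.0, "2025-01-04T18:30:00"),
    ("BlaBlaCar", "Lyon", "2025-01-12", 30.0, "2025-01-02T12:00:00"),
])

def cheapest_latest(conn, destinations, first_day, last_day):
    """Latest known fare per trip, cheapest first, with its observation time."""
    marks = ",".join("?" for _ in destinations)
    sql = f"""
        SELECT f.carrier, f.destination, f.travel_date, f.price, f.observed_at
        FROM fares AS f
        WHERE f.destination IN ({marks})
          AND f.travel_date BETWEEN ? AND ?
          AND f.observed_at = (
              SELECT MAX(g.observed_at) FROM fares AS g
              WHERE g.carrier = f.carrier
                AND g.destination = f.destination
                AND g.travel_date = f.travel_date)
        ORDER BY f.price
    """
    return conn.execute(sql, [*destinations, first_day, last_day]).fetchall()

results = cheapest_latest(conn, ["Paris", "Lille", "Lyon"],
                          "2025-01-10", "2025-01-12")
for row in results:
    print(row)
```

The older 10.0 quote for Paris is ignored in favour of the newer 25.0 one, and the `observed_at` column tells the user which prices are worth re-checking manually.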

The end result would be an Amadeus DB of sorts, but with the inclusion of environmentally sound ground transport. It could give a direct comparison and perhaps even cause air travelers to switch to ground travel. It could even give us a Matrix ITA Software UI/query tool that’s more broad.

7
 
 

The websites of blablacar, flixbus, and various trains sites are hostile toward the idea of travelers getting all the data they need to plan a trip.

E.g. Flixbus is tor-hostile, but even when access is granted you cannot just ask for the cheapest trip from A to B over a range of days. The motherfuckers force us to search one day at a time and just one destination at a time.

Fuck that. How can we get the data? I know these sites have bot-hostility so scraping it seems like a huge effort.

Some countries have “open data” laws that require sharing the data, but that only works if the gov gets the data to begin with. If the gov does not get the Flixbus data, then there is no legal requirement to share it.

8
 
 

cross-posted from: https://sopuli.xyz/post/32830968

I stopped distributing Linux Mint to the low-tech users who I support roughly 10 years ago when the project jailed their docs in tor-hostile Cloudflare websites (e.g. readthedocs.io, IIRC).

A recent general search for info on getting a piece of hardware working on linux led to forums.linuxmint.com (the query had no relevance to Mint specifically). This website uses #Sucuri for elitist tor-hostile gatekeeping. There is no action for me to take since I already quit supporting Mint, other than perhaps to ask others in my local linux support group to also drop Mint support because our users should not face a choice between software freedom and privacy. Certainly when I am asked to install Mint for someone, I will refuse and try to steer them to Debian, perhaps with Cinnamon.

Screenshot attached. Not sure how long linuxmint has been using Sucuri for crude IP reputation discrimination, but note that the Debian project that feeds the Mint project demonstrates respect for people’s privacy. Mint adds value in some ways, but at the same time worsens a good distro by jailing information.

This is not a “something is better than nothing” scenario. It’s actually destructive. When you host a discriminatory access-restricted forum, you create an attraction for useful info and simultaneously become an obstacle to the information that would otherwise find a better host. If forums.linuxmint.com did not exist, the discussion would still occur somewhere and it would have a chance at occurring in an open access venue.

9
 
 

I could not reach the site from Tor. The linked page is the archive.org cached version, which actually is open to all.

10
 
 

cross-posted from: https://lemmy.sdf.org/post/35371288

The regulator of banks at a state-level responded to reports of legal infringements by a credit union to say: “why don’t you change banks?” Of course the important question here is: “why don’t you enforce the law? Are banks above the law?”

I wanted to find out how many reports of unlawful conduct by banks in the state were filed and how many were acted on. So I requested disclosure of reports and remedies for a specific credit union.

Their response: investigations and actions taken against banks are secret.

WTF? This is a public regulator. How is this even possible? To be clear, we pay taxes to finance this regulator of banks, yet we are blocked from seeing whether they do their job? And we are blocked from seeing complaints submitted by the public, thus blocked from taking self-defense measures to avoid bad actors?

Would it be sensible to have a non-profit host a searchable website that publishes people’s complaints before forwarding them to the secretive regulator?

11
 
 

The linked site apparently launched in 2013 to collect metrics on open data by govs around the world and rank them. Then what.. in 2015 they quit?

Did anyone pick up the slack? I would like to see how far the US would be dropping in rank under the GOP’s Trump regime.

12
 
 

Elon’s DOGE regime stormed into NOAA and demanded direct access to their IT systems to snoop on the data. This is in the name of cutting fat.

climate

Climate scientists worldwide rely on weather data from NOAA. Obviously the party of climate denial is no friend to climate science. They want to stamp out that particular segment of science.

abolition of environmental regs

The GOP also hates environmental regs because they prioritize big business over the environment. From the linked article:

“The organization [NOAA] cited impacts of cuts could include overfishing, increased imports of illegal or unethically sourced seafood, threats to endangered wildlife, and threats to life and property without its weather forecasting and data resources.”

DEI

Team GOP is also looking to stamp out diversity, equity, and inclusion. This article covers that angle of DOGE’s likely assault on NOAA.

privatization

Of course Musk is also looking for his personal business advantage and any maneuver using government power to increase Tesla and SpaceX revenue. Any opportunity to kill off public spending on public resources that creates opportunities for his private corporate empire will not be overlooked.


I tagged it as “US/world” because even though the data comes from the US, and is threatened within the US, the whole world uses the data.

(edit) It was noticed on !science@mander.xyz (where I was about to cross-post):
https://mander.xyz/post/24567559

13
14
 
 

FYI France has this open data website... https://www.data.gouv.fr/en/datasets/

15
 
 

I have not been able to track down the Belgian open data law¹ but it seems in principle blocking both Tor users and archive.org from access to the address book of the Chamber of Representatives would not be in line with the spirit of open data. They may not have the IT competency to serve Tor users but to treat archive.org like a malicious robot is to underachieve.

¹ I can only find an old archive of the goals of the open data policy (in French), but not the law:

http://web.archive.org/web/20160416034829/http://www.digitalbelgium.be/sites/default/files/content/FR/_strategisch/_dossier.pdf

The original link was from https://openknowledge.be/ which seems to be a stale website and an inactive project. It feels like open data got started in Belgium but then the ball was dropped.

16
 
 

(original post)

To reach the Belgian datasets of open data from Tor you must go through archive.org:

http://web.archive.org/web/20241003145143/https://data.gov.be

And because the website is interactive and also not completely archived, I ultimately could not even browse through to see what data there is beyond the first page of databases. Thus not entirely “open”.

But the Brussels datasets are open to all.

I could not find the data I was looking for. That is, I wanted to know how many complaints are sent to the various different SPF regulators as well as ombuds people -- and very specifically how many complaints are ignored. Some offices produce annual reports but I have never seen an annual report that exposes the count of ignored complaints.

Anyway, the question I have is what section of legal code covers open data in Belgium?

17
 
 

And if you try to visit the archive¹, that’s also fucked.

Not sure who these people are.. maybe they are actually watchdogs in opposition to open data.

¹ https://web.archive.org/web/20240925081816/https://www.opendatawatch.com/