10 Mar 2017

Algorithms should be made open ("transparent") to the public. I doubt, however, that we will live in a world where you can demand an explanation of what an algorithm did to you.

So, algorithms are a hot topic now. They change our world! You might remember how Facebook has an algorithm which manages users' news feeds, but it performed very poorly and kept trending fake news. Sad! Or the current discussions about self-driving vehicles being a danger to our safety - or even a danger to social peace because millions of driver jobs are endangered.

Let me make the side note here that in both of the cases mentioned above, and in many similar ones, the awareness might be new, but the fact that algorithms do crucial work in our news feeds and our car safety systems is actually not that new. A good example is banks, which have been using data mining algorithms for decades to decide who should get a mortgage.

Lots of demands, but few incentives

However - the conversation is starting on what we should demand from entities (companies, governments) who employ algorithms w.r.t. the accountability and transparency of automated decision-making. The conversation is being led by many stakeholders:
 

  • Think Tanks: For example, Pew Research just published a report in which expert opinions around this question were gathered: "Experts worry [algorithms] can (...) put too much control in the hands of corporations and governments, perpetuate bias, create filter bubbles, cut choices, creativity and serendipity, and could result in greater unemployment."
  • NGOs: With Algorithm Watch we now have the first NGO that dedicates itself to "evaluate and shed light on algorithmic decision making processes that have a social relevance".
  • Governments: The EU General Data Protection Regulation (GDPR, takes effect in 2018) has a section about citizens having "the right to question and fight decisions that affect them that have been made on a purely algorithmic basis".


There are many things being said, and I find this discussion more vague the longer I listen, but maybe the one specific demand all these stakeholders would get behind is this: Explain how the algorithm affects my life.

So it seems like a lot might be happening in this direction, but I believe very little will happen. Why?
 

  1. It is very costly to develop this feature (of explainable algorithms). I'll discuss why that is in more detail below. In fact, this feature is so costly, making it mandatory might actually drive smaller software development shops out of the market, as only the big players have enough manpower to pull it off.
  2. Customers will not demand it in everyday products, no matter what think tanks say. Open source software is not demanded by customers - it is being used because engineers love it.
  3. It will only become the norm in a specific type of algorithmic software if the product actually needs it. Consider a medical diagnosis software - the doctor needs to explain this diagnosis to the patient. Or the mortgage decision example where I believe lawmakers could step in and make banks tell you exactly why they will not give you a loan.
     

Explanations: decisions versus design

Algorithms should be "explainable", they say. However, there is no consensus on what that actually means.

In the light of complexity, I believe that what is explainable is not well-defined - while many people talk about explaining individual decisions, all we can really hope for in most cases is explaining the general design - which is much less helpful if our goal is to help people who have been wronged in specific situations. There can and should be transparency, but it might be less satisfying than many people hope for.

A new paper by Wachter et al (2017) makes a good point here that I agree with. They say that many stakeholders to the EU General Data Protection Regulation (see above), have claimed that "a right to explanation of decisions" should be legally mandated, but is not (yet). Rather, what the law includes as of now is a "right to be informed about the logic involved, as well as the significance and the envisaged consequences of automated decision-making systems". They also make a distinction between explanations for "specific decisions" versus "system functionality", and they state that the former is probably not "technically feasible", citing a book explaining how Machine Learning algorithms are not easy to understand.

I agree with their doubts here. Let me briefly dive into the feasibility problems with explaining specific decisions and also broaden the scope a bit. To me, it comes down to complexity, which is inherent to many IT systems that are being built these days:
 

  1. Complex data: Let's look at the Facebook news feed example: There is a lot of data going into this algorithm. The algorithm makes thousands of decisions for you while you surf, and to fully explain one of them at a later time, you'd need a complete snapshot of Facebook's database at the very second the decision was made. That is unrealistic.
  2. Complex algorithms: Many algorithms are hard to understand for humans. Maybe the prime example here are machine learning algorithms. These algorithms are shown real-world data, so they can build a model of the real world. They use this model they built to make decisions in real-world situations. The model often is not intelligible to human onlookers. For instance, a neural network (as used in Deep Learning) reaches a decision in a way that the engineers who "created" it cannot easily explain to you, because the decision emerges from a very large number of numeric weights, which were adjusted many times over during training by passing errors back from the network's end towards its beginning (so-called "back propagation").
  3. Complex systems: Finally, many algorithms might actually not live inside one computer. You might interact with a system of networked computers, which interact and all contribute their own part to what is happening to you. My favourite example here is a modern traffic control system that interacts with you while you travel through it, and at each intersection you encounter a different computer. I actually argued in a paper I published in 2010 that decentralised autonomic computing systems tend not to be "comprehensible".

So I believe (and agree with Wachter et al) that the demand to explain, for any given situation, how exactly the algorithm made a decision is hard to uphold. What can be explained is the design of the algorithm, the data that went into its creation and its decision-making and so on. This is a general explanation, which might be useful to explain to the public why an algorithm is treating problems in a certain way. Or it might be useful in a class action lawsuit. But it cannot be used to give any particular person the satisfaction of understanding what happened to them. That is the new reality which probably already exists, but it has to sink in.

The notion of complex, almost incomprehensible algorithms has been brought up previously, by the way. I want to mention Dr. Phoebe Sengers, who noted that software agents tend to become incomprehensible in their behaviour as they grow more complex ("Schizophrenia and Narrative in Artificial Agents", 2002), and of course the great Isaac Asimov, who invented the profession of "robot psychologist".
 

 

# lastedited 11 Mar 2017
14 Jun 2016

I dragged it with me after my contract ended in 2014, but I actually made a finished product out of my dissertation after all and defended it at the TU Delft this past May.

It was a pretty formal procedure as you can see, but quite a meaningful end, and actually a fun day after all.

The dissertation itself is available here officially, but also hosted by me. I'll post the propositions here:

1. Both the need for low computational complexity of bidding and for effective capabilities of planning-ahead can be addressed in a market mechanism for electricity, that combines the trade of binding commitments as well as reserve capacity into one bid [this thesis, chapter 3].

2. In settings where a uniform price changes dynamically over time and where these dynamics are influenced significantly by consumer behaviour, the ability of a consumer to comprehend price patterns increases if a large part of the other consumers reacts to price dynamics in a manner similar to how he himself reacts to them [this thesis, chapter 4].

3. Dynamic pricing for electricity can effectively reduce consumption peaks, also under the two conditions that the retailer promises an upper limit on prices and designs his pricing strategy for profit maximisation [this thesis, chapter 5].

4. A heuristic control strategy for a battery which is limited in capacity can be designed such that it has the following three advantages: it reacts fast, it can reduce overheating of a connected low-voltage cable significantly and (if prices are dynamic) it can partly earn back the acquisition cost of the battery by performing revenue management [this thesis, chapter 6].

5. There is not one silver bullet to the problem of how to manage a smart grid in the most efficient way. Each setting has its own requirements, given by its own set of stakeholders and design objectives.

6. To have a healthy and happy toddler is not to a small degree a matter of luck.

7. For the foreseeable future, concerns about privacy need to focus on computers and mobile phones, which directly expose political views and social contacts of their owner, rather than smart meters, which expose less meaningful data.

8. If users do not comprehend the reason why a novel technology interacts with them in the way it does, it will not be adopted, even if it is useful and resource-friendly.

9. Electricity grids are the largest man-made synchronous machines, and economies are the most complex man-made systems. To combine them leads to much more complexity than is commonly assumed, and the resulting systems will therefore never be completely understood.

10. In a referral network, where agents base their opinion about the performance of a service on those of other agents, it is beneficial for users if the agents forget old information at a comparable rate. [N. Höning: "Discounting Experience in Referral Networks", Master thesis, Vrije Universiteit (2009)]

I also was asked to write a very short summary, which might be useful here:

New developments require us to reconsider how electricity is distributed and paid for. Some important reasons are renewable energy, electric vehicles, liberated energy markets and the increasing number of smart devices. How we deal with these dynamics will affect important aspects of the upcoming decades, for example transportation, home automation, heating/cooling & climate change.

In order to keep the security of supply high and price fluctuations within acceptable ranges, we need to continuously make the decisions who will supply or consume electricity, at what price and at what time. The resulting complexity should not grow too high for small participants, otherwise novel technology might not be adopted. This dissertation contributes market mechanisms and dynamic pricing strategies which can deal with this challenge and reach acceptable outcomes in four relevant problem settings (mostly situated in lower levels of the electricity grid).

The most critical problems to address are intervals with very high power flow, or with high differences between demand and supply which need to be evened out. Such “peaks” can result in steep price movements and even infrastructure problems. We study decision problems that will arise in expected scenarios where peak reduction becomes important. In order to arrive at an efficient and usable system, this research specifically looks into

  1. Encouraging short-term adaptations as well as enabling planning ahead (of generation and consumption) within the same mechanism.

  2. Ensuring that small and/or non-sophisticated participants can still take part in mechanisms.

  3. Letting smart storage devices contribute to network protection.

We develop agent-based models to represent expected settings and propose novel solutions. We evaluate the solutions using stochastic computational simulations in parameterised scenarios.

A similarly high-level overview was given by me in a short presentation before the defense.

# lastedited 15 Jun 2016
08 Dec 2014

Last week I attended DockerCon Europe 2014 which luckily happened in Amsterdam this year. I got my finger on the pulse of important developments in the ongoing evolution of the internet and a just-healthy dose of tech-optimism from current Silicon Valley prodigies. I thought I'd share my five favourite slides with a little comment on each.

The internet is still a technological Wild West. So many talented people. So much change and progress in technology each year. So many things you need to know to deliver something that doesn't break for some reason. So much still to achieve. Docker provides a usable (fast and well-documented) way to bundle into a container some things that you know are working, then upload this container to any server and expect this functionality to be up and running and simply work as you expect. It wasn't a DockerCon talk, but I like this short breakdown of what holds us back and why containers can help a lot (9 slides). At Softwear, we are using Docker both in the CI workflow and partly in production (by the way, let me know if you want to do influential UX or QA engineering work for us). The slide above shows the way of thinking going forward - build stacks from things you know will work and will also work together. Like Lego. Then make these light-weight stacks (your actual web applications) work together in creative ways. The current term for a picture like this is "Microservices". This slide from Adrian Cockcroft (who spent six years at Netflix) makes the point in more detail of how useful Docker will be. Adrian's presentation (all slides) probably generated the most food for thought. My favourite line (slide 26) is:

DevOps is a Re-Org!

meaning that software developers are taking over system operator/admin tasks in any company which does not actually run data centers (which is becoming a very concentrated business nowadays).

Next, we get to a Silicon Valley-style notion of how suspected technology breakthroughs will change society. The Docker Inc. CEO is asking here:

What happens when you separate the art of creation from concerns about production & distribution?

Subtly, there is a picture of the printing press. He wants to say that creators of web applications soon might need to worry less about how they will deploy their app such that it will work, as the "container revolution" will make this trivial. Of course, the web 1.0 kind of already did that for content. However, I can see how lowering a crucial technological barrier for inventing useful web applications can really be significant to innovation. We have a lot of content and ways to get it out there, now let us see what cool applications can be built to assist people everywhere in the world. And although we at this conference were a bunch of rich white males, poor people are hopefully getting access to cheap smart phones soon (Africa is a good example). I, too, find these times exciting. But I was glad to return to my normal life, and to cool down a bit.

This slide gives an indication of the scale of change we are seeing in the software world. Henk Kolk from ING told us how this large bank sees itself as a technology company now and removed everyone from their large IT team who cannot program at least something. Being a programmer means being in demand right now, but as his slide says as well, speed is key from now on. If you don't get on board with this new way of having tight control over your stacks, together with being totally flexible towards switching technologies, all you will be doing is jumping from one sinking ship to the next. I got both excited and chilled, actually.

An interesting takeaway of the current weeks is how Open Source currently works. Big money has gotten in on it, because in the software world, you have to invest in widespread and sustainable technology while also having a modern stack. This only works when an open source community carries the technology. Even Microsoft is coming around in major ways. Companies are actually employing the best open source programmers directly to stay on top of things. The industry is a bit different from other industries in this regard (hopefully actually leading the way). On the slide above, Docker Inc. CTO Solomon Hykes is giving us his current set of rules that he thinks make a technology successful these days. As a consequence, Docker got some interesting new functionality (announced on the first conference day), but it was kept out of the core code - "Batteries included, but replaceable".

But it is also not all agreement and happy collaborative coding. No, sir. The latest trend is that a company or a startup guides an open source technology. This makes progress faster and more stable, but it can easily break if you annoy the community. Node was just forked, AngularJS is having a community crisis. The Docker community is also wary of the Docker startup Docker Inc. In fact, Solomon Hykes spent a lot of his time on stage at DockerCon Europe 2014 discussing how he wants to succeed as a steward of the Docker technology, using a process he calls "Open Design" (see all slides here). There is an Open Design API through which all feature requests have to go, thus separating people acting on behalf of Docker Inc. from people acting on behalf of the Docker Open Source project - no matter which company pays them at the moment. They are creating and updating their own constitution which deals with this construct as we speak (of course in structured text files, so if you suggest a change, you submit a pull request). So the message of this slide above is simple and compelling:

The real value of Docker is not technology. It's getting people to agree on something.

Replacing "Docker" with any standard, this is something you could also have said during any time of rapid development and change. Interesting times.

P.S. There were some really smart people at this conference, building amazing companies and systems. We can expect to see a lot, e.g. from the Apache Mesos project. I could have chosen more technical slides for this list, but it would have taken me longer to explain why I fancy them. A lot of them were also quite intimidating, actually.

# lastedited 07 Jun 2015
25 Jun 2014

Today, I gave a talk at CWI on how to become more efficient with complex computations in our scientific work. I discussed how I have approached the need for distributed computation (to scale up towards larger and more complex problems) and the problem of organising the scientific workflow when doing experimental work.

I promised listeners that they would get

  • hands-on information on getting results from large-scale, "embarrassingly parallel" computations,
  •  ... without actual parallel programming,
  •  ... little ssh effort
  •  ... and using the programming language of your choice.
  •  Plus, some tools to keep track of experiments and data.

where I would (not exclusively) mention the tools I have written, StoSim and FJD. I was happy to have a turnout of 16 people and I think we spent an interesting hour.
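To illustrate what "embarrassingly parallel" means in the list above, here is a minimal sketch in Python. It is not the API of StoSim or FJD; the function names and the toy simulation are hypothetical. The point is only that each job depends on nothing but its own parameters, so jobs can be handed out to any number of machines in any order.

```python
import random
from itertools import product


def run_job(job):
    """One independent simulation run; it depends only on its own parameters."""
    seed, n_samples = job
    rng = random.Random(seed)  # per-job generator, no shared state between jobs
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return sum(samples) / n_samples


def make_jobs(seeds, sample_sizes):
    """Build the full parameter grid; every combination is one job."""
    return list(product(seeds, sample_sizes))


jobs = make_jobs(seeds=range(3), sample_sizes=[100, 1000])

# The built-in map runs the jobs one after another. Because the jobs share
# nothing, the same call could be handed to a pool of worker processes or to
# a job dispatcher without changing run_job itself.
results = dict(zip(jobs, map(run_job, jobs)))
```

Since no job reads another job's output, running them in a different order (or on different machines) gives the same result per job.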

Here are the slides (direct link to PDF):

 

# lastedited 25 Jun 2014
27 Mar 2014

I'm quite happy with how the visits to this website of mine have developed over the last year. Here are the monthly numbers:

Visits to nicolashoening.de from April 2013 to March 2014

Btw, since March 2013 I use Piwik for my visitor analytics and am very happy. You should also be happy about that, because I don't store your metadata on Google's servers, only on mine.

An average week looks like this:

One week in visits

You can tell that people mainly come to my site on workdays, weekends are rather quiet. Why is that?

Well, in 2007 or so, I wanted to have javascript tooltips, or "popups", to display context for links when you hover over them. I wanted to style them the way I liked. So I wrote a small script. It is very simple, but the page explaining it creates almost all traffic to this website. Look at this example of page view numbers ("page views" are the number of loaded web pages, whereas "visits" consist of one IP address performing one or more page views in one session) from (I think) one week:

The trend is clear. Around 300 people come to that page about the little javascript thingy every day, and not much else I write gets attention. And as you can tell from the long list of comments there, many people use this javascript thingy in the websites they build. I actually get some satisfaction in making them happy, so I answer many of their questions and actually improve the codebase once in a while.
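For readers who want the difference between page views and visits spelled out, here is a minimal sketch in Python. It is not how Piwik computes its numbers; the 30-minute session gap, the function name and the sample data are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumption: hits from the same IP that are more than 30 minutes apart
# belong to different sessions (and therefore to different visits).
SESSION_GAP = timedelta(minutes=30)


def count_page_views_and_visits(hits):
    """hits: iterable of (ip, timestamp) tuples, one per loaded page."""
    page_views = 0
    visits = 0
    last_seen = {}  # ip -> timestamp of that IP's most recent hit
    for ip, ts in sorted(hits, key=lambda hit: hit[1]):
        page_views += 1  # every loaded page is one page view
        previous = last_seen.get(ip)
        if previous is None or ts - previous > SESSION_GAP:
            visits += 1  # a new session starts for this IP
        last_seen[ip] = ts
    return page_views, visits


# Hypothetical sample data: one visitor browses twice in the morning and
# comes back in the afternoon, another visitor loads a single page.
sample_hits = [
    ("10.0.0.1", datetime(2014, 3, 24, 9, 0)),
    ("10.0.0.1", datetime(2014, 3, 24, 9, 5)),
    ("10.0.0.2", datetime(2014, 3, 24, 9, 10)),
    ("10.0.0.1", datetime(2014, 3, 24, 14, 0)),
]

print(count_page_views_and_visits(sample_hits))  # (4, 3)
```

Four pages were loaded, but they group into three visits, which is why the two numbers on an analytics dashboard differ.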

But here is the problem: There are many other similar scripts for this out there, and I never get mentioned when experts list libraries for such a feature (I have been mentioned in two or three forums, I think). Why do people keep finding this? I think the dirty little secret is that I called it a "popup", while the technically more correct term is "tooltip". Look at search queries that people used when they came to this page:

There you go. A significant number of people and I use the slightly wrong term, and that's what drives traffic here. Accidental linguistic match-making in cyberspace. Positive things come from this. The traffic probably improves my Google ranking a lot. Our interactions help my users get something done and make me feel some fulfilment. It's a weird world.

Speaking of the world, here is where my visitors are from. Mostly English-speaking countries where a lot of web development happens:

# lastedited 27 Mar 2014
21 Sep 2013

The European Commission just announced a new indicator to measure innovation in its economy. I think it shows how the bureaucrats favour big companies, want to make it easy for themselves and hold a simplistic view of how innovation leads to positive net effects for the economy.

To quote, here are their new ingredients to the indicator:

  • Technological innovation as measured by patents.

  • Employment in knowledge-intensive activities as a percentage of total employment.

  • Competitiveness of knowledge-intensive goods and services. This is based on both the contribution of the trade balance of high-tech and medium-tech products to the total trade balance, and knowledge-intensive services as a share of the total services exports.

  • Employment in fast-growing firms of innovative sectors.

  1. First, the view that patents are important favours big established companies over small innovative ones. Small innovative software companies are often better off just moving on and not wasting their time trying to get a patent. And don't forget that (luckily) in Europe it is difficult to patent software. So comparing to the rest of the world here is even more questionable. Patents are also often used by companies for no other purpose than to block out other companies, which is maybe good for Europe (only if a European company blocks out a non-European one), but it hardly helps substantial innovation to form in the economy. Another point: I think the EU has the view that innovation has this standard way to lead to positive impacts for everyone: Someone (a big company, see above) invents something, then they patent it and then hire people to develop the invention. As they start producing, they will hire other companies as subcontractors, and so the positive effect trickles down from the inventors to the others. I think that this sometimes happens, but only for big companies. This view is neglecting large parts of the economy. Innovation often happens in non-formal open spaces and/or in collaboration.
  2. And about the second ingredient - simply hiring people in "knowledge-intensive activities" gives you points, no matter their actual effects on innovation. That makes it easy to count, but what are we measuring? Are we measuring that these employees are doing something useful with their knowledge-intensive work or are we simply celebrating that people are getting paid to think? For example, I'm sure the banking and lawyer industry has lots of jobs that would count as knowledge-intensive. A reason to celebrate them? Also, government spending on research gives a country points here, no matter what is actually researched.
  3. The third ingredient is the only one that makes at least some direct sense to me. One can compare the contribution of high-tech and high-knowledge activities to other services and get some information out of it (albeit my point about the second indicator holds here, as well).
  4. The fourth ingredient is based on the assumption that fast-growing companies must be more innovative than other ones. Might sometimes be true, but often probably not. Growth != innovation. A dangerous way of thinking in my opinion. Most often, rapid growth is merely the ability to attract capital - capital that is interested in rapid returns. The substance below the company is indirectly related to the hope for rapid returns, but often not necessary for this capital attraction effect (as the last couple of bubbles should have taught us). If your concept of innovation is to enable turnover, it has not been formulated in society's best interest.

I think that behind this formalism there is a dangerous fan-boy attitude toward big companies/big finance and telling the story that they like to tell - an effect of lobbyism. What about small companies? They'll have an even harder time being "sexy" to EU bureaucrats now.

There is so much more happening in an economy, which is one of the most complex systems on earth. One example that comes to mind: what about innovation that isn't directly making money but enabling others to make money? For instance, what if people create an intelligent new way of doing things, but don't directly sell this new way? Maybe they open-source their powerful idea and run a successful little consultancy based on their newly-found reputation. With their help, a part of the European economy considerably improves, in more than just one way (they provide direct services and they share their knowledge to enable many others), but without the bureaucrats taking notice - their innovation is flying completely under the EU radar.

I know it is difficult to "measure innovation", and that is why I have not yet come up with a better way to measure it. However, it seems to me that to not measure would have been better than to measure like this. A counter-question: why do we actually need this number? It's as if the numbers we have (like GDP) aren't already alchemistic and misleading enough.

# lastedited 22 Sep 2013
19 Aug 2013

I think we live in times where both governments and citizens of western democracies are searching for a new role in their relationship. My point today is only this: the court systems are still working to some degree, and they seem to me a valuable battling field in this search. Citizens should actually pay attention and support causes fought in their name.

First, what do I mean by this "search" for new roles? It is not as easy as some pundits had described, who after 1990 simply expected the state to take a step back. After some years of perceived openness, where capitalism had supposedly "won" and everyone could relax and enjoy, I think we now rather see the state taking a stronger stand. One reason might be that governments feel threatened by globalization and the internet, creating too much openness and too many feedback loops for them to feel in control. Another might be rising tension on the resource front, as energy supply gets tight and climate change restricts what we might want to do with resources we still can get our hands on. Or, more generally, this is just the usual dance between governors and governed, a step forth and a step back, and we will only see what it all meant after this song ends.

Anyhow, we are beginning to discuss what "freedom" should mean, again. And when I say discuss, I mostly mean "battling out". Emotions are running high and claims are bold and strong. Governments simply create new facts of what is legally possible, while the Chinese lean back and rub their hands.

Some citizens are being taken to court, others decide to let their quarrel with the government escalate to court. In most of these government vs. someone trials, important definitions are being made, and, as far as the court is still functioning, the outcomes of these battles are important for (re)defining freedom, more important than, say, a general opinion piece in The New York Times.

What is also new today is that it is dead simple to support the legal funds of people whose cases you find interesting, via the internet. I believe if you are interested in funding societal change, investing in the legal defense of someone is (to use the lingo of our time) a sound investment. The amounts I gave are by no means to brag about, but I have begun to give to such individuals, and I think I should increase this activity. Here is a short list of who I remember supporting:

 

There are heroes out there fighting for you right now and it is easy to help them.

* Actually, I only gave some monetary support to Wikileaks as an organisation, after the Collateral Murder video. Criminal investigations into Assange started afterwards, and only then did it become clear that most of the funds of Wikileaks seem to have gone towards his legal defense.

# lastedited 19 Aug 2013
09 Jul 2013

On the way to the offices I work in, the bike path layout had a flaw: To reach Science Park (coming from Amsterdam center), one had to take a little detour. In the situation shown on the picture below, you had to follow the path on the left for about 200 meters and then turn right, in order to continue on the street in the background (here is the view from above). The more direct way, right down the hill, has been an emerging bike path for the better part of a year. It is now becoming an official bike path, as the picture shows (I took it only a couple of days ago).

While it was emerging, the path was muddy, and (especially on rainy days) quite difficult to manage, both down- and upwards. I spoke to some of my colleagues and everyone wanted to take this route, but until now only the young males felt confident enough on their bikes to do it.

Why did everyone feel strongly about taking this route instead of the slightly longer one? There is an emotional cost to pay for being forced to take a longer route, even if it is only slightly longer. Does this emotional cost justify the effort of building this new bike path? I'm not sure, but I sure can say that the city of Amsterdam has a bike path management that pays attention.

A rather famous example (at least in the scientific research community on emergent human patterns) of such an expression by users that their intent differs strongly from the planners' design is this (from an article by Helbing et al from 1997):

I'm not sure if the University of Stuttgart reacted to this, though, back in 1997. I tried to find that place on satellite images, but with no success. They probably redesigned it completely. Just as our emergent bike path, this one is, admittedly, not pretty to look at. However, the response from system planners can be quite different, apparently.

 

#
22 Jun 2013

Sometimes you can tell that technological progress is blending into a known science fiction scenario. I think right about now is a time to spot several of those - this year is like those special days when you see a lot of shooting stars. 

So, there is the obvious similarity of governments watching (well, storing and monitoring) more and more of everyone's steps to the famous book "1984".

However, there is more. I recently reviewed "Manna", where a complete automation of society begins not by replacing low-wage humans, but by algorithms taking over middle management to make fast food stores more efficient (by organising the work of the low-wage humans as efficiently as possible). Recently, reports came out on how Amazon's warehouses are actually like that: 

Amazon’s software calculates the most efficient walking route to collect all the items to fill a trolley, and then simply directs the worker from one shelf space to the next via instructions on the screen of the handheld satnav device.

Most of the article is about how badly they treat workers, but their fate is not my point here. And anyway, Amazon puts their warehouses in regions which have very few economic options. Consider that locals in the article are sad because they can't work in a mine anymore. My point is that even mines had middle management. Amazon warehouses are optimally efficient with very few middle management positions (mostly, what is left to be done is to ask the system who was the least efficient and fire and replace them). 

Next to be replaced are the workers themselves. Well... that depends. I'm not sure there will be a business case within the next 50 years for robots doing simple tasks in a warehouse over humans doing them. It's not decided yet if that is going to happen. However, the business case for simple middle management tasks has been decided in favour of computers. They are much better at making the most of employee time. 

I'll close by admitting that the speed and the consequences of automation in our society are discussed by people with more insight and time for this than me, so I think I'm not qualified to say whether this development is negative or positive*. It is negative in Manna, but Manna is just a story. 

 

* However, Amazon is a monopolist, so there might be a different reason not to use them that much.

# lastedited 27 Mar 2014
22 May 2013

To test if a given string is indeed a valid email address is an unsolved problem in programming. The internet is full of endless threads about which regular expression would let all valid addresses pass and reject all invalid ones. The only agreement is that there is no agreement.

So I welcomed the inspiration of my colleague Jim (at Vokomokum) to implement something straightforward: Check everything before @ to be a valid name (consisting of letters, digits and a couple other characters). Then use some library to check if everything after @ is a mail host currently known on the internet.

I was sold. That seems like a really nice trade-off between quick implementation and effectiveness. It is mostly the host name that is hard to check, so let's simply see if it exists. Mail hosts don't disappear and reappear, so it's pretty safe to ask for the DNS system to know about them at any time. The only drawback here is that you need an internet connection to run this test.

Anyway, here is an example implementation of this in Python:

    import re

    import dns.exception
    import dns.resolver


    def validate_email(address):
        ''' raise a ValueError if address does not look like a usable email '''
        # check general form: a valid local name + @ + some host
        # (fullmatch, so that a second @ further back is not overlooked)
        if not re.fullmatch(r'[A-Za-z0-9\-_.+$%#&*/=?{}|~]+@[^@]+', address):
            raise ValueError('The email address does not seem to be valid.')
        # check host
        host = address.split('@')[1]
        try:
            # dns.resolver raises an exception when it can't find a mail (MX)
            # host. The trailing dot makes it not try to append domains.
            # (dnspython 2.x names this call dns.resolver.resolve and
            # deprecates query.)
            dns.resolver.query('{}.'.format(host), 'MX')
        except dns.exception.DNSException:
            raise ValueError('The host {} is not known in the DNS'
                             ' system as a mail server.'
                             ' Is it spelled correctly?'.format(host))

The dns.resolver module has to be installed, as it is not in the Standard Library. It is part of the dnspython package, so you could do

pip install dnspython
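
If you want to try the first half of the check without an internet connection, here is a minimal sketch that separates the local part from the host and tests only the local part, using the same character set as the function above. The helper name split_address is made up for this illustration, and it does no DNS lookup at all, so it says nothing about whether the host exists.

```python
import re

# Same characters as the local-name check in validate_email.
LOCAL_PART = re.compile(r'[A-Za-z0-9\-_.+$%#&*/=?{}|~]+')


def split_address(address):
    """Return (local, host) if the address has the expected shape, else None.

    Hypothetical helper for illustration only; the host is not looked up.
    """
    local, sep, host = address.partition('@')
    # exactly one @ is required, with something after it
    if not sep or not host or '@' in host:
        return None
    if not LOCAL_PART.fullmatch(local):
        return None
    return local, host


print(split_address('jane.doe+news@example.org'))
print(split_address('no-at-sign.example.org'))
```

The host returned by such a helper is what would then go into the MX lookup.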

 

# lastedited 09 Jul 2013
19 Mar 2013

At the time of writing, Posterous is an elegant, easy way to keep a blog. In 2011, I set up a simple blog to post articles I came across that had some angle on the interconnectedness of energy systems. I loved how I could simply send an email to <blog-name>@posterous.com and the article was posted.

However, Twitter bought Posterous as a modern way to hire their staff and then decided to shut Posterous down. All content will die at the end of April 2013. This is another hint: if you make something that lives online and there is a possibility you will want to keep it, own the domain, and be aware that you (or someone you can ask for help) need to be able to curate the content over time. (I started that blog just posting links, but then also wrote a summary or an opinion here and there. Sometimes you don't know you'll want to keep something around for longer when you have just started it.)

Anyway, I had to export the content from Posterous and import it into the MySQL database which currently underlies this website. Widely used blogging software like WordPress offers an import tool, but everyone else is probably wondering how to get their Posterous data into their web publishing software.

Posterous offers an export in XML form, but it could have been easier to deal with. For instance, it is barely parsable XML, and the creation date is in the format used in emails. It took me a bit to get a simple Python script working, and I thought I'd share it here for anyone who needs to do something similar:

#!/usr/bin/env python3

from datetime import datetime
import re
import os
from bs4 import BeautifulSoup
from email.utils import parsedate_tz


def san(unicode_str):
    '''
    for sanitizing strings:
     - getting rid of everything that cannot be encoded as utf-8
     - escape single and double quotes
    '''
    s = unicode_str.encode('utf-8', 'ignore').decode('utf-8')
    s = s.replace("\\'", "'")  # first unescape escaped ones
    s = re.sub('''(['"])''', r'\\\1', s)  # now we escape all
    return s


psql = open('posterous-import.sql', 'w', encoding='utf-8')

xmls = [xf for xf in os.listdir('posts') if xf.endswith('.xml')]
for xf in xmls[:-2]:  # note: [:-2] leaves out the last two files listed
    with open('posts/{}'.format(xf), 'rb') as xfp:  # bs4 detects the encoding
        soup = BeautifulSoup(xfp.read())

    title = soup.find('title').text
    tt = parsedate_tz(soup.find('pubdate').text)  # date is in email format
    tdt = datetime(*tt[:6])  # year, month, day, hour, minute, second
    sql_date = tdt.strftime('%Y-%m-%d %H:%M:%S')
    content = soup.find('content:encoded').text

    psql.write("INSERT INTO my_table "
               "(title, content, inputdate) VALUES "
               "('{t}', '{c}', '{ed}');\n\n"
               .format(t=san(title), c=san(content),
                       ed=sql_date))

psql.close()

Well, it seemed to work for the contents of my Posterous blog, at least. I only cared about title, content and creation date of each post. 

Place the script above (download) into the folder you get from exporting your Posterous data (mine is called "space-3179959-energy-systems-ticker-0e4cc18feed3e8982ddae1ef4537b529") under the name of import.py (or whatever you like) and then call it via

python import.py

Then, you should find a file called posterous-import.sql which you can use to load your Posterous posts into your own (MySQL) database. I used BeautifulSoup to parse the XML, so you'll have to install that Python library first, e.g. by

pip install beautifulsoup4

Also, you will probably first want to edit the INSERT statement in line 35 to match the table structure you're importing into.
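
The date handling is the least obvious step, so here is a small sketch of it in isolation. It assumes the export carries dates in the usual email style (the sample string below is made up, not taken from a real export), and the function name to_sql_datetime is only for illustration. Like the script, it drops the timezone offset instead of converting it. The time is written with colons, the standard MySQL DATETIME notation.

```python
from datetime import datetime
from email.utils import parsedate_tz


def to_sql_datetime(email_date):
    """Turn an email-style date string into 'YYYY-MM-DD HH:MM:SS'.

    Illustrative helper; the timezone offset is ignored, not applied.
    """
    tt = parsedate_tz(email_date)
    if tt is None:
        raise ValueError('Cannot parse date: {!r}'.format(email_date))
    # the first six fields are year, month, day, hour, minute, second
    return datetime(*tt[:6]).strftime('%Y-%m-%d %H:%M:%S')


print(to_sql_datetime('Tue, 19 Mar 2013 14:05:09 -0700'))
```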

# lastedited 19 Mar 2013
28 Nov 2012

I recently commented on the idea of rewarding users if they in return offer flexibility to the management of the system they use. For instance, the operator of a congested road could pay people not to ride to work during peak hours. Or in energy management, a consumer can offer to use less energy than originally planned, or a generator can offer to supply more.

The general idea is of course not new, but in the IT-congested world we live in now, it becomes possible to negotiate compensations with users in advance and identify them while they actually use the system. This new timing of such approaches makes it necessary to think about novel mechanisms (e.g. which negotiation procedure makes sense, which compensation scheme will probably work for both sides).

This approach is not applicable everywhere, for good reasons. In some systems, like the road example, it is hard to track people and probably overall not a good idea. Common approaches like car-pooling and toll booths (or simply taxing fuel) can also work to some extent. In other systems, like the energy example, it is not a new idea per se, but how to implement such mechanisms (such that they are usable and serve the existing infrastructure in the best way) is still under heated discussion.

Anyway, there are now two recent examples of this idea being brought to existence in novel circumstances. No idea if these experiments will be deemed a success, though. Time will tell.

 

 

# lastedited 29 Nov 2012