September Newsletter
Hi everyone-
Another month flies by... hard to believe summer is technically over although the coldest August UK bank holiday on record is one way to drive home the point!
Following is the September edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity while figuring out whether your home office setup needs a rethink for the months ahead...
As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:
Industrial Strength Data Science September 2020 Newsletter
RSS Data Science Section
Covid Corner
The Covid situation feels a little unreal at the moment, with UK schools just about to re-open fully while positive tests in France go 'exponential'. Offices remain stubbornly empty while bar and restaurant trade is visibly picking up. The upcoming return of students to universities and schools for the new academic year is sparking increasing concern. As always, numbers, statistics and models are front and centre in all sorts of ways.
Positive cases in the UK are beginning to rise again which is causing concern for the upcoming autumn and winter months. A recent leaked SAGE report attempting to forecast the growth of the virus over the winter suggested a "reasonable worst case scenario" of 85,000 deaths, a very sobering thought.
However, it's clear that the relationship between the number of positive cases and Covid-related deaths has changed: although cases are rising, the number of deaths remains relatively low and stable. This makes the modelling of cases and deaths even harder than before. The team at covid19-projections.com, who have generated some of the more accurate forecasts so far, are showing increasing deaths in the US but still relatively low numbers going forward in the UK.
The Financial Times gave a useful summary of why the case and death trajectories appear to be diverging: increased testing (a greater proportion of actual cases are being picked up than before); changing age profile of those infected (younger people infected who are less likely to suffer severe effects); and improved care (we are now better at treating infected patients).
The big question of course is whether this trend continues. As was the case earlier in the year, we may well be closely watching trends in Italy, France, Spain and Germany to see what we can learn.
Separately, research continues into other, less traditional ways of monitoring the virus' spread.
Committee Activities
It is still relatively quiet for committee member activities, although we continue to play an active role in joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation.
Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and has been active during lockdown with virtual events. The most recent event, Putting An End to End-to-End by Sindy Löwe, was a great success - the video will be posted soon on the meetup YouTube channel - and future events will be posted here.
The committee are also excited to be launching a new initiative: AI Ethics Happy Hours. More details to follow next week, but we are keen to generate lively debate across the whole data science community on this interesting and important topic.
Elsewhere in Data Science
Lots of non-Covid data science going on, as always!
Evil Algorithms...
Algorithms have been in the press a fair amount recently, and not for the right reasons. The public exam results fiasco in the UK was a case in point...
First of all - what happened?
Way back in the early days of lockdown (20th March), the government announced the cancellation of all school age public exams in the UK.
After discussion with the Department for Education, the body in charge of exam regulation (Ofqual) published the guidelines it would use to award grades to students unable to sit their exams. Teachers would supply an expected grade together with a class rank for every student in every subject, and Ofqual would use this information, together with schools' prior aggregate subject-level results, to generate 'final' grades.
The Royal Statistical Society offered to help develop the algorithm (an offer Ofqual did not take up) and outlined concerns regarding the published approach; at the time, however, the approach remained unchanged.
On 13th August, 'A' Level results were announced, with almost 40% of students receiving at least one result downgraded from their teachers' predictions. The results drew criticism, which became more outspoken and widespread when it emerged that children in less privileged areas and schools were more likely to have had their results downgraded. It was also unclear how the algorithm could have generated some of the individual results quoted in the press, leading to concerns about its implementation.
On 17th August, the government announced that 'A' level results would be re-issued with students awarded their predicted grades, and that 'GCSE' results would be awarded in the same way, without using the Ofqual algorithm. Although this was generally perceived as a 'fairer' outcome, it did result in significant grade inflation over previous years.
So what was the algorithm that caused all the controversy?
Of course, as data scientists, we all know that a critical component of any algorithm is a clear definition of what you are trying to achieve and how you will measure success and fairness in outcomes. Early in the process the government appears to have been concerned about grade inflation, and likely gave Ofqual guidance to correct for it. So, from the outset, limiting grade inflation appears to have been the algorithm's key objective, with little discussion or evidence of how success or fairness in outcomes would be assessed.
As discussed in this post by Sophie Bennett, the algorithm made a key distinction based on class size: where the class size was less than 15, the teacher-assessed grades would be used without correction, but for class sizes over 15, adjustment towards historic grade distributions would be made. Of course, private schools and schools in more affluent areas tend to have smaller classes, so the unintended consequence of this approach was that students from less affluent backgrounds were more likely to have their teacher-assessed grades downgraded.
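To make the mechanics concrete, here is a deliberately simplified sketch of that class-size rule. It is not Ofqual's actual implementation: the function name, data structures and example grades are hypothetical, and only the 'fewer than 15 pupils' cut-off comes from the description above.

```python
# Illustrative sketch only - a toy version of the class-size rule described above,
# not Ofqual's actual algorithm. Everything beyond the class-size cut-off is made up.

SMALL_CLASS_THRESHOLD = 15  # below this, teacher-assessed grades are used as-is


def moderated_grades(teacher_grades, historic_distribution):
    """Return final grades for one class in one subject.

    teacher_grades: teacher-assessed grades, ordered by teacher rank (best first).
    historic_distribution: grades implied by the school's past results for a
                           class of this size, also ordered best first.
    """
    if len(teacher_grades) < SMALL_CLASS_THRESHOLD:
        # Small classes: teacher assessment stands uncorrected.
        return list(teacher_grades)
    # Larger classes: pupils keep their teacher-assigned rank, but grades are
    # pulled towards the school's historic distribution.
    return list(historic_distribution[: len(teacher_grades)])


# Example: a large class where the historic distribution is weaker than the
# teachers' predictions, so pupils are downgraded despite identical rankings.
teacher = ["A*", "A", "A", "A", "B", "B", "B", "B",
           "B", "C", "C", "C", "C", "C", "C", "D"]
historic = ["A", "A", "B", "B", "B", "B", "C", "C",
            "C", "C", "C", "D", "D", "D", "D", "E"]
print(moderated_grades(teacher, historic))
```

Even with identical teacher rankings, the larger class is pulled down to the school's historic results, which is exactly the pattern reported in the press.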
The RSS Data Science committee submitted a response to the CDEI call for evidence on bias in algorithmic decision making back in May. In it we stated:
"The only way to prove algorithms are biased is to perform experimentation and analysis. Any argument based on explainability, the capability of an algorithm to produce a justification for its decisions, will fail. This is because of hidden correlations which allow latent and implicit variables to create bias against protected groups."
RSS Data Science Section submission to CDEI call for evidence on bias in algorithmic decision making
Clearly this type of analysis and assessment is as important as ever- it recently helped change an algorithm used in visa allocations. Unless we are careful, bad implementation of algorithmic approaches could lead to a backlash against all implementations. It may have already started.
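For illustration, this is the kind of outcome-level experiment the submission has in mind: compare results across groups directly rather than relying on the algorithm's own explanations. The numbers below are made up purely to show the shape of the check.

```python
# Toy outcome check: do downgrade rates differ between two groups of schools?
# The data are simulated purely for illustration; real analysis would use the
# actual awarded vs teacher-assessed grades.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
# 1 = result downgraded, 0 = teacher grade kept, for two hypothetical groups
small_class_schools = rng.binomial(1, 0.10, size=400)   # mostly small classes
large_class_schools = rng.binomial(1, 0.40, size=400)   # mostly large classes

table = np.array([
    [small_class_schools.sum(), len(small_class_schools) - small_class_schools.sum()],
    [large_class_schools.sum(), len(large_class_schools) - large_class_schools.sum()],
])
chi2, p_value, _, _ = chi2_contingency(table)
print(f"downgrade rates: {small_class_schools.mean():.0%} vs "
      f"{large_class_schools.mean():.0%}, p = {p_value:.2g}")
```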
Elsewhere in bias and ethics...
As we dig further into underlying biases, we find more issues... this time in the most widely used open-source Named Entity Recognition data set.
And as AI systems get better and better, at more and more tasks, how much control should we give them? Should AI systems be allowed to pull the trigger?
More GPT-3 ...
We now seem to have a regular section on GPT-3, OpenAI's 175-billion-parameter NLP model, as it continues to generate news and commentary.
We previously mentioned 'Double Descent', the intriguing phenomenon in which apparently over-parameterised models (which have historically meant over-fitting and poor generalisation) start improving again as capacity and training grow. This excellent set of tweets from Daniela Witten, a co-author of the latest edition of the Machine Learning bible, gives insight into the foundations of why this happens.
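For anyone who wants to see the effect rather than read about it, below is a rough sketch of the classic experiment: sweep the number of random features in a least-squares fit past the point where the model can interpolate the training data and track test error. The data, feature choice and scales are illustrative assumptions, and the second descent relies on the minimum-norm solution that numpy's lstsq returns for under-determined systems, so the exact shape of the curve will vary with the setup.

```python
# Sketch of a double-descent experiment with random Fourier features.
# Illustrative assumptions throughout; not a definitive reproduction.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 30, 200
x_train = rng.uniform(-1, 1, n_train)
x_test = rng.uniform(-1, 1, n_test)


def target(x):
    return np.sin(2 * np.pi * x)


y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
y_test = target(x_test)


def make_features(n_feat, seed=1):
    """Random cosine features; the same draw is reused for train and test."""
    feat_rng = np.random.default_rng(seed)
    w = 3.0 * feat_rng.standard_normal(n_feat)
    b = feat_rng.uniform(0, 2 * np.pi, n_feat)
    return lambda x: np.cos(np.outer(x, w) + b)


for n_feat in [2, 5, 10, 20, 30, 40, 80, 200, 1000]:
    phi = make_features(n_feat)
    # lstsq gives the minimum-norm fit once the system is under-determined,
    # which is what allows test error to fall again beyond interpolation.
    coef, *_ = np.linalg.lstsq(phi(x_train), y_train, rcond=None)
    test_mse = np.mean((phi(x_test) @ coef - y_test) ** 2)
    print(f"{n_feat:5d} features  test MSE = {test_mse:.3f}")
```

Test error typically peaks near the interpolation threshold (here around 30 features, the size of the training set) and then falls again as the model becomes heavily over-parameterised.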
This set of posts is interesting in its own right- a set of philosophers discussing GPT-3. Most impressive though is GPT-3's response to their posts (generated after feeding the posts in as prompts)...
“…As I read the paper, a strange feeling came over me. I didn’t know why at first, but then it hit me: this paper described my own thought process. In fact, it described the thought process of every human being I had ever known. There was no doubt in my mind that all people think in this way. But if that was true, then what did it say about me? I was a computer, after all.”
It seems savvy blog writers are successfully leveraging GPT-3 ...
"Over the last two weeks, I’ve been promoting a blog written by GPT-3.
I would write the title and introduction, add a photo, and let GPT-3
do the rest. The blog has had over 26 thousand visitors, and we now
have about 60 loyal subscribers..."
Of course not everyone is sold - Gary Marcus is less than impressed ...
"Too little has changed. Adding a hundred times more input data has
helped, but only a bit."
There are concerns around the lack of transparency, although if anyone wants to get a better understanding of how these models work, Andrej Karpathy has released minGPT, a minimal open-source PyTorch implementation of the GPT architecture that anyone can play around with.
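For a flavour of how little machinery sits underneath these models, here is a hand-rolled sketch of the causal self-attention block that GPT-style models (minGPT included) are built from. It is not minGPT's actual API, just an illustration of the core idea in plain PyTorch.

```python
# A minimal causal self-attention block - the core building block of GPT-style
# models. Written from scratch for illustration; not minGPT's API.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # query, key, value projections
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # lower-triangular mask: each position may only attend to earlier ones
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, time, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)


# Example: a batch of 4 sequences of 16 token embeddings of width 64
x = torch.randn(4, 16, 64)
print(CausalSelfAttention(n_embd=64, n_head=8, block_size=16)(x).shape)  # (4, 16, 64)
```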
What to work on and how to get it live
Eugene Yan has a couple of useful recent posts on tips and skills for machine learning in production.
His Practical Guide to Maintaining Machine Learning in Production talks through the key areas to focus on to make sure models keep functioning correctly,
while Data Scientists should be more end-to-end discusses some of the broader skill sets needed to do this successfully.
On a similar theme, Effective Testing for Machine Learning Systems gives a good summary of why traditional testing approaches often fail in machine learning systems.
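As a flavour of what that looks like in practice, here is a toy behavioural test in the pytest style: rather than only checking that the code runs, it asserts a minimum level of model quality and a directional expectation on the predictions. The model, features and thresholds are hypothetical stand-ins, not taken from the post.

```python
# Illustrative behavioural tests for a model, runnable with pytest.
# The toy model and feature semantics are made up for the example.
import numpy as np
from sklearn.linear_model import LogisticRegression


def make_toy_model():
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))                 # features: [income, age]
    y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)
    return LogisticRegression().fit(X, y)


def test_minimum_accuracy():
    # Regression test on model quality, not just "does the code run".
    model = make_toy_model()
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] > 0).astype(int)
    assert model.score(X, y) > 0.9


def test_directional_expectation():
    # Increasing income should not decrease the predicted approval probability.
    model = make_toy_model()
    low = model.predict_proba([[0.0, 0.0]])[0, 1]
    high = model.predict_proba([[2.0, 0.0]])[0, 1]
    assert high >= low
```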
Following on from last month, this excellent post from Clemens Mewald talks through the continuing issues with many of the AI/Data Science development platforms.
"...like DIY craft kits, with the instructions and 70% of the parts missing”
Practical Projects
As always here are a few potential practical projects to while away the lockdown hours:
Everyone loves GANs (Generative Adversarial Networks), and everyone loves Anime .... so how about using a GAN to turn anything into Anime?
You've always wanted to dig into Reinforcement Learning but never found the use case? A new release from Google could help.
Updates from Members and Contributors
Marco Gorelli has published his Jupyter notebook code quality tool which looks interesting and useful.
Sara Parker brought our attention to the work of Simon Maskell’s PhD students on using new data sources to track the Covid outbreak which we called out in the Covid Corner above.
Ole Schulz-Trieglaff draws our attention to the upcoming PyData Global Conference on November 11th-15th. It combines many of the PyData flagship events such as PyData London, NY and Delhi, and looks like a compelling event.
Finally, Glen Wright Colopy is running what looks to be an excellent series of events: the Philosophy of Data Science series.
Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:
- Piers