December Newsletter
Hi everyone-
Properly dark and cold now in the UK, and even some initial sightings of Christmas trees so it must be getting to the end of year... perhaps time for some satisfying data science reading materials while pondering what present to buy for your long lost auntie!
Following is the December edition of our Royal Statistical Society Data Science and AI Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity.
As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.
Industrial Strength Data Science December 2021 Newsletter
RSS Data Science Section
Committee Activities
We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid off (networking and introductions help, advice on development etc.) don't hesitate to drop us a line.
On Tuesday 23rd November we hosted our latest event "The National AI Strategy - boom or bust to your career in data science?" and it was another great success with a strong turnout.
First of all Seb Krier, Senior Technology Policy Researcher at the Stanford University Cyber Policy Centre, gave an excellent overview of the published National AI strategy using his extensive experience to provide insight into the strengths and weaknesses of the different focus areas, and how it compares to different approaches around the world.
Next, Adam Davison and Martin Goodson talked through the results of our recent data science practitioner survey on the government strategy proposals, highlighting areas of discrepancy and omission.
We then finished with a lively round-table discussion, additionally including Stian Westlake, Chief Executive of the RSS, and Janet Bastiman, Chief Data Scientist at Napier AI.
We will publish a more detailed review and video in the coming weeks for those who missed out.
If anyone is interested in getting more involved in this discussion, we are collaborating with the UK Government's Office for AI to host a roundtable event on AI Governance and Regulation which is one of the 3 main pillars of the UK AI Strategy. We are seeking Data Science and AI experts and practitioners to participate - please express any interest by emailing weatheralljames@hotmail.com.
Many congratulations to the DSS committee's Rich Pugh, who has been elected to the RSS Council - joining the DSS's Anjali Mazumder and Jim Weatherall... all part of our cunning plan for global domination!
Martin Goodson continues to run the excellent London Machine Learning meetup and is very active with events. The last talk was on October 27th where Anees Kazi, senior research scientist at the chair of Computer Aided Medical Procedure and Augmented Reality (CAMPAR) at Technical University of Munich, discussed "Graph Convolutional Networks for Disease Prediction". Videos are posted on the meetup YouTube channel - and future events will be posted here.
This Month in Data Science
Lots of exciting data science going on, as always!
Ethics and more ethics...
Bias, ethics and diversity continue to be hot topics in data science...
As discussed last month, it doesn't require sophisticated algorithms to cause confusion and spread misinformation. Misleading data published by the UK Health Security Agency was picked up in Brazil and used to fuel anti-vaccination myths. There is more background on the challenges associated with the covid figures here.
Some positive moves around facial recognition:
A joint UK and Australian investigation has found Clearview AI, famous for its easily accessible facial recognition system, to be in breach of privacy laws
NIST (the US National Institute of Standards and Technology) published more details of its approach to testing the quality of facial recognition software, which could become very useful in vendor selection processes and accreditation.
Facebook decided to shut down its own facial recognition system, citing 'societal concerns' (official release here)
"This change will represent one of the largest shifts in facial recognition usage in the technology’s history. More than a third of Facebook’s daily active users have opted in to our Face Recognition setting and are able to be recognized, and its removal will result in the deletion of more than a billion people’s individual facial recognition templates."
Some interesting commentary from Professor Stuart Russell on the dangers ahead as AI becomes more powerful, with a call for more codes of conduct and cross-border treaties. (Also, Prof. Russell will be giving the BBC Reith Lectures this year, titled Living with Artificial Intelligence)
For example, asking AI to cure cancer as quickly as possible could be dangerous. “It would probably find ways of inducing tumours in the whole human population, so that it could run millions of experiments in parallel, using all of us as guinea pigs,” said Russell. “And that’s because that’s the solution to the objective we gave it; we just forgot to specify that you can’t use humans as guinea pigs and you can’t use up the whole GDP of the world to run your experiments and you can’t do this and you can’t do that.”
In terms of AI Governance, the EU and the US are, as is often the case, taking different approaches.
Some worrying examples of bias in medical devices, including oximeters and spirometers.
Not all doom and gloom though...
Apparently, irregular pupil shapes can help identify artificially generated human faces
And a truly data driven attempt to combat filter bubbles and bias in the news: https://www.improvethenews.org/, where you can see news on the same topic from 100s of different news sources and filter/categorise them based on a variety of data driven dimensions including political stance, establishment stance etc. For more background on this, as well as a wide array of AI topics, I highly recommend the interview with Max Tegmark on the 'People I Mostly Admire' podcast.
Developments in Data Science...
As always, lots of new developments...
The Conference on Neural Information Processing Systems (NeurIPS) is one of the top machine learning conferences - a quick digest of all 2021 submissions here
As is often the case, Google is at the forefront of applied research- this time applying sensory substitution in reinforcement learning
“The brain is able to use information coming from the skin as if it were coming from the eyes. We don’t see with the eyes or hear with the ears, these are just the receptors, seeing and hearing in fact goes on in the brain.”
Useful - new approaches to evaluating reinforcement learning techniques in a more systematised way.
Lots of NLP developments this month...
What looks to be a promising idea of augmenting large language models with live internet search results to fine-tune responses to queries
OpenAI has made progress with a system that solves 'Maths Word Problems'- notoriously hard for automated NLP approaches to understand.
The 'bigger is better' race continues with Microsoft and NVIDIA releasing their Megatron-Turing NLG model with 530 billion parameters, trumping GPT-3 which has a 'mere' 175 billion.
Meanwhile OpenAI has made GPT-3 generally available through its API
And more nuanced commentary on what all this means...
"This trend of massive investments of dozens of millions of dollars going into training ever more massive AI models appears to be here to stay, at least for now. Given these models are incredibly powerful this is very exciting, but the fact that primarily corporations with large monetary resources can create these models is worrying"
Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!
Google looks to be doubling down in the field of drug discovery, with the launch of a new company, Isomorphic Labs, leveraging the groundbreaking DeepMind AlphaFold, and run by DeepMind co-founder Demis Hassabis
“Biology is likely far too complex and messy to ever be encapsulated as a simple set of neat mathematical equations. But just as mathematics turned out to be the right description language for physics, biology may turn out to be the perfect type of regime for the application of AI.”
More wildlife tracking... previously we had cattle in Australia, now chimps in West Africa
Using ML models in production is not easy ... quite a lot of press and commentary on the plight of Zillow in the US
Zillow is a property listings business with a proprietary model that can provide a house price estimate for any house at any time
Zillow subsequently built a business (Zillow Offers) actually buying and selling houses using their model to identify profit making opportunities. However, they have now had to suspend the Offers operation after losing over $300m in just a few months (check out the revealing earnings call here)
“There was no problem with the algorithm as long as they stay within the boundaries of the business model and buy cookie-cutter homes that are easier to sell. There are a lot of things that affect the valuation of homes that even very sophisticated algorithms cannot catch”
How does that work?
A new section on understanding different approaches and techniques
A well written guide on the theory behind Deep Learning Optimisation
Excellent tutorial building Transformers from scratch
"Before we start, just a heads-up. We're going to be talking a lot about matrix multiplications and touching on backpropagation (the algorithm for training the model), but you don't need to know any of it beforehand. We'll add the concepts we need one at a time, with explanation.."
Understanding Graph Neural Networks with 'differential geometry and algebraic topology' ... I know, not the most welcoming of titles, but it is well explained with lots of visual examples
This is great - A Visual Introduction to Language Models
For example, speech recognition systems need to disambiguate between phonetically similar phrases like “recognize speech” and “wreck a nice beach”, and a language model can help pick the one that sounds the most natural in a given context. For instance, a speech recognition system transcribing a lecture on audio systems should likely prefer "recognize speech", whereas a news flash about an extraterrestrial invasion of Miami should likely prefer "wreck a nice beach".
Gaussian processes - very powerful tools and worth exploring with this entertaining tutorial
"But I am going to define this stuff three times. Once for mum, once for dad, and once for the country."
Another excellent post- this time focusing on waves, spectral analysis and their link to machine learning
Machine learning model explainability continues to be a hot topic- a useful guide to SHAP, one of the better approaches out there.
Finally, a useful discussion on outlier detection and how it relates to data drift
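To make the "recognize speech" vs "wreck a nice beach" example from the language model introduction above a little more concrete, here is a minimal sketch of a toy bigram language model choosing between the two phrases. Everything in it is an assumption made for illustration: the two tiny "corpora" are invented, the function names (train_bigram, log_prob, pick_transcription) are hypothetical, and add-alpha smoothing with a single unknown-word token is just one simple choice. Real speech recognition systems combine far richer language models with acoustic scores.

```python
import math
from collections import Counter

START, UNK = "<s>", "<unk>"


def train_bigram(sentences):
    """Count bigrams and their left-hand contexts from whitespace-tokenised text."""
    contexts, bigrams, vocab = Counter(), Counter(), {UNK}
    for sentence in sentences:
        tokens = [START] + sentence.lower().split()
        vocab.update(tokens[1:])          # words the model can predict
        contexts.update(tokens[:-1])      # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return {"contexts": contexts, "bigrams": bigrams, "vocab": vocab}


def log_prob(phrase, model, alpha=0.1):
    """Add-alpha smoothed log-probability of a phrase under the bigram model."""
    vocab = model["vocab"]
    # Words never seen in training are all mapped to one unknown-word token.
    words = [w if w in vocab else UNK for w in phrase.lower().split()]
    tokens = [START] + words
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = model["bigrams"][(prev, word)] + alpha
        denominator = model["contexts"][prev] + alpha * len(vocab)
        total += math.log(numerator / denominator)
    return total


def pick_transcription(candidates, model):
    """Return the candidate phrase the model finds most probable."""
    return max(candidates, key=lambda phrase: log_prob(phrase, model))


# Invented toy corpora standing in for "a lecture on audio systems"
# and "a news flash about an invasion of Miami".
audio_model = train_bigram([
    "we recognize speech with a microphone",
    "systems recognize speech in noisy rooms",
    "recognize speech from audio",
])
beach_model = train_bigram([
    "aliens wreck a nice beach in miami",
    "storms wreck a nice beach every year",
    "wreck a nice beach and people complain",
])

candidates = ["recognize speech", "wreck a nice beach"]
print("audio context:", pick_transcription(candidates, audio_model))
print("beach context:", pick_transcription(candidates, beach_model))
```

With these toy corpora the same pair of candidates is resolved differently depending on which context the model was trained on, which is the point of the quoted example. The result is sensitive to the smoothing constant: with heavier smoothing the shorter phrase can win simply because it has fewer words to score.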
Practical tips
How to drive analytics and ML into production
We've previously highlighted the importance of MLOps and the standardisation of processes for updating and monitoring ML models in production. Another good podcast on 'The Data Exchange', this time about MLOps Anti-Patterns (the underlying research paper is here)
Speaking of MLOps - excellent summary of the platforms used across the big players, highlighting how much is still 'home grown' (labelled 'IH')
Finally, some useful tips and a systematic approach to improving existing ML systems
"Machine learning systems are extremely complex, and have a frustrating ability to erode abstractions between software components. This presents a wide array of challenges to the kind of iterative development that is essential for ML success."
Bigger picture ideas
Longer thought-provoking reads - a few more than normal, lean back and pour a drink!
New insight into the brain and how we make sense of our surroundings
"Abundant evidence and decades of sustained research suggest that the brain cannot simply be assembling sensory information, as though it were putting together a jigsaw puzzle, to perceive its surroundings. This is borne out by the fact that the brain can construct a scene based on the light entering our eyes, even when the incoming information is noisy and ambiguous."
True stories of algorithmic improvement - how have we been able to make our algorithms more efficient?
We know Deep Learning can be incredibly powerful, but is it ready for deployment in safety critical situations?
"I would love to incorporate deep learning into the design, manufacturing, and operations of our aircraft. But I need some guarantees."
More insight into how our brain functions, this time observing the possibility of back-propagation in the brain
A catchy title and well worth a listen: "Bernoulli's Fallacy & the Crisis of Modern Science"
Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:
A bit more visualisation focused this month:
First a few pointers on visualisation and graphics best practices, starting with guidelines and then progressing to attention, contrast and grouping
How about applying all this to chips (the potato variety...)
Or maybe to Alice in Wonderland
Covid Corner
As we head into winter, we continue to experience the conflicting emotions of relaxing regulations and behaviour with increasing Covid prevalence and hospitals at breaking point. And now there is news of a new variant...
The latest ONS Coronavirus infection survey estimates the current prevalence of Covid in the community in England to be roughly 1 in 65 people which is somewhat better than last month (1 in 50) but still very high... Back in May the prevalence was less than 1 in 1000.
More or Less covers a recent case of Simpson's Paradox in the vaccination figures, when it appeared vaccinated people had higher death rates than non-vaccinated due to the confounding effect of age.
There is increasing scrutiny of the UK Government's policy towards Covid since the end of lockdown in July, particularly with regard to children
"Whatever the reason, by half-term, only around 16 per cent of vaccinations in the cohort had been achieved. Meanwhile, school-age kids had caught Covid by the truckload. Over 7 per cent of the entire Year 7 to Year 11 cohort was infected on any day in the last week of October alone. Maybe that was the unspoken plan. Certainly the JCVI’s minutes – released at the end of October after lengthy delays – make grim reading in this respect. The idea, already noted, that “natural infection” might be better than vaccination for young people was under discussion even here. Somehow, catching Covid was proffered as a better way of not getting ill with Covid than preventing its worst effects with a proven vaccine."
Even BBC coverage has caused controversy. The recent government releases comparing the UK favourably to various European countries were directly reported by the BBC but strongly questioned by leading academics
And now we have a new 'omicron' variant, originating in Southern Africa ... although it's too early to tell exactly how dangerous it is
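For anyone who wants to see how the Simpson's Paradox effect mentioned above can arise, here is a minimal sketch. The numbers are entirely made up for illustration (they are not the real vaccination or mortality figures) and the function names are hypothetical. The only assumption that matters is that older people are both much more likely to be vaccinated and much more likely to die, which is the confounding effect of age described in the More or Less item.

```python
# Hypothetical counts, invented purely for illustration (not real Covid data).
# Key: (age group, vaccinated?) -> (population, deaths)
counts = {
    ("under 50", False): (800_000, 80),
    ("under 50", True): (200_000, 10),
    ("50 and over", False): (100_000, 500),
    ("50 and over", True): (900_000, 1_800),
}


def rate_per_100k(population, deaths):
    """Deaths per 100,000 people."""
    return 100_000 * deaths / population


def stratified_rates(counts):
    """Death rate within each (age group, vaccination status) cell."""
    return {key: rate_per_100k(pop, deaths) for key, (pop, deaths) in counts.items()}


def pooled_rates(counts):
    """Death rate by vaccination status, ignoring age altogether."""
    totals = {}
    for (_, vaccinated), (pop, deaths) in counts.items():
        total_pop, total_deaths = totals.get(vaccinated, (0, 0))
        totals[vaccinated] = (total_pop + pop, total_deaths + deaths)
    return {vacc: rate_per_100k(pop, deaths) for vacc, (pop, deaths) in totals.items()}


by_age = stratified_rates(counts)
overall = pooled_rates(counts)

for (age, vaccinated), rate in by_age.items():
    label = "vaccinated" if vaccinated else "unvaccinated"
    print(f"{age:12s} {label:13s} {rate:8.1f} per 100k")

for vaccinated, rate in overall.items():
    label = "vaccinated" if vaccinated else "unvaccinated"
    print(f"{'all ages':12s} {label:13s} {rate:8.1f} per 100k")
```

In this invented example the vaccinated group has the lower death rate within each age band, yet the higher rate once the age bands are pooled, simply because most of the vaccinated people are in the higher-risk older band.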
Updates from Members and Contributors
Professor Harin Sellahewa reports that nearly 50 of the University of Buckingham's first ever master’s level data science apprentices have graduated. The Integrated Master’s level Degree Apprenticeship course was set up two years ago to help address an urgent shortage of people with advanced digital skills and to produce expert data scientists by giving them the technological and business skills to transform their workplace. The graduates receive the MSc in Applied Data Science from Buckingham as well as the Level 7 Digital and Technology Solutions Specialist degree apprenticeship certificate from ESFA. The apprenticeship is provided in partnership with AVADO who work with businesses to train staff to develop the skills needed to compete in a digital world. Industry partners such as IBM, Tableau, TigerGraph and Zizo conducted practical workshops for the learners.
Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.
- Piers
The views expressed are our own and do not necessarily represent those of the RSS