The deep learning revolution was becoming less dependent on vast quantities of data thanks to the latest raft of model training technologies, according to Rob Toews, investor at Highland Capital Partners.
With global spending on AI expected to top $50bn in 2020, developers were gaining more options for processing imperfect, associative and synthetic datasets, Toews told AJ Bertone, partner at In-Q-Tel, who was moderating the discussion on data’s role in AI.
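One such option is programmatic labelling of the kind popularised by Snorkel's open-source library, which combines noisy heuristics into training labels rather than relying on large hand-annotated datasets. The sketch below is illustrative only: the spam heuristics and the toy DataFrame are assumptions for the example, not anything discussed on the panel.

```python
# Minimal sketch of programmatic labelling with the open-source Snorkel
# library: noisy, heuristic "labelling functions" vote on each example and
# a label model denoises those votes into training labels.
# The heuristics and data below are illustrative, not from the panel.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, HAM, SPAM = -1, 0, 1

@labeling_function()
def lf_contains_link(x):
    # Messages containing links are often spam.
    return SPAM if "http" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    # Very short messages are usually legitimate.
    return HAM if len(x.text.split()) < 5 else ABSTAIN

@labeling_function()
def lf_keyword_win(x):
    # "Win"/"prize" language is a common spam signal.
    return SPAM if "win" in x.text.lower() else ABSTAIN

df_train = pd.DataFrame({"text": [
    "check out http://example.com for free stuff",
    "see you at 6",
    "WIN a prize now http://spam.example",
]})

# Apply every labelling function to every row to build a label matrix.
applier = PandasLFApplier(lfs=[lf_contains_link, lf_short_message, lf_keyword_win])
L_train = applier.apply(df=df_train)

# The LabelModel estimates each heuristic's accuracy and resolves their
# overlapping, conflicting votes into a single label per example.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=500, seed=123)
df_train["label"] = label_model.predict(L=L_train)
print(df_train)
```

Because the label model weights each heuristic by its estimated accuracy, imperfect and partially overlapping rules can stand in for clean hand labels, which is the sense in which such tooling reduces dependence on vast curated datasets.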
Contradicting the old mantra that data was the new oil, Toews argued data could neither be commoditised nor exchanged interchangeably, given the nuances associated with each specific data subtype.
Nor, he added, was capturing data a zero-sum contest, although the discussion later turned to situations where privacy concerns prevented data transfers between different deep learning models.
Alex Ratner from programmatic training data labelling company Snorkel AI said these difficulties had made it less likely that data acquired, curated and protected by different organisations would be shared, undermining the prospect of widespread data exchanges between cloud computing giants.
Ratner added: “People are becoming more literate about data privacy issues, and thoughts are shifting to the model. The modern models that people train have hundreds of millions of parameters. With that number of tuneable ops, you could basically memorise people’s names and medical records.
“These models are very hard to interpret and it is hard to really guarantee and guard against [breaches]. So I think people will be more wary of directly transferred data. There may be more scope in terms of soaking some of that data off for things like transformers and retraining, and there are some techniques here that are very exciting, but there will be hurdles, and for good reasons.”
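To make the scale Ratner cites concrete, the sketch below loads the openly released GPT-2 small model (roughly 124 million parameters) via Hugging Face's transformers library, counts its tunable weights and prompts it with a prefix to show how a memorising model could, in principle, complete a record verbatim. The "patient record" prompt is a made-up illustration, not Ratner's example or a real extraction attack.

```python
# Minimal sketch of the memorisation concern: a pretrained language model
# with well over 100M tunable parameters is prompted with a prefix, and we
# inspect whether it reproduces a specific continuation verbatim.
# GPT-2 small is used purely as an open example; the prompt is illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# "Hundreds of millions of parameters": count every tunable weight.
n_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {n_params:,}")  # ~124M for GPT-2 small

# Prompt with a prefix and decode greedily. If the training corpus had
# contained a unique record starting with this prefix, a memorising model
# might complete it word for word.
prefix = "Patient name: John Smith, diagnosis:"  # illustrative, not real data
inputs = tokenizer(prefix, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=False,  # greedy decoding, the most "confident" completion
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```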