{"id":84992,"date":"2025-10-16T19:29:07","date_gmt":"2025-10-16T19:29:07","guid":{"rendered":"https:\/\/www.newsbeep.com\/ie\/84992\/"},"modified":"2025-10-16T19:29:07","modified_gmt":"2025-10-16T19:29:07","slug":"why-ai-startups-are-taking-data-into-their-own-hands","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ie\/84992\/","title":{"rendered":"Why AI startups are taking data into their own hands"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">For one week this summer, Taylor and her roommate wore GoPro cameras strapped to their foreheads as they painted, sculpted, and did household chores. They were training an AI vision model, carefully syncing their footage so the system could get multiple angles on the same behavior. It was difficult work in many ways, but they were well paid for it \u2014 and it allowed Taylor to spend most of her day making art.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cWe woke up, did our regular routine, and then strapped the cameras on our head and synced the times together,\u201d she told me. \u201cThen we would make our breakfast and clean the dishes. Then we\u2019d go our separate ways and work on art.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">They were hired to produce five hours of synced footage each day, but Taylor quickly learned she needed to allot seven hours a day for the work, to leave enough time for breaks and physical recovery.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cIt would give you headaches,\u201d she said. \u201cYou take it off and there\u2019s just a red square on your forehead.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Taylor, who\u00a0asked not to give her last name, was working as a data freelancer for Turing Labs, an AI company that connected her to TechCrunch. Turing\u2019s goal wasn\u2019t to teach the AI how to make oil paintings, but to help it gain more abstract skills around sequential problem-solving and visual reasoning. 
Unlike a large language model, Turing\u2019s vision model would be trained entirely on video \u2014 and most of it would be collected directly by Turing.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Alongside artists like Taylor, Turing is contracting with chefs, construction workers, and electricians \u2014 anyone who works with their hands. Turing Chief AGI Officer Sudarshan Sivaraman told TechCrunch the manual collection is the only way to get a varied enough dataset.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cWe are doing it for so many different kinds of blue-collar work, so that we have a diversity of data in the pre-training phase,\u201d Sivaraman\u00a0told TechCrunch. \u201cAfter we capture all this information, the models will be able to understand how a certain task is performed.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Turing\u2019s work on vision models is part of a growing shift in how AI companies deal with data. Where training sets were once scraped freely from the web or collected from low-paid annotators, companies are now paying top dollar for carefully curated data. \u00a0<\/p>\n<p class=\"wp-block-paragraph\">With the raw power of AI\u00a0already established, companies are looking to proprietary training data as a competitive advantage. And instead of farming out the task to contractors, they\u2019re often taking on the work themselves.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The email company <a rel=\"nofollow noopener\" href=\"https:\/\/www.fyxer.com\/\" target=\"_blank\">Fyxer<\/a>, which uses AI models to sort emails and draft replies, is one example. 
\u00a0<\/p>\n<p class=\"wp-block-paragraph\">After some early experiments, founder Richard Hollingsworth discovered the best approach was to use an array of small models with tightly focused training data. Unlike Turing, Fyxer is building off someone else\u2019s foundation model \u2014 but the underlying insight is the same. \u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cWe realized that the quality of the data, not the quantity, is the thing that really defines the performance,\u201d Hollingsworth told me.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">In practical terms, that meant some unconventional personnel choices. In the early days, Fyxer engineers and managers were sometimes outnumbered four-to-one by the executive assistants needed to train the model, Hollingsworth says.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cWe used a lot of experienced executive assistants, because we needed to train on the fundamentals of whether an email should be responded to,\u201d he told TechCrunch. \u201cIt\u2019s a very people-oriented problem. Finding great people is very hard.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The pace of data collection never slowed down, but over time Hollingsworth became more selective about the datasets, preferring smaller, more tightly curated sets when it came time for post-training. As he puts it, \u201cthe quality of the data, not the quantity, is the thing that really defines the performance.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">That\u2019s particularly true when synthetic data is used, magnifying both the scope of possible training scenarios and the impact of any flaws in the original dataset. On the vision side, Turing estimates that 75 to 80 percent of its data is synthetic, extrapolated from the original GoPro videos. 
But that makes it even more important to keep the original dataset as high-quality as possible.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cIf the pre-training data itself is not of good quality, then whatever you do with synthetic data is also not going to be of good quality,\u201d Sivaraman says.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Beyond concerns of quality, there\u2019s a powerful competitive logic behind keeping data collection in-house. For Fyxer, the hard work of data collection is one of the best moats the company has against competition. As Hollingsworth sees it, anyone can build an open-source model into their product \u2013 but not everyone can find expert annotators to train it into a workable product.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cWe believe that the best way to do it is through data,\u201d he told TechCrunch, \u201cthrough building custom models, through high quality, human led data training.\u201d\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"For one week this summer, Taylor and her roommate wore GoPro cameras strapped to their foreheads as 
they&hellip;\n","protected":false},"author":2,"featured_media":84993,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[218,54801,61,60,80,15920,54802],"class_list":{"0":"post-84992","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-technology","8":"tag-artificial-intelligence","9":"tag-fyxer","10":"tag-ie","11":"tag-ireland","12":"tag-technology","13":"tag-training-data","14":"tag-vision-model"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/84992","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/comments?post=84992"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/84992\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media\/84993"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media?parent=84992"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/categories?post=84992"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/tags?post=84992"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}