Photo by Maxim Tolchinskiy

AI Transparency For the Greater Good

Transparency Coalition is focused on AI regulation for the greater good. We believe policy must be created with input from the many stakeholders AI affects. While much of our work focuses on forward-looking issues in generative AI (text, images, music, or video content), our policy pillars also apply to predictive AI, where outputs include scores and yes/no recommendations.

We have two foundational beliefs as we advocate for meaningful policy reforms. The first is that regulation can, and must, provide a level playing field for all businesses; systems should not be created that allow only the biggest players to survive. The second is that we cannot repeat the mistakes made with social media. Sadly, we have learned that corporate growth and market interests will trump protecting our children and teens, consumer data privacy, and, most recently, copyright protection. That mistake cannot be repeated, and we must act now: we are still dealing with the consequences of lax regulation of social media.

Photo by Adi Goldstein

Here’s where we believe regulators should start, quickly.

  • A number of existing laws can, and should, apply to the development and creation of AI systems; they simply need to be updated and enforced. This especially applies to existing privacy and copyright laws. Importantly, enforcement of existing laws and regulations, especially around privacy, does not need to wait on new legislative measures.

    News stories point to the strong possibility that generative AI model owners may be in violation of personal privacy laws. They need to demonstrate compliance with applicable laws through training data audits.

  • The training data fed into any AI model determines the quality and accuracy of its output; it is the very foundation of an AI system. Given this, it is critical that AI model owners be required to have the right to use their training data, just as any other business would be in any other endeavor.

    Specifically, AI model owners must meet one of the following three criteria for all training data:

    1) They own their training data.

    2) They have a license to use their training data, and can verify that the licensed training data was collected legally.

    3) They have an opt-in right to use personal data, including the “right to be forgotten” to meet privacy laws, or an opt-in right to other digitized information on the internet.

    Citizens did not post photos of their kids on social media expecting them to be sucked into LLMs. Individuals and companies did not provide digitized information for search indexing expecting it to be used to train LLMs that enrich a handful of the largest companies in the world. AI model owners must have the right to use the training data they employ, and that right must be auditable and traceable, as the sketch below illustrates.
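    As an illustration only, here is a minimal Python sketch of how an auditor might test a training file against these three criteria. The record fields and function names are hypothetical, not a standard schema.

    ```python
    from dataclasses import dataclass
    from typing import Optional

    # Hypothetical provenance record for one training-data file;
    # field names are illustrative, not a standard schema.
    @dataclass
    class TrainingFileRecord:
        name: str
        owned: bool = False                     # criterion 1: owner owns the data
        license_id: Optional[str] = None        # criterion 2: licensed data
        license_collection_verified: bool = False
        opt_in_consent: bool = False            # criterion 3: explicit opt-in

    def has_right_to_use(record: TrainingFileRecord) -> bool:
        """Return True if the file satisfies at least one of the three criteria."""
        if record.owned:
            return True
        if record.license_id and record.license_collection_verified:
            return True
        if record.opt_in_consent:
            return True
        return False

    # Example: a licensed file whose legal collection has been verified passes.
    record = TrainingFileRecord(name="corpus-0001.txt",
                                license_id="LIC-2024-017",
                                license_collection_verified=True)
    assert has_right_to_use(record)
    ```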

  • Consumers and small businesses sign up for many online services, and are already presented with page after page of opaque terms of service (ToS). If their data, their searches, or the content they create in the product is going to be used for AI model training, that fact needs to be made clear with an opt-in element.

    Also, a single sentence buried in 12 pages of legalese saying “your data may be used to improve our products” should not cover AI uses. There is a big difference between tracking product bugs and usage under that umbrella statement and using everything a user does, inputs, or creates to train an LLM. An opt-in element should be specifically required for AI model training, with a legitimate remedy, other than cutting the user off from the product, should a user decline. A minimal sketch of such a consent record follows.
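    To illustrate the distinction, here is a minimal Python sketch, with hypothetical field names, of a per-user consent record in which AI-training consent is a separate, default-off flag rather than something inferred from a blanket product-improvement clause.

    ```python
    from dataclasses import dataclass

    # Hypothetical per-user consent record; the point is that AI-training
    # consent is a separate, default-off flag, not something inferred from
    # a blanket "improve our products" clause.
    @dataclass
    class UserConsent:
        product_improvement: bool = False   # bug tracking, usage analytics
        ai_model_training: bool = False     # must be an explicit opt-in

    def may_use_for_training(consent: UserConsent) -> bool:
        # Declining AI training must not cut the user off from the product,
        # so only this one flag is checked.
        return consent.ai_model_training

    user = UserConsent(product_improvement=True)   # typical ToS acceptance
    assert not may_use_for_training(user)          # AI training still needs opt-in
    ```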

  • With the massive scale of training data used in the largest generative AI models, a large-scale audit mechanism must be required in order to enforce regulations. Given the billions of users the largest tech companies can reach today, large fines, on the order of percentages of revenue, or other more significant penalties need to be in place for known large-scale violations of AI regulations.

    A scalable regulatory framework for audit and compliance consists of two key structures:

    Application Programming Interface (API) access

    Regulators must have API access to review training data inputs, where the API provides an outline of the files incorporated and/or the personal data processed in AI model training. This format, as specified by the regulator, would be an outline of training data files, similar in spirit to the robots.txt file that enables efficient crawling of websites today. The outline would contain, at a minimum, the name of each training data file, its size, a keyword summary of the file, and licensing/opt-in right-to-use information. From there, auditors can decide to spot-check a few files out of the tens, hundreds, or thousands included in the training data. A sketch of such an outline follows.
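    As an illustration, here is a minimal Python sketch of what entries in such an outline might look like and how an auditor might sample them. The field names and JSON encoding are assumptions, since the actual format would be specified by the regulator.

    ```python
    import json
    import random

    # Hypothetical manifest entries in the outline format described above:
    # file name, file size, keyword summary, and right-to-use information.
    manifest = [
        {
            "file": "forum-posts-2019.jsonl",
            "size_bytes": 48_211_733,
            "keywords": ["forums", "user posts", "english"],
            "right_to_use": {"basis": "license", "license_id": "LIC-2021-044"},
        },
        {
            "file": "product-manuals.tar",
            "size_bytes": 9_402_112,
            "keywords": ["manuals", "technical writing"],
            "right_to_use": {"basis": "owned"},
        },
    ]

    # An auditor fetching the outline over the API could then spot-check a
    # random sample of entries rather than inspecting every file.
    sample = random.sample(manifest, k=1)
    print(json.dumps(sample, indent=2))
    ```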

    An AI model clearinghouse

    In order to review and approve generative AI projects, an AI model clearinghouse is needed. This clearinghouse, where large AI projects are reviewed for approval, should not focus only on curation and quality assurance of training data for projects above the 10^26-operation compute threshold outlined in the current executive order from the Biden administration; importantly, models of all sizes should be reviewed at varying levels of detail, as sketched below. The narrower and better understood the focus of the AI project, especially for very positive use cases like diagnosis in healthcare, the better curated the training data will be, and hence the better the traceability of the AI model’s outputs.
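    As a rough illustration of tiered review, the Python sketch below estimates training compute with the common 6 × parameters × tokens approximation and maps it to a review depth. The approximation and the tier boundaries are our assumptions, not part of the executive order.

    ```python
    # Rough sketch of tiered review using the common 6*N*D estimate of
    # training compute (about 6 FLOPs per parameter per token). The
    # heuristic and the tier boundaries below are assumptions.
    EO_THRESHOLD_FLOPS = 1e26  # compute threshold in the executive order

    def estimated_training_flops(parameters: float, tokens: float) -> float:
        return 6.0 * parameters * tokens

    def review_tier(flops: float) -> str:
        if flops >= EO_THRESHOLD_FLOPS:
            return "full clearinghouse review"
        if flops >= 1e23:
            return "standard review"
        return "lightweight review"

    # Example: a 70B-parameter model trained on 15T tokens (~6.3e24 FLOPs).
    flops = estimated_training_flops(70e9, 15e12)
    print(f"{flops:.2e} FLOPs -> {review_tier(flops)}")
    ```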

TRANSPARENCY COALITION AI BILL TRACKER

Stay up to date on AI-related bills introduced in the 2024 state legislative session in all 50 states with our interactive map.

Contact Us

If you’re a policymaker or lawmaker and wish to contact us, please fill out this form.