Data storage is shifting from static to dynamic with Amazon Web Services Inc.’s introduction of AWS S3 Tables. This move reflects broader industry trends toward open-table formats prioritizing flexibility and interoperability.
S3 Tables enable customers to work with mutable datasets that behave much like structured query language tables, a significant change from the read-only nature of Apache Parquet files, according to Andy Warfield (pictured), vice president and distinguished engineer at AWS.
“On the Iceberg side, the distinction between traditional Parquet and the OTFs is that it takes what was basically read-only tables … and makes them mutable,” Warfield said. “It brings them closer to being a more conventional SQL table. That is becoming a primitive in S3, so you’ll be able to create a table bucket, we call it, create a table inside it — it gets its own endpoint. It’s a first-class resource, which means you can set policy.”
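In practice, the workflow Warfield describes can be sketched in a few API calls. The snippet below is an illustrative sketch only: it assumes the "s3tables" client exposed by recent boto3 releases, and the bucket, namespace, table names and response keys are placeholders rather than a verified recipe.

```python
# Illustrative sketch: creating a table bucket and an Iceberg table.
# Assumes the "s3tables" client in recent boto3 releases; resource names,
# region and response keys are placeholders/assumptions.
import boto3

s3tables = boto3.client("s3tables", region_name="us-east-1")

# A table bucket is a new S3 resource type dedicated to tabular data.
bucket = s3tables.create_table_bucket(name="analytics-tables")
bucket_arn = bucket["arn"]

# Tables are grouped under a namespace inside the table bucket.
s3tables.create_namespace(tableBucketARN=bucket_arn, namespace=["sales"])

# Each table is a first-class resource with its own ARN and endpoint,
# so access policy can be attached to it directly.
table = s3tables.create_table(
    tableBucketARN=bucket_arn,
    namespace="sales",
    name="orders",
    format="ICEBERG",
)
print(table["tableARN"])
```

From there, any Iceberg-compatible engine can read and write the table, which is the shift toward mutability that Warfield describes.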
Warfield spoke with theCUBE Research’s Dave Vellante and John Furrier for theCUBE’s “Cloud AWS re:Invent coverage,” during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the impact of AWS S3 Tables on data storage, integration with AWS services and advancements in scalability, metadata and disaggregation.
A new era of integration, metadata and scalability with AWS S3 Tables
AWS S3 Tables integrates with key AWS services, such as Amazon Data Firehose and Amazon QuickSight, enabling streamlined data ingestion and visualization. This advancement addresses a long-standing challenge for customers who had to “cobble together” separate solutions for data reads and writes, according to Warfield.
“With the S3 Tables launch, an Iceberg client talking directly to the table can read and write today,” Warfield said. “You can take any Firehose source and pop them into Iceberg tables in S3 tables. Then, you can take QuickSight, which has also added Iceberg support. You can stand up a dashboard and start pulling that stuff out.”
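For illustration, the Firehose-to-Iceberg path Warfield mentions might be configured roughly as follows. The nested configuration keys, stream name and ARNs are assumptions sketched around the Firehose Iceberg destination, not a verified configuration.

```python
# Hedged sketch: routing a Firehose stream into an Iceberg table in S3 Tables.
# The nested configuration keys, role and catalog ARNs below are assumptions
# for illustration, not a verified configuration.
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

firehose.create_delivery_stream(
    DeliveryStreamName="orders-to-iceberg",
    DeliveryStreamType="DirectPut",
    IcebergDestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-iceberg-role",
        "CatalogConfiguration": {
            "CatalogARN": "arn:aws:glue:us-east-1:111122223333:catalog",
        },
        "DestinationTableConfigurationList": [
            {"DestinationDatabaseName": "sales", "DestinationTableName": "orders"},
        ],
        # Records that fail to convert land in a regular S3 bucket.
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::111122223333:role/firehose-iceberg-role",
            "BucketARN": "arn:aws:s3:::orders-firehose-errors",
        },
    },
)
```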
Metadata management is also seeing a shift in focus. By embedding metadata directly into the table infrastructure, AWS S3 Tables enables customers to create a “journal” of data changes, offering new opportunities for discovery and artificial intelligence-driven insights, according to Warfield.
“The metadata side takes that table and turns it into a system table that we manage,” he said. “Now, as you put data into S3 … you could turn on metadata, and we will fill a table, effectively like [change data capture] … and populate a journal of all of the changes you’ve made to the bucket in a table.”
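Once such a journal table is populated, it can be queried like any other table. The sketch below runs a query through Athena from Python; the database, table and column names and the results location are hypothetical placeholders rather than the actual journal schema.

```python
# Hedged sketch: querying a metadata journal table from Python via Athena.
# The database, table, column names and results location are hypothetical
# placeholders, not the actual journal schema.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = """
SELECT key, record_type, record_timestamp
FROM "s3_metadata"."journal_my_bucket"
WHERE record_type = 'CREATE'
ORDER BY record_timestamp DESC
LIMIT 100
"""

resp = athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
)
print(resp["QueryExecutionId"])
```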
AWS’ use of Nitro-based disaggregation for AWS S3 Tables storage introduces new levels of flexibility and resilience. This architectural shift allows AWS to decouple storage from compute, significantly boosting system scalability and efficiency, according to Warfield.
“We took a decision about four years ago to start exploring another way of doing it and doing this disaggregation thing,” he said. “We stuck Nitro in the hard drive rack. Nitro’s virtualizing the drives, and it’s doing basically nothing else. The flexibility for developers is better. We can actually change instances as the workload on the drives change.”
The industry’s focus has shifted from conversations about storage performance to discussions centered on data value and discovery, according to Warfield. This change reflects the broader movement toward making data more accessible and meaningful at scale.
“Over the past few years, a lot of those conversations have shifted to us seeing customers build data lakes, and inside organizations, pull data from different bits of the organization and build new stuff,” Warfield said. “The data conversation most recently has shifted to, ‘There’s so much data.’ And it’s never a negative. It’s like, ‘How do I get to value fast on top of all that? How do I do discovery, and how do I do understanding?’”
This shift is perhaps best exemplified by AWS’ continued transformation of S3, according to Warfield. Initially marketed in 2006 as “storage for the internet,” S3 has since evolved into a comprehensive data platform capable of supporting advanced analytics, AI workloads and a range of application development needs.
“The AWS PR … it talks about S3, and it says, ‘S3 is storage for the internet,’” Warfield said. “And just as one sentence in the thing, and that is today, still 18 years later, how the team thinks about it. We’ve always said we will go where the internet takes us in terms of storage.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s “Cloud AWS re:Invent coverage”:
Photo: SiliconANGLE