On July 5, at the webinar “Data and computing infrastructure”, Dr. Vo Sy Nam (Head of Translational Biomedical Informatics, Vingroup Big Data Institute) gave a presentation titled “Building data infrastructure: Vision and solutions.”
On July 5, a webinar titled “Data and computing infrastructure” was organized to promote the national strategy on research, development and application of artificial intelligence. The speakers were leading experts in data and AI from a variety of institutes, universities and enterprises. Representatives of the Ministry of Science and Technology and Australian experts also attended the webinar. As one of the speakers, Dr. Vo Sy Nam (Head of Translational Biomedical Informatics, VinBigdata) shared his vision and solutions for building data infrastructure.
In his presentation, Dr. Nam discussed the importance of data sharing, drawing on lessons from the Covid-19 pandemic; the current state of open data infrastructure; and VinGen Data Portal, the largest biomedical data management, analysis and sharing platform in Vietnam.
On data infrastructure, Dr. Nam said that data quality is key in AI research because 80% of the workload is data processing. The global data explosion has also had a number of consequences, including the arrival of the open data era.
“Data infrastructure needs a long-term plan to be built, maintained and grown. Now, data is developing exponentially, and the era of open data has begun,” he said. Dr. Nam noted that data sharing has already been implemented in a number of developed countries. Private enterprises and academic institutions have also released large-scale, fully labeled and documented open datasets. In Vietnam, a number of programs have combined public and private resources, academic institutions and businesses to build a national data portal.
However, the main challenge for data infrastructure is system performance, given the huge demand for resources, for up-to-date and homogeneous data, and for computation. The solutions he proposed include adapting to the open data era; building a long-term plan, even for 20 to 30 years; and ensuring the quality and integrity of the data over time while using tools to track deviations in the data stream.
As an example of data infrastructure, Dr. Nam introduced VinGen Data Portal, the largest biomedical data management and analysis platform in Vietnam. “Currently, this data portal has nearly 5 thousand GB of data, 10 computing machines, more than 1000 computing cores and the data analysis includes labeling and fine-tuning,” Dr. Nam said. By the end of July, the VinGen Data Portal system is expected to be updated with the full data of 1,000 healthy Vietnamese genomes, making it ready for the community to access and use in biomedical research in Vietnam and worldwide.
Dr. Vo Sy Nam is currently Head of Translational Biomedical Informatics at Vingroup Big Data Institute. He and his colleagues are in charge of researching and developing large-scale biomedical data analysis and annotation systems, as well as predictive solutions for disease risk and drug side effects. Among the projects his team is working on is VinGen Data Portal (https://genome.vinbigdata.org/), the largest biomedical data sharing, management and analysis system in Vietnam, which was announced in December 2020. The system stores more than 1,200 terabytes of data and nearly 5,000 biological samples related to the 1,000 Vietnamese genome decoding project and other applied research.
If interested, you can watch the webinar here.