企业绩效管理网


Views: 1354 | Replies: 9

Cube size shows no influence of dimension elements d ...


Posted on 2014-6-19 01:36:58
Dear all,

I am confused today: I have been playing with a cube that has a very large dimension (approximately 500,000 elements).
The cube in question is a staging cube that contains close-to-transactional data.
I understand this is a questionable design, but that is not the point of my concern.
After processing several months of data, the cube has reached a size close to 260 MB on the server hard drive.
At that point in time, the large dimension contains roughly 500k elements with data associated with them.

The case:
I then delete all elements from the large dimension, keeping only 12.
The update on the dimension is made using an XDI file upload.

No data are attached to the 12 elements remaining in the dimension in that cube.
(Only a couple of attributes are populated for them, but obviously those are stored in the dimension's attribute cube rather than in the cube itself.)

To my surprise, the size of the cube on the hard drive does not change: it remains at 260 MB once the 500k dimension elements are deleted (all except the 12, which hold no data), even if:
- I restart the server
- I execute a Save Data All.

On the other hand, as expected, the memory used by the server after the restart clearly reflects the reduction in size: it went from approximately 4.5 GB used to about 1.5 GB after the deletion + restart.

The questions:
Shouldn't the cube size on the server hard drive reflect, to a certain extent, the actual data volume stored in the cube?

Could somebody please help me understand the rationale behind this?

Posted on 2014-6-19 03:02:55
Interestingly, the cube's data size on the drive finally dropped after I reprocessed data into the cube, generated new elements in the dimension and did another data save.

I am not sure why, in that sequence, the size of the cube would not drop as soon as the elements were deleted and the data saved or the server restarted.

Posted on 2014-6-19 03:10:55
Olivier wrote: Interestingly, the cube's data size on the drive finally dropped after I reprocessed data into the cube, generated new elements in the dimension and did another data save.

I am not sure why, in that sequence, the size of the cube would not drop as soon as the elements were deleted and the data saved or the server restarted.

I can make an educated guess; it's a data (not metadata) change that "flags" to TM1 whether it needs to re-save a particular cube. If no change has been flagged since the last data save, then the cube will not be saved even if you do a Save Data All. This is a time-saving feature, since it would be pointless for TM1 to re-save each and every cube regardless of whether it had changed; after all, the save process is often the primary performance bottleneck. You don't actually lose much by retaining the data in the .cub file; all that will happen on startup is that the data that no longer has a valid element will fail to load.

The deletion of data via the deletion of elements is not (flagged as) an actual data change. (Were it otherwise, after each metadata change the server would need to somehow go through and see whether any populated cells had been lost, which would be a ridiculously time consuming task.) Consequently the cube would not be flagged for a save after you made your change, and thus it retained the same size. My bet is that had you checked the time and date of the .cub file it would have been from the save prior to you deleting the elements.
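That dirty-flag theory can be sketched as a toy model. This is plain Python, not TM1's actual internals: the flag name, the byte accounting and the class are all invented for illustration.

```python
# Hypothetical sketch of the dirty-flag theory above (NOT TM1's real code):
# a "dirty" flag decides whether Save Data All rewrites a cube's .cub file.
# Element deletion is a metadata change, so it never sets the flag, and the
# file keeps its old size until a genuine data change triggers a re-save.

class Cube:
    def __init__(self, name, cells):
        self.name = name
        self.cells = dict(cells)   # in-memory data: element -> value
        self.dirty = False         # set only by *data* changes
        self.file_bytes = 0        # what is persisted on disk

    def write_cell(self, element, value):
        self.cells[element] = value
        self.dirty = True          # data change: flag cube for next save

    def delete_elements(self, elements):
        # Metadata change: cells become unreachable, but the flag stays False.
        for e in elements:
            self.cells.pop(e, None)

    def save_data_all(self):
        if self.dirty:             # unchanged cubes are skipped entirely
            self.file_bytes = len(self.cells) * 8
            self.dirty = False

cube = Cube("Staging", {f"e{i}": 1.0 for i in range(500)})
cube.dirty = True
cube.save_data_all()
before = cube.file_bytes           # 4000 bytes for 500 populated cells

cube.delete_elements([f"e{i}" for i in range(488)])  # keep 12 elements
cube.save_data_all()               # skipped: nothing was flagged
assert cube.file_bytes == before   # .cub size on disk is unchanged

cube.write_cell("e0", 2.0)         # any real data change re-flags the cube
cube.save_data_all()
assert cube.file_bytes < before    # now the file finally shrinks
```

Under this model, the file only shrinks after a genuine data change plus a save, which matches what Olivier observed when he reprocessed data into the cube.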

Posted on 2014-6-19 03:57:39
Thanks for taking the time to comment, Alan.
I was just curious to understand the behaviour a bit better.
Alan wrote: My bet is that had you checked the time and date of the .cub file it would have been from the save prior to you deleting the elements.

Sounds like a safer bet than "Americain" at the Melbourne Cup.

I do not recall the timestamp on the .cub file, but the data save after the element deletion was much quicker than I expected, so I think your guess is very accurate.


Posted on 2014-6-19 04:05:25
Yes, I have noticed this too. It seems TM1 doesn't drop the data when a dimension changes, nor does it flag the data as changed, so the cube is not included in a Save Data All.

It's pretty stupid in this case, but TM1 has a bias toward speed optimization as opposed to memory or disk efficiency, so I guess that's the reason for the behaviour.

Posted on 2014-6-19 04:09:25
It's pretty stupid in this case, but TM1 has a bias toward speed optimization, as opposed to memory or disk consumption efficiency

I am wondering whether, in this instance, it actually incurs a side effect on calculation performance.

Assume the data are flagged as unchanged, as Alan guessed, which explains the Save Data timing and the cube size on the drive and, as a result, in memory.

Would it be fair to assume, then, that while the cube is up and running in memory, having lost all these elements but not having flagged the associated data as changed, the consolidation calculations and/or rules will face some sort of overhead when a view is called and calculated (due to the ghost state of the old invalid data that no longer have elements to sit against)?

I will try to test that when I have a chance.

I have found that the selective mass deletion of elements is quite slow. I use an attribute to identify the set of elements that have to be deleted.
To avoid impacting the business, this housekeeping process is fired on Friday nights when the conditions to trigger the chore are met (periodic chore), and my hope was to have a fresh, lighter system ready to go on Monday mornings.

The process executed to do this clean-up is quite straightforward, but the timing is about 5,000 seconds for roughly 500,000 elements. Hence the Friday scheduling.
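The attribute-driven clean-up described above can be sketched in plain Python (not TurboIntegrator; the "ToDelete" attribute name is invented here). It also illustrates one likely reason the purge is slow: deleting elements one at a time makes each delete rescan the dimension, whereas a single-pass rebuild of the element list is linear.

```python
# Sketch of the attribute-driven purge (plain Python, not a TI process; the
# "ToDelete" attribute name is made up). Per-element deletion rescans the
# element list on every call, so 500k deletes behave quadratically; rebuilding
# the list in one pass over the flags is linear.

def delete_one_by_one(elements, flagged):
    elems = list(elements)
    for e in flagged:
        elems.remove(e)            # each remove rescans the list: O(n) per call
    return elems

def rebuild_in_one_pass(elements, flagged):
    flagged = set(flagged)
    return [e for e in elements if e not in flagged]   # single O(n) pass

elements = [f"Trx{i:06d}" for i in range(5000)]
flagged = elements[:-12]           # purge everything except the last 12

# Both approaches keep the same 12 survivors; only the cost differs.
assert delete_one_by_one(elements, flagged) == rebuild_in_one_pass(elements, flagged)
```

If the TI process deletes elements individually inside a loop, a variant that rebuilds the dimension from the surviving elements in one pass may cut the 5,000-second runtime substantially, though that would need testing on the actual model.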

I think one of the implications of this is that my system is "really" fresh and light only once the next load + Save Data All has been done on that particular cube.

Again, this is based on the assumption that calculation (or cube) performance is impacted by the data not being cleared straight after element deletion.

Note:
It is a bad design coming back to bite me. We should have built a relational database to capture this transactional data, but you don't always do what you could/should/want.
Hopefully, if performance becomes an issue for users, I can show the path to a better practice.

Posted on 2014-6-19 04:47:54
I would say it's unlikely to cause a performance hit.

Internally, when a consolidated element is selected in a view, TM1 finds all leaf elements under the consolidated element to determine which values to sum. Additional data being on disk, or even in memory, wouldn't affect this process, regardless of feeder flags or cached calc values.
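A toy model of that traversal (a simple parent-to-children tree, not the real TM1 engine) shows why orphaned values should add no query-time cost: the walk starts from the consolidated element and only ever visits leaves reachable under it, so values keyed by deleted elements are simply never touched.

```python
# Toy consolidation walk (NOT the real TM1 engine): summing a consolidated
# element visits only the leaves reachable beneath it, so a stale value left
# behind by a deleted element is never read and costs nothing at query time.

children = {"Total": ["A", "B"], "A": [], "B": []}
cells = {"A": 10.0, "B": 5.0,
         "Ghost": 99.0}            # orphaned value from a deleted element

def consolidate(element):
    kids = children.get(element, [])
    if not kids:                   # leaf: read its stored value, if any
        return cells.get(element, 0.0)
    return sum(consolidate(k) for k in kids)

assert consolidate("Total") == 15.0   # "Ghost" never enters the traversal
```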

I'm not convinced this additional data exists in memory just because it has not yet been re-saved to disk. Remember that the RAM TM1 reports to the OS as being "in use" is just the RAM it has reserved. TM1 doesn't tend to give back RAM it has allocated (again, a bias toward speed rather than efficiency), so you're unlikely to see a drop in reserved RAM even if it is no longer used.

However, it is worth testing if you can find a way to do so; please share your findings here, as I'm sure they would be very interesting to other members.

Posted on 2014-6-19 04:51:31
Lazarus wrote:I would say it's unlikely to cause a performance hit.

Internally, when a consolidated element is selected in a view, TM1 finds all leaf elements under the consolidated element to determine which values to sum. Additional data being on disk, or even in memory, wouldn't affect this process, regardless of feeder flags or cached calc values.

I'm inclined to agree with both your reasoning and conclusions. However...
Lazarus wrote:I'm not convinced this additional data exists in memory, just because it has not been re-saved to disk yet.

I believe it does, at least in the session during which the deletion was done. (Not afterwards, though, I agree.)

I just obliterated all but one element in a dimension on a decent-sized (100 MB) cube, carefully watching the performance monitor stats, both before and after, for both the server and the cube. The stats were exactly the same, even after a data save (which, as I'd guessed, didn't update the .cub date and time). Not a byte was removed from the cube, not a byte added to garbage. To get the reduction in memory usage I had to restart the server. During the load on restart, none of the values relating to the missing elements could be loaded (as I mentioned earlier), and thus the cube had a much smaller memory footprint.

The one test that I didn't do (since I couldn't do both in the same session) was to make a data change in the cube after the deletion to see whether that sent all of the surplus memory to garbage.

Posted on 2014-6-19 05:15:02
Youch, interesting result.

I wonder why saving a dimension after element deletions tends to be slower on bigger, more complex cubes. I always assumed TM1 was spending that time freeing up memory.

Thanks for spending the time investigating it.

Posted on 2014-6-19 05:15:47
Every time you change a dimension, the rules of all cubes that use that dimension need to be recompiled. This recompilation can have a number of consequences, including cache invalidation, remapping of dependencies and regeneration of feeders. The larger and more complex those cubes are, the more time that can take.
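A hedged sketch of that cascade (the class and names are invented for illustration, not TM1's actual code): only the cubes that use the changed dimension pay the recompilation cost, so the total time scales with the number and complexity of dependent cubes.

```python
# Illustrative model of the recompilation cascade above (NOT TM1's real code):
# a dimension edit forces every cube built on that dimension to recompile its
# rules, flush its calculation cache and regenerate feeders; unrelated cubes
# are untouched.

class ModelServer:
    def __init__(self):
        self.cubes = {}            # cube name -> set of dimension names
        self.recompiled = []       # audit trail of recompilation work

    def add_cube(self, name, dimensions):
        self.cubes[name] = set(dimensions)

    def change_dimension(self, dim):
        for cube, dims in self.cubes.items():
            if dim in dims:        # only dependent cubes pay the price
                self.recompiled.append(cube)  # recompile + cache flush + feeders

server = ModelServer()
server.add_cube("Staging", ["Transactions", "Measures"])
server.add_cube("Reporting", ["Transactions", "Time"])
server.add_cube("Unrelated", ["Products"])

server.change_dimension("Transactions")
assert server.recompiled == ["Staging", "Reporting"]   # "Unrelated" untouched
```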
